linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-23 03:45:20 +07:00

Author	SHA1	Message	Date
Nirmoy Das	2639f453f2	drm/amdgpu: fix doc by clarifying sched_list definition expand sched_list definition for better understanding. Also fix a typo atleast -> at least Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-27 16:46:44 -05:00
Nirmoy Das	9e3e90c50d	drm/scheduler: fix documentation by replacing rq_list with sched_list This also replaces old artifacts with a correct one in drm_sched_entity_init() declaration Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-16 13:41:00 -05:00
Nirmoy Das	56822db194	drm/scheduler: improve job distribution with multiple queues This patch uses score based logic to select a new rq for better loadbalance between multiple rq/scheds instead of num_jobs. Below are test results after running amdgpu_test from mesa drm Before this patch: sched_name num of many times it got scheduled ========= ================================== sdma0 314 sdma1 32 comp_1.0.0 56 comp_1.0.1 0 comp_1.1.0 0 comp_1.1.1 0 comp_1.2.0 0 comp_1.2.1 0 comp_1.3.0 0 comp_1.3.1 0 After this patch: sched_name num of many times it got scheduled ========= ================================== sdma0 216 sdma1 185 comp_1.0.0 39 comp_1.0.1 9 comp_1.1.0 12 comp_1.1.1 0 comp_1.2.0 12 comp_1.2.1 0 comp_1.3.0 12 comp_1.3.1 0 Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-01-16 13:37:54 -05:00
Nirmoy Das	8c23056bdc	drm/scheduler: do not keep a copy of sched list entity should not keep copy and maintain sched list for itself. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:12 -05:00
Nirmoy Das	b3ac17667f	drm/scheduler: rework entity creation Entity currently keeps a copy of run_queue list and modify it in drm_sched_entity_set_priority(). Entities shouldn't modify run_queue list. Use drm_gpu_scheduler list instead of drm_sched_rq list in drm_sched_entity struct. In this way we can select a runqueue based on entity/ctx's priority for a drm scheduler. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-12-18 16:09:12 -05:00
Daniel Vetter	6c56e8adc0	drm-misc-next for v5.6: UAPI Changes: - Add support for DMA-BUF HEAPS. Cross-subsystem Changes: - mipi dsi definition updates, pulled into drm-intel as well. - Add lockdep annotations for dma_resv vs mmap_sem and fs_reclaim. - Remove support for dma-buf kmap/kunmap. - Constify fb_ops in all fbdev drivers, including drm drivers and drm-core, and media as well. Core Changes: - Small cleanups to ttm. - Fix SCDC definition. - Assorted cleanups to core. - Add todo to remove load/unload hooks, and use generic fbdev emulation. - Assorted documentation updates. - Use blocking ww lock in ttm fault handler. - Remove drm_fb_helper_fbdev_setup/teardown. - Warning fixes with W=1 for atomic. - Use drm_debug_enabled() instead of drm_debug flag testing in various drivers. - Fallback to nontiled mode in fbdev emulation when not all tiles are present. (Later on reverted) - Various kconfig indentation fixes in core and drivers. - Fix freeing transactions in dp-mst correctly. - Sean Paul is steping down as core maintainer. :-( - Add lockdep annotations for atomic locks vs dma-resv. - Prevent use-after-free for a bad job in drm_scheduler. - Fill out all block sizes in the P01x and P210 definitions. - Avoid division by zero in drm/rect, and fix bounds. - Add drm/rect selftests. - Add aspect ratio and alternate clocks for HDMI 4k modes. - Add todo for drm_framebuffer_funcs and fb_create cleanup. - Drop DRM_AUTH for prime import/export ioctls. - Clear DP-MST payload id tables downstream when initializating. - Fix for DSC throughput definition. - Add extra FEC definitions. - Fix fake offset in drm_gem_object_funs.mmap. - Stop using encoder->bridge in core directly - Handle bridge chaining slightly better. - Add backlight support to drm/panel, and use it in many panel drivers. - Increase max number of y420 modes from 128 to 256, as preparation to add the new modes. Driver Changes: - Small fixes all over. - Fix documentation in vkms. - Fix mmap_sem vs dma_resv in nouveau. - Small cleanup in komeda. - Add page flip support in gma500 for psb/cdv. - Add ddc symlink in the connector sysfs directory for many drivers. - Add support for analogic an6345, and fix small bugs in it. - Add atomic modesetting support to ast. - Fix radeon fault handler VMA race. - Switch udl to use generic shmem helpers. - Unconditional vblank handling for mcde. - Miscellaneous fixes to mcde. - Tweak debug output from komeda using debugfs. - Add gamma and color transform support to komeda for DOU-IPS. - Add support for sony acx424AKP panel. - Various small cleanups to gma500. - Use generic fbdev emulation in udl, and replace udl_framebuffer with generic implementation. - Add support for Logic PD Type 28 panel. - Use drm_panel_* wrapper functions in exynos/tegra/msm. - Add devicetree bindings for generic DSI panels. - Don't include drm_pci.h directly in many drivers. - Add support for begin/end_cpu_access in udmabuf. - Stop using drm_get_pci_dev in gma500 and mga200. - Fixes to UDL damage handling, and use dma_buf_begin/end_cpu_access. - Add devfreq thermal support to panfrost. - Fix hotplug with daisy chained monitors by removing VCPI when disabling topology manager. - meson: Add support for OSD1 plane AFBC commit. - Stop displaying garbage when toggling ast primary plane on/off. - More cleanups and fixes to UDL. - Add D32 suport to komeda. - Remove globle copy of drm_dev in gma500. - Add support for Boe Himax8279d MIPI-DSI LCD panel. - Add support for ingenic JZ4770 panel. - Small null pointer deference fix in ingenic. - Remove support for the special tfp420 driver, as there is a generic way to do it. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl34lkkACgkQ/lWMcqZw E8M76g//WRYl9fWnV063s44FBVJYjGxaus0vQJSGidaPCIE6Ep6TNjXp8DVzV82M HR79P9glL02DC9B8pflioNNXdIRGSVk/FJcKVB2seFAqEFCAknvWDM/X/y+mOUpp fUeFl+Znlwx3YlM8f4Qujdbm+CbTewfbya4VAWeWd8XG2V8jfq5cmODPPlUMNenZ J6Ja+W3ph741uSIfAKaP69LVJgOcuUjXINE4SWhRk/i5QF3GIRej/A7ZjWGLQ/t2 2zUUF7EiCzhPomM40H3ddKtXb4ZjNJuc5pOD4GpxR8ciNbe2gUOHEZ5aenwYBdsU 5MwbxNKyMbKXATtn3yv3fSc4jH3DtmEKpmovONeO8ZDBrQBnxeYa3tQvfkNghA2f acoZMzYUImV+ft6DMIgpXppASvo7mQYDAbLPOGEJ9E44AL4UP00jesEjnK5FOHSR 3BEzGUnK/6QL5zFNPni8YZQ8dan4jDIno1mqIV+cQ4WCGlaKckzIWO6243Bf13b/ kROSJpgWkiK6Ngq0ofhD0MHyT/m1QnqUzWRKTJhRtPflSWRBsDZqWCQ5Vx1QlNIE /HfTNbTpXWwa+5wXbbB8TkDw5t9cQGnR+QcrEd9HgoIec7B5Re8rx9i0TJAT4N05 03RCQCecSfD8gwKd2wgaFIpFGRl9lTdLYSpffSmyL2X5a20lZhM= =b15X -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2019-12-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.6: UAPI Changes: - Add support for DMA-BUF HEAPS. Cross-subsystem Changes: - mipi dsi definition updates, pulled into drm-intel as well. - Add lockdep annotations for dma_resv vs mmap_sem and fs_reclaim. - Remove support for dma-buf kmap/kunmap. - Constify fb_ops in all fbdev drivers, including drm drivers and drm-core, and media as well. Core Changes: - Small cleanups to ttm. - Fix SCDC definition. - Assorted cleanups to core. - Add todo to remove load/unload hooks, and use generic fbdev emulation. - Assorted documentation updates. - Use blocking ww lock in ttm fault handler. - Remove drm_fb_helper_fbdev_setup/teardown. - Warning fixes with W=1 for atomic. - Use drm_debug_enabled() instead of drm_debug flag testing in various drivers. - Fallback to nontiled mode in fbdev emulation when not all tiles are present. (Later on reverted) - Various kconfig indentation fixes in core and drivers. - Fix freeing transactions in dp-mst correctly. - Sean Paul is steping down as core maintainer. :-( - Add lockdep annotations for atomic locks vs dma-resv. - Prevent use-after-free for a bad job in drm_scheduler. - Fill out all block sizes in the P01x and P210 definitions. - Avoid division by zero in drm/rect, and fix bounds. - Add drm/rect selftests. - Add aspect ratio and alternate clocks for HDMI 4k modes. - Add todo for drm_framebuffer_funcs and fb_create cleanup. - Drop DRM_AUTH for prime import/export ioctls. - Clear DP-MST payload id tables downstream when initializating. - Fix for DSC throughput definition. - Add extra FEC definitions. - Fix fake offset in drm_gem_object_funs.mmap. - Stop using encoder->bridge in core directly - Handle bridge chaining slightly better. - Add backlight support to drm/panel, and use it in many panel drivers. - Increase max number of y420 modes from 128 to 256, as preparation to add the new modes. Driver Changes: - Small fixes all over. - Fix documentation in vkms. - Fix mmap_sem vs dma_resv in nouveau. - Small cleanup in komeda. - Add page flip support in gma500 for psb/cdv. - Add ddc symlink in the connector sysfs directory for many drivers. - Add support for analogic an6345, and fix small bugs in it. - Add atomic modesetting support to ast. - Fix radeon fault handler VMA race. - Switch udl to use generic shmem helpers. - Unconditional vblank handling for mcde. - Miscellaneous fixes to mcde. - Tweak debug output from komeda using debugfs. - Add gamma and color transform support to komeda for DOU-IPS. - Add support for sony acx424AKP panel. - Various small cleanups to gma500. - Use generic fbdev emulation in udl, and replace udl_framebuffer with generic implementation. - Add support for Logic PD Type 28 panel. - Use drm_panel_* wrapper functions in exynos/tegra/msm. - Add devicetree bindings for generic DSI panels. - Don't include drm_pci.h directly in many drivers. - Add support for begin/end_cpu_access in udmabuf. - Stop using drm_get_pci_dev in gma500 and mga200. - Fixes to UDL damage handling, and use dma_buf_begin/end_cpu_access. - Add devfreq thermal support to panfrost. - Fix hotplug with daisy chained monitors by removing VCPI when disabling topology manager. - meson: Add support for OSD1 plane AFBC commit. - Stop displaying garbage when toggling ast primary plane on/off. - More cleanups and fixes to UDL. - Add D32 suport to komeda. - Remove globle copy of drm_dev in gma500. - Add support for Boe Himax8279d MIPI-DSI LCD panel. - Add support for ingenic JZ4770 panel. - Small null pointer deference fix in ingenic. - Remove support for the special tfp420 driver, as there is a generic way to do it. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ba73535a-9334-5302-2e1f-5208bd7390bd@linux.intel.com	2019-12-17 13:57:54 +01:00
Andrey Grodzovsky	135517d356	drm/scheduler: Avoid accessing freed bad job. Problem: Due to a race between drm_sched_cleanup_jobs in sched thread and drm_sched_job_timedout in timeout work there is a possiblity that bad job was already freed while still being accessed from the timeout thread. Fix: Instead of just peeking at the bad job in the mirror list remove it from the list under lock and then put it back later when we are garanteed no race with main sched thread is possible which is after the thread is parked. v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs. v3: Rebase on top of drm-misc-next. v2 is not needed anymore as drm_sched_get_cleanup_job already has a lock there. v4: Fix comments to relfect latest code in drm-misc. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Tested-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/342356	2019-11-27 16:05:49 +01:00
Andrey Grodzovsky	2b6f717c33	drm/sched: Avoid job cleanup if sched thread is parked. When the sched thread is parked we assume ring_mirror_list is not accessed from here. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-11-07 18:08:07 -05:00
Andrey Grodzovsky	83a7772ba2	drm/sched: Use completion to wait for sched->thread idle v2. Removes thread park/unpark hack from drm_sched_entity_fini and by this fixes reactivation of scheduler thread while the thread is supposed to be stopped. v2: Per sched entity completion. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-11-07 18:08:07 -05:00
Andrey Grodzovsky	d7c5782acd	drm/sched: Fix passing zero to 'PTR_ERR' warning v2 Fix a static code checker warning. v2: Drop PTR_ERR_OR_ZERO. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-11-06 16:27:47 -05:00
Dave Airlie	8a86b00a43	Merge tag 'drm-next-5.5-2019-11-01' of git://people.freedesktop.org/~agd5f/linux into drm-next drm-next-5.5-2019-11-01: amdgpu: - Add EEPROM support for Arcturus - Enable VCN encode support for Arcturus - Misc PSP fixes - Misc DC fixes - swSMU cleanup amdkfd: - Misc cleanups - Fix typo in cu bitmap parsing Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191101190607.3763-1-alexander.deucher@amd.com	2019-11-04 10:22:53 +10:00
Andrey Grodzovsky	e91e5f080e	drm/sched: Set error to s_fence if HW job submission failed. Problem: When run_job fails and HW fence returned is NULL we still signal the s_fence to avoid hangs but the user has no way of knowing if the actual HW job was ran and finished. Fix: Allow .run_job implementations to return ERR_PTR in the fence pointer returned and then set this error for s_fence->finished fence so whoever wait on this fence can inspect the signaled fence for an error. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-10-25 16:50:11 -04:00
Steven Price	588b9828f0	drm: Don't free jobs in wait_event_interruptible() drm_sched_cleanup_jobs() attempts to free finished jobs, however because it is called as the condition of wait_event_interruptible() it must not sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep. Instead let's rename drm_sched_cleanup_jobs() to drm_sched_get_cleanup_job() and simply return a job for processing if there is one. The caller can then call the free_job() callback outside the wait_event_interruptible() where sleeping is possible before re-checking and returning to sleep if necessary. Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com> Fixes: `5918045c4e` ("drm/scheduler: rework job destruction") Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/337652/	2019-10-25 13:47:58 +02:00
Ben Dooks	2636a5172d	drm/scheduler: make unexported items static The drm_sched_fence_ops_{scheduled,finished} are not exported from the file so make them static to avoid the following warnings from sparse: drivers/gpu/drm/scheduler/sched_fence.c:131:28: warning: symbol 'drm_sched_fence_ops_scheduled' was not declared. Should it be static? drivers/gpu/drm/scheduler/sched_fence.c:137:28: warning: symbol 'drm_sched_fence_ops_finished' was not declared. Should it be static? Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191009121447.31017-1-ben.dooks@codethink.co.uk	2019-10-10 09:11:46 -05:00
Christian König	85cb9d5067	drm/scheduler: use job count instead of peek The spsc_queue_peek function is accessing queue->head which belongs to the consumer thread and shouldn't be accessed by the producer This is fixing a rare race condition when destroying entities. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Monk.liu@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-08-15 10:52:10 -05:00
Sam Ravnborg	7c1be93c8e	drm/scheduler: drop use of drmP.h Drop use of the deprecated drmP.h header file. Fix fallout. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Emil Velikov <emil.velikov@collabora.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Eric Anholt <eric@anholt.net> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Sharat Masetty <smasetty@codeaurora.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nayan Deshmukh <nayan26deshmukh@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190630061922.7254-26-sam@ravnborg.org	2019-07-15 18:11:31 +02:00
Andrey Grodzovsky	d0f29d4980	drm/sched: Fix make htmldocs warnings. Document the missing parameters. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1559140180-6762-1-git-send-email-andrey.grodzovsky@amd.com	2019-05-29 11:49:51 -05:00
Andrey Grodzovsky	b576ff902f	drm/sched: Fix static checker warning for potential NULL ptr Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1558533443-7795-1-git-send-email-andrey.grodzovsky@amd.com	2019-05-24 13:21:06 -05:00
Sean Paul	374ed54293	Merge drm/drm-next into drm-misc-next Backmerging 5.2-rc1 to -misc-next for robher Signed-off-by: Sean Paul <seanpaul@chromium.org>	2019-05-22 16:08:21 -04:00
Erico Nunes	794c686eb7	drm/scheduler: Fix job cleanup without timeout handler After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are only deleted when the timeout handler is able to be cancelled successfully. In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT), job cleanup would be skipped which may result in memory leaks. Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in drm_sched_cleanup_jobs. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/306025/?series=60878&rev=2	2019-05-22 08:42:50 +02:00
Andrey Grodzovsky	a5343b8a2c	drm/scheduler: Add flag to hint the release of guilty job. Problem: Sched thread's cleanup function races against TO handler and removes the guilty job from mirror list and we have no way of differentiating if the job was removed from within the TO handler or from the sched thread's clean-up function. Fix: Add a flag to scheduler to hint the TO handler that the guilty job needs to be explicitly released. v2: whitespace fix Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-5-git-send-email-andrey.grodzovsky@amd.com	2019-05-02 15:50:55 -05:00
Andrey Grodzovsky	290764af7e	drm/sched: Keep s_fence->parent pointer For later driver's reference to see if the fence is signaled. v2: Move parent fence put to resubmit jobs. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-4-git-send-email-andrey.grodzovsky@amd.com	2019-05-02 15:46:52 -05:00
Christian König	5918045c4e	drm/scheduler: rework job destruction We now destroy finished jobs from the worker thread to make sure that we never destroy a job currently in timeout processing. By this we avoid holding lock around ring mirror list in drm_sched_stop which should solve a deadlock reported by a user. v2: Remove unused variable. v4: Move guilty job free into sched code. v5: Move sched->hw_rq_count to drm_sched_start to account for counter decrement in drm_sched_stop even when we don't call resubmit jobs if guily job did signal. v6: remove unused variable Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com	2019-05-02 15:45:48 -05:00
Jonathan Neuschäfer	f5d356328d	drm/sched: Fix description of drm_sched_stop Since commit `222b5f0441` ("drm/sched: Refactor ring mirror list handling."), drm_sched_hw_job_reset is no longer there, so let's adjust the doc comment accordingly. Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-04-23 11:15:38 -05:00
Dave Airlie	fbac3c48fa	Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-next Fixes for 5.1: amdgpu: - Fix missing fw declaration after dropping old CI DPM code - Fix debugfs access to registers beyond the MMIO bar size - Fix context priority handling - Add missing license on some new files - Various cleanups and bug fixes radeon: - Fix missing break in CS parser for evergreen - Various cleanups and bug fixes sched: - Fix entities with 0 run queues Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190221214134.3308-1-alexander.deucher@amd.com	2019-02-22 15:56:42 +10:00
Dave Airlie	c06de56121	Linux 5.0-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlxqHJYeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGWl8H/jPI4EipzD2GbnjZ GaFpMBBjcXBaVmoA+Y69so+7BHx1Ql+5GQtqbK0RHJRb9qEPLw3FBhHNjM/N8Sgf nSrK+GnBZp9s+k/NR/Yf2RacUR3jhz+Q9JEoQd3u9bFUeQyvE8Rf3vgtoBBwFOfz +t7N1memYVF3asLGWB4e4sP1YVMGfseTQpSPojvM30YWM86Bv+QtSx1AGgHczQIM kMKealR8ZPelN6JAXgLhQ5opDojBrE4YKB98pwsMDI6abz0Tz2JLFEUTTxsv5XNN o/Iz+XDoylskEyxN2unNWfHx7Swkvoklog8J/hDg5XlTvipL/WkT66PHBgcGMNvj BW9GgU8= =ZizU -----END PGP SIGNATURE----- Merge v5.0-rc7 into drm-next Backmerging for nouveau and imx that needed some fixes for next pulls. Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-02-18 13:27:15 +10:00
Bas Nieuwenhuizen	1decbf6bb0	drm/sched: Fix entities with 0 rqs. Some blocks in amdgpu can have 0 rqs. Job creation already fails with -ENOENT when entity->rq is NULL, so jobs cannot be pushed. Without a rq there is no scheduler to pop jobs, and rq selection already does the right thing with a list of length 0. So the operations we need to fix are: - Creation, do not set rq to rq_list[0] if the list can have length 0. - Do not flush any jobs when there is no rq. - On entity destruction handle the rq = NULL case. - on set_priority, do not try to change the rq if it is NULL. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-02-15 11:15:08 -05:00
Eric Anholt	82abf33766	drm/sched: Always trace the dependencies we wait on, to fix a race. The entity->dependency can go away completely once we've called drm_sched_entity_add_dependency_cb() (if the cb is called before we get around to tracing). The tracepoint is more useful if we trace every dependency instead of just ones that get callbacks installed, anyway, so just do that. Fixes any easy-to-produce OOPS when tracing the scheduler on V3D with "perf record -a -e gpu_scheduler:.\* glxgears" and DEBUG_SLAB enabled. Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-02-08 14:02:33 -05:00
Andrey Grodzovsky	3741540e04	drm/sched: Rework HW fence processing. Expedite job deletion from ring mirror list to the HW fence signal callback instead from finish_work, together with waiting for all such fences to signal in drm_sched_stop we garantee that already signaled job will not be processed twice. Remove the sched finish fence callback and just submit finish_work directly from the HW fence callback. v2: Fix comments. v3: Attach hw fence cb to sched_job v5: Rebase Suggested-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-01-25 16:15:36 -05:00
Andrey Grodzovsky	222b5f0441	drm/sched: Refactor ring mirror list handling. Decauple sched threads stop and start and ring mirror list handling from the policy of what to do about the guilty jobs. When stoppping the sched thread and detaching sched fences from non signaled HW fenes wait for all signaled HW fences to complete before rerunning the jobs. v2: Fix resubmission of guilty job into HW after refactoring. v4: Full restart for all the jobs, not only from guilty ring. Extract karma increase into standalone function. v5: Rework waiting for signaled jobs without relying on the job struct itself as those might already be freed for non 'guilty' job's schedulers. Expose karma increase to drivers. v6: Use list_for_each_entry_safe_continue and drm_sched_process_job in case fence already signaled. Call drm_sched_increase_karma only once for amdgpu and add documentation. v7: Wait only for the latest job's fence. Suggested-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-01-25 16:15:36 -05:00
Sharat Masetty	1db8c142b6	drm/scheduler: Add drm_sched_suspend/resume_timeout() This patch adds two new functions to help client drivers suspend and resume the scheduler job timeout. This can be useful in cases where the hardware has preemption support enabled. Using this, it is possible to have the timeout active only for the ring which is active on the ringbuffer. This patch also makes the job_list_lock IRQ safe. Suggested-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-12-05 17:56:16 -05:00
Sharat Masetty	9afd07566b	drm/scheduler: Set sched->thread to NULL on failure In cases where the scheduler instance is used as a base object of another driver object, it's not clear if the driver can call scheduler cleanup on the fail path. So, Set the sched->thread to NULL, so that the driver can safely call drm_sched_fini() during cleanup. Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-12-05 17:56:16 -05:00
Christian König	68c12d24ce	drm/sched: revert "fix timeout handling v2" v2 This reverts commit `0efd2d2f68`. It's still causing problems for V3D. v2: keep rearming the timeout. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-28 15:49:58 -05:00
Trigger Huang	85744e9c10	drm/scheduler: Fix bad job be re-processed in TDR A bad job is the one triggered TDR(In the current amdgpu's implementation, actually all the jobs in the current joq-queue will be treated as bad jobs). In the recovery process, its fence will be fake signaled and as a result, the work behind will be scheduled to delete it from the mirror list, but if the TDR process is invoked before the work's execution, then this bad job might be processed again and the call dma_fence_set_error to its fence in TDR process will lead to kernel warning trace: [ 143.033605] WARNING: CPU: 2 PID: 53 at ./include/linux/dma-fence.h:437 amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched] kernel: [ 143.033606] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 snd_hda_codec_generic crypto_simd glue_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev snd_seq_device snd_timer snd soundcore binfmt_misc input_leds mac_hid serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 8139too floppy psmouse 8139cp mii i2c_piix4 pata_acpi [ 143.033649] CPU: 2 PID: 53 Comm: kworker/2:1 Tainted: G OE 4.15.0-20-generic #21-Ubuntu [ 143.033650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 143.033653] Workqueue: events drm_sched_job_timedout [amd_sched] [ 143.033656] RIP: 0010:amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched] [ 143.033657] RSP: 0018:ffffa9f880fe7d48 EFLAGS: 00010202 [ 143.033659] RAX: 0000000000000007 RBX: ffff9b98f2b24c00 RCX: ffff9b98efef4f08 [ 143.033660] RDX: ffff9b98f2b27400 RSI: ffff9b98f2b24c50 RDI: ffff9b98efef4f18 [ 143.033660] RBP: ffffa9f880fe7d98 R08: 0000000000000001 R09: 00000000000002b6 [ 143.033661] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b98efef3430 [ 143.033662] R13: ffff9b98efef4d80 R14: ffff9b98efef4e98 R15: ffff9b98eaf91c00 [ 143.033663] FS: 0000000000000000(0000) GS:ffff9b98ffd00000(0000) knlGS:0000000000000000 [ 143.033664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 143.033665] CR2: 00007fc49c96d470 CR3: 000000001400a005 CR4: 00000000003606e0 [ 143.033669] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 143.033669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 143.033670] Call Trace: [ 143.033744] amdgpu_device_gpu_recover+0x144/0x820 [amdgpu] [ 143.033788] amdgpu_job_timedout+0x9b/0xa0 [amdgpu] [ 143.033791] drm_sched_job_timedout+0xcc/0x150 [amd_sched] [ 143.033795] process_one_work+0x1de/0x410 [ 143.033797] worker_thread+0x32/0x410 [ 143.033799] kthread+0x121/0x140 [ 143.033801] ? process_one_work+0x410/0x410 [ 143.033803] ? kthread_create_worker_on_cpu+0x70/0x70 [ 143.033806] ret_from_fork+0x35/0x40 So just delete the bad job from mirror list directly Changes in v3: - Add a helper function to delete the bad jobs from mirror list and call it directly before the job's fence is signaled Changes in v2: - delete the useless list node check - also delete bad jobs in drm_sched_main because: kthread_unpark(ring->sched.thread) will be invoked very early before amdgpu_device_gpu_recover's return, then drm_sched_main will have chance to pick up a new job from the job queue. This new job will be added into the mirror list and processed by amdgpu_job_run, but may not be deleted from the mirror list on time due to the same reason. And finally re-processed by drm_sched_job_recovery Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Reviewed-by: Christian König <chrstian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-19 16:38:15 -05:00
Sharat Masetty	26efecf955	drm/scheduler: Add drm_sched_job_cleanup This patch adds a new API to clean up the scheduler job resources. This is primarliy needed in cases the job was created but was not queued to the scheduler queue. Additionally with this change, the layer which creates the scheduler job also gets to free up the job's resources and this entails moving the dma_fence_put(finished_fence) to the drivers ops free handler routines. Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:27 -05:00
Andrey Grodzovsky	faf6e1a87e	drm/sched: Add boolean to mark if sched is ready to work v5 Problem: A particular scheduler may become unsuable (underlying HW) after some event (e.g. GPU reset). If it's later chosen by the get free sched. policy a command will fail to be submitted. Fix: Add a driver specific callback to report the sched status so rq with bad sched can be avoided in favor of working one or none in which case job init will fail. v2: Switch from driver callback to flag in scheduler. v3: rebase v4: Remove ready paramter from drm_sched_init, set uncoditionally to true once init done. v5: fix missed change in v3d in v4 (Alex) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:22 -05:00
Christian König	8fe159b014	drm/sched: add drm_sched_fault Add a helper to immediately start timeout handling in case of a hardware fault. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:02 -05:00
Christian König	19067e522d	drm/sched: make sure timer is restarted Make sure we always restart the timer after a timeout and remove the device specific workarounds. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:01 -05:00
Christian König	0efd2d2f68	drm/sched: fix timeout handling v2 We need to make sure that we don't race between job completion and timeout. v2: put revert label after calling the handling manually Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-10-12 12:52:32 -05:00
Christian König	b981c86f03	drm/sched: add drm_sched_start_timeout helper Cleanup starting the timeout a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-10-12 12:52:21 -05:00
Nathan Chancellor	c1f0320e03	drm/scheduler: Simplify spsc_queue_count check in drm_sched_entity_select_rq Clang generates a warning when it sees a logical not followed by a conditional operator like ==, >, or <. drivers/gpu/drm/scheduler/sched_entity.c:470:6: warning: logical not is only applied to the left hand side of this comparison [-Wlogical-not-parentheses] if (!spsc_queue_count(&entity->job_queue) == 0 \|\| ^ ~~ drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses after the '!' to evaluate the comparison first if (!spsc_queue_count(&entity->job_queue) == 0 \|\| ^ ( ) drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses around left hand side expression to silence this warning if (!spsc_queue_count(&entity->job_queue) == 0 \|\| ^ ( ) 1 warning generated. It assumes the author might have made a mistake in their logic: if (!a == b) -> if (!(a == b)) Sometimes that is the case; other times, it's just a super convoluted way of saying 'if (a)' when b = 0: if (!1 == 0) -> if (0 == 0) -> if (true) Alternatively: if (!1 == 0) -> if (!!1) -> if (1) Simplify this comparison so that Clang doesn't complain. Fixes: `35e160e781` ("drm/scheduler: change entities rq even earlier") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-10-09 17:06:00 -05:00
Nayan Deshmukh	6a96243056	drm/scheduler: remove timeout work_struct from drm_sched_job (v3) having a delayed work item per job is redundant as we only need one per scheduler to track the time out the currently executing job. v2: the first element of the ring mirror list is the currently executing job so we don't need a additional variable for it v3: squash in fixes for v3d and etnaviv Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-09-27 09:55:45 -05:00
Nayan Deshmukh	c89677afb3	drm/scheduler: avoid redundant shifting of the entity v2 do not remove entity from the rq if the current rq is from the least loaded scheduler. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:11:14 -05:00
Andrey Grodzovsky	62347a3300	drm/scheduler: Add stopped flag to drm_sched_entity The flag will prevent another thread from same process to reinsert the entity queue into scheduler's rq after it was already removewd from there by another thread during drm_sched_entity_flush. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:11:10 -05:00
Christian König	23f67981fd	drm/scheduler: rename gpu_scheduler.c to sched_main.c Better match the naming of the other components. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:44 -05:00
Christian König	7b10574eac	drm/scheduler: cleanup entity coding style Cleanup coding style in sched_entity.c Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:43 -05:00
Christian König	620e762f9a	drm/scheduler: move entity handling into separate file This is complex enough on it's own. Move it into a separate C file. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:43 -05:00
Christian König	8acc725457	drm/scheduler: trivial error handling fix Return -ENOMEM when allocating the rq_list fails. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:42 -05:00
Christian König	35e160e781	drm/scheduler: change entities rq even earlier Looks like for correct debugging we need to know the scheduler even earlier. So move picking a rq for an entity into job creation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:07 -05:00
Christian König	573edb241b	drm/scheduler: fix last_scheduled handling Make sure we access last_scheduled only after checking that there are no more jobs on the entity. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-08-27 11:10:06 -05:00

1 2

88 Commits