linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Lucas Stach	ca8cb69580	drm/etnaviv: fix deadlock in GPU coredump The GPU coredump function violates the locking order by holding the MMU context lock while trying to acquire the etnaviv_gem_object lock. This results in a possible ABBA deadlock with other codepaths which follow the established locking order. Fortunately this is easy to fix by dropping the MMU context lock earlier, as the BO dumping doesn't need the MMU context to be stable. The only thing the BO dumping cares about are the BO mappings, which are stable across the lifetime of the job. Fixes: `27b67278e0` (drm/etnaviv: rework MMU handling) [ Not really the first bad commit, but the one where this fix applies cleanly. Stable kernels need a manual backport. ] Reported-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-10-29 18:11:06 +01:00
Lucas Stach	17e4660ae3	drm/etnaviv: implement per-process address spaces on MMUv2 This builds on top of the MMU contexts introduced earlier. Instead of having one context per GPU core, each GPU client receives its own context. On MMUv1 this still means a single shared pagetable set is used by all clients, but on MMUv2 there is now a distinct set of pagetables for each client. As the command fetch is also translated via the MMU on MMUv2 the kernel command ringbuffer is mapped into each of the client pagetables. As the MMU context switch is a bit of a heavy operation, due to the needed cache and TLB flushing, this patch implements a lazy way of switching the MMU context. The kernel does not have its own MMU context, but reuses the last client context for all of its operations. This has some visible impact, as the GPU can now only be started once a client has submitted some work and we got the client MMU context assigned. Also the MMU context has a different lifetime than the general client context, as the GPU might still execute the kernel command buffer in the context of a client even after the client has completed all GPU work and has been terminated. Only when the GPU is runtime suspended or switches to another clients MMU context is the old context freed up. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>	2019-08-15 11:44:27 +02:00
Lucas Stach	27b67278e0	drm/etnaviv: rework MMU handling This reworks the MMU handling to make it possible to have multiple MMU contexts. A context is basically one instance of GPU page tables. Currently we have one set of page tables per GPU, which isn't all that clever, as it has the following two consequences: 1. All GPU clients (aka processes) are sharing the same pagetables, which means there is no isolation between clients, but only between GPU assigned memory spaces and the rest of the system. Better than nothing, but also not great. 2. Clients operating on the same set of buffers with different etnaviv GPU cores, e.g. a workload using both the 2D and 3D GPU, need to map the used buffers into the pagetable sets of each used GPU. This patch reworks all the MMU handling to introduce the abstraction of the MMU context. A context can be shared across different GPU cores, as long as they have compatible MMU implementations, which is the case for all systems with Vivante GPUs seen in the wild. As MMUv1 is not able to change pagetables on the fly, without a "stop the world" operation, which stops GPU, changes pagetables via CPU interaction, restarts GPU, the implementation introduces a shared context on MMUv1, which is returned whenever there is a request for a new context. This patch assigns a MMU context to each GPU, so on MMUv2 systems there is still one set of pagetables per GPU, but due to the shared context MMUv1 systems see a change in behavior as now a single pagetable set is used across all GPU cores. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>	2019-08-15 10:56:45 +02:00
Lucas Stach	db82a0435b	drm/etnaviv: split out cmdbuf mapping into address space This allows to decouple the cmdbuf suballocator create and mapping the region into the GPU address space. Allowing multiple AS to share a single cmdbuf suballoc. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>	2019-08-15 10:55:03 +02:00
Lucas Stach	3001eeb7f2	drm/etnaviv: pass mmu pointer to etnaviv_core_dump_mmu This function does only need the mmu part part of the gpu struct. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2019-08-15 10:53:31 +02:00
Lucas Stach	9a1fdae587	drm/etnaviv: dump only failing submit Due to the tracking provided by the scheduler we know exactly which submit is failing. Only dump this single submit and the required auxiliary information. This cuts down the size of the devcoredumps by only including relevant information. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2019-08-15 10:48:51 +02:00
Lucas Stach	2e737e5205	drm/etnaviv: clean up includes Drop unused includes, move more includes from the generic etnaviv_drv.h to the units where they are actually used, sort includes. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Sam Ravnborg <sam@ravnborg.org>	2019-08-02 19:17:33 +02:00
Daniel Vetter	52d2d44eee	Linux 5.2-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Gj1MeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGctkH/0At3+SQPY2JJSy8 i6+TDeytFx9OggeGLPHChRfehkAlvMb/kd34QHnuEvDqUuCAMU6HZQJFKoK9mvFI sDJVayPGDSqpm+iv8qLpMBPShiCXYVnGZeVfOdv36jUswL0k6wHV1pz4avFkDeZa 1F4pmI6O2XRkNTYQawbUaFkAngWUCBG9ECLnHJnuIY6ohShBvjI4+E2JUaht+8gO M2h2b9ieddWmjxV3LTKgsK1v+347RljxdZTWnJ62SCDSEVZvsgSA9W2wnebVhBkJ drSmrFLxNiM+W45mkbUFmQixRSmjv++oRR096fxAnodBxMw0TDxE1RiMQWE6rVvG N6MC6xA= =+B0P -----END PGP SIGNATURE----- Merge v5.2-rc5 into drm-next Maarten needs -rc4 backmerged so he can pull in the fbcon notifier removal topic branch into drm-misc-next. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2019-06-19 12:07:29 +02:00
Lucas Stach	1396500d67	drm/etnaviv: lock MMU while dumping core The devcoredump needs to operate on a stable state of the MMU while it is writing the MMU state to the coredump. The missing lock allowed both the userspace submit, as well as the GPU job finish paths to mutate the MMU state while a coredump is under way. Fixes: `a8c21a5451` (drm/etnaviv: add initial etnaviv DRM driver) Reported-by: David Jander <david@protonic.nl> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: David Jander <david@protonic.nl> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2019-05-27 16:08:38 +02:00
Christian König	5918045c4e	drm/scheduler: rework job destruction We now destroy finished jobs from the worker thread to make sure that we never destroy a job currently in timeout processing. By this we avoid holding lock around ring mirror list in drm_sched_stop which should solve a deadlock reported by a user. v2: Remove unused variable. v4: Move guilty job free into sched code. v5: Move sched->hw_rq_count to drm_sched_start to account for counter decrement in drm_sched_stop even when we don't call resubmit jobs if guily job did signal. v6: remove unused variable Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com	2019-05-02 15:45:48 -05:00
Dan Carpenter	f8261c376e	drm/etnaviv: NULL vs IS_ERR() buf in etnaviv_core_dump() The etnaviv_gem_get_pages() never returns NULL. It returns error pointers on error. Fixes: `a8c21a5451` ("drm/etnaviv: add initial etnaviv DRM driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2019-01-16 16:29:12 +01:00
Dave Airlie	e7df065a69	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next [airlied: make etnaviv build again] amdgpu: - DC trace support - More DC documentation - XGMI hive reset support - Rework IH interaction with KFD - Misc fixes and cleanups - Powerplay updates for newer polaris variants - Add cursor plane update fast path - Enable gpu reset by default on CI parts - Fix config with KFD/HSA not enabled amdkfd: - Limit vram overcommit - dmabuf support - Support for doorbell BOs ttm: - Support for simultaneous submissions to multiple engines scheduler: - Add helpers for hw with preemption support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181207233119.16861-1-alexander.deucher@amd.com	2018-12-13 10:06:34 +10:00
Sharat Masetty	1db8c142b6	drm/scheduler: Add drm_sched_suspend/resume_timeout() This patch adds two new functions to help client drivers suspend and resume the scheduler job timeout. This can be useful in cases where the hardware has preemption support enabled. Using this, it is possible to have the timeout active only for the ring which is active on the ringbuffer. This patch also makes the job_list_lock IRQ safe. Suggested-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-12-05 17:56:16 -05:00
Lucas Stach	f6ffbd4fc1	drm/etnaviv: replace license text with SPDX tags This replaces the repetitive GPL-2.0 license text in code and header files with the SPDX tags. Generated hardware headers aren't changed, as any changes there need to be done in the upstream rnndb repository. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2018-05-18 15:27:56 +02:00
Lucas Stach	6d7a20c077	drm/etnaviv: replace hangcheck with scheduler timeout This replaces the etnaviv internal hangcheck logic with the job timeout handling provided by the DRM scheduler. This simplifies the driver further and allows to replay jobs after a GPU reset, so only minimal state is lost. This introduces a user-visible change in that we don't allow jobs to run indefinitely as long as they make progress anymore, as this introduces quality of service issues when multiple processes are using the GPU. Userspace is now responsible to flush jobs in a way that the finish in a reasonable time, where reasonable is currently defined as less than 500ms. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-02-12 16:31:01 +01:00
Lucas Stach	2f9225dbc0	drm/etnaviv: move cmdbuf into submit object Less dynamic allocations and slims down the cmdbuf object to only the required information, as everything else is already available in the submit object. This also simplifies buffer and mappings lifetime management, as they are now exlusively attached to the submit object and not additionally to the cmdbuf. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-01-02 17:33:36 +01:00
Michal Hocko	19809c2da2	mm, vmalloc: use __GFP_HIGHMEM implicitly __vmalloc* allows users to provide gfp flags for the underlying allocation. This API is quite popular $ git grep "=[[:space:]]__vmalloc\\|return[[:space:]]*__vmalloc" \| wc -l 77 The only problem is that many people are not aware that they really want to give __GFP_HIGHMEM along with other flags because there is really no reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages which are mapped to the kernel vmalloc space. About half of users don't use this flag, though. This signals that we make the API unnecessarily too complex. This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM are simplified and drop the flag. Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Cristopher Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-08 17:15:13 -07:00
Lucas Stach	c3ef4b8c3e	drm/etnaviv: wire up iova handling in new cmdbuf abstraction Don't call the IOMMU directly, but go through the new cmdbuf abstraction. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2017-02-02 10:30:20 +01:00
Lucas Stach	ea1f5729aa	drm/etnaviv: move cmdbuf de-/allocation into own file This will get more complex with the following changes, so move it into its own place. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2017-02-02 10:30:15 +01:00
Lucas Stach	172dbac35e	drm/etnaviv: don't invoke OOM killer from dump code The dumper is only a debugging aid so we don't want to invoke the OOM killer if buffer for the potentially large GPU state can't be vmalloced. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2016-12-02 19:30:23 +01:00
Lucas Stach	06225487ae	drm/etnaviv: record correct cmdbuf IOVA in dump For cmdbufs the CPU IOVA was recorded instead of the GPU one. Fix this to make it consistent with other BOs and to make reading the dumps easier. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2016-09-15 15:29:45 +02:00
Lucas Stach	ce3088fdb5	drm/etnaviv: rename etnaviv_gem_vaddr to etnaviv_gem_vmap This function follows the semantics of vmap() by returning NULL in case of an error. To make things less confusing rename it to make make both functions more closely related. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2016-01-26 18:54:01 +01:00
Lucas Stach	9f07bb0d4a	drm/etnaviv: fix get pages error path in etnaviv_gem_vaddr In case that etnaviv_gem_get_pages is unable to get the required pages the object mutex needs to be unlocked. Also return NULL in this case instead of propagating the error, as callers of this function might not be prepared to handle a pointer error, but expect this call to follow the semantics of a plain vmap to return NULL in case of an error. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2016-01-26 18:54:00 +01:00
Lucas Stach	339073ef77	drm/etnaviv: hold object lock while getting pages for coredump While all objects that get coredumped have an active IOVA and thus pages already populated, etnaviv_gem_get_pages() still requires the object lock to be held. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2016-01-25 14:23:49 +01:00
The etnaviv authors	a8c21a5451	drm/etnaviv: add initial etnaviv DRM driver This adds the etnaviv DRM driver and hooks it up in Makefiles and Kconfig. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-12-15 14:48:02 +01:00

25 Commits