linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-22 00:09:45 +07:00

Author	SHA1	Message	Date
Chris Wilson	01f624f018	drm/i915: Ratelimit i915_globals_park When doing our global park, we like to be a good citizen and shrink our slab caches (of which we have quite a few now), but each kmem_cache_shrink() incurs a stop_machine() and so ends up being quite expensive, causing machine-wide stalls. While ideally we would like to throw away unused pages in our slab caches whenever it appears that we are idling, doing so will require a much cheaper mechanism. In the meantime use a delayed work to impose a rate-limit that means we have to have been idle for more than 2 seconds before we start shrinking. References: https://gitlab.freedesktop.org/drm/intel/issues/848 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191218094057.3510459-1-chris@chris-wilson.co.uk	2019-12-18 17:38:56 +00:00
Matthew Auld	14d1b9a624	drm/i915: buddy allocator Simple buddy allocator. We want to allocate properly aligned power-of-two blocks to promote usage of huge-pages for the GTT, so 64K, 2M and possibly even 1G. While we do support allocating stuff at a specific offset, it is more intended for preallocating portions of the address space, say for an initial framebuffer, for other uses drm_mm is probably a much better fit. Anyway, hopefully this can all be thrown away if we eventually move to having the core MM manage device memory. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190809202926.14545-2-matthew.auld@intel.com	2019-08-10 19:47:40 +01:00
Chris Wilson	10be98a77c	drm/i915: Move more GEM objects under gem/ Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk	2019-05-28 12:45:29 +01:00
Chris Wilson	98932149ae	drm/i915: Move object->pages API to i915_gem_object.[ch] Currently the code for manipulating the pages on an object is still residing in i915_gem.c, move it to i915_gem_object.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-3-chris@chris-wilson.co.uk	2019-05-28 12:45:29 +01:00
Chris Wilson	da23379f15	drm/i915: Use static allocation for i915_globals_park() In order to avoid the malloc inside i915_globals_park() occurring underneath a lock connected to the shrinker (thus causing circular lockdeps warnings), move the rcu_worker to a global. <4> [39.085073] ====================================================== <4> [39.085273] WARNING: possible circular locking dependency detected <4> [39.085552] 5.1.0-rc3-CI-Trybot_4088+ #1 Tainted: G U <4> [39.085752] ------------------------------------------------------ <4> [39.085949] kswapd0/32 is trying to acquire lock: <4> [39.086121] 00000000004b5f91 (wakeref#3){+.+.}, at: intel_engine_pm_put+0x1b/0x40 [i915] <4> [39.086493] but task is already holding lock: <4> [39.086682] 00000000dd009a9a (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30 <4> [39.086910] which lock already depends on the new lock. <4> [39.087139] the existing dependency chain (in reverse order) is: <4> [39.087356] -> #2 (fs_reclaim){+.+.}: <4> [39.087604] fs_reclaim_acquire.part.24+0x24/0x30 <4> [39.087785] kmem_cache_alloc_trace+0x2a/0x290 <4> [39.087998] i915_globals_park+0x22/0xa0 [i915] <4> [39.088478] idle_work_handler+0x1df/0x220 [i915] <4> [39.089016] process_one_work+0x245/0x610 <4> [39.089447] worker_thread+0x37/0x380 <4> [39.089956] kthread+0x119/0x130 <4> [39.090374] ret_from_fork+0x3a/0x50 <4> [39.090868] -> #1 (wakeref#4){+.+.}: <4> [39.091569] __mutex_lock+0x8c/0x960 <4> [39.092054] atomic_dec_and_mutex_lock+0x33/0x50 <4> [39.092521] intel_gt_pm_put+0x1b/0x40 [i915] <4> [39.093047] intel_engine_park+0xeb/0x1d0 [i915] <4> [39.093514] __intel_wakeref_put_once+0x10/0x30 [i915] <4> [39.094062] i915_request_retire+0x477/0xaf0 [i915] <4> [39.094547] ring_retire_requests+0x86/0x160 [i915] <4> [39.095110] i915_retire_requests+0x58/0xc0 [i915] <4> [39.095587] i915_gem_wait_for_idle.part.22+0xb2/0xf0 [i915] <4> [39.096142] switch_to_kernel_context_sync+0x2a/0x70 [i915] <4> [39.096633] i915_gem_init+0x59c/0x9c0 [i915] <4> [39.097174] i915_driver_load+0xd96/0x1880 [i915] <4> [39.097640] i915_pci_probe+0x29/0xa0 [i915] <4> [39.098145] pci_device_probe+0xa1/0x120 <4> [39.098607] really_probe+0xf3/0x3e0 <4> [39.099031] driver_probe_device+0x10a/0x120 <4> [39.099599] device_driver_attach+0x4b/0x50 <4> [39.100033] __driver_attach+0x97/0x130 <4> [39.100525] bus_for_each_dev+0x74/0xc0 <4> [39.100954] bus_add_driver+0x13f/0x210 <4> [39.101441] driver_register+0x56/0xe0 <4> [39.101891] do_one_initcall+0x58/0x2e0 <4> [39.102319] do_init_module+0x56/0x1ea <4> [39.102805] load_module+0x2701/0x29e0 <4> [39.103231] __se_sys_finit_module+0xd3/0xf0 <4> [39.103727] do_syscall_64+0x55/0x190 <4> [39.104153] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [39.104736] -> #0 (wakeref#3){+.+.}: <4> [39.105437] lock_acquire+0xa6/0x1c0 <4> [39.105923] __mutex_lock+0x8c/0x960 <4> [39.106345] atomic_dec_and_mutex_lock+0x33/0x50 <4> [39.106897] intel_engine_pm_put+0x1b/0x40 [i915] <4> [39.107375] i915_request_retire+0x477/0xaf0 [i915] <4> [39.107930] ring_retire_requests+0x86/0x160 [i915] <4> [39.108412] i915_retire_requests+0x58/0xc0 [i915] <4> [39.108934] i915_gem_shrink+0xd8/0x5b0 [i915] <4> [39.109431] i915_gem_shrinker_scan+0x59/0x130 [i915] <4> [39.109884] do_shrink_slab+0x131/0x3e0 <4> [39.110380] shrink_slab+0x228/0x2c0 <4> [39.110810] shrink_node+0x177/0x460 <4> [39.111317] balance_pgdat+0x239/0x580 <4> [39.111743] kswapd+0x186/0x570 <4> [39.112221] kthread+0x119/0x130 <4> [39.112641] ret_from_fork+0x3a/0x50 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190408091728.20207-3-chris@chris-wilson.co.uk	2019-04-08 17:04:01 +01:00
Chris Wilson	c4d52feb2c	drm/i915: Move over to intel_context_lookup() In preparation for an ever growing number of engines and so ever increasing static array of HW contexts within the GEM context, move the array over to an rbtree, allocated upon first use. Unfortunately, this imposes an rbtree lookup at a few frequent callsites, but we should be able to mitigate those by moving over to using the HW context as our primary type and so only incur the lookup on the boundary with the user GEM context and engines. v2: Check for no HW context in guc_stage_desc_init Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk	2019-03-08 13:59:52 +00:00
Chris Wilson	103b76eeff	drm/i915: Use i915_global_register() Rather than manually add every new global into each hook, use i915_global_register() function and keep a list of registered globals to invoke instead. However, I haven't found a way for random drivers to add an .init table to avoid having to manually add ourselves to i915_globals_init() each time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190305213830.18094-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2019-03-06 10:00:50 +00:00
Chris Wilson	13f1bfd3b3	drm/i915: Make object/vma allocation caches global As our allocations are not device specific, we can move our slab caches to a global scope. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-2-chris@chris-wilson.co.uk	2019-02-28 11:08:02 +00:00
Chris Wilson	32eb6bcfdd	drm/i915: Make request allocation caches global As kmem_caches share the same properties (size, allocation/free behaviour) for all potential devices, we can use global caches. While this potentially has worse fragmentation behaviour (one can argue that different devices would have different activity lifetimes, but you can also argue that activity is temporal across the system) it is the default behaviour of the system at large to amalgamate matching caches. The benefit for us is much reduced pointer dancing along the frequent allocation paths. v2: Defer shrinking until after a global grace period for futureproofing multiple consumers of the slab caches, similar to the current strategy for avoiding shrinking too early. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk	2019-02-28 11:07:56 +00:00

9 Commits
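
The rate limit described in commit 01f624f018 can be modelled outside the kernel. What follows is a minimal userspace sketch in Python, not the i915 code: the class name ParkRateLimiter, its methods, and the injected `now` timestamps are all hypothetical, and a plain deadline stands in for the kernel's delayed work. It illustrates only the policy stated in the commit message, that shrinking starts after more than 2 seconds of idleness. That new activity cancels a pending shrink is an assumption of this sketch.

```python
# Userspace model of "idle for more than 2 seconds before we start shrinking".
# All names here are hypothetical; this is not the i915 implementation.

IDLE_DELAY = 2.0  # seconds, taken from the commit message


class ParkRateLimiter:
    def __init__(self, shrink, delay=IDLE_DELAY):
        self._shrink = shrink    # callback standing in for the cache shrink
        self._delay = delay
        self._active = 0         # number of current users
        self._deadline = None    # time after which a shrink may run

    def unpark(self):
        """A user became active; assume this cancels any pending shrink."""
        self._active += 1
        self._deadline = None

    def park(self, now):
        """A user went idle; when the last one parks, arm the deadline."""
        if self._active == 0:
            raise RuntimeError("park() without matching unpark()")
        self._active -= 1
        if self._active == 0:
            self._deadline = now + self._delay

    def tick(self, now):
        """Stand-in for the delayed work firing; shrink once the delay has passed."""
        if self._deadline is not None and now > self._deadline:
            self._deadline = None
            self._shrink()
            return True
        return False


if __name__ == "__main__":
    limiter = ParkRateLimiter(lambda: print("shrinking caches"))
    limiter.unpark()
    limiter.park(now=0.0)
    limiter.tick(now=1.0)   # too early, nothing happens
    limiter.tick(now=2.5)   # idle for more than 2 seconds, shrink runs
```

Passing timestamps in explicitly keeps the model deterministic; a real driver would rely on the kernel's timer and workqueue infrastructure instead.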