linux_dsm_epyc7002/drivers/gpu/drm/i915/gem
Chris Wilson 16c46fd505 drm/i915/gem: Avoid rcu_barrier() from shrinker paths
As i915_gem_object_unbind() waits on an rcu_barrier() to flush vm
releases (and destruction of their bound vma), we have to be careful not
to invoke that barrier from beneath the shrinker:

<4> [430.222671] WARNING: possible circular locking dependency detected
<4> [430.222673] 5.4.0-rc8-CI-CI_DRM_7508+ #1 Tainted: G     U
<4> [430.222675] ------------------------------------------------------
<4> [430.222677] gem_pwrite/2317 is trying to acquire lock:
<4> [430.222678] ffffffff82248218 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [430.222685]
but task is already holding lock:
<4> [430.222687] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [430.222691]
which lock already depends on the new lock.

<4> [430.222693]
the existing dependency chain (in reverse order) is:
<4> [430.222695]
-> #2 (fs_reclaim){+.+.}:
<4> [430.222698]        fs_reclaim_acquire.part.117+0x24/0x30
<4> [430.222702]        kmem_cache_alloc_trace+0x2a/0x2c0
<4> [430.222705]        intel_cpuc_prepare+0x37/0x1a0
<4> [430.222709]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [430.222712]        _cpu_up+0xa2/0x140
<4> [430.222714]        do_cpu_up+0x61/0xa0
<4> [430.222718]        smp_init+0x57/0x96
<4> [430.222722]        kernel_init_freeable+0xac/0x1c7
<4> [430.222725]        kernel_init+0x5/0x100
<4> [430.222728]        ret_from_fork+0x24/0x50
<4> [430.222729]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [430.222733]        cpus_read_lock+0x34/0xd0
<4> [430.222734]        rcu_barrier+0xaa/0x190
<4> [430.222736]        kernel_init+0x21/0x100
<4> [430.222737]        ret_from_fork+0x24/0x50
<4> [430.222739]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [430.222742]        __lock_acquire+0x1328/0x15d0
<4> [430.222743]        lock_acquire+0xa7/0x1c0
<4> [430.222746]        __mutex_lock+0x9a/0x9d0
<4> [430.222747]        rcu_barrier+0x23/0x190
<4> [430.222850]        i915_gem_object_unbind+0x264/0x3d0 [i915]
<4> [430.222882]        i915_gem_shrink+0x297/0x5f0 [i915]
<4> [430.222912]        i915_gem_shrink_all+0x38/0x60 [i915]
<4> [430.222934]        i915_drop_caches_set+0x1f0/0x240 [i915]
<4> [430.222938]        simple_attr_write+0xb0/0xd0
<4> [430.222941]        full_proxy_write+0x51/0x80
<4> [430.222943]        vfs_write+0xb9/0x1d0
<4> [430.222944]        ksys_write+0x9f/0xe0
<4> [430.222946]        do_syscall_64+0x4f/0x210
<4> [430.222948]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [430.222950]
other info that might help us debug this:

<4> [430.222952] Chain exists of:
  rcu_state.barrier_mutex --> cpu_hotplug_lock.rw_sem --> fs_reclaim

<4> [430.222955]  Possible unsafe locking scenario:

<4> [430.222957]        CPU0                    CPU1
<4> [430.222958]        ----                    ----
<4> [430.222960]   lock(fs_reclaim);
<4> [430.222961]                                lock(cpu_hotplug_lock.rw_sem);
<4> [430.222963]                                lock(fs_reclaim);
<4> [430.222964]   lock(rcu_state.barrier_mutex);
<4> [430.222966]
 *** DEADLOCK ***

<4> [430.222968] 3 locks held by gem_pwrite/2317:
<4> [430.222969]  #0: ffff88849e2d9408 (sb_writers#14){.+.+}, at: vfs_write+0x1a4/0x1d0
<4> [430.222973]  #1: ffff888496976db0 (&attr->mutex){+.+.}, at: simple_attr_write+0x36/0xd0
<4> [430.222976]  #2: ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [430.222980]
stack backtrace:
<4> [430.222982] CPU: 1 PID: 2317 Comm: gem_pwrite Tainted: G     U            5.4.0-rc8-CI-CI_DRM_7508+ #1
<4> [430.222985] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2321.A08.1909162051 09/16/2019
<4> [430.222989] Call Trace:
<4> [430.222992]  dump_stack+0x71/0x9b
<4> [430.222995]  check_noncircular+0x19b/0x1c0
<4> [430.222998]  ? __lock_acquire+0x1328/0x15d0
<4> [430.222999]  __lock_acquire+0x1328/0x15d0
<4> [430.223001]  ? mark_held_locks+0x49/0x70
<4> [430.223003]  lock_acquire+0xa7/0x1c0
<4> [430.223005]  ? rcu_barrier+0x23/0x190
<4> [430.223008]  __mutex_lock+0x9a/0x9d0
<4> [430.223009]  ? rcu_barrier+0x23/0x190
<4> [430.223011]  ? rcu_barrier+0x23/0x190
<4> [430.223013]  ? find_held_lock+0x2d/0x90
<4> [430.223045]  ? i915_gem_object_unbind+0x24a/0x3d0 [i915]
<4> [430.223048]  ? rcu_barrier+0x23/0x190
<4> [430.223049]  rcu_barrier+0x23/0x190
<4> [430.223081]  i915_gem_object_unbind+0x264/0x3d0 [i915]
<4> [430.223119]  i915_gem_shrink+0x297/0x5f0 [i915]

Closes: https://gitlab.freedesktop.org/drm/intel/issues/743
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191208161252.3015727-1-chris@chris-wilson.co.uk
2019-12-09 10:49:38 +00:00
..
selftests drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET 2019-12-04 15:11:44 +00:00
i915_gem_busy.c
i915_gem_clflush.c drm/i915: Generalise the clflush dma-worker 2019-08-22 08:27:44 +01:00
i915_gem_clflush.h
i915_gem_client_blt.c drm/i915: Drop struct_mutex from around i915_retire_requests() 2019-10-04 15:39:17 +01:00
i915_gem_client_blt.h
i915_gem_context_types.h drm/i915/gem: Excise the per-batch whitelist from the context 2019-11-28 11:39:50 +00:00
i915_gem_context.c drm/i915/gem: Pin gen6_ppgtt prior to constructing the request 2019-12-06 23:38:54 +00:00
i915_gem_context.h drm/i915/gem: Make context persistence optional 2019-10-29 21:02:52 +00:00
i915_gem_dmabuf.c drm/i915/gem: Distinguish each object type 2019-10-22 16:23:32 +01:00
i915_gem_domain.c drm/i915/gem: Avoid rcu_barrier() from shrinker paths 2019-12-09 10:49:38 +00:00
i915_gem_execbuffer.c drm/i915: Remove vestigal i915_gem_context locals from cmdparser 2019-12-05 10:27:29 +00:00
i915_gem_fence.c Merge drm/drm-next into drm-intel-next-queued 2019-08-22 00:10:36 -07:00
i915_gem_internal.c drm/i915/gem: Distinguish each object type 2019-10-22 16:23:32 +01:00
i915_gem_ioctls.h drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET 2019-12-04 15:11:44 +00:00
i915_gem_lmem.c drm/i915/lmem: add the fake lmem region 2019-10-31 20:41:47 +00:00
i915_gem_lmem.h drm/i915/lmem: support kernel mapping 2019-10-25 22:55:43 +01:00
i915_gem_mman.c drm/i915/gem: Comment on inability to check args.pad for MMAP_OFFSET 2019-12-09 09:57:57 +00:00
i915_gem_mman.h drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET 2019-12-04 15:11:44 +00:00
i915_gem_object_blt.c drm/i915/blt: fixup block_size rounding 2019-10-29 06:57:20 +00:00
i915_gem_object_blt.h
i915_gem_object_types.h drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET 2019-12-04 15:11:44 +00:00
i915_gem_object.c drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET 2019-12-04 15:11:44 +00:00
i915_gem_object.h drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET 2019-12-04 15:11:44 +00:00
i915_gem_pages.c drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET 2019-12-04 15:11:44 +00:00
i915_gem_phys.c drm/i915: Switch obj->mm.lock lockdep annotations on its head 2019-11-07 09:58:11 +01:00
i915_gem_pm.c drm/i915: Defer rc6 shutdown to suspend_late 2019-11-05 16:05:42 +02:00
i915_gem_pm.h drm/i915: Teach record_defaults to operate on the intel_gt 2019-10-22 20:43:07 +01:00
i915_gem_region.c drm/i915: treat shmem as a region 2019-10-18 12:41:03 +01:00
i915_gem_region.h drm/i915/region: support contiguous allocations 2019-10-08 20:50:01 +01:00
i915_gem_shmem.c drm/i915/gem: Distinguish each object type 2019-10-22 16:23:32 +01:00
i915_gem_shrinker.c drm/i915: Switch obj->mm.lock lockdep annotations on its head 2019-11-07 09:58:11 +01:00
i915_gem_shrinker.h
i915_gem_stolen.c drm/i915/gem: Pass mem region to preallocated stolen 2019-11-11 19:22:18 +00:00
i915_gem_stolen.h drm/i915: treat stolen as a region 2019-10-18 12:41:05 +01:00
i915_gem_throttle.c drm/i915: Keep drm_i915_file_private around under RCU 2019-08-23 22:13:17 +01:00
i915_gem_tiling.c drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET 2019-12-04 15:11:44 +00:00
i915_gem_userptr.c drm/i915/userptr: Try to acquire the page lock around set_page_dirty() 2019-11-12 10:22:58 +02:00
i915_gem_wait.c
i915_gemfs.c
i915_gemfs.h
Makefile