linux_dsm_epyc7002/drivers/gpu/drm
ozeng a50ecc54ff drm/amdgpu: Fixed a potential circular lock
The circular locking scenario reported by lockdep is shown below.
The fix moves read_user_wptr outside of the
acquire_queue...release_queue critical section, so that user
memory is no longer accessed while srbm_mutex is held.

[   63.477482] WARNING: possible circular locking dependency detected
[   63.484091] 4.12.0-kfd-ozeng #3 Not tainted
[   63.488531] ------------------------------------------------------
[   63.495146] HelloWorldLoop/2526 is trying to acquire lock:
[   63.501011]  (&mm->mmap_sem){++++++}, at: [<ffffffff911898ce>] __might_fault+0x3e/0x90
[   63.509472]
               but task is already holding lock:
[   63.515716]  (&adev->srbm_mutex){+.+...}, at: [<ffffffffc0484feb>] lock_srbm+0x2b/0x50 [amdgpu]
[   63.525099]
               which lock already depends on the new lock.

[   63.533841]
               the existing dependency chain (in reverse order) is:
[   63.541839]
               -> #2 (&adev->srbm_mutex){+.+...}:
[   63.548178]        lock_acquire+0x6d/0x90
[   63.552461]        __mutex_lock+0x70/0x8c0
[   63.556826]        mutex_lock_nested+0x16/0x20
[   63.561603]        gfx_v8_0_kiq_resume+0x1039/0x14a0 [amdgpu]
[   63.567817]        gfx_v8_0_hw_init+0x204d/0x2210 [amdgpu]
[   63.573675]        amdgpu_device_init+0xdea/0x1790 [amdgpu]
[   63.579640]        amdgpu_driver_load_kms+0x63/0x220 [amdgpu]
[   63.585743]        drm_dev_register+0x145/0x1e0
[   63.590605]        amdgpu_pci_probe+0x11e/0x160 [amdgpu]
[   63.596266]        local_pci_probe+0x40/0xa0
[   63.600803]        pci_device_probe+0x134/0x150
[   63.605650]        driver_probe_device+0x2a1/0x460
[   63.610785]        __driver_attach+0xdc/0xe0
[   63.615321]        bus_for_each_dev+0x5f/0x90
[   63.619984]        driver_attach+0x19/0x20
[   63.624337]        bus_add_driver+0x40/0x270
[   63.628908]        driver_register+0x5b/0xe0
[   63.633446]        __pci_register_driver+0x5b/0x60
[   63.638586]        rtsx_pci_switch_output_voltage+0x1d/0x20 [rtsx_pci]
[   63.645564]        do_one_initcall+0x4c/0x1b0
[   63.650205]        do_init_module+0x56/0x1ea
[   63.654767]        load_module+0x208c/0x27d0
[   63.659335]        SYSC_finit_module+0x96/0xd0
[   63.664058]        SyS_finit_module+0x9/0x10
[   63.668629]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[   63.674088]
               -> #1 (reservation_ww_class_mutex){+.+.+.}:
[   63.681257]        lock_acquire+0x6d/0x90
[   63.685551]        __ww_mutex_lock.constprop.11+0x8c/0xed0
[   63.691426]        ww_mutex_lock+0x67/0x70
[   63.695802]        amdgpu_verify_access+0x6d/0x100 [amdgpu]
[   63.701743]        ttm_bo_mmap+0x8e/0x100 [ttm]
[   63.706615]        amdgpu_bo_mmap+0xd/0x60 [amdgpu]
[   63.711814]        amdgpu_mmap+0x35/0x40 [amdgpu]
[   63.716904]        mmap_region+0x3b5/0x5a0
[   63.721255]        do_mmap+0x400/0x4d0
[   63.725260]        vm_mmap_pgoff+0xb0/0xf0
[   63.729625]        SyS_mmap_pgoff+0x19e/0x260
[   63.734292]        SyS_mmap+0x1d/0x20
[   63.738199]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[   63.743681]
               -> #0 (&mm->mmap_sem){++++++}:
[   63.749641]        __lock_acquire+0x1401/0x1420
[   63.754491]        lock_acquire+0x6d/0x90
[   63.758750]        __might_fault+0x6b/0x90
[   63.763176]        kgd_hqd_load+0x24f/0x270 [amdgpu]
[   63.768432]        load_mqd+0x4b/0x50 [amdkfd]
[   63.773192]        create_queue_nocpsch+0x535/0x620 [amdkfd]
[   63.779237]        pqm_create_queue+0x34d/0x4f0 [amdkfd]
[   63.784835]        kfd_ioctl_create_queue+0x282/0x670 [amdkfd]
[   63.790973]        kfd_ioctl+0x310/0x4d0 [amdkfd]
[   63.795944]        do_vfs_ioctl+0x90/0x6e0
[   63.800268]        SyS_ioctl+0x74/0x80
[   63.804207]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[   63.809607]
               other info that might help us debug this:

[   63.818026] Chain exists of:
                 &mm->mmap_sem --> reservation_ww_class_mutex --> &adev->srbm_mutex

[   63.830382]  Possible unsafe locking scenario:

[   63.836605]        CPU0                    CPU1
[   63.841364]        ----                    ----
[   63.846123]   lock(&adev->srbm_mutex);
[   63.850061]                                lock(reservation_ww_class_mutex);
[   63.857475]                                lock(&adev->srbm_mutex);
[   63.864084]   lock(&mm->mmap_sem);
[   63.867657]
                *** DEADLOCK ***

[   63.873884] 3 locks held by HelloWorldLoop/2526:
[   63.878739]  #0:  (&process->mutex){+.+.+.}, at: [<ffffffffc06e1a9a>] kfd_ioctl_create_queue+0x24a/0x670 [amdkfd]
[   63.889543]  #1:  (&dqm->lock){+.+...}, at: [<ffffffffc06eedeb>] create_queue_nocpsch+0x3b/0x620 [amdkfd]
[   63.899684]  #2:  (&adev->srbm_mutex){+.+...}, at: [<ffffffffc0484feb>] lock_srbm+0x2b/0x50 [amdgpu]
[   63.909500]
               stack backtrace:
[   63.914187] CPU: 3 PID: 2526 Comm: HelloWorldLoop Not tainted 4.12.0-kfd-ozeng #3
[   63.922184] Hardware name: AMD Carrizo/Gardenia, BIOS WGA5819N_Weekly_15_08_1 08/19/2015
[   63.930865] Call Trace:
[   63.933464]  dump_stack+0x85/0xc9
[   63.936999]  print_circular_bug+0x1f9/0x207
[   63.941442]  __lock_acquire+0x1401/0x1420
[   63.945745]  ? lock_srbm+0x2b/0x50 [amdgpu]
[   63.950185]  lock_acquire+0x6d/0x90
[   63.953885]  ? __might_fault+0x3e/0x90
[   63.957899]  __might_fault+0x6b/0x90
[   63.961699]  ? __might_fault+0x3e/0x90
[   63.965755]  kgd_hqd_load+0x24f/0x270 [amdgpu]
[   63.970577]  load_mqd+0x4b/0x50 [amdkfd]
[   63.974745]  create_queue_nocpsch+0x535/0x620 [amdkfd]
[   63.980242]  pqm_create_queue+0x34d/0x4f0 [amdkfd]
[   63.985320]  kfd_ioctl_create_queue+0x282/0x670 [amdkfd]
[   63.991021]  kfd_ioctl+0x310/0x4d0 [amdkfd]
[   63.995499]  ? kfd_ioctl_destroy_queue+0x70/0x70 [amdkfd]
[   64.001234]  do_vfs_ioctl+0x90/0x6e0
[   64.005065]  ? up_read+0x1a/0x40
[   64.008496]  SyS_ioctl+0x74/0x80
[   64.011955]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   64.016863] RIP: 0033:0x7f4b3bd35f07
[   64.020696] RSP: 002b:00007ffe7689ec38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   64.028786] RAX: ffffffffffffffda RBX: 00000000002a2000 RCX: 00007f4b3bd35f07
[   64.036414] RDX: 00007ffe7689ecb0 RSI: 00000000c0584b02 RDI: 0000000000000005
[   64.044045] RBP: 00007f4a3212d000 R08: 00007f4b3c919000 R09: 0000000000080000
[   64.051674] R10: 00007f4b376b64b8 R11: 0000000000000246 R12: 00007f4a3212d000
[   64.059324] R13: 0000000000000015 R14: 0000000000000064 R15: 00007ffe7689ef50
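
The reordering described in the commit message can be illustrated with a minimal userspace sketch. It is not the actual amdgpu code. A pthread mutex stands in for adev->srbm_mutex, a plain variable stands in for the hardware write-pointer register, and every name ending in _sketch is hypothetical. The point is only the ordering: the user-memory read (which in the kernel may fault and take mm->mmap_sem, as the __might_fault frame in the trace shows) happens before the lock is taken, and only the register write stays inside the critical section.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for adev->srbm_mutex (assumption: a plain pthread mutex). */
pthread_mutex_t srbm_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the hardware write-pointer register. */
uint32_t hw_wptr_reg;

/*
 * Hypothetical stand-in for reading the user-mode write pointer.
 * In the kernel this access may fault, which is why it must not
 * run while srbm_mutex is held. Returns 1 on success, 0 otherwise.
 */
int read_user_wptr_sketch(const uint32_t *user_wptr, uint32_t *out)
{
	if (!user_wptr)
		return 0;
	memcpy(out, user_wptr, sizeof(*out));
	return 1;
}

/* Hypothetical stand-in for the queue-load path after the fix. */
void hqd_load_sketch(const uint32_t *user_wptr)
{
	uint32_t wptr = 0;
	int valid;

	/* 1. Touch user memory first, with the mutex NOT held. */
	valid = read_user_wptr_sketch(user_wptr, &wptr);

	/* 2. Keep only register programming in the critical section. */
	pthread_mutex_lock(&srbm_mutex_sketch);   /* acquire_queue */
	if (valid)
		hw_wptr_reg = wptr;
	pthread_mutex_unlock(&srbm_mutex_sketch); /* release_queue */
}
```

Calling hqd_load_sketch() with a pointer to a value copies that value into hw_wptr_reg. Calling it with NULL leaves the register untouched. In both cases the mutex is never held across the user-memory read, which removes the srbm_mutex --> mmap_sem edge that closes the cycle in the report above.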

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-10-06 16:48:00 -04:00
amd drm/amdgpu: Fixed a potential circular lock 2017-10-06 16:48:00 -04:00
arc Merge tag 'drm-misc-next-2017-08-16' of git://anongit.freedesktop.org/git/drm-misc into drm-next 2017-08-17 07:33:41 +10:00
arm drm: Nuke drm_atomic_helper_plane_set_property 2017-08-08 14:45:16 +02:00
armada drm: armada: remove dead empty functions 2017-08-04 11:35:34 +02:00
ast Merge tag 'drm-misc-next-2017-08-16' of git://anongit.freedesktop.org/git/drm-misc into drm-next 2017-08-17 07:33:41 +10:00
atmel-hlcdc drm: Nuke drm_atomic_helper_plane_set_property 2017-08-08 14:45:16 +02:00
bochs drm/bochs: Use the drm_driver.dumb_destroy default 2017-08-16 20:18:55 +02:00
bridge Merge tag 'drm-misc-next-2017-08-16' of git://anongit.freedesktop.org/git/drm-misc into drm-next 2017-08-17 07:33:41 +10:00
cirrus drm/cirrus: Use the drm_driver.dumb_destroy default 2017-08-16 20:14:22 +02:00
etnaviv Merge branch 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux into drm-next 2017-08-21 14:34:15 +10:00
exynos drm/exynos: simplify set_pixfmt() in DECON and FIMD drivers 2017-08-25 14:30:27 +09:00
fsl-dcu drm: Nuke drm_atomic_helper_connector_dpms 2017-08-08 14:48:48 +02:00
gma500 drm/gma500: fix potential NULL pointer dereference dereference 2017-08-18 09:10:46 +02:00
hisilicon drm: kirin: Add mode_valid logic to avoid mode clocks we can't generate 2017-08-29 05:20:35 +10:00
i2c drm: Nuke drm_atomic_helper_connector_dpms 2017-08-08 14:48:48 +02:00
i810
i915 i915: Use drm_syncobj_fence_get 2017-08-29 06:20:31 +10:00
imx imx-drm: lock scanout transfers for consecutive bursts 2017-08-22 16:51:11 +10:00
lib
mediatek drm/mediatek: switch to drm_*_get(), drm_*_put() helpers 2017-08-11 11:35:02 -04:00
meson drm/meson: Use .dumb_map_offset and .dumb_destroy defaults 2017-08-16 20:11:43 +02:00
mga
mgag200 drm/mgag200: Use the drm_driver.dumb_destroy default 2017-08-16 20:18:22 +02:00
msm Merge tag 'drm-msm-next-2017-08-22' of git://people.freedesktop.org/~robclark/linux into drm-next 2017-08-25 09:29:45 +10:00
mxsfb drm/mxsfb: Use .dumb_map_offset and .dumb_destroy defaults 2017-08-16 20:12:19 +02:00
nouveau drm/nouveau/kms/nv50: perform null check on msto[i] rathern than msto 2017-08-22 18:04:36 +10:00
omapdrm drm/omap: work-around for omap3 display enable 2017-08-23 12:22:12 +03:00
panel
pl111 drm/pl111: Use drm_gem_fb_create() and drm_gem_fb_prepare_fb() 2017-08-16 21:35:38 +02:00
qxl drm/qxl: Use the drm_driver.dumb_destroy default 2017-08-16 20:15:38 +02:00
r128
radeon drm/radeon: make functions alloc_pasid and free_pasid static 2017-10-06 16:47:58 -04:00
rcar-du Merge tag 'drm-misc-next-2017-08-08' of git://anongit.freedesktop.org/git/drm-misc into drm-next 2017-08-10 10:47:33 +10:00
rockchip Merge tag 'drm-misc-next-2017-08-16' of git://anongit.freedesktop.org/git/drm-misc into drm-next 2017-08-17 07:33:41 +10:00
savage
selftests
shmobile
sis
sti drm: Nuke drm_atomic_helper_connector_dpms 2017-08-08 14:48:48 +02:00
stm drm: make DRM_STM default n 2017-08-10 11:26:49 +10:00
sun4i sun4i DRM changes for 4.14, take 2 2017-08-25 09:30:54 +10:00
tdfx
tegra drm/tegra: Changes for v4.14-rc1 2017-08-21 17:37:33 +10:00
tilcdc drm: Nuke drm_atomic_helper_connector_dpms 2017-08-08 14:48:48 +02:00
tinydrm drm/tinydrm: make function st7586_pipe_enable static 2017-08-16 21:39:26 +02:00
ttm drm/ttm: Remove TTM dma tracepoint since it's not required anymore 2017-09-26 15:14:06 -04:00
udl drm: udl: constify usb_device_id 2017-08-18 09:10:46 +02:00
vc4 drm/vc4: Use drm_gem_fb_create() 2017-08-16 21:35:57 +02:00
vgem drm/vgem: switch to drm_*_get(), drm_*_put() helpers 2017-08-11 11:41:43 -04:00
via
virtio drm/ttm: make ttm_mem_type_manager_func debug more useful 2017-08-17 15:45:59 -04:00
vmwgfx drm/vmwgfx: Bump the version for fence FD support 2017-08-28 17:53:32 +02:00
zte drm: Nuke drm_atomic_helper_connector_dpms 2017-08-08 14:48:48 +02:00
ati_pcigart.c
drm_agpsupport.c
drm_atomic_helper.c Merge airlied/drm-next into drm-intel-next-queued 2017-08-10 18:12:01 +02:00
drm_atomic.c drm: Nuke drm_atomic_legacy_backoff 2017-08-08 14:49:29 +02:00
drm_auth.c
drm_blend.c
drm_bridge.c
drm_bufs.c
drm_cache.c
drm_color_mgmt.c
drm_connector.c drm: Handle properties in the core for atomic drivers 2017-08-08 14:45:09 +02:00
drm_context.c
drm_crtc_helper_internal.h
drm_crtc_helper.c drm: Handle properties in the core for atomic drivers 2017-08-08 14:45:09 +02:00
drm_crtc_internal.h drm: Handle properties in the core for atomic drivers 2017-08-08 14:45:09 +02:00
drm_crtc.c drm: Handle properties in the core for atomic drivers 2017-08-08 14:45:09 +02:00
drm_debugfs_crc.c
drm_debugfs.c
drm_dma.c
drm_dp_aux_dev.c
drm_dp_dual_mode_helper.c
drm_dp_helper.c
drm_dp_mst_topology.c
drm_drv.c drm: Clean up drm_dev_unplug 2017-08-11 10:49:21 +02:00
drm_dumb_buffers.c
drm_edid_load.c
drm_edid.c
drm_encoder_slave.c
drm_encoder.c
drm_fb_cma_helper.c drm/fb-cma-helper: Use drm_gem_framebuffer_helper 2017-08-16 21:34:38 +02:00
drm_fb_helper.c drm/fb-helper: pass physical dimensions to fbdev 2017-08-07 17:01:15 +02:00
drm_file.c drm: Document device unplug infrastructure 2017-08-11 10:48:03 +02:00
drm_flip_work.c
drm_fourcc.c
drm_framebuffer.c
drm_gem_cma_helper.c drm/gem-cma-helper: Remove drm_gem_cma_dumb_map_offset() 2017-08-16 20:21:24 +02:00
drm_gem_framebuffer_helper.c drm: Add GEM backed framebuffer library 2017-08-16 21:32:23 +02:00
drm_gem.c drm: Document device unplug infrastructure 2017-08-11 10:48:03 +02:00
drm_global.c
drm_hashtab.c
drm_info.c
drm_internal.h drm/syncobj: Add a signal ioctl (v3) 2017-08-29 10:16:25 +10:00
drm_ioc32.c
drm_ioctl.c drm/syncobj: Add a signal ioctl (v3) 2017-08-29 10:16:25 +10:00
drm_irq.c
drm_kms_helper_common.c
drm_legacy.h
drm_lock.c
drm_memory.c
drm_mipi_dsi.c
drm_mm.c
drm_mode_config.c
drm_mode_object.c drm: Handle properties in the core for atomic drivers 2017-08-08 14:45:09 +02:00
drm_modes.c
drm_modeset_helper.c
drm_modeset_lock.c
drm_of.c
drm_panel.c
drm_pci.c
drm_plane_helper.c
drm_plane.c drm: Shift wrap bug in create_in_format_blob() 2017-08-09 10:15:52 -04:00
drm_prime.c
drm_print.c
drm_probe_helper.c
drm_property.c
drm_rect.c
drm_scatter.c
drm_scdc_helper.c
drm_simple_kms_helper.c
drm_syncobj.c drm/syncobj: add a new helper drm_syncobj_get_fd 2017-10-06 16:47:53 -04:00
drm_sysfs.c
drm_trace_points.c
drm_trace.h
drm_vblank.c
drm_vm.c drm: Document device unplug infrastructure 2017-08-11 10:48:03 +02:00
drm_vma_manager.c
Kconfig drm/amdgpu: Track pending retry faults in IH and VM (v2) 2017-09-26 14:53:20 -04:00
Makefile drm/amd: Closed hash table with low overhead (v2) 2017-09-26 14:53:19 -04:00