linux_dsm_epyc7002/drivers
Lyude Paul 7fec8f5379 drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests
Currently, nouveau uses the generic drm_fb_helper_output_poll_changed()
function provided by DRM as it's output_poll_changed callback.
Unfortunately however, this function doesn't grab runtime PM references
early enough and even if it did-we can't block waiting for the device to
resume in output_poll_changed() since it's very likely that we'll need
to grab the fb_helper lock at some point during the runtime resume
process. This currently results in deadlocking like so:

[  246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[  246.673398]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.676527] kworker/4:0     D    0    37      2 0x80000000
[  246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[  246.678704] Call Trace:
[  246.679753]  __schedule+0x322/0xaf0
[  246.680916]  schedule+0x33/0x90
[  246.681924]  schedule_preempt_disabled+0x15/0x20
[  246.683023]  __mutex_lock+0x569/0x9a0
[  246.684035]  ? kobject_uevent_env+0x117/0x7b0
[  246.685132]  ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.686179]  mutex_lock_nested+0x1b/0x20
[  246.687278]  ? mutex_lock_nested+0x1b/0x20
[  246.688307]  drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.689420]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.690462]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.691570]  output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[  246.692611]  process_one_work+0x231/0x620
[  246.693725]  worker_thread+0x214/0x3a0
[  246.694756]  kthread+0x12b/0x150
[  246.695856]  ? wq_pool_ids_show+0x140/0x140
[  246.696888]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.697998]  ret_from_fork+0x3a/0x50
[  246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[  246.700153]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.702278] kworker/0:1     D    0    60      2 0x80000000
[  246.703293] Workqueue: pm pm_runtime_work
[  246.704393] Call Trace:
[  246.705403]  __schedule+0x322/0xaf0
[  246.706439]  ? wait_for_completion+0x104/0x190
[  246.707393]  schedule+0x33/0x90
[  246.708375]  schedule_timeout+0x3a5/0x590
[  246.709289]  ? mark_held_locks+0x58/0x80
[  246.710208]  ? _raw_spin_unlock_irq+0x2c/0x40
[  246.711222]  ? wait_for_completion+0x104/0x190
[  246.712134]  ? trace_hardirqs_on_caller+0xf4/0x190
[  246.713094]  ? wait_for_completion+0x104/0x190
[  246.713964]  wait_for_completion+0x12c/0x190
[  246.714895]  ? wake_up_q+0x80/0x80
[  246.715727]  ? get_work_pool+0x90/0x90
[  246.716649]  flush_work+0x1c9/0x280
[  246.717483]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  246.718442]  __cancel_work_timer+0x146/0x1d0
[  246.719247]  cancel_delayed_work_sync+0x13/0x20
[  246.720043]  drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[  246.721123]  nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[  246.721897]  pci_pm_runtime_suspend+0x6b/0x190
[  246.722825]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.723737]  __rpm_callback+0x7a/0x1d0
[  246.724721]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.725607]  rpm_callback+0x24/0x80
[  246.726553]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.727376]  rpm_suspend+0x142/0x6b0
[  246.728185]  pm_runtime_work+0x97/0xc0
[  246.728938]  process_one_work+0x231/0x620
[  246.729796]  worker_thread+0x44/0x3a0
[  246.730614]  kthread+0x12b/0x150
[  246.731395]  ? wq_pool_ids_show+0x140/0x140
[  246.732202]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.732878]  ret_from_fork+0x3a/0x50
[  246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[  246.734587]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.736113] kworker/4:2     D    0   422      2 0x80000080
[  246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  246.737665] Call Trace:
[  246.738490]  __schedule+0x322/0xaf0
[  246.739250]  schedule+0x33/0x90
[  246.739908]  rpm_resume+0x19c/0x850
[  246.740750]  ? finish_wait+0x90/0x90
[  246.741541]  __pm_runtime_resume+0x4e/0x90
[  246.742370]  nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[  246.743124]  drm_atomic_commit+0x4a/0x50 [drm]
[  246.743775]  restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[  246.744603]  restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[  246.745373]  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[  246.746220]  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[  246.746884]  drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[  246.747675]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.748544]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.749439]  nv50_mstm_hotplug+0x15/0x20 [nouveau]
[  246.750111]  drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[  246.750764]  drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[  246.751602]  drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[  246.752314]  process_one_work+0x231/0x620
[  246.752979]  worker_thread+0x44/0x3a0
[  246.753838]  kthread+0x12b/0x150
[  246.754619]  ? wq_pool_ids_show+0x140/0x140
[  246.755386]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.756162]  ret_from_fork+0x3a/0x50
[  246.756847]
           Showing all locks held in the system:
[  246.758261] 3 locks held by kworker/4:0/37:
[  246.759016]  #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.759856]  #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.760670]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.761516] 2 locks held by kworker/0:1/60:
[  246.762274]  #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.762982]  #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.763890] 1 lock held by khungtaskd/64:
[  246.764664]  #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  246.765588] 5 locks held by kworker/4:2/422:
[  246.766440]  #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.767390]  #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.768154]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[  246.768966]  #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[  246.769921]  #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[  246.770839] 1 lock held by dmesg/1038:
[  246.771739] 2 locks held by zsh/1172:
[  246.772650]  #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  246.773680]  #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870

[  246.775522] =============================================

After trying dozens of different solutions, I found one very simple one
that should also have the benefit of preventing us from having to fight
locking for the rest of our lives. So, we work around these deadlocks by
deferring all fbcon hotplug events that happen after the runtime suspend
process starts until after the device is resumed again.

Changes since v7:
 - Fixup commit message - Daniel Vetter

Changes since v6:
 - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia

Changes since v5:
 - Come up with the (hopefully final) solution for solving this dumb
   problem, one that is a lot less likely to cause issues with locking in
   the future. This should work around all deadlock conditions with fbcon
   brought up thus far.

Changes since v4:
 - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock
   condition that Lukas described
 - Just move all of this out of drm_fb_helper. It seems that other DRM
   drivers have already figured out other workarounds for this. If other
   drivers do end up needing this in the future, we can just move this
   back into drm_fb_helper again.

Changes since v3:
- Actually check if fb_helper is NULL in both new helpers
- Actually check drm_fbdev_emulation in both new helpers
- Don't fire off a fb_helper hotplug unconditionally; only do it if
  the following conditions are true (as otherwise, calling this in the
  wrong spot will cause Bad Things to happen):
  - fb_helper hotplug handling was actually inhibited previously
  - fb_helper actually has a delayed hotplug pending
  - fb_helper is actually bound
  - fb_helper is actually initialized
- Add __must_check to drm_fb_helper_suspend_hotplug(). There's no
  situation where a driver would actually want to use this without
  checking the return value, so enforce that
- Rewrite and clarify the documentation for both helpers.
- Make sure to return true in the drm_fb_helper_suspend_hotplug() stub
  that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION
  isn't enabled
- Actually grab the toplevel fb_helper lock in
  drm_fb_helper_resume_hotplug(), since it's possible other activity
  (such as a hotplug) could be going on at the same time the driver
  calls drm_fb_helper_resume_hotplug(). We need this to check whether or
  not drm_fb_helper_hotplug_event() needs to be called anyway

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
..
accessibility
acpi libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
amba
android
ata Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2018-08-24 13:20:33 -07:00
atm
auxdisplay Char/Misc driver patches for 4.19-rc1 2018-08-18 11:04:51 -07:00
base Driver core patches for 4.19-rc1 2018-08-18 11:44:53 -07:00
bcma
block Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
bluetooth Bluetooth: mediatek: pass correct size to h4_recv_buf() 2018-08-13 15:59:39 +02:00
bus ARM: SoC driver updates 2018-08-23 13:52:46 -07:00
cdrom
char RTC for 4.19 2018-08-20 16:30:27 -07:00
clk ARM: SoC driver updates 2018-08-23 13:52:46 -07:00
clocksource RISC-V Updates for the 4.19 Merge Window 2018-08-19 09:56:38 -07:00
connector
cpufreq ARM: SoC driver updates 2018-08-23 13:52:46 -07:00
cpuidle More power management updates for 4.19-rc1 2018-08-22 07:42:36 -07:00
crypto Merge branch 'akpm' (patches from Andrew) 2018-08-23 19:20:12 -07:00
dax libnvdimm-for-4.19_dax-memory-failure 2018-08-25 18:43:59 -07:00
dca
devfreq Char/Misc driver patches for 4.19-rc1 2018-08-18 11:04:51 -07:00
dio
dma Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
dma-buf
edac EDAC: Add missing MEM_LRDDR4 entry in edac_mem_types[] 2018-08-17 15:13:34 +02:00
eisa
extcon
firewire firewire: use 64-bit time_t based interfaces 2018-08-17 16:20:27 -07:00
firmware fbdev changes for v4.19: 2018-08-23 15:44:58 -07:00
fmc
fpga
fsi
gnss
gpio - New Drivers 2018-08-20 15:38:44 -07:00
gpu drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests 2018-09-07 06:54:26 +10:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2018-08-20 15:59:01 -07:00
hsi
hv
hwmon ARM: SoC driver updates 2018-08-23 13:52:46 -07:00
hwspinlock
hwtracing drivers/hwtracing/intel_th/msu.c: change return type to vm_fault_t 2018-08-23 18:48:43 -07:00
i2c i2c: don't use any __deprecated handling anymore 2018-08-24 17:26:43 +02:00
ide Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide 2018-08-22 07:40:33 -07:00
idle
iio treewide: convert ISO_8859-1 text comments to utf-8 2018-08-23 18:48:43 -07:00
infiniband Second merge window update 2018-08-23 15:34:48 -07:00
input ARM: 32-bit SoC platform updates 2018-08-23 13:44:43 -07:00
iommu ARM: SoC: late updates 2018-08-25 14:12:36 -07:00
ipack
irqchip Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-26 09:55:28 -07:00
isdn isdn: Disable IIOCDBGVAR 2018-08-16 12:26:24 -07:00
leds
lightnvm
macintosh macintosh: therm_windtunnel: drop using attach_adapter 2018-08-24 14:42:42 +02:00
mailbox mailbox: Add support for i.MX messaging unit 2018-08-15 09:53:07 +05:30
mcb
md libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
media Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
memory ARM: SoC driver updates 2018-08-23 13:52:46 -07:00
memstick
message
mfd Merge branch 'i2c/for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2018-08-21 17:40:46 -07:00
misc Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
mmc Merge branch 'asoc-4.19' into asoc-next 2018-08-09 14:47:05 +01:00
mtd This pull request contains updates for both UBI and UBIFS: 2018-08-23 15:58:04 -07:00
mux
net ARM: 32-bit SoC platform updates 2018-08-23 13:44:43 -07:00
nfc
ntb
nubus
nvdimm libnvdimm-for-4.19_dax-memory-failure 2018-08-25 18:43:59 -07:00
nvme Merge branch 'linus/master' into rdma.git for-next 2018-08-16 14:21:29 -06:00
nvmem
of Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-08-15 15:04:25 -07:00
opp
oprofile
parisc
parport Char/Misc driver patches for 4.19-rc1 2018-08-18 11:04:51 -07:00
pci Merge branch 'akpm' (patches from Andrew) 2018-08-22 12:34:08 -07:00
pcmcia pcmcia: remove long deprecated pcmcia_request_exclusive_irq() function 2018-08-18 12:30:42 -07:00
perf Char/Misc driver patches for 4.19-rc1 2018-08-18 11:04:51 -07:00
phy
pinctrl - New Drivers 2018-08-20 15:38:44 -07:00
platform platform-drivers-x86 for v4.19-1 2018-08-22 14:14:15 -07:00
pnp
power treewide: convert ISO_8859-1 text comments to utf-8 2018-08-23 18:48:43 -07:00
powercap
pps
ps3
ptp Char/Misc driver patches for 4.19-rc1 2018-08-18 11:04:51 -07:00
pwm pwm: mediatek: Add MT7628 support 2018-08-20 11:36:07 +02:00
rapidio drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md 2018-08-22 10:52:51 -07:00
ras
regulator - New Drivers 2018-08-20 15:38:44 -07:00
remoteproc remoteproc/davinci: use the reset framework 2018-08-16 17:39:55 -07:00
reset ARM: SoC: late updates 2018-08-25 14:12:36 -07:00
rpmsg
rtc RTC for 4.19 2018-08-20 16:30:27 -07:00
s390 libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
sbus
scsi Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
sfi
sh
siox
slimbus
sn
soc ARM: Device-tree updates 2018-08-23 14:02:22 -07:00
soundwire
spi hwspinlock updates for v4.19 2018-08-18 16:45:27 -07:00
spmi
ssb ssb: Remove SSB_WARN_ON, SSB_BUG_ON and SSB_DEBUG 2018-08-09 18:47:47 +03:00
staging ARM: SoC driver updates 2018-08-23 13:52:46 -07:00
target Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
tc
tee ARM: SoC driver updates 2018-08-23 13:52:46 -07:00
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2018-08-24 13:03:51 -07:00
thunderbolt
tty powerpc fixes for 4.19 #2 2018-08-24 09:34:23 -07:00
uio Char/Misc fix for 4.19-rc1 2018-08-19 09:30:44 -07:00
usb ARM: SoC driver updates 2018-08-23 13:52:46 -07:00
uwb
vfio powerpc updates for 4.19 2018-08-17 11:32:50 -07:00
vhost virtio, vhost: fixes, tweaks 2018-08-24 08:45:19 -07:00
video fbdev changes for v4.19: 2018-08-23 15:44:58 -07:00
virt
virtio virtio, vhost: fixes, tweaks 2018-08-24 08:45:19 -07:00
visorbus
vlynq
vme
w1 power supply and reset changes for the v4.19 series 2018-08-21 18:06:27 -07:00
watchdog include/linux/compiler*.h: make compiler-*.h mutually exclusive 2018-08-22 17:31:34 -07:00
xen xen: fixes and cleanups for 4.19-rc1, second round 2018-08-23 14:52:23 -07:00
zorro
Kconfig
Makefile Char/Misc driver patches for 4.19-rc1 2018-08-18 11:04:51 -07:00