Commit Graph

4807 Commits

Author SHA1 Message Date
Ben Skeggs
3483f08106 drm/nouveau/devinit: fix warning when PMU/PRE_OS is missing
Messed up when sending pull request and sent an outdated version of
previous patch, this fixes it up to remove warnings.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-13 10:56:58 +10:00
Ben Skeggs
53b0cc46f2 drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP panels
Fixes eDP backlight issues on more recent laptops.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
e04cfdc9b7 drm/nouveau/disp: fix DP disable race
If a HPD pulse signalling the need to retrain the link occurs between
the KMS driver releasing the output and the supervisor interrupt that
finishes the teardown, it was possible get a NULL-ptr deref.

Avoid this by marking the link as inactive earlier.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
f6d52b2172 drm/nouveau/disp: move eDP panel power handling
We need to do this earlier to prevent aux channel timeouts in resume
paths on certain systems.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
606557708f drm/nouveau/disp: remove unused struct member
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
0a6986c659 drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOS
This Falcon application doesn't appear to be present on some newer
systems, so let's not fail init if we can't find it.

TBD: is there a way to determine whether it *should* be there?

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
51ed833c88 drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointer
Fixes oopses in certain failure paths.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
a43b16dda2 drm/nouveau: fix oops in client init failure path
The NV_ERROR macro requires drm->client to be initialised, which it may not
be at this stage of the init process.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
d5986a1c4d drm/nouveau: Fix nouveau_connector_ddc_detect()
It looks like that when we moved over to using
drm_connector_for_each_possible_encoder() in nouveau, that one rather
important part of this function got dropped by accident:

	/*          Right   v   here */
	for (i = 0; nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER; i++) {
		int id = connector->encoder_ids[i];
		if (id == 0)
			break;

Since it's rather difficult to notice: the conditional in this loop is
actually:

	nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER

Meaning that all early breaks result in nv_encoder keeping it's value,
otherwise nv_encoder = NULL. Ugh.

Since this got dropped, nouveau_connector_ddc_detect() now returns an
encoder for every single connector, regardless of whether or not it's
detected:

    [ 1780.056185] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DP-2

So: fix this to ensure we only return an encoder if we actually found
one, and clean up the rest of the function while we're at it since it's
nearly impossible to read properly.

Changes since v1:
- Don't skip ddc probing for LVDS if we can't switch DDC through
  vga-switcheroo, just do the DDC probing without calling
  vga_switcheroo_lock_ddc() - skeggsb

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: ddba766dd0 ("drm/nouveau: Use drm_connector_for_each_possible_encoder()")
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
2f7ca781fd drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unload
Currently, there's nothing in nouveau that actually cancels this work
struct. So, cancel it on suspend/unload. Otherwise, if we're unlucky
enough hpd_work might try to keep running up until the system is
suspended.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
79e765ad66 drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early
On most systems with ACPI hotplugging support, it seems that we always
receive a hotplug event once we re-enable EC interrupts even if the GPU
hasn't even been resumed yet.

This can cause problems since even though we schedule hpd_work to handle
connector reprobing for us, hpd_work synchronizes on
pm_runtime_get_sync() to wait until the device is ready to perform
reprobing. Since runtime suspend/resume callbacks are disabled before
the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during
this period will grab a runtime PM ref and return immediately with
-EACCES. Because we schedule hpd_work from our ACPI HPD handler, and
hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch
a connector reprobe immediately even if the GPU isn't actually resumed
just yet. This causes various warnings in dmesg and occasionally, also
prevents some displays connected to the dedicated GPU from coming back
up after suspend. Example:

usb 1-4: USB disconnect, device number 14
usb 1-4.1: USB disconnect, device number 15
WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau]
CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1
Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018
Workqueue: events nouveau_display_hpd_work [nouveau]
RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau]
RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000
RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002
RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000
R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418
FS:  0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? _cond_resched+0x15/0x30
 nouveau_connector_detect+0x2ce/0x520 [nouveau]
 ? _cond_resched+0x15/0x30
 ? ww_mutex_lock+0x12/0x40
 drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper]
 drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper]
 nouveau_display_hpd_work+0x2a/0x60 [nouveau]
 process_one_work+0x187/0x340
 worker_thread+0x2e/0x380
 ? pwq_unbound_release_workfn+0xd0/0xd0
 kthread+0x112/0x130
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x35/0x40
Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff
---[ end trace 55d811b38fc8e71a ]---

So, to fix this we attempt to grab a runtime PM reference in the ACPI
handler itself asynchronously. If the GPU is already awake (it will have
normal hotplugging at this point) or runtime PM callbacks are currently
disabled on the device, we drop our reference without updating the
autosuspend delay. We only schedule connector reprobes when we
successfully managed to queue up a resume request with our asynchronous
PM ref.

This also has the added benefit of preventing redundant connector
reprobes from ACPI while the GPU is runtime resumed!

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <kherbst@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
fa3cdf8d0b drm/nouveau: Reset MST branching unit before enabling
When probing a new MST device, it's not safe to make any assumptions
about it's current state. While most well mannered MST hubs will just
disable the branching unit on hotplug disconnects, this isn't enough to
save us from various other scenarios that might have resulted in
something writing to the MST branching unit before we got control of it.
This could happen if a previous probe we tried failed, if we're booting
in kexec context and the hub is still in the state the last kernel put
it in, etc.

Luckily; there is no reason we can't just reset the branching unit
every time we enable a new topology. So, fix this by resetting it on
enabling new topologies to ensure that we always start off with a clean,
unmodified topology state on MST sinks.

This fixes occasional hard-lockups on my P50's laptop dock (e.g. AUX
times out all DPCD trasactions) observed after multiple docks, undocks,
and module reloads.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
b26b4590dd drm/nouveau: Only write DP_MSTM_CTRL when needed
Currently, nouveau will re-write the DP_MSTM_CTRL register for an MST
hub every time it receives a long HPD pulse on DP. This isn't actually
necessary and additionally, has some unintended side effects.

With the P50 I've got here, rewriting DP_MSTM_CTRL constantly seems to
make it rather likely (1 out of 5 times usually) that bringing up MST
with it's ThinkPad dock will fail and result in sideband messages timing
out in the middle. Afterwards, successive probes don't manage to get the
dock to communicate properly over MST sideband properly.

Many times sideband message timeouts from MST hubs are indicative of
either the source or the sink dropping an ESI event, which can cause
DRM's perspective of the topology's current state to go out of sync with
reality. While it's tough to really know for sure what's happening to
the dock, using userspace tools to write to DP_MSTM_CTRL in the middle
of the MST link probing process does appear to make things flaky. It's
possible that when we write to DP_MSTM_CTRL, the function that gets
triggered to respond in the dock's firmware temporarily puts it in a
state where it might end up not reporting an ESI to the source, or ends
up dropping a sideband message we sent it.

So, to fix this we make it so that when probing an MST topology, we
respect it's current state. If the dock's already enabled, we simply
read DP_MSTM_CTRL and disable the topology if it's value is not what we
expected. Otherwise, we perform the normal MST probing dance. We avoid
taking any action except if the state of the MST topology actually
changes.

This fixes MST sideband message timeouts and detection failures on my
P50 with its ThinkPad dock.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
7326ead982 drm/nouveau: Remove useless poll_enable() call in drm_load()
Again, this doesn't do anything. drm_kms_helper_poll_enable() will have
already been called in nouveau_display_init()

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
0d7b2d4def drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()
This won't do anything but potentially make us miss hotplugs. We already
call drm_kms_helper_poll_disable() in
nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini()

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
0445f7537d drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()
This doesn't do anything, drm_kms_helper_poll_enable() gets called in
nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init()
already.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
3e1a12754d drm/nouveau: Fix deadlocks in nouveau_connector_detect()
When we disable hotplugging on the GPU, we need to be able to
synchronize with each connector's hotplug interrupt handler before the
interrupt is finally disabled. This can be a problem however, since
nouveau_connector_detect() currently grabs a runtime power reference
when handling connector probing. This will deadlock the runtime suspend
handler like so:

[  861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds.
[  861.483290]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.486332] kworker/0:2     D    0    61      2 0x80000000
[  861.487044] Workqueue: events nouveau_display_hpd_work [nouveau]
[  861.487737] Call Trace:
[  861.488394]  __schedule+0x322/0xaf0
[  861.489070]  schedule+0x33/0x90
[  861.489744]  rpm_resume+0x19c/0x850
[  861.490392]  ? finish_wait+0x90/0x90
[  861.491068]  __pm_runtime_resume+0x4e/0x90
[  861.491753]  nouveau_display_hpd_work+0x22/0x60 [nouveau]
[  861.492416]  process_one_work+0x231/0x620
[  861.493068]  worker_thread+0x44/0x3a0
[  861.493722]  kthread+0x12b/0x150
[  861.494342]  ? wq_pool_ids_show+0x140/0x140
[  861.494991]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.495648]  ret_from_fork+0x3a/0x50
[  861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds.
[  861.496968]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.498341] kworker/6:2     D    0   320      2 0x80000080
[  861.499045] Workqueue: pm pm_runtime_work
[  861.499739] Call Trace:
[  861.500428]  __schedule+0x322/0xaf0
[  861.501134]  ? wait_for_completion+0x104/0x190
[  861.501851]  schedule+0x33/0x90
[  861.502564]  schedule_timeout+0x3a5/0x590
[  861.503284]  ? mark_held_locks+0x58/0x80
[  861.503988]  ? _raw_spin_unlock_irq+0x2c/0x40
[  861.504710]  ? wait_for_completion+0x104/0x190
[  861.505417]  ? trace_hardirqs_on_caller+0xf4/0x190
[  861.506136]  ? wait_for_completion+0x104/0x190
[  861.506845]  wait_for_completion+0x12c/0x190
[  861.507555]  ? wake_up_q+0x80/0x80
[  861.508268]  flush_work+0x1c9/0x280
[  861.508990]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  861.509735]  nvif_notify_put+0xb1/0xc0 [nouveau]
[  861.510482]  nouveau_display_fini+0xbd/0x170 [nouveau]
[  861.511241]  nouveau_display_suspend+0x67/0x120 [nouveau]
[  861.511969]  nouveau_do_suspend+0x5e/0x2d0 [nouveau]
[  861.512715]  nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau]
[  861.513435]  pci_pm_runtime_suspend+0x6b/0x180
[  861.514165]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.514897]  __rpm_callback+0x7a/0x1d0
[  861.515618]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.516313]  rpm_callback+0x24/0x80
[  861.517027]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.517741]  rpm_suspend+0x142/0x6b0
[  861.518449]  pm_runtime_work+0x97/0xc0
[  861.519144]  process_one_work+0x231/0x620
[  861.519831]  worker_thread+0x44/0x3a0
[  861.520522]  kthread+0x12b/0x150
[  861.521220]  ? wq_pool_ids_show+0x140/0x140
[  861.521925]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.522622]  ret_from_fork+0x3a/0x50
[  861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds.
[  861.523977]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.525349] kworker/6:0     D    0  1329      2 0x80000000
[  861.526073] Workqueue: events nvif_notify_work [nouveau]
[  861.526751] Call Trace:
[  861.527411]  __schedule+0x322/0xaf0
[  861.528089]  schedule+0x33/0x90
[  861.528758]  rpm_resume+0x19c/0x850
[  861.529399]  ? finish_wait+0x90/0x90
[  861.530073]  __pm_runtime_resume+0x4e/0x90
[  861.530798]  nouveau_connector_detect+0x7e/0x510 [nouveau]
[  861.531459]  ? ww_mutex_lock+0x47/0x80
[  861.532097]  ? ww_mutex_lock+0x47/0x80
[  861.532819]  ? drm_modeset_lock+0x88/0x130 [drm]
[  861.533481]  drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper]
[  861.534127]  drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper]
[  861.534940]  nouveau_connector_hotplug+0x98/0x120 [nouveau]
[  861.535556]  nvif_notify_work+0x2d/0xb0 [nouveau]
[  861.536221]  process_one_work+0x231/0x620
[  861.536994]  worker_thread+0x44/0x3a0
[  861.537757]  kthread+0x12b/0x150
[  861.538463]  ? wq_pool_ids_show+0x140/0x140
[  861.539102]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.539815]  ret_from_fork+0x3a/0x50
[  861.540521]
               Showing all locks held in the system:
[  861.541696] 2 locks held by kworker/0:2/61:
[  861.542406]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543071]  #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543814] 1 lock held by khungtaskd/64:
[  861.544535]  #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  861.545160] 3 locks held by kworker/6:2/320:
[  861.545896]  #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.546702]  #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.547443]  #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau]
[  861.548146] 1 lock held by dmesg/983:
[  861.548889] 2 locks held by zsh/1250:
[  861.549605]  #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  861.550393]  #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[  861.551122] 6 locks held by kworker/6:0/1329:
[  861.551957]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.552765]  #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620
[  861.553582]  #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper]
[  861.554357]  #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper]
[  861.555227]  #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper]
[  861.556133]  #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm]

[  861.557864] =============================================

[  861.559507] NMI backtrace for cpu 2
[  861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[  861.561948] Call Trace:
[  861.562757]  dump_stack+0x8e/0xd3
[  861.563516]  nmi_cpu_backtrace.cold.3+0x14/0x5a
[  861.564269]  ? lapic_can_unplug_cpu.cold.27+0x42/0x42
[  861.565029]  nmi_trigger_cpumask_backtrace+0xa1/0xae
[  861.565789]  arch_trigger_cpumask_backtrace+0x19/0x20
[  861.566558]  watchdog+0x316/0x580
[  861.567355]  kthread+0x12b/0x150
[  861.568114]  ? reset_hung_task_detector+0x20/0x20
[  861.568863]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.569598]  ret_from_fork+0x3a/0x50
[  861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7:
[  861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120
[  861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120
[  861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120
[  861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120
[  861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120
[  861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120
[  861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120
[  861.572428] Kernel panic - not syncing: hung_task: blocked tasks

So: fix this by making it so that normal hotplug handling /only/ happens
so long as the GPU is currently awake without any pending runtime PM
requests. In the event that a hotplug occurs while the device is
suspending or resuming, we can simply defer our response until the GPU
is fully runtime resumed again.

Changes since v4:
- Use a new trick I came up with using pm_runtime_get() instead of the
  hackish junk we had before

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
6833fb1ec1 drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
It's true we can't resume the device from poll workers in
nouveau_connector_detect(). We can however, prevent the autosuspend
timer from elapsing immediately if it hasn't already without risking any
sort of deadlock with the runtime suspend/resume operations. So do that
instead of entirely avoiding grabbing a power reference.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
7fec8f5379 drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests
Currently, nouveau uses the generic drm_fb_helper_output_poll_changed()
function provided by DRM as it's output_poll_changed callback.
Unfortunately however, this function doesn't grab runtime PM references
early enough and even if it did-we can't block waiting for the device to
resume in output_poll_changed() since it's very likely that we'll need
to grab the fb_helper lock at some point during the runtime resume
process. This currently results in deadlocking like so:

[  246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[  246.673398]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.676527] kworker/4:0     D    0    37      2 0x80000000
[  246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[  246.678704] Call Trace:
[  246.679753]  __schedule+0x322/0xaf0
[  246.680916]  schedule+0x33/0x90
[  246.681924]  schedule_preempt_disabled+0x15/0x20
[  246.683023]  __mutex_lock+0x569/0x9a0
[  246.684035]  ? kobject_uevent_env+0x117/0x7b0
[  246.685132]  ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.686179]  mutex_lock_nested+0x1b/0x20
[  246.687278]  ? mutex_lock_nested+0x1b/0x20
[  246.688307]  drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.689420]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.690462]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.691570]  output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[  246.692611]  process_one_work+0x231/0x620
[  246.693725]  worker_thread+0x214/0x3a0
[  246.694756]  kthread+0x12b/0x150
[  246.695856]  ? wq_pool_ids_show+0x140/0x140
[  246.696888]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.697998]  ret_from_fork+0x3a/0x50
[  246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[  246.700153]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.702278] kworker/0:1     D    0    60      2 0x80000000
[  246.703293] Workqueue: pm pm_runtime_work
[  246.704393] Call Trace:
[  246.705403]  __schedule+0x322/0xaf0
[  246.706439]  ? wait_for_completion+0x104/0x190
[  246.707393]  schedule+0x33/0x90
[  246.708375]  schedule_timeout+0x3a5/0x590
[  246.709289]  ? mark_held_locks+0x58/0x80
[  246.710208]  ? _raw_spin_unlock_irq+0x2c/0x40
[  246.711222]  ? wait_for_completion+0x104/0x190
[  246.712134]  ? trace_hardirqs_on_caller+0xf4/0x190
[  246.713094]  ? wait_for_completion+0x104/0x190
[  246.713964]  wait_for_completion+0x12c/0x190
[  246.714895]  ? wake_up_q+0x80/0x80
[  246.715727]  ? get_work_pool+0x90/0x90
[  246.716649]  flush_work+0x1c9/0x280
[  246.717483]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  246.718442]  __cancel_work_timer+0x146/0x1d0
[  246.719247]  cancel_delayed_work_sync+0x13/0x20
[  246.720043]  drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[  246.721123]  nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[  246.721897]  pci_pm_runtime_suspend+0x6b/0x190
[  246.722825]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.723737]  __rpm_callback+0x7a/0x1d0
[  246.724721]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.725607]  rpm_callback+0x24/0x80
[  246.726553]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.727376]  rpm_suspend+0x142/0x6b0
[  246.728185]  pm_runtime_work+0x97/0xc0
[  246.728938]  process_one_work+0x231/0x620
[  246.729796]  worker_thread+0x44/0x3a0
[  246.730614]  kthread+0x12b/0x150
[  246.731395]  ? wq_pool_ids_show+0x140/0x140
[  246.732202]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.732878]  ret_from_fork+0x3a/0x50
[  246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[  246.734587]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.736113] kworker/4:2     D    0   422      2 0x80000080
[  246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  246.737665] Call Trace:
[  246.738490]  __schedule+0x322/0xaf0
[  246.739250]  schedule+0x33/0x90
[  246.739908]  rpm_resume+0x19c/0x850
[  246.740750]  ? finish_wait+0x90/0x90
[  246.741541]  __pm_runtime_resume+0x4e/0x90
[  246.742370]  nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[  246.743124]  drm_atomic_commit+0x4a/0x50 [drm]
[  246.743775]  restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[  246.744603]  restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[  246.745373]  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[  246.746220]  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[  246.746884]  drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[  246.747675]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.748544]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.749439]  nv50_mstm_hotplug+0x15/0x20 [nouveau]
[  246.750111]  drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[  246.750764]  drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[  246.751602]  drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[  246.752314]  process_one_work+0x231/0x620
[  246.752979]  worker_thread+0x44/0x3a0
[  246.753838]  kthread+0x12b/0x150
[  246.754619]  ? wq_pool_ids_show+0x140/0x140
[  246.755386]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.756162]  ret_from_fork+0x3a/0x50
[  246.756847]
           Showing all locks held in the system:
[  246.758261] 3 locks held by kworker/4:0/37:
[  246.759016]  #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.759856]  #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.760670]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.761516] 2 locks held by kworker/0:1/60:
[  246.762274]  #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.762982]  #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.763890] 1 lock held by khungtaskd/64:
[  246.764664]  #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  246.765588] 5 locks held by kworker/4:2/422:
[  246.766440]  #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.767390]  #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.768154]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[  246.768966]  #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[  246.769921]  #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[  246.770839] 1 lock held by dmesg/1038:
[  246.771739] 2 locks held by zsh/1172:
[  246.772650]  #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  246.773680]  #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870

[  246.775522] =============================================

After trying dozens of different solutions, I found one very simple one
that should also have the benefit of preventing us from having to fight
locking for the rest of our lives. So, we work around these deadlocks by
deferring all fbcon hotplug events that happen after the runtime suspend
process starts until after the device is resumed again.

Changes since v7:
 - Fixup commit message - Daniel Vetter

Changes since v6:
 - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia

Changes since v5:
 - Come up with the (hopefully final) solution for solving this dumb
   problem, one that is a lot less likely to cause issues with locking in
   the future. This should work around all deadlock conditions with fbcon
   brought up thus far.

Changes since v4:
 - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock
   condition that Lukas described
 - Just move all of this out of drm_fb_helper. It seems that other DRM
   drivers have already figured out other workarounds for this. If other
   drivers do end up needing this in the future, we can just move this
   back into drm_fb_helper again.

Changes since v3:
- Actually check if fb_helper is NULL in both new helpers
- Actually check drm_fbdev_emulation in both new helpers
- Don't fire off a fb_helper hotplug unconditionally; only do it if
  the following conditions are true (as otherwise, calling this in the
  wrong spot will cause Bad Things to happen):
  - fb_helper hotplug handling was actually inhibited previously
  - fb_helper actually has a delayed hotplug pending
  - fb_helper is actually bound
  - fb_helper is actually initialized
- Add __must_check to drm_fb_helper_suspend_hotplug(). There's no
  situation where a driver would actually want to use this without
  checking the return value, so enforce that
- Rewrite and clarify the documentation for both helpers.
- Make sure to return true in the drm_fb_helper_suspend_hotplug() stub
  that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION
  isn't enabled
- Actually grab the toplevel fb_helper lock in
  drm_fb_helper_resume_hotplug(), since it's possible other activity
  (such as a hotplug) could be going on at the same time the driver
  calls drm_fb_helper_resume_hotplug(). We need this to check whether or
  not drm_fb_helper_hotplug_event() needs to be called anyway

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
611ce85542 drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()
Since actual hotplug notifications don't get disabled until
nouveau_display_fini() is called, all this will do is cause any hotplugs
that happen between this drm_kms_helper_poll_disable() call and the
actual hotplug disablement to potentially be dropped if ACPI isn't
around to help us.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
d77ef138ff drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement
Turns out this part is my fault for not noticing when reviewing
9a2eba337c ("drm/nouveau: Fix drm poll_helper handling"). Currently
we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work().
This makes basically no sense however, because that means we're calling
drm_kms_helper_poll_enable() every time we schedule the hotplug
detection work. This is also against the advice mentioned in
drm_kms_helper_poll_enable()'s documentation:

 Note that calls to enable and disable polling must be strictly ordered,
 which is automatically the case when they're only call from
 suspend/resume callbacks.

Of course, hotplugs can't really be ordered. They could even happen
immediately after we called drm_kms_helper_poll_disable() in
nouveau_display_fini(), which can lead to all sorts of issues.

Additionally; enabling polling /after/ we call
drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug
event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to
probe connectors so long as polling is disabled.

So; simply move this back into nouveau_display_init() again. The race
condition that both of these patches attempted to work around has
already been fixed properly in

  d61a5c1063 ("drm/nouveau: Fix deadlock on runtime suspend")

Fixes: 9a2eba337c ("drm/nouveau: Fix drm poll_helper handling")
Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Dave Airlie
3fce461827 BackMerge v4.18-rc7 into drm-next
rmk requested this for armada and I think we've had a few
conflicts build up.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-07-30 10:39:22 +10:00
Dave Airlie
294f96ae8a drm-misc-next for 4.19:
Core Changes:
 - add support for DisplayPort CEC-Tunneling-over-AUX (Hans Verkuil)
 - more doc updates (Daniel Vetter)
 - fourcc: Add is_yuv field to drm_format_info (Ayan Kumar Halder)
 - dma-buf: correctly place BUG_ON (Michel Dänzer)
 
 Driver Changes:
 - more vkms support(Rodrigo Siqueira)
 - many fixes and small improments to all drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbT52JAAoJEEN0HIUfOBk06UsQAIy5YwUQ9l+8GdS5bKU299KW
 ZMMi0pTgB/bg0uuqGqN1zf23kpyRTNBGu2UMZgHWTcM4gjTP9qxb5GPFyOhr5he4
 pkp0p13fcn85Mkpt6ZQQD4ErMnhJSodzPRRT+ypnM+HzcWWehQOnSbLWCTOpaCeg
 5SsSFT7RfpDcICXzZZKAHFwHAp1y1y6V027RWu0/amUTwoZPn+ktU/s0thGIdqFk
 EGb/dP4K0PAHE4ZnhZOHPFlYbVQWp0J8X7+NmkXvPgwVPahLvKbNMBfG9M3mGcku
 cMwW8phngd0ih9gd1rblG3J8pdISArg6EgqAwwUV6p8tHUBQff5mL/RTh5zrUs6D
 seLqzRM4V74WDp2meMSYogISo2b+39RiL1IhayTytdW/oaterXloSChAwKUz4pi/
 Nj3/Kn59m9DH9NoAh3DYvDg+e06U9csR6TUJZ0B6BlXIwju9/QLybsDbUdmjtSW+
 yqttEs8m4k2gB2ZRo9y2RVi/XCNv0t+GYa2HQcTGrYVZpIxKioT6WdnlobQZ6L2E
 9CClacN6v2e27cQUbZEFuU7phUkM/nw18dROFrIwJ0OxsA5nElO1DTiOy+KDwzAU
 E+l4DqZZknyxEfTxUq79+9J2HmhqA7ikQbgNJMQyQ25iRFrkvYsI7XfF4ix5z+a5
 I0/CkPP3UsTibnVhM7wn
 =HyBh
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2018-07-18' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 4.19:

Core Changes:
- add support for DisplayPort CEC-Tunneling-over-AUX (Hans Verkuil)
- more doc updates (Daniel Vetter)
- fourcc: Add is_yuv field to drm_format_info (Ayan Kumar Halder)
- dma-buf: correctly place BUG_ON (Michel Dänzer)

Driver Changes:
- more vkms support(Rodrigo Siqueira)
- many fixes and small improments to all drivers

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180718200826.GA20165@juma
2018-07-20 10:46:49 +10:00
Dave Airlie
090cbdd073 Merge branch 'linux-4.19' of git://github.com/skeggsb/linux into drm-next
misc fixes and cleanups for next.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv55CfRonQ0bo2XiitkCiWTjKwhsP=+ZFhoa-BaJ72Ryew@mail.gmail.com
2018-07-20 10:34:33 +10:00
Dave Airlie
02e546eacc Merge branch 'linux-4.18' of git://github.com/skeggsb/linux into drm-fixes
- fix problem with pascal and large memory systems
- fix a bunch of MST problems
- fix a runtime PM interaction with MST

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv79O8deSts2fxJ_oS6=q8yA+OgwBSEpp5R=BQBmWa+oyg@mail.gmail.com
2018-07-20 10:27:53 +10:00
Ben Skeggs
d00ddd9da7 drm/nouveau/kms/nv50-: allocate push buffers in vidmem on pascal
Workaround for issues seen on systems with large amounts of RAM, caused
by display not supporting the same physical address limits as the other
parts of the GPU.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-19 14:38:09 +10:00
Ben Skeggs
2f958e8240 drm/nouveau/fb/gp100-: disable address remapper
This was causing problems on a system with a large amount of RAM, where
display push buffers were being fetched incorrectly when placed in high
system memory addresses.

While this commit will resolve the issue on that particular system, the
issue will be avoided completely with another patch to more fully solve
problems with display and large amounts of system memory on Pascal.

It's still probably a good idea to disable this to prevent weird issues
in the future.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-19 14:36:51 +10:00
Thierry Reding
b59fb482b5 drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping
Depending on the kernel configuration, early ARM architecture setup code
may have attached the GPU to a DMA/IOMMU mapping that transparently uses
the IOMMU to back the DMA API. Tegra requires special handling for IOMMU
backed buffers (a special bit in the GPU's MMU page tables indicates the
memory path to take: via the SMMU or directly to the memory controller).
Transparently backing DMA memory with an IOMMU prevents Nouveau from
properly handling such memory accesses and causes memory access faults.

As a side-note: buffers other than those allocated in instance memory
don't need to be physically contiguous from the GPU's perspective since
the GPU can map them into contiguous buffers using its own MMU. Mapping
these buffers through the IOMMU is unnecessary and will even lead to
performance degradation because of the additional translation. One
exception to this are compressible buffers which need large pages. In
order to enable these large pages, multiple small pages will have to be
combined into one large (I/O virtually contiguous) mapping via the
IOMMU. However, that is a topic outside the scope of this fix and isn't
currently supported. An implementation will want to explicitly create
these large pages in the Nouveau driver, so detaching from a DMA/IOMMU
mapping would still be required.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:36 +10:00
Kees Cook
0d46690155 drm/nouveau/secboot/acr: Remove VLA usage
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the working buffers before starting the writing so it won't
abort in the middle. This needs an initial walk of the lists to figure
out how large the buffer should be.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:30 +10:00
Thomas Zimmermann
94a0b8634f drm/nouveau: Replace drm_dev_unref with drm_dev_put
This patch unifies the naming of DRM functions for reference counting
of struct drm_device. The resulting code is more aligned with the rest
of the Linux kernel interfaces.

Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:30 +10:00
Thomas Zimmermann
743e0f079a drm/nouveau: Replace drm_gem_object_unreference_unlocked with put function
This patch unifies the naming of DRM functions for reference counting
of struct drm_gem_object. The resulting code is more aligned with the
rest of the Linux kernel interfaces.

Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Thomas Zimmermann
f066f79507 drm/nouveau: Replace drm_framebuffer_{un/reference} with put, get functions
This patch unifies the naming of DRM functions for reference counting
of struct drm_framebuffer. The resulting code is more aligned with the
rest of the Linux kernel interfaces.

Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Nick Desaulniers
c9fb2cc84c drm/nouveau/nvif: remove const attribute from nvif_mclass
Similar to commit 0bf8bf50ed ("module: Remove
const attribute from alias for MODULE_DEVICE_TABLE")

Fixes many -Wduplicate-decl-specifier warnings due to the combination of
const typeof() of already const variables.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Dan Carpenter
da71f0efe7 drm/nouveau/hwmon: potential uninitialized variables
Smatch complains that "value" can be uninitialized when kstrtol()
returns -ERANGE.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Lyude Paul
922a8c82fa drm/nouveau: Fix runtime PM leak in drm_open()
Noticed this as I was skimming through, if we fail to allocate memory
for cli we'll end up returning without dropping the runtime PM ref we
got. Additionally, we'll even return the wrong return code! (ret most
likely will == 0 here, we want -ENOMEM).

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Karol Herbst
eaeb9010bb drm/nouveau/debugfs: Wake up GPU before doing any reclocking
Fixes various reclocking related issues on prime systems.

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Karol Herbst
f706037c4e drm/nouveau/bios/vpstate: There are some fermi vbios with no boost or tdp entry
If the entry size is too small, default to invalid values for both
boost_id and tdp_id, so as to default to the base clock in both cases.

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Mario Kleiner
2ae4c5f6ff drm/nouveau/kms/nv50-: Allow vblank_disable_immediate
With instantaneous high precision vblank timestamping
that updates at leading edge of vblank, the emulated
"hw vblank counter" from vblank timestamping, which
increments at leading edge of vblank, and reliable
page flip execution and completion at leading edge of
vblank, we should meet the requirements for fast/
immediate vblank irq disable/enable.

This is only allowed on nv50+ gpu's, ie. the ones with
atomic modesetting. One requirement for immediate vblank
disable is that high precision vblank timestamping works
reliably all the time on all connectors. This is not the
case on all pre-nv50 parts for analog VGA outputs, where we
currently don't always have support for scanout position
queries and therefore fall back to vblank interrupt
timestamping. The implementation in nv04_head_state() does
not return valid values for vblanks, vtotal, hblanks, htotal
for VGA outputs on all cards, but those are needed for scanout
position queries.

Testing on Linux-4.12-rc5 + drm-next on a GeForce 9500 GT
(NV G96) with timing measurement equipment indicates this
works fine, so allow immediate vblank disable for power
saving.

For debugging in case of unexpected trouble, booting
with kernel cmdline option drm.vblankoffdelay=0
(or echo 0 > /sys/module/drm/parameters/vblankoffdelay)
would keep vblank irqs permanently on to approximate old
behavior.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Ben Skeggs
6ec7aecf1f drm/nouveau/kms/nv50-: remove duplicate assignment
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
kbuild test robot
01981aeb47 drm/nouveau/kms/nv50-: fix drm-get-put.cocci warnings
Use drm_*_get() and drm_*_put() helpers instead of drm_*_reference() and
 drm_*_unreference() helpers.

Generated by: scripts/coccinelle/api/drm-get-put.cocci

Fixes: 30ed49b55b ("drm/nouveau/kms/nv50-: move code underneath dispnv50/")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Ben Skeggs
7a26c92367 drm/nouveau/disp/nv50-gp10x: fix coverity warning
Change values to u32, there's no need for them to be 64-bit.

Reported-by: Colin Ian King <colin.king@canonical.com>
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Ben Skeggs
f7fbbf2cca drm/nouveau/core: ERR_PTR vs NULL bug in nvkm_engine_info()
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Jérôme Glisse
f0fffeeb14 drm/nouveau/mmu/gp10b: remove ghost file
This ghost file have been haunting us.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Nicolas Chauvet
3c9f27eeed drm/nouveau/secboot/tegra: Enable gp20b/gp10b firmware tag when relevant
This allows to have the related MODULE_FIRMWARE tag only
on relevant arch (arm64).
This will saves about 400k on initramfs when not relevant

Signed-off-by: Nicolas Chauvet <kwizart@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:29 +10:00
Ben Skeggs
60cda66572 drm/nouveau/fault/gv100: fix fault buffer initialisation
Not sure how this happened, it worked last time I tested it!

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:28 +10:00
Ben Skeggs
bdf4424dc3 drm/nouveau/gr/gv100: handle multiple SM-per-TPC for shader exceptions
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 18:06:28 +10:00
Lyude Paul
eb493fbc15 drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs
Currently nouveau doesn't actually expose the state debugfs file that's
usually provided for any modesetting driver that supports atomic, even
if nouveau is loaded with atomic=1. This is due to the fact that the
standard debugfs files that DRM creates for atomic drivers is called
when drm_get_pci_dev() is called from nouveau_drm.c. This happens well
before we've initialized the display core, which is currently
responsible for setting the DRIVER_ATOMIC cap.

So, move the atomic option into nouveau_drm.c and just add the
DRIVER_ATOMIC cap whenever it's enabled on the kernel commandline. This
shouldn't cause any actual issues, as the atomic ioctl will still fail
as expected even if the display core doesn't disable it until later in
the init sequence. This also provides the added benefit of being able to
use the state debugfs file to check the current display state even if
clients aren't allowed to modify it through anything other than the
legacy ioctls.

Additionally, disable the DRIVER_ATOMIC cap in nv04's display core, as
this was already disabled there previously.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 17:59:59 +10:00
Lyude Paul
68fe23a626 drm/nouveau: Remove bogus crtc check in pmops_runtime_idle
This both uses the legacy modesetting structures in a racy manner, and
additionally also doesn't even check the right variable (enabled != the
CRTC is actually turned on for atomic).

This fixes issues on my P50 regarding the dedicated GPU not entering
runtime suspend.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 17:59:59 +10:00
Lyude Paul
e5d54f1935 drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit()
A CRTC being enabled doesn't mean it's on! It doesn't even necessarily
mean it's being used. This fixes runtime PM leaks on the P50 I've got
next to me.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 17:59:59 +10:00
Lyude Paul
37afe55b4a drm/nouveau: Avoid looping through fake MST connectors
When MST and atomic were introduced to nouveau, another structure that
could contain a drm_connector embedded within it was introduced; struct
nv50_mstc. This meant that we no longer would be able to simply loop
through our connector list and assume that nouveau_connector() would
return a proper pointer for each connector, since the assertion that
all connectors coming from nouveau have a full nouveau_connector struct
became invalid.

Unfortunately, none of the actual code that looped through connectors
ever got updated, which means that we've been causing invalid memory
accesses for quite a while now.

An example that was caught by KASAN:

[  201.038698] ==================================================================
[  201.038792] BUG: KASAN: slab-out-of-bounds in nvif_notify_get+0x190/0x1a0 [nouveau]
[  201.038797] Read of size 4 at addr ffff88076738c650 by task kworker/0:3/718
[  201.038800]
[  201.038822] CPU: 0 PID: 718 Comm: kworker/0:3 Tainted: G           O      4.18.0-rc4Lyude-Test+ #1
[  201.038825] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[  201.038882] Workqueue: events nouveau_display_hpd_work [nouveau]
[  201.038887] Call Trace:
[  201.038894]  dump_stack+0xa4/0xfd
[  201.038900]  print_address_description+0x71/0x239
[  201.038929]  ? nvif_notify_get+0x190/0x1a0 [nouveau]
[  201.038935]  kasan_report.cold.6+0x242/0x2fe
[  201.038942]  __asan_report_load4_noabort+0x19/0x20
[  201.038970]  nvif_notify_get+0x190/0x1a0 [nouveau]
[  201.038998]  ? nvif_notify_put+0x1f0/0x1f0 [nouveau]
[  201.039003]  ? kmsg_dump_rewind_nolock+0xe4/0xe4
[  201.039049]  nouveau_display_init.cold.12+0x34/0x39 [nouveau]
[  201.039089]  ? nouveau_user_framebuffer_create+0x120/0x120 [nouveau]
[  201.039133]  nouveau_display_resume+0x5c0/0x810 [nouveau]
[  201.039173]  ? nvkm_client_ioctl+0x20/0x20 [nouveau]
[  201.039215]  nouveau_do_resume+0x19f/0x570 [nouveau]
[  201.039256]  nouveau_pmops_runtime_resume+0xd8/0x2a0 [nouveau]
[  201.039264]  pci_pm_runtime_resume+0x130/0x250
[  201.039269]  ? pci_restore_standard_config+0x70/0x70
[  201.039275]  __rpm_callback+0x1f2/0x5d0
[  201.039279]  ? rpm_resume+0x560/0x18a0
[  201.039283]  ? pci_restore_standard_config+0x70/0x70
[  201.039287]  ? pci_restore_standard_config+0x70/0x70
[  201.039291]  ? pci_restore_standard_config+0x70/0x70
[  201.039296]  rpm_callback+0x175/0x210
[  201.039300]  ? pci_restore_standard_config+0x70/0x70
[  201.039305]  rpm_resume+0xcc3/0x18a0
[  201.039312]  ? rpm_callback+0x210/0x210
[  201.039317]  ? __pm_runtime_resume+0x9e/0x100
[  201.039322]  ? kasan_check_write+0x14/0x20
[  201.039326]  ? do_raw_spin_lock+0xc2/0x1c0
[  201.039333]  __pm_runtime_resume+0xac/0x100
[  201.039374]  nouveau_display_hpd_work+0x67/0x1f0 [nouveau]
[  201.039380]  process_one_work+0x7a0/0x14d0
[  201.039388]  ? cancel_delayed_work_sync+0x20/0x20
[  201.039392]  ? lock_acquire+0x113/0x310
[  201.039398]  ? kasan_check_write+0x14/0x20
[  201.039402]  ? do_raw_spin_lock+0xc2/0x1c0
[  201.039409]  worker_thread+0x86/0xb50
[  201.039418]  kthread+0x2e9/0x3a0
[  201.039422]  ? process_one_work+0x14d0/0x14d0
[  201.039426]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[  201.039431]  ret_from_fork+0x3a/0x50
[  201.039441]
[  201.039444] Allocated by task 79:
[  201.039449]  save_stack+0x43/0xd0
[  201.039452]  kasan_kmalloc+0xc4/0xe0
[  201.039456]  kmem_cache_alloc_trace+0x10a/0x260
[  201.039494]  nv50_mstm_add_connector+0x9a/0x340 [nouveau]
[  201.039504]  drm_dp_add_port+0xff5/0x1fc0 [drm_kms_helper]
[  201.039511]  drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper]
[  201.039518]  drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper]
[  201.039525]  drm_dp_mst_link_probe_work+0x71/0xb0 [drm_kms_helper]
[  201.039529]  process_one_work+0x7a0/0x14d0
[  201.039533]  worker_thread+0x86/0xb50
[  201.039537]  kthread+0x2e9/0x3a0
[  201.039541]  ret_from_fork+0x3a/0x50
[  201.039543]
[  201.039546] Freed by task 0:
[  201.039549] (stack is not available)
[  201.039551]
[  201.039555] The buggy address belongs to the object at ffff88076738c1a8
                                 which belongs to the cache kmalloc-2048 of size 2048
[  201.039559] The buggy address is located 1192 bytes inside of
                                 2048-byte region [ffff88076738c1a8, ffff88076738c9a8)
[  201.039563] The buggy address belongs to the page:
[  201.039567] page:ffffea001d9ce200 count:1 mapcount:0 mapping:ffff88084000d0c0 index:0x0 compound_mapcount: 0
[  201.039573] flags: 0x8000000000008100(slab|head)
[  201.039578] raw: 8000000000008100 ffffea001da3be08 ffffea001da25a08 ffff88084000d0c0
[  201.039582] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[  201.039585] page dumped because: kasan: bad access detected
[  201.039588]
[  201.039591] Memory state around the buggy address:
[  201.039594]  ffff88076738c500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  201.039598]  ffff88076738c580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  201.039601] >ffff88076738c600: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  201.039604]                                                  ^
[  201.039607]  ffff88076738c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  201.039611]  ffff88076738c700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  201.039613] ==================================================================

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16 17:59:59 +10:00