Commit Graph

48044 Commits

Author SHA1 Message Date
Thomas Hellstrom
e71cf59187 drm/vmwgfx: Fix buffer object eviction
Commit 19be557010 ("drm/ttm: add operation ctx to ttm_bo_validate v2")
introduced a regression where the vmwgfx driver refused to evict a
buffer that was still busy instead of waiting for it to become idle.

Fix this.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2018-09-20 08:05:14 +02:00
Deepak Rawat
a4bd815a94 drm/vmwgfx: Don't impose STDU limits on framebuffer size
If framebuffers are larger, we create bounce surfaces that are within
STDU limits.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-09-20 08:00:03 +02:00
Deepak Rawat
140b4e67c2 drm/vmwgfx: limit mode size for all display unit to texture_max
For all display units, limit mode size exposed to texture_max_width/
height as this is the maximum framebuffer size that virtual device can
create.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-09-20 08:00:03 +02:00
Deepak Rawat
0c1b174b1b drm/vmwgfx: limit screen size to stdu_max during check_modeset
For STDU individual screen target size is limited by
SVGA_REG_SCREENTARGET_MAX_WIDTH/HEIGHT registers so add that limit
during atomic check_modeset.

An additional limit is placed in the update_layout ioctl to avoid
requesting layouts that current user-space typically can't support.
Also modified the comments to reflect current limitation on topology.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-09-20 08:00:03 +02:00
Deepak Rawat
bfc8882614 drm/vmwgfx: don't check for old_crtc_state enable status
During atomic check to prepare the new topology no need to check if
old_crtc_state was enabled or not. This will cause atomic_check to fail
because due to connector routing a crtc can be in atomic_state even if
there was no change to enable status.

Detected this issue with igt run.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2018-09-20 08:00:02 +02:00
Dave Airlie
8ca4fff974 Only fixes coming from gvt containing "Two more BXT fixes from Colin,
one srcu locking fix and one fix for GGTT clear when destroy vGPU."
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbomjjAAoJEPpiX2QO6xPKlaMH/0sp97hPs11qVzYNrKk3Znh8
 DJaI9IzRWfmSwLbfnpyywK7VqqErUltRSwUW+R8X2FqLXeG4shni154/jRdIMy1a
 zx7Or/8fIyvVbCEJteMvn+Lv+8ucm8tTG3YL9JqQj7blyo3T1JbtA8zsIoVgug3T
 pf4niyqcoO1plpZUsrnGKHmdrhUG+oGUkG6AWOBS8NlGgobvFY5nviyfVhdLEyG1
 JZRjruFRnVNmyIgyUCHwSN9ILO6DDykMW6xpqv7CIm8eLcImetHQvwfgEsl8mMUq
 SCT5EoUEnzSJrlRHkzoso4X7slM35wJ/JNnCN3NsmznWSs3FoIuFt3R3qsD8GlE=
 =XeR4
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Only fixes coming from gvt containing "Two more BXT fixes from Colin,
one srcu locking fix and one fix for GGTT clear when destroy vGPU."

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180919151915.GA6309@intel.com
2018-09-20 10:01:53 +10:00
Dave Airlie
d5b3a31b1c drm-misc-fixes for v4.19-rc5:
- Fix crash in vgem in drm_drv_uses_atomic_modeset.
 - Allow atomic drivers that don't set DRIVER_ATOMIC to create debugfs entries.
 - Fix compiler warning for unused connector_funcs.
 - Fix null pointer deref on UDL unplug.
 - Disable DRM support for sun4i's R40 for now.
   (Not all patches went in for v4.19, so it has to wait a cycle.)
 - NULL-terminate the of_device_id table in pl111.
 - Make sure vc4 NV12 planar format works when displaying an unscaled fb.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAluiXbsACgkQ/lWMcqZw
 E8P9MBAAjmltMZr9KbmtksYSDFHVldiTqwiToPDl4iZPmMDKCQQP14apJhqwFqdO
 pa+N3n7zrbR9PTUcPxt2pfT3I8J7vCGARMc66wnlPbrKDes+dkKm7KqCRFJGcyrD
 faN++FGfTBn3rsLo1iM7mLOMVE+72B5gdjcxIqewEXSxWjX9ED6N7JaVR7krcQbs
 MVT/ENvLZTRVCYla+eey+wQoZR/bh/E7HtuvqsLRaQOGSk6Go2gBzEiZiWfT+6sS
 BzEXaYKL61AKhsh68oiPB2elxVWrnPyf3liLAzoTF0MhXuGxmlu9F50jByQvDuz8
 lAzm53Hg5uFj6Ca1E81I1UDy2i5IAgaiRXGfVikeWwTsBiLgxhcRDGbQPki2rHRu
 1Cs+D/F1gE94WqWhu9ydV2rU5X/5/NdDvYH0LkeD5jI9VcB8KtK89r3zXkxh3f9B
 BfhVOGq3RTVgdAFFPujRrZCTQyyNW8zo51mYmncVykB//9awWE5nQcK3HGLh2wvL
 0Oar5oJE3UlHa5No91zMyxmJIxZVp7SE/4A7+ih1LGTu5SyaT9K718pAgv2lpakd
 HMhgF+338rCqMfL7TFqYJ2N+srXTzNRruHXdElcSg1wHbfEFejyMt0KRtv0x32qE
 9IJ1CFuJVWgQ5pXu5zbj+NLKo5kR4ow0BCvsk7HCWiqkJ5N+pY8=
 =bZgz
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v4.19-rc5:
- Fix crash in vgem in drm_drv_uses_atomic_modeset.
- Allow atomic drivers that don't set DRIVER_ATOMIC to create debugfs entries.
- Fix compiler warning for unused connector_funcs.
- Fix null pointer deref on UDL unplug.
- Disable DRM support for sun4i's R40 for now.
  (Not all patches went in for v4.19, so it has to wait a cycle.)
- NULL-terminate the of_device_id table in pl111.
- Make sure vc4 NV12 planar format works when displaying an unscaled fb.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/dda393bb-f13f-8d36-711b-cacfc578e5a3@linux.intel.com
2018-09-20 10:00:46 +10:00
Icenowy Zheng
558a9ef94a
drm: sun4i: drop second PLL from A64 HDMI PHY
The A64 HDMI PHY seems to be not able to use the second video PLL as
clock parent in experiments.

Drop the support for the second PLL from A64 HDMI PHY driver.

Fixes: b46e2c9f5f ("drm/sun4i: Add support for A64 HDMI PHY")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180916043409.62374-2-icenowy@aosc.io
2018-09-19 09:58:40 +02:00
Rodrigo Vivi
a530bf948a Merge tag 'gvt-fixes-2018-09-18' of https://github.com/intel/gvt-linux into drm-intel-fixes
gvt-fixes-2018-09-18

- Fix initial DPIO PHY register state for BXT (Colin)
- BXT untracked GEN9_CLKGATE_DIS_4 warning fix (Colin)
- Fix srcu lock for GFN valid check (Weinan)
- Should clear GGTT entry value after vGPU destroy (Zhipeng)

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180918073349.GQ20737@zhen-hp.sh.intel.com
2018-09-18 08:09:44 -07:00
Zhipeng Gong
7759ca3aac drm/i915/gvt: clear ggtt entries when destroy vgpu
When one vgpu is destroyed, its ggtt entries are not cleared.
This patch clears ggtt entries to avoid information leak.

v2: add 'Fixes' tag (Zhenyu)

Fixes: 2707e44466 ("drm/i915/gvt: vGPU graphics memory virtualization")
Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
Reviewed-by: Hang Yuan <hang.yuan@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-09-18 10:39:44 +08:00
Weinan Li
a1ac5f0943 drm/i915/gvt: request srcu_read_lock before checking if one gfn is valid
Fix the suspicious RCU usage issue in intel_vgpu_emulate_mmio_write.
Here need to request the srcu read lock of kvm->srcu before doing
gfn_to_memslot(). The detailed log is as below:
[  218.710688] =============================
[  218.710690] WARNING: suspicious RCU usage
[  218.710693] 4.14.15-dd+ #314 Tainted: G     U
[  218.710695] -----------------------------
[  218.710697] ./include/linux/kvm_host.h:575 suspicious rcu_dereference_check() usage!
[  218.710699]
               other info that might help us debug this:

[  218.710702]
               rcu_scheduler_active = 2, debug_locks = 1
[  218.710704] 1 lock held by qemu-system-x86/2144:
[  218.710706]  #0:  (&gvt->lock){+.+.}, at: [<ffffffff816a1eea>] intel_vgpu_emulate_mmio_write+0x5a/0x2d0
[  218.710721]
               stack backtrace:
[  218.710724] CPU: 0 PID: 2144 Comm: qemu-system-x86 Tainted: G     U 4.14.15-dd+ #314
[  218.710727] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015
[  218.710729] Call Trace:
[  218.710734]  dump_stack+0x7c/0xb3
[  218.710739]  gfn_to_memslot+0x15f/0x170
[  218.710743]  kvm_is_visible_gfn+0xa/0x30
[  218.710746]  intel_vgpu_emulate_gtt_mmio_write+0x267/0x3c0
[  218.710751]  ? __mutex_unlock_slowpath+0x3b/0x260
[  218.710754]  intel_vgpu_emulate_mmio_write+0x182/0x2d0
[  218.710759]  intel_vgpu_rw+0xba/0x170 [kvmgt]
[  218.710763]  intel_vgpu_write+0x14d/0x1a0 [kvmgt]
[  218.710767]  __vfs_write+0x23/0x130
[  218.710770]  vfs_write+0xb0/0x1b0
[  218.710774]  SyS_pwrite64+0x73/0x90
[  218.710777]  entry_SYSCALL_64_fastpath+0x25/0x9c
[  218.710780] RIP: 0033:0x7f33e8a91da3
[  218.710783] RSP: 002b:00007f33dddc8700 EFLAGS: 00000293

v2: add 'Fixes' tag, refine log format.(Zhenyu)
Fixes: cc753fbe1a ("drm/i915/gvt: validate gfn before set shadow page")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-09-18 10:37:55 +08:00
Colin Xu
d817de3bc1 drm/i915/gvt: Add GEN9_CLKGATE_DIS_4 to default BXT mmio handler
Host prints lots of untracked MMIO at 0x4653c when creating linux guest.
"gvt: vgpu 2: untracked MMIO 0004653c len 4"

GEN9_CLKGATE_DIS_4 (0x4653c) is accessed by i915 for gmbus clockgating.
However vgpu doesn't support any clockgating powergating operations
on related mmio access trap so need add it to default handler.
GEN9_CLKGATE_DIS_4 is accessed in bxt_gmbus_clock_gating() which only
applies to GEN9_LP so doens't show the warning on other platforms.

The solution is to add it to default handler init_bxt_mmio_info().

Reviewed-by: He, Min <min.he@intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-09-18 10:37:44 +08:00
Colin Xu
db7c8f1e5f drm/i915/gvt: Init PHY related registers for BXT
Recent patch fixed the call trace
"ERROR Port B enabled but PHY powered down? (PHY_CTL 00000000)".
but introduced another similar call trace shown as:
"ERROR Port C enabled but PHY powered down? (PHY_CTL 00000200)".
The call trace will appear when host and guest enabled different ports,
i.e. host using PORT C or neither PORT is enabled, while guest is always
using PORT B as simulated by gvt. The issue is actually covered previously
before the commit and reverals now when the commit do the right thing.

On BXT, some PHY registers are initialized by vbios, before i915 loaded.
Later i915 will re-program some, or skip some based on the implementation.
The initialized mmio for guest i915 is done by gvt, based on the snapshot
taken from host. If host and guest have different PORT enabled, some
DPIO PHY mmios that gvt initialized for guest i915 will not match the
simualted monitor for guest, which leads to guest i915 print the calltrace
when it's trying to enable PHY and PORT.

The solution is to init these DPIO PHY registers to default value, then
guest i915 will program them to reasonable value based on the default
powerwell table and enabled PORT. Together with the old patch, all similar
call trace in guest kernel on BXT can be resolved.

v2: Move PHY register init to intel_vgpu_reset_mmio (Min)
v3: Do not delete empty line in issue fix patch. (zhenyu)

Fixes: c8ab5ac30c ("drm/i915/gvt: Make correct handling to vreg
BXT_PHY_CTL_FAMILY")
Reviewed-by: He, Min <min.he@intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-09-18 10:37:29 +08:00
Lyude Paul
3c499ea0c6 drm/atomic: Use drm_drv_uses_atomic_modeset() for debugfs creation
As pointed out by Daniel Vetter, we should be usinng
drm_drv_uses_atomic_modeset() for determining whether or not we want to
make the debugfs nodes for atomic instead of checking DRIVER_ATOMIC, as
the former isn't an accurate representation of whether or not the driver
is actually using atomic modesetting internally (even though it might
not be exposing atomic capabilities).

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180917173733.21293-1-lyude@redhat.com
2018-09-17 19:24:37 -04:00
Dave Airlie
2b6318a09f Merge branch 'linux-4.19' of git://github.com/skeggsb/linux into drm-fixes
One more nouveau fix to remove some debug warnings.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <bskeggs@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CABDvA==GF63dy8a9j611=-0x8G6FRu7uC-ZQypsLO_hqV4OAcA@mail.gmail.com
2018-09-14 09:38:42 +10:00
Dave Airlie
25824ca38e Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few fixes for 4.19:
- Fix a small memory leak
- SR-IOV reset fix
- Fix locking in MMU-notifier error path
- Updated SDMA golden settings to fix a PRT hang

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180912154735.2683-1-alexander.deucher@amd.com
2018-09-14 09:36:35 +10:00
Dave Airlie
db7f06d490 This contains a regression fix for video playbacks on gen 2 hardware,
a IPS timeout error suppression on Broadwell and GVT bucked with
 "Most critical one is to fix KVM's mm reference when we access guest memory,
 issue was raised by Linus [1], and another one with virtual opregion fix."
 
 [1] - https://lists.freedesktop.org/archives/intel-gvt-dev/2018-August/004130.html
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbmEJvAAoJEPpiX2QO6xPKNNMIAItkqjJpqsGZq0Ii1/ph3D9U
 gX3Jj4H7ZoAoE/c+wx2nNADjmn8RhV7PnKrGFuXmLkRJaNJllXA8U7h7YfWqRBT9
 jQ0RIuFdbfiRrwHmqtEZ2q3XP4HCWbdSIigimO/Lk/A2l+sCY8oA+9bGB7IRxeRV
 e162GARNsxN0a8D+5i+KR0mrTezSoKOvYFtUFp76UKUDAKrK85XNQsn1TR8ZMXGC
 fIi03y6ZM66I+bzIsq15HEj/jrRcaXm7xjd94/HetqeVJaEOEs2ztfaex+v3yDnC
 meWZKWCJ5x9noqHTO3XdHJGlKnfFQxopZsINNiqpqwOKpH7oaHWyAnnUrafJfhA=
 =/IQQ
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-fixes-2018-09-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

This contains a regression fix for video playbacks on gen 2 hardware,
a IPS timeout error suppression on Broadwell and GVT bucked with
"Most critical one is to fix KVM's mm reference when we access guest memory,
issue was raised by Linus [1], and another one with virtual opregion fix."

[1] - https://lists.freedesktop.org/archives/intel-gvt-dev/2018-August/004130.html

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180911223229.GA30328@intel.com
2018-09-14 09:33:16 +10:00
Ben Skeggs
3483f08106 drm/nouveau/devinit: fix warning when PMU/PRE_OS is missing
Messed up when sending pull request and sent an outdated version of
previous patch, this fixes it up to remove warnings.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-13 10:56:58 +10:00
YueHaibing
6ee67e351c drm/fb-helper: Remove set but not used variable 'connector_funcs'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/gpu/drm/drm_fb_helper.c: In function 'drm_pick_crtcs':
drivers/gpu/drm/drm_fb_helper.c:2373:43: warning:
 variable 'connector_funcs' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1536722130-108819-1-git-send-email-yuehaibing@huawei.com
2018-09-12 09:27:28 -04:00
Christian König
0165de9832 drm/amdgpu: fix error handling in amdgpu_cs_user_fence_chunk
Slowly leaking memory one page at a time :)

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-11 16:35:00 -05:00
Chris Wilson
17dc7af70e drm/i915/overlay: Allocate physical registers from stolen
Given that we are now reasonably confident in our ability to detect and
reserve the stolen memory (physical memory reserved for graphics by the
BIOS) for ourselves on most machines, we can put it to use. In this
case, we need a page to hold the overlay registers.

On an i915g running MythTv, H Buus noticed that

	commit 6a2c4232ec
	Author: Chris Wilson <chris@chris-wilson.co.uk>
	Date:   Tue Nov 4 04:51:40 2014 -0800
	drm/i915: Make the physical object coherent with GTT

introduced stuttering into his video playback. After discarding the
likely suspect of it being the physical cursor updates, we were left
with the use of the phys object for the overlay. And lo, if we
completely avoid using the phys object (allocated just once on module
load!) by switching to stolen memory, the stuttering goes away.

For lack of a better explanation, claim victory and kill two birds with
one stone.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107600
Fixes: 6a2c4232ec ("drm/i915: Make the physical object coherent with GTT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180906190144.1272-1-chris@chris-wilson.co.uk
(cherry picked from commit c8124d3992)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-09-11 08:24:03 -07:00
Dave Airlie
2887e5ce15 Merge branch 'linux-4.19' of git://github.com/skeggsb/linux into drm-fixes
A bunch of fixes for MST/runpm problems and races, as well as fixes
for issues that prevent more recent laptops from booting.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <bskeggs@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CABDvA==GF63dy8a9j611=-0x8G6FRu7uC-ZQypsLO_hqV4OAcA@mail.gmail.com
2018-09-11 16:54:46 +10:00
Emily Deng
3a74987b24 drm/amdgpu: move PSP init prior to IH in gpu reset
since we use PSP to program IH regs now

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:58:21 -05:00
Tao Zhou
68ebc13ea4 drm/amdgpu: Fix SDMA hang in prt mode v2
Fix SDMA hang in prt mode, clear XNACK_WATERMARK in reg SDMA0_UTCL1_WATERMK to avoid the issue

Affected ASICs: VEGA10 VEGA12 RV1 RV2

v2: add reg clear for SDMA1

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Tested-by: Yukun Li <yukun1.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:56:27 -05:00
Christian König
b463d4e53c drm/amdgpu: fix amdgpu_mn_unlock() in the CS error path
Avoid unlocking a lock we never locked.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:53:29 -05:00
Emil Lundmark
fcb74da1eb drm: udl: Destroy framebuffer only if it was initialized
This fixes a NULL pointer dereference that can happen if the UDL
driver is unloaded before the framebuffer is initialized. This can
happen e.g. if the USB device is unplugged right after it was plugged
in.

As explained by Stéphane Marchesin:

It happens when fbdev is disabled (which is the case for Chrome OS).
Even though intialization of the fbdev part is optional (it's done in
udlfb_create which is the callback for fb_probe()), the teardown isn't
optional (udl_driver_unload -> udl_fbdev_cleanup ->
udl_fbdev_destroy).

Note that udl_fbdev_cleanup *tries* to be conditional (you can see it
does if (!udl->fbdev)) but that doesn't work, because udl->fbdev is
always set during udl_fbdev_init.

Cc: stable@vger.kernel.org
Suggested-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Emil Lundmark <lndmrk@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180528142711.142466-1-lndmrk@chromium.org
Signed-off-by: Sean Paul <seanpaul@chromium.org>
2018-09-10 16:02:51 -04:00
Chen-Yu Tsai
3510e7a7f9 drm/sun4i: Remove R40 display pipeline compatibles
Two patches from the R40 display pipeline support series weren't applied
with the rest of the series. When they did get applied, the -rc6
deadline for drm-misc-next had past, so they didn't get into 4.19-rc1
with the rest of the series. However, the two patches are crucial in
the parsing of the R40's display pipeline graph in the device tree.
Without them, the driver crashes because it can't follow the odd graph
structure.

This patch removes the R40 compatibles from the sun4i-drm driver,
effectively disabling DRM support for the R40 for one release cycle.
This will prevent the driver from crashing upon probing.

The compatibles should be reinstated for the next release.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180827083950.602-1-wens@csie.org
Signed-off-by: Sean Paul <seanpaul@chromium.org>
2018-09-10 16:01:22 -04:00
zhong jiang
7eb3322457 drm/pl111: Make sure of_device_id tables are NULL terminated
We prefer to of_device_id tables are NULL terminated. So make
vexpress_muxfpga_match is NULL terminated.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1533379767-15629-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: Sean Paul <seanpaul@chromium.org>
2018-09-10 16:01:22 -04:00
Boris Brezillon
658d8cbd07 drm/vc4: Fix the "no scaling" case on multi-planar YUV formats
When there's no scaling requested ->is_unity should be true no matter
the format.

Also, when no scaling is requested and we have a multi-planar YUV
format, we should leave ->y_scaling[0] to VC4_SCALING_NONE and only
set ->x_scaling[0] to VC4_SCALING_PPF.

Doing this fixes an hardly visible artifact (seen when using modetest
and a rather big overlay plane in YUV420).

Fixes: fc04023faf ("drm/vc4: Add support for YUV planes.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180725122907.13702-1-boris.brezillon@bootlin.com
Signed-off-by: Sean Paul <seanpaul@chromium.org>
2018-09-10 16:01:22 -04:00
Rodrigo Vivi
50cbc03e50 Merge tag 'gvt-fixes-2018-09-10' of https://github.com/intel/gvt-linux into drm-intel-fixes
gvt-fixes-2018-09-10

- KVM mm access reference fix (Zhenyu)
- Fix child device config length for virtual opregion (Weinan)

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180910092212.GZ20737@zhen-hp.sh.intel.com
2018-09-10 12:37:35 -07:00
Imre Deak
92a6803149 drm/i915/bdw: Increase IPS disable timeout to 100ms
During IPS disabling the current 42ms timeout value leads to occasional
timeouts, increase it to 100ms which seems to get rid of the problem.

References: https://bugs.freedesktop.org/show_bug.cgi?id=107494
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107562
Reported-by: Diego Viola <diego.viola@gmail.com>
Tested-by: Diego Viola <diego.viola@gmail.com>
Cc: Diego Viola <diego.viola@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180905100005.7663-1-imre.deak@intel.com
(cherry picked from commit acb3ef0ee4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-09-10 10:18:42 -07:00
Ben Skeggs
53b0cc46f2 drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP panels
Fixes eDP backlight issues on more recent laptops.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
e04cfdc9b7 drm/nouveau/disp: fix DP disable race
If a HPD pulse signalling the need to retrain the link occurs between
the KMS driver releasing the output and the supervisor interrupt that
finishes the teardown, it was possible get a NULL-ptr deref.

Avoid this by marking the link as inactive earlier.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
f6d52b2172 drm/nouveau/disp: move eDP panel power handling
We need to do this earlier to prevent aux channel timeouts in resume
paths on certain systems.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
606557708f drm/nouveau/disp: remove unused struct member
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
0a6986c659 drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOS
This Falcon application doesn't appear to be present on some newer
systems, so let's not fail init if we can't find it.

TBD: is there a way to determine whether it *should* be there?

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
51ed833c88 drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointer
Fixes oopses in certain failure paths.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:28 +10:00
Ben Skeggs
a43b16dda2 drm/nouveau: fix oops in client init failure path
The NV_ERROR macro requires drm->client to be initialised, which it may not
be at this stage of the init process.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
d5986a1c4d drm/nouveau: Fix nouveau_connector_ddc_detect()
It looks like that when we moved over to using
drm_connector_for_each_possible_encoder() in nouveau, that one rather
important part of this function got dropped by accident:

	/*          Right   v   here */
	for (i = 0; nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER; i++) {
		int id = connector->encoder_ids[i];
		if (id == 0)
			break;

Since it's rather difficult to notice: the conditional in this loop is
actually:

	nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER

Meaning that all early breaks result in nv_encoder keeping it's value,
otherwise nv_encoder = NULL. Ugh.

Since this got dropped, nouveau_connector_ddc_detect() now returns an
encoder for every single connector, regardless of whether or not it's
detected:

    [ 1780.056185] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DP-2

So: fix this to ensure we only return an encoder if we actually found
one, and clean up the rest of the function while we're at it since it's
nearly impossible to read properly.

Changes since v1:
- Don't skip ddc probing for LVDS if we can't switch DDC through
  vga-switcheroo, just do the DDC probing without calling
  vga_switcheroo_lock_ddc() - skeggsb

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: ddba766dd0 ("drm/nouveau: Use drm_connector_for_each_possible_encoder()")
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
2f7ca781fd drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unload
Currently, there's nothing in nouveau that actually cancels this work
struct. So, cancel it on suspend/unload. Otherwise, if we're unlucky
enough hpd_work might try to keep running up until the system is
suspended.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
79e765ad66 drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early
On most systems with ACPI hotplugging support, it seems that we always
receive a hotplug event once we re-enable EC interrupts even if the GPU
hasn't even been resumed yet.

This can cause problems since even though we schedule hpd_work to handle
connector reprobing for us, hpd_work synchronizes on
pm_runtime_get_sync() to wait until the device is ready to perform
reprobing. Since runtime suspend/resume callbacks are disabled before
the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during
this period will grab a runtime PM ref and return immediately with
-EACCES. Because we schedule hpd_work from our ACPI HPD handler, and
hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch
a connector reprobe immediately even if the GPU isn't actually resumed
just yet. This causes various warnings in dmesg and occasionally, also
prevents some displays connected to the dedicated GPU from coming back
up after suspend. Example:

usb 1-4: USB disconnect, device number 14
usb 1-4.1: USB disconnect, device number 15
WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau]
CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1
Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018
Workqueue: events nouveau_display_hpd_work [nouveau]
RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau]
RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000
RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002
RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000
R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418
FS:  0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? _cond_resched+0x15/0x30
 nouveau_connector_detect+0x2ce/0x520 [nouveau]
 ? _cond_resched+0x15/0x30
 ? ww_mutex_lock+0x12/0x40
 drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper]
 drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper]
 nouveau_display_hpd_work+0x2a/0x60 [nouveau]
 process_one_work+0x187/0x340
 worker_thread+0x2e/0x380
 ? pwq_unbound_release_workfn+0xd0/0xd0
 kthread+0x112/0x130
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x35/0x40
Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff
---[ end trace 55d811b38fc8e71a ]---

So, to fix this we attempt to grab a runtime PM reference in the ACPI
handler itself asynchronously. If the GPU is already awake (it will have
normal hotplugging at this point) or runtime PM callbacks are currently
disabled on the device, we drop our reference without updating the
autosuspend delay. We only schedule connector reprobes when we
successfully managed to queue up a resume request with our asynchronous
PM ref.

This also has the added benefit of preventing redundant connector
reprobes from ACPI while the GPU is runtime resumed!

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <kherbst@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
fa3cdf8d0b drm/nouveau: Reset MST branching unit before enabling
When probing a new MST device, it's not safe to make any assumptions
about it's current state. While most well mannered MST hubs will just
disable the branching unit on hotplug disconnects, this isn't enough to
save us from various other scenarios that might have resulted in
something writing to the MST branching unit before we got control of it.
This could happen if a previous probe we tried failed, if we're booting
in kexec context and the hub is still in the state the last kernel put
it in, etc.

Luckily; there is no reason we can't just reset the branching unit
every time we enable a new topology. So, fix this by resetting it on
enabling new topologies to ensure that we always start off with a clean,
unmodified topology state on MST sinks.

This fixes occasional hard-lockups on my P50's laptop dock (e.g. AUX
times out all DPCD trasactions) observed after multiple docks, undocks,
and module reloads.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:27 +10:00
Lyude Paul
b26b4590dd drm/nouveau: Only write DP_MSTM_CTRL when needed
Currently, nouveau will re-write the DP_MSTM_CTRL register for an MST
hub every time it receives a long HPD pulse on DP. This isn't actually
necessary and additionally, has some unintended side effects.

With the P50 I've got here, rewriting DP_MSTM_CTRL constantly seems to
make it rather likely (1 out of 5 times usually) that bringing up MST
with it's ThinkPad dock will fail and result in sideband messages timing
out in the middle. Afterwards, successive probes don't manage to get the
dock to communicate properly over MST sideband properly.

Many times sideband message timeouts from MST hubs are indicative of
either the source or the sink dropping an ESI event, which can cause
DRM's perspective of the topology's current state to go out of sync with
reality. While it's tough to really know for sure what's happening to
the dock, using userspace tools to write to DP_MSTM_CTRL in the middle
of the MST link probing process does appear to make things flaky. It's
possible that when we write to DP_MSTM_CTRL, the function that gets
triggered to respond in the dock's firmware temporarily puts it in a
state where it might end up not reporting an ESI to the source, or ends
up dropping a sideband message we sent it.

So, to fix this we make it so that when probing an MST topology, we
respect it's current state. If the dock's already enabled, we simply
read DP_MSTM_CTRL and disable the topology if it's value is not what we
expected. Otherwise, we perform the normal MST probing dance. We avoid
taking any action except if the state of the MST topology actually
changes.

This fixes MST sideband message timeouts and detection failures on my
P50 with its ThinkPad dock.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
7326ead982 drm/nouveau: Remove useless poll_enable() call in drm_load()
Again, this doesn't do anything. drm_kms_helper_poll_enable() will have
already been called in nouveau_display_init()

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
0d7b2d4def drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()
This won't do anything but potentially make us miss hotplugs. We already
call drm_kms_helper_poll_disable() in
nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini()

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
0445f7537d drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()
This doesn't do anything, drm_kms_helper_poll_enable() gets called in
nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init()
already.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
3e1a12754d drm/nouveau: Fix deadlocks in nouveau_connector_detect()
When we disable hotplugging on the GPU, we need to be able to
synchronize with each connector's hotplug interrupt handler before the
interrupt is finally disabled. This can be a problem however, since
nouveau_connector_detect() currently grabs a runtime power reference
when handling connector probing. This will deadlock the runtime suspend
handler like so:

[  861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds.
[  861.483290]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.486332] kworker/0:2     D    0    61      2 0x80000000
[  861.487044] Workqueue: events nouveau_display_hpd_work [nouveau]
[  861.487737] Call Trace:
[  861.488394]  __schedule+0x322/0xaf0
[  861.489070]  schedule+0x33/0x90
[  861.489744]  rpm_resume+0x19c/0x850
[  861.490392]  ? finish_wait+0x90/0x90
[  861.491068]  __pm_runtime_resume+0x4e/0x90
[  861.491753]  nouveau_display_hpd_work+0x22/0x60 [nouveau]
[  861.492416]  process_one_work+0x231/0x620
[  861.493068]  worker_thread+0x44/0x3a0
[  861.493722]  kthread+0x12b/0x150
[  861.494342]  ? wq_pool_ids_show+0x140/0x140
[  861.494991]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.495648]  ret_from_fork+0x3a/0x50
[  861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds.
[  861.496968]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.498341] kworker/6:2     D    0   320      2 0x80000080
[  861.499045] Workqueue: pm pm_runtime_work
[  861.499739] Call Trace:
[  861.500428]  __schedule+0x322/0xaf0
[  861.501134]  ? wait_for_completion+0x104/0x190
[  861.501851]  schedule+0x33/0x90
[  861.502564]  schedule_timeout+0x3a5/0x590
[  861.503284]  ? mark_held_locks+0x58/0x80
[  861.503988]  ? _raw_spin_unlock_irq+0x2c/0x40
[  861.504710]  ? wait_for_completion+0x104/0x190
[  861.505417]  ? trace_hardirqs_on_caller+0xf4/0x190
[  861.506136]  ? wait_for_completion+0x104/0x190
[  861.506845]  wait_for_completion+0x12c/0x190
[  861.507555]  ? wake_up_q+0x80/0x80
[  861.508268]  flush_work+0x1c9/0x280
[  861.508990]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  861.509735]  nvif_notify_put+0xb1/0xc0 [nouveau]
[  861.510482]  nouveau_display_fini+0xbd/0x170 [nouveau]
[  861.511241]  nouveau_display_suspend+0x67/0x120 [nouveau]
[  861.511969]  nouveau_do_suspend+0x5e/0x2d0 [nouveau]
[  861.512715]  nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau]
[  861.513435]  pci_pm_runtime_suspend+0x6b/0x180
[  861.514165]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.514897]  __rpm_callback+0x7a/0x1d0
[  861.515618]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.516313]  rpm_callback+0x24/0x80
[  861.517027]  ? pci_has_legacy_pm_support+0x70/0x70
[  861.517741]  rpm_suspend+0x142/0x6b0
[  861.518449]  pm_runtime_work+0x97/0xc0
[  861.519144]  process_one_work+0x231/0x620
[  861.519831]  worker_thread+0x44/0x3a0
[  861.520522]  kthread+0x12b/0x150
[  861.521220]  ? wq_pool_ids_show+0x140/0x140
[  861.521925]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.522622]  ret_from_fork+0x3a/0x50
[  861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds.
[  861.523977]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.525349] kworker/6:0     D    0  1329      2 0x80000000
[  861.526073] Workqueue: events nvif_notify_work [nouveau]
[  861.526751] Call Trace:
[  861.527411]  __schedule+0x322/0xaf0
[  861.528089]  schedule+0x33/0x90
[  861.528758]  rpm_resume+0x19c/0x850
[  861.529399]  ? finish_wait+0x90/0x90
[  861.530073]  __pm_runtime_resume+0x4e/0x90
[  861.530798]  nouveau_connector_detect+0x7e/0x510 [nouveau]
[  861.531459]  ? ww_mutex_lock+0x47/0x80
[  861.532097]  ? ww_mutex_lock+0x47/0x80
[  861.532819]  ? drm_modeset_lock+0x88/0x130 [drm]
[  861.533481]  drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper]
[  861.534127]  drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper]
[  861.534940]  nouveau_connector_hotplug+0x98/0x120 [nouveau]
[  861.535556]  nvif_notify_work+0x2d/0xb0 [nouveau]
[  861.536221]  process_one_work+0x231/0x620
[  861.536994]  worker_thread+0x44/0x3a0
[  861.537757]  kthread+0x12b/0x150
[  861.538463]  ? wq_pool_ids_show+0x140/0x140
[  861.539102]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.539815]  ret_from_fork+0x3a/0x50
[  861.540521]
               Showing all locks held in the system:
[  861.541696] 2 locks held by kworker/0:2/61:
[  861.542406]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543071]  #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.543814] 1 lock held by khungtaskd/64:
[  861.544535]  #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  861.545160] 3 locks held by kworker/6:2/320:
[  861.545896]  #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.546702]  #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  861.547443]  #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau]
[  861.548146] 1 lock held by dmesg/983:
[  861.548889] 2 locks held by zsh/1250:
[  861.549605]  #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  861.550393]  #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[  861.551122] 6 locks held by kworker/6:0/1329:
[  861.551957]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  861.552765]  #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620
[  861.553582]  #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper]
[  861.554357]  #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper]
[  861.555227]  #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper]
[  861.556133]  #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm]

[  861.557864] =============================================

[  861.559507] NMI backtrace for cpu 2
[  861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
[  861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[  861.561948] Call Trace:
[  861.562757]  dump_stack+0x8e/0xd3
[  861.563516]  nmi_cpu_backtrace.cold.3+0x14/0x5a
[  861.564269]  ? lapic_can_unplug_cpu.cold.27+0x42/0x42
[  861.565029]  nmi_trigger_cpumask_backtrace+0xa1/0xae
[  861.565789]  arch_trigger_cpumask_backtrace+0x19/0x20
[  861.566558]  watchdog+0x316/0x580
[  861.567355]  kthread+0x12b/0x150
[  861.568114]  ? reset_hung_task_detector+0x20/0x20
[  861.568863]  ? kthread_create_worker_on_cpu+0x70/0x70
[  861.569598]  ret_from_fork+0x3a/0x50
[  861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7:
[  861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120
[  861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120
[  861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120
[  861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120
[  861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120
[  861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120
[  861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120
[  861.572428] Kernel panic - not syncing: hung_task: blocked tasks

So: fix this by making it so that normal hotplug handling /only/ happens
so long as the GPU is currently awake without any pending runtime PM
requests. In the event that a hotplug occurs while the device is
suspending or resuming, we can simply defer our response until the GPU
is fully runtime resumed again.

Changes since v4:
- Use a new trick I came up with using pm_runtime_get() instead of the
  hackish junk we had before

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
6833fb1ec1 drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
It's true we can't resume the device from poll workers in
nouveau_connector_detect(). We can however, prevent the autosuspend
timer from elapsing immediately if it hasn't already without risking any
sort of deadlock with the runtime suspend/resume operations. So do that
instead of entirely avoiding grabbing a power reference.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
7fec8f5379 drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests
Currently, nouveau uses the generic drm_fb_helper_output_poll_changed()
function provided by DRM as it's output_poll_changed callback.
Unfortunately however, this function doesn't grab runtime PM references
early enough and even if it did-we can't block waiting for the device to
resume in output_poll_changed() since it's very likely that we'll need
to grab the fb_helper lock at some point during the runtime resume
process. This currently results in deadlocking like so:

[  246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[  246.673398]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.676527] kworker/4:0     D    0    37      2 0x80000000
[  246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[  246.678704] Call Trace:
[  246.679753]  __schedule+0x322/0xaf0
[  246.680916]  schedule+0x33/0x90
[  246.681924]  schedule_preempt_disabled+0x15/0x20
[  246.683023]  __mutex_lock+0x569/0x9a0
[  246.684035]  ? kobject_uevent_env+0x117/0x7b0
[  246.685132]  ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.686179]  mutex_lock_nested+0x1b/0x20
[  246.687278]  ? mutex_lock_nested+0x1b/0x20
[  246.688307]  drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.689420]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.690462]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.691570]  output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[  246.692611]  process_one_work+0x231/0x620
[  246.693725]  worker_thread+0x214/0x3a0
[  246.694756]  kthread+0x12b/0x150
[  246.695856]  ? wq_pool_ids_show+0x140/0x140
[  246.696888]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.697998]  ret_from_fork+0x3a/0x50
[  246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[  246.700153]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.702278] kworker/0:1     D    0    60      2 0x80000000
[  246.703293] Workqueue: pm pm_runtime_work
[  246.704393] Call Trace:
[  246.705403]  __schedule+0x322/0xaf0
[  246.706439]  ? wait_for_completion+0x104/0x190
[  246.707393]  schedule+0x33/0x90
[  246.708375]  schedule_timeout+0x3a5/0x590
[  246.709289]  ? mark_held_locks+0x58/0x80
[  246.710208]  ? _raw_spin_unlock_irq+0x2c/0x40
[  246.711222]  ? wait_for_completion+0x104/0x190
[  246.712134]  ? trace_hardirqs_on_caller+0xf4/0x190
[  246.713094]  ? wait_for_completion+0x104/0x190
[  246.713964]  wait_for_completion+0x12c/0x190
[  246.714895]  ? wake_up_q+0x80/0x80
[  246.715727]  ? get_work_pool+0x90/0x90
[  246.716649]  flush_work+0x1c9/0x280
[  246.717483]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  246.718442]  __cancel_work_timer+0x146/0x1d0
[  246.719247]  cancel_delayed_work_sync+0x13/0x20
[  246.720043]  drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[  246.721123]  nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[  246.721897]  pci_pm_runtime_suspend+0x6b/0x190
[  246.722825]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.723737]  __rpm_callback+0x7a/0x1d0
[  246.724721]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.725607]  rpm_callback+0x24/0x80
[  246.726553]  ? pci_has_legacy_pm_support+0x70/0x70
[  246.727376]  rpm_suspend+0x142/0x6b0
[  246.728185]  pm_runtime_work+0x97/0xc0
[  246.728938]  process_one_work+0x231/0x620
[  246.729796]  worker_thread+0x44/0x3a0
[  246.730614]  kthread+0x12b/0x150
[  246.731395]  ? wq_pool_ids_show+0x140/0x140
[  246.732202]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.732878]  ret_from_fork+0x3a/0x50
[  246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[  246.734587]       Not tainted 4.18.0-rc5Lyude-Test+ #2
[  246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.736113] kworker/4:2     D    0   422      2 0x80000080
[  246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  246.737665] Call Trace:
[  246.738490]  __schedule+0x322/0xaf0
[  246.739250]  schedule+0x33/0x90
[  246.739908]  rpm_resume+0x19c/0x850
[  246.740750]  ? finish_wait+0x90/0x90
[  246.741541]  __pm_runtime_resume+0x4e/0x90
[  246.742370]  nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[  246.743124]  drm_atomic_commit+0x4a/0x50 [drm]
[  246.743775]  restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[  246.744603]  restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[  246.745373]  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[  246.746220]  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[  246.746884]  drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[  246.747675]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[  246.748544]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[  246.749439]  nv50_mstm_hotplug+0x15/0x20 [nouveau]
[  246.750111]  drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[  246.750764]  drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[  246.751602]  drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[  246.752314]  process_one_work+0x231/0x620
[  246.752979]  worker_thread+0x44/0x3a0
[  246.753838]  kthread+0x12b/0x150
[  246.754619]  ? wq_pool_ids_show+0x140/0x140
[  246.755386]  ? kthread_create_worker_on_cpu+0x70/0x70
[  246.756162]  ret_from_fork+0x3a/0x50
[  246.756847]
           Showing all locks held in the system:
[  246.758261] 3 locks held by kworker/4:0/37:
[  246.759016]  #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.759856]  #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.760670]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[  246.761516] 2 locks held by kworker/0:1/60:
[  246.762274]  #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.762982]  #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.763890] 1 lock held by khungtaskd/64:
[  246.764664]  #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[  246.765588] 5 locks held by kworker/4:2/422:
[  246.766440]  #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[  246.767390]  #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[  246.768154]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[  246.768966]  #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[  246.769921]  #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[  246.770839] 1 lock held by dmesg/1038:
[  246.771739] 2 locks held by zsh/1172:
[  246.772650]  #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[  246.773680]  #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870

[  246.775522] =============================================

After trying dozens of different solutions, I found one very simple one
that should also have the benefit of preventing us from having to fight
locking for the rest of our lives. So, we work around these deadlocks by
deferring all fbcon hotplug events that happen after the runtime suspend
process starts until after the device is resumed again.

Changes since v7:
 - Fixup commit message - Daniel Vetter

Changes since v6:
 - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia

Changes since v5:
 - Come up with the (hopefully final) solution for solving this dumb
   problem, one that is a lot less likely to cause issues with locking in
   the future. This should work around all deadlock conditions with fbcon
   brought up thus far.

Changes since v4:
 - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock
   condition that Lukas described
 - Just move all of this out of drm_fb_helper. It seems that other DRM
   drivers have already figured out other workarounds for this. If other
   drivers do end up needing this in the future, we can just move this
   back into drm_fb_helper again.

Changes since v3:
- Actually check if fb_helper is NULL in both new helpers
- Actually check drm_fbdev_emulation in both new helpers
- Don't fire off a fb_helper hotplug unconditionally; only do it if
  the following conditions are true (as otherwise, calling this in the
  wrong spot will cause Bad Things to happen):
  - fb_helper hotplug handling was actually inhibited previously
  - fb_helper actually has a delayed hotplug pending
  - fb_helper is actually bound
  - fb_helper is actually initialized
- Add __must_check to drm_fb_helper_suspend_hotplug(). There's no
  situation where a driver would actually want to use this without
  checking the return value, so enforce that
- Rewrite and clarify the documentation for both helpers.
- Make sure to return true in the drm_fb_helper_suspend_hotplug() stub
  that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION
  isn't enabled
- Actually grab the toplevel fb_helper lock in
  drm_fb_helper_resume_hotplug(), since it's possible other activity
  (such as a hotplug) could be going on at the same time the driver
  calls drm_fb_helper_resume_hotplug(). We need this to check whether or
  not drm_fb_helper_hotplug_event() needs to be called anyway

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00
Lyude Paul
611ce85542 drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()
Since actual hotplug notifications don't get disabled until
nouveau_display_fini() is called, all this will do is cause any hotplugs
that happen between this drm_kms_helper_poll_disable() call and the
actual hotplug disablement to potentially be dropped if ACPI isn't
around to help us.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07 06:54:26 +10:00