linux_dsm_epyc7002/drivers/gpu/drm
Chris Wilson 34ba5a80f2 drm/i915/guc: Split hw submission for replay after GPU reset
Something I missed before sending off the partial series was that the
non-scheduler guc reset path was broken (in the full series, this is
pushed to the execlists reset handler). The issue is that after a reset,
we have to refill the GuC workqueues, which we do by resubmitting the
requests. However, if we already have submitted them, the fences within
them have already been used and triggering them again is an error.
Instead, just repopulate the guc workqueue.

[  115.858560] [IGT] gem_busy: starting subtest hang-render
[  135.839867] [drm] GPU HANG: ecode 9:0:0xe757fefe, in gem_busy [1716], reason: Hang on render ring, action: reset
[  135.839902] drm/i915: Resetting chip after gpu hang
[  135.839957] [drm] RC6 on
[  135.858351] ------------[ cut here ]------------
[  135.858357] WARNING: CPU: 2 PID: 45 at drivers/gpu/drm/i915/i915_sw_fence.c:108 i915_sw_fence_complete+0x25/0x30
[  135.858357] Modules linked in: rfcomm bnep binfmt_misc nls_iso8859_1 input_leds snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core btusb btrtl snd_hwdep snd_pcm 8250_dw snd_seq_midi hid_lenovo snd_seq_midi_event snd_rawmidi iwlwifi x86_pkg_temp_thermal coretemp snd_seq crct10dif_pclmul snd_seq_device hci_uart snd_timer crc32_pclmul ghash_clmulni_intel idma64 aesni_intel virt_dma btbcm snd btqca aes_x86_64 btintel lrw cfg80211 bluetooth gf128mul glue_helper ablk_helper cryptd soundcore intel_lpss_pci intel_pch_thermal intel_lpss_acpi intel_lpss acpi_als mfd_core kfifo_buf acpi_pad industrialio autofs4 hid_plantronics usbhid dm_mirror dm_region_hash dm_log sdhci_pci ahci sdhci libahci i2c_hid hid
[  135.858389] CPU: 2 PID: 45 Comm: kworker/2:1 Tainted: G        W       4.9.0-rc4+ #238
[  135.858389] Hardware name:                  /NUC6i3SYB, BIOS SYSKLi35.86A.0024.2015.1027.2142 10/27/2015
[  135.858392] Workqueue: events_long i915_hangcheck_elapsed
[  135.858394]  ffffc900001bf9b8 ffffffff812bb238 0000000000000000 0000000000000000
[  135.858396]  ffffc900001bf9f8 ffffffff8104f621 0000006c00000000 ffff8808296137f8
[  135.858398]  0000000000000a00 ffff8808457a0000 ffff880845764e60 ffff880845760000
[  135.858399] Call Trace:
[  135.858403]  [<ffffffff812bb238>] dump_stack+0x4d/0x65
[  135.858405]  [<ffffffff8104f621>] __warn+0xc1/0xe0
[  135.858406]  [<ffffffff8104f748>] warn_slowpath_null+0x18/0x20
[  135.858408]  [<ffffffff813f8c15>] i915_sw_fence_complete+0x25/0x30
[  135.858410]  [<ffffffff813f8fad>] i915_sw_fence_commit+0xd/0x30
[  135.858412]  [<ffffffff8142e591>] __i915_gem_request_submit+0xe1/0xf0
[  135.858413]  [<ffffffff8142e5c8>] i915_gem_request_submit+0x28/0x40
[  135.858415]  [<ffffffff814433e7>] i915_guc_submit+0x47/0x210
[  135.858417]  [<ffffffff81443e98>] i915_guc_submission_enable+0x468/0x540
[  135.858419]  [<ffffffff81442495>] intel_guc_setup+0x715/0x810
[  135.858421]  [<ffffffff8142b6b4>] i915_gem_init_hw+0x114/0x2a0
[  135.858423]  [<ffffffff813eeaa8>] i915_reset+0xe8/0x120
[  135.858424]  [<ffffffff813f3937>] i915_reset_and_wakeup+0x157/0x180
[  135.858426]  [<ffffffff813f79db>] i915_handle_error+0x1ab/0x230
[  135.858428]  [<ffffffff812c760d>] ? scnprintf+0x4d/0x90
[  135.858430]  [<ffffffff81435985>] i915_hangcheck_elapsed+0x275/0x3d0
[  135.858432]  [<ffffffff810668cf>] process_one_work+0x12f/0x410
[  135.858433]  [<ffffffff81066bf3>] worker_thread+0x43/0x4d0
[  135.858435]  [<ffffffff81066bb0>] ? process_one_work+0x410/0x410
[  135.858436]  [<ffffffff81066bb0>] ? process_one_work+0x410/0x410
[  135.858438]  [<ffffffff8106bbb4>] kthread+0xd4/0xf0
[  135.858440]  [<ffffffff8106bae0>] ? kthread_park+0x60/0x60

v2: Only resubmit submitted requests
v3: Don't forget the pending requests have reserved space.

Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129121024.22650-6-chris@chris-wilson.co.uk
2016-11-29 15:52:46 +00:00
..
amd drm/irq: Unexport drm_vblank_on/off 2016-11-15 23:33:48 +01:00
arc drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
arm Merge branch 'drm-tda998x-mali' of git://git.armlinux.org.uk/~rmk/linux-arm into drm-next 2016-11-17 08:55:26 +10:00
armada drm/armada: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:54:13 +01:00
ast drm/ast: free correct pointer in astfb_create() error paths 2016-11-14 07:45:16 +01:00
atmel-hlcdc drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
bochs drm/bochs: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:54:38 +01:00
bridge drm/bridge: analogix_dp: return error if transfer none byte 2016-11-16 11:16:13 +05:30
cirrus Merge tag 'topic/drm-misc-2016-11-10' of git://anongit.freedesktop.org/drm-intel into drm-next 2016-11-11 09:28:44 +10:00
etnaviv drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
exynos drm/exynos: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:54:50 +01:00
fsl-dcu drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
gma500 drm/gma500: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 08:01:01 +01:00
hisilicon drm: move allocation out of drm_get_format_name() 2016-11-12 14:19:38 +01:00
i2c Merge branch 'drm-tda998x-mali' of git://git.armlinux.org.uk/~rmk/linux-arm into drm-next 2016-11-17 08:55:26 +10:00
i810 drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
i915 drm/i915/guc: Split hw submission for replay after GPU reset 2016-11-29 15:52:46 +00:00
imx drm/imx: Switch to drm_fb_cma_prepare_fb() helper 2016-11-15 08:25:06 +01:00
mediatek drm/irq: Unexport drm_vblank_count 2016-11-15 23:33:47 +01:00
mga
mgag200 Merge tag 'topic/drm-misc-2016-11-10' of git://anongit.freedesktop.org/drm-intel into drm-next 2016-11-11 09:28:44 +10:00
msm drm/msm: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:58:04 +01:00
nouveau drm/nouveau: Use drm_crtc_vblank_off/on 2016-11-15 23:33:37 +01:00
omapdrm drm/omapdrm: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:58:15 +01:00
panel
qxl drm/qxl: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:55:33 +01:00
r128
radeon drm/radeon: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:56:52 +01:00
rcar-du Merge branch 'drm/next/du' of git://linuxtv.org/pinchartl/media into drm-next 2016-11-16 09:39:21 +10:00
rockchip drm/rockchip: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:56:47 +01:00
savage drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
shmobile drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
sis drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
sti Merge tag 'topic/drm-misc-2016-11-10' of git://anongit.freedesktop.org/drm-intel into drm-next 2016-11-11 09:28:44 +10:00
sun4i Merge tag 'drm-misc-next-2016-11-16' of git://anongit.freedesktop.org/git/drm-misc into drm-next 2016-11-17 08:02:46 +10:00
tdfx drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
tegra drm/tegra: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:56:58 +01:00
tilcdc drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
ttm drm/ttm: fix ttm_bo_wait 2016-11-09 00:46:04 +05:30
udl drm/udl: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:57:59 +01:00
vc4 This pull request brings in fragment shader threading and ETC1 support 2016-11-17 09:43:56 +10:00
vgem dma-buf: Rename struct fence to dma_fence 2016-10-25 14:40:39 +02:00
via drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
virtio drm/virtio: use DRM_FB_HELPER_DEFAULT_OPS for fb_ops 2016-11-14 07:58:10 +01:00
vmwgfx drm: move allocation out of drm_get_format_name() 2016-11-12 14:19:38 +01:00
zte drm: zte: checking for NULL instead of IS_ERR() 2016-11-15 11:00:42 +01:00
ati_pcigart.c
drm_agpsupport.c
drm_atomic_helper.c drm/fence: add in-fences support 2016-11-16 09:55:27 +01:00
drm_atomic.c drm/fence: add out-fences support 2016-11-16 14:36:27 +01:00
drm_auth.c
drm_blend.c drm: RIP mode_config->rotation_property 2016-10-22 10:42:11 +02:00
drm_bridge.c
drm_bufs.c
drm_cache.c
drm_color_mgmt.c drm/color: document NULL values and default settings better 2016-11-15 22:39:48 +01:00
drm_connector.c drm: Move tile group code into drm_connector.c 2016-11-15 15:30:38 +01:00
drm_context.c
drm_crtc_helper_internal.h
drm_crtc_helper.c
drm_crtc_internal.h drm/fence: add fence timeline to drm_crtc 2016-11-16 10:42:48 +01:00
drm_crtc.c drm/fence: add out-fences support 2016-11-16 14:36:27 +01:00
drm_debugfs_crc.c drm: fix sparse warnings on undeclared symbols in crc debugfs 2016-10-19 14:10:29 +03:00
drm_debugfs.c drm/atomic: add debugfs file to dump out atomic state 2016-11-08 16:38:03 -05:00
drm_dma.c
drm_dp_aux_dev.c
drm_dp_dual_mode_helper.c drm: Print some debug/error info during DP dual mode detect 2016-10-26 15:57:11 -04:00
drm_dp_helper.c Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux 2016-10-11 18:12:22 -07:00
drm_dp_mst_topology.c drm/dp/mst: Check peer device type before attempting EDID read 2016-10-26 18:53:44 +02:00
drm_drv.c drm: Extract drm_drv.h 2016-11-15 12:50:30 +01:00
drm_dumb_buffers.c drm: Consolidate dumb buffer docs 2016-11-15 12:51:49 +01:00
drm_edid_load.c
drm_edid.c Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued 2016-11-17 14:32:57 +01:00
drm_encoder_slave.c
drm_encoder.c
drm_fb_cma_helper.c drm/fb_cma_helper: Add drm_fb_cma_prepare_fb() helper 2016-11-14 12:43:58 +01:00
drm_fb_helper.c drm/fb-helper: fix segfaults in drm_fb_helper_debug_* 2016-11-14 07:47:34 +01:00
drm_flip_work.c
drm_fops.c drm: define drm_compat_ioctl NULL on CONFIG_COMPAT=n and reduce #ifdefs 2016-11-02 11:33:47 -04:00
drm_fourcc.c drm: move allocation out of drm_get_format_name() 2016-11-12 14:19:38 +01:00
drm_framebuffer.c drm: move allocation out of drm_get_format_name() 2016-11-12 14:19:38 +01:00
drm_gem_cma_helper.c
drm_gem.c
drm_global.c
drm_hashtab.c
drm_info.c drm: Print device information again in debugfs 2016-10-17 16:20:53 +10:00
drm_internal.h drm: drm_irq.h header cleanup 2016-11-15 23:33:48 +01:00
drm_ioc32.c
drm_ioctl.c
drm_irq.c drm/irq: Unexport drm_vblank_on/off 2016-11-15 23:33:48 +01:00
drm_kms_helper_common.c
drm_legacy.h
drm_lock.c
drm_memory.c
drm_mipi_dsi.c
drm_mm.c drm: Add stackdepot include for DRM_DEBUG_MM 2016-11-08 13:46:49 +01:00
drm_mode_config.c drm/fence: add out-fences support 2016-11-16 14:36:27 +01:00
drm_mode_object.c
drm_modes.c Revert "drm: Add aspect ratio parsing in DRM layer" 2016-11-15 15:01:42 +01:00
drm_modeset_helper.c drm: move allocation out of drm_get_format_name() 2016-11-12 14:19:38 +01:00
drm_modeset_lock.c drm: don't let crtc_ww_class leak out 2016-11-15 08:33:35 +01:00
drm_of.c drm: convert DT component matching to component_match_add_release() 2016-10-25 11:52:38 -04:00
drm_panel.c
drm_pci.c
drm_plane_helper.c drm: add helpers to go from plane state to drm_rect 2016-11-08 16:38:03 -05:00
drm_plane.c drm/fence: add in-fences support 2016-11-16 09:55:27 +01:00
drm_platform.c
drm_prime.c
drm_print.c drm/print: Move kerneldoc next to definition 2016-11-15 12:55:24 +01:00
drm_probe_helper.c
drm_property.c
drm_rect.c drm: helper macros to print composite types 2016-11-08 16:38:03 -05:00
drm_scatter.c
drm_simple_kms_helper.c
drm_sysfs.c
drm_trace_points.c
drm_trace.h
drm_vm.c
drm_vma_manager.c
Kconfig drm/fence: add in-fences support 2016-11-16 09:55:27 +01:00
Makefile drm: Extract drm_mode_config.[hc] 2016-11-15 15:23:29 +01:00