linux_dsm_epyc7002/drivers
Chris Wilson 34ba5a80f2 drm/i915/guc: Split hw submission for replay after GPU reset
Something I missed before sending off the partial series was that the
non-scheduler GuC reset path was broken (in the full series, this is
pushed to the execlists reset handler). The issue is that after a reset,
we have to refill the GuC workqueues, which we do by resubmitting the
requests. However, if we already have submitted them, the fences within
them have already been used and triggering them again is an error.
Instead, just repopulate the GuC workqueue.
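
The split described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the i915 code: every name here (toy_fence, toy_request, toy_workqueue, toy_submit, toy_hw_submit, toy_replay) is hypothetical. The fence is modelled as a one-shot counter, and the firmware workqueue as a small array that a reset wipes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not i915 structures. */
struct toy_fence { int signal_count; };
struct toy_request { struct toy_fence submit; bool submitted; int seqno; };
struct toy_workqueue { int items[8]; int len; };

/* A one-shot fence: returns false if it is completed a second time,
 * which models the error condition behind the WARNING in the log. */
static bool toy_fence_complete(struct toy_fence *f)
{
	f->signal_count++;
	return f->signal_count == 1;
}

/* Hardware half: only writes the request into the firmware workqueue. */
static void toy_hw_submit(struct toy_workqueue *wq, const struct toy_request *rq)
{
	assert(wq->len < 8);
	wq->items[wq->len++] = rq->seqno;
}

/* Full submission: software bookkeeping (fence, submitted flag) followed
 * by the hardware half. Returns false if the fence had already been used. */
static bool toy_submit(struct toy_workqueue *wq, struct toy_request *rq)
{
	bool ok = toy_fence_complete(&rq->submit);

	rq->submitted = true;
	toy_hw_submit(wq, rq);
	return ok;
}

/* Replay after reset: the workqueue contents are lost, so requeue only
 * the requests that were already submitted, without touching their fences.
 * Returns the number of requests replayed. */
static int toy_replay(struct toy_workqueue *wq, const struct toy_request *rqs, int n)
{
	int replayed = 0;

	wq->len = 0; /* models the workqueue being emptied by the reset */
	for (int i = 0; i < n; i++) {
		if (rqs[i].submitted) {
			toy_hw_submit(wq, &rqs[i]);
			replayed++;
		}
	}
	return replayed;
}
```

In this model, calling toy_submit() a second time on the same request reports the double completion, whereas toy_replay() refills the queue and leaves each fence's signal count at one. Requests that were never submitted are skipped by the replay, which loosely mirrors the "Only resubmit submitted requests" note in v2.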

[  115.858560] [IGT] gem_busy: starting subtest hang-render
[  135.839867] [drm] GPU HANG: ecode 9:0:0xe757fefe, in gem_busy [1716], reason: Hang on render ring, action: reset
[  135.839902] drm/i915: Resetting chip after gpu hang
[  135.839957] [drm] RC6 on
[  135.858351] ------------[ cut here ]------------
[  135.858357] WARNING: CPU: 2 PID: 45 at drivers/gpu/drm/i915/i915_sw_fence.c:108 i915_sw_fence_complete+0x25/0x30
[  135.858357] Modules linked in: rfcomm bnep binfmt_misc nls_iso8859_1 input_leds snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core btusb btrtl snd_hwdep snd_pcm 8250_dw snd_seq_midi hid_lenovo snd_seq_midi_event snd_rawmidi iwlwifi x86_pkg_temp_thermal coretemp snd_seq crct10dif_pclmul snd_seq_device hci_uart snd_timer crc32_pclmul ghash_clmulni_intel idma64 aesni_intel virt_dma btbcm snd btqca aes_x86_64 btintel lrw cfg80211 bluetooth gf128mul glue_helper ablk_helper cryptd soundcore intel_lpss_pci intel_pch_thermal intel_lpss_acpi intel_lpss acpi_als mfd_core kfifo_buf acpi_pad industrialio autofs4 hid_plantronics usbhid dm_mirror dm_region_hash dm_log sdhci_pci ahci sdhci libahci i2c_hid hid
[  135.858389] CPU: 2 PID: 45 Comm: kworker/2:1 Tainted: G        W       4.9.0-rc4+ #238
[  135.858389] Hardware name:                  /NUC6i3SYB, BIOS SYSKLi35.86A.0024.2015.1027.2142 10/27/2015
[  135.858392] Workqueue: events_long i915_hangcheck_elapsed
[  135.858394]  ffffc900001bf9b8 ffffffff812bb238 0000000000000000 0000000000000000
[  135.858396]  ffffc900001bf9f8 ffffffff8104f621 0000006c00000000 ffff8808296137f8
[  135.858398]  0000000000000a00 ffff8808457a0000 ffff880845764e60 ffff880845760000
[  135.858399] Call Trace:
[  135.858403]  [<ffffffff812bb238>] dump_stack+0x4d/0x65
[  135.858405]  [<ffffffff8104f621>] __warn+0xc1/0xe0
[  135.858406]  [<ffffffff8104f748>] warn_slowpath_null+0x18/0x20
[  135.858408]  [<ffffffff813f8c15>] i915_sw_fence_complete+0x25/0x30
[  135.858410]  [<ffffffff813f8fad>] i915_sw_fence_commit+0xd/0x30
[  135.858412]  [<ffffffff8142e591>] __i915_gem_request_submit+0xe1/0xf0
[  135.858413]  [<ffffffff8142e5c8>] i915_gem_request_submit+0x28/0x40
[  135.858415]  [<ffffffff814433e7>] i915_guc_submit+0x47/0x210
[  135.858417]  [<ffffffff81443e98>] i915_guc_submission_enable+0x468/0x540
[  135.858419]  [<ffffffff81442495>] intel_guc_setup+0x715/0x810
[  135.858421]  [<ffffffff8142b6b4>] i915_gem_init_hw+0x114/0x2a0
[  135.858423]  [<ffffffff813eeaa8>] i915_reset+0xe8/0x120
[  135.858424]  [<ffffffff813f3937>] i915_reset_and_wakeup+0x157/0x180
[  135.858426]  [<ffffffff813f79db>] i915_handle_error+0x1ab/0x230
[  135.858428]  [<ffffffff812c760d>] ? scnprintf+0x4d/0x90
[  135.858430]  [<ffffffff81435985>] i915_hangcheck_elapsed+0x275/0x3d0
[  135.858432]  [<ffffffff810668cf>] process_one_work+0x12f/0x410
[  135.858433]  [<ffffffff81066bf3>] worker_thread+0x43/0x4d0
[  135.858435]  [<ffffffff81066bb0>] ? process_one_work+0x410/0x410
[  135.858436]  [<ffffffff81066bb0>] ? process_one_work+0x410/0x410
[  135.858438]  [<ffffffff8106bbb4>] kthread+0xd4/0xf0
[  135.858440]  [<ffffffff8106bae0>] ? kthread_park+0x60/0x60

v2: Only resubmit submitted requests
v3: Don't forget the pending requests have reserved space.

Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129121024.22650-6-chris@chris-wilson.co.uk
2016-11-29 15:52:46 +00:00
accessibility
acpi Merge branches 'acpica-fixes', 'acpi-pci-fixes' and 'acpi-apei-fixes' 2016-10-29 01:58:03 +02:00
amba
android ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct 2016-10-24 19:37:48 +02:00
ata ahci: fix the single MSI-X case in ahci_init_one 2016-10-25 11:43:07 -04:00
atm atm: iphase: fix newline escape and minor tweak to source formatting 2016-09-15 19:15:55 -04:00
auxdisplay auxdisplay: img-ascii-lcd: driver for simple ASCII LCD displays 2016-10-06 17:03:41 +02:00
base Linux 4.9-rc4 2016-11-07 09:37:09 +10:00
bcma
block virtio_blk: Delete an unnecessary initialisation in init_vq() 2016-10-31 00:21:47 +02:00
bluetooth Bluetooth: btwilink: Fix probe return value 2016-10-20 10:14:49 +02:00
bus bus: qcom-ebi2: depend on ARCH_QCOM or COMPILE_TEST 2016-10-17 13:46:09 -07:00
cdrom
char virtio: tests, fixes and cleanups 2016-11-01 16:56:05 -06:00
clk clk: at91: Fix a return value in case of error 2016-10-20 16:37:56 -07:00
clocksource Revert "clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init" 2016-10-20 21:58:58 +02:00
connector
cpufreq Merge branches 'pm-cpufreq-fixes' and 'pm-sleep-fixes' 2016-10-29 01:29:17 +02:00
cpuidle Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-10-15 09:26:12 -07:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2016-10-10 14:04:16 -07:00
dax device-dax: fix percpu_ref_exit ordering 2016-10-27 17:04:05 -07:00
dca
devfreq PM / devfreq: Skip status update on uninitialized previous_freq 2016-10-11 00:01:20 +02:00
dio
dma dmaengine updates for 4.8-rc1 2016-10-06 17:13:54 -07:00
dma-buf reservation: revert "wait only with non-zero timeout specified (v3)" v2 2016-11-09 00:48:57 +05:30
edac * Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO 2016-10-04 12:06:26 -07:00
eisa
extcon extcon: qcom-spmi-misc: Sync the extcon state on interrupt 2016-10-26 16:04:29 +09:00
firewire firewire: net: fix fragmented datagram_size off-by-one 2016-11-03 14:46:39 +01:00
firmware efi/arm: Fix absolute relocation detection for older toolchains 2016-10-19 14:49:44 +02:00
fmc
fpga
gpio gpio/mvebu: Use irq_domain_add_linear 2016-11-01 19:31:49 +01:00
gpu drm/i915/guc: Split hw submission for replay after GPU reset 2016-11-29 15:52:46 +00:00
hid HID: add quirk for Akai MIDImix. 2016-10-10 10:58:22 +02:00
hsi
hv hv: do not lose pending heartbeat vmbus packets 2016-10-25 08:52:10 +02:00
hwmon hwmon: (max31790) potential ERR_PTR dereference 2016-10-17 10:16:20 -07:00
hwspinlock
hwtracing
i2c i2c: core: fix NULL pointer dereference under race condition 2016-11-04 20:36:58 +01:00
ide
idle nmi_backtrace: generate one-line reports for idle cpus 2016-10-07 18:46:30 -07:00
iio First set of IIO fixes for the 4.9 cycle. 2016-10-24 10:50:13 +02:00
infiniband Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-29 20:33:20 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2016-11-05 11:26:11 -07:00
iommu IOMMU Updates for Linux v4.9 2016-10-11 12:52:41 -07:00
ipack ipack: print a hex number after a 0x prefix 2016-10-27 18:43:43 -07:00
irqchip GIC updates for Linux 4.9-rc2 2016-10-21 21:40:29 +02:00
isdn
leds leds: triggers: Check return value of kobject_uevent_env() 2016-09-20 10:22:10 +02:00
lguest
lightnvm Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block 2016-10-07 14:42:05 -07:00
macintosh powerpc: Remove all usages of NO_IRQ 2016-09-20 20:57:12 +10:00
mailbox Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration 2016-10-06 17:36:53 -07:00
mcb mcb: Add a dma_device to mcb_device 2016-09-27 12:33:47 +02:00
md Merge tag 'md/4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2016-11-05 11:34:07 -07:00
media media fixes for v4.9-rc4 2016-11-05 11:15:09 -07:00
memory ARM: SoC driver updates for v4.9 2016-10-07 21:23:40 -07:00
memstick memstick: rtsx_usb_ms: Manage runtime PM when accessing the device 2016-10-17 15:43:05 +02:00
message scsi: fusion: Fix error return code in mptfc_probe() 2016-09-14 14:26:19 -04:00
mfd - Core Frameworks 2016-10-07 08:35:35 -07:00
misc Char/Misc driver fixes for 4.9-rc3 2016-10-29 11:19:02 -07:00
mmc mmc: sdhci-msm: Fix error return code in sdhci_msm_probe() 2016-10-27 09:43:01 +02:00
mtd MTD updates for 4.9-rc4: 2016-11-05 10:52:29 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-29 20:33:20 -07:00
nfc
ntb
nubus
nvdimm nvdimm: make CONFIG_NVDIMM_DAX 'bool' 2016-10-27 16:16:21 -07:00
nvme Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2016-10-21 10:54:01 -07:00
nvmem ARM: SoC driver updates for v4.9 2016-10-07 21:23:40 -07:00
of Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-10-15 09:26:12 -07:00
oprofile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
parisc
parport
pci pci-v4.9-fixes-2 2016-11-05 11:11:31 -07:00
pcmcia pcmcia: soc_common: add driver-data pointer 2016-09-22 09:39:16 +01:00
perf perf: xgene: Remove bogus IS_ERR() check 2016-10-17 15:50:07 +01:00
phy
pinctrl pinctrl: intel: Only restore pins that are used by the driver 2016-10-18 14:38:16 +02:00
platform platform-drivers-x86 for 4.9-2 2016-10-19 11:45:06 -07:00
pnp
power power supply and reset changes for the v4.9 series 2016-10-06 18:21:15 -07:00
powercap
pps pps: kc: fix non-tickless system config dependency 2016-10-11 15:06:32 -07:00
ps3 powerpc: Remove all usages of NO_IRQ 2016-09-20 20:57:12 +10:00
ptp drivers/ptp: Fix kernel memory disclosure 2016-10-13 10:20:06 -04:00
pwm
rapidio mm: replace get_user_pages() write/force parameters with gup_flags 2016-10-19 08:11:43 -07:00
ras
regulator regulator: core: silence warning: "VDD1: ramp_delay not set" 2016-10-28 18:22:40 +01:00
remoteproc rpmsg updates for v4.9 2016-10-06 17:03:49 -07:00
reset reset: uniphier: rename MIO reset to SD reset for Pro5, PXs2, LD20 SoCs 2016-10-22 18:31:42 +09:00
rpmsg
rtc RTC for 4.9 2016-10-14 13:13:44 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2016-10-27 14:16:30 -07:00
sbus
scsi SCSI fixes on 20161105 2016-11-05 11:28:21 -07:00
sfi
sh
sn
soc powerpc updates for 4.9 #2 2016-10-14 11:07:42 -07:00
spi Merge remote-tracking branches 'spi/fix/dt', 'spi/fix/fsl-dspi' and 'spi/fix/fsl-espi' into spi-linus 2016-10-29 12:51:55 -06:00
spmi spmi: pmic-arb: Return an error code if sanity check fails 2016-09-27 12:43:34 +02:00
ssb
staging media fixes for v4.9-rc4 2016-11-05 11:15:09 -07:00
target target/tcm_fc: use CPU affinity for responses 2016-10-21 01:19:44 -07:00
tc
thermal thermal/powerclamp: correct cpu support check 2016-10-20 14:15:44 +08:00
thunderbolt
tty tty: serial_core: fix NULL struct tty pointer access in uart_write_wakeup 2016-10-28 08:13:07 -04:00
uio
usb usb: chipidea: host: fix NULL ptr dereference during shutdown 2016-10-25 16:14:32 +08:00
uwb
vfio vfio/pci: Fix integer overflows, bitmask check 2016-10-26 13:49:29 -06:00
vhost
video Merge branch 'drm/next/du' of git://linuxtv.org/pinchartl/media into drm-next 2016-11-16 09:39:21 +10:00
virt mm: replace get_user_pages() write/force parameters with gup_flags 2016-10-19 08:11:43 -07:00
virtio virtio_ring: mark vring_dma_dev inline 2016-10-31 00:40:08 +02:00
vlynq
vme vme: vme_get_size potentially returning incorrect value on failure 2016-10-28 08:25:18 -04:00
w1
watchdog Merge branches 'acpi-wdat' and 'acpi-cppc' 2016-10-21 22:24:23 +02:00
xen xen: fixes for 4.9-rc2 2016-10-24 19:52:24 -07:00
zorro
Kconfig
Makefile A small bug fix and a new driver for acting as an IPMI device. 2016-10-23 15:56:23 -07:00