mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-28 11:18:45 +07:00
10e36489ab
If we skip the reset because we found the engine inactive at the time of
the reset, we still need to clear the residual inflight & pending request
bookkeeping so that it reflects the current state of the HW.
Otherwise, we may end up stuck in a loop like:
<7> [416.490346] hangcheck rcs0
<7> [416.490371] hangcheck Awake? 1
<7> [416.490376] hangcheck Hangcheck: 8003 ms ago
<7> [416.490380] hangcheck Reset count: 0 (global 0)
<7> [416.490383] hangcheck Requests:
<7> [416.491210] hangcheck RING_START: 0x0017b000
<7> [416.491983] hangcheck RING_HEAD: 0x00000048
<7> [416.491992] hangcheck RING_TAIL: 0x00000048
<7> [416.492006] hangcheck RING_CTL: 0x00000000
<7> [416.492037] hangcheck RING_MODE: 0x00000200 [idle]
<7> [416.492044] hangcheck RING_IMR: 00000000
<7> [416.492809] hangcheck ACTHD: 0x00000000_9ca00048
<7> [416.492824] hangcheck BBADDR: 0x00000000_00001004
<7> [416.492838] hangcheck DMA_FADDR: 0x00000000_00000000
<7> [416.492845] hangcheck IPEIR: 0x00000000
<7> [416.492852] hangcheck IPEHR: 0x00000000
<7> [416.492863] hangcheck Execlist status: 0x00018001 00000000, entries 12
<7> [416.492869] hangcheck Execlist CSB read 1, write 1, tasklet queued? no (enabled)
<7> [416.492938] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 8307ms: signaled
<7> [416.492972] hangcheck Queue priority hint: -4093
<7> [416.492979] hangcheck Q 20ffa:16fd8- prio=-4093 @ 8307ms: [i915]
<7> [416.492985] hangcheck Q 20ffa:16fda prio=-4094 @ 8307ms: [i915]
<7> [416.492990] hangcheck Q 20ffa:16fdc prio=-4094 @ 8307ms: [i915]
<7> [416.492996] hangcheck Q 20ffa:16fde prio=-4094 @ 8307ms: [i915]
<7> [416.493001] hangcheck Q 20ffa:16fe0 prio=-4094 @ 8307ms: [i915]
<7> [416.493007] hangcheck Q 20ffa:16fe2 prio=-4094 @ 8307ms: [i915]
<7> [416.493013] hangcheck Q 20ffa:16fe4 prio=-4094 @ 8307ms: [i915]
<7> [416.493021] hangcheck ...skipping 21 queued requests...
<7> [416.493027] hangcheck Q 20ffa:17010 prio=-4094 @ 8307ms: [i915]
<7> [416.493081] hangcheck HWSP:
<7> [416.493089] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [416.493094] hangcheck *
<7> [416.493100] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
<7> [416.493106] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
<7> [416.493111] hangcheck *
<7> [416.493117] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
<7> [416.493123] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [416.493127] hangcheck *
<7> [416.493132] hangcheck Idle? no
<6> [416.512124] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, hang on rcs0
<6> [416.512205] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6> [416.512207] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6> [416.512208] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6> [416.512210] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6> [416.512212] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5> [416.513602] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
<7> [424.489258] hangcheck rcs0
<7> [424.489263] hangcheck Awake? 1
<7> [424.489267] hangcheck Hangcheck: 5954 ms ago
<7> [424.489271] hangcheck Reset count: 1 (global 0)
<7> [424.489274] hangcheck Requests:
<7> [424.490128] hangcheck RING_START: 0x00000000
<7> [424.490870] hangcheck RING_HEAD: 0x00000000
<7> [424.490877] hangcheck RING_TAIL: 0x00000000
<7> [424.490887] hangcheck RING_CTL: 0x00000000
<7> [424.490897] hangcheck RING_MODE: 0x00000200 [idle]
<7> [424.490904] hangcheck RING_IMR: 00000000
<7> [424.490917] hangcheck ACTHD: 0x00000000_00000000
<7> [424.490930] hangcheck BBADDR: 0x00000000_00000000
<7> [424.490943] hangcheck DMA_FADDR: 0x00000000_00000000
<7> [424.490950] hangcheck IPEIR: 0x00000000
<7> [424.490956] hangcheck IPEHR: 0x00000000
<7> [424.490968] hangcheck Execlist status: 0x00000001 00000000, entries 12
<7> [424.490972] hangcheck Execlist CSB read 11, write 11, tasklet queued? no (enabled)
<7> [424.490983] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 16305ms: signaled
<7> [424.490989] hangcheck Queue priority hint: -4093
<7> [424.490996] hangcheck Q 20ffa:16fd8- prio=-4093 @ 16305ms: [i915]
<7> [424.491001] hangcheck Q 20ffa:16fda prio=-4094 @ 16305ms: [i915]
<7> [424.491006] hangcheck Q 20ffa:16fdc prio=-4094 @ 16305ms: [i915]
<7> [424.491011] hangcheck Q 20ffa:16fde prio=-4094 @ 16305ms: [i915]
<7> [424.491016] hangcheck Q 20ffa:16fe0 prio=-4094 @ 16305ms: [i915]
<7> [424.491022] hangcheck Q 20ffa:16fe2 prio=-4094 @ 16305ms: [i915]
<7> [424.491048] hangcheck Q 20ffa:16fe4 prio=-4094 @ 16305ms: [i915]
<7> [424.491057] hangcheck ...skipping 21 queued requests...
<7> [424.491063] hangcheck Q 20ffa:17010 prio=-4094 @ 16305ms: [i915]
<7> [424.491095] hangcheck HWSP:
<7> [424.491102] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [424.491106] hangcheck *
<7> [424.491113] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
<7> [424.491118] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
<7> [424.491122] hangcheck *
<7> [424.491127] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000b
<7> [424.491133] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [424.491136] hangcheck *
<7> [424.491141] hangcheck Idle? no
<5> [424.491834] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Here, because the pending array was not cleared on reset, the stale
Pending[0] entry persists indefinitely.
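The idea can be illustrated with a small standalone sketch. This is a toy model, not the i915 code: the names `toy_engine`, `toy_request`, `toy_engine_reset` and `toy_engine_is_idle` are hypothetical, and the sketch simply forgets stale requests, whereas a real driver would also have to deal with the requests it drops from its bookkeeping. It only shows the point made above: the inflight and pending arrays are cleared even when the hardware reset itself is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PORTS 2 /* hypothetical number of submission ports */

/* Hypothetical stand-in for a request; only the seqno is modelled. */
struct toy_request {
	unsigned int seqno;
};

/* Hypothetical stand-in for the engine's software bookkeeping. */
struct toy_engine {
	bool hw_active;                              /* is the HW executing? */
	struct toy_request *inflight[TOY_PORTS + 1]; /* NULL-terminated */
	struct toy_request *pending[TOY_PORTS + 1];  /* NULL-terminated */
	unsigned int hw_resets;                      /* HW resets performed */
};

static void toy_clear_ports(struct toy_request **ports)
{
	memset(ports, 0, (TOY_PORTS + 1) * sizeof(*ports));
}

/* Returns true if a HW reset was actually performed. */
static bool toy_engine_reset(struct toy_engine *e)
{
	bool did_reset = false;

	assert(e != NULL);

	if (e->hw_active) {
		/* Engine was busy: perform the (modelled) HW reset. */
		e->hw_active = false;
		e->hw_resets++;
		did_reset = true;
	}

	/*
	 * Whether or not the HW reset was skipped, drop the residual
	 * bookkeeping so that software agrees with the idle hardware.
	 */
	toy_clear_ports(e->pending);
	toy_clear_ports(e->inflight);

	return did_reset;
}

/* The engine only counts as idle once nothing is tracked as submitted. */
static bool toy_engine_is_idle(const struct toy_engine *e)
{
	return !e->hw_active && !e->inflight[0] && !e->pending[0];
}
```

In this model, an engine with `hw_active == false` but a leftover `pending[0]` reports "not idle" before `toy_engine_reset()` and "idle" afterwards, even though no hardware reset took place. Without the unconditional clear, the leftover entry would keep the engine looking busy on every later check.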
Fixes:
| | | |
|---|---|---|
| .. | | |
| drm | | |
| host1x | | |
| ipu-v3 | | |
| vga | | |
| Makefile | | |