Commit Graph

11 Commits

Author SHA1 Message Date
xinhui pan
511fdbc33a drm/amdgpu: ras support suspend/resume
Add a RAS suspend function and rename ras_post_init to amdgpu_ras_resume.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:51 -05:00
xinhui pan
466b179346 drm/amdgpu: add badpages sysfs interface
Add a badpages node.
It outputs the list of bad pages in the format
gpu pfn : gpu page size : flags

Example:
0x00000000 : 0x00001000 : R
0x00000001 : 0x00001000 : R
0x00000002 : 0x00001000 : R
0x00000003 : 0x00001000 : R
0x00000004 : 0x00001000 : R
0x00000005 : 0x00001000 : R
0x00000006 : 0x00001000 : R
0x00000007 : 0x00001000 : P
0x00000008 : 0x00001000 : P
0x00000009 : 0x00001000 : P

The flag is one of the following characters:
R: reserved.
P: pending reservation.
F: reservation failed for some reason.
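A userspace tool consuming this node has to split each line into its three fields. The following is a minimal sketch, assuming each line looks exactly like the example above; the struct and function names are hypothetical, and the sysfs path of the node is not given in this log, so the parser works on one already-read line (for example, a line obtained with fgets).

```c
#include <stdbool.h>
#include <stdio.h>

/* One parsed line of the badpages output: "gpu pfn : gpu page size : flags". */
struct bad_page {
    unsigned long long pfn;   /* GPU page frame number */
    unsigned long long size;  /* GPU page size in bytes */
    char flag;                /* 'R' reserved, 'P' pending, 'F' failed */
};

/* Parse one line. Returns true on success; *out is only written on success. */
static bool parse_bad_page_line(const char *line, struct bad_page *out)
{
    unsigned long long pfn, size;
    char flag;

    /* %llx accepts an optional "0x" prefix; the space before %c skips blanks. */
    if (sscanf(line, " %llx : %llx : %c", &pfn, &size, &flag) != 3)
        return false;

    /* Reject anything other than the three documented flag characters. */
    if (flag != 'R' && flag != 'P' && flag != 'F')
        return false;

    out->pfn = pfn;
    out->size = size;
    out->flag = flag;
    return true;
}
```

Parsing into locals first keeps the caller's struct unchanged when a line is malformed.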

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:51 -05:00
xinhui pan
a564808e7f drm/amdgpu: handle ras reset
Add another flag that allows an IP block to perform a GPU reset after device init.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:50 -05:00
xinhui pan
b152e8e13e drm/amdgpu: Revert "drm/amdgpu: skip gpu reset when ras error occurred"
Enable this now to reset the GPU on RAS errors.

This reverts commit 138352e575.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:50 -05:00
xinhui pan
77de502b08 drm/amdgpu: Introduce another ras enable function
Many parts of the SW stack can program the RAS enablement state during
boot. Handle that case by adding a function that checks the RAS flags
and chooses the appropriate code path.
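The flag-driven dispatch can be sketched as follows. This is an illustration under stated assumptions, not the driver's code: the flag bit, the context struct, and both functions are hypothetical, and the assumed policy (if firmware already enabled RAS during boot, only sync the software state; otherwise send a request to firmware) is inferred from the later note that the VBIOS enables RAS on boot.

```c
#include <stdbool.h>

/* Hypothetical flag bit: firmware (e.g. the VBIOS) already enabled RAS at boot. */
#define RAS_FLAG_INIT_BY_VBIOS (1u << 0)

/* Hypothetical context; the real driver structure is not shown in this log. */
struct ras_ctx {
    unsigned int flags;
    bool enabled;          /* software view of the RAS state */
    int fw_commands_sent;  /* counts simulated firmware requests */
};

/* Stand-in for a firmware request that changes the RAS state. */
static int ras_fw_set_enabled(struct ras_ctx *ctx, bool enable)
{
    (void)enable; /* a real implementation would pass this to firmware */
    ctx->fw_commands_sent++;
    return 0;
}

/* Pick a code path based on the flags. */
static int ras_enable_on_boot(struct ras_ctx *ctx, bool enable)
{
    int ret;

    if (enable && (ctx->flags & RAS_FLAG_INIT_BY_VBIOS)) {
        /* Already enabled during boot: only update the software state. */
        ctx->enabled = true;
        return 0;
    }

    /* Otherwise ask firmware, and record the new state only on success. */
    ret = ras_fw_set_enabled(ctx, enable);
    if (ret == 0)
        ctx->enabled = enable;
    return ret;
}
```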

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-10 13:49:15 -05:00
xinhui pan
828cfa2909 drm/amdgpu: Fix amdgpu ras to ta enums conversion
Add helpers to translate between the two enums; this makes bugs
easier to catch.
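The point of explicit helpers is that two enums describing the same blocks need not share numeric values, so a plain cast silently maps to the wrong block. A minimal sketch, assuming two hypothetical enums (the real amdgpu and TA enum names and values are not shown in this log):

```c
#include <stdio.h>

/* Hypothetical driver-side block IDs. */
enum drv_ras_block {
    DRV_RAS_BLOCK_UMC,
    DRV_RAS_BLOCK_SDMA,
    DRV_RAS_BLOCK_GFX,
    DRV_RAS_BLOCK_COUNT
};

/* Hypothetical TA-side block IDs, deliberately in a different order. */
enum ta_ras_block {
    TA_RAS_BLOCK_INVALID = -1,
    TA_RAS_BLOCK_GFX = 0,
    TA_RAS_BLOCK_UMC = 1,
    TA_RAS_BLOCK_SDMA = 2
};

/* Translate explicitly instead of casting; unknown values are reported. */
static enum ta_ras_block drv_to_ta_block(enum drv_ras_block block)
{
    switch (block) {
    case DRV_RAS_BLOCK_UMC:
        return TA_RAS_BLOCK_UMC;
    case DRV_RAS_BLOCK_SDMA:
        return TA_RAS_BLOCK_SDMA;
    case DRV_RAS_BLOCK_GFX:
        return TA_RAS_BLOCK_GFX;
    default:
        fprintf(stderr, "unknown RAS block %d\n", (int)block);
        return TA_RAS_BLOCK_INVALID;
    }
}
```

A kernel version would report the unknown value through the kernel's own logging rather than fprintf; the sketch uses stdio only to stay self-contained.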

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-27 22:39:52 -05:00
xinhui pan
108c6a6309 drm/amdgpu: add new ras workflow control flags
Add a RAS post-init function that does some initialization after all IP
blocks have finished their late init.

Add a new member, flags, which controls the RAS workflow.
For now, the VBIOS enables RAS for us on boot, but that might change in the
future.
There should therefore be a flag from the VBIOS telling us whether RAS is
enabled on boot; currently no such information appears to be available.

The other bits of flags are reserved for controlling other parts of RAS.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:52 -05:00
xinhui pan
5caf466a6e drm/amdgpu: add new member hw_supported
Currently, it is unclear how RAS support is determined: both software and
hardware can set the supported field, which is confusing.

Fix this by adding a new member, hw_supported.
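One way to read this split: hw_supported records what the hardware reports, while supported is what software actually uses, derived from it. A minimal sketch, assuming per-block bit masks and a hypothetical software-side mask (for example one supplied by a module parameter; that source is an assumption, not stated in this log):

```c
#include <stdint.h>

/* Hypothetical per-block bit masks; bit N stands for RAS block N. */
struct ras_support {
    uint32_t hw_supported; /* blocks the hardware reports as RAS-capable */
    uint32_t supported;    /* blocks software will actually manage */
};

/* Software can only narrow what the hardware offers, never widen it. */
static void ras_compute_supported(struct ras_support *s,
                                  uint32_t hw_mask, uint32_t sw_mask)
{
    s->hw_supported = hw_mask;
    s->supported = hw_mask & sw_mask;
}
```

Keeping the hardware value untouched means the original capability can still be queried after software has restricted it.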

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:51 -05:00
xinhui pan
138352e575 drm/amdgpu: skip gpu reset when ras error occurred
GPU reset is not stable on Vega20 A1.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:51 -05:00
xinhui pan
36ea1bd2d0 drm/amdgpu: add debugfs ctrl node
Allow userspace to enable/disable RAS.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:50 -05:00
xinhui pan
c030f2e416 drm/amdgpu: add amdgpu_ras.c to support ras (v2)
add obj management.
add feature control.
add debugfs infrastructure.
add sysfs infrastructure.
add IH infrastructure.
add recovery infrastructure.

It is a framework: other IP blocks should call the amdgpu_ras_xxx functions
instead of the psp_ras_xxx functions.

v2: squash in warning fixes

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:50 -05:00