7116 Commits

Author SHA1 Message Date
Nirmoy Das
79cb2719be drm/amdgpu: fix switch-case indentation
Fix switch-case indentation in amdgpu_ctx_init_entity()

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:18:14 -04:00
Monk Liu
2e0cc4d48b drm/amdgpu: revise RLCG access path
what changed:
1) provide a new implementation interface for the RLCG access path
2) put SQ_CMD/SQ_IND_INDEX on the GFX9 RLCG path so that debugfs's reg_op
function can access registers that need the RLCG path

now even debugfs's reg_op can be used to dump waves.

tested-by: Monk Liu <monk.liu@amd.com>
tested-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:17:55 -04:00
Dennis Li
93cdb48eca drm/amdgpu: add codes to clear AccVGPR for arcturus
AccVGPRs are newly added in arcturus. They should be initialized
before they are read. Otherwise an EDC error occurs when RAS is
enabled.

v2: reuse the existing logic to calculate register size

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:36 -04:00
Joe Perches
2541f95c17 AMD KFD: Use fallthrough;
Convert the various uses of fallthrough comments to fallthrough;

Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:35 -04:00
Stanley.Yang
c1509f3f6f drm/amdgpu: fix warning in ras_debugfs_create_all()
Fix the warning
"warn: variable dereferenced before check 'obj' (see line 1131)"
by removing unnecessary checks as amdgpu_ras_debugfs_create_all()
is only called from amdgpu_debugfs_init() where obj member in
con->head list is not NULL.
Use list_for_each_entry() instead of list_for_each_entry_safe() as obj
does not need to be freed or removed from the list during this process.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Evan Quan
565d194155 drm/amdgpu: add fbdev suspend/resume on gpu reset
This can fix the baco reset failure seen on Navi10.
And this should be a low risk fix as the same sequence
is already used for system suspend/resume.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Guchun Chen
88474ccad5 drm/amdgpu: update ras capability's query based on mem ecc configuration
RAS support capability needs to be updated according to the
memory ECC enablement. Also remove the redundant memory ECC check
in the gmc module for vega20 and arcturus.

v2: check HBM ECC enablement and set ras mask accordingly.
v3: avoid invoking the atomfirmware interface twice for the query.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Tom St Denis
6397ec580d drm/amd/amdgpu: Fix GPR read from debugfs (v2)
The offset into the array was specified in bytes but should
be in terms of 32-bit words.  Also prevent large reads that
would also cause a buffer overread.

v2:  Read from correct offset from internal storage buffer.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
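The byte-versus-word offset fix described in 6397ec580d can be illustrated with a small user-space sketch. This is not the amdgpu code: read_words() and its parameters are hypothetical names chosen for illustration. It assumes the caller passes a byte offset and byte count, converts the offset to a 32-bit word index, and clamps the length so the copy never runs past the end of the source buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy up to `size` bytes, starting at byte offset
 * `pos`, out of a buffer holding `nwords` 32-bit words.
 * Returns the number of bytes copied (0 if `pos` is at or past the end).
 */
static size_t read_words(const uint32_t *src, size_t nwords,
			 size_t pos, size_t size, uint32_t *dst)
{
	size_t first, count;

	/* This sketch only handles word-aligned requests. */
	assert((pos % 4) == 0 && (size % 4) == 0);

	first = pos / 4;		/* byte offset -> 32-bit word index */
	if (first >= nwords)
		return 0;

	count = size / 4;
	if (count > nwords - first)	/* clamp: never read past the end */
		count = nwords - first;

	memcpy(dst, &src[first], count * sizeof(*src));
	return count * sizeof(*src);
}
```

Indexing src with pos directly, instead of pos / 4, would be the kind of bug the commit message describes; an oversized size is simply truncated here rather than rejected, which is one possible policy among several.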
Stanley.Yang
17cb04f2a6 drm/amdgpu: use amdgpu_ras.h in amdgpu_debugfs.c
include the amdgpu_ras.h header file instead of using an extern
declaration of the ras_debugfs_create_all function

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Hawking Zhang
06dcd7eb83 drm/amdgpu: check GFX RAS capability before reset counters
disallow the logic from being enabled on platforms that
don't support gfx ras at this stage, like sriov skus,
dgpu with legacy ras, etc.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:33 -04:00
John Clements
c2c6f816a8 drm/amdgpu: resolve failed error inject msg
invoking an error injection successfully will cause an at_event interrupt that
will occur before the invoke sequence can complete causing an invalid error

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:33 -04:00
Jack Zhang
5f87611582 drm/amdgpu/sriov refine vcn_v2_5_early_init func
refine the assignment for vcn.num_vcn_inst,
vcn.harvest_config, vcn.num_enc_rings in VF

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:33 -04:00
Evan Quan
063e768ebd drm/amdgpu: add fbdev suspend/resume on gpu reset
This can fix the baco reset failure seen on Navi10.
And this should be a low risk fix as the same sequence
is already used for system suspend/resume.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 09:20:31 -04:00
Tom St Denis
5bbc6604a6 drm/amd/amdgpu: Fix GPR read from debugfs (v2)
The offset into the array was specified in bytes but should
be in terms of 32-bit words.  Also prevent large reads that
would also cause a buffer overread.

v2:  Read from correct offset from internal storage buffer.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-03-13 09:20:31 -04:00
Dave Airlie
69ddce0970 Merge tag 'amd-drm-next-5.7-2020-03-10' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.7-2020-03-10:

amdgpu:
- SR-IOV fixes
- Fix up fallout from drm load/unload callback removal
- Navi, renoir power management watermark fixes
- Refactor smu parameter handling
- Display FEC fixes
- Display DCC fixes
- HDCP fixes
- Add support for USB-C PD firmware updates
- Pollock detection fix
- Rework compute ring priority handling
- RAS fixes
- Misc cleanups

amdkfd:
- Consolidate more gfx config details in amdgpu
- Consolidate bo alloc flags
- Improve code comments
- SDMA MQD fixes
- Misc cleanups

gpu scheduler:
- Add support for modifying the sched list

uapi:
- Clarify comments about GEM_CREATE flags that are not used by userspace.
  The kernel driver has always prevented userspace from using these.
  They are only used internally in the kernel driver.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310212748.4519-1-alexander.deucher@amd.com
2020-03-13 09:09:11 +10:00
Dave Airlie
9e12da086e Merge tag 'drm-misc-next-2020-03-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.7:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

Driver Changes:
 - fb-helper: Remove drm_fb_helper_{add,add_all,remove}_one_connector
 - fbdev: some cleanups and dead-code removal
 - Conversions to simple-encoder
 - zero-length array removal
 - Panel: panel-dpi support in panel-simple, Novatek NT35510, Elida
   KD35T133,

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309135439.dicfnbo4ikj4tkz7@gilmour
2020-03-12 12:42:56 +10:00
Dave Airlie
d3bd37f587 Merge v5.6-rc5 into drm-next

Requested by mripard for some misc patches that need this as a base.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-03-11 07:27:21 +10:00
Feifei Xu
5d11e37c02 drm/amdgpu/runpm: disable runpm on Vega10
Some framework tests fail if runpm is enabled on Vega10.
Disable it until the issue is fixed.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Tested-by: Kyle Chen <Kyle.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:55:18 -04:00
Tao Zhou
204eaac625 drm/amdgpu: call ras_debugfs_create_all in debugfs_init
and remove each ras IP's own debugfs creation

this is required to fix ras when the driver does not use the drm load
and unload callbacks due to ordering issues with the drm device node.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:55:11 -04:00
Tao Zhou
f9317014ea drm/amdgpu: add function to create all ras debugfs node
centralize all debugfs creation in one place for ras

this is required to fix ras when the driver does not use the drm load
and unload callbacks due to ordering issues with the drm device node.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:55:02 -04:00
xinhui pan
9fe58d0bbd drm/amdgpu: Correct the condition of warning while bo release
Only kernel bo has kfd eviction fence.
This warning is to give a notice that kfd only remove eviction fence on
individual bos.

Tested-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:54:42 -04:00
Yong Zhao
1d251d9008 drm/amdkfd: Consolidate duplicated bo alloc flags
The ALLOC_MEM_FLAGS_* values are the same as KFD_IOC_ALLOC_MEM_FLAGS_*,
but the two sets are intermixed in the kernel driver, which hurts
readability. For example, KFD_IOC_ALLOC_MEM_FLAGS_COHERENT is not
referenced in the kernel; it takes effect implicitly through
ALLOC_MEM_FLAGS_COHERENT, causing unnecessary confusion.

Replace all occurrences of ALLOC_MEM_FLAGS_* with
KFD_IOC_ALLOC_MEM_FLAGS_* to solve the problem.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:54:34 -04:00
Nirmoy Das
ea29221d1d drm/amdgpu: do not set nil entry in compute_prio_sched
If there are no high priority compute queues available then set normal
priority sched array to compute_prio_sched[AMDGPU_GFX_PIPE_PRIO_HIGH]

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:54:07 -04:00
Hawking Zhang
f1c2cd3f8f drm/amdgpu: correct ROM_INDEX/DATA offset for VEGA20
The ROMC_INDEX/DATA offset was changed to e4/e5 starting
with smuio_v11 (vega20/arcturus).

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Tested-by: Candice Li <Candice.Li@amd.com>
Reviewed-by: Candice Li <Candice.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 16:42:28 -04:00
Nirmoy Das
552b80d740 drm/amdgpu: remove unused functions
AMDGPU statically sets priority for compute queues
at initialization so remove all the functions
responsible for changing compute queue priority dynamically.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:48 -04:00
Nirmoy Das
2316a86bde drm/amdgpu: change hw sched list on ctx priority override
Switch to appropriate sched list for an entity on priority override.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:42 -04:00
Nirmoy Das
33abcb1f5a drm/amdgpu: set compute queue priority at mqd_init
We were changing compute ring priority while rings were being used
before every job submission which is not recommended. This patch
sets compute queue priority at mqd initialization for gfx8, gfx9 and
gfx10.

Policy: make queue 0 of each pipe as high priority compute queue

High/normal priority compute sched lists are generated from the set of high/normal
priority compute queues. At context creation, the entity of a compute queue
gets a sched list from the high or normal priority set depending on ctx->priority

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:24 -04:00
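The policy stated in 33abcb1f5a ("make queue 0 of each pipe as high priority compute queue") can be sketched in a few lines of C. The names below (enum prio, queue_prio(), count_queues()) are hypothetical and do not come from the driver; the sketch only shows how a fixed per-queue rule partitions the queues into the two sets from which the sched lists are built.

```c
#include <assert.h>
#include <stddef.h>

enum prio { PRIO_NORMAL = 0, PRIO_HIGH = 1, PRIO_COUNT = 2 };

/* Policy quoted in the commit message: queue 0 of each pipe is high priority. */
static enum prio queue_prio(unsigned int queue_in_pipe)
{
	return queue_in_pipe == 0 ? PRIO_HIGH : PRIO_NORMAL;
}

/* Count how many queues land in each priority class (illustrative only). */
static void count_queues(unsigned int num_pipes, unsigned int queues_per_pipe,
			 size_t counts[PRIO_COUNT])
{
	unsigned int pipe, queue;

	assert(counts != NULL);
	counts[PRIO_NORMAL] = 0;
	counts[PRIO_HIGH] = 0;

	for (pipe = 0; pipe < num_pipes; pipe++)
		for (queue = 0; queue < queues_per_pipe; queue++)
			counts[queue_prio(queue)]++;
}
```

Because the class depends only on the queue index, it can be decided once at initialization, which matches the commit's stated goal of not changing ring priority on every job submission.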
Andrey Grodzovsky
97f6a21bfa drm/amdgpu: Enter low power state if CRTC active.
CRTC in DPMS state off calls for low power state entry.
Support both atomic mode setting and pre-atomic mode setting.

v2: move comment

Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:50:52 -04:00
Monk Liu
cc9f2fba37 drm/amdgpu: disable clock/power gating for SRIOV
and disable MC resume in VCN2.0 as well;
those are not the VF driver's concern

Signed-off-by: darlington Opara <darlington.opara@amd.com>
Signed-off-by: Jinage Zhao <jiange.zhao@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:40:30 -05:00
Monk Liu
68430c6be5 drm/amdgpu: cleanup ring/ib test for SRIOV vcn2.0 (v2)
support IB test on dec/enc ring
disable ring test on dec/enc ring (MMSCH limitation)

v2: squash in unused variable warning fix

Signed-off-by: darlington Opara <darlington.opara@amd.com>
Signed-off-by: Jinage Zhao <jiange.zhao@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:40:30 -05:00
Monk Liu
dd26858a9c drm/amdgpu: implement initialization part on VCN2.0 for SRIOV
things that need to be done for VCN2.0 enablement on SRIOV:
1) use one dec ring and one enc ring
2) allocate the MM table for MMSCH usage
3) implement an SRIOV version of vcn_start which organizes vcn programming
in packet format, and implement start mmsch to run those
packets
4) the doorbell is changed for SRIOV

Signed-off-by: darlington Opara <darlington.opara@amd.com>
Signed-off-by: Jinage Zhao <jiange.zhao@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:56 -05:00
Monk Liu
fe44249186 drm/amdgpu: disable jpeg block for SRIOV
MMSCH doesn't support jpeg ring on SRIOV

Signed-off-by: Jinage Zhao <jiange.zhao@amd.com>
Signed-off-by: darlington Opara <darlington.opara@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:49 -05:00
Monk Liu
3569b6d19e drm/amdgpu: introduce mmsch v2.0 header
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:42 -05:00
Yong Zhao
fa5bde8056 drm/amdgpu: Use better names to reflect it is CP MQD buffer
Add "CP" to AMDGPU_GEM_CREATE_MQD_GFX9 to indicate it is only for CP MQD
buffer.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:18 -05:00
Andrey Grodzovsky
90f88cdd7c drm/amdgpu: Fix GPU reset error.
Problem:
During GPU reset PSP's sysfs was being wrongly reinitialized
during the call to amdgpu_device_ip_late_init, which was failing
with a duplicate error.
Fix:
Move psp_sysfs_init to psp_sw_init to avoid this. Add guards
in the sysfs file's read and write hooks against premature calls
if PSP has not finished initialization.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:32:24 -05:00
Jacob He
5e208eb62b drm/amdgpu: Update SPM_VMID with the job's vmid when application reserves the vmid
SPM accesses video memory according to SPM_VMID. It should be updated
with the job's vmid right before the job is scheduled. SPM_VMID is a
global resource.

Signed-off-by: Jacob He <jacob.he@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:32:16 -05:00
John Clements
1a2172b5ee drm/amdgpu: update page retirement sequence
check UMC status and exit prior to making an erroneous register access

this resolves unexpected behaviour with UMC indexing mode broadcasting writes

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:32:06 -05:00
Guchun Chen
d38c3ac716 drm/amdgpu: toggle DF-Cstate when accessing UMC ras error related registers
On arcturus, DF-Cstate needs to be toggled off/on
before and after accessing UMC error counter and
error address registers, otherwise, clearing such
registers may fail.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:59 -05:00
John Clements
1b3460a8b1 drm/amdgpu: increase atombios cmd timeout
mitigates race condition on BACO reset between GPU bootcode and driver reload

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:51 -05:00
Hawking Zhang
a61f41b177 drm/amdgpu: enable PCS error report on arcturus
add arcturus xgmi/wafl pcs err status group to support
PCS error detection and report on arcturus

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:43 -05:00
Hawking Zhang
ec01fe2dbf drm/amdgpu: enable PCS error report on VG20
Now driver will report XGMI/WAFL PCS error through
sysfs xgmi_wafl_err_count node on Vega20

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:35 -05:00
Hawking Zhang
18f36157f2 drm/amdgpu: add helper funcs to detect PCS error
Starting with vega20, hardware supports run-time detection
and reporting of XGMI/WAFL PCS ras errors. Add helper functions
to walk through every type of ras error and report any that
are found.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:28 -05:00
Pankaj Bharadiya
ff1f62d35b drm: Remove drm_fb_helper add, add all and remove connector calls
drm_fb_helper_{add,remove}_one_connector() and
drm_fb_helper_single_add_all_connectors() are dummy functions now
and serve no purpose. Hence remove their calls.

This is the preparatory step for removing the
drm_fb_helper_{add,remove}_one_connector() functions from
drm_fb_helper.h

This removal is done using the semantic patch below; unused variable
compilation warnings are fixed manually.

@@
@@

- drm_fb_helper_single_add_all_connectors(...);

@@
expression e1;
statement S;
@@
- e1 = drm_fb_helper_single_add_all_connectors(...);
- S

@@
@@

- drm_fb_helper_add_one_connector(...);

@@
@@

- drm_fb_helper_remove_one_connector(...);

Changes since v1:
* Squashed warning fixes into the patch that introduced the
  warnings (into 5/7) (Laurent, Emil, Lyude)

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305120434.111091-6-pankaj.laxminarayan.bharadiya@intel.com
2020-03-06 14:19:58 +01:00
Pankaj Bharadiya
2dea2d1182 drm: Remove unused arg from drm_fb_helper_init
The max connector argument for drm_fb_helper_init() isn't used anymore
hence remove it.

All the drm_fb_helper_init() calls are modified with the semantic
patch below.

@@
expression E1, E2, E3;
@@
-  drm_fb_helper_init(E1,E2, E3)
+  drm_fb_helper_init(E1,E2)

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305120434.111091-2-pankaj.laxminarayan.bharadiya@intel.com
2020-03-06 14:19:57 +01:00
Tianci.Yin
194bcf35bc drm/amdgpu: disable 3D pipe 1 on Navi1x
[why]
CP firmware decides to skip setting the state for 3D pipe 1 for Navi1x as there
is no use case.

[how]
Disable 3D pipe 1 on Navi1x.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-03-05 09:41:55 -05:00
Yintian Tao
2ab7e274b8 drm/amdgpu: clean wptr on wb when gpu recovery
TDR randomly fails because of a compute ring
test failure. If the compute ring wptr & 0x7ff(ring_buf_mask)
is 0x100 then after map mqd the compute ring rptr will be
synced with 0x100. And the ring test packet size is also 0x100.
Then after invocation of amdgpu_ring_commit, the cp will not
really handle the packet on the ring buffer because rptr is equal to wptr.

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:50:07 -05:00
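The failure described in 2ab7e274b8 can be modelled with a toy ring in which "rptr equals wptr" means "nothing to process". This is one reading of the commit message, not the driver code: the helper names are hypothetical, and the sketch assumes that after recovery the consumer's rptr is synced from the saved write pointer while the driver's own write pointer restarts at 0 before the test packet is added. Only the mask 0x7ff and the value 0x100 come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define RING_BUF_MASK 0x7ffu	/* ring_buf_mask quoted in the commit message */

/* Amount of ring data the consumer still has to process; 0 means "empty". */
static uint32_t ring_pending(uint32_t rptr, uint32_t wptr)
{
	return (wptr - rptr) & RING_BUF_MASK;
}

/*
 * Hypothetical model of one recovery: rptr is synced from the saved write
 * pointer, the driver's wptr restarts at 0, then a test packet of
 * `packet_size` is committed. Returns what the consumer sees as pending.
 */
static uint32_t pending_after_recovery(uint32_t saved_wptr, uint32_t packet_size)
{
	uint32_t rptr, wptr;

	assert(packet_size <= RING_BUF_MASK);	/* packet must fit in the ring */

	rptr = saved_wptr & RING_BUF_MASK;
	wptr = packet_size & RING_BUF_MASK;
	return ring_pending(rptr, wptr);
}
```

With a stale saved pointer of 0x100 and a 0x100 test packet, the model reports nothing pending, so the test packet is never consumed; with the saved pointer cleared to 0, the full 0x100 is pending. That is why only some recoveries fail: the collision needs the stale value to match the packet size.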
Andrey Grodzovsky
6863d60732 drm/amdgpu: Wrap clflush_cache_range with x86 ifdef
To avoid compile errors on other platforms.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:33:30 -05:00
Andrey Grodzovsky
57430471e2 drm/amdgpu: Add support for USBC PD FW download
Starts USBC PD FW download and reads back the latest FW version.

v2:
Move sysfs file creation to late init
Add locking around PSP calls to avoid concurrent access to PSP's C2P registers

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:33:24 -05:00
Andrey Grodzovsky
0dc93fd117 drm/amdgpu: Add USBC PD FW load to PSP 11
Add the programming sequence.

v2:
Make the download wait loop more efficient.
Move C2PMSG_CMD_GFX_USB_PD_FW_VER definition

v3: Fix lack of loop counter increment typo

v4: Remove superfluous status reg read

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:33:16 -05:00
Andrey Grodzovsky
95860efc44 drm/amdgpu: Add USBC PD FW load interface to PSP.
Used to load Power Delivery FW to PSP.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:33:09 -05:00
Hawking Zhang
1a0dd3d928 drm/amdgpu: correct ROM_INDEX/DATA offset for VEGA20
The ROMC_INDEX/DATA offset was changed to e4/e5 starting
with smuio_v11 (vega20/arcturus).

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Tested-by: Candice Li <Candice.Li@amd.com>
Reviewed-by: Candice Li <Candice.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:33:01 -05:00
Hawking Zhang
4a89ad9b39 drm/amdgpu: add reset_ras_error_count function for HDP
HDP ras error counters are dirty ones after cold reboot
Read operation is needed to reset them to 0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:32:54 -05:00
Hawking Zhang
279375c331 drm/amdgpu: add reset_ras_error_count function for GFX
GFX ras error counters are dirty ones after cold reboot
Read operation is needed to reset them to 0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:32:47 -05:00
Hawking Zhang
fe5211f19a drm/amdgpu: add reset_ras_error_count function for MMHUB
MMHUB ras error counters are dirty ones after cold reboot
Read operation is needed to reset them to 0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:32:40 -05:00
Hawking Zhang
86153f1be2 drm/amdgpu: add reset_ras_error_count function for SDMA
SDMA ras error counters are dirty ones after cold reboot
Read operation is needed to reset them to 0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:32:32 -05:00
jianzh
e7429606bb drm/amdgpu/sriov: Use VF-accessible register for gpu_clock_count
Navi12 VK CTS subtest timestamp.calibrated.dev_domain_test failed
because mmRLC_CAPTURE_GPU_CLOCK_COUNT register cannot be
written in VF due to security policy.

Solution: use a VF-accessible timestamp register pair
mmGOLDEN_TSC_COUNT_LOWER/UPPER for SRIOV case.

v2: according to Deucher Alexander's advice, switch to
mmGOLDEN_TSC_COUNT_LOWER/UPPER for both bare metal and SRIOV.

Signed-off-by: jianzh <Jiange.Zhao@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:32:23 -05:00
Tiecheng Zhou
8a43cf88b7 drm/amdgpu/sriov: skip programing some regs with new L1 policy
With new L1 policy, some regs are blocked at guest and they are
programed at host side. So skip programing the regs under sriov.

the regs are:
GCMC_VM_FB_LOCATION_TOP
GCMC_VM_FB_LOCATION_BASE
MMMC_VM_FB_LOCATION_TOP
MMMC_VM_FB_LOCATION_BASE
GCMC_VM_SYSTEM_APERTURE_HIGH_ADDR
GCMC_VM_SYSTEM_APERTURE_LOW_ADDR
MMMC_VM_SYSTEM_APERTURE_HIGH_ADDR
MMMC_VM_SYSTEM_APERTURE_LOW_ADDR
HDP_NONSURFACE_BASE
HDP_NONSURFACE_BASE_HI
GCMC_VM_AGP_TOP
GCMC_VM_AGP_BOT
GCMC_VM_AGP_BASE

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tiecheng Zhou <Tiecheng.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:31:54 -05:00
Samir Dhume
022b651816 drm/amdgpu: Rearm IRQ in Navi10 SR-IOV if IRQ lost
Ported from Vega10. SDMA stress tests sometimes see IRQ lost.

Signed-off-by: Samir Dhume <samir.dhume@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:28:34 -05:00
Monk Liu
341dfe9073 drm/amdgpu: stop using scratch_reg in IB test
scratch_reg0 is used by RLCG for register access
in the SRIOV case.

both CP firmware and driver can invoke RLCG to do
certain register accesses (through scratch_reg0/1/2/3),
but RLCG currently has no protection against races, so if two
clients do RLCG register accesses in parallel
they collide.

GFX IB test is a runtime work, so it is forbidden
to use scratch_reg0/1/2/3 during the IB test period

note:
Although this change could be limited to SRIOV,
it doesn't seem worth the effort to differentiate
bare-metal from SRIOV in the GFX IB test

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:28:25 -05:00
Monk Liu
752c683dbb drm/amdgpu: fix IB test MCBP bug
1)for gfx IB test we shouldn't insert DE meta data

2)we should make sure IB test finished before we
send event 3 to hypervisor otherwise the IDLE from
event 3 will preempt IB test, which is not designed
as a compatible structure for MCBP

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:28:11 -05:00
Tianci.Yin
f091c1c70e drm/amdgpu: disable 3D pipe 1 on Navi1x
[why]
CP firmware decides to skip setting the state for 3D pipe 1 for Navi1x as there
is no use case.

[how]
Disable 3D pipe 1 on Navi1x.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:28:00 -05:00
Chengming Gui
0cf64555fe drm/amdgpu: Add debugfs interface to set arbitrary sclk for navi14 (v2)
add debugfs interface amdgpu_force_sclk
to set arbitrary sclk for navi14

v2: Add lock

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:27:50 -05:00
Rohit Khaire
1da7d4a8ab drm/amdgpu: Write blocked CP registers using RLC on VF
This change programs CP_ME_CNTL and RLC_CSIB_* through RLC

Signed-off-by: Rohit Khaire <Rohit.Khaire@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:26:34 -05:00
Yintian Tao
1d21a84661 drm/amdgpu: clean wptr on wb when gpu recovery
TDR randomly fails because the compute ring test fails.
If the compute ring's wptr & 0x7ff (ring_buf_mask) is 0x100,
then after the MQD is mapped the compute ring rptr is
synced to 0x100. The ring test packet size is also 0x100,
so after amdgpu_ring_commit is invoked the CP does not
actually process the packet in the ring buffer because rptr is equal to wptr.
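
A minimal sketch of one reading of this failure, assuming the consumer
resumes from a stale write-pointer copy in writeback memory while the
driver restarts its own write pointer at 0. All struct and function names
here are hypothetical and only model the arithmetic; they are not the
amdgpu code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_BUF_MASK 0x7ffu /* mask quoted in the commit message */
#define TEST_PKT_SIZE 0x100u /* ring test packet size quoted in the message */

/* Hypothetical model of a ring; not the amdgpu structures. */
struct ring_model {
	uint32_t wptr;    /* driver-side write pointer */
	uint32_t wb_wptr; /* copy of wptr kept in writeback memory */
	uint32_t rptr;    /* consumer-side read pointer */
};

/* Assumption: on queue mapping the consumer resumes from the writeback copy. */
static void model_map_queue(struct ring_model *r)
{
	r->rptr = r->wb_wptr & RING_BUF_MASK;
}

/* Recovery restarts the driver write pointer; clearing the copy is optional. */
static void model_recover(struct ring_model *r, bool clear_wb)
{
	r->wptr = 0;
	if (clear_wb)
		r->wb_wptr = 0;
	model_map_queue(r);
}

/* Advance the write pointer by ndw entries and publish it. */
static void model_commit(struct ring_model *r, uint32_t ndw)
{
	assert(ndw <= RING_BUF_MASK);
	r->wptr = (r->wptr + ndw) & RING_BUF_MASK;
	r->wb_wptr = r->wptr;
}

/* The consumer treats rptr == wptr as "nothing to do". */
static bool model_has_work(const struct ring_model *r)
{
	return r->rptr != r->wptr;
}

/* Returns true if the test packet would be seen by the consumer. */
static bool model_ring_test_runs(bool clear_wb)
{
	struct ring_model r = { .wptr = 0x100, .wb_wptr = 0x100, .rptr = 0x100 };

	model_recover(&r, clear_wb);
	model_commit(&r, TEST_PKT_SIZE);
	return model_has_work(&r);
}
```

In this model, model_ring_test_runs(false) leaves rptr and wptr both at
0x100, so the packet is never picked up, while model_ring_test_runs(true)
clears the stale copy first and the packet becomes visible.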

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:25:57 -05:00
Maxime Ripard
83794ee6c1 Merge drm/drm-next into drm-misc-next
Daniel needs a few commits from drm-next.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2020-03-04 08:56:28 +01:00
Yintian Tao
6c26d558bf drm/amdgpu: release drm_device after amdgpu_driver_unload_kms
If we release drm_device before amdgpu_driver_unload_kms,
the warning below is raised. Therefore, we need to
release it after amdgpu_driver_unload_kms.
[   43.055736] Memory manager not clean during takedown.
[   43.055777] WARNING: CPU: 1 PID: 2807 at /build/linux-hwe-9KJ07q/linux-hwe-4.18.0/drivers/gpu/drm/drm_mm.c:913 drm_mm_takedown+0x24/0x30 [drm]
[   43.055778] Modules linked in: amdgpu(OE-) amd_sched(OE) amdttm(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hda_codec_generic nfit kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm ghash_clmulni_intel snd_seq_midi snd_seq_midi_event pcbc snd_rawmidi snd_seq snd_seq_device aesni_intel snd_timer joydev aes_x86_64 crypto_simd cryptd glue_helper snd soundcore input_leds mac_hid serio_raw qemu_fw_cfg binfmt_misc sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic floppy usbhid psmouse hid i2c_piix4 e1000 pata_acpi
[   43.055819] CPU: 1 PID: 2807 Comm: modprobe Tainted: G           OE     4.18.0-15-generic #16~18.04.1-Ubuntu
[   43.055820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[   43.055830] RIP: 0010:drm_mm_takedown+0x24/0x30 [drm]
[   43.055831] Code: 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 38 48 83 c7 38 48 39 c7 75 02 f3 c3 55 48 c7 c7 38 33 80 c0 48 89 e5 e8 1c 41 ec d0 <0f> 0b 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
[   43.055857] RSP: 0018:ffffae33c1393d28 EFLAGS: 00010286
[   43.055859] RAX: 0000000000000000 RBX: ffff9651b4a29800 RCX: 0000000000000006
[   43.055860] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9651bfc964b0
[   43.055861] RBP: ffffae33c1393d28 R08: 00000000000002a6 R09: 0000000000000004
[   43.055861] R10: ffffae33c1393d20 R11: 0000000000000001 R12: ffff9651ba6cb000
[   43.055863] R13: ffff9651b7f40000 R14: ffffffffc0de3a10 R15: ffff9651ba5c6460
[   43.055864] FS:  00007f1d3c08d540(0000) GS:ffff9651bfc80000(0000) knlGS:0000000000000000
[   43.055865] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.055866] CR2: 00005630a5831640 CR3: 000000012e274004 CR4: 00000000003606e0
[   43.055870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.055871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   43.055871] Call Trace:
[   43.055885]  drm_vma_offset_manager_destroy+0x1b/0x30 [drm]
[   43.055894]  drm_gem_destroy+0x19/0x40 [drm]
[   43.055903]  drm_dev_fini+0x7f/0x90 [drm]
[   43.055911]  drm_dev_release+0x2b/0x40 [drm]
[   43.055919]  drm_dev_unplug+0x64/0x80 [drm]
[   43.055994]  amdgpu_pci_remove+0x39/0x70 [amdgpu]
[   43.055998]  pci_device_remove+0x3e/0xc0
[   43.056001]  device_release_driver_internal+0x18a/0x260
[   43.056003]  driver_detach+0x3f/0x80
[   43.056004]  bus_remove_driver+0x59/0xd0
[   43.056006]  driver_unregister+0x2c/0x40
[   43.056008]  pci_unregister_driver+0x22/0xa0
[   43.056087]  amdgpu_exit+0x15/0x57c [amdgpu]
[   43.056090]  __x64_sys_delete_module+0x146/0x280
[   43.056094]  do_syscall_64+0x5a/0x120

v2: put drm_dev_put after pci_set_drvdata

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:22 -05:00
Yintian Tao
d2790e10d3 drm/amdgpu: no need to clean debugfs at amdgpu
drm_minor_unregister will invoke drm_debugfs_cleanup
to clean up all the child nodes under the primary minor node.
We don't need to invoke amdgpu_debugfs_fini and
amdgpu_debugfs_regs_cleanup to clean them up again.
Otherwise, a NULL pointer dereference like the one below is raised.
[   45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[   45.047256] PGD 0 P4D 0
[   45.047713] Oops: 0002 [#1] SMP PTI
[   45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G        W  OE     4.18.0-15-generic #16~18.04.1-Ubuntu
[   45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[   45.050651] RIP: 0010:down_write+0x1f/0x40
[   45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01
[   45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246
[   45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814
[   45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8
[   45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00
[   45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300
[   45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860
[   45.059221] FS:  00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000
[   45.060809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0
[   45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   45.065897] Call Trace:
[   45.066426]  debugfs_remove+0x36/0xa0
[   45.067131]  amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu]
[   45.068019]  amdgpu_debugfs_fini+0x2c/0x50 [amdgpu]
[   45.068756]  amdgpu_pci_remove+0x49/0x70 [amdgpu]
[   45.069439]  pci_device_remove+0x3e/0xc0
[   45.070037]  device_release_driver_internal+0x18a/0x260
[   45.070842]  driver_detach+0x3f/0x80
[   45.071325]  bus_remove_driver+0x59/0xd0
[   45.071850]  driver_unregister+0x2c/0x40
[   45.072377]  pci_unregister_driver+0x22/0xa0
[   45.073043]  amdgpu_exit+0x15/0x57c [amdgpu]
[   45.073683]  __x64_sys_delete_module+0x146/0x280
[   45.074369]  do_syscall_64+0x5a/0x120
[   45.074916]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

v2: remove all debugfs cleanup/fini code at amdgpu
v3: squash in unused variable removal

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:22 -05:00
Jacob He
460c484f24 drm/amdgpu: Initialize SPM_VMID with 0xf (v2)
SPM_VMID is a global resource; SPM accesses video memory according to
SPM_VMID. The initial value of SPM_VMID is 0, which is used by the kernel.
That means a UMD can overwrite the memory of VMID0 by enabling SPM, which
is really dangerous.

Initialize SPM_VMID with 0xf instead; at worst, this messes up another
user mode process.

v2: squash in indentation fix

Signed-off-by: Jacob He <jacob.he@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:21 -05:00
Emily Deng
89510a2737 drm/amdgpu/sriov: Use kiq to copy the gpu clock
For vega10 sriov, the register is blocked, so use the
copy data command to fix the issue.

v2: Rename amdgpu_kiq_read_clock to gfx_v9_0_kiq_read_clock.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:21 -05:00
Yong Zhao
fd7d08bad7 drm/amdkfd: Make get_tile_config() generic
Given we can query all the asic specific information from amdgpu_gfx_config,
we can make get_tile_config() generic.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:20 -05:00
Yong Zhao
94b5c215ce drm/amdgpu: Add num_banks and num_ranks to gfx config structure
The two members will be used by KFD later.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:20 -05:00
Dave Airlie
a2ae604da7 Merge tag 'amd-drm-next-5.7-2020-02-26' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.7-2020-02-26:

amdgpu:
- Rework VM update handling in preparation for HMM support
- HDCP srm support
- PSR fixes
- DC watermark fixes
- OLED panel support
- SR-IOV fixes
- BACO fixes
- Optimize debugging vram access
- RAS fixes
- Use BACO for runtime pm
- HDCP fixes
- XGMI fixes
- DDC fixes
- DC clock programming optimizations and fixes
- PSP fw loading sequence updates
- Drop DRIVER_USE_AGP
- Remove legacy drm load and unload callbacks

amdkfd:
- Add runtime pm support

radeon:
- Drop DRIVER_USE_AGP

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227043142.4075-1-alexander.deucher@amd.com
2020-02-28 15:40:26 +10:00
Christian König
bd2275eeed dma-buf: drop dynamic_mapping flag
Instead use the pin() callback to detect dynamic DMA-buf handling.
Since amdgpu is now migrated it doesn't make much sense to keep
the extra flag.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/353997/?series=73646&rev=1
2020-02-27 14:58:01 +01:00
Christian König
a448cb003e drm/amdgpu: implement amdgpu_gem_prime_move_notify v2
Implement the importer side of unpinned DMA-buf handling.

v2: update page tables immediately

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/353998/?series=73646&rev=1
2020-02-27 14:58:01 +01:00
Christian König
2d4dad2734 drm/amdgpu: add amdgpu_dma_buf_pin/unpin v2
This implements the exporter side of unpinned DMA-buf handling.

v2: fix minor coding style issues

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/353999/?series=73646&rev=1
2020-02-27 14:58:01 +01:00
Christian König
4993ba0263 drm/amdgpu: use allowed_domains for exported DMA-bufs
Avoid ping-ponging the buffers once we stop pinning DMA-buf
exports by using the allowed domains for exported buffers.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/353996/?series=73646&rev=1
2020-02-27 14:58:01 +01:00
Christian König
bb42df4662 dma-buf: add dynamic DMA-buf handling v15
On the exporter side we add optional explicit pinning callbacks, which are
called when the importer doesn't implement dynamic handling or move notification,
or needs the DMA-buf locked in place for its use case.

On the importer side we add an optional move_notify callback. This callback is
used by the exporter to inform the importers that their mappings should be
destroyed as soon as possible.

This allows the exporter to provide the mappings without the need to pin
the backing store.
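
The interplay described above can be sketched in plain C. This is a
userspace model under stated assumptions, not the kernel dma-buf API:
every sketch_* name is hypothetical, and locking, mapping caches, and
error handling are left out. It only shows the rule that importers
without a move_notify callback cause the buffer to be pinned, while
dynamic importers are told to drop their mappings before a move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_attachment;

/* Importer-side callbacks; move_notify is optional. */
struct sketch_importer_ops {
	void (*move_notify)(struct sketch_attachment *attach);
};

struct sketch_buffer {
	int pin_count; /* > 0: backing store must stay where it is */
	int location;  /* stand-in for the current placement */
};

struct sketch_attachment {
	struct sketch_buffer *buf;
	const struct sketch_importer_ops *ops; /* NULL: no dynamic handling */
	bool mapping_valid;
};

static bool sketch_is_dynamic(const struct sketch_attachment *a)
{
	return a->ops && a->ops->move_notify;
}

/* Attach and map: importers without move_notify get the buffer pinned. */
static void sketch_attach(struct sketch_attachment *a)
{
	assert(a->buf != NULL);
	if (!sketch_is_dynamic(a))
		a->buf->pin_count++;
	a->mapping_valid = true;
}

/* Exporter wants to move the backing store to new_location. */
static bool sketch_move(struct sketch_buffer *buf,
			struct sketch_attachment **atts, size_t n,
			int new_location)
{
	size_t i;

	if (buf->pin_count > 0)
		return false; /* pinned: the move is refused */

	/* Tell every dynamic importer that its mapping is going away. */
	for (i = 0; i < n; i++)
		if (sketch_is_dynamic(atts[i]))
			atts[i]->ops->move_notify(atts[i]);

	buf->location = new_location;
	return true;
}

/* Example importer callback: drop the mapping so it is rebuilt later. */
static void sketch_invalidate(struct sketch_attachment *attach)
{
	attach->mapping_valid = false;
}

static const struct sketch_importer_ops sketch_dynamic_ops = {
	.move_notify = sketch_invalidate,
};
```

With only dynamic importers attached, sketch_move() succeeds and each
importer sees its mapping invalidated. As soon as one importer without
move_notify attaches, the pin count is non-zero and the move is refused.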

v2: don't try to invalidate mappings when the callback is NULL,
    lock the reservation obj while using the attachments,
    add helper to set the callback
v3: move flag for invalidation support into the DMA-buf,
    use new attach_info structure to set the callback
v4: use importer_priv field instead of mangling exporter priv.
v5: drop invalidation_supported flag
v6: squash together with pin/unpin changes
v7: pin/unpin takes an attachment now
v8: nuke dma_buf_attachment_(map|unmap)_locked,
    everything is now handled backward compatible
v9: always cache when export/importer don't agree on dynamic handling
v10: minimal style cleanup
v11: drop automatic re-entry avoidance
v12: rename callback to move_notify
v13: add might_lock in appropriate places
v14: rebase on separated locking change
v15: add EXPERIMENTAL flag, some more code comments

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/353993/?series=73646&rev=1
2020-02-27 14:58:00 +01:00
Alex Deucher
c6385e503a drm/amdgpu: drop legacy drm load and unload callbacks
We've moved the debugfs handling into a centralized place
so we can remove the legacy load and unload callbacks.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:13 -05:00
Alex Deucher
405a1f9090 drm/amdgpu/display: split dp connector registration (v4)
Split into init and register functions to avoid a segfault
in some configs when the load/unload callbacks are removed.

v2:
- add back accidentally dropped has_aux setting
- set dev in late_register

v3:
- fix dp cec ordering

v4:
- squash in kdev reference fix

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:13 -05:00
Alex Deucher
d090e7db5a drm/amdgpu/display: move debugfs init into core amdgpu debugfs (v2)
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for display.

v2: add config guard for DC

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Harry Wentland <harry.wentland@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:13 -05:00
Alex Deucher
4074892967 drm/amdgpu: don't call drm_connector_register for non-MST ports
The core does this for us now.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:13 -05:00
Alex Deucher
fd23cfcc2e drm/amdgpu/ring: move debugfs init into core amdgpu debugfs
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for rings.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Alex Deucher
cd9e29e717 drm/amdgpu/firmware: move debugfs init into core amdgpu debugfs
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for firmware.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Alex Deucher
f9d64e6c4a drm/amdgpu/regs: move debugfs init into core amdgpu debugfs
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for register access files.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Alex Deucher
3f5cea671c drm/amdgpu/gem: move debugfs init into core amdgpu debugfs
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for gem.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Alex Deucher
24038d581c drm/amdgpu/fence: move debugfs init into core amdgpu debugfs
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for fence handling.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Alex Deucher
15997544a3 drm/amdgpu/sa: move debugfs init into core amdgpu debugfs
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for SA (sub allocator).

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Alex Deucher
a4c5b1bb7b drm/amdgpu/pm: move debugfs init into core amdgpu debugfs
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for pm.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Alex Deucher
c5820361da drm/amdgpu/ttm: move debugfs init into core amdgpu debugfs
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling.  Do this for ttm.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Alex Deucher
923ffa6b02 drm/amdgpu: rename amdgpu_debugfs_preempt_cleanup
to amdgpu_debugfs_fini.  It will be used for other things in
the future.

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:12 -05:00
Yong Zhao
8bdab6bb1c drm/amdgpu: Increase timeout on emulator to tenfold instead of twice
Since emulators are slower, some operations like flushing the TLB
through FM sometimes need more than twice the regular timeout of 100ms,
so increase the timeout to 1s on emulators.
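
The arithmetic can be illustrated with a small sketch. The names and the
polling loop are hypothetical; only the 100 ms base value and the tenfold
factor come from the message above. The example operation is assumed to
take 250 ms, which exceeds both the regular timeout and a doubled one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BASE_TIMEOUT_US 100000u /* regular timeout: 100 ms */
#define SKETCH_EMU_FACTOR      10u     /* tenfold on emulators */

/* Timeout budget in microseconds for real hardware or an emulator. */
static uint32_t sketch_usec_timeout(bool emu_mode)
{
	return emu_mode ? SKETCH_BASE_TIMEOUT_US * SKETCH_EMU_FACTOR
			: SKETCH_BASE_TIMEOUT_US;
}

/* Poll once per simulated microsecond until done() holds or time runs out. */
static bool sketch_wait(bool (*done)(uint32_t elapsed_us), uint32_t timeout_us)
{
	uint32_t t;

	assert(done != NULL);
	for (t = 0; t < timeout_us; t++)
		if (done(t))
			return true;
	return false;
}

/* Assumed slow operation: completes after 250 ms of simulated time. */
static bool sketch_done_after_250ms(uint32_t elapsed_us)
{
	return elapsed_us >= 250000u;
}
```

Here sketch_wait(sketch_done_after_250ms, sketch_usec_timeout(false))
gives up, and a 200 ms budget would too, while the 1 s emulator budget
is enough for the operation to complete.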

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:03 -05:00
Yong Zhao
e694530418 drm/amdkfd: Avoid ambiguity by indicating it's cp queue
The queues represented in queue_bitmap are only CP queues.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:20:05 -05:00
Divya Shikre
0c663695a6 drm/amd: Extend ROCt to surface UUID for devices that have them
Devices from Arcturus onwards will have their UUID exposed to Thunk.
Add the necessary functions to the kernel to propagate the UUID.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:18:17 -05:00
Kent Russell
944effd337 drm/amdgpu: Fix check for DPM when returning max clock
pp_funcs may not exist, while dpm may be enabled. This change ensures
that KFD topology will report the same as pp_dpm_sclk, as the conditions
for reporting them will be the same.

Otherwise, we may see the issue where KFD reports "100MHz" in topology
as the max speed, while DPM is working correctly.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:17:42 -05:00
Rohit Khaire
75ddb640e1 drm/amdgpu: Don't write GCVM_L2_CNTL* regs on navi12 VF
This change disables programming of GCVM_L2_CNTL* regs on VF.

Signed-off-by: Rohit Khaire <Rohit.Khaire@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:17:33 -05:00
Daniel Vetter
f3ed67395d drm/amdgpu: Drop DRIVER_USE_AGP
This doesn't do anything except auto-init drm_agp support when you
call drm_get_pci_dev(). Which amdgpu stopped doing with

commit b58c11314a
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Jun 2 17:16:31 2017 -0400

    drm/amdgpu: drop deprecated drm_get_pci_dev and drm_put_dev

No idea whether this was intentional or accidental breakage, but I
guess anyone who manages to boot a gpu this modern behind an agp
bridge deserves a prize. A prize I never expect anyone to ever collect
:-)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Xiaojie Yuan <xiaojie.yuan@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: "Tianci.Yin" <tianci.yin@amd.com>
Cc: "Marek Olšák" <marek.olsak@amd.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:17:33 -05:00
Tom St Denis
669e2f91e4 drm/amd/amdgpu: Add gfxoff debugfs entry
Write a 32-bit value of zero to disable GFXOFF and write a 32-bit
value of non-zero to enable GFXOFF.
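
A minimal userspace sketch of such a write. The debugfs path is an
assumption (the DRI minor number varies per system, debugfs must be
mounted, and root access is normally required), and the helper names are
hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Assumed location of the entry; the "0" is the DRI minor and may differ. */
#define SKETCH_GFXOFF_PATH "/sys/kernel/debug/dri/0/amdgpu_gfxoff"

/* Write one native-endian 32-bit value to path; 0 on success, -1 on error. */
int sketch_write_u32(const char *path, uint32_t value)
{
	int fd;
	ssize_t n;

	assert(path != NULL);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, &value, sizeof(value));
	close(fd);
	return n == (ssize_t)sizeof(value) ? 0 : -1;
}

/* Zero disables GFXOFF, any non-zero value enables it. */
int sketch_set_gfxoff(int enable)
{
	return sketch_write_u32(SKETCH_GFXOFF_PATH, enable ? 1u : 0u);
}
```

Calling sketch_set_gfxoff(0) would disable GFXOFF and
sketch_set_gfxoff(1) would enable it again, provided the assumed path
exists on the running kernel.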

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:17:33 -05:00
Nirmoy Das
c6fc97f9bc drm/amdgpu: use amdgpu_ring_test_helper when possible
amdgpu_ring_test_helper already handles ring->sched.ready correctly

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:17:33 -05:00
Christian König
42e5fee65e drm/amdgpu: add VM update fences back to the root PD v2
Add update fences to the root PD while mapping BOs.

Otherwise PDs freed during the mapping won't wait for
updates to finish and can cause corruptions.

v2: rebased on drm-misc-next

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 90b69cdc5f drm/amdgpu: stop adding VM updates fences to the resv obj
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Tested-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:17:33 -05:00
Nirmoy Das
6f9f960472 drm/amdgpu: cleanup amdgpu_ring_fini
Clean up amdgpu_ring_fini to check the prerequisites before changing ring->sched.ready

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:17:33 -05:00