Commit Graph

14353 Commits

Author SHA1 Message Date
Kevin Wang
4e01847c38 drm/amdgpu: optimize amdgpu device attribute code
unified amdgpu device attribute node functions:
1. add some helper functions to create amdgpu device attribute node.
2. create device node according to device attr flags on different VF mode.
3. rename some functions name to adapt a new interface.

v2:
1. remove ATTR_STATE_DEAD, ATTR_STATE_ALIVE enum.
2. rename callback function perform to attr_update.
3. modify some variable names

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-18 11:24:15 -04:00
Kevin Wang
a7f2810337 drm/amdgpu: add amdgpu_virt_get_vf_mode helper function
the swsmu or powerplay(hwmgr) need to handle task according to different VF mode,
this function to help query vf mode.

vf mode:
1. SRIOV_VF_MODE_BARE_METAL: the driver work on host  OS (PF)
2. SRIOV_VF_MODE_ONE_VF    : the driver work on guest OS with one VF
3. SRIOV_VF_MODE_MULTI_VF  : the driver work on guest OS with multi VF

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-18 11:23:52 -04:00
Jiange Zhao
728e7e0cd6 drm/amdgpu: Add autodump debugfs node for gpu reset v8
When GPU got timeout, it would notify an interested part
of an opportunity to dump info before actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
    (2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
          rename debugfs file to amdgpu_autodump,
          provide autodump_read as well,
          style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
    the node can be reopened; also, there is no need to wait for
    completion when no app is waiting for a dump.

v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
    add 'app_state_mutex' for race conditions:
	(1)Only 1 user can open this file node
	(2)wait_dump() can only take effect after poll() executed.
	(3)eliminated the race condition between release() and
	   wait_dump()

v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
    removed state checking in amdgpu_debugfs_wait_dump
    Improve on top of version 3 so that the node can be reopened.

v7: move reinit_completion into open() so that only one user
    can open it.

v8: remove complete_all() from amdgpu_debugfs_wait_dump().

Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-18 11:23:37 -04:00
John Clements
b7f0656a25 drm/amdgpu: Updated XGMI power down control support check
Updated SMC FW version check to determine if XGMI power down control is supported

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 17:43:59 -04:00
John Clements
5c23e9e05e drm/amdgpu: Update RAS XGMI error inject sequence
Disable XGMI link power down prior to issuing a XGMI RAS error

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 17:42:35 -04:00
John Clements
5e7067b24f drm/amdgpu: Add DPM function for XGMI link power down control
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 17:42:27 -04:00
John Clements
ab9c21124d drm/amdgpu: Add cmd to control XGMI link sleep
Added host to SMU FW cmd to enable/disable XGMI link power down

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 17:42:15 -04:00
Colin Ian King
29c1ec244c drm/amdgpu: remove redundant assignment to variable ret
The variable ret is being initializeed with a value that is never read
and it is being updated later with a new value. The initialization
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 16:42:45 -04:00
Yintian Tao
72d99b395f drm/amdgpu: turn back rlcg write for gfx_v10
There is no need to use amdgpu_mm_wreg_mmio_rlc()
during initialization time because this interface
is only designed for debugfs case to access the
registers which are only permitted by RLCG during
run-time. Therefore, turn back rlcg write for gfx_v10.
If we not turn back it, it will raise amdgpu load failure.
[   54.904333] amdgpu: SMU driver if version not matched
[   54.904393] amdgpu: SMU is initialized successfully!
[   54.905971] [drm] kiq ring mec 2 pipe 1 q 0
[   55.115416] amdgpu 0000:00:06.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx_0.0.0 test failed (-110)
[   55.118877] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v10_0> failed -110
[   55.126587] amdgpu 0000:00:06.0: amdgpu_device_ip_init failed
[   55.133466] amdgpu 0000:00:06.0: Fatal error during GPU init

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 16:42:45 -04:00
Evan Quan
cd598d6cfd drm/amd/powerplay: report correct AC/DC event based on ctxid V2
'ctxid' is used to distinguish different events raised from SMC.
0x3 and 0x4 are for AC and DC power mode.

V2: update the way to retrieve the ctxid and change the log level
    to debug

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 16:42:44 -04:00
Evan Quan
e528ccf932 drm/amd/powerplay: shutdown on HW CTF
To prevent further damage.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 16:42:44 -04:00
Evan Quan
9495220577 drm/amd/powerplay: try to do a graceful shutdown on SW CTF
Normally this(SW CTF) should not happen. And by doing graceful
shutdown we can prevent further damage.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 16:42:44 -04:00
Andrey Grodzovsky
73339a7154 drm/amdgpu: Add AQUIRE_MEM PACKET3 fields defintion
Add this for gfx10 and gfx9.

v2: Fix identation

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-14 16:42:43 -04:00
Dave Airlie
49eea1c657 Merge tag 'amd-drm-next-5.8-2020-05-12' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.8-2020-05-12:

amdgpu:
- Misc cleanups
- RAS fixes
- Expose FP16 for modesetting
- DP 1.4 compliance test fixes
- Clockgating fixes
- MAINTAINERS update
- Soft recovery for gfx10
- Runtime PM cleanups
- PSP code cleanups

amdkfd:
- Track GPU memory utilization per process
- Report PCI domain in topology

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200512213703.4039-1-alexander.deucher@amd.com
2020-05-14 13:21:33 +10:00
Jason Yan
37e4f052cc drm/amd/amdgpu: remove defined but not used 'crtc_offsets'
Fix the following gcc warning:

drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:65:18: warning: ‘crtc_offsets’
defined but not used [-Wunused-const-variable=]
 static const u32 crtc_offsets[6] =
                  ^~~~~~~~~~~~

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-11 18:04:09 -04:00
Leo (Hanghong) Ma
3528cae940 drm/amd/amdgpu: Update update_config() logic
[Why]
For MST case: when update_config is called to disable a stream,
this clears the settings for all the streams on that link.
We should only clear the settings for the stream that was disabled.

[How]
Clear the settings after the call to remove display is called.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-11 18:03:31 -04:00
Tom St Denis
2c60129469 drm/amd/amdgpu: Add missing GRBM bits for GFX 10.1
Requested bits for UMR support

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-11 18:03:25 -04:00
Tom St Denis
b0be3c3a25 drm/amd/amdgpu: add raven1 part to the gfxoff quirk list
On my raven1 system (rev c6) with VBIOS 113-RAVEN-114 GFXOFF is
not stable (resulting in large block tiling noise in some applications).

Disabling GFXOFF via the quirk list fixes the problems for me.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-11 18:03:14 -04:00
Jane Jian
feb000fdff drm/amd/powerplay: skip judging if baco support for Arcturus sriov
since for sriov, baco happens on host side, no need and meaning
to judge is baco.
also, since kiq reads strap0 in here, if kiq is not ready
or gpu reset(kiq resume) happens after this read, would fail
to read and wrongly set baco as true(1).

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-11 18:03:01 -04:00
Alex Deucher
b586154466 drm/amdgpu: only set DPM_FLAG_NEVER_SKIP for legacy ATPX BOCO
We only need to set DPM_FLAG_NEVER_SKIP for the legacy ATPX
BOCO case.  D3cold and BACO work as expected.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:33:32 -04:00
Alex Deucher
af27c649b6 drm/amdgpu: drop extra runtime pm handling in resume pmop
The core handles this for us.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:33:30 -04:00
Alex Deucher
deff2b024a drm/amdgpu: fix runpm logic in amdgpu_pmops_resume
We should be checking whether the driver enabled runtime pm
rather than whether the asic supports BOCO or BACO.  That said
in general they are equivalent unless the user has disabled
runpm or it has been disabled for a specific asic.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:33:27 -04:00
Alex Deucher
f0d6967808 drm/amdgpu: drop pm_runtime_set_active
The pci core handles this for us in pci_pm_init.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:33:21 -04:00
Alex Deucher
0da4a419a2 drm/amdgpu: implement soft_recovery for gfx10
Same as gfx9.  This allows us to kill the waves for hung
shaders.

Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:33:09 -04:00
Nirmoy Das
77f3a5cd70 drm/amdgpu: cleanup sysfs file handling
Create sysfs file using attributes.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:32:24 -04:00
Evan Quan
85625e6429 drm/amdgpu: enable hibernate support on Navi1X
BACO is needed to support hibernate on Navi1X.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:32:16 -04:00
Hawking Zhang
890900fe77 drm/amdgpu: use node_id and node_size to calcualte dram_base_address
physical_node_id * node_segment_size should be the
dram_base_address for current gpu node in xgmi config

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:32:10 -04:00
Hawking Zhang
999a69e275 drm/amdgpu: switch to common rlc_autoload helper
drop IP specific psp function for rlc autoload since
the autoload_supported was introduced to mark ASICs
that support rlc_autoload

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:32:03 -04:00
Hawking Zhang
c797c583e8 drm/amdgpu: drop unused ras ta helper function
cure posion command was replaced by ras recovery
solution and was not a formal command supported
by ras ta anymore

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:31:56 -04:00
Hawking Zhang
001a0a95ed drm/amdgpu: switch to common ras ta helper
TRIGGER_ERROR is common ras ta command for all the
ASICs that support RAS feature. switch to common helper
to avoid duplicate implementation per IP generation

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:31:49 -04:00
Hawking Zhang
35ccba4e9f drm/amdgpu: switch to common xgmi ta helpers
get_hive_id/get_node_id/get_topology_info/set_topology_info
are common xgmi command supported by TA for all the ASICs
that support xgmi link. They should be implemented as common
helper functions to avoid duplicated code per IP generation

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-08 14:31:37 -04:00
Dave Airlie
3fd911b69b drm-misc-next for 5.8:
UAPI Changes:
 
 Cross-subsystem Changes:
 
  * MAINTAINERS: restore alphabetical order; update cirrus driver
  * Dcomuentation: document visionix, chronteli, ite vendor prefices; update
                   documentation for Chrontel CH7033, IT6505, IVO, BOE,
 		  Panasonic, Chunghwa, AUO bindings; convert dw_mipi_dsi.txt
 		  to YAML; remove todo item for drm_display_mode.hsync removal;
 
 Core Changes:
 
  * drm: add devm_drm_dev_alloc() for managed allocations of drm_device;
         use DRM_MODESET_LOCK_ALL_*() in mode-object code; remove
         drm_display_mode.hsync; small cleanups of unused variables,
 	compiler warnings and static functions
  * drm/client: dual-lincensing: GPL-2.0 or MIT
  * drm/mm: optimize tree searches in rb_hole_addr()
 
 Driver Changes:
 
  * drm/{many}: use devm_drm_dev_alloc(); don't use drm_device.dev_private
  * drm/ast: don't double-assign to drm_crtc_funcs.set_config; drop
             drm_connector_register()
  * drm/bochs: drop drm_connector_register()
  * drm/bridge: add support for Chrontel ch7033; fix stack usage with
                old gccs; return error pointer in drm_panel_bridge_add()
  * drm/cirrus: Move to tiny
  * drm/dp_mst: don't use 2nd sideband tx slot; revert "Remove single tx
                msg restriction"
  * drm/lima: support runtime PM;
  * drm/meson: limit modes wrt chipset
  * drm/panel: add support for Visionox rm69299; fix clock on
               boe-tv101wum-n16; fix panel type for AUO G101EVN10;
 	      add support for Ivo M133NFW4 R0; add support for BOE
 	      NV133FHM-N61; add support for AUO G121EAN01.4, G156XTN01.0,
 	      G190EAN01
  * drm/pl111: improve vexpress init; fix module auto-loading
  * drm/stm: read number of endpoints from device tree
  * drm/vboxvideo: use managed PCI functions; drop DRM_MTRR_WC
  * drm/vkms: fix use-after-free in vkms_gem_create(); enable cursor
              support by default
  * fbdev: use boolean values in several drivers
  * fbdev/controlfb: fix COMPILE_TEST
  * fbdev/w100fb: fix double-free bug
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAl6ztuYACgkQaA3BHVML
 eiMWaAgAvYpXvBqBfE/5jOTa1zWQUX9VMbmMDUqSAgec5r/HVHjmlbsDWkLff2XG
 rH7T2shMNJwluoCleHfRFYUfyUWMmXoMgri1elnf4EgddmekLZ6vNQjtAYMRFUfL
 7VegcFw5D5Z28uxK5YGnaaSQob34LvD8UTgvnCVfDQzCjcEzGYi1d9XELm7kTHje
 dF2FOWPeF8FTqP0XHM8RrHNXW/pUPy9qErpmb2bWYBQp7ebkI2Q4p6IZIpRQm757
 bnKPIdMEcy4BWMk1xtzjEYEb4bGSZhozJUWKFwpm/AHNvTrQCt9hn0GSVrXiqBcR
 UKb42BRxEVR29C/4n10pUcnnQmmi+Q==
 =yzZ9
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2020-05-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.8:

UAPI Changes:

Cross-subsystem Changes:

 * MAINTAINERS: restore alphabetical order; update cirrus driver
 * Dcomuentation: document visionix, chronteli, ite vendor prefices; update
                  documentation for Chrontel CH7033, IT6505, IVO, BOE,
		  Panasonic, Chunghwa, AUO bindings; convert dw_mipi_dsi.txt
		  to YAML; remove todo item for drm_display_mode.hsync removal;

Core Changes:

 * drm: add devm_drm_dev_alloc() for managed allocations of drm_device;
        use DRM_MODESET_LOCK_ALL_*() in mode-object code; remove
        drm_display_mode.hsync; small cleanups of unused variables,
	compiler warnings and static functions
 * drm/client: dual-lincensing: GPL-2.0 or MIT
 * drm/mm: optimize tree searches in rb_hole_addr()

Driver Changes:

 * drm/{many}: use devm_drm_dev_alloc(); don't use drm_device.dev_private
 * drm/ast: don't double-assign to drm_crtc_funcs.set_config; drop
            drm_connector_register()
 * drm/bochs: drop drm_connector_register()
 * drm/bridge: add support for Chrontel ch7033; fix stack usage with
               old gccs; return error pointer in drm_panel_bridge_add()
 * drm/cirrus: Move to tiny
 * drm/dp_mst: don't use 2nd sideband tx slot; revert "Remove single tx
               msg restriction"
 * drm/lima: support runtime PM;
 * drm/meson: limit modes wrt chipset
 * drm/panel: add support for Visionox rm69299; fix clock on
              boe-tv101wum-n16; fix panel type for AUO G101EVN10;
	      add support for Ivo M133NFW4 R0; add support for BOE
	      NV133FHM-N61; add support for AUO G121EAN01.4, G156XTN01.0,
	      G190EAN01
 * drm/pl111: improve vexpress init; fix module auto-loading
 * drm/stm: read number of endpoints from device tree
 * drm/vboxvideo: use managed PCI functions; drop DRM_MTRR_WC
 * drm/vkms: fix use-after-free in vkms_gem_create(); enable cursor
             support by default
 * fbdev: use boolean values in several drivers
 * fbdev/controlfb: fix COMPILE_TEST
 * fbdev/w100fb: fix double-free bug

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507072503.GA10979@linux-uq9g
2020-05-08 15:17:08 +10:00
Dave Airlie
370fb6b0aa Merge tag 'amd-drm-next-5.8-2020-04-30' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.8-2020-04-30:

amdgpu:
- SR-IOV fixes
- SDMA fix for Navi
- VCN 2.5 DPG fixes
- Display fixes
- Display stuttering fixes for pageflip and cursor
- Add support for handling encrypted GPU memory
- Add UAPI for encrypted GPU memory
- Rework IB pool handling

amdkfd:
- Expose asic revision in topology
- Add UAPI for GWS (Global Wave Sync) resource management

UAPI:
- Add amdgpu UAPI for encrypted GPU memory
  Used by: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4401
- Add amdkfd UAPI for GWS (Global Wave Sync) resource management
  Thunk usage of KFD ioctl: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/roc-2.8.0/src/queues.c#L840
  ROCr usage of Thunk API: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/roc-3.1.0/src/core/runtime/amd_gpu_agent.cpp#L597
  HCC code using ROCr API: 98ee9f3494/lib/hsa/mcwamp_hsa.cpp (L2161)
  HIP code using HCC API: cf8589b8c8/src/hip_module.cpp (L567)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430212951.3902-1-alexander.deucher@amd.com
2020-05-08 13:31:08 +10:00
Chen Zhou
3852ee7953 drm/amd/display: remove duplicate headers
Remove duplicate headers which are included twice.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-07 16:25:14 -04:00
Jason Yan
b1c3b7f13e drm/amd/display: remove variable "result" in dcn20_patch_unknown_plane_state()
Fix the following coccicheck warning:

drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c:3216:16-22:
Unneeded variable: "result". Return "DC_OK" on line 3229

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-07 16:25:08 -04:00
Bernard Zhao
ecc8c2e193 drm/amd/amdgpu: cleanup coding style a bit
There is DEVICE_ATTR mechanism in separate attribute define.
So this change is to use attr array, also use
sysfs_create_files in init function & sysfs_remove_files in
fini function.
This maybe make the code a bit readable.

Signed-off-by: Bernard Zhao <bernard@vivo.com>

Changes since V1:
*Use DEVICE_ATTR mechanism

Link for V1:
*https://lore.kernel.org/patchwork/patch/1228076/

V2: make array const to fix build errors

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-07 16:25:03 -04:00
Simon Ser
e133020f92 drm/amd/display: add basic atomic check for cursor plane
This patch adds a basic cursor check when an atomic test-only commit is
performed. The position and size of the cursor plane is checked.

This should fix user-space relying on atomic checks to assign buffers to
planes.

Signed-off-by: Simon Ser <contact@emersion.fr>
Reported-by: Roman Gilg <subdiff@gmail.com>
References: https://github.com/emersion/libliftoff/issues/46
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <hwentlan@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-07 16:24:45 -04:00
Nicholas Kazlauskas
b931e199f1 drm/amd/display: Fix vblank and pageflip event handling for FreeSync
[Why]
We're sending the drm vblank event a frame too early in the case where
the pageflip happens close to VUPDATE and ends up blocking the signal.

The implementation in DM was previously correct *before* we started
sending vblank events from VSTARTUP unconditionally to handle cases
where HUBP was off, OTG was ON and userspace was still requesting some
DRM planes enabled. As part of that patch series we dropped VUPDATE
since it was deemed close enough to VSTARTUP, but there's a key
difference betweeen VSTARTUP and VUPDATE - the VUPDATE signal can be
blocked if we're holding the pipe lock.

There was a fix recently to revert the unconditional behavior for the
DCN VSTARTUP vblank event since it was sending the pageflip event on
the wrong frame - once again, due to blocking VUPDATE and having the
address start scanning out two frames later.

The problem with this fix is it didn't update the logic that calls
drm_crtc_handle_vblank(), so the timestamps are totally bogus now.

[How]
Essentially reverts most of the original VSTARTUP series but retains
the behavior to send back events when active planes == 0.

Some refactoring/cleanup was done to not have duplicated code in both
the handlers.

Fixes: 16f17eda8b ("drm/amd/display: Send vblank and user events at vsartup for DCN")
Fixes: 3a2ce8d66a ("drm/amd/display: Disable VUpdate interrupt for DCN hardware")
Fixes: 2b5aed9ac3 ("drm/amd/display: Fix pageflip event race condition for DCN.")

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-07 16:24:37 -04:00
John Clements
624e8c8703 drm/amdgpu: Fix bug in RAS invoke
Invoke sequence should abort when ras interrupt is detected before reading TA host shared memory

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-07 16:24:16 -04:00
ChenTao
7f6778b114 drm/amdgpu/navi10: fix unsigned comparison with 0
Fixes warning because size is uint32_t and can never be negtative

drivers/gpu/drm/amd/amdgpu/../powerplay/navi10_ppt.c:1296:12: warning:
comparison of unsigned expression < 0 is always false [-Wtype-limits]
   if (size < 0)

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: ChenTao <chentao107@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:51:39 -04:00
Felix Kuehling
fd9a9f8801 drm/amdgpu: Use GEM obj reference for KFD BOs
Releasing the AMDGPU BO ref directly leads to problems when BOs were
exported as DMA bufs. Releasing the GEM reference makes sure that the
AMDGPU/TTM BO is not freed too early.

Also take a GEM reference when importing BOs from DMABufs to keep
references to imported BOs balances properly.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:51:29 -04:00
Alex Deucher
1cba098761 drm/amdgpu: force fbdev into vram
We set the fb smem pointer to the offset into the BAR, so keep
the fbdev bo in vram.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207581
Fixes: 6c8d74caa2 ("drm/amdgpu: Enable scatter gather display support")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:51:25 -04:00
Evan Quan
74577c3a48 drm/amd/powerplay: perform PG ungate prior to CG ungate
Since gfxoff should be disabled first before trying to access those
GC registers.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:51:15 -04:00
Evan Quan
47891bf1da drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate
As this is already properly handled in amdgpu_gfx_off_ctrl(). In fact,
this unnecessary cancel_delayed_work_sync may leave a small time window
for race condition and is dangerous.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:51:05 -04:00
Evan Quan
2536c4b0dd drm/amdgpu: disable MGCG/MGLS also on gfx CG ungate
Otherwise, MGCG/MGLS will be left enabled.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:50:51 -04:00
Christian König
9d11eb0d0c drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2
This should speed up debugging VRAM access a lot.

v2: add HDP flush/invalidate

Unrevert: RAS issue at root of the issue has been addressed

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Signed-off-by: Kent Russell <kent.russell@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:50:40 -04:00
Jerry (Fangzhi) Zuo
85d4d684fe drm/amd/display: Add dm support for DP 1.4 Compliance edid corruption test
It works together with drm framework
"drm: Add support for DP 1.4 Compliance edid corruption test"

Add the edid validity check scenario when edid base block is read back
with error. Send back real edid checksum and enable fail-safe mode in DC.

Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:50:25 -04:00
Arnd Bergmann
7fcffecf79 drm/amdgpu: allocate large structures dynamically
After the structure was padded to 1024 bytes, it is no longer
suitable for being a local variable, as the function surpasses
the warning limit for 32-bit architectures:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:587:5: error: stack frame size of 1072 bytes in function 'amdgpu_ras_feature_enable' [-Werror,-Wframe-larger-than=]
int amdgpu_ras_feature_enable(struct amdgpu_device *adev,
    ^

Use kzalloc() instead to get it from the heap.

Fixes: a0d254820f ("drm/amdgpu: update RAS TA to Host interface")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-05 13:12:55 -04:00
Andriy Gapon
bcb7b0ef82 amdgpu_acpi: add backlight control for the DC case
This uses backlight_device_set_brightness() to set the brightness
level requested via ATIF.

Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-05 13:12:55 -04:00
Nathan Chancellor
54b7feb93f drm/amdgpu: Avoid integer overflow in amdgpu_device_suspend_display_audio
When building with Clang:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4160:53: warning: overflow in
expression; result is -294967296 with type 'long' [-Winteger-overflow]
                expires = ktime_get_mono_fast_ns() + NSEC_PER_SEC * 4L;
                                                                  ^
1 warning generated.

Multiplication happens first due to order of operations and both
NSEC_PER_SEC and 4 are long literals so the expression overflows. To
avoid this, make 4 an unsigned long long literal, which matches the
type of expires (u64).

Fixes: 3f12acc8d6 ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3")
Link: https://github.com/ClangBuiltLinux/linux/issues/1017
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-05 13:12:55 -04:00