Generate xGMI iolink for upper level usage
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the iolink type defines according to the new thunk spec
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Thunk will generate the XGMI topology information when necessary with the hive_id
for each specified device
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Retrieve hive_id from amdgpu device
v2: compile fix
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KFD need to get hive id from amdgpu to build up the XGMI topology
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Driver will save an array of XGMI hive info, each hive will have a list of devices
that have the same hive ID.
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add dummy function for xgmi function interface with psp
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Place holder for XGMI support
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On hives with xgmi enabled, the fb_location aperture is a size
which defines the total framebuffer size of all nodes in the
hive. Each GPU in the hive has the same view via the fb_location
aperture. GPU0 starts at offset (0 * segment size),
GPU1 starts at offset (1 * segment size), etc.
For access to local vram on each GPU, we need to take this offset into
account. This including on setting up GPUVM page table and GART table
v2: squash in "drm/amdgpu: Init correct fb region for none XGMI configuration"
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Slava Abramov <slava.abramov@amd.com>
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Used to populate the xgmi info on vega20.
v2: PF_MAX_REGION is val - 1 (Ray)
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Slava Abramov <slava.abramov@amd.com>
Reviewed-by :Shaoyun liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by :Shaoyun liu <Shaoyun.liu@amd.com>
Initial pass at a structure to store xgmi info. xgmi is a high
speed cross gpu interconnect.
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Slava Abramov <slava.abramov@amd.com>
Reviewed-by :Shaoyun liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Correct the definition based on vega20 register spec
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
since we use PSP to program IH regs now
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we might run into a use after free during bulk move.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix SDMA hang in prt mode, clear XNACK_WATERMARK in reg SDMA0_UTCL1_WATERMK to avoid the issue
Affected ASICs: VEGA10 VEGA12 RV1 RV2
v2: add reg clear for SDMA1
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Tested-by: Yukun Li <yukun1.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
do_div expects the 1st argument in 64bit instead of 32bit.
Drop the usage of do_div as it seems unnecessary.
V2: drop usage of do_div completely
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch fixes following warnings.
./drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3011:
warning: Excess function parameter 'dev' description
in 'amdgpu_vm_get_task_info'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3012:
warning: Function parameter or member 'adev' not
described in 'amdgpu_vm_get_task_info'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3012:
warning: Excess function parameter 'dev' description
in 'amdgpu_vm_get_task_info'
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The intent of two commits was lost in the last rebase:
810955b drm/amdgpu: Fix acquiring VM on large-BAR systems
b5d21aa drm/amdgpu: Don't use shadow BO for compute context
This commit restores the original behaviour:
* Don't set AMDGPU_GEM_CREATE_NO_CPU_ACCESS for page directories
to allow them to be reused for compute VMs
* Don't create shadow BOs for page tables in compute VMs
v2: move more logic into amdgpu_vm_bo_param
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Kent Russell <Kent.Russell@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable the old AGP aperture to avoid GART mappings.
v2: don't enable it for SRIOV
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only initialize KFD once by moving amdgpu_amdkfd_init from
amdgpu_pci_probe to amdgpu_init. This fixes kernel oopses and hangs
when booting multi-GPU systems.
Also removed some vestiges of KFD being its own module.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Existing debug dump are all invariant, new “low 32-bit of address”
dump is not invariant
Signed-off-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
There are same purpose transition events.
[How]
remove the redundant event log.
Signed-off-by: Chiawen Huang <chiawen.huang@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The extraneous call to amdgpu_pm_compute_clocks is deprecated.
[How]
Remove it.
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: David Francis <David.Francis@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Virtual sink is used when set mode happens on a disconnected display
to allow the mode set to proceed. This did not work with MST because
the logic for acquiring stream encoder uses stream signal to determine
the special handling is required, and stream signal is virtual instead
of DP in this case.
[How]
Use link type to decide instead.
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The i2c and aux engines are similar, and should be placed
next to eachother for readability
[How]
Reorder the elements of the resource_pool struct
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Our implementation is functionally identical to DRM's
Note that instead of checking if the provided id is 0, the helper
follows through with the mode object search. However, It will still
return NULL, since 0 is not a valid object id, and missed searches
will return NULL.
[How]
Remove our implementation, and replace it with
drm_atomic_helper_best_encoder.
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: David Francis <David.Francis@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
AMD Stoney reference board, there are only 2 pipes (not include
underlay), and 3 connectors. resource creation, only
2 I2C/AUX engines are created. Within dc_link_aux_transfer, when
pin_data_en =2, refer to enengines[ddc_pin->pin_data->en] = NULL.
NULL point is referred later causing system crash.
[how]
each asic design has fixed number of ddc engines at hw side.
for each ddc engine, create its i2x/aux engine at sw side.
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some display need disconnect delay. Adding this parameter for future use
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Although 4 unique register values exist for gamma modes, two are
actually the same (the two RAMs) It’s not possible for caller to
understand this HW specific behavior, so some parsing is necessary
in driver
Signed-off-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Wesley Chalmers <Wesley.Chalmers@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]Update Code to get DTN golden log check to pass for tests run after
DAL217 tests.
[How]Change how dcn10_log_hw_state function prints HW state info
(CM_GAMUT_REMAP_Cx_Cx registers) when GAMUT REMAP is in bypass mode.
Signed-off-by: Gary Kattan <gary.kattan@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
We currently lock modeset by setting a boolean in dm. We want to lock
Based on what DC tells us.
[How]
Build stream_updates and plane_update based on what changed. Then we
call check_update_surfaces_for_stream() to get the update type
We lock only if update_type is not fast
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we won't be able to use the AGP aperture.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Start to use the old AGP aperture for system memory access.
v2: Move that to amdgpu_ttm_alloc_gart
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Helper to figure out the location of the AGP BAR.
v2: fix a couple of bugs
v3: correctly add one to vram_end
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Correct sign extend the GMC addresses to 48bit.
v2: sign extending turned out easier than thought.
v3: clean up the defines and move them into amdgpu_gmc.h as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since we have a lot of FAQ on the VM state machine try to improve the
documentation by adding functions for each state move.
v2: fix typo in amdgpu_vm_bo_invalidated, use amdgpu_vm_bo_relocated in
one more place as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoid unlocking a lock we never locked.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allows us to avoid taking the spinlock in more places.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_vm_bo_* functions should come much later.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For under voltage, negative value will be applied to voltage
offset. Update the data type to cover this case.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Added vega20 overdrive support based on existing OD sysfs
APIs. However, the OD logics are simplified on vega20. So,
the behavior will be a little different and works only on
some limited levels.
V2: fix typo
fix commit description
revise error logs
add support for clock OD
V3: separate clock from voltage OD settings
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Vega20 does not appear to be affected by the same issue
as vega10. Enable the full stolen memory handling on
vega20. Reserve the appropriate size at init time to avoid
display artifacts and then free it at the end of init once
the new FB is up and running.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
vega12 does not appear to be affected by the same issue
as vega10. Enable the full stolen memory handling on
vega12. Reserve the appropriate size at init time to avoid
display artifacts and then free it at the end of init once
the new FB is up and running.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Raven does not appear to be affected by the same issue
as vega10. Enable the full stolen memory handling on
Raven. Reserve the appropriate size at init time to avoid
display artifacts and then free it at the end of init once
the new FB is up and running.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=106639
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No functional change, just rework it in order to adjust the
behavior on a per asic level. The problem is that on vega10,
something corrupts the lower 8 MB of vram on the second
resume from S3. This does not seem to affect Raven, other
gmc9 based asics need testing.
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add BOs to the idle state again and correctly clear the flag when
new BOs are added.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
First step to fix the LRU corruption, we accidentially tried to move things
on the LRU after dropping the lock.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Staring at the function for six hours, just to essentially move one line
of code. The problem was that the first list_cut_position call could result
in list2 pointing to la-la-land.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This at least allows to fail any subsequent IOCTLs with -ENODEV
after the device is gone.
Still this operation is not supported yet in graphic mode
and will lead at least to page faults and other issues.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit a7f91061c6.
Felix pointed out that we need to have the BOs mapped even before
amdgpu_vm_update_directories is called.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Always try to put the GART away from where VRAM is.
v2: correctly handle the 4GB limitation
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This caused a confusing error message, but there is functionally
no problem since the default method is DIRECT.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Move some KFD-related (but used in amdgpu_drv.c) definitions from
kfd_priv.h to kgd_kfd_interface.h so we don't need to include kfd_priv.h
in amdgpu_drv.c. This fixes a build failure when AMDGPU is enabled but
MMU_NOTIFIER is not.
This patch also disables KFD-related module options when HSA_AMD is not
enabled.
v2: rebase (Alex)
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kvmalloc_array uses __GFP_ZERO flag ensures that the returned address
is zeroed already, memset it to zero again afterwards is unnecessary,
and in this case buggy because we only clear the first entry.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The first parameter of list_cut_position() must point to an initialized
list.
Noticed thanks to KASAN pointing out something's fishy here.
Fixes: "drm/ttm: add bulk move function on LRU"
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of the larger one use the smaller hole in the MC address
space for the GART mappings.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 31625ccae4464b61ec8cdb9740df848bbc857a5b.
It triggered various badness on my development machine when running the
piglit gpu profile with radeonsi on Bonaire, looks like memory
corruption due to insufficiently protected list manipulations.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Most of the time we only need to know if the BO has a valid GMC addr.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Further separate GART and GTT domain.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Improve the VCE limitation handling.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move that into amdgpu_gmc.c since we are really deadling with GMC
address space here.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not used any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For compute vm acquired from amdgpu, vm.pasid is managed
by kfd. Decouple pasid from such vm on process destroy
to avoid duplicate pasid release.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To make a amdgpu vm to a compute vm, the old pasid will be freed and
replaced with a pasid managed by kfd. Kfd can't reuse original pasid
allocated by amdgpu because kfd uses different pasid policy with amdgpu.
For example, all graphic devices share one same pasid in a process.
v2: rebase (Alex)
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Doing it earlier hits a WARN_ON_ONCE in amdgpu_bo_gpu_offset.
Fixes: "drm/amdgpu: remove gart.table_addr"
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called
amdgpu_vm_bo_base_init, so won't called
list_add_tail(&base->bo_list, &bo->va). Even the bo was evicted,
it won't set the bo_base->moved.
3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called
list_move_tail(&base->vm_status, &vm->evicted), but not set the
bo_base->moved.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is
not set true, the function amdgpu_vm_bo_insert_map will call
list_move(&bo_va->base.vm_status, &vm->moved)
5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the
moved list, not in the evict list. So VMC page fault occurs.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It will be more safe to make full-acess include both phase1 and phase2.
Then accessing special registeris wherever at phase1 or phase2 will not
block any shutdown and suspend process under virtualization.
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Looks like a copy&paste error to me.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After merging KFD into amdgpu, move module parameters defined in KFD to
amdgpu_drv.c, where other module parameters are declared.
v2: add kernel-doc comments
v3: rebase and fix parameter variable name (Alex)
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After amdkfd is merged to amdgpu, CONFIG_HSA_AMD_MODULE no longer exists.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since KFD is only supported by single GPU driver, it makes sense to merge
amdgpu and amdkfd into one module. This patch is the initial step: merge
Kconfig and Makefile.
v2: also remove kfd from drm Kconfig
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The fault reports the page number where the fault happend and not
the exact faulty address. Update the print message to reflect that.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The dce_i2c_hw code contained four funtcions that were only
called in one place and did not have a clearly delineated
purpose.
[How]
Inline these functions, keeping the same functionality.
This is not a functional change.
The functions disable_i2c_hw_engine and release_engine_dce_hw were
pulled into their respective callers.
The most interesting part of this change is the acquire functions.
dce_i2c_hw_engine_acquire_engine was pulled into
dce_i2c_engine_acquire_hw, and dce_i2c_engine_acquire_hw was pulled
into acquire_i2c_hw_engine.
Some notes to show that this change is not functional:
-Failure conditions in any function resulted in a cascade of calls that
ended in a 'return NULL'.
Those are replaced with a direct 'return NULL'.
-The variable result is the one from dce_i2c_hw_engine_acquire_engine.
The boolean result used as part of return logic was removed.
-As the second half of dce_i2c_hw_engine_acquire_engine is only executed
if that function is returning true and therefore exiting the do-while
loop in dce_i2c_engine_acquire_hw, those lines were moved outside
of the loop.
Signed-off-by: David Francis <David.Francis@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
On PCO and up, whenever SMU receive message to indicate active
display count = 0. SMU will turn off 48MHZ TMDP reference clock
by writing to 1 TMDP_48M_Refclk_Driver_PWDN. Once this clock is
off, no PHY register will respond to register access. This means
our current sequence of notifying display count along with requesting
clock will cause driver to hang when accessing PHY registers after
displays count goes to 0.
[How]
Separate the PPSMC_MSG_SetDisplayCount message from the SMU messages
that request clocks, have display own sequencing of this message so
that we can send it at the appropriate time.
Do not redundantly power off HW when entering S3, S4, since display
should already be called to disable all streams. And ASIC soon be
powered down.
Signed-off-by: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The i2c code contains two structs that contain the same
information as i2c_payload
[How]
Replace references to those structs with references to
i2c_payload
dce_i2c_transaction_request->status was written to but never read,
so all references to it are removed
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Jordan Lazare <Jordan.Lazare@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Logging hardware state can be done by triggering a write to the
debugfs file. It would also be useful to be able to read the hardware
state from the debugfs file to be able to generate a clean log without
timestamps.
[How]
Usage: cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log
Threading is an obvious concern when dealing with multiple debugfs
operations and blocking on global state in dm or dc seems unfavorable.
Adding an extra parameter for the debugfs log context state is the
implementation done here. Existing code that made use of DTN_INFO
and its associated macros needed to be refactored to support this.
We don't know the size of the log in advance so it reallocates the
log string dynamically. Once the log has been generated it's copied
into the user supplied buffer for the debugfs. This allows for seeking
support but it's worth nothing that unlike triggering output via
dmesg the hardware state might change in-between reads if your buffer
size is too small.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Jordan Lazare <Jordan.Lazare@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Good spelling and grammar makes comments
more pleasant and clearer.
Linux has coding standards for comments
that we should try to follow.
[How]
Fix obvious spelling and grammar issues
Ensure all comments use '/*' and '*/' and multi-line comments
follow linux convention
Remove line-of-stars comments that do not separate sections
of code and comments referring to lines of code that have
since been removed
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
There is currently an intermittent hang from a memory leak in
DTN stress testing. It is caused by unfreed memory during driver
disable.
[How]
Do a dc_sink_release in the case that skips it incorrectly.
Signed-off-by: SivapiriyanKumarasamy <sivapiriyan.kumarasamy@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Now that we "scale" time delays correctly on Maximus (as of diags svn
r170115), the forced "35 ms" wait time now becomes 35 ms * 500 = 17.5
seconds, which is far too long. Even having to repeat polling a
register once causes excessive delays on Maximus.
[How]
Just use the regular wait time passed to the generic_reg_wait()
function. This is sufficient for Maximus now, and it also means that
there's one less "Maximus-only" code path in DAL.
Also disable the "REG_WAIT taking a while:" message on Maximus, since
things do take a while longer there and 1-2ms delays are not uncommon
(and nothing to worry about).
Signed-off-by: Ken Chalmers <ken.chalmers@amd.com>
Reviewed-by: Eric Bernstein <Eric.Bernstein@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
We have logging methods for printing hardware state for newer ASICs
but no way to trigger the log output.
[How]
Add support for triggering the output via writing to a debugfs file
entry. Log output currently goes into dmesg for convenience, but
accessing via a read should be possible later.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Jordan Lazare <Jordan.Lazare@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
drm_handle_vblank is deprecated. Use drm_crtc_handle_vblank instead.
Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com>
Reviewed-by: David Francis <David.Francis@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tony Cheng <tony.cheng@amd.com>
Reviewed-by: Steven Chiu <Steven.Chiu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The function pointers of the dce_i2c_hw struct were never
accessed from outside dce_i2c_hw.c and had only one version.
As function pointers take up space and make debugging difficult,
and they are not needed in this case, they should be removed.
[How]
Remove the dce_i2c_hw_funcs struct and make static all
functions that were previously a part of it. Reorder
the functions in dce_i2c_hw.c.
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Should work on Vega10 as well, but with an obvious performance hit.
Older APUs can be enabled as well, but will probably be more work.
v2: fix error checking
v3: use more general check
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Helper to get the PDE for a PD/PT.
v2: improve documentation
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the necessary handling.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a helper function to figure them out only once.
v2: fix typo with memset
v3: rebase on kfd changes (Alex)
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Just another leftover from radeon.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can't hold the mn_lock while allocating memory.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No more waiting for a fence done here.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. initialize kiq before initialize gfx ring.
2. set kiq ring ready immediately when kiq initialize
successfully.
3. split function gfx_v9_0_kiq_resume into two functions.
gfx_v9_0_kiq_resume is for kiq initialize.
gfx_v9_0_kcq_resume is for kcq initialize.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. initialize kiq before initialize gfx ring.
2. set kiq ring ready immediately when kiq initialize
successfully.
3. split function gfx_v8_0_kiq_resume into two functions.
gfx_v8_0_kiq_resume is for kiq initialize.
gfx_v8_0_kcq_resume is for kcq initialize.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Send all kcq unmap_queue packets and then wait for
complete.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are no any logical changes here.
1. if kcq can be enabled via kiq, we don't need to
do kiq ring test.
2. amdgpu_ring_test_ring function can be used to
sync the ring complete, remove the duplicate code.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Send all kcq unmap_queue packets and then wait for
complete.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are no any logical changes here.
1. if kcq can be enabled via kiq, we don't need to
do kiq ring test.
2. amdgpu_ring_test_ring function can be used to
sync the ring complete, remove the duplicate code.
v2: alloc 6 (not 7) dws for unmap_queues
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As we have unify powergate_uvd/vce/mmhub to set_powergating_by_smu,
and set_powergating_by_smu was supported by both dpm and powerplay.
so remove the else case.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when hw_fini/suspend, smu only need to power on uvd block
if uvd pg is supported, don't need to call uvd to do hw_init.
v2: fix typo in patch descriptions and comments.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For SI/Kv, the power state is managed by function
amdgpu_pm_compute_clocks.
when dpm enabled, we should call amdgpu_pm_compute_clocks
to update current power state instand of set boot state.
this change can fix the oops when kfd driver was enabled on Kv.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Forgot to add vce pg support via smu for Kaveri/Mullins.
Fixes: 561a5c83eadd ("drm/amd/pp: Unify powergate_uvd/vce/mmhub
to set_powergating_by_smu")
v2: refine patch descriptions suggested by Michel
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when ac/dc switch, driver will be notified by acpi event.
then the power source will be updated. so don't need to
get power source when set power state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is required by gfx hw and can fix the rlc hang when
do s3 stree test on Cz/St.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hang Zhou <hang.zhou@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set the VM size based on system memory size between the ASIC-specific
limits given by min_vm_size and max_bits. GFXv9 GPUs will keep their
default VM size of 256TB (48 bit). Only older GPUs will adjust VM size
depending on system memory size.
This makes more VM space available for ROCm applications on GFXv8 GPUs
that want to map all available VRAM and system memory in their SVM
address space.
v2:
* Clarify comment
* Round up memory size before >> 30
* Round up automatic vm_size to power of two
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Try to kill waves on the SQ.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Try to kill waves on the SQ.
v2: only for the GFX ring for now.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Try to kill waves on the SQ.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of hammering hard on the GPU try a soft recovery first.
v2: reorder code a bit
v3: increase timeout to 10ms, increment GPU reset counter
v4: squash in compile fix (Christian)
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
The new bulk moving functionality is ready, the overhead of moving PD/PT bos to
LRU is fixed. So move them on LRU again.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
I continue to work for bulk moving that based on the proposal by Christian.
Background:
amdgpu driver will move all PD/PT and PerVM BOs into idle list. Then move all of
them on the end of LRU list one by one. Thus, that cause so many BOs moved to
the end of the LRU, and impact performance seriously.
Then Christian provided a workaround to not move PD/PT BOs on LRU with below
patch:
Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae ("drm/amdgpu: band aid
validating VM PTs")
However, the final solution should bulk move all PD/PT and PerVM BOs on the LRU
instead of one by one.
Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need to be
validated we move all BOs together to the end of the LRU without dropping the
lock for the LRU.
While doing so we note the beginning and end of this block in the LRU list.
Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything to do,
we don't move every BO one by one, but instead cut the LRU list into pieces so
that we bulk move everything to the end in just one operation.
Test data:
+--------------+-----------------+-----------+---------------------------------------+
| |The Talos |Clpeak(OCL)|BusSpeedReadback(OCL) |
| |Principle(Vulkan)| | |
+------------------------------------------------------------------------------------+
| | | |0.319 ms(1k) 0.314 ms(2K) 0.308 ms(4K) |
| Original | 147.7 FPS | 76.86 us |0.307 ms(8K) 0.310 ms(16K) |
+------------------------------------------------------------------------------------+
| Orignial + WA| | |0.254 ms(1K) 0.241 ms(2K) |
|(don't move | 162.1 FPS | 42.15 us |0.230 ms(4K) 0.223 ms(8K) 0.204 ms(16K)|
|PT BOs on LRU)| | | |
+------------------------------------------------------------------------------------+
| Bulk move | 163.1 FPS | 40.52 us |0.244 ms(1K) 0.252 ms(2K) 0.213 ms(4K) |
| | | |0.214 ms(8K) 0.225 ms(16K) |
+--------------+-----------------+-----------+---------------------------------------+
After test them with above three benchmarks include vulkan and opencl. We can
see the visible improvement than original, and even better than original with
workaround.
v2: move all BOs include idle, relocated, and moved list to the end of LRU and
put them together.
v3: remove unused parameter and use list_for_each_entry instead of the one with
save entry.
v4: move the amdgpu_vm_move_to_lru_tail after command submission, at that time,
all bo will be back on idle list.
v5: remove amdgpu_vm_move_to_lru_tail_by_list(), use bulk_moveable instread of
validated, and move ttm_bo_bulk_move_lru_tail() also into
amdgpu_vm_move_to_lru_tail().
v6: clean up and fix return value.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This function allow us to bulk move a group of BOs to the tail of their LRU.
The positions of group of BOs are stored on the (first, last) bulk_move_pos
structure.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When move a BO to the end of LRU, it need remember the BO positions.
Make sure all moved bo in between "first" and "last". And they will be bulk
moving together.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a helper to get the root PD address and remove the workarounds from
the GMC9 code for that.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can easily figure out the address on the fly.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sed -i "s/gart.robj/gart.bo/" drivers/gpu/drm/amd/amdgpu/*.c
sed -i "s/gart.robj/gart.bo/" drivers/gpu/drm/amd/amdgpu/*.h
Just cleaning up radeon leftovers.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move setting the GART addr for window based copies into the TTM code who
uses it.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a helper function for getting the root PD addr and cleanup join the
two VM related functions and cleanup the function name.
No functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Preparation for following changes. This validates the root PD twice,
but the overhead of that should be minimal.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check if we should call the function instead of providing the forced
flag.
v2: rebase on KFD changes (Alex)
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: d790449835e6 ("drm/amdgpu: use kiq to do invalidate tlb")
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds support for LVDS displays.
v2: add support for spread spectrum, sink detect
v3: clean up enable_lvds_output
v4: fix up link_detect
v5: remove assert on 888 format
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105880
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When in gpu reset, don't use kiq, it will generate more TDR.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
do not remove entity from the rq if the current rq is from
the least loaded scheduler.
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The if statement isn't indented and it makes static checkers complain.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix comile warning like,
CC [M] drivers/gpu/drm/i915/gvt/execlist.o
CC [M] drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.o
CC [M] drivers/gpu/drm/radeon/btc_dpm.o
CC [M] drivers/isdn/hisax/avm_a1p.o
CC [M] drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_dpp.o
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c: In function ‘dcn10_update_mpcc’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:1903:9: warning: missing braces around initializer [-Wmissing-braces]
struct mpcc_blnd_cfg blnd_cfg = {0};
^
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:1903:9: warning: (near initialization for ‘blnd_cfg.black_color’) [-Wmissing-braces]
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For sriov, don't use kiq in exclusive mode, as don't know how long time
it will take, some times it will occur exclusive timeout.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the old doorbell range setting until the driver is
able to support more sdma queues.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In function ‘gfx_v9_0_check_fw_write_wait’:
warning: enumeration value ‘CHIP_TAHITI’ not handled in switch [-Wswitch]
Always add default case in case there is no match
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use a fixed number of entities for each hardware IP.
The number of compute entities is reduced to four, SDMA keeps it two
entities and all other engines just expose one entity.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move the code into a separate function.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The flag will prevent another thread from same process to
reinsert the entity queue into scheduler's rq after it was already
removewd from there by another thread during drm_sched_entity_flush.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no need for gpu full access for suspend phase1
because under virtualization there is no hw register access
for dce block.
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To avoid the tlb flush not interrupted by world switch, use kiq and one
command to do tlb invalidate.
v2:
Refine the invalidate lock position.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-and-Tested-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Unify bare metal and sriov, and add firmware checking for
reg write and reg wait unify command.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Existing DTN infrastructure in driver is hacky. It uses implicit log
names, and also incorrect escape ID.
[how]
- Implement using generic DTN escape ID.
- Move file logging functionality from driver to to script; driver now outputs to string/buffer
- Move HWSS debug functionality to separate c file
- Add debug functionalty for per-block logging as CSV
- Add pretty print in python
Signed-off-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
During S4/S3 stress test it is possible to resume from S4 without
calling mode set on eDP, meaning high level optimization flag is not
reset. If this is followed by an S3 resume call, driver will see
optimization flag is set and consume it and think backend is powered
on when in fact it is not.
This results in PHY being off in sequence where
S4->Resume->S3->Resume->ApplyOpt->black screen.
[How]
Move optimization flag to stream instead of a DC flag.
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
There are two versions of the hw function pointers: one for dce80
and one for all other versions. These paired functions are
nearly identical. dce80 and dce100 should not require
different i2c access functions.
[How]
Combine each pair of functions into a single function. Mostly
the new functions are based on the dce100 versions as those
versions are newer, support more features, and
were more maintained.
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Define register for dcn10 for future changes
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add csc_transform struct to dc_stream_update, and program if set when
updating streams
Signed-off-by: SivapiriyanKumarasamy <sivapiriyan.kumarasamy@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
I2C code did not match dc resource model and was generally
unpleasant
[How]
Move code into new svelte dce_i2c files, replacing various i2c
objects with two structs: dce_i2c_sw and dce_i2c_hw. Fully split
sw and hw code paths. Remove all redundant declarations. Use
address lists to distinguish between versions. Change dce80 code
to newer register access macros.
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Improve commit d796d844 (drm/radeon/kms: make hibernate work on IGPs) to
only migrate VRAM objects if the Linux kernel is actually built with
support for hibernation (suspend to disk).
Link: https://bugs.freedesktop.org/show_bug.cgi?id=100941
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Improve commit d796d844 (drm/radeon/kms: make hibernate work on IGPs) to
only migrate VRAM objects if the Linux kernel is actually built with
support for hibernation (suspend to disk).
The better solution is to get the information, if this is suspend or
hibernate, from `amdgpu_device_suspend()`, but that’s more involved, so
apply the simple solution first.
Link: https://bugs.freedesktop.org/show_bug.cgi?id=107277
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The status field must be 0 after FW is loaded.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
These codes are not used.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>