job->ctx actually is a fence_context of the entity
it belongs to, naming it as ctx is too vague, and
we'll need add amdgpu_ctx into the job structure
later.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As last resort try to evict BOs from the current working set into other
memory domains. This effectively prevents command submission failures when
VM page tables have been swapped out.
v2: fix typos
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
a. gart recovery should be ahead of ring test.
b. rename to amdgpu_ttm_recover_gart
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
spin lock instead of mutex for gtt list
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: corrected register offset shift
v3: rebase fixes
v4: fix firmware paths
add SI smc firmware versions for sysfs dump
remove unused function forward define
fix the tahiti specific value of DEEP_SLEEP_CLK_SEL field
fix to miss adding thermal controller
use vram_type instead of checking mem_gddr5 flag
fix incorrect index of CG_FFCT_0 register
fix incorrect reading method at si_get_current_pcie_speed
Signed-off-by: Maruthi Bayyavarapu <maruthi.bayyavarapu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds pcie port read/write entry, because it will be also
used on si dpm part.
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v4: rebase fixes
v5: use the generic nop fill
v6: rebase fixes
v7: rebase fixes
copy count fixes from Jonathan
general cleanup
add fill buffer implementation
v8: adapt write_pte and copy_pte to latest changes
Signed-off-by: Ken Wang <Qingqing.Wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sync switch buffer scheme with windows kmd for gfx v8,
step1: append a switch_buffer to the end of CS
v2:rebase on latest staging
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The old mechanism used a per-submission limit that didn't take previous
submissions within the same time frame into account. It also filled VRAM
slowly when VRAM usage dropped due to a big eviction or buffer deallocation.
This new method establishes a configurable MBps limit that is obeyed when
VRAM usage is very high. When VRAM usage is not very high, it gives
the driver the freedom to fill it quickly. The result is more consistent
performance.
It can't keep the BO move rate low if lots of evictions are happening due
to VRAM fragmentation, or if a big buffer is being migrated.
The amdgpu.moverate parameter can be used to set a non-default limit.
Measurements can be done to find out which amdgpu.moverate setting gives
the best results.
Mainly APUs and cards with small VRAM will benefit from this. For F1 2015,
anything with 2 GB VRAM or less will benefit.
Some benchmark results - F1 2015 (Tonga 2GB):
Limit MinFPS AvgFPS
Old code: 14 32.6
128 MB/s: 28 41
64 MB/s: 15.5 43
32 MB/s: 28.7 43.4
8 MB/s: 27.8 44.4
8 MB/s: 21.9 42.8 (different run)
Random drops in Min FPS can still occur (due to fragmented VRAM?), but
the average FPS is much better. 8 MB/s is probably a good limit for this
game & the current VRAM management. The random FPS drops are still to be
tackled.
v2: use a spinlock
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not of much use at the moment (we don't really use
the second ring either), but may be useful later.
Reviewed-by: JimQu <Jim.Qu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rather than using a hardcoded value. This allows
different versions to expose more or less rings.
No functional change.
Reviewed-by: JimQu <Jim.Qu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The old names were dragged over from radeon. The new ones
better match the naming conventions used in the driver.
No functional change.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make it more obvious what we are doing here.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
1. use mutex instead of spinlock for shadow list, since its process could
sleep.
2. move list_del to bo destroy phase.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
Checking if shadow is valid.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use shadow flag to judge which direction to sync.
V2:
Don't need bo pin, so remove it.
V3:
1. Split to two functions, one is backup_to_shadow, another is
restore_from_shadow.
2. Clean up previous shadow direction difinitions.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
add checking if need backup in amdgpu_bo_create.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It doesn't make much sense to create bigger commands first which we then need
to split into smaller one again. Just make sure the commands we create aren't
to big in the first place.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We don't need the gart mapping handling here any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Indicate if need to sync between bo and shadow, where sync to where.
V2:
Rename to backup_shadow
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
shadow bo is the shadow of a bo, which is always in GTT,
which can be used to backup the original bo.
V2:
reference shadow parent, shadow bo will be freed by who allocted him.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Now we can program a flip during a vertical blank period, if it's the
one targeted by the flip (or a later one). This allows simplifying
amdgpu_flip_work_func considerably.
agd: update dce_virtual.c as well.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can actually do way more than just the 64KB we currently used as default.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not used for a long time.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch introduces a new macro WREG32_FIELD which is used
to write to a register with a new value in a field. It's designed
to replace the pattern:
tmp = RREG32(mmFoo);
tmp &= ~REG__FIELD_MASK;
tmp |= new_value << REG__FIELD__SHIFT;
WREG32(mmFoo, tmp)
with:
WREG32_FIELD(Foo, FIELD, new_value);
Unlike WREG32_P() it understands offsets/masks and doesn't
require the caller to shift the value (or mask properly).
It's applied where suitable in the gfx_v8_0.c driver to start
with.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For virtual display feature, as there may be multiple GPUs,
for user could choose whiche GPU need to enable this feature, change
the type of virtual_display from int to char*. The variable will be set
like this virtual_display="xxxx:xx:xx.x;xxxx:xx:xx.x;".
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For virtual display feature, define on variable in amdgpu.ko. When want to
enable virtual display feature, need set the option "amdgpu.virtual_display=1".
And then disable vga render and crtc if have DCE engine.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
so that bo could be set to some pattern
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It is used to identify if the ip block is hang.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Supported starting on certain FW versions.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
10ms should be enough for now.
v2: fix some typos in CIK code
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes turning power and clock on when it is actually needed.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For manual UVD/VCE power and clock gating.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Was never used as far as I can see.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Create a GC_CAC_IND_INDEX/DATA pair of funcitons to program
all the CAC registers
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
which avoids job->vm_pd_addr be changed.
V2: pass job structure to amdgpu_vm_grab_id and amdgpu_vm_flush directly.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Keep the time we don't have a fence associated with the resource smaller.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Same problem as with the VM page tables. The user fence address must be
determined before the job is scheduled, not when the IB is executed.
This fixes a security problem where user fences could be used to overwrite
any part of VRAM.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add ability to specify instance in select_se_sh callback.
Defaults to 0xffffffff all over the driver.
(v2) Don't enable INSTANCE_BROADCAST by default
(v3) Style changes
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>