This patch fixes a bug where the first_pipe index passed into init_pipelines()
was a #define instead of the value that is passed into amdkfd by radeon
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jammy Zhou <Jammy.Zhou@amd.com>
This patch fixes a bug when calling to init_pipeline() interface.
The index that was passed to that function didn't take into account the
first_pipe value, which represents the first pipe index that is under amdkfd's
responsibility.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jammy Zhou <Jammy.Zhou@amd.com>
This patch replaces the two current amdkfd module parameters with a new one.
The current parameters that are being replaced are:
- Maximum number of HSA processes
- Maximum number of queues per process
The new parameter that replaces them is called "Maximum queues per device"
This replacement achieves two goals:
- Allows the user to have as many HSA processes as it wants (until
a maximum of 512 HSA processes in Kaveri).
- Removes the limitation the user had on maximum number of queues per HSA
process. E.g. the user can now have processes which only have one queue and
other processes which have hundreds of queues, while before the user
couldn't have more than 128 queues per process (as default).
The default value of the new parameter is 4096 (32 * 128, which were the
defaults of the old parameters). There is almost no additional GART memory
required for the default case. As a reminder, this amount of queues requires a
little bit below 4MB of GART memory.
v2:
In addition, This patch defines a new counter for queues accounting in the DQM
structure. This is done because the current counter only counts active queues
which allows the user to create more queues than the
max_num_of_queues_per_device module parameter allows.
However, we need the current counter for the runlist packet build process, so
the solution is to have a dedicated counter for this accounting.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Ben Goz <ben.goz@amd.com>
If the first queue created was failed on DQM then PQM should
unregister the process from DQM.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The work queue couldn't reliably prevent the SW ring buffer from
overflowing, so dmesg was spammed by
kfd kfd: Interrupt ring overflow, dropping interrupt.
messages when running e.g. the Atlantis Substance demo from
https://wiki.unrealengine.com/Linux_Demos on Kaveri.
Since the SW ring buffer doesn't actually do anything at this point, just
remove it for now. When actual interrupt processing code is added to
amdkfd, it should try to do things immediately and only defer to work
queues when necessary.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch changes kfd_ioctl() to be very similar to drm_ioctl().
The patch defines an array of amdkfd_ioctls, which maps IOCTL definition to the
ioctl function.
The kfd_ioctl() uses that mapping to call the appropriate ioctl function,
through a function pointer.
This patch also declares a new typedef for the ioctl function pointer.
v2: Renamed KFD_COMMAND_(START|END) to AMDKFD_...
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This patch reformats the ioctl definitions in kfd_ioctl.h to be similar to the
drm ioctls definition style.
v2: Renamed KFD_COMMAND_(START|END) to AMDKFD_...
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This patch moves the copy_to_user() and copy_from_user() calls from the
different ioctl functions in amdkfd to the general kfd_ioctl() function, as
this is a common code for all ioctls.
This was done according to example taken from drm_ioctl.c
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This patch fixes a bug where deallocate_vmid() didn't actually unmap the
VMID<-->PASID mapping (in the registers).
That can cause undefined behavior.
This bug only occurs in non-HWS mode.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This patch fixes a bug in DQM, where the MQD of a newly created compute queue
is not loaded to an HQD slot. As a result, the CP never reads packets from this
queue.
This bug happens only in non-HWS (hardware scheduling) mode. In HWS mode, the
CP is responsible of loading MQDs to HQDs slots.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Commit "amdkfd: use sizeof(long) granularity for the pasid bitmask" calculated
the number of longs it will need, but ended up allocating that number of
bytes rather than longs.
Fix that silly error and allocate the amount of data really required.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Original code sent always 0 as the index number of the node. This patch fixes
this bug by sending a variable which is incremented per node.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch fixes a device QCM bug, where the number of queues were not
counted correctly for the operation of update queue. The count was incorrect
as there was no regard to the previous state of the queue.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Since the user space may call open() more that once from the same process,
the aperture initialization should be moved from kfd_open()
Signed-off-by: Alexey Skidanov <Alexey.Skidanov@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch displays the firmware version of the microcode that is currently
running in the MEC.
This is needed for the HSA RT, so it could differentiate its behavior based on
fw version. e.g. workarounds for bugs in fw
v2: Send the KGD_ENGINE_MEC1 as a parameter to the get_fw_version()
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This patch checks if the process that opens the /dev/kfd device is 32-bit
process. If so, it returns -EPERM and prints a warning message in dmesg.
This is done to prevent 32-bit user processes from using amdkfd, and hence, HSA
features.
AMD's HSA userspace stack will also support only 64-bit processes on Linux.
Reviewed-by: Alexey Skidanov <alexey.skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
In function acquire_packet_buffer() we may return -ENOMEM. In that case, we
should set the *buffer_ptr to NULL, so that calling functions which check the
*buffer_ptr value as a criteria for success, will know that
acquire_packet_buffer() failed.
Reviewed-by: Alexey Skidanov <alexey.skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
srcu callbacks are running in atomic context, we can't allocate using
__GFP_WAIT.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
All the bit operations (such as find_first_zero_bit()) read sizeof(long) bytes
at a time. If we allocated less than sizeof(long) bytes for the bitmask we
would be accessing invalid memory when working with the bitmask.
Change the allocator to allocate sizeof(long) multiples for the bitmask.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This is dead code. We don't need to unbind here, we can just return
directly.
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The mqds array members are not freed when dqm is uninitialized.
Reviewed-by: Ben Goz <Ben.Goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The call to kernel_queue_uninit(NULL) will trigger a BUG(), and also the
error code is incorrect.
Fixes: 45102048f7 ('amdkfd: Add process queue manager module')
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
There is a typo here so the errors from kfd_bind_process_to_device()
are not detected.
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch removes the dependency of amdkfd upon DRM_AMDGPU symbol in amdkfd's
Kconfig file.
This is done because amdgpu driver is not yet upstreamed and therefore,
DRM_AMDGPU symbol is not present in any Kconfig file.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch fixes a compilation error when using certain configuration by
including the file io.h in kfd_doorbell.c
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
amdkfd uses cpu_relax() in its sync_with_hw() function. Because cpu_relax() is
defined as 'REP; NOP' on x86_64, it will block the CPU from servicing
IOMMU PPR requests.
This may cause a deadlock, because sync_with_hw() won't be completed
until the PPR request has been served.
Therefore, we need to use schedule() instead of cpu_relax() as it is the
minimum requirement to allow other threads to execute.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
struct device_process_node was allocated during process registration but
not released at process deregistration.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch was done due to sparse warning. It changes the definition of
doorbell_ptr in queue_properties to be with __iomem attribute, so it would
match the type which the doorbell module functions are returning.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
v3: Removed the use of internal typedefs, fixed debug prints, added checks
for parameters and moved to using doorbell address from user
v4: Extracted some of the code in the create queue ioctl to a different
function that may be also called from other ioctls in the future.
Also fixed the check of the ring size argument.
v5:
Add support for AQL queues creation to enable working with open-source HSA
runtime
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds the interrupt handling module, in kfd_interrupt.c, and its
related members in different data structures to the amdkfd driver.
The amdkfd interrupt module maintains an internal interrupt ring per amdkfd
device. The internal interrupt ring contains interrupts that needs further
handling. The extra handling is deferred to a later time through a workqueue.
There's no acknowledgment for the interrupts we use. The hardware simply queues
a new interrupt each time without waiting.
The fixed-size internal queue means that it's possible for us to lose
interrupts because we have no back-pressure to the hardware.
v3:
Move amdkfd from drm/radeon/ to drm/amd/
Change device init
Made sure spin lock is taken only if init is complete
Moved bool field to the end of the structure
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The queue scheduler divides into two sections, one section is process bounded
and the other section is device bounded.
The device bounded section is handled by this module.
The DQM module handles queue setup, update and tear-down from the device side.
It also supports suspend/resume operation.
v3: Changed device_init, added the use of the new gart allocation functions an
Added documentation.
v4:
Fixed a race in DQM queue scheduler where dqm->lock must be held when accessing
dqm->queue_count and dqm->processes_count. This fixes runlist IB allocation
failures when DQM is under load.
Fixed race in DQM queue destruction where queues being destroyed must be
removed from qpd->queues_list prior to preemption, or concurrent queue
creation activity may reschedule them while their MQD is destroyed.
Fixed EOP queue size setting in CP_HPD_EOP_CONTROL, because the size is
specified as (log2(size_dwords)-1). The previous calculation assumed the
size was specified in bytes, which caused interference between EOP queues
when multiple MEC pipelines were active.
v5:
Move amdkfd from drm/radeon/ to drm/amd/
Change format of mqd structure to match latest KV firmware
Add support for AQL queues creation to enable working with open-source HSA
runtime
Remove unused unmap_queue function
Various fixes (Style, typos)
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The queue scheduler divides into two sections, one section is process bounded
and the other section is device bounded.
The process bounded section is handled by this module. The PQM handles usermode
queue setup, updates and tear-down.
v3:
Used kernel parameter to limit queues per process instead of define
Added use of doorbell address from user
v4:
Modified pqm_create_queue so that only when creating usermode queues the
driver should return the queue properties to the userspace.
Added an info message print when no more queues can be opened because of the
queue per process limitation
v5:
Move amdkfd from drm/radeon/ to drm/amd/
Various fixes
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The packet manager module builds PM4 packets for the sole use of the CP
scheduler. Those packets are used by the HIQ to submit runlists to the CP.
v3:
Removed include of cik_mqds.h
Changed lower_32/upper_32 calls to use linux macros
Used new gart allocation functions
Added documentation
v5:
Move amdkfd from drm/radeon/ to drm/amd/
Change format of mqd structure to match latest KV firmware
Add support for AQL queues creation to enable working with open-source HSA
runtime
Always chain runlist if you have more than 1 process or if you have
over-subscription over the number of queues.
Various fixes (typos, style)
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This patch adds a new parameter to the amdkfd driver. This parameter enables
the user to select the scheduling policy of the CP. The choices are:
* CP Scheduling with support for over-subscription
* CP Scheduling without support for over-subscription
* Without CP Scheduling
Note that the third option (Without CP scheduling) is only for debug purposes
and bringup of new H/W. As such, it is _not_ guaranteed to work at all times on
all H/W versions.
v3: Fixed description of parameter, changed the permissions to read_only, added
a verification of the value and added documentation
v5: Set default sched_policy to HWS as it is now supported by firmware
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>