8 Commits

Author SHA1 Message Date
Dalit Ben Zoor
5809e18e02 habanalabs: remove redundant member from parser struct
The use_virt_addr member told the parser whether to treat the
addresses in the CB as virtual. We disabled it only when calling the
parser from the driver's memset device function, and since that call
has been removed, it should always be enabled.

Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-01 13:16:18 +03:00
Oded Gabbay
027d35d0b6 habanalabs: rename restore to ctx_switch when appropriate
This patch only renames certain variables and structure members,
and their accompanying comments.

This is done to better reflect the actions these variables and members
represent.

There is no functional change in this patch.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-25 20:15:42 +03:00
Oded Gabbay
cab8e3e20d habanalabs: improve error messages
This patch improves two error messages to help the user
better understand what error occurred.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-27 09:44:28 +02:00
Oded Gabbay
680cb3991c habanalabs: ratelimit warnings at start of IOCTLs
At the start of some IOCTLs we check if the device is disabled or in reset.
If it is, we return -EBUSY and print a message to kernel log.

Because these IOCTLs can be called at very high frequency, use ratelimit
to avoid spamming the kernel log. Also use the same type of message -
dev_warn - in all the relevant IOCTLs.
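
The idea can be sketched in user-space C. This is a hypothetical model, not
the driver's code: the struct names, the 5-second window, the burst of 10
and the message text are all assumptions made for illustration. In the
kernel, the same effect is normally obtained with a ratelimited print
helper such as dev_warn_ratelimited().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model of a windowed rate limiter. */
struct ratelimit {
	long interval_ms;   /* length of one window */
	int burst;          /* messages allowed per window */
	long window_start;  /* start time of the current window */
	int printed;        /* messages emitted in the current window */
	int suppressed;     /* messages dropped in total */
};

/* Return true if a message may be emitted at time now_ms. */
static bool ratelimit_allow(struct ratelimit *rl, long now_ms)
{
	assert(rl && rl->burst >= 0);

	/* Open a new window once the old one has expired. */
	if (now_ms - rl->window_start >= rl->interval_ms) {
		rl->window_start = now_ms;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/* Hypothetical stand-in for the device state checked by the IOCTLs. */
struct fake_device {
	bool disabled;
	bool in_reset;
	struct ratelimit rl;
	int warnings;       /* warnings actually written to the log */
};

/*
 * Check performed at the start of an IOCTL: always reject with -EBUSY
 * when the device is unusable, but only log while under the rate limit.
 */
static int ioctl_entry_check(struct fake_device *dev, long now_ms)
{
	if (!dev->disabled && !dev->in_reset)
		return 0;

	if (ratelimit_allow(&dev->rl, now_ms)) {
		fprintf(stderr, "warn: device is %s, rejecting IOCTL\n",
			dev->in_reset ? "in reset" : "disabled");
		dev->warnings++;
	}
	return -EBUSY;
}
```

Note that the return value does not depend on the limiter: every caller
still sees -EBUSY, and only the log output is throttled.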

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-05 13:53:22 +02:00
Oded Gabbay
cbaa99ed1b habanalabs: perform accounting for active CS
This patch adds accounting for active CS. Active means that the CS was
submitted to the H/W queues and was not completed yet.

This is necessary to support suspend operation. Because the device will be
reset upon suspend, we can only suspend after all active CS have been
completed. Hence, we need to perform accounting on their number.
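
A minimal user-space sketch of such accounting, using a C11 atomic
counter. The names (acct_device, cs_submitted, cs_completed, try_suspend)
are hypothetical, and returning -EBUSY from the suspend path is an
assumption; the patch may wait for completion or handle it differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical device state: only the active-CS counter is modeled. */
struct acct_device {
	atomic_int active_cs;   /* CS submitted to H/W and not yet completed */
};

/* Called when a CS is handed to the H/W queues. */
static void cs_submitted(struct acct_device *dev)
{
	atomic_fetch_add(&dev->active_cs, 1);
}

/* Called when the H/W reports that a CS has completed. */
static void cs_completed(struct acct_device *dev)
{
	int prev = atomic_fetch_sub(&dev->active_cs, 1);

	/* A completion without a matching submission is a bug. */
	assert(prev > 0);
}

/* Assumed policy: refuse to suspend while any CS is still active. */
static int try_suspend(struct acct_device *dev)
{
	return atomic_load(&dev->active_cs) > 0 ? -EBUSY : 0;
}
```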

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-03 15:13:15 +02:00
Oded Gabbay
af5f7eea45 habanalabs: soft-reset device if context-switch fails
This patch fixes a bug in the driver: if the TPC or MME remains in a
non-IDLE state even after all the command submissions are done (due to a
user bug or a malicious user), future command submissions fail in the
context-switch stage and the driver remains in "stuck" mode.

The fix is to do a soft-reset of the device in case the context-switch
fails, because the device should be IDLE during context-switch. If it is
not IDLE, then something is wrong and we should reset the compute engines.
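
The control flow described above can be modeled as follows. This is a
simplified user-space sketch with hypothetical names (engines, soft_reset,
context_switch); the -EBUSY return code and the assumption that a
soft-reset always brings both engines back to IDLE are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the compute engines' idle state. */
struct engines {
	bool tpc_idle;
	bool mme_idle;
	int soft_resets;    /* number of soft-resets performed */
};

/* Assumed to return both engines to IDLE. */
static void soft_reset(struct engines *e)
{
	assert(e);
	e->tpc_idle = true;
	e->mme_idle = true;
	e->soft_resets++;
}

/*
 * The device should be IDLE at context-switch time. If it is not, fail
 * this submission and reset the engines so later submissions can proceed.
 */
static int context_switch(struct engines *e)
{
	if (e->tpc_idle && e->mme_idle)
		return 0;

	soft_reset(e);
	return -EBUSY;
}
```

In this model only the submission that hits the busy engine fails; the
next context-switch succeeds instead of leaving the driver stuck.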

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-28 13:07:52 +01:00
Oded Gabbay
c216477363 habanalabs: add debugfs support
This patch adds debugfs support to the driver. It allows user-space to
display information that is contained in the internal structures of the
driver, such as:
- active command submissions
- active user virtual memory mappings
- number of allocated command buffers

It also enables the user to perform reads and writes through Goya's PCI
bars.

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18 09:46:46 +01:00
Oded Gabbay
eff6f4a0e7 habanalabs: add command submission module
This patch adds the main flow for the user to submit work to the device.

Each unit of work is described by a command submission object (CS). The CS
contains 3 arrays of command buffers: one for execution, and two for
context-switch (store and restore).

For each CB, the user specifies on which queue to put that CB. In case of
an internal queue, the entry doesn't contain a pointer to the CB but the
address in the on-chip memory that the CB resides at.

The driver parses some of the CBs to enforce security restrictions.

The user receives a sequence number that represents the CS object. The user
can then query the driver regarding the status of the CS, using that
sequence number.

In case the CS doesn't finish before the timeout expires, the driver will
perform a soft-reset of the device.
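
The sequence-number and timeout behaviour can be illustrated with a small
user-space model. All names here (cs_dev, cs_submit, cs_complete,
cs_query, MAX_CS) are hypothetical, and checking the timeout inside the
query call is a simplification chosen for this sketch; it is not a claim
about how the driver detects timeouts.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CS 8    /* assumed size of the tracking ring */

enum cs_status { CS_BUSY, CS_COMPLETED, CS_TIMED_OUT };

/* Hypothetical per-CS bookkeeping. */
struct cs_entry {
	uint64_t seq;          /* sequence number handed to the user */
	long deadline_ms;      /* time after which the CS has timed out */
	enum cs_status status;
};

struct cs_dev {
	struct cs_entry ring[MAX_CS];
	uint64_t next_seq;
	int soft_resets;       /* soft-resets triggered by timeouts */
};

/* Submit a CS and return the sequence number that represents it. */
static uint64_t cs_submit(struct cs_dev *dev, long now_ms, long timeout_ms)
{
	uint64_t seq = dev->next_seq++;
	struct cs_entry *e = &dev->ring[seq % MAX_CS];

	e->seq = seq;
	e->deadline_ms = now_ms + timeout_ms;
	e->status = CS_BUSY;
	return seq;
}

/* Mark a CS as completed, unless it has already timed out. */
static void cs_complete(struct cs_dev *dev, uint64_t seq)
{
	struct cs_entry *e = &dev->ring[seq % MAX_CS];

	assert(e->seq == seq);
	if (e->status == CS_BUSY)
		e->status = CS_COMPLETED;
}

/*
 * Query the status of a CS by sequence number. A CS still busy past its
 * deadline is marked timed out and counted as one soft-reset.
 */
static enum cs_status cs_query(struct cs_dev *dev, uint64_t seq, long now_ms)
{
	struct cs_entry *e = &dev->ring[seq % MAX_CS];

	assert(e->seq == seq);
	if (e->status == CS_BUSY && now_ms >= e->deadline_ms) {
		e->status = CS_TIMED_OUT;
		dev->soft_resets++;
	}
	return e->status;
}
```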

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18 09:46:45 +01:00