After removing the parsing of the command submission
when doing memset of the device memory, goya_validate_dma_pkt_host
is never called by the kernel, so there is no need to check
context id.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
use_virt_addr member was used for telling whether to treat the
addresses in the CB as virtual during parsing. We disabled it only
when calling the parser from the driver memset device function,
and since this call had been removed, it should always be enabled.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Routing device accesses to the host memory requires the usage of a base
offset, which is canceled by the iATU just before leaving the device.
The value of the base offset might be distinctive between different ASIC
types.
The manipulation of the addresses is currently used throughout the
driver code, and one should be aware to it whenever providing a host
memory address to the device.
This patch removes this manipulation from the driver common code, and
moves it to the ASIC specific functions that are responsible for
host memory allocation/mapping.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch renames four functions in the ASIC-specific functions section,
so it will be easier to differentiate them from the generic kernel
functions with the same name.
This will help in future code reviews, to make sure we don't use the
kernel functions directly.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
There is no need to parse the command submission when doing memset
of the device memory using the DMA engine because only the driver calls
the memset function and therefore, the CS is trusted and doesn't require
validation and patching.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The device's CPU accessible memory on host is managed in a dedicated
pool, except for 2 regions - Primary Queue (PQ) and Event Queue (EQ) -
which are allocated from generic DMA pools.
Due to address length limitations of the CPU, the addresses of all these
memory regions must have the same MSBs starting at bit 40.
This patch modifies the allocation of the PQ and EQ to be also from the
dedicated pool, to ensure compliance with the limitation.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch changes the ASIC interface function that changes the DRAM bar
window. The change is to return the old address that the DRAM bar pointed
to instead of an error code.
This simplifies the code that use this function (mainly in debugfs) to
restore the bar to the old setting.
This is also needed for easier support in future ASICs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch only does renaming of certain variables and structure members,
and their accompanied comments.
This is done to better reflect the actions these variables and members
represent.
There is no functional change in this patch.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch slightly changes the macros of RREG32 and WREG32, which are
used when reading or writing from registers.
Instead of directly calling a function in the common code from these
macros, the new code calls a function from the ASIC functions interface.
This change allows us to share much more code between real ASICs and
simulators, which in turn reduces the maintenance burden and
the chances for forgetting to port code between the ASIC files.
The patch also implements the hl_poll_timeout macro, instead of calling
the generic readl_poll_timeout macro. This is required to allow use of
this macro in the simulator files.
As a result from this change, more functions in goya.c are shared with the
simulator and therefore, should not be defined as static.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds missing fields of start address 0 and 1 in the bmon
parameter structure that is received from the user in the debug IOCTL.
Without these fields, the functionality of the bmon trace is broken,
because there is no configuration of the base address of the filter of the
bus monitor.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch re-factors goya_parse_cb_no_ext_queue() to make it more
readable by inverting the check inside the first if statement so the bulk
of the function won't be inside an if statement.
The patch also fixes a spelling error in the name of the function.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
pr_fmt() should be defined before including linux/printk.h, either
directly or indirectly, in order to avoid redefinition of the macro.
Currently the macro definition is in habanalabs.h, which is included in
many files, and that makes the addition/reorder of includes to be prone
to compilation errors.
This patch cancels this dependency by defining the macro only in the few
source files that use it.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
During hard-reset, contexts are closed as part of the tear-down process.
After a context is closed, the driver cleans up the page tables of that
context in the device's DRAM. This action is both dangerous and
unnecessary.
It is unnecessary, because the device is going through a hard-reset, which
means the device's DRAM contents are no longer valid and the device's MMU
is being reset.
It is dangerous, because if the hard-reset came as a result of a PCI
freeze, this action may cause the entire host machine to hang.
Therefore, prevent all device PTE updates when a hard-reset operation is
pending.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch makes some improvement in how IOCTLs behave when the device is
disabled or under reset.
The new code checks, at the start of every IOCTL, if the device is
disabled or in reset. If so, it prints an appropriate kernel message and
returns -EBUSY to user-space.
In addition, the code modifies the location of where the
hard_reset_pending flag is being set or cleared:
1. It is now cleared immediately after the reset *tear-down* flow is
finished but before the re-initialization flow begins.
2. It is being set in the remove function of the device, to make the
behavior the same with the hard-reset flow
There are two exceptions to the disable or in reset check:
1. The HL_INFO_DEVICE_STATUS opcode in the INFO IOCTL. This opcode allows
the user to inquire about the status of the device, whether it is
operational, in reset or malfunction (disabled). If the driver will
block this IOCTL, the user won't be able to retrieve the status in
case of malfunction or in reset.
2. The WAIT_FOR_CS IOCTL. This IOCTL allows the user to inquire about the
status of a CS. We want to allow the user to continue to do so, even if
we started a soft-reset process because it will allow the user to get
the correct error code for each CS he submitted.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes a bug in the implementation of the function that removes
the device.
The bug can happen when the device is removed but not the driver itself
(e.g. remove by the OS due to PCI freeze in Power architecture).
In that case, there maybe open users that are calling IOCTLs while the
device is removed. This is a possible race condition that the driver must
handle. Otherwise, a kernel panic may occur.
This race is prevented in the hard-reset flow, because the driver makes
sure the users are closed before continuing with the hard-reset. This
race can not occur when the driver itself is removed because the OS makes
sure all the file descriptors are closed.
The fix is to make sure the open users close their file descriptors and if
they don't (after a certain amount of time), the driver sends them a
SIGKILL, because the remove of the device can't be stopped.
The patch re-uses the same code that is called from the hard-reset flow.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
To make the memory ioctl code more readable, this patch moves the
legacy/debug code path of mmu-disabled to a separate function, which is
called (if necessary) from the main memory ioctl function.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch removes the enum value of ASIC_AUTO_DETECT because we can use
the validity of the pdev variable to know whether we have a real device or
a simulator. For a real device, we detect the asic type from the device ID
while for a simulator, the simulator code calls create_hdev() with the
specified ASIC type.
Set ASIC_INVALID as the first option in the enum to make sure that no
other enum value will receive the value 0 (which indicates a non-existing
entry in the simulator array).
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch does some refactoring in goya.c to make code more reusable
between goya code and the goya simulator code (which is not upstreamed).
In addition, the patch removes some dead functions from goya.c which are
not used by the current upstream code
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the ASIC-specific function for GOYA to configure the
coresight components.
Most of the components have an enabled/disabled flag, depending on whether
the user wants to enable the component or disable it.
For some of the components, such as ETR and SPMU, the user can also
request to read values from them. Those values are needed for the user to
parse the trace data.
The ETR configuration is also checked for security purposes, to make sure
the trace data is written to the device's DRAM.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Habanalabs ASICs use the ARM coresight infrastructure to support debug,
tracing and profiling of neural networks topologies.
Because the coresight is configured using register writes and reads, and
some of the registers hold sensitive information (e.g. the address in
the device's DRAM where the trace data is written to), the user must go
through the kernel driver to configure this mechanism.
This patch implements the common code of the IOCTL and calls the
ASIC-specific function for the actual H/W configuration.
The IOCTL supports configuration of seven coresight components:
ETR, ETF, STM, FUNNEL, BMON, SPMU and TIMESTAMP
The user specifies which component he wishes to configure and provides a
pointer to a structure (located in its process space) that contains the
relevant configuration.
The common code copies the relevant data from the user-space to kernel
space and then calls the ASIC-specific function to do the H/W
configuration.
After the configuration is done, which is usually composed
of several IOCTL calls depending on what the user wanted to trace, the
user can start executing the topology. The trace data will be written to
the user's area in the device's DRAM.
After the tracing operation is complete, and user will call the IOCTL
again to disable the tracing operation. The user also need to read
values from registers for some of the components (e.g. the size of the
trace data in the device's DRAM). In that case, the user will provide a
pointer to an "output" structure in user-space, which the IOCTL code will
fill according the to selected component.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch removes an extra ; after the closing brackets of a while loop.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Unmapping ptes in the device MMU on Palladium can take a long time, which
can cause a kernel BUG of CPU soft lockup.
This patch minimize the chances for this bug by sleeping a little between
unmapping ptes.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
GIT does not like extra blank lines at the end of the file, so this patch
removes those lines.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Because DMA #0 is now used by the user, remove the limitation of credits
from this channel. Without this patch, this channel is pretty much
unusable due to its very low bandwidth configuration.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds a new opcode to INFO IOCTL that returns the device status.
This will allow users to query the device status in order to avoid sending
command submissions while device is in reset.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch allows the user to modify the TPC PLL clock relaxation value
on-the-fly in order to reduce power consumption.
To enable this, the patch removes the protection from the specific
register that controls this behavior.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
On init or context switch, set TPC clock relaxation counter
register to a golden value.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Hard-reset of our device should never fail, due to dangers of permanent
damage to the H/W.
This patch removes the last place in the reset path where the driver might
exit before doing the actual reset.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch refactors the code that is responsible to set the DMA mask for
the device.
Upon each change of the dma mask, the driver will save the new value that
was set. This is needed in order to make sure we don't try to increase the
mask a second time, in case we failed in the first time. This is
especially relevant for Power machines, as that may cause a change in
configuration of the TVT which will break the device.
Goya will first try to set the device's dma mask to 39 bits, so that the
memory that is allocated on the host machine for communication with the
device's cpu will be in a bus address which is lower then 39 bits. Later,
Goya will try to increase that mask to 48 bits, but only if setting the
mask to 39 bits was successful.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes the implementation of suspend/resume of the device so that
upon resume of the device, the host won't crash due to PCI completion
timeout.
Upon suspend, the device is being reset due to PERST. Therefore, upon
resume, the driver must initialize the PCI controller as if the driver was
loaded. If the controller is not initialized and the device tries to
access the device through the PCI bars, the host will crash with PCI
completion timeout error.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds accounting for active CS. Active means that the CS was
submitted to the H/W queues and was not completed yet.
This is necessary to support suspend operation. Because the device will be
reset upon suspend, we can only suspend after all active CS have been
completed. Hence, we need to perform accounting on their number.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes the mapping of virtual address to physical addresses on
architectures where PAGE_SIZE is bigger than 4KB.
The break down to the device page size was done only for the virtual
address while it should have been done for the physical address as well.
As a result virtual addresses were mapped to wrong physical address.
The fix is to apply the break down for the physical addresses as well in
order to get correct mappings.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes a bug which led to a crash during hard reset flow.
Before a hard reset is executed, we wait a few seconds for the user
context cleanup to complete.
If it wasn't completed, we kill the user process and move on to the reset
flow.
Upon killing the user process, the context cleanup flow begins and may
take a while due to MMU unmaps.
Meanwhile, in the driver reset flow, we change the PCI DRAM bar location
which can interfere with the MMU that uses the bar.
If the context cleanup flow didn't finish quickly, a crash may occur due
to PCI DRAM bar mislocation during the MMU unmap.
Hence adding a wait between killing the user process and the start of the
reset flow.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes a bug of allocating a too big memory size with kmalloc,
which causes a failure.
In case of mapping a large memory block, an array of the relevant physical
page addresses is allocated. If there are many pages the array might be
too big to allocate with kmalloc, hence changing to kvmalloc.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The requested allocation size is 64bit, hence the number of requested
pages and the total requested size should 64bit as well.
This patch fixes all places where these are treated as 32bit.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The driver use the HWMON framework to display various sensors information.
Therefore, CONFIG_HWMON must be included to prevent build errors.
This patch adds "select HWMON" to the driver's Kconfig file to make sure
HWMON is built. In addition, to avoid breaking dependencies, it adds
dependency on HAS_IOMEM because HWMON is dependent on HAS_IOMEM.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When parsing the address of an internal command buffer, the driver prints
an error if the buffer's address is not in the range of the device's DRAM
or SRAM memory address space.
Use %px to print the real address that the user gave the driver and not a
hashed value, so the user will get a clue regarding the origin of his
error.
Note that if the print occurs, the pointer that is printed is a
user's virtual address and not some kind of physical address.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix compilation error in 32-bit ARM architecture regarding
division of 2 64-bit variables.
Use the kernel do_div() macro, which is implemented per architecture, for
doing these divisions instead of using the / operator.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add __cpu_to_le16/32/64 and __le16/32/64_to_cpu where needed according to
sparse.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes the following sparse warnings:
drivers/misc/habanalabs/hwmon.c:20:56: warning: Using plain integer as NULL pointer
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix a bug in the driver, where if the TPC or MME remains in
non-IDLE even after all the command submissions are done (due to user bug
or malicious user), then future command submissions will fail in the
context-switch stage and the driver will remain in "stuck" mode.
The fix is to do a soft-reset of the device in case the context-switch
fails, because the device should be IDLE during context-switch. If it is
not IDLE, then something is wrong and we should reset the compute engines.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Don't cast pointer to u64 to print it. Instead, print the pointer using
%p.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix a bug when a command buffer with unaligned size (with
regard to PAGE_SIZE) was used. The accounting for the unmap operation
wasn't done correctly and could result in a memory leak.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix a bug where EINVAL was returned instead of -EINVAL.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix a bug where the timeout for sending a job on QMAN0 by KMD
wasn't enough in palladium environment.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix a bug where DMA channel 0 completion address wasn't
initialized by the driver.
The patch sets the address to Sync Object no. 1007
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix a bug in the validation of WREG32 in DMA queues. The
validation was too strict. It allowed the user to set the completion
address only for DMA channel 1.
The fix allows the user to set the completion address for all 5 DMA
channels.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix an incorrect initialization of the MMU cache registers. The
shift operation was done in the wrong direction.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch provides a workaround for a bug in the F/W where the response
time for a request from KMD may take more then 100ms. This could cause the
queue between KMD and the F/W to get out of sync.
The WA is to:
1. Increase the timeout of ALL requests to 1s.
2. In case a request isn't answered in time, mark the state as
"cpu_disabled" and prevent sending further requests from KMD to the F/W.
This will eventually lead to a heartbeat failure and hard reset of the
device.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch provides a workaround for a H/W bug in Goya, where access to
RAZWI from TPC can cause PCI completion timeout.
The WA is to use the device MMU to map any unmapped DRAM memory to a
default page in the DRAM. That way, the TPC will never reach RAZWI upon
accessing a bad address in the DRAM.
When a DRAM page is mapped by the user, its default mapping is
overwritten. Once that page is unmapped, the MMU driver will map that page
to the default page.
To help debugging, the driver will set the default page area to 0x99 on
device initialization.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch provides a workaround for a H/W bug in the RAZWI logger in
Goya. The logger doesn't recognize the initiator correctly and as a
result, accesses from one initiator are reported that were coming from a
different initiator.
The WA is to print the error information from the event entries we receive
without looking at the RAZWI logger at all.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes the below sparse warnings by either making the functions
static or by adding a declaration in the relevant header file.
In addition, the patch removes goya_mmap completely as it doesn't add any
additional benefit.
Fixes the following sparse warnings:
drivers/misc/habanalabs/habanalabs_drv.c:24:1: warning: symbol 'hl_devs_idr' was not declared. Should it be static?
drivers/misc/habanalabs/habanalabs_drv.c:25:1: warning: symbol 'hl_devs_idr_lock' was not declared. Should it be static?
drivers/misc/habanalabs/memory.c:1451:5: warning: symbol 'hl_vm_ctx_init_with_ranges' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:396:5: warning: symbol 'goya_send_pci_access_msg' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:417:5: warning: symbol 'goya_pci_bars_map' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:557:6: warning: symbol 'goya_reset_link_through_bridge' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:774:5: warning: symbol 'goya_early_fini' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:857:6: warning: symbol 'goya_late_fini' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:971:5: warning: symbol 'goya_sw_fini' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:1233:5: warning: symbol 'goya_init_cpu_queues' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:2914:5: warning: symbol 'goya_suspend' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:2939:5: warning: symbol 'goya_resume' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:2952:5: warning: symbol 'goya_mmap' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:2957:5: warning: symbol 'goya_cb_mmap' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:2973:6: warning: symbol 'goya_ring_doorbell' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3063:6: warning: symbol 'goya_flush_pq_write' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3068:6: warning: symbol 'goya_dma_alloc_coherent' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3074:6: warning: symbol 'goya_dma_free_coherent' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3080:6: warning: symbol 'goya_get_int_queue_base' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3138:5: warning: symbol 'goya_send_job_on_qman0' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3295:5: warning: symbol 'goya_test_queue' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3417:6: warning: symbol 'goya_dma_pool_zalloc' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3426:6: warning: symbol 'goya_dma_pool_free' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3432:6: warning: symbol 'goya_cpu_accessible_dma_pool_alloc' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3448:6: warning: symbol 'goya_cpu_accessible_dma_pool_free' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3458:5: warning: symbol 'goya_dma_map_sg' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3467:6: warning: symbol 'goya_dma_unmap_sg' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:3473:5: warning: symbol 'goya_get_dma_desc_list_size' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4210:5: warning: symbol 'goya_parse_cb_no_mmu' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4261:5: warning: symbol 'goya_parse_cb_no_ext_quque' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4294:5: warning: symbol 'goya_cs_parser' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4307:6: warning: symbol 'goya_add_end_of_cb_packets' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4334:5: warning: symbol 'goya_context_switch' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4426:6: warning: symbol 'goya_restore_phase_topology' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4460:5: warning: symbol 'goya_debugfs_read32' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4510:5: warning: symbol 'goya_debugfs_write32' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4738:6: warning: symbol 'goya_handle_eqe' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:4836:6: warning: symbol 'goya_get_events_stat' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:5075:5: warning: symbol 'goya_send_heartbeat' was not declared. Should it be static?
drivers/misc/habanalabs/goya/goya.c:5253:5: warning: symbol 'goya_get_eeprom_data' was not declared. Should it be static?
Reported-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch increase the size field in the uapi structure of the Memory
IOCTL from 32-bit to 64-bit. This is to allow the user to allocate and/or
map memory in chunks that are larger then 4GB.
Goya's device memory (DRAM) can be up to 16GB, and for certain
topologies, the user may want an allocation that is larger than 4GB.
This change doesn't break current user-space because there was a "pad"
field in the uapi structure right after the size field. Changing the size
field to be 64-bit and removing the pad field maintains compatibility with
current user-space.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes two smatch warnings about two if statements that are
always true because of the types of the variables used - u32 when
comparing the sum to u32_max.
The patch changes the types to be u64 so the accumalted sum can be checked
if it is larger than u32_max
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver can't read/write from i2c if the device is in reset or
disabled. Therefore, return -EBUSY in those cases instead of 0.
This change also fixes a smatch warning about uninitialized variable.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds shadow mapping to the MMU module. The shadow mapping
allows traversing the page table in host memory rather reading each PTE
from the device memory.
It brings better performance and avoids reading from invalid device
address upon PCI errors.
Only at the end of map/unmap flow, writings to the device are performed in
order to sync the H/W page tables with the shadow ones.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The addr/data32 debugfs nodes currently permit the access to only physical
addresses of a device. This patch extends it and allows accessing also
device's DRAM virtual addresses.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Print the name of a busy engine when checking if a device is idle.
The change is done mainly to help a user to pinpoint problems in his
topology's recipe.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Remove pointers to ASIC-specific functions and instead call the functions
explicitly as they are not accessed from outside the ASIC-specific files.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Move duplicated PCI-related code from ASIC-specific files into the common
pci.c file.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
At the start of some IOCTLs we check if the device is disabled or in reset.
If it is, we return -EBUSY and print a message to kernel log.
Because these IOCTLs can be called at very high frequency, use ratelimit
to avoid spamming the kernel log. Also use the same type of message -
dev_warn - in all the relevant IOCTLs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The Event Queue MSI/X ID is different per ASIC. This patch renames the
current define to have the GOYA_ prefix to mark it only for Goya. It also
moves it from the common armcp_if.h file to the ASIC specific goya_fw_if.h
file.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch moves the code that is responsible of the communication
vs. the F/W to a dedicated file. This will allow us to share the code
between different ASICs.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This will prevent unneeded include of header files, which may increase
compilation time.
Signed-off-by: Dotan Barak <dbarak@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The goya_non_fatal_events array actually contains all the possible events
the driver can receive from the F/W. Therefore, use a proper name
for the array.
The patch also adds missing event Ids to the goya_async_event_id enum.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds a definition of a new status in the device CPU boot stages
and add the handling of the new status.
Signed-off-by: Igor Grinberg <igrinberg@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The driver uses the DMA_BUF module which is built only if
DMA_SHARED_BUFFER is selected. DMA_SHARED_BUFFER doesn't have any
dependencies so it is ok to select it (as done by many other components).
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
send_cpu_message() doesn't update the result parameter when an error
occurs in its code. Therefore, callers of send_cpu_message() shouldn't use
the result value when the return code indicates error.
This patch fixes a static checker warning in goya_test_cpu_queue(), where
that function did print the result even though the return code from
send_cpu_message() indicated error.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A spin lock is taken here so we should use GFP_ATOMIC.
Fixes: 0feaf86d4e ("habanalabs: add virtual memory and MMU modules")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds debugfs support to the driver. It allows the user-space to
display information that is contained in the internal structures of the
driver, such as:
- active command submissions
- active user virtual memory mappings
- number of allocated command buffers
It also enables the user to perform reads and writes through Goya's PCI
bars.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch implements the INFO IOCTL. That IOCTL is used by the user to
query information that is relevant/needed by the user in order to submit
deep learning jobs to Goya.
The information is divided into several categories, such as H/W IP, Events
that happened, DDR usage and more.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds the Virtual Memory and MMU modules.
Goya has an internal MMU which provides process isolation on the internal
DDR. The internal MMU also performs translations for transactions that go
from Goya to the Host.
The driver is responsible for allocating and freeing memory on the DDR
upon user request. It also provides an interface to map and unmap DDR and
Host memory to the device address space.
The MMU in Goya supports 3-level and 4-level page tables. With 3-level, the
size of each page is 2MB, while with 4-level the size of each page is 4KB.
In the DDR, the physical pages are always 2MB.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds the main flow for the user to submit work to the device.
Each work is described by a command submission object (CS). The CS contains
3 arrays of command buffers: One for execution, and two for context-switch
(store and restore).
For each CB, the user specifies on which queue to put that CB. In case of
an internal queue, the entry doesn't contain a pointer to the CB but the
address in the on-chip memory that the CB resides at.
The driver parses some of the CBs to enforce security restrictions.
The user receives a sequence number that represents the CS object. The user
can then query the driver regarding the status of the CS, using that
sequence number.
In case the CS doesn't finish before the timeout expires, the driver will
perform a soft-reset of the device.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds support for doing various on-the-fly reset of Goya.
The driver supports two types of resets:
1. soft-reset
2. hard-reset
Soft-reset is done when the device detects a timeout of a command
submission that was given to the device. The soft-reset process only resets
the engines that are relevant for the submission of compute jobs, i.e. the
DMA channels, the TPCs and the MME. The purpose is to bring the device as
fast as possible to a working state.
Hard-reset is done in several cases:
1. After soft-reset is done but the device is not responding
2. When fatal errors occur inside the device, e.g. ECC error
3. When the driver is removed
Hard-reset performs a reset of the entire chip except for the PCI
controller and the PLLs. It is a much longer process then soft-reset but it
helps to recover the device without the need to reboot the Host.
After hard-reset, the driver will restore the max power attribute and in
case of manual power management, the frequencies that were set.
This patch also adds two entries to the sysfs, which allows the root user
to initiate a soft or hard reset.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch add the sysfs and hwmon entries that are exposed by the driver.
Goya has several sensors, from various categories such as temperature,
voltage, current, etc. The driver exposes those sensors in the standard
hwmon mechanism.
In addition, the driver exposes a couple of interfaces in sysfs, both for
configuration and for providing status of the device or driver.
The configuration attributes is for Power Management:
- Automatic or manual
- Frequency value when moving to high frequency mode
- Maximum power the device is allowed to consume
The rest of the attributes are read-only and provide the following
information:
- Versions of the various firmwares running on the device
- Contents of the device's EEPROM
- The device type (currently only Goya is supported)
- PCI address of the device (to allow user-space to connect between
/dev/hlX to PCI address)
- Status of the device (operational, malfunction, in_reset)
- How many processes are open on the device's file
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds support for receiving events from Goya's control CPU and
for receiving MSI-X interrupts from Goya's DMA engines and CPU.
Goya's PCI controller supports up to 8 MSI-X interrupts, which only 6 of
them are currently used. The first 5 interrupts are dedicated for Goya's
DMA engine queues. The 6th interrupt is dedicated for Goya's control CPU.
The DMA queue will signal its MSI-X entry upon each completion of a command
buffer that was placed on its primary queue. The driver will then mark that
CB as completed and free the related resources. It will also update the
command submission object which that CB belongs to.
There is a dedicated event queue (EQ) between the driver and Goya's control
CPU. The EQ is located on the Host memory. The control CPU writes a new
entry to the EQ for various reasons, such as ECC error, MMU page fault, Hot
temperature. After writing the new entry to the EQ, the control CPU will
trigger its dedicated MSI-X entry to signal the driver that there is a new
entry in the EQ. The driver will then read the entry and act accordingly.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds the H/W queues module and the code to initialize Goya's
various compute and DMA engines and their queues.
Goya has 5 DMA channels, 8 TPC engines and a single MME engine. For each
channel/engine, there is a H/W queue logic which is used to pass commands
from the user to the H/W. That logic is called QMAN.
There are two types of QMANs: external and internal. The DMA QMANs are
considered external while the TPC and MME QMANs are considered internal.
For each external queue there is a completion queue, which is located on
the Host memory.
The differences between external and internal QMANs are:
1. The location of the queue's memory. External QMANs are located on the
Host memory while internal QMANs are located on the on-chip memory.
2. The external QMAN write an entry to a completion queue and sends an
MSI-X interrupt upon completion of a command buffer that was given to
it. The internal QMAN doesn't do that.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds the basic part of Goya's H/W initialization. It adds code
that initializes Goya's internal CPU, various registers that are related to
internal routing, scrambling, workarounds for H/W bugs, etc.
It also initializes Goya's security scheme that prevents the user from
abusing Goya to steal data from the host, crash the host, change
Goya's F/W, etc.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds the command buffer (CB) module, which allows the user to
create and destroy CBs and to map them to the user's process
address-space.
A command buffer is a memory blocks that reside in DMA-able address-space
and is physically contiguous so it can be accessed by the device without
MMU translation. The command buffer memory is allocated using the
coherent DMA API.
When creating a new CB, the IOCTL returns a handle of it, and the
user-space process needs to use that handle to mmap the buffer to get a VA
in the user's address-space.
Before destroying (freeing) a CB, the user must unmap the CB's VA using the
CB handle.
Each CB has a reference counter, which tracks its usage in command
submissions and also its mmaps (only a single mmap is allowed).
The driver maintains a pool of pre-allocated CBs in order to reduce
latency during command submissions. In case the pool is empty, the driver
will go to the slow-path of allocating a new CB, i.e. calling
dma_alloc_coherent.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds two modules - ASID and context.
Each user process that opens a device's file must have at least one
context before it is able to "work" with the device. Each context has its
own device address-space and contains information about its runtime state
(its active command submissions).
To have address-space separation between contexts, each context is assigned
a unique ASID, which stands for "address-space id". Goya supports up to
1024 ASIDs.
Currently, the driver doesn't support multiple contexts. Therefore, the
user doesn't need to actively create a context. A "primary context" is
created automatically when the user opens the device's file.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds a basic support for the Goya device. The code initializes
the device's PCI controller and PCI bars. It also initializes various S/W
structures and adds some basic helper functions.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch just adds a lot of header files that contain description of
Goya's registers.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds the habanalabs skeleton driver. The driver does nothing at
this stage except very basic operations. It contains the minimal code to
insmod and rmmod the driver and to create a /dev/hlX file per PCI device.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>