Commit Graph

69 Commits

Author SHA1 Message Date
Omer Shpigelman
54bb67444e habanalabs: split MMU properties to PCI/DRAM
Split the properties used for MMU mappings to DRAM and PCI (host) types.
This is a prerequisite for future ASICs support.
Note that in Goya ASIC, the PMMU and DMMU are the same (except for page
sizes) as only one MMU mechanism is used for both of the mapping types.
Hence this patch should not have any effect on current behavior.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:46 +02:00
Omer Shpigelman
7b6e4ea0f7 habanalabs: type specific MMU cache invalidation
Add the ability to invalidate the necessary MMU cache only.
This ability is a prerequisite for future ASICs support.
Note that in Goya ASIC, a single cache is used for both host/DRAM
mappings and hence this patch should not have any effect on current
behavior.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:46 +02:00
Omer Shpigelman
7f74d4d335 habanalabs: re-factor memory module code
Some of the functions in the memory module code were too long and/or
contained multiple operations that are not always done together. Re-factor
the code by dividing those functions into smaller functions which are more
readable and maintainable.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:46 +02:00
Oded Gabbay
5d1012576d habanalabs: export uapi defines to user-space
The two defines that control the maximum size of a command buffer and the
maximum number of JOBS per CS need to be exported to the user as they are
part of the API towards user-space.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-11-21 11:35:46 +02:00
Oded Gabbay
bd4c8cb17d habanalabs: increase max jobs number to 512
In training, there is a need for a large amount of patching to the recipe.
This results in many command buffers that contain a lot of DMA packets. The
number of command buffers per CS is larger than the current maximum of 64,
which is an arbitrary number that is enough for inference, but it has no
real effect on the code and/or resources of the host machine.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-11-21 11:35:45 +02:00
Oded Gabbay
62c1e124a9 habanalabs: add opcode to INFO IOCTL to return clock rate
Add a new opcode to the INFO IOCTL to allow the user application to
retrieve the ASIC's current and maximum clock rate. The rate is
returned in MHz.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2019-11-21 11:35:45 +02:00
Oded Gabbay
8fdacf2a53 habanalabs: set TPC Icache to 16 cache lines
Reduce latency to memory during TPC kernel execution.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2019-11-21 11:35:45 +02:00
Tomer Tayar
cb596aee88 habanalabs: Add a new H/W queue type
This patch adds support for a new H/W queue type.
This type of queue is for DMA and compute engine jobs, for which
completion notifications are sent by H/W.
Command buffer for this queue can be created either through the CB
IOCTL and using the retrieved CB handle, or by preparing a buffer on the
host or device SRAM/DRAM, and using the device address to that buffer.
The patch includes the handling of the 2 options, as well as the
initialization of the H/W queue and its jobs scheduling.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:45 +02:00
Tomer Tayar
df762375f1 habanalabs: Mark queue as expecting CB handle or address
Jobs on some queues must be provided with a handle to a driver command
buffer object, while for other queues, jobs must be provided with an
address to a command buffer.
Currently the distinction is done based on the queue type, which is less
flexible if the same queue type behaves differently on different
types of ASICs.
This patch adds a new queue property for this purpose, which is
configured per queue type per ASIC type.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:45 +02:00
Tomer Tayar
f435614ff5 habanalabs: Fix typos
s/paerser/parser/
s/requeusted/requested/
s/an JOB/a JOB/

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:45 +02:00
Oded Gabbay
6dc66f7c26 habanalabs: correctly cast variable to __le32
When using the macro le32_to_cpu(x), we need to correctly convert x to be
__le32 in case it is defined as u32 variable.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2019-09-05 14:55:28 +03:00
Oded Gabbay
4c172bbfaa habanalabs: stop using the acronym KMD
We want to stop using the acronym KMD. Therefore, replace all locations
(except for register names we can't modify) where KMD is written to other
terms such as "Linux kernel driver" or "Host kernel driver", etc.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Oded Gabbay
e9730763a2 habanalabs: add uapi to retrieve aggregate H/W events
Add a new opcode to INFO IOCTL to retrieve aggregate H/W events, i.e. the
event counters are NOT cleared upon device reset, but count from the
loading of the driver.

Add the code to support it in the device event handling function.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Oded Gabbay
75b3cb2bb0 habanalabs: add uapi to retrieve device utilization
Users and sysadmins usually want to know what the device utilization is, as
a level 0 indication of whether they are using the device efficiently.

Add a new opcode to the INFO IOCTL that will return the device utilization
over the last period of 100-1000ms. The return value is 0-100,
representing the total utilization rate as a percentage.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
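The 0-100 return value described in 75b3cb2bb0 can be illustrated with a small userspace sketch. The helper name and the busy-time/period inputs are assumptions made for illustration; the commit message does not say how the driver measures utilization internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not driver code: busy time within one sampling
 * period, expressed as an integer percentage clamped to 0-100. */
static unsigned int utilization_percent(uint64_t busy_us, uint64_t period_us)
{
    assert(period_us > 0);      /* a zero-length period has no rate */
    if (busy_us >= period_us)
        return 100;             /* clamp: never report more than 100% */
    return (unsigned int)(busy_us * 100 / period_us);
}
```

Integer division truncates, so a device busy for one third of the period reports 33 rather than 33.3.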
Tomer Tayar
ea451f88ef habanalabs: Expose devices after initialization is done
The char devices are currently exposed to user before the device and
driver initialization are done.
This patch moves the cdev and device adding to the system to the end of
the initialization sequence, while keeping the creation of the
structures at the beginning to allow the usage of dev_*().

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:27 +03:00
Oded Gabbay
4d6a7751f6 habanalabs: create two char devices per ASIC
This patch changes the driver to create two char devices for each ASIC
it discovers. This is done to allow system/monitoring applications to
query the device for stats, information, idle state and more, while also
allowing the deep-learning application to send work to the ASIC.

One char device is the original device, hlX. IOCTL calls through this
device file can perform any task on the device (compute, memory, queries).
The open function for this device will fail if it was called before but
the file-descriptor it created was not completely released yet (the
release callback function is not called from the kernel until all
instances of that FD are closed). The driver needs to keep this behavior
to support backward compatibility with existing userspace, which counts
on the open failing if the device is "occupied".

The second char device is called "hl_controlDx", where x is the same index
as that of the main device, with a minor number of the original char device + 1.
Applications that open this device can only call the INFO IOCTL. There is
no limitation on the number of applications opening this device.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
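The open policy that 4d6a7751f6 describes for the two device nodes can be modelled roughly as follows. This is a userspace sketch under stated assumptions: the struct, the function names and the -EBUSY error code are hypothetical and are not taken from the driver source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one ASIC's two device nodes. */
struct asic_nodes {
    bool compute_open;           /* main node (hlX) is currently held */
    unsigned int control_opens;  /* number of users of the control node */
};

/* Main node: open fails while a previous open is not fully released.
 * -EBUSY is an assumed error code for this sketch. */
static int open_compute(struct asic_nodes *n)
{
    assert(n != NULL);
    if (n->compute_open)
        return -EBUSY;
    n->compute_open = true;
    return 0;
}

/* Called once every instance of the FD has been closed. */
static void release_compute(struct asic_nodes *n)
{
    n->compute_open = false;
}

/* Control node: any number of openers is accepted. */
static int open_control(struct asic_nodes *n)
{
    n->control_opens++;
    return 0;
}

/* The control node's minor number is the main node's minor plus one. */
static unsigned int control_minor(unsigned int main_minor)
{
    return main_minor + 1;
}
```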
Oded Gabbay
eb7caf84b0 habanalabs: maintain a list of file private data objects
This patch adds a new list to the driver's device structure. The list will
keep the file private data structures that the driver creates when a user
process opens the device.

This change is needed because it is useless to try to count how many FD
are open. Instead, track our own private data structure per open file and
once it is released, remove it from the list. As long as the list is not
empty, it means we have a user that can do something with our device.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
86d5307a6d habanalabs: rename user_ctx as compute_ctx
This patch renames the "user_ctx" field in the device structure to
"compute_ctx". This better reflects the meaning of this context.

In addition, we also check in the ctx_fini() that the debug mode should be
disabled only if the context being destroyed is the compute context. This
has no effect right now as we only have a single process and a single
context, but this makes the code more ready for multiple process support.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
b888751a02 habanalabs: add handle field to context structure
This patch adds a field to the context's structure that will hold a unique
handle for the context.

This will be needed when the user will create the context.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
ed0fc50535 habanalabs: cap simulator timeout
In the driver timeout functions, we give the simulator a factor of 10
in the timeout. This is necessary when the requested timeout is small,
but if it is a few seconds, this can result in a very large timeout which
is unnecessary.

This patch caps the maximum timeout of the simulator to 10 seconds, which
is our largest timeout in the code. That is more than enough for anything
the simulator is doing.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:26 +03:00
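The arithmetic in ed0fc50535 (scale the timeout by 10 on the simulator, but never beyond 10 seconds) can be sketched as below. The macro and function names are hypothetical; only the factor and the cap come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_TIMEOUT_FACTOR 10ULL        /* factor named in the commit message */
#define SIM_TIMEOUT_CAP_US 10000000ULL  /* 10 seconds, in microseconds */

/* Hypothetical helper: timeout to actually wait for, in microseconds. */
static uint64_t effective_timeout_us(uint64_t timeout_us, int is_simulator)
{
    uint64_t scaled;

    if (!is_simulator)
        return timeout_us;      /* real hardware: timeout is unchanged */

    /* the multiplication below must not overflow */
    assert(timeout_us <= UINT64_MAX / SIM_TIMEOUT_FACTOR);
    scaled = timeout_us * SIM_TIMEOUT_FACTOR;
    return scaled < SIM_TIMEOUT_CAP_US ? scaled : SIM_TIMEOUT_CAP_US;
}
```

A 1 ms request therefore becomes 10 ms on the simulator, while a 5 s request is capped at 10 s instead of growing to 50 s.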
Oded Gabbay
b9040c9941 habanalabs: fix endianness handling for internal QMAN submission
The PQs of internal H/W queues (QMANs) can be located in different memory
areas for different ASICs. Therefore, when writing PQEs, we need to use
the correct function according to the location of the PQ. e.g. if the PQ
is located in the device's memory (SRAM or DRAM), we need to use
memcpy_toio() so it would work in architectures that have separate
address ranges for IO memory.

This patch makes the code that writes the PQE to be ASIC-specific so we
can handle this properly per ASIC.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Ben Segal <bpsegal20@gmail.com>
2019-08-12 09:01:10 +03:00
Ben Segal
2aa4e41079 habanalabs: fix host memory polling in BE architecture
This patch fixes a bug in the host memory polling macro. The bug is that the
memory being polled can be written by the device, which always writes it
in LE. However, if the host is running Linux in BE mode, we need to
convert the value that was written by the device before matching it to the
required value that the caller has given to the macro.

Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-07-29 11:40:25 +03:00
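The fix in 2aa4e41079 amounts to converting the device-written value before comparing it. A portable userspace sketch of that idea follows; the function names are hypothetical and the byte-wise decode stands in for the kernel's le32_to_cpu(), which this sketch does not use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode 4 bytes written by the device (always little-endian) into a
 * host-order value. Gives the same result on LE and BE hosts. */
static uint32_t load_le32(const uint8_t *mem)
{
    assert(mem != NULL);
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}

/* Polling predicate: convert first, then compare with the caller's value. */
static int polled_value_matches(const uint8_t *mem, uint32_t expected)
{
    return load_le32(mem) == expected;
}
```

Comparing the raw memory word directly with the expected value would only work on a little-endian host, which is the bug the commit describes.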
Tomer Tayar
e8960ca06b habanalabs: Add busy engines bitmask to HW idle IOCTL
The information which is currently provided as a response to the
"HL_INFO_HW_IDLE" IOCTL is merely a general boolean value.
This patch extends it and provides also a bitmask that indicates which
of the device engines are busy.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-07-01 13:59:45 +00:00
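The relationship between the old boolean and the new bitmask in e8960ca06b can be sketched as follows. The engine indices and function names are hypothetical; the real per-ASIC bit assignments are not given in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical engine indices; the driver defines its own per ASIC. */
enum engine_id { ENGINE_DMA, ENGINE_TPC, ENGINE_MME, ENGINE_COUNT };

/* Fold per-engine busy flags into one mask: bit i set means engine i is busy. */
static uint32_t busy_engines_mask(const int busy[ENGINE_COUNT])
{
    uint32_t mask = 0;
    int i;

    assert(busy != NULL);
    for (i = 0; i < ENGINE_COUNT; i++)
        if (busy[i])
            mask |= UINT32_C(1) << i;
    return mask;
}

/* The general boolean answer is still derivable from the mask. */
static int device_is_idle(uint32_t mask)
{
    return mask == 0;
}
```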
Tomer Tayar
06deb86a74 habanalabs: Add debugfs node for engines status
Command submissions sent to the device are composed of command buffers
which are targeted to different device engines, like DMA and compute
entities. When a command submission gets stuck, knowing which engine
is stuck is crucial for debugging.
This patch adds a debugfs node that exports this information, by
displaying the engines' various registers that assemble their idle/busy
status.
The information retrieval is based on the is_device_idle ASIC function.
The printout in this function, of the first detected busy engine, is
removed because it becomes redundant in the presence of the more
elaborate info of the new debugfs node.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-07-01 13:59:45 +00:00
Oded Gabbay
95b5a8b83e habanalabs: add MMU mappings for Goya CPU
This patch adds the necessary MMU mappings for the Goya CPU to access the
device DRAM and the host memory.

The first 256MB of the device DRAM is being mapped. That's where the F/W
is running.

The 2MB area located on the host memory for the purpose of communication
between the driver and the device CPU is also being mapped.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-29 17:30:04 +03:00
Oded Gabbay
cbb10f1e4a habanalabs: don't limit packet size for device CPU
This patch removes a limitation on the maximum packet size that is read by
the device CPU as that limitation is not needed.

Therefore, the patch also removes an elaborate calculation that is based
on this limitation which is also not needed now. Instead, use a fixed
value for the memory pool size of the packets.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-17 01:08:23 +03:00
Omer Shpigelman
a1e537b3f0 habanalabs: increase PCI ELBI timeout for Palladium
This patch increases the timeout for PCI ELBI configuration to support low
frequency Palladium images.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-13 14:44:50 +03:00
Oded Gabbay
921a465ba7 habanalabs: pass device pointer to asic-specific function
This patch adds a new parameter that is passed to the
add_end_of_cb_packets() asic-specific function.

The parameter is the pointer to the driver's device structure. The
function needs this pointer for future ASICs.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-12 16:53:16 +03:00
Oded Gabbay
a08b51a9a0 habanalabs: change polling functions to macros
This patch changes two polling functions to macros, in order to make their
API the same as the standard readl_poll_timeout so we would be able to
define the "condition for exit" when calling these macros.

This will simplify the code as it will eliminate the need to check both
for timeout and for the (cond) in the calling function.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-09 01:48:23 +03:00
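The reason a08b51a9a0 prefers macros is that the exit condition can be passed as an expression, as with readl_poll_timeout. A simplified userspace sketch of that shape is shown below. POLL_UNTIL and fake_reg_read are hypothetical names, and the sketch bounds the loop by iteration count rather than elapsed time to keep it deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical poll macro: re-evaluate read_expr into val until cond holds
 * or max_iters reads have been made. ret becomes 0 or -ETIMEDOUT. */
#define POLL_UNTIL(read_expr, val, cond, max_iters, ret)  \
do {                                                      \
    unsigned int iters_ = (max_iters);                    \
    (ret) = -ETIMEDOUT;                                   \
    while (iters_--) {                                    \
        (val) = (read_expr);                              \
        if (cond) {                                       \
            (ret) = 0;                                    \
            break;                                        \
        }                                                 \
    }                                                     \
} while (0)

/* Stand-in for a register read: returns 1, 2, 3, ... on successive calls. */
static uint32_t fake_reg_reads;
static uint32_t fake_reg_read(void)
{
    return ++fake_reg_reads;
}
```

Because the condition is an expression evaluated inside the macro, the caller checks a single return code instead of testing both the timeout and the value.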
Oded Gabbay
19734970c9 habanalabs: force user to set device debug mode
This patch adds the implementation of the HL_DEBUG_OP_SET_MODE opcode in
the DEBUG IOCTL.

It forces the user who wants to debug the device to set the device into
debug mode before he can configure the debug engines. The patch also makes
sure to disable debug mode upon user releasing FD, in case the user forgot
to disable debug mode.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-04 17:36:06 +03:00
Omer Shpigelman
d1287493ab habanalabs: minor documentation and prints fixes
This patch fixes comments on various structure members and some spelling
errors in log messages.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-05 13:24:24 +03:00
Omer Shpigelman
89225ce4fc habanalabs: halt debug engines on user process close
This patch fixes a potential bug where a user's process has closed
unexpectedly without disabling the debug engines. In that case, the debug
engines might continue running but because the user's MMU mappings are
going away, we will get page fault errors.

This behavior is also opposed to the general rule where nothing runs on
the device after the user process closes.

The patch stops the debug H/W engines upon process termination and thus
makes sure nothing runs on the device after the process goes away.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-24 22:46:15 +03:00
Dalit Ben Zoor
b1b537713e habanalabs: increase timeout if working with simulator
When there is a spike in the CPU consumption, it may cause
random failures in the C/I since the KMD timeout for CPU
and/or QMAN0 jobs expires and it stops communicating with the simulator.
This commit fixes it by increasing timeout on polling functions
if working with simulator.

Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-30 17:18:51 +03:00
Dalit Ben Zoor
5809e18e02 habanalabs: remove redundant member from parser struct
use_virt_addr member was used for telling whether to treat the
addresses in the CB as virtual during parsing. We disabled it only
when calling the parser from the driver memset device function,
and since this call had been removed, it should always be enabled.

Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-01 13:16:18 +03:00
Tomer Tayar
94cb669ceb habanalabs: Manipulate DMA addresses in ASIC functions
Routing device accesses to the host memory requires the usage of a base
offset, which is canceled by the iATU just before leaving the device.
The value of the base offset might be distinctive between different ASIC
types.
The manipulation of the addresses is currently used throughout the
driver code, and one should be aware of it whenever providing a host
memory address to the device.
This patch removes this manipulation from the driver common code, and
moves it to the ASIC specific functions that are responsible for
host memory allocation/mapping.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-01 11:28:15 +03:00
Oded Gabbay
d9c3aa8038 habanalabs: rename functions to improve code readability
This patch renames four functions in the ASIC-specific functions section,
so it will be easier to differentiate them from the generic kernel
functions with the same name.

This will help in future code reviews, to make sure we don't use the
kernel functions directly.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-05-01 11:47:04 +03:00
Tomer Tayar
03d5f641dc habanalabs: Use single pool for CPU accessible host memory
The device's CPU accessible memory on host is managed in a dedicated
pool, except for 2 regions - Primary Queue (PQ) and Event Queue (EQ) -
which are allocated from generic DMA pools.
Due to address length limitations of the CPU, the addresses of all these
memory regions must have the same MSBs starting at bit 40.
This patch modifies the allocation of the PQ and EQ to be also from the
dedicated pool, to ensure compliance with the limitation.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-28 19:17:38 +03:00
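The address constraint in 03d5f641dc (all regions share the same MSBs starting at bit 40) can be expressed as a shift-and-compare check. The helper names below are hypothetical and the sketch is not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MSB_SHIFT 40   /* bit position named in the commit message */

/* True if a and b agree on every bit from bit 40 upwards. */
static int share_msbs_from_bit40(uint64_t a, uint64_t b)
{
    return (a >> MSB_SHIFT) == (b >> MSB_SHIFT);
}

/* Hypothetical check: a whole region [base, base + size) must lie in the
 * same window as the pool, so both its first and last byte are tested. */
static int region_in_pool_window(uint64_t base, uint64_t size,
                                 uint64_t pool_base)
{
    assert(size > 0);
    return share_msbs_from_bit40(base, pool_base) &&
           share_msbs_from_bit40(base + size - 1, pool_base);
}
```

Allocating the PQ and EQ from the same dedicated pool, as the patch does, satisfies this check by construction as long as the pool itself does not cross such a boundary.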
Oded Gabbay
a38693d775 habanalabs: return old dram bar address upon change
This patch changes the ASIC interface function that changes the DRAM bar
window. The change is to return the old address that the DRAM bar pointed
to instead of an error code.

This simplifies the code that uses this function (mainly in debugfs) to
restore the bar to the old setting.

This is also needed for easier support in future ASICs.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-28 10:18:35 +03:00
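The interface change in a38693d775 makes the save/restore pattern straightforward for callers. A minimal sketch follows, assuming an all-ones sentinel for failure and a contrived bar_locked flag to trigger it; the struct, names and sentinel are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BAR_ADDR_INVALID UINT64_MAX   /* assumed failure sentinel */

struct sketch_dev {
    uint64_t dram_bar_addr;  /* address the DRAM bar currently points to */
    int bar_locked;          /* contrived way to make the change fail */
};

/* Move the bar window and return the previous address, or the sentinel. */
static uint64_t set_dram_bar(struct sketch_dev *d, uint64_t new_addr)
{
    uint64_t old;

    assert(d != NULL);
    if (d->bar_locked)
        return BAR_ADDR_INVALID;
    old = d->dram_bar_addr;
    d->dram_bar_addr = new_addr;
    return old;
}

/* Caller pattern: move the window, do the access, restore the old setting. */
static int access_through_window(struct sketch_dev *d, uint64_t window)
{
    uint64_t old = set_dram_bar(d, window);

    if (old == BAR_ADDR_INVALID)
        return -EIO;
    /* ... the access through the bar would happen here ... */
    set_dram_bar(d, old);
    return 0;
}
```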
Oded Gabbay
027d35d0b6 habanalabs: rename restore to ctx_switch when appropriate
This patch only does renaming of certain variables and structure members,
and their accompanying comments.

This is done to better reflect the actions these variables and members
represent.

There is no functional change in this patch.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-25 20:15:42 +03:00
Oded Gabbay
b2377e032f habanalabs: use ASIC functions interface for rreg/wreg
This patch slightly changes the macros of RREG32 and WREG32, which are
used when reading or writing from registers.

Instead of directly calling a function in the common code from these
macros, the new code calls a function from the ASIC functions interface.

This change allows us to share much more code between real ASICs and
simulators, which in turn reduces the maintenance burden and
the chances for forgetting to port code between the ASIC files.

The patch also implements the hl_poll_timeout macro, instead of calling
the generic readl_poll_timeout macro. This is required to allow use of
this macro in the simulator files.

As a result of this change, more functions in goya.c are shared with the
simulator and therefore, should not be defined as static.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-22 11:49:06 +03:00
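The indirection introduced by b2377e032f can be sketched with a function-pointer table and a simulator-style backend that keeps registers in an array. All names here are hypothetical, and unlike the driver's macros these take the device pointer as an explicit argument.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NUM_REGS 16

struct sketch_dev;

/* Per-ASIC function table: each backend supplies its own accessors. */
struct sketch_asic_funcs {
    uint32_t (*rreg)(struct sketch_dev *dev, uint32_t reg);
    void (*wreg)(struct sketch_dev *dev, uint32_t reg, uint32_t val);
};

struct sketch_dev {
    const struct sketch_asic_funcs *funcs;
    uint32_t sim_regs[SIM_NUM_REGS];  /* simulator backend storage */
};

/* The macros no longer call common code directly; they dispatch. */
#define RREG32(dev, reg)      ((dev)->funcs->rreg((dev), (reg)))
#define WREG32(dev, reg, val) ((dev)->funcs->wreg((dev), (reg), (val)))

/* Simulator backend: 32-bit registers at byte offsets 0, 4, 8, ... */
static uint32_t sim_rreg(struct sketch_dev *dev, uint32_t reg)
{
    assert(reg / 4 < SIM_NUM_REGS);
    return dev->sim_regs[reg / 4];
}

static void sim_wreg(struct sketch_dev *dev, uint32_t reg, uint32_t val)
{
    assert(reg / 4 < SIM_NUM_REGS);
    dev->sim_regs[reg / 4] = val;
}

static const struct sketch_asic_funcs sim_funcs = { sim_rreg, sim_wreg };
```

Code written against RREG32/WREG32 then runs unchanged on any backend that fills in the table, which is the code sharing the commit message refers to.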
Tomer Tayar
e00dac3daa habanalabs: Cancel pr_fmt() definition dependency on includes order
pr_fmt() should be defined before including linux/printk.h, either
directly or indirectly, in order to avoid redefinition of the macro.
Currently the macro definition is in habanalabs.h, which is included in
many files, and that makes adding or reordering includes prone
to compilation errors.
This patch cancels this dependency by defining the macro only in the few
source files that use it.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-10 15:18:46 +03:00
Oded Gabbay
295938406c habanalabs: ASIC_AUTO_DETECT enum value is redundant
This patch removes the enum value of ASIC_AUTO_DETECT because we can use
the validity of the pdev variable to know whether we have a real device or
a simulator. For a real device, we detect the asic type from the device ID
while for a simulator, the simulator code calls create_hdev() with the
specified ASIC type.

Set ASIC_INVALID as the first option in the enum to make sure that no
other enum value will receive the value 0 (which indicates a non-existing
entry in the simulator array).

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-04 14:33:34 +03:00
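The point of placing ASIC_INVALID first, as described in 295938406c, is that zero-initialized storage then reads as "no entry". A small sketch under that assumption; the array and helper are hypothetical, and only the two enum names come from the commit log.

```c
#include <assert.h>

/* ASIC_INVALID is listed first, so it receives the value 0. */
enum asic_type { ASIC_INVALID, ASIC_GOYA };

#define MAX_SIM_SLOTS 4

/* Static storage is zero-initialized: every slot starts as ASIC_INVALID. */
static enum asic_type sim_slots[MAX_SIM_SLOTS];

/* Hypothetical helper: a slot holds a device only if it is not ASIC_INVALID. */
static int slot_is_populated(unsigned int idx)
{
    assert(idx < MAX_SIM_SLOTS);
    return sim_slots[idx] != ASIC_INVALID;
}
```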
Oded Gabbay
bedd14425d habanalabs: refactoring in goya.c
This patch does some refactoring in goya.c to make code more reusable
between goya code and the goya simulator code (which is not upstreamed).

In addition, the patch removes some dead functions from goya.c which are
not used by the current upstream code.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-02 15:56:16 +03:00
Omer Shpigelman
315bc055ed habanalabs: add new IOCTL for debug, tracing and profiling
Habanalabs ASICs use the ARM coresight infrastructure to support debug,
tracing and profiling of neural network topologies.

Because the coresight is configured using register writes and reads, and
some of the registers hold sensitive information (e.g. the address in
the device's DRAM where the trace data is written to), the user must go
through the kernel driver to configure this mechanism.

This patch implements the common code of the IOCTL and calls the
ASIC-specific function for the actual H/W configuration.

The IOCTL supports configuration of seven coresight components:
ETR, ETF, STM, FUNNEL, BMON, SPMU and TIMESTAMP

The user specifies which component he wishes to configure and provides a
pointer to a structure (located in its process space) that contains the
relevant configuration.

The common code copies the relevant data from the user-space to kernel
space and then calls the ASIC-specific function to do the H/W
configuration.

After the configuration is done, which is usually composed
of several IOCTL calls depending on what the user wanted to trace, the
user can start executing the topology. The trace data will be written to
the user's area in the device's DRAM.

After the tracing operation is complete, the user will call the IOCTL
again to disable the tracing operation. The user also needs to read
values from registers for some of the components (e.g. the size of the
trace data in the device's DRAM). In that case, the user will provide a
pointer to an "output" structure in user-space, which the IOCTL code will
fill according to the selected component.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-01 22:31:22 +03:00
Dalit Ben Zoor
aa957088b4 habanalabs: add device status option to INFO IOCTL
This patch adds a new opcode to INFO IOCTL that returns the device status.

This will allow users to query the device status in order to avoid sending
command submissions while device is in reset.

Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-24 10:15:44 +02:00
Oded Gabbay
d9973871da habanalabs: keep track of the device's dma mask
This patch refactors the code that is responsible for setting the DMA mask
for the device.

Upon each change of the dma mask, the driver will save the new value that
was set. This is needed in order to make sure we don't try to increase the
mask a second time, in case we failed in the first time. This is
especially relevant for Power machines, as that may cause a change in
configuration of the TVT which will break the device.

Goya will first try to set the device's dma mask to 39 bits, so that the
memory that is allocated on the host machine for communication with the
device's cpu will be in a bus address which is lower than 39 bits. Later,
Goya will try to increase that mask to 48 bits, but only if setting the
mask to 39 bits was successful.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-07 18:03:23 +02:00
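The two-step mask policy in d9973871da (set 39 bits first, widen to 48 only if that succeeded, and remember what was set) can be sketched as below. The platform callback, struct and function names are hypothetical stand-ins; the sketch does not call any kernel DMA API and simply returns the error if the 39-bit step fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_dev {
    unsigned int dma_mask_bits;  /* last mask set successfully, 0 if none */
};

/* Hypothetical platform hook: returns 0 if a mask of this width is accepted. */
typedef int (*set_mask_fn)(unsigned int bits);

/* Try a mask and record it only on success. */
static int set_dma_mask(struct sketch_dev *d, set_mask_fn set, unsigned int bits)
{
    int rc;

    assert(d != NULL && set != NULL);
    rc = set(bits);
    if (rc == 0)
        d->dma_mask_bits = bits;
    return rc;
}

/* First step: 39 bits, so host memory shared with the device CPU is low. */
static int init_dma_mask(struct sketch_dev *d, set_mask_fn set)
{
    return set_dma_mask(d, set, 39);
}

/* Later step: widen to 48 bits, but only if 39 bits was set successfully. */
static void widen_dma_mask(struct sketch_dev *d, set_mask_fn set)
{
    if (d->dma_mask_bits != 39)
        return;                   /* earlier step failed: do not retry */
    set_dma_mask(d, set, 48);     /* on failure, 39 stays recorded */
}

/* Fake platforms used to exercise the policy. */
static int accept_any(unsigned int bits) { (void)bits; return 0; }
static int accept_up_to_39(unsigned int bits) { return bits <= 39 ? 0 : -EIO; }
static int accept_none(unsigned int bits) { (void)bits; return -EIO; }
```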
Omer Shpigelman
66542c3b9d habanalabs: add MMU shadow mapping
This patch adds shadow mapping to the MMU module. The shadow mapping
allows traversing the page table in host memory rather than reading each PTE
from the device memory.
It brings better performance and avoids reading from invalid device
address upon PCI errors.
Only at the end of the map/unmap flow are writes to the device performed, in
order to sync the H/W page tables with the shadow ones.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-02-24 09:17:55 +02:00
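The shadow-mapping idea in 66542c3b9d can be illustrated with a flat table: reads and updates touch only a host copy, and the device copy is written once per changed entry at the end of the flow. This is a simplified sketch with hypothetical names; the real MMU uses multi-level page tables, which are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PTES 8

struct shadow_mmu {
    uint64_t shadow[NUM_PTES];   /* host copy, used for all traversal */
    uint64_t device[NUM_PTES];   /* stands in for the table in device memory */
    uint8_t dirty[NUM_PTES];     /* entries changed since the last flush */
    unsigned int device_writes;  /* counts writes that reach the device */
};

/* Update only the host copy and remember that the entry changed. */
static void shadow_set_pte(struct shadow_mmu *m, size_t idx, uint64_t val)
{
    assert(m != NULL && idx < NUM_PTES);
    m->shadow[idx] = val;
    m->dirty[idx] = 1;
}

/* Reads never touch the device. */
static uint64_t shadow_get_pte(const struct shadow_mmu *m, size_t idx)
{
    assert(m != NULL && idx < NUM_PTES);
    return m->shadow[idx];
}

/* End of the map/unmap flow: sync changed entries to the device table. */
static void shadow_flush(struct shadow_mmu *m)
{
    size_t i;

    assert(m != NULL);
    for (i = 0; i < NUM_PTES; i++) {
        if (!m->dirty[i])
            continue;
        m->device[i] = m->shadow[i];
        m->dirty[i] = 0;
        m->device_writes++;
    }
}
```

An entry updated twice before the flush costs a single device write, and a lookup never depends on the device being reachable.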
Tomer Tayar
c811f7bc77 habanalabs: Add a printout with the name of a busy engine
Print the name of a busy engine when checking if a device is idle.
The change is done mainly to help a user to pinpoint problems in his
topology's recipe.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-07 14:26:02 +02:00
Tomer Tayar
b6f897d75d habanalabs: Move PCI code into common file
Move duplicated PCI-related code from ASIC-specific files into the common
pci.c file.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-05 16:48:42 +02:00
Tomer Tayar
3110c60fdc habanalabs: Move device CPU code into common file
This patch moves the code that is responsible for the communication
with the F/W to a dedicated file. This will allow us to share the code
between different ASICs.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-04 10:22:09 +02:00