35 Commits

Author SHA1 Message Date
Oded Gabbay
975ab7b32b habanalabs: count dropped CS because max CS in-flight
There is a case where the user reaches the maximum number of CS in-flight.
In that case, the driver rejects the new CS of the user with EAGAIN. Count
that event so the user can query the driver later to see if it happened.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:52 +03:00
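Below is a minimal userspace sketch of the idea in 975ab7b32b. It assumes a simple in-flight counter guarded by C11 atomics. When no slot is free, the submission fails with -EAGAIN and a drop counter is incremented so the event can be reported later. The struct and function names (`dev_state`, `submit_cs`, `complete_cs`, `dropped_cs_max_inflight`) are hypothetical and are not the driver's identifiers.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct dev_state {
	atomic_uint cs_in_flight;               /* CSs submitted, not yet completed */
	unsigned int max_cs_in_flight;          /* limit enforced on submission */
	atomic_ullong dropped_cs_max_inflight;  /* counter the user can query later */
};

/* Reserve an in-flight slot; if the limit is reached, count the drop and fail. */
static int submit_cs(struct dev_state *d)
{
	unsigned int cur = atomic_load(&d->cs_in_flight);

	do {
		if (cur >= d->max_cs_in_flight) {
			atomic_fetch_add(&d->dropped_cs_max_inflight, 1);
			return -EAGAIN;
		}
	} while (!atomic_compare_exchange_weak(&d->cs_in_flight, &cur, cur + 1));

	return 0;
}

/* Called when a CS completes, releasing its slot. */
static void complete_cs(struct dev_state *d)
{
	atomic_fetch_sub(&d->cs_in_flight, 1);
}
```

The compare-and-swap loop takes a slot only if the limit has not been reached. This means two concurrent submitters cannot both claim the last slot.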
Hillf Danton
0db575350c habanalabs: make use of dma_mmap_coherent
Add dma_mmap_coherent() for goya and gaudi to match their use of
dma_alloc_coherent(), see the Link tag for why.

Link: https://lore.kernel.org/lkml/20200609091727.GA23814@lst.de/
Cc: Christoph Hellwig <hch@lst.de>
Cc: Zhang Li <li.zhang@bitmain.com>
Cc: Ding Z Nan <oshack@hotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
Oded Gabbay
c5e0ec66f0 habanalabs: clear vm_pgoff before doing the mmap
The driver uses vm_pgoff to hold the CB idr handle. Before we actually call
the mapping function, we need to clear the handle so there won't be any
garbage left in vm_pgoff.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
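The following is a hypothetical userspace model of c5e0ec66f0, not kernel code. `struct vma_model` stands in for `struct vm_area_struct`, and `map_pages()` stands in for whatever mapping routine consumes `vm_pgoff`. The handle is read out first, and `vm_pgoff` is zeroed before the mapping step. As a result, no leftover handle bits are treated as a page offset.

```c
#include <stdint.h>

/* Minimal stand-in for the kernel's struct vm_area_struct (assumption). */
struct vma_model {
	unsigned long vm_pgoff; /* user-supplied offset, in pages */
};

/* Hypothetical mapping step: it interprets vm_pgoff as a real page offset. */
static unsigned long map_pages(const struct vma_model *vma)
{
	return vma->vm_pgoff; /* page offset the mapping would start at */
}

/* Take the CB handle out of vm_pgoff, then clear it before mapping. */
static unsigned long mmap_cb(struct vma_model *vma, uint64_t *handle_out)
{
	*handle_out = vma->vm_pgoff;
	vma->vm_pgoff = 0; /* no handle bits leak into the mapping code */
	return map_pages(vma);
}
```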
Oded Gabbay
3174ac9bb1 habanalabs: restructure hl_mmap
Arrange the hl_mmap code to be more structured and expandable for the
future. Add better defines that describe our usage of the vm_pgoff.

Note that I shamelessly took the code and defines from the amdkfd driver
(my previous driver).

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
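This sketches the general amdkfd-style scheme that 3174ac9bb1 refers to, assuming 4 KiB pages. The top two bits of the mmap byte offset select the mapping type, and the remaining bits carry a type-specific value such as a CB handle. The macro names and exact bit positions are illustrative assumptions, not copied from the driver.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical layout, in page units (i.e. as seen in vm_pgoff):
 * bits 51..50 hold the type, bits 49..0 hold the value. */
#define MMAP_TYPE_SHIFT (62 - PAGE_SHIFT)
#define MMAP_TYPE_MASK  (0x3ULL << MMAP_TYPE_SHIFT)
#define MMAP_TYPE_CB    (0x2ULL << MMAP_TYPE_SHIFT)
#define MMAP_VALUE_MASK ((1ULL << MMAP_TYPE_SHIFT) - 1)

/* Userspace side: build the byte offset passed to mmap(). */
static uint64_t mmap_offset_encode(uint64_t type, uint64_t value)
{
	return (type | (value & MMAP_VALUE_MASK)) << PAGE_SHIFT;
}

/* Kernel side: split vm_pgoff back into type and value. */
static uint64_t pgoff_type(uint64_t pgoff)  { return pgoff & MMAP_TYPE_MASK; }
static uint64_t pgoff_value(uint64_t pgoff) { return pgoff & MMAP_VALUE_MASK; }
```

Userspace passes the encoded byte offset to `mmap()`. The kernel sees that offset shifted right by `PAGE_SHIFT` in `vm_pgoff`, which is why the decode helpers work in page units.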
Oded Gabbay
2f55342c5e habanalabs: replace armcp with the generic cpucp
ArmCP mandates that the device CPU is always an ARM processor, which might
be wrong in the future.

Most of this change is an internal renaming of variables, functions and
defines but there are two entries in sysfs which have armcp in their
names. Add identical cpucp entries but don't remove the armcp entries yet.
Those will be deprecated next year. Document this in the sysfs
documentation.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
farah kassabri
9f3064913e habanalabs: add support for getting device total energy
Add driver implementation for reading the total energy consumption
from the device ARM FW.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
Tomer Tayar
56004701f5 habanalabs: Include linux/bitfield.h only in habanalabs.h
Include linux/bitfield.h only in habanalabs.h, instead of in each and
every file that needs it, as habanalabs.h is already included by all.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
farah kassabri
d90416c84d habanalabs: extend busy engines mask to 64 bits
Change the busy engines bitmask to 64 bits in order to represent
more engines, as needed for future ASIC support.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
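This small sketch shows why d90416c84d needs a 64-bit mask; the helper names are hypothetical. Engine IDs above 31 only fit in a `uint64_t`. The shift must also use `1ULL`, because `1 << id` is an `int` shift and is undefined once `id` reaches 31.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mark engine `id` (0..63) as busy; 1ULL keeps the shift 64 bits wide. */
static void mark_busy(uint64_t *mask, unsigned int id)
{
	*mask |= 1ULL << id;
}

/* Test whether engine `id` is busy. */
static bool is_busy(uint64_t mask, unsigned int id)
{
	return (mask >> id) & 1ULL;
}
```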
Oded Gabbay
65887291c6 habanalabs: use FIELD_PREP() instead of <<
Use the standard FIELD_PREP() macro instead of the << operator to perform
bitmask operations. This ensures type-check safety and eliminates compiler
warnings.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
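This is a userspace approximation of the `GENMASK_ULL()`/`FIELD_PREP()` pattern from 65887291c6. The kernel macros in `<linux/bitfield.h>` add compile-time checks that this version omits. `FIELD_PREP_ULL`, the register layout, and `build_ctl()` are hypothetical, and `__builtin_ctzll` assumes GCC or Clang.

```c
#include <stdint.h>

/* Simplified stand-ins for GENMASK_ULL() and FIELD_PREP(). */
#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define FIELD_PREP_ULL(mask, val) \
	(((uint64_t)(val) << __builtin_ctzll(mask)) & (mask))

/* Hypothetical register layout: opcode in bits 31..24, queue id in bits 7..0. */
#define CTL_OPCODE_MASK GENMASK_ULL(31, 24)
#define CTL_QUEUE_MASK  GENMASK_ULL(7, 0)

static uint64_t build_ctl(unsigned int opcode, unsigned int queue)
{
	/* The shift amount comes from the mask, and bits outside the field are dropped. */
	return FIELD_PREP_ULL(CTL_OPCODE_MASK, opcode) |
	       FIELD_PREP_ULL(CTL_QUEUE_MASK, queue);
}
```

Compared with a hand-written `opcode << 24`, the mask both fixes the shift amount and discards bits that do not fit in the field.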
Oded Gabbay
bd4ef37292 habanalabs: eliminate redundant else condition
If both parts of if-else are goto statements, we can remove the else and
put the else goto statement after the if statement.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
Oded Gabbay
f907af183b habanalabs: cast int to u32 before printing it with %u
%u is used for unsigned so we need to cast the int variable to u32 to avoid
compiler warning.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
Oded Gabbay
f5b9c8cf25 habanalabs: change CB's ID to be 64 bits
Although the possible values for CB's ID are only 32 bits, there are a few
places in the code where this field is shifted and passed into a function
which expects 64 bits.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
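This sketch shows the truncation that f5b9c8cf25 guards against, assuming 4 KiB pages; the helper names are hypothetical. Shifting a `uint32_t` is done in 32-bit arithmetic. Any bits shifted past bit 31 are therefore lost before the result is widened to 64 bits.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* The shift happens in 32 bits, so high bits are dropped before widening. */
static uint64_t offset_from_u32_id(uint32_t id)
{
	return id << PAGE_SHIFT;
}

/* With a 64-bit id the shift keeps every bit. */
static uint64_t offset_from_u64_id(uint64_t id)
{
	return id << PAGE_SHIFT;
}
```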
Ofir Bitton
2e5eda4681 habanalabs: PCIe Advanced Error Reporting support
The driver will now be notified of any PCI error that occurs and
will respond according to the severity of the error.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
Ofir Bitton
843839bec3 habanalabs: expose sync manager resources allocation in INFO IOCTL
Although the driver defines the first user-available sync manager object
and monitor in habanalabs.h, we would like to also expose this information
via the INFO IOCTL so the runtime can get this information dynamically.
This is because in future ASICs we won't need to define it statically.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
Ofir Bitton
0a068adde5 habanalabs: add information about PCIe controller
Update the firmware header with a new API for getting PCIe info
such as TX/RX throughput and the replay counter.
These counters are needed by customers for monitoring and maintaining
multiple devices.
Add new opcodes to the INFO ioctl to retrieve these counters.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
Ofir Bitton
a98d73c7fa habanalabs: Replace dma-fence mechanism with completions
The habanalabs driver uses the dma-fence mechanism for synchronization.
The dma-fence mechanism was designed solely for GPUs, hence we propose
a simpler mechanism based on completions to replace the current
dma-fence objects.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:48 +03:00
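The kernel's `struct completion` is not available in userspace. The following is therefore only a model of the completion pattern that a98d73c7fa adopts, built on POSIX threads. `completion_complete_all()` plays roughly the role of the kernel's `complete_all()`, and `completion_wait_timeout()` that of `wait_for_completion_timeout()`. All names here are hypothetical; this does not claim to show which wait call the driver uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace model of a one-shot completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_init(struct completion_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Mark as done and wake every waiter; later waiters return immediately. */
static void completion_complete_all(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Wait up to timeout_ms; returns true if completed, false on timeout. */
static bool completion_wait_timeout(struct completion_model *c, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool done;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to tolerate spurious wakeups until done or the deadline passes. */
	while (!c->done && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}
```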
Oded Gabbay
b71590efb2 habanalabs: increase length of ASIC name
Future ASIC names are longer than 15 chars so increase the variable length
to 32 chars.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2020-09-22 18:49:48 +03:00
Moti Haimovski
6396feabf7 habanalabs: prevent user buff overflow
This commit fixes a potential debugfs issue that may occur when
reading the clock gating mask into the user buffer since the
user buffer size was not taken into consideration.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-31 15:10:27 +03:00
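This minimal sketch illustrates the bounded-read idea behind 6396feabf7, in the spirit of the kernel's `simple_read_from_buffer()`. The mask is formatted into a small local buffer, and at most `count` bytes are copied out, starting at `*ppos`. The function name is hypothetical, and `memcpy` stands in for `copy_to_user()`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Copy at most `count` bytes of the formatted mask, honoring the read position. */
static ssize_t read_clk_gate_mask(uint64_t mask, char *ubuf, size_t count,
				  long *ppos)
{
	char tmp[32];
	int len = snprintf(tmp, sizeof(tmp), "0x%llx\n", (unsigned long long)mask);
	size_t avail;

	if (len < 0 || *ppos < 0 || *ppos >= len)
		return 0;              /* nothing left to read */
	avail = (size_t)len - (size_t)*ppos;
	if (count > avail)
		count = avail;         /* never exceed what is left ... */
	memcpy(ubuf, tmp + *ppos, count); /* ... or the caller's buffer size */
	*ppos += (long)count;
	return (ssize_t)count;
}
```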
Ofir Bitton
5aba368893 habanalabs: correctly report inbound pci region cfg error
During inbound iATU configuration, errors can occur while configuring
the PCI registers. In a certain scenario these errors are not reported,
and the driver is loaded with a wrong configuration.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:58 +03:00
Ofir Bitton
0839152f8c habanalabs: check correct vmalloc return code
vmalloc can return a value other than NULL or a valid pointer. We must
validate it so that we never dereference an invalid pointer.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:58 +03:00
Ofir Bitton
bce382a8bb habanalabs: validate FW file size
We must validate the FW size in order not to corrupt memory in case
a malicious FW file is present on the system.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:58 +03:00
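This sketches the size check described in bce382a8bb, assuming the firmware image is copied into a fixed-size device region. `load_fw_image()` and the `-EINVAL` return value are illustrative choices, not taken from the driver.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copy a firmware image into a fixed-size region only if it fits. */
static int load_fw_image(void *dst, size_t dst_size,
			 const void *fw_data, size_t fw_size)
{
	/* Reject empty images and images larger than the destination,
	 * so an oversized (possibly malicious) file cannot overrun memory. */
	if (fw_size == 0 || fw_size > dst_size)
		return -EINVAL;

	memcpy(dst, fw_data, fw_size);
	return 0;
}
```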
Colin Ian King
804d057cfa habanalabs: fix incorrect check on failed workqueue create
The null check on a failed workqueue create is currently null checking
hdev->cq_wq rather than the pointer hdev->cq_wq[i] and so the test
will never be true on a failed workqueue create. Fix this by checking
hdev->cq_wq[i].

Addresses-Coverity: ("Dereference before null check")
Fixes: 5574cb2194 ("habanalabs: Assign each CQ with its own work queue")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:58 +03:00
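This userspace model, with hypothetical types, shows the bug fixed in 804d057cfa. `cq_wq` is an array of per-CQ pointers, so each newly created element must be checked, not the array pointer itself. The `fail_at` variable is a test hook that simulates a failed creation.

```c
#include <errno.h>
#include <stdlib.h>

struct wq { int id; };

/* Hypothetical device state: one workqueue pointer per completion queue. */
struct dev_model {
	struct wq **cq_wq;
	int n_cq;
};

static int fail_at = -1; /* test hook: index whose creation should fail */

static struct wq *create_wq(int id)
{
	struct wq *w;

	if (id == fail_at)
		return NULL;
	w = malloc(sizeof(*w));
	if (w)
		w->id = id;
	return w;
}

static int init_cq_wqs(struct dev_model *d)
{
	int i;

	d->cq_wq = calloc(d->n_cq, sizeof(*d->cq_wq));
	if (!d->cq_wq)
		return -ENOMEM;

	for (i = 0; i < d->n_cq; i++) {
		d->cq_wq[i] = create_wq(i);
		/* Check the element just created; testing d->cq_wq (the
		 * array) here would never detect the failure. */
		if (!d->cq_wq[i])
			goto free_wqs;
	}
	return 0;

free_wqs:
	while (--i >= 0)
		free(d->cq_wq[i]);
	free(d->cq_wq);
	d->cq_wq = NULL;
	return -ENOMEM;
}
```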
Oded Gabbay
58361aae4b habanalabs: set max power according to card type
In Gaudi, the default max power setting is different between PCI and PMC
cards. Therefore, the driver needs to set the default after it knows the
card type.

The current code has a bug where it limits the maximum power of the PMC
card to 200W after a reset occurs.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:57 +03:00
Ofir Bitton
36545279f0 habanalabs: proper handling of alloc size in coresight
The allocation size can be up to 64 bits but was truncated to 32 bits.
Make sure it is not truncated, and validate that the address does not
overflow.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:57 +03:00
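This sketches the two checks that 36545279f0 describes, using a hypothetical `region_end` bound. First, the size is kept in 64 bits; casting 0x100000000 to 32 bits would yield 0. Second, any range where `addr + size` wraps around is rejected.

```c
#include <errno.h>
#include <stdint.h>

/* Validate a 64-bit [addr, addr + size) range against an upper bound. */
static int validate_buffer(uint64_t addr, uint64_t size, uint64_t region_end)
{
	if (size == 0)
		return -EINVAL;
	if (addr + size < addr)       /* unsigned wrap-around means overflow */
		return -EINVAL;
	if (addr + size > region_end) /* range must end inside the region */
		return -EINVAL;
	return 0;
}
```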
Ofir Bitton
1cff119740 habanalabs: verify user input in cs_ioctl_signal_wait
User input must be validated before using it to
access internal structures.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:57 +03:00
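This sketch validates a user-supplied queue index before it is used to index internal tables, as 1cff119740 requires. `NUM_QUEUES`, `queue_props`, and the sync-stream flag are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_QUEUES 8 /* hypothetical number of hardware queues */

struct queue_props {
	int supports_sync_stream; /* can this queue do signal/wait? */
};

/* Hypothetical table: only queues 2 and 5 support signal/wait. */
static const struct queue_props queue_props[NUM_QUEUES] = {
	[2] = { 1 },
	[5] = { 1 },
};

static int check_signal_wait_queue(uint32_t q_idx)
{
	if (q_idx >= NUM_QUEUES)
		return -EINVAL; /* out-of-bounds index from the user */
	if (!queue_props[q_idx].supports_sync_stream)
		return -EINVAL; /* queue cannot be used for signal/wait */
	return 0;
}
```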
Dan Carpenter
eeec23cd32 habanalabs: Fix memory corruption in debugfs
This has to be a long instead of a u32 because we write a long value.
On 64 bit systems, this will cause memory corruption.

Fixes: c216477363 ("habanalabs: add debugfs support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:57 +03:00
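eeec23cd32 fixes the bug by making the destination a `long`. A different way to avoid the same class of bug is to parse into a local `unsigned long` and narrow it only after a range check, shown here as a hypothetical userspace sketch. This way, nothing ever writes 8 bytes through a pointer to a 4-byte field.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a number into a u32 without writing a long through a u32 pointer. */
static int parse_u32(const char *s, uint32_t *out)
{
	char *end;
	unsigned long value;

	errno = 0;
	value = strtoul(s, &end, 0);
	if (errno || end == s)
		return -EINVAL;
	if (*end == '\n')
		end++;          /* debugfs writes usually end with a newline */
	if (*end != '\0')
		return -EINVAL;
	if (value > UINT32_MAX)
		return -ERANGE; /* does not fit in the 4-byte destination */
	*out = (uint32_t)value;
	return 0;
}
```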
Ofir Bitton
bf6d10963e habanalabs: Validate user address before mapping
User address must be validated before driver performs address map.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:56 +03:00
Ofir Bitton
f1aae40e8d habanalabs: unmap PCI bars upon iATU failure
In case the driver fails to configure the PCI controller iATU, it needs to
unmap the PCI bars before exiting so if the driver is removed, the bars
won't be left mapped.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:56 +03:00
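This sketches the cleanup pattern from f1aae40e8d. The stand-ins `map_bars()`, `init_iatu()`, and `unmap_bars()` are hypothetical and only track state. If a step fails after the BARs are mapped, control jumps to a label that unmaps them before returning.

```c
#include <errno.h>
#include <stdbool.h>

static bool bars_mapped;      /* models whether the BARs are currently mapped */
static bool iatu_should_fail; /* test hook: force the iATU step to fail */

static int map_bars(void)   { bars_mapped = true; return 0; }
static void unmap_bars(void) { bars_mapped = false; }
static int init_iatu(void)  { return iatu_should_fail ? -EIO : 0; }

static int pci_init(void)
{
	int rc = map_bars();

	if (rc)
		return rc;

	rc = init_iatu();
	if (rc)
		goto unmap; /* undo the BAR mapping so nothing is left behind */

	return 0;

unmap:
	unmap_bars();
	return rc;
}
```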
Wei Yongjun
22362aa30b habanalabs: remove unused but set variable 'ctx_asid'
GCC reports the following warning:

drivers/misc/habanalabs/common/command_submission.c:373:6: warning:
 variable 'ctx_asid' set but not used [-Wunused-but-set-variable]
  373 |  int ctx_asid, rc;
      |      ^~~~~~~~

This variable is not used in function cs_timedout(), so this commit
removes it to fix the warning.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20200729155902.33976-1-weiyongjun1@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29 18:02:21 +02:00
Greg Kroah-Hartman
7b16a15524 habanalabs: fix up absolute include instructions
There's no need to try to be cute with the include file locations in the
Makefile, so just specify exactly where the files are.

Bonus is this fixes the problem of building with O= as well as trying to
just build the subdirectory alone.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Ben Segal <bpsegal20@gmail.com>
Cc: Christine Gharzuzi <cgharzuzi@habana.ai>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
Link: https://lore.kernel.org/r/20200728171851.55842-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29 08:15:50 +02:00
Greg Kroah-Hartman
65a9bde6ed Linux 5.8-rc7

Merge 5.8-rc7 into char-misc-next

This should resolve the merge/build issues reported when trying to
create linux-next.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-27 11:49:37 +02:00
Tomer Tayar
94f8be9eb0 habanalabs: Fix memory leak in error flow of context initialization
Add a missing free of the cs_pending array in the error flow of context
initialization.

Fixes: c16d45f42b ("habanalabs: Use pending CS amount per ASIC")

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24 20:40:06 +03:00
Oded Gabbay
8df8cb1efc habanalabs: enable device before hw_init()
Device is now enabled before the hw_init() because part of the
initialization requires communication with the device firmware to get
information that is required for the initialization itself.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2020-07-24 20:31:37 +03:00
Ofir Bitton
a04b7cd97e habanalabs: create internal CB pool
Create a device MMU-mapped internal command buffer pool, in order to allow
the driver to allocate CBs for the signal/wait operations
that are fetched by the queues when they are configured with the user's
address space ID.

We must pre-map this internal pool for performance reasons.

This pool is needed for future ASIC support and it is currently unused in
GOYA and GAUDI.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24 20:31:37 +03:00
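This sketches the pre-mapped pool idea from a04b7cd97e. It assumes fixed-size chunks tracked by a 64-bit bitmap; the sizes, names, and allocator are hypothetical, and a real driver might use a general-purpose pool allocator instead. The whole pool range is mapped into the device MMU once, up front, so allocating a CB is only a bitmap update rather than a new mapping.

```c
#include <stdint.h>

#define POOL_CHUNKS 64  /* hypothetical: pool split into 64 chunks */
#define CHUNK_SIZE  256 /* hypothetical chunk size in bytes */

struct cb_pool {
	uint64_t base; /* device VA of the pre-mapped pool; must be non-zero */
	uint64_t used; /* bit i set means chunk i is allocated */
};

/* Return the device address of a free chunk, or 0 if the pool is exhausted. */
static uint64_t cb_pool_alloc(struct cb_pool *p)
{
	for (unsigned int i = 0; i < POOL_CHUNKS; i++) {
		if (!(p->used & (1ULL << i))) {
			p->used |= 1ULL << i;
			return p->base + (uint64_t)i * CHUNK_SIZE;
		}
	}
	return 0;
}

/* Return a chunk to the pool; no MMU unmapping is needed. */
static void cb_pool_free(struct cb_pool *p, uint64_t addr)
{
	unsigned int i = (unsigned int)((addr - p->base) / CHUNK_SIZE);

	p->used &= ~(1ULL << i);
}
```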
Oded Gabbay
70b2f993ea habanalabs: create common folder
For internal needs of our CI, we need to move all the common code into a
common folder instead of keeping it in the root folder of the driver.

The same applies to the common header files under include/.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2020-07-24 20:31:37 +03:00