Introduce a mechanism that performs an handshake between the userspace
provider and kernel driver which verifies that the user supports all
required features in order to operate correctly.
The handshake verifies the needed functionality by comparing the reported
device caps and the provider caps. If the device reports a non-zero
capability the appropriate comp mask is required from the userspace
provider in order to allocate the context.
Link: https://lore.kernel.org/r/20200722140312.3651-4-galpress@amazon.com
Reviewed-by: Shadi Ammouri <sammouri@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The device reports the minimum SQ size required for creation.
This patch queries the min SQ size and reports it back to the userspace
library.
Link: https://lore.kernel.org/r/20200722140312.3651-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Shadi Ammouri <sammouri@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The device reports the maximum number of bytes to be written before
ringing the doorbell (zero means unlimited).
This patch queries the max batch size and reports it back to the userspace
library.
Link: https://lore.kernel.org/r/20200722140312.3651-2-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Have the driver use shared CQs provided by the rdma core driver. This
provides the advantage of improved efficiency handling interrupts.
Link: https://lore.kernel.org/r/20200722135629.49467-3-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Have the driver use shared CQs provided by the rdma core driver. Since
this provides similar functionality to iser_comp it has been removed.
Now there is no reason to allocate very large CQs when the driver is
loaded while gaining the advantage of shared CQs. Previously when a single
connection was opened a CQ was opened for every core with enough space for
eight connections, this is a very large overhead that in most cases will
not be utilized.
Link: https://lore.kernel.org/r/20200722135629.49467-2-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Have the driver use shared CQs provided by the rdma core driver. Since
this provides similar functionality to iser_comp it has been removed. Now
there is no reason to allocate very large CQs when the driver is loaded
while gaining the advantage of shared CQs.
Link: https://lore.kernel.org/r/20200722135629.49467-1-maxg@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Acked-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The error codes in _ib_modify_qp() are supposed to be negative errno.
Fixes: 7a5c938b9e ("IB/core: Check for rdma_protocol_ib only after validating port_num")
Link: https://lore.kernel.org/r/1595645787-20375-1-git-send-email-liheng40@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Li Heng <liheng40@huawei.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Meir Lichtinger says:
====================
ConnectX-7 supports setting relaxed ordering read/write mkey attribute by
UMR, indicated by new HCA capabilities, so extend mlx5_ib driver to
configure UMR control segment
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
* branch 'mlx5_uar':
RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
RDMA/mlx5: Use MLX5_SET macro instead of local structure
RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMR
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access
flag. ConnectX-7 supports setting relaxed ordering read/write mkey
attribute by UMR, indicated by new HCA capabilities.
With ConnectX-7 driver uses UMR when user set relaxed ordering access
flag, in contrast to previous silicon models. Specifically it includes
setting relvant flags of mkey context mask in UMR control segment, and
relaxed ordering write and read flags in UMR mkey context segment.
Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Use generic mlx5 structure defined in mlx5_ifc.h to represent ConnectX
device data structures instead of using structure defined specifically for
mlx5_ib module.
Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Nnothing uses the enum name, so this is harmless.
Fixes: 3226944124 ("IB/mlx5: Introduce driver create and destroy flow methods")
Link: https://lore.kernel.org/r/20200724084112.GC31930@amd
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Fix reported by kbuild warning.
drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned]
BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31));
^
Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The kbuild reported the following warning, so clean whole uverbs_cmd.c
file.
drivers/infiniband/core/uverbs_cmd.c:1066:6: warning: Variable 'ret' is reassigned a value before the old one has been used. [redundantAssignment]
ret = uverbs_request(attrs, &cmd, sizeof(cmd));
^
drivers/infiniband/core/uverbs_cmd.c:1064:0: note: Variable 'ret' is reassigned a value before the old one has been used.
int ret = -EINVAL;
^
Link: https://lore.kernel.org/r/20200720175627.1273096-2-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Up to ConnectX-7 setting mkey relaxed ordering read/write attributes
by UMR is not supported. ConnectX-7 supports this option, which is
indicated by two new HCA capabilities - relaxed_ordering_write_umr
and relaxed_ordering_read_umr.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The automatic object lifetime model allows us to change the write()
interface to have the same logic as the ioctl() path. Update the
create/alloc functions to be in the following format, so the code flow
will be the same:
* Allocate objects
* Initialize them
* Call to the drivers, this is last step that is allowed to fail
* Finalize object
* Return response and allow to core code to handle abort/commit
respectively.
Link: https://lore.kernel.org/r/20200719052223.75245-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Create the same logic flow for the write() interface as we have for the
ioctl() path by making sure that the object is committed or aborted
automatically after HW object creation.
Link: https://lore.kernel.org/r/20200719052223.75245-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently the SQ is set to a ready state when the RAW QP is modified to
INIT. When the TIS is modified, e.g. to change the lag_tx_affinity, then
SQs which are already in the ready state will not be affected.
Open a window to modify the SQ behavior by setting the SQ as ready only
when QP was modified to RTS.
Link: https://lore.kernel.org/r/20200716105416.1423826-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Change the repeated word "it" in a comment to "it to".
Also insert a dash in the sentence to add clarity.
Link: https://lore.kernel.org/r/20200719003220.21250-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Modifying the post-send and post-recv to initialize the wqes slot by slot
dynamically depending on the number of max sges requested by consumer at
the time of QP creation.
Changed the QP creation logic to determine the size of SQ and RQ in 16B
slots based on the number of wqe and number of SGEs requested by consumer
Link: https://lore.kernel.org/r/1594822619-4098-6-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Adding few helper data structure which are useful to initialize hardware
send wqe in variable wqe mode.
Adding a qp flag in HSI to indicate variable wqe is enabled for this qp.
Link: https://lore.kernel.org/r/1594822619-4098-5-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Changing the PSN management memory buffers from statically initialized to
dynamic pull scheme.
During create qp only the start pointers are initialized and during
post-send the psn buffer is pulled based on current producer index.
Adjusting post_send code to accommodate dynamic psn-pull and changing
post_recv code to match post-send code wrt pseudo flush wqe generation.
Link: https://lore.kernel.org/r/1594822619-4098-4-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The bnxt_re driver now allocates shadow sq and rq to maintain per wqe
wr_id and few other flags required to support variable wqe. Segregated the
allocation of shadow queue in a separate function and adjust the cqe
polling logic. The new polling logic is based on shadow queue indices.
Link: https://lore.kernel.org/r/1594822619-4098-3-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The bnxt_re driver need to decide on how much SQ and RQ memory should to
be allocated and which wqe posting/polling algorithm to use.
Making changes to set the wqe-mode to a default value during device
registration sequence. The wqe-mode is passed to the lower layer driver as
well. Going forward in the lower layer driver wqe-mode will be used to
decide execution path. Initializing the wqe-mode to static wqe type for
now.
Link: https://lore.kernel.org/r/1594822619-4098-2-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed from the common ops and moved to
the RoCE only ops within the qedr driver.
Link: https://lore.kernel.org/r/20200714183414.61069-8-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.
Link: https://lore.kernel.org/r/20200714183414.61069-7-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.
Link: https://lore.kernel.org/r/20200714183414.61069-6-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Now that the query_pkey() isn't mandatory by the RDMA core for iwarp
providers, this callback can be removed.
Link: https://lore.kernel.org/r/20200714183414.61069-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The query_pkey() isn't mandatory for the iwarp providers, so remove this
requirement from the RDMA core.
Link: https://lore.kernel.org/r/20200714183414.61069-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Allocate the pkey cache only if the pkey_tbl_len is set by the provider,
also add checks to avoid accessing the pkey cache when it not initialized.
Link: https://lore.kernel.org/r/20200714183414.61069-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rxe_post_send_kernel() iterates over linked list of wr's, until the
wr->next ptr is NULL. However if we've got an interrupt after last wr is
posted, control may be returned to the code after send completion callback
is executed and wr memory is freed.
As a result, wr->next pointer may contain incorrect value leading to
panic. Store the wr->next on the stack before posting it.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200716190340.23453-1-m.malygin@yadro.com
Signed-off-by: Mikhail Malygin <m.malygin@yadro.com>
Signed-off-by: Sergey Kojushev <s.kojushev@yadro.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
User space should receive the maximum edpm size from kernel driver,
similar to other edpm/ldpm related limits. Add an additional parameter to
the alloc_ucontext_resp structure for the edpm maximum size.
In addition, pass an indication from user-space to kernel
(and not just kernel to user) that the DPM sizes are supported.
This is for supporting backward-forward compatibility between driver and
lib for everything related to DPM transaction and limit sizes.
This should have been part of commit mentioned in Fixes tag.
Link: https://lore.kernel.org/r/20200707063100.3811-3-michal.kalderon@marvell.com
Fixes: 93a3d05f9d ("RDMA/qedr: Add kernel capability flags for dpm enabled mode")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In older FW versions the completion flag was treated as the ack flag in
edpm messages. commit ff937b916e ("qed: Add EDPM mode type for user-fw
compatibility") exposed the FW option of setting which mode the QP is in
by adding a flag to the qedr <-> qed API.
This patch adds the qedr <-> libqedr interface so that the libqedr can set
the flag appropriately and qedr can pass it down to FW. Flag is added for
backward compatibility with libqedr.
For older libs, this flag didn't exist and therefore set to zero.
Fixes: ac1b36e55a ("qedr: Add support for user context verbs")
Link: https://lore.kernel.org/r/20200707063100.3811-2-michal.kalderon@marvell.com
Signed-off-by: Yuval Bason <yuval.bason@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
QP's with the same SRQ, working on different CQs and running in parallel
on different CPUs could lead to a race when maintaining the SRQ consumer
count, and leads to FW running out of SRQs. Update the consumer
atomically. Make sure the wqe_prod is updated after the sge_prod due to
FW requirements.
Fixes: 3491c9e799 ("qedr: Add support for kernel mode SRQ's")
Link: https://lore.kernel.org/r/20200708195526.31040-1-ybason@marvell.com
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Yuval Basson <ybason@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Current iSER target code allocates MR pool budget based on queue size.
Since there is no handshake between iSER initiator and target on max IO
size, we'll set the iSER target to support upto 16MiB IO operations and
allocate the correct number of RDMA ctxs according to the factor of MR's
per IO operation. This would guarantee sufficient size of the MR pool for
the required IO queue depth and IO size.
Link: https://lore.kernel.org/r/20200708091908.162263-1-maxg@mellanox.com
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When using action drop dest_type was never assigned to any value. Add
initialization of dest_type to -1 since 0 is valid.
Fixes: f29de9eee7 ("RDMA/mlx5: Add support for drop action in DV steering")
Link: https://lore.kernel.org/r/20200707110259.882276-1-leon@kernel.org
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Instead of returning IB_LINK_LAYER_ETHERNET from rxe_link_layer, return it
directly from get_link_layer callback and remove rxe_link_layer().
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The return value from rxe_mem_init_dma() is always 0 - change it to be
void and fix the callers accordingly.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The return value from rxe_init_port_param() is always 0 - change it to be
void.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Both pkey_tbl_len and gid_tbl_len are set in rxe_init_port_param() - so no
need to check if they aren't set.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Enable the creation of rules with allow and count actions.
This enables using counters on egress flow tables.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rename mlx5_ifc_device_virtio_emulation_cap_bits to
mlx5_ifc_virtio_emulation_cap_bits to match names produced by the
tools producing these auto generated files.
In addition missing capabilities that will be required by VDPA
implementation.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
VDPA is a new interface that will be added in subsequent patches. It
uses mlx5 core devices and resources. Add an interface type for it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>