Simplify drivers by ensuring lifetime of ib_ah object. The changes
in .create_ah() go hand in hand with relevant update in .destroy_ah().
We will use this opportunity and convert .destroy_ah() to don't fail, as
it was suggested a long time ago, because there is nothing to do in case
of failure during destroy.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now when ib_udata is passed to all the driver's object create/destroy APIs
the ib_udata will carry the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the functions prototypes.
Make ib_udata the only argument psssed.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers destroy path from the dependency in 'uobject->context' as we
already did for the create path.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Following the PD conversion patch, do the same for ucontext allocations.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The PD allocations in IB/core allows us to simplify drivers and their
error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
in had with relevant update in .dealloc_pd().
We will use this opportunity and convert .dealloc_pd() to don't fail, as
it was suggested a long time ago, failures are not happening as we have
never seen a WARN_ON print.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Update the driver to use the new device capability to report 64-bit UAR
PFNs.
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
As part of an audit process to update drivers to use rdma_restrack_add()
ensure that CTX objects is cleared before access.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
As part of an audit process to update drivers to use rdma_restrack_add()
ensure that PD objects is cleared before access.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce a 'flags' field to destroy address handle callback and add a
flag that marks whether the callback is executed in an atomic context or
not.
This will allow drivers to wait for completion instead of polling for it
when it is allowed.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce a 'flags' field to create address handle callback and add a flag
that marks whether the callback is executed in an atomic context or not.
This will allow drivers to wait for completion instead of polling for it
when it is allowed.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The internal flag IP_BASED_GIDS was added to a field that was being used
to hold the port Info CapabilityMask without considering the effects this
will have. Since most drivers just use the value from the HW MAD it means
IP_BASED_GIDS will also become set on any HW that sets the IBA flag
IsOtherLocalChangesNoticeSupported - which is not intended.
Fix this by keeping port_cap_flags only for the IBA CapabilityMask value
and store unrelated flags externally. Move the bit definitions for this to
ib_mad.h to make it clear what is happening.
To keep the uAPI unchanged define a new set of flags in the uapi header
that are only used by ib_uverbs_query_port_resp.port_cap_flags which match
the current flags supported in rdma-core, and the values exposed by the
current kernel.
Fixes: b4a26a2728 ("IB: Report using RoCE IP based gids in port caps")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths. For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16. Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This ensures that we return the right structures back to userspace.
Otherwise, it looks like the reserved fields in the response structures
in userspace might have uninitialized data in them.
Fixes: 8b10ba783c ("RDMA/vmw_pvrdma: Add shared receive queue support")
Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add the required functions needed to support SRQs. Currently, kernel
clients are not supported. SRQs will only be available in userspace.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Nitish Bhat <bnitish@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Added support for two device caps - max_sge_rd, max_fast_reg_page_list_len
and the IP_BASED_GIDS port cap flag.
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when its first
created. This ensures that calls such as modify_ah
dont modify the type of the address handle attribute.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Because the Mellanox code required being based on a net-next tree,
I keept it separate from the remainder of the RDMA stack submission
that is based on 4.10-rc3.
This branch contains:
- Various mlx4 and mlx5 fixes and minor changes
- Support for adding a tag match rule to flow specs
- Support for cvlan offload operation for raw ethernet QPs
- A change to the core IB code to recognize raw eth capabilities and
enumerate them (touches non-Mellanox code)
- Implicit On-Demand Paging memory registration support
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYrx+WAAoJELgmozMOVy/du70P/1kpW2xY9Le04c3K7na2XOYl
AUVIDrW/8Go63tpOaM7jBT3k4GlwVFr3IOmBpS24KbW/THxjhyUeP5L5+z2x+go+
jkQOgtPWWEHr5zP3MzsNyB8fDx1YQOnJwEXxybQRW/cbw4CLjnhP+ezd6FdV/3Yy
pPEqDVlAErzvNweG+n2r1pjcUbR8uneC3inyMLnyzUBz4CHKmC8fgD3/qJIM+DNb
gtFT5xHFIXKCigWdQ/EwsTDcHub43V8OXlI5sO7loG6vToOUATMkjI4oOUNhDmYS
X7XLN3yRK9QHEfb5kutXIZEWzTGh7LiFtUYGaNNYqqzDfSiMRc9NC5kTOfplEXDV
Uo+AGb6Fh1zYIOzNk7o+tazIv3LaLv6+Fcm+9bbe0VUIqasaylsePqaTwMuIzx/I
xP5nitmd5lbYo8WdlasVdG6mH1DlJEUbU30v4DpmTpxCP6jGpog7lexyGyF3TgzS
NhnG0IiIClWh3WQ2/GdsFK/obIdFkpLeASli1hwD81vzPfly9zc2YpgqydZI3WCr
q6hTXYnANcP6+eciCpQPO7giRdXdiKey08Uoq/2jxb7Qbm4daG6UwopjvH9/lm1F
m6UDaDvzNYm+Rx+bL/+KSx9JO9+fJB1L51yCmvLGpWi6yJI4ZTfanHNMBsCua46N
Kev/DSpIAzX1WOBkte+a
=rspQ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull Mellanox rdma updates from Doug Ledford:
"Mellanox specific updates for 4.11 merge window
Because the Mellanox code required being based on a net-next tree, I
keept it separate from the remainder of the RDMA stack submission that
is based on 4.10-rc3.
This branch contains:
- Various mlx4 and mlx5 fixes and minor changes
- Support for adding a tag match rule to flow specs
- Support for cvlan offload operation for raw ethernet QPs
- A change to the core IB code to recognize raw eth capabilities and
enumerate them (touches non-Mellanox code)
- Implicit On-Demand Paging memory registration support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
IB/mlx5: Fix configuration of port capabilities
IB/mlx4: Take source GID by index from HW GID table
IB/mlx5: Fix blue flame buffer size calculation
IB/mlx4: Remove unused variable from function declaration
IB: Query ports via the core instead of direct into the driver
IB: Add protocol for USNIC
IB/mlx4: Support raw packet protocol
IB/mlx5: Support raw packet protocol
IB/core: Add raw packet protocol
IB/mlx5: Add implicit MR support
IB/mlx5: Expose MR cache for mlx5_ib
IB/mlx5: Add null_mkey access
IB/umem: Indicate that process is being terminated
IB/umem: Update on demand page (ODP) support
IB/core: Add implicit MR flag
IB/mlx5: Support creation of a WQ with scatter FCS offload
IB/mlx5: Enable QP creation with cvlan offload
IB/mlx5: Enable WQ creation and modification with cvlan offload
IB/mlx5: Expose vlan offloads capabilities
IB/uverbs: Enable QP creation with cvlan offload
...
Change the drivers to call ib_query_port in their get port
immutable handler instead of their own query port handler.
Doing this required to set the core cap flags of this device
before the ib_query_port call is made, since the IB core might
need these caps to serve the port query.
Drivers are ensured by the IB core that the port attributes passed
to the port query verb implementation are zero, and hence we
removed the zeroing from the drivers.
This patch doesn't add any new functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch series adds a driver for a paravirtual RDMA device. The
device is developed for VMware's Virtual Machines and allows existing RDMA
applications to continue to use existing Verbs API when deployed in VMs
on ESXi. We recently did a presentation in the OFA Workshop [1] regarding
this device.
Description and RDMA Support
============================
The virtual device is exposed as a dual function PCIe device. One part
is a virtual network device (VMXNet3) which provides networking properties
like MAC, IP addresses to the RDMA part of the device. The networking
properties are used to register GIDs required by RDMA applications to
communicate.
These patches add support and the all required infrastructure for
letting applications use such a device. We support the mandatory Verbs API as
well as the base memory management extensions (Local Inv, Send with Inv and
Fast Register Work Requests). We currently support both Reliable Connected
and Unreliable Datagram QPs but do not support Shared Receive Queues
(SRQs).
Also, we support the following types of Work Requests:
o Send/Receive (with or without Immediate Data)
o RDMA Write (with or without Immediate Data)
o RDMA Read
o Local Invalidate
o Send with Invalidate
o Fast Register Work Requests
This version only adds support for version 1 of RoCE. We will add RoCEv2
support in a future patch. We do support registration of both MAC-based
and IP-based GIDs. I have also created a git tree for our user-level driver
[2].
Testing
=======
We have tested this internally for various types of Guest OS - Red Hat,
Centos, Ubuntu 12.04/14.04/16.04, Oracle Enterprise Linux, SLES 12
using backported versions of this driver. The tests included several
runs of the performance tests (included with OFED), Intel MPI PingPong
benchmark on OpenMPI, krping for FRWRs. Mellanox has been kind enough
to test the backported version of the driver internally on their hardware
using a VMware provided ESX build. I have also applied and tested this
with Doug's k.o/for-4.9 branch (commit 5603910b). Note, that this patch
series should be applied all together. I split out the commits so that
it may be easier to review.
PVRDMA Resources
================
[1] OFA Workshop Presentation -
https://openfabrics.org/images/eventpresos/2016presentations/102parardma.pdf
[2] Libpvrdma User-level library -
http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: George Zhang <georgezhang@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>