Commit Graph

841824 Commits

Author SHA1 Message Date
Mike Marciniszyn
aa9b79ec37 IB/hfi1: Add missing INVALIDATE opcodes for trace
This was missed in the original implementation of the memory management
extensions.

Fixes: 0db3dfa03c ("IB/hfi1: Work request processing for fast register mr and invalidate")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
bf3b1e0ce0 IB/hfi1: Reduce excessive aspm inlines
Uninline the aspm API since it increases code space for no reason.

Move the aspm module param to the new aspm C file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
2b0ad2da8f IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR details
Add some helper functions to hide struct rvt_swqe details.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
d310c4bf8a IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPs
Historically rdmavt destroy_ah() has returned an -EBUSY when the AH has a
non-zero reference count.  IBTA 11.2.2 notes no such return value or error
case:

	Output Modifiers:
	- Verb results:
	- Operation completed successfully.
	- Invalid HCA handle.
	- Invalid address handle.

ULPs never test for this error and this will leak memory.

The reference count exists to allow for driver independent progress
mechanisms to process UD SWQEs in parallel with post sends.  The SWQE will
hold a reference count until the UD SWQE completes and then drops the
reference.

Fix by removing need to reference count the AH.  Add a UD specific
allocation to each SWQE entry to cache the necessary information for
independent progress.  Copy the information during the post send
processing.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
fe2ac04712 IB/rdmavt: Set QP allowed opcodes after QP allocation
Currently QP allowed_ops is set after the QP is completely initialized.
This curtails the use of this optimization for any initialization before
allowed_ops is set.

Fix by adding a helper to determine the correct allowed_ops and moving the
setting of the allowed_ops to just after QP allocation.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Kamenee Arumugam
5136bfea7e IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is full
When a completion queue is full, the associated queue pairs are not put
into the error state. According to the IBTA specification, this is a
violation.

Quote from IBTA spec:
C9-218: A Requester Class F error occurs when the CQ is inaccessible or
full and an attempt is made to complete a WQE.  The Affected QP shall be
moved to the error state and affiliated asynchronous errors generated as
described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The
current WQE and any subsequent WQEs are left in an unknown state.

C11-37: The CI shall generate a CQ Error when a CQ overrun is
detected. This condition will result in an Affiliated Asynchronous Error
for any associated Work Queues when they attempt to use that
CQ. Completions can no longer be added to the CQ. It is not guaranteed
that completions present in the CQ at the time the error occurred can be
retrieved. Possible causes include a CQ overrun or a CQ protection error.

Put the qp in error state when cq is full. Implement a state called full
to continue to put other associated QPs in error state.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Kamenee Arumugam
f592ae3c99 IB/rdmavt: Fracture single lock used for posting and processing RWQEs
Usage of single lock prevents fetching posted and processing receive work
queue entries from progressing simultaneously and impacts overall
performance.

Fracture the single lock used for posting and processing Receive Work
Queue Entries (RWQEs) to allow the circular buffer to be filled and
emptied at the same time. Two new spinlocks - one for the producers and
one for the consumers used for posting and processing RWQEs simultaneously
and the two indices are define on two different cache lines. The threshold
count is used to avoid reading other index in different cache line every
time.

Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:32:16 -03:00
Kamenee Arumugam
dabac6e460 IB/hfi1: Move receive work queue struct into uapi directory
The rvt_rwqe and rvt_rwq struct elements are shared between rdmavt and the
providers but are not in uapi directory.  As per the comment in
https://marc.info/?l=linux-rdma&m=152296522708522&w=2, The hfi1 driver and
the rdma core driver are not using shared structures in the uapi
directory.

Move rvt_rwqe and rvt_rwq struct into rvt-abi.h header in uapi directory.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:32:16 -03:00
Kamenee Arumugam
239b0e52d8 IB/hfi1: Move rvt_cq_wc struct into uapi directory
The rvt_cq_wc struct elements are shared between rdmavt and the providers
but not in uapi directory.  As per the comment in
https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and
the rdma core driver are not using shared structures in the uapi
directory.

In that case, move rvt_cq_wc struct into the rvt-abi.h header file and
create a rvt_k_cq_w for the kernel completion queue.

Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:32:16 -03:00
Jason Gunthorpe
371bb62158 Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into rdma.git for-next

For dependencies in next patches.

Resolve conflicts:
- Use uverbs_get_cleared_udata() with new cq allocation flow
- Continue to delete nes despite SPDX conflict
- Resolve list appends in mlx5_command_str()
- Use u16 for vport_rule stuff
- Resolve list appends in struct ib_client

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 21:18:23 -03:00
Jianbo Liu
92ab1eb392 net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it
As the ingress ACL rules save vhca id and vport number to packet's
metadata REG_C_0, and the metadata matching for the rules in both fast
path and slow path are all added, enable this feature if supported.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:29 -07:00
Jianbo Liu
669ff1e32f RDMA/mlx5: Add vport metadata matching for IB representors
If vport metadata matching is enabled in eswitch, the rule created
must be changed to match on the metadata, instead of source port.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:29 -07:00
Jianbo Liu
a5641cb524 net/mlx5: E-Switch, Add match on vport metadata for rule in slow path
In slow path, packet that not matched by any offloaded rule is
forwarded to eswitch vport manager for further processing.
Add matching on metadata for peer miss rules in FDB, and rules which
forward packet to correct representor in esw manager NIC_RX table.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:29 -07:00
Jianbo Liu
c1286050cf net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager
In order to do matching on metadata in slow path when demuxing traffic
to representors, explicitly enable the feature that allows HW to pass
metadata REG_C_0 from FDB to eswitch manager NIC_RX table.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:29 -07:00
Jianbo Liu
5784386870 net/mlx5: E-Switch, Add query and modify esw vport context functions
Add esw vport query and modify functions, and exposing them is needed for
enabling or disabling registers passed as metatdata to vport NIC_RX table
in slow path.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
c01cfd0f11 net/mlx5: E-Switch, Add match on vport metadata for rule in fast path
If FW's capabilities and configurations meet the requirement of vport
metadata matching, this feature will be used. As the information
about vport number and vhca_id related to packet is already stored to
its metadata register, which is used as an indicator for perticular
vport, now we can change to match on this metadata for all the
offloading rules in fast path.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
8d212ff057 net/mlx5e: Specifying known origin of packets matching the flow
In vport metadata matching, source port number is replaced by metadata.
While FW has no idea about what it is in the metadata, a syndrome will
happen. Specify a known origin to avoid the syndrome.
However, there is no functional change because ANY_VPORT (0) is filled
in flow_source, the same default value as before, as a pre-step towards
metadata matching for fast path.
There are two other values can be filled in flow_source. When setting
0x1, packet matching this rule is from uplink, while 0x2 is for packet
from other local vports.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
7445cfb116 net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs
When a dual-port VHCA sends a RoCE packet on its non-native port, and the
packet arrives to its affiliated vport FDB, a mismatch might occur on the
rules that match the packet source vport as it is not represented by single
VHCA only in this case. So we change to match on metadata instead of source
vport.
To do that, a rule is created in all vports and uplink ingress ACLs, to
save the source vport number and vhca id in the packet's metadata in order
to match on it later.
The metadata register used is the first of the 32-bit type C registers. It
can be used for matching and header modify operations. The higher 16 bits
of this register are for vhca id, and the lower 16 ones is for vport
number.
This change is not for dual-port RoCE only. If HW and FW allow, the vport
metadata matching is enabled by default.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
bb0ee7dcc4 net/mlx5: Add flow context for flow tag
Refactor the flow data structures, add new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Parav Pandit
91d6291c4e net/mlx5: Introduce a helper API to check VF vport
Introduce a helper API mlx5_eswitch_is_vf_vport() to check
if a given vport_num belongs to VF or not.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
84b0d6a7a1 net/mlx5: Support allocating modify header context from ingress ACL
That modify header action can be then attached to a steering rule in
the ingress ACL.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
f53297d678 net/mlx5: Get vport ACL namespace by vport index
The ingress and egress ACL root namespaces are created per vport and
stored into arrays. However, the vport number is not the same as the
index. Passing the array index, instead of vport number, to get the
correct ingress and egress acl namespace.

Fixes: 9b93ab981e ("net/mlx5: Separate ingress/egress namespaces for each vport")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Jianbo Liu
65c0f2c166 net/mlx5: Introduce vport metadata matching bits and enum constants
When a dual-port VHCA sends a RoCE packet on its non-native port, and
the packet arrives to its affiliated vport FDB, a mismatch might occur
on the rules that match the packet source vport. So we replace the
match on source port with the match on metadata that was configured in
ingress ACL, and that metadata will be passed further also to the NIC
RX table of the eswitch manager.

Introduce vport metadata matching bits and enum constants as a pre-step
towards metadata matching.
    o metadata type C registers in the misc parameters 2 fields.
    o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports
      ingress ACL for the uplink vport.
    o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to
      support propagating the metadata to the nic rx through the loopback
      path.
    o flow_source in flow context, to indicate the known origin of packets.
    o enum constants, to support the above bits.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Colin Ian King
10dcc7448e RDMA/hns: fix spelling mistake "attatch" -> "attach"
There is a spelling mistake in an dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 16:33:00 -03:00
Doug Ledford
34d65cd837 RDMA/netlink: Audit policy settings for netlink attributes
For all string attributes for which we don't currently accept the element
as input, we only use it as output, set the string length to
RDMA_NLDEV_ATTR_EMPTY_STRING which is defined as 1.  That way we will only
accept a null string for that element.  This will prevent someone from
writing a new input routine that uses the element without also updating
the policy to have a valid value.

Also while there, make sure the existing entries that are valid have the
correct policy, if not, correct the policy.  Remove unnecessary checks
for nla_strlcpy() overflow once the policy has been set correctly.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 16:26:54 -03:00
Lijun Ou
e9816ddf2a RDMA/hns: Cleanup unnecessary exported symbols
This patch removes the hns-roce.ko for cleanup all the exported symbols in
common part.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 14:48:44 -03:00
Mauro Carvalho Chehab
97162a1ee8 docs: infiniband: convert docs to ReST and rename to *.rst
The InfiniBand docs are plain text with no markups.  So, all we needed to
do were to add the title markups and some markup sequences in order to
properly parse tables, lists and literal blocks.

At its new index.rst, let's add a :orphan: while this is not linked to the
main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 10:23:09 -03:00
Dan Carpenter
b417c0879d RDMA/hns: Fix an error code in hns_roce_set_user_sq_size()
This function is supposed to return negative kernel error codes but here
it returns CMD_RST_PRC_EBUSY (2).  The error code eventually gets passed
to IS_ERR() and since it's not an error pointer it leads to an Oops in
hns_roce_v1_rsv_lp_qp()

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 10:21:13 -03:00
Colin Ian King
7ef7587541 RDMA/hns: fix potential integer overflow on left shift
There is a potential integer overflow when int i is left shifted as this
is evaluated using 32 bit arithmetic but is being used in a context that
expects an expression of type dma_addr_t.  Fix this by casting integer i
to dma_addr_t before shifting to avoid the overflow.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 2ac0bc5e72 ("RDMA/hns: Add a group interfaces for optimizing buffers getting flow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 10:18:19 -03:00
Matthew Wilcox
792c4e9d0b net/mlx5: Convert mkey_table to XArray
The lock protecting the data structure does not need to be an rwlock.  The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.

Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.

This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24 16:44:40 -07:00
Max Gurtovoy
7796d2a3bb RDMA/mlx5: Refactor MR descriptors allocation
Improve code readability using static helpers for each memory region
type. Re-use the common logic to get smaller functions that are easy
to maintain and reduce code duplication.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Max Gurtovoy
2563e2f30a RDMA/mlx5: Use PA mapping for PI handover
If possibe, avoid doing a UMR operation to register data and protection
buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the
SG lists using PA access. This is safe, since the internal key for data
and protection never exposed to the remote server (only signature key
might be exposed). If PA mappings are not possible, perform mapping
using MTT/KLM descriptors.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1266.4K/1262.4K    1720.1K/1732.1K
4k    793139/570902      1129.6K/773982
32k   72660/72086        97229/96164

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1590.2K/1600.1K    1828.2K/1830.3K
4k    1078.1K/937272     1142.1K/815304
32k   77012/77369        98125/97435

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
de0ae958de RDMA/mlx5: Improve PI handover performance
In some loads, there is performance degradation when using KLM mkey
instead of MTT mkey. This is because KLM descriptor access is via
indirection that might require more HW resources and cycles.
Using KLM descriptor is not necessary when there are no gaps at the
data/metadata sg lists. As an optimization, use MTT mkey whenever it
is possible. For that matter, allocate internal MTT mkey and choose the
effective pi_mr for in transaction according to the required mapping
scheme.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o/baseline):

bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1262.4K/1243.3K/1147.1K    1732.1K/1725.1K/1423.8K
4k    570902/571233/457874       773982/743293/642080
32k   72086/72388/71933          96164/71789/93249

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1600.1K/1572.1K/1393.3K    1830.3K/1823.5K/1557.2K
4k    937272/921992/762934       815304/753772/646071
32k   77369/75052/72058          97435/73180/94612

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Suggested-by: Max Gurtovoy <maxg@mellanox.com>
Suggested-by: Idan Burstein <idanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
5c171cbe3a RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR code
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY
was used.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
e9a53e73a2 RDMA/rw: Use IB_WR_REG_MR_INTEGRITY for PI handover
Replace the old signature handover API with the new one. The new API
simplifes PI handover code complexity for ULPs and improve performance.
For RW API it will reduce the maximum number of work requests per task
and the need of dealing with multiple MRs (and their registrations and
invalidations) per task. All the mappings and registration of the data
and the protection buffers is done by the LLD using a single WR and a
special MR type (IB_MR_TYPE_INTEGRITY) for the PI handover operation.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1243.3K/1182.3K    1725.1K/1680.2K
4k    571233/528835      743293/748259
32k   72388/71086        71789/93573

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1572.1K/1427.2K    1823.5K/1724.3K
4k    921992/916194      753772/768267
32k   75052/73960        73180/95484

There is a performance degradation when writing big block sizes.
Degradation is caused by the complexity of combining multiple
indirections and perform RDMA READ operation from it. This will be
fixed in the following patches by reducing the indirections if
possible.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
6cb2d5b105 RDMA/rw: Introduce rdma_rw_inv_key helper
This is a preparation for adding new signature API to the rw-API.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
185eddc457 RDMA/core: Validate integrity handover device cap
Protect the case that a ULP tries to allocate a QP with signature
enabled flag while the LLD doesn't support this feature.
While we're here, also move integrity_en attribute from mlx5_qp to
ib_qp as a preparation for adding new integrity API to the rw-API
(that is part of ib_core module).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
c0a6cbb9cb RDMA/core: Rename signature qp create flag and signature device capability
Rename IB_QP_CREATE_SIGNATURE_EN to IB_QP_CREATE_INTEGRITY_EN
and IB_DEVICE_SIGNATURE_HANDOVER to IB_DEVICE_INTEGRITY_HANDOVER.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
5a6781a558 RDMA/core: Add an integrity MR pool support
This is a preparation for adding new signature API to the rw-API.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
b9294f8b7c IB/iser: Unwind WR union at iser_tx_desc
After decreasing WRs array size from 7 to 3 it is more
readable to give each WR a descriptive name.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
b76a439982 IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handover
Using this new API reduces iSER code complexity.
It also reduces the maximum number of work requests per task and the need
of dealing with multiple MRs (and their registrations and invalidations)
per task. It is done by using a single WR and a special MR type
(IB_MR_TYPE_INTEGRITY) for PI operation.

The setup of the tested benchmark:
 - 2 servers with 24 cores (1 initiator and 1 target)
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=0 and read_verify=0 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512     1236.6K/1164.3K   1357.2K/1332.8K
1k      1196.5K/1163.8K   1348.4K/1262.7K
2k      1016.7K/921950    1003.7K/931230
4k      662728/600545     595423/501513
8k      385954/384345     333775/277090
16k     222864/222820     170317/170671
32k     116869/114896     82331/82244
64k     55205/54931       40264/40021

Using write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512     1090.1K/1030.9K   1303.9K/1101.4K
1k      1057.7K/904583    1318.4K/988085
2k      965226/638799     1008.6K/692514
4k      555479/410151     542414/414517
8k      298675/224964     264729/237508
16k     133485/122481     164625/138647
32k     74329/67615       80143/78743
64k     35716/35519       39294/37334

We get performance improvement at all block sizes.
The most significant improvement is when writing 4k bs (almost 30% more
iops).

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
38ca87c6f1 RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work request
This new WR will be used to perform PI (protection information) handover
using the new API. Using the new API, the user will post a single WR that
will internally perform all the needed actions to complete PI operation.
This new WR will use a memory region that was allocated as
IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the
registration. In the old API, in order to perform a signature handover
operation, each ULP should perform the following:
1. Map and register the data buffers.
2. Map and register the protection buffers.
3. Post a special reg WR to configure the signature handover operation
   layout.
4. Invalidate the signature memory key.
5. Invalidate protection buffers memory key.
6. Invalidate data buffers memory key.

In the new API, the mapping of both data and protection buffers is
performed using a single call to ib_map_mr_sg_pi function. Also the
registration of the buffers and the configuration of the signature
operation layout is done by a single new work request called
IB_WR_REG_MR_INTEGRITY.
This patch implements this operation for mlx5 devices that are capable to
offload data integrity generation/validation while performing the actual
buffer transfer.
This patch will not remove the old signature API that is used by the iSER
initiator and target drivers. This will be done in the future.

In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work
request, we are using a single UMR operation to register both data and
protection buffers using KLM's.
Afterwards, another UMR operation will describe the strided block format.
These will be followed by 2 SET_PSV operations to set the memory/wire
domains initial signature parameters passed by the user.
In the end of the whole transaction, only the signature memory key
(the one that exposed for the RDMA operation) will be invalidated.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
22465bba39 RDMA/mlx5: Update set_sig_data_segment attribute for new signature API
Explicitly pass the sig_mr and the access flags for the mkey segment
configuration. This function will be used also in the new signature
API, so modify it in order to use it in both APIs. This is a preparation
commit before adding new signature API.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
9ac7c4bcd3 RDMA/mlx5: Pass UMR segment flags instead of boolean
UMR ctrl segment flags can vary between UMR operations. for example,
using inline UMR or adding free/not-free checks for a memory key.
This is a preparation commit before adding new signature API that
will not need not-free checks for the internal memory key during the
UMR operation.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
62e3c379d4 RDMA/mlx5: Add attr for max number page list length for PI operation
PI offload (protection information) is a feature that each RDMA provider
can implement differently. Thus, introduce new device attribute to define
the maximal length of the page list for PI fast registration operation. For
example, mlx5 driver uses a single internal MR to map both data and
protection SGL's, so it's equal to max_fast_reg_page_list_len / 2.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
6c984472ba RDMA/mlx5: Implement mlx5_ib_map_mr_sg_pi and mlx5_ib_alloc_mr_integrity
mlx5_ib_map_mr_sg_pi() will map the PI and data dma mapped SG lists to the
mlx5 memory region prior to the registration operation. In the new
API, the mlx5 driver will allocate an internal memory region for the
UMR operation to register both PI and data SG lists. The internal MR
will use KLM mode in order to map 2 (possibly non-contiguous/non-align)
SG lists using 1 memory key. In the new API, each ULP will use 1 memory
region for the signature operation (instead of 3 in the old API). This
memory region will have a key that will be exposed to remote server to
perform RDMA operation. The internal memory key that will map the SG lists
will stay private.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
7c717d3aee RDMA/core: Add signature attrs element for ib_mr structure
This element will describe the needed characteristics for the signature
operation per signature enabled memory region (type IB_MR_TYPE_INTEGRITY).
Also add meta_length attribute to ib_sig_attrs structure for saving the
mapped metadata length (needed for the new API implementation).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
2cdfcdd867 RDMA/core: Introduce ib_map_mr_sg_pi to map data/protection sgl's
This function will map the previously dma mapped SG lists for PI
(protection information) and data to an appropriate memory region for
future registration.
The given MR must be allocated as IB_MR_TYPE_INTEGRITY.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Israel Rukshin
26bc7eaee9 RDMA/core: Introduce IB_MR_TYPE_INTEGRITY and ib_alloc_mr_integrity API
This is a preparation for signature verbs API re-design. In the new
design a single MR with IB_MR_TYPE_INTEGRITY type will be used to perform
the needed mapping for data integrity operations.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
a0bc099abf RDMA/core: Save the MR type in the ib_mr structure
This is a preparation for the signature verbs API change. This change is
needed since the MR type will define, in the upcoming patches, the need
for allocating internal resources in LLD for signature handover related
operations. It will also help to make sure that signature related
functions are called with an appropriate MR type and fail otherwise.
Also introduce new mr types IB_MR_TYPE_USER, IB_MR_TYPE_DMA and
IB_MR_TYPE_DM for correctness.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00