Commit Graph

15 Commits

Author SHA1 Message Date
Leon Romanovsky
21a428a019 RDMA: Handle PD allocations by IB/core
The PD allocations in IB/core allows us to simplify drivers and their
error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
in had with relevant update in .dealloc_pd().

We will use this opportunity and convert .dealloc_pd() to don't fail, as
it was suggested a long time ago, failures are not happening as we have
never seen a WARN_ON print.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08 16:51:04 -07:00
Parav Pandit
d5108e69fe IB/rxe: Make counters thread safe
Current rxe device counters are not thread safe.
When multiple QPs are used, they can be racy.
Make them thread safe by making it atomic64.

Fixes: 0b1e5b99a4 ("IB/rxe: Add port protocol stats")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 14:09:45 -07:00
Zhu Yanjun
8c9959689b IB/rxe: make rxe_unregister_device void
Since the function rxe_unregister_device always returns 0, it is changed
to void.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-08 14:22:54 -07:00
Vijay Immanuel
4e4c53df56 IB/rxe: avoid back-to-back retries
Error retries can occur due to timeouts, NAKs or receiving
packets beyond the current read request. Avoid back-to-back
retries due to packet processing, by only retrying the initial
attempt immediately. Subsequent retries must be due to timeouts.

Continue to process completion packets after scheduling a retry.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-08-30 17:27:15 -04:00
Vijay Immanuel
b97db58557 IB/rxe: fix for duplicate request processing and ack psns
Don't reset the resp opcode for a replayed read response.
The resp opcode could be in the middle of a write or send
sequence, when the duplicate read request was received.
An example sequence is as follows:
- Receive read request for 12KB PSN 20. Transmit read response
  first, middle and last with PSNs 20,21,22.
- Receive write first PSN 23.
  At this point the resp psn is 24 and resp opcode is write first.
- The sender notices that PSN 20 is dropped and retransmits.
  Receive read request for 12KB PSN 20. Transmit read response
  first, middle and last with PSNs 20,21,22. The resp opcode is
  set to -1, the resp psn remains 24.
- Receive write first PSN 23. This is processed by duplicate_request().
  The resp opcode remains -1 and resp psn remains 24.
- Receive write middle PSN 24. check_op_seq() reports a missing
  first error since the resp opcode is -1.

When sending an ack for a duplicate send or write request,
use the psn of the previous ack sent. Do not use the psn
of a read response for the ack.
An example sequence is as follows:
- Receive write PSN 30. Transmit ACK for PSN 30.
- Receive read request 4KB PSN 31. Transmit read response with
  PSN 31. The resp psn is now 32.
- The sender notices that PSN 30 is dropped and retransmits.
  Receive write PSN 30. duplicate_request() sends an ACK with
  PSN 31. That is incorrect since PSN 31 was a read request.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-08-30 17:22:12 -04:00
Vijay Immanuel
d3c04a3a68 IB/rxe: vary the source udp port for receive scaling
Select the source udp port number for a QP based on the
source QPN. This provides a better spread of traffic
across NIC RX queues for RC/UC QPs.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-08-30 16:31:50 -04:00
Hernán Gonzalez
c33bab622d IB/rxe: Remove unused variable (char *rxe_qp_state_name[])
Note: This is compile only tested as I have no access to the hw.  This
variable was not used anywhere in the code. Removing it saves 24 bytes.

add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-24 (-24)
Function                                     old     new   delta
rxe_qp_state_name                             24       -     -24
Total: Before=3348732, After=3348708, chg -0.00%

Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:57:40 -07:00
Bart Van Assche
bb3ffb7ad4 RDMA/rxe: Fix rxe_qp_cleanup()
rxe_qp_cleanup() can sleep so it must be run in thread context and
not in atomic context. This patch avoids that the following bug is
triggered:

Kernel BUG at 00000000560033f3 [verbose debug info unavailable]
BUG: sleeping function called from invalid context at net/core/sock.c:2761
in_atomic(): 1, irqs_disabled(): 0, pid: 7, name: ksoftirqd/0
INFO: lockdep is turned off.
Preemption disabled at:
[<00000000b6e69628>] __do_softirq+0x4e/0x540
CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.15.0-rc7-dbg+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x85/0xbf
 ___might_sleep+0x177/0x260
 lock_sock_nested+0x1d/0x90
 inet_shutdown+0x2e/0xd0
 rxe_qp_cleanup+0x107/0x140 [rdma_rxe]
 rxe_elem_release+0x18/0x80 [rdma_rxe]
 rxe_requester+0x1cf/0x11b0 [rdma_rxe]
 rxe_do_task+0x78/0xf0 [rdma_rxe]
 tasklet_action+0x99/0x270
 __do_softirq+0xc0/0x540
 run_ksoftirqd+0x1c/0x70
 smpboot_thread_fn+0x1be/0x270
 kthread+0x117/0x130
 ret_from_fork+0x24/0x30

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:20 -05:00
Andrew Boyer
b9109b7ddb IB/rxe: Fix destination cache for IPv6
To successfully match an IPv6 path, the path cookie must match. Store it
in the QP so that the IPv6 path can be reused.

Replace open-coded version of dst_check() with the actual call, fixing the
logic. The open-coded version skips the check call if dst->obsolete is 0
(DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE
means that the route may continue to be used, though.

Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:33 -04:00
Andrew Boyer
bfc3ae0566 IB/rxe: Disable completion upcalls when a CQ is destroyed
This prevents the stack from accessing userspace objects while they
are being torn down.

One possible sequence of events:
 - Userspace program exits
 - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(),
   ib_destroy_cq(), etc. and releasing/freeing the UCQ
   - The QP still has tasklets running, so it isn't destroyed yet
   - The CQ is referenced by the QP, so the CQ isn't destroyed yet
   - The UCQ is kfree()'d anyway
 - A send work request completes
 - rxe_send_complete() calls cq->ibcq.comp_handler()
 - ib_uverbs_comp_handler() runs and crashes; the event queue is checked
   for is_closed, but it has no way to check the ib_ucq_object before
   accessing it

The reference counting on the CQ doesn't protect against this since the CQ
hasn't been destroyed yet.
There's no available interface to deregister the UCQ from the CQ, and it
didn't appear that attempting to add reference counting to the UCQ was
going to be a good way to go since this solution is much simpler.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:32 -04:00
yonatanc
cee2688e3c IB/rxe: Offload CRC calculation when possible
Use CPU ability to perform CRC calculations, by
replacing direct calls to crc32_le() with crypto_shash_updata().

The overall performance gain measured with ib_send_bw tool is 10% and it
was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU.

ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100

---------------------------------------------------------------------------------------------
|             | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
---------------------------------------------------------------------------------------------
| crc32_le    | 1048576 | 9000       | inf             | 497.60             | 0.000498      |
| CRC offload | 1048576 | 9000       | inf             | 546.70             | 0.000547      |
---------------------------------------------------------------------------------------------

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:45:02 -04:00
Yonatan Cohen
0b1e5b99a4 IB/rxe: Add port protocol stats
Expose new counters using the get_hw_stats callback.
We expose the following counters:

+---------------------+----------------------------------------+
|      Name           |           Description                  |
|---------------------+----------------------------------------|
|sent_pkts            | number of sent pkts                    |
|---------------------+----------------------------------------|
|rcvd_pkts            | number of received packets             |
|---------------------+----------------------------------------|
|out_of_sequence      | number of errors due to packet         |
|                     | transport sequence number              |
|---------------------+----------------------------------------|
|duplicate_request    | number of received duplicated packets. |
|                     | A request that previously executed is  |
|                     | named duplicated.                      |
|---------------------+----------------------------------------|
|rcvd_rnr_err         | number of received RNR by completer    |
|---------------------+----------------------------------------|
|send_rnr_err         | number of sent RNR by responder        |
|---------------------+----------------------------------------|
|rcvd_seq_err         | number of out of sequence packets      |
|                     | received                               |
|---------------------+----------------------------------------|
|ack_deffered         | number of deferred handling of ack     |
|                     | packets.                               |
|---------------------+----------------------------------------|
|retry_exceeded_err   | number of times retry exceeded         |
|---------------------+----------------------------------------|
|completer_retry_err  | number of times completer decided to   |
|                     | retry                                  |
|---------------------+----------------------------------------|
|send_err             | number of failed send packet           |
+---------------------+----------------------------------------+

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:43:28 -04:00
Bart Van Assche
839f5ac0d8 IB/rxe: Remove a pointless indirection layer
Neither rxe->ifc_ops nor any of the function pointers in struct
struct rxe_ifc_ops ever change. Hence remove the rxe->ifc_ops
indirection mechanism.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
32404fb764 IB/rxe: Let the compiler check the type of the cleanup functions
Change the argument type of these functions from void * into
struct rxe_pool_entry *.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Moni Shoua
8700e3e7c4 Soft RoCE driver
Soft RoCE (RXE) - The software RoCE driver

ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a udp encapsulating protocol, in that case RDMA, for sending and
receiving packets over any Ethernet device.  This yields a RDMA
transport over the UDP/Ethernet network layer forming a RoCEv2
compatible device.

The configuration procedure of the Soft RoCE drivers requires
binding to any existing Ethernet network device. This is done with
/sys interface.

A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices.  The use of rxe verbs ins
user space requires the inclusion of librxe as a device specifics
plug-in to libibverbs. librxe is packaged separately.

Architecture:

     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
+---------------------------------------------------------------+
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Soft RoCE resources:

[1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
Github
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04 11:13:12 -04:00