Currently c4iw_peer_abort_intr() does not wake up the waiter if the
endpoint state indicates we're using MPAv2 and we're currently trying to
connect. This was introduced with commit 7c0a33d611 ("RDMA/cxgb4:
Don't wakeup threads for MPAv2")
However, this original fix is flawed because it introduces a race that
can cause a deadlock of the iwarp stack. Here is the race:
->local side sets up an active offload connection.
->local side sends MPA_START request.
->peer sends MPA_START response.
->local side ingress cpl thread begins processing the MPA_START response,
but before it changes the state from MPA_REQ_SENT to FPDU_MODE:
->peer sends a RST which results in a ABORT_REQ_RSS. This triggers
peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev
is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the
stack will re-attempt the connection using MPAv1.
->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls
c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR
to firmware. But since HW sent an abort, FW correctly drops the RI_WR/INIT
WR.
->So the cpl thread is stuck waiting for a reply and cannot process the
ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a
halt because no more ingress cpls are processed by the stack...
The correct fix for the issue is to always do the wake up in
c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect().
Fixes: 7c0a33d611 ("RDMA/cxgb4: Don't wakeup threads for MPAv2")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add get_ep_from_stid() which will atomically find and reference the
endpoint struct if found. This avoids touch-after-free races between
threads destroying listening endpoints and the CPL processing thread
processing an incoming PASS_ACCEPT_REQ CPL.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
c4iw_reject() and c4iw_accept() need to handle the case where the
endpoint has timed out and is in the middle of ABORTING the connection.
Here is the flow that causes the BUG_ON() to fire on the server side:
1) offload connection setup and endpoint timer started
2) MPA_START request received from peer, CONNECT_REQUEST passed to ULP
3) endpoint timer fires, and process_timeout() aborts the connection,
this moves the endpoint state to ABORTING until HW sends up the
ABORT_RPL_RSS.
4) application exits closing the CONNECT_REQUEST cm_id. The IWCM calls
c4iw_reject_cr() to destroy this connection request.
5) WHAMO: BUG_ON() because the state is ABORTING.
The fix is to change c4iw_reject_cr() and c4iw_accept_cr() to fail the
operation if the state is not in MPA_REQ_RCVD vs in DEAD.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
ARP failure may also happen when ep in FPDU_MODE and these failures need
to be handled by process_timeout(). process_timeout() also has to handle
case MPA_REQ_RCVD, setting abort to 1, leading to ep resource release.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Arp failure for send_mpa_reply/reject() is handled by freeing the
mpa_skb in c4iw_free_ep() before releasing ep.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a race between ULP threads calling c4iw_ep_disconnect() via
c4iw_modify_rc_qp() and the ingress CPL thread where the ULP thread
can free the endpoint just after the ingress CPL thread finds the ep
pointer in the tid table. To avoid this, we now use the hwtid_idr table
for lookups instead of the LLD tid table so we can lock around insert,
remove, and lookup+get_ep to avoid the race. The CPL handlers now will
either find the ep ptr and have a ref on it, or not find it and they
can discard the CPL. Callers of get_ep_from_tid() will have a ref
on the ep if found, and thus must deref when they are done.
Negative advice in peer_abort_intr() need to dereference the ep.
therefore peer_abort() is scheduled to dereference the ep later.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In abort_arp_failure(), the return value from c4iw_ofld_send() is
ignored and thus if the CPL isn't sent, the endpoint is stuck and never
gets aborted. Failure of c4iw_ofld_send() is treated as fatal error, and
the ep resources are released in a safer context through process_work().
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moving the state to ABORTING causes the ep to get stuck because
c4iw_ep_timeout() thinks the ABORT has already been done. So leave the
state alone and let c4iw_ep_disconnect() do the right thing given the
ep state.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
->In act_open_rpl(), CPL_ERR_TCAM_FULL error handling branch, there is
no handling of the return value of send_fw_act_open_req().
->In send_fw_act_open_req(), there is no handling of return value of
c4iw_l2t_send(), which may cause a ep leak and won't notify upper layers
on connection establish failure.
->send_mpa_req() should act on the return from c4iw_l2t_send() and
return the error to the caller.
->In case of c4iw_l2t_send() failure in send_mpa_req(), returns without
starting the timer and not changing the ep state, which is further
handled by act_establish()
-> In act_establish()?if send_mpa_request's get_skb returns an error,
may cause an ep leak. So handle return value of send_mpa_req()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
->Stop the ep timer after MPA negotiation so that the arp failures
during send_mpa_reply/reject will be handled by process_timeout() after
the ep timer expires.
->Added case MPA_REP_SENT in process_timeout().
->For MPA reject, c4iw_ep_disconnect tries to start an already started
timer, which leads to warning message "timer already started".
-> In case of mpa reject stop the timer and call send_mpa_reject().
-> Added new ep flag STOP_MPA_TIMER to tell fw4_ack() to stop the timer
only for send_mpa_reply(), which is set in c4iw_accept_cr().
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In case of incomplete mpa messages we should not stop timer as it
results in return with timeout for the next mpa message
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-> On passive side of connection parent_ep referenced during connection
request has to be dereferenced during the passive accept failure.
-> As passive accept failure error handlinglogic runs in atomic context,
the parent ep is dereferenced by scheduling work request.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The FID value in a ULP_MEMIO command needs to be set to an IQ ID of
a queue configured for our PF. The FID/IQ id is used to index into the
PCIE FID table, to find out on which function the DMA needs to be
issued. Essentially, every DMA needs to have the ingress queue. The exact
ingress queue doesn't matter, but it needs to be an ingress queue
associated with the function you want to see the DMA on.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- add EP_DISC_FAIL history bit
- add QP_REFED/DEREFED history bits
- Add functions to ref/deref the cm_id and add history bit for the same
- add CLOSE_CON_RPL history
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Adding the needed mlx5_ifc hardware bits and structs
for the following features:
* Add vport to steering commands for SRIOV ACL support
* Add mlcr, pcmr and mcia registers for dump module EEPROM
* Add support for FCS, beacon led and disable_link bits to
hca caps
* Add CQE period mode bit in CQ context for CQE based CQ
moderation support
* Add umr SQ bit for fragmented memory registration
* Add needed bits and caps for Striding RQ support
In-order to avoid possible future conflicts between rdma and
net-next we added all expected updates to this file for this release.
If more changes will be submitted, we plan to do it only through
one of the subsystems, probably net-next.
All updated bits in this patch will be later used in
the up-coming submissions to net-next and rdma trees.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
All reserved fields after early_vf_enable are off by 1, since
early_vf_enable was not explicitly declared as array of size 1.
Reserved field before cqe_zip had a wrong size, it should
be 0x80 + 0x3f.
Fixes: b084444459 ("net/mlx5_core: Introduce access function to read internal timer ")
Fixes: b4ff3a36d3 ("net/mlx5: Use offset based reserved field names in the IFC header file")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since all srp_map_finish_fr() callers pass a non-zero value as
the fourth argument (sg_nents), the sg_nents == 0 check in that
function can be removed. Add a count == 0 check in the caller
of that function.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The srp_queuecommand() function translates ENOMEM into QUEUE_FULL
which causes the SCSI mid-layer to retry the command. All other
error codes are translated into DID_ERROR which causes the SCSI
command to fail. Return E2BIG if mapping will always fail to
prevent that the SCSI mid-layer keeps resubmitting a command
forever.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ensure that req->nmdesc is set correctly in srp_map_sg() if mapping
fails. Avoid that mapping failure causes a memory descriptor leak.
Report srp_map_sg() failure to the caller.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The free request list was removed through patch "IB/srp: Use block layer tags".
Hence update a comment that refers to that free request list.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use c4iw_ep_disconnect() instead. This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
This is the last user of abort_connection(), so remove it too.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In c4iw_ep_disconnect(), if we fail to initiate a close operation, then
move the qp to ERROR to disassociate the ep from the qp. Failure to do
this will leak the ep resources.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead return whether the caller needs to disconnect. This is part of
getting rid of abort_connection() altogether so we properly clean up on
send_abort() failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use c4iw_ep_disconnect() instead. This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead, have the caller, rx_data() handle the close/abort like
it does for process_mpa_request(). This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In rx_data(), with the ep in FPDU_MODE, refcnt=2, if we get unexpected
streaming data, we call c4iw_modify_rc_qp() and move the qp from
RTS -> TERMINATE. In c4iw_modify_rc_qp(), if rdma_fini() returns
an error, the ep will be dereferenced (refcnt=1). Then rx_data()
calls c4iw_ep_disconnect() which starts the close operation.
But if send_halfclose() fails in c4iw_ep_disconnect(), we will call
release_ep_resources() derefing the ep which reduces the refcnt to 0 and
and frees the ep. However we still has the ep mutex at that point, so we
have a touch-after-free bug. There is a similar issue where
peer_close() calls c4iw_ep_disconnect().
The solution is to add a reference to the ep in c4iw_ep_disconnect()
after acquiring the mutex, and release it after releasing the mutex.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In c4iw_ep_disconnect(), if we start the ep timer to begin a close,
but send_halfclose() fails, we need to stop the timer and send a CLOSE
event up to the IWCM before releasing the resources. Otherwise, we can
crash when the ep timer fires if the ep is referencing a previous instance
of the device. This can happen as part of adapter reset/recovery, for
instance.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If ARP fails before the CPL_PASS_ACCEPT_RPL is seen by hardware, the tid
will be stuck in SYN_PEND and never released. So create an arp failure
handler specifically for this message to release the endpoint resources.
In pass_accept_rpl_arp_failure(), put the parent endpoint so it will
be freed when destroyed. Also we don't need to call release_tid() here
because _c4iw_free_ep() calls cxgb4_remove_tid() which releases the
hwtid.
If we get an ABORT_REQ_RSS instead of a PASS_ESTABLISH (because the
peer's ACK to our SYN is never received), then put the parent as well
in peer_abort().
Treat accept_cr() failures just like arp failures: put the parent ep
and release the ep resources destroying the tid
The ARP failure handlers are called in an atomic context, so we need to
schedule some of the processing which might block. Namely _c4iw_free_ep()
which needs a mutex. So create a "special" CPL opcode and handler and
schedule it via sched() to be run by process_work() in a blockable context.
Also rework the active open arp failure handler to make use of
release_ep_resources(). This allows both the active and passive arp
failure handlers to use the same deferred cleanup function.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
iSER currently has a couple places that set max_sectors in either the host
template or SCSI host, and all of them get it wrong.
This patch instead uses a single assignment that (hopefully) gets it right:
the max_sectors value must be derived from the number of segments in the
FR or FMR structure, but actually be one lower than the page size multiplied
by the number of sectors, as it has to handle the case of non-aligned I/O.
Without this I get trivial to reproduce hangs when running xfstests
(on XFS) over iSER to Linux targets.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alternatively one could free the skb, OTOH I don't think this test is
useful so just remove it.
Cc: <linux-rdma@vger.kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix for removing a quad hash entry when the
corresponding quad hash entry hasn't been added,
which is the case in loopback connections
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix for checking if the QP associated with a completion
has been destroyed while processing CQ elements.
If that is the case, move the CQ head to the next element
and continue completion processing.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A check is added to validate the requested sge number.
iWARP doesn't support multiple sg elements for
RDMA READ work requests.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix to calculate the SQ size based on the max
frag_count, requested by the application instead
of overwriting it with the max supported frag_count
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
STag index mask is calculated incorrectly, missing
the 14 bits minimum requirement. Add max macro to use
either # of MRs or 14 bits in the mask size calculation.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Invalidation after every WQE write is changed to invalidate
only if required. NOPs are padded so that WQE writes are
aligned to 64B boundary.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Adding sq and rq drain functions, which block until all
previously posted wr-s in the specified queue have completed.
A completion object is signaled to unblock the thread,
when the last cqe for the corresponding queue is processed.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Correct SD calculation by using base address returned from commit FPM.
This alleviates any assumptions on resource ordering and alignment
requirement. Also consolidate SD estimation code into i40iw_est_sd().
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix endian warnings and errors due to u32 stored to u16.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Implement fast register mr, Local invalidate, send with
invalidate and RDMA read with invalidate.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>