This length miss-calculation may cause a silent data corruption
in the DIX case and cause the device to reference unmapped area.
Fixes: d77e65350f ('libiscsi, iser: Adjust data_length to include protection information')
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fast registration and local invalidate work requests can
also fail. We should call error completion handler for them.
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In case the user unloaded ib_iser while ep_connect is in
progress, we need to destroy the endpoint although ep_disconnect
wasn't invoked (we detect this by the iser conn state != DOWN).
However, if we got an REJECTED/UNREACHABLE CM event we move the
connection state to DOWN which will prevent us from destroying
the endpoint in the module unload stage. Fix this by setting the
connection state to TERMINATING in iser_conn_error so we can still
destroy the endpoint at unload stage.
Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In some cases, we might reach the iser connection termination without
ep_disconnect being invoked (for example if user-space daemon doesn't
exists. In this case, we need to free the iscsi endpoint when we
remove the iser connection.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When teardown process starts during live IO, we need to keep the
memory regions pool (frmr/fmr) until all in-flight tasks are properly
released, since each task may return a memory region to the pool. In
order to do this, we pass a destroy flag to iser_free_ib_conn_res to
indicate we can destroy the device and the memory regions
pool. iser_conn_release will pass it as true and also DEVICE_REMOVAL
event (we need to let the device to properly remove).
Also, Since we conditionally call iser_free_rx_descriptors,
remove the extra check on iser_conn->rx_descs.
Fixes: 5426b1711f ("IB/iser: Collapse cleanup and disconnect handlers")
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We always unmap SGs with the same direction instead of unmapping
with the direction the mapping was done, fix that.
Fixes: 9a8b08fad2 ("IB/iser: Generalize iser_unmap_task_data and [...]")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Following few recent Block integrity updates, we align the iSER data
integrity offload settings with:
- Deprecate pi_guard module param
- Expose support for DIX type 0.
- Use scsi_transfer_length for the transfer length
- Get pi_interval, ref_tag, ref_remap, bg_type and
check_mask setting from scsi_cmnd
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use likely() for wc.status == IB_WC_SUCCESS
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
And fix a checkpatch warning.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
No reason to settle with four, can use the min between device max comp
vectors and number of cores.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
It is enough to check mem_h pointer assignment, mem_h == NULL will
indicate that buffer is not registered using mr.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When closing the connection, we should first terminate the connection
(in case it was not previously terminated) to guarantee the QP is in
error state and we are done with servicing IO. Only then go ahead with
tasks cleanup via iscsi_conn_stop.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In certain scenarios (target kill with live IO) scsi TMFs may race
with iser RDMA teardown, which might cause NULL dereference on iser IB
device handle (which might have been freed). In this case we take a
conditional lock for TMFs and check the connection state (avoid
introducing lock contention in the IO path). This is indeed best
effort approach, but sufficient to survive multi targets sudden death
while heavy IO is inflight.
While we are on it, add a nice kernel-doc style documentation.
Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If rdma_cm error event comes after ep_poll but before conn_bind, we
should protect against dereferncing the device (which may have been
terminated) in session_create and conn_create (already protected)
callbacks.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use uintptr_t to handle wr_id casting, which was found by Kbuild test
robot and smatch. Also remove an internal definition of variable which
potentially shadows an external one (and make sparse happy).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix a regression was introduced in commit 6df5a128f0 ("IB/iser:
Suppress scsi command send completions").
The sig_count was wrongly set to be static variable, thus it is
possible that we won't reach to (sig_count % ISER_SIGNAL_BATCH) == 0
condition (due to races) and the send queue will be overflowed.
Instead keep sig_count per connection. We don't need it to be atomic
as we are safe under the iscsi session frwd_lock taken by libiscsi on
the queuecommand path.
Fixes: 6df5a128f0 ("IB/iser: Suppress scsi command send completions")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When creating a connection QP we choose the least used CQ and inc the
number of active QPs on that. If we fail to create the QP, we need to
decrement the active QPs counter.
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
No real need to wait for TIMEWAIT_EXIT before we destroy the RDMA
resources (also TIMEAWAIT_EXIT is not guarenteed to always arrive). As
for the cma_id, only destroy it if the state is not DOWN where in this
case, conn_release is already running and we don't want to compete.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case of the HCA going into catasrophic error flow, the
beacon post_send is likely to fail, so surely there will
be no completion for it.
In this case, use a best effort approach and don't wait for beacon
completion if we failed to post the send.
Reported-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Re-adjust max CQEs per CQ and max send_wr per QP according
to the resource limits supported by underlying hardware.
Signed-off-by: Minh Tran <minhduc.tran@emulex.com>
Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Drop the now unused reason argument from the ->change_queue_depth method.
Also add a return value to scsi_adjust_queue_depth, and rename it to
scsi_change_queue_depth now that it can be used as the default
->change_queue_depth implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
All drivers use the implementation for ramping the queue up and down, so
instead of overloading the change_queue_depth method call the
implementation diretly if the driver opts into it by setting the
track_queue_depth flag in the host template.
Note that a few drivers validated the new queue depth in their
change_queue_depth method, but as we never go over the queue depth
set during slave_configure or the sysfs file this isn't nessecary
and can safely be removed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Expose more signature setting parameters. We modify the signature API
to allow usage of some new execution parameters relevant to data
integrity feature.
This patch modifies ib_sig_domain structure by:
- Deprecate DIF type in signature API (operation will
be determined by the parameters alone, no DIF type awareness)
- Add APPTAG check bitmask (for input domain)
- Add REFTAG remap (increment) flag for each domain
- Add APPTAG/REFTAG escape options for each domain
The mlx5 driver is modified to follow the new parameters in HW
signature setup.
At the moment the callers (iser/isert) hard-code new parameters (by
DIF type). In the future, callers will retrieve them from the scsi
command structure.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Later there will be more parameters to set, so we want to do it in a
centralized place.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In the future this will be a per-command parameter so we can lose it,
but in the mean time IP_CSUM is a lot lighter for SW layers to
compute, set it as default.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We clear the struct before - no need to do 0 assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Update the driver version and add Sagi Grimberg as maintainer
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Match to the debug level of all functions in connect/disconnect flows.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Singal completion of every 32 scsi commands and suppress all the rest.
We don't do anything upon getting the completion so no need to "just
consume" it. Cleanup of scsi command is done in cleanup_task callback.
Still keep dataout and control send completions as we may need to
cleanup there. This helps reducing the amount of interrupts/completions
in the IO path.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Poll in batch of 16. Since we don't want it on the stack, keep under
iser completion context (iser_comp).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Avoid post_send counting (atomic) in the IO path just to keep track of
how many completions we need to consume. Use a beacon post to indicate
that all prior posts completed.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This will solve a possible condition where we might miss TX completion
(flush error) during session teardown. Since we are using a single
CQ, we don't need to actively drain the TX CQ, instead just wait for
flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors().
This patch might introduce a minor performance regression on its own,
but the next patches will enhance performance using a single CQ for RX
and TX.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We need a way to guarentee that we don't stay in soft-IRQ context for
too long. We might starve other pending CQ tasklets or worse lock
against application trying to issue IO on the running CPU.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce iser_comp which centralizes all iser completion related
items and is referenced by iser_device and each ib_conn.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case iscsid was violently killed (SIGKILL) during its error
recovery stage, we may never get a connection teardown sequence for
some of the old connections. No harm done, but when we try to unload
the module we will need to cleanup all these connections. So we
actually may end-up here - so it's not a BUG_ON(), just give a relaxed
warning that this happened and continue with normal unload. BUG_ON()
will cause segfault on module_exit and we don't want that.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously we notified iscsi layer about the connection layer when
we consumed all of our flush errors. This was racy as there
was no guarentee that iscsi_conn wasn't terminated by then (which ends
up in an invalid memory access). In case we got a non FLUSH error
completion, we are guarenteed that iscsi_conn is still alive. We should
notify iSCSI layer with iscsi_conn_failure to initiate error handling.
While we are at it, add a nice kernel-doc style documentation.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Bailout in case a task cleanup (iscsi_iser_cleanup_task) is called
after the IB device was removed (DEVICE_REMOVAL CM event). We also
call iscsi_conn_stop with a lock taken to prevent DEVICE_REMOVAL and
tasks cleanup from racing.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously we didn't need to unbind the iser_conn and iscsi_conn since
we always relied on iscsi daemon to teardown the connection and never
let it finish before we cleanup all that is needed in iser. This is
not the case anymore (for DEVICE_REMOVAL event). So avoid any possible
chance we cause iscsi_conn dereference after iscsi_conn was freed.
We also call iser_conn_terminate (safe to call multiple times) just
for the corner case of iscsi daemon stopping an old connection before
invoking endpoint removal (might happen if it was violently killed).
Notice we are unbinding under a lock - which is required.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We no longer rely on iscsi connection teardown sequence, so no need to
give a grace period and continue cleanup if it expired. Have
iser_conn_release wait for full completion before freeing iser_conn.
ib_completion:
Guaranteed to come when:
- Got DISCONNECTED/ADDR_CHANGE event or
- iSCSI called ep_disconnect/conn_stop
Guaranteed to finish when:
- Got TIMEWAIT_EXIT/DEVICE_REMOVAL event
- All Flush errors are consumed
- IB related resources are destroyed
stop_completion:
Guaranteed to come when:
- iSCSI calls conn_stop
Guaranteed to finish when:
- All inflight tasks were cleaned up
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
iscsi daemon is in user-space, thus we can't rely on it to be invoked
at connection teardown (if not running or does not receive CPU time).
This patch addresses the issue by re-structuring iSER connection
teardown logic and CM events handling.
The CM events will dictate the RDMA resources destruction (ib_conn)
and iser_conn is kept around as long as iscsi_conn is left around
allowing iscsi/iser callbacks to continue after RDMA transport was
destroyed.
This patch introduces a separation in logic when handling CM events:
- DISCONNECTED_HANDLER, ADDR_CHANGED
This events indicate the start of teardown process.
Actions:
1. Terminate the connection: rdma_disconnect (send DREQ/DREP)
2. Notify iSCSI of connection failure
3. Change state to TERMINATING
4. Poll for all flush errors to be consumed
- TIMEWAIT_EXIT, DEVICE_REMOVAL
These events indicate the final stage of termination process and
we can free RDMA related resources.
Actions:
1. Call disconnected handler (we are not guaranteed that DISCONNECTED
event was invoked in the past)
2. Cleanup RDMA related resources
3. For DEVICE_REMOVAL return non-zero rc from cma_handler to
implicitly destroy the cm_id (Can't rely on user-space, make sure
we have forward progress)
We replace flush_completion (indicate all flushes were consumed) with
ib_completion (rdma resources were cleaned up).
The iser_conn_release_work will wait for teardown completions:
- conn_stop was completed (tasks were cleaned-up) - stop_completion
- RDMA resources were destroyed - ib_completion
And then will continue to free iser connection representation (iser_conn).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Put all connection IB related resources release in this routine. One
exception is the cm_id which cannot be destroyed as the routine is
protected by the state mutex. Also move its position to avoid forward
declaration. While at it fix qp NULL assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Structure that describes the RDMA relates connection objects. Static
member of iser_conn.
This patch does not change any functionality
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Two reasons why we choose to do this:
1. No point today calling struct iser_conn by another name ib_conn
2. In the next patches we will restructure iser control plane representation
- struct iser_conn: connection logical representation
- struct ib_conn: connection RDMA layout representation
This patch does not change any functionality.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>