Replace all leys with pd->local_dma_lkey. This driver does not support
iWarp, so this is safe.
The insecure use of ib_get_dma_mr is thus isolated to an rkey, and this
looks trivially fixed by forcing the use of registration in a future
patch.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chaning of send work requests benefits performance by
reducing the send queue lock contention (acquired in
ib_post_send) and saves us HW doorbells which is posted
only once.
Currently, in normal IO flows iser does not chain the CDB send
work request with the registration work request. Also in PI
flows, signature work requests are not chained as well.
Lets chain those and post only once.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easier to debug when we have the registration details.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
iser support up to 512KB data transfer in a single scsi command.
This means that larger IOs will split to different request. While
iser can easily saturate FDR/EDR wires, some arrays are fine tuned
for 1MB (or larger) IO sizes, hence add an option to support larger
transfers (up to 8MB) if the device allows it.
Given that a few target implementations don't support data transfers
of more than 512KB by default and the fact that larger IO sizes require
more resources, we introduce a module parameter to determine the
maximum number of 512B sectors in a single scsi command.
Users that are interested in larger transfers can change this value given
that the target supports larger transfers.
At the moment, iser works in 4K pages granularity, In a later stage
we will get it to work with system page size instead.
IO operations that consists of N pages will need a page vector
of size N+1 in case the first SG element contains an offset. Given
that some devices allocates memory regions in powers of 2, this
means that allocating a region with N+1 pages, will result in
region resources allocation of the next power of 2. Since we don't
want that to happen, in case we are in the limit of IO size supported
and the first SG element has an offset, we align the SG list using a
bounce buffer (which is OK given that this is not likely to happen a lot).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hard coded for now. This will allow to allocate different
sized MRs depending on the IO size needed (and device
capabilities).
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
iser_reg_rdma_mem_[fastreg|fmr] share a lot of code, and
logically do the same thing other than the buffer registration
method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr).
The DIF logic is not implemented in the FMR flow as there is no
existing device that supports FMRs and Signature feature.
This patch unifies the flow in a single routine iser_reg_rdma_mem
and just split to fmr/frwr for the buffer registration itself.
Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will
call the relevant device specific unreg routine).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
As for fmrs we will hold a single registration descriptor
as no need for multiple like in the frwr mode (descriptor
for each task). This change helps unifying the duplicate
registration code paths.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Also, change a name of a local variable.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This will allow us to unify the memory registration code path between
the various methods which vary by the device capabilities. This change
will make it easier and less intrusive to remove fmr_pools from the
code when we'd want to.
The reason we use a single descriptor is to avoid taking a
redundant spinlock when working with FMRs.
We also change the signature of iser_reg_page_vec to make it match
iser_fast_reg_mr (and the future indirect registration method).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead of having it a part of the connection structure,
have it be under a dedicated (embedded) structure in the
connection. A logical separation of the registration pool
and the connection structure.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don't have the caller allocate the structure and worry about
freeing it in case the routine failed.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Move all the per-device function pointers to an easy
extensible iser_reg_ops structure that contains all
the iser registration operations.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In the past the we always tried to allocate an fmr_pool
and if it failed on ENOSYS (not supported) then we continued
with dma mr. This is not the case anymore and if we tried to
allocate an fmr_pool then it is supported and we expect to succeed.
Also, the check if fmr_pool is allocated when free is called is
redundant as well as we are guaranteed it exists.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Avoid struct names without iser_ prefix.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Have fast_reg_descriptor hold struct iser_reg_resources
(mr, frpl, valid flag). This will be useful when the
actual buffer registration routines will be passed with
the needed registration resources (i.e. iser_reg_resources)
without being aware of their nature (i.e. data or protection).
In order to achieve this, we remove reg_indicators flags container
and place specific flags (mr_valid) within iser_reg_resources struct.
We also place the sig_mr_valid and sig_protcted flags in iser_pi_context.
This patch also modifies iser_fast_reg_mr to receive the
reg_resources instead of the fast_reg_descriptor and a data/protection
indicator.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We can do it in iser_aligned_data_len instead and
it will save us an argument that is passed to
fall_to_counce_buf just for the print.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We always call iser_initialize_task_headers() and set
the header tx_sg.lkey to the device mr lkey, so no
point in checking it in iser_create_send_desc().
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If iser_initialize_task_headers() routine failed before
dma mapping, we should not attempt to unmap in cleanup_task().
Fixes: 7414dde0a6 (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We don't update those anywhere in the code and they
seem pretty useless (no one seem to care about those).
qp_tx_queue_full: We never should get this
fmr_map_not_avail: We can never get to this
eh_abort_cnt: We don't monitor aborts
Go ahead and remove them.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since commit "IB/iser: Fix race between iser connection teardown..."
iser_initialize_task_headers() might fail, so we need to check that.
Fixes: 7414dde0a6 (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
While we're at it, use permission defines instead
of octal values and rearrange a little bit.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use ib_alloc_mr with specific parameters.
Change the existing callers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.
And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).
And, fixup the only callers (iser/isert) accordingly.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently the sg tablesize, which dictates fast register page list
depth to use, does not take into account the limits of the rdma device.
So adjust it once we discover the device fastreg max depth limit. Also
adjust the max_sectors based on the resulting sg tablesize.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently, ib_create_cq uses cqe and comp_vecotr instead
of the extendible ib_cq_init_attr struct.
Earlier patches already changed the vendors to work with
ib_cq_init_attr. This patch changes the consumers too.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In some rare cases, IO operations may be not aligned to page
boundaries. This prevents iser from performing fast memory
registration. In order to overcome that iser uses a bounce
buffer to carry the transaction. We basically allocate a buffer
in the size of the transaction and perform a copy.
The buffer allocation using kmalloc is too restrictive since it
requires higher order (atomic) allocations for large transactions
(which may result in memory exhaustion fairly fast for some workloads).
We rewrite the bounce buffer code path to allocate scattered pages
and perform a copy between the transaction sg and the bounce sg.
Reported-by: Alex Lyakas <alex@zadarastorage.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In singleton scatterlists, DMA memory registration code
is taken both for Fastreg and FMR code paths. Move it to
a function.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead of passing ib_sge as output variable, we pass the mem_reg
pointer to have the routines fill the rkey as well. This reduces
code duplication and extra assignments. This is a preparation step
to unify some registration logics together. Also, pass iser_fast_reg_mr
the fastreg descriptor directly.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
No need to keep lkey, va, len variables, we can keep
them as struct ib_sge. This will help when we change the
memory registration logic.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Memory regions are resources that are saved
in the device caches. Increase the probability for
a cache hit by adding the MRU descriptor to pool
head.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make iser_[create|destroy]_fastreg_desc shorter, more
readable and easily extendable.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead of open-coding connection fastreg pool get/put,
we introduce iser_reg_desc[get|put] helpers.
We aren't setting these static as this will be a per-device
routine later on. Also, cleanup iser_unreg_rdma_mem_fastreg
a bit.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
No need for these two separate. Keep it in a single routine
like in the fastreg case. This will also make iser_reg_page_vec
closer to iser_fast_reg_mr arguments. This is a preparation
step for registration flow refactor.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This struct members other than struct iser_mem_reg are unused,
so remove it altogether.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Buffer length was assigned twice, and no reason to set va to
io_addr and then add the offset, just set va to io_addr + offset.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
As memory registration/de-registration methods, lets
move them to their natural location. While we're at it,
make iser_reg_page_vec routine static.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
No need to pass that, we can take it from the task.
In a later stage, this function will be invoked
according to a device capability.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
No need to keep two iser_data_buf structures just in case we use
mem copy. We can avoid that just by adding a pointer to the original
sg. So keep only two iser_data_buf per command (data and protection)
and pass the relevant data_buf to bounce buffer routine.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This code was added before we had protection data length
calculation (in iser_send_command), so we needed to calc
the sg data length from the sg itself. This is not needed
anymore.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This length miss-calculation may cause a silent data corruption
in the DIX case and cause the device to reference unmapped area.
Fixes: d77e65350f ('libiscsi, iser: Adjust data_length to include protection information')
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fast registration and local invalidate work requests can
also fail. We should call error completion handler for them.
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In case the user unloaded ib_iser while ep_connect is in
progress, we need to destroy the endpoint although ep_disconnect
wasn't invoked (we detect this by the iser conn state != DOWN).
However, if we got an REJECTED/UNREACHABLE CM event we move the
connection state to DOWN which will prevent us from destroying
the endpoint in the module unload stage. Fix this by setting the
connection state to TERMINATING in iser_conn_error so we can still
destroy the endpoint at unload stage.
Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In some cases, we might reach the iser connection termination without
ep_disconnect being invoked (for example if user-space daemon doesn't
exists. In this case, we need to free the iscsi endpoint when we
remove the iser connection.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When teardown process starts during live IO, we need to keep the
memory regions pool (frmr/fmr) until all in-flight tasks are properly
released, since each task may return a memory region to the pool. In
order to do this, we pass a destroy flag to iser_free_ib_conn_res to
indicate we can destroy the device and the memory regions
pool. iser_conn_release will pass it as true and also DEVICE_REMOVAL
event (we need to let the device to properly remove).
Also, Since we conditionally call iser_free_rx_descriptors,
remove the extra check on iser_conn->rx_descs.
Fixes: 5426b1711f ("IB/iser: Collapse cleanup and disconnect handlers")
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We always unmap SGs with the same direction instead of unmapping
with the direction the mapping was done, fix that.
Fixes: 9a8b08fad2 ("IB/iser: Generalize iser_unmap_task_data and [...]")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>