Commit f0dc117abd ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.
The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.
This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq(). If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.
This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.
Cc: <stable@vger.kernel.org>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If IPoIB fails to look up a path record (eg if it tries during an SM
failover when one SM is dead but the new one hasn't taken over yet), the
driver ends up with a neighbour structure but no address handle (AH).
There's no mechanism to recover from this: any further packets sent to
this destination will be silently dumped in ipoib_start_xmit().
Fix this by freeing the neighbour structures when a path rec query
fails, so that the next packet queued to be sent will trigger a new path
record query.
Signed-off-by: Roland Dreier <roland@purestorage.com>
If an SRP target is no longer reachable and srp_reset_host() fails to
reconnect then ib_srp will invoke scsi_remove_host(). That function
will invoke __scsi_remove_device() for each LUN. And that last
function will change the device state from SDEV_TRANSPORT_OFFLINE into
SDEV_CANCEL. Certain user space software, e.g. older versions of
multipathd, continue queueing I/O to SCSI devices that are in the
SDEV_CANCEL state.
If these I/O requests are submitted as SG_IO that means that the
REQ_PREEMPT flag will be set and hence that these requests will be
passed to srp_queuecommand(). These requests will time out. If new
requests are queued fast enough from user space these active requests
will prevent __scsi_remove_device() to finish.
Avoid this by failing I/O requests in the SDEV_CANCEL state if the
transport is offline. Introduce a new variable to keep track of the
transport state instead of failing requests if (!target->connected ||
target->qp_in_error), so that the SCSI error handler has a chance to
retry commands after a transport layer failure occurred.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
If a SCSI command times out it is passed to the SCSI error
handler. The SCSI error handler will try to abort the commands that
timed out. If aborting fails, a device reset will be attempted. If
the device reset also fails a host reset will be attempted. If the
host reset also fails the whole procedure will be repeated.
srp_abort() and srp_reset_device() fail for a QP in the error state.
srp_reset_host() fails after host removal has started. Hence if the
SCSI error handler gets invoked after host removal has started and
with the QP in the error state an endless loop will be triggered.
Modify the SCSI error handling functions in ib_srp as follows:
- Abort SCSI commands properly even if the QP is in the error state.
- Make srp_reset_host() reset SCSI requests even after host removal
has already started or if reconnecting fails.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
Do not send a task management function if sending will fail anyway
because either there is no RDMA/RC connection or the QP is in the
error state.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
Remove an assignment that incorrectly overwrites the connection state
update by srp_connect_target().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reuse the "SG unaligned for FMR" driver flow to make the initiator
functional when running over driver instance which doesn't support
FMRs, such as a mlx4 virtual function.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Under IO/CPU stress its possible that the FMR pool might not have a
free FMR mapping element for iSER to use because of incomplete
background unmapping processing. In that case we get -EAGAIN and the
IO is pushed back to the SCSI layer which soon retries it. No need to
be so verbose about that.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If the ipoib client info isn't found on the _remove_one callback, we
must not attempt to scan the returned null list. Found by Coverity.
Signed-off-by: Itai Garbi <igarbi@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Implement version info as well as report firmware version and bus info
of the underlying IB HW device.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The hash function introduced in commit b63b70d877 ("IPoIB: Use a
private hash table for path lookup in xmit path") was designd to use
the 3 octets of the IPoIB HW address that holds the remote QPN.
However, this currently isn't the case on little-endian machines,
because the the code there uses the flags part (octet[0]) and not the
last octet of the QPN (octet[3]). Fix this.
The fix caused a checkpatch warning on line over 80 characters, to
solve that changed the name of the temp variable that holds the daddr.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
After commit b13912bbb4 ("IPoIB: Call skb_dst_drop() once skb is
enqueued for sending"), using connected mode and running multithreaded
iperf for long time, ie
iperf -c <IP> -P 16 -t 3600
results in a crash.
After the above-mentioned patch, the driver is calling skb_orphan() and
skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
(also in ipoib_ib.c::ipoib_send())
The problem with this is, as is written in a comment in both routines,
"it's entirely possible that the completion handler will run before we
execute anything after the post_send()." This leads to running the
skb cleanup routines simultaneously in two different contexts.
The solution is to always perform the skb_orphan() and skb_dst_drop()
before queueing the send work request. If an error occurs, then it
will be no different than the regular case where dev_free_skb_any() in
the completion path, which is assumed to be after these two routines.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, IPoIB delays collecting send completions for TX packets in
order to batch work more efficiently. It does skb_orphan() right after
queuing the packets so that destructors run early, to avoid problems
like holding socket send buffers for too long (since we might not
collect a send completion until a long time after the packet is
actually sent).
However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
at skb_dst() to update the PMTU when it gets a too-long packet. This
means that the packets sitting in the TX ring with uncollected send
completions are holding a reference on the dst. We've seen this lead
to pathological behavior with respect to route and neighbour GC. The
easy fix for this is to call skb_dst_drop() when we call skb_orphan().
Also, give packets sent via connected mode (CM) the same skb_orphan()
/ skb_dst_drop() treatment that packets sent via datagram mode get.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Pull target updates from Nicholas Bellinger:
"It has been a very busy development cycle this time around in target
land, with the highlights including:
- Kill struct se_subsystem_dev, in favor of direct se_device usage
(hch)
- Simplify reservations code by combining SPC-3 + SCSI-2 support for
virtual backends only (hch)
- Simplify ALUA code for virtual only backends, and remove left over
abstractions (hch)
- Pass sense_reason_t as return value for I/O submission path (hch)
- Refactor MODE_SENSE emulation to allow for easier addition of new
mode pages. (roland)
- Add emulation of MODE_SELECT (roland)
- Fix bug in handling of ExpStatSN wrap-around (steve)
- Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
- Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
- Convert ib_srpt to use modern target_submit_cmd caller + drop
legacy ioctx->kref usage (nab)
- Convert ib_srpt to use modern target_submit_tmr caller (nab)
- Add link_magic for fabric allow_link destination target_items for
symlinks within target_core_fabric_configfs.c code (nab)
- Allocate pointers in instead of full structs for
config_group->default_groups (sebastian)
- Fix 32-bit highmem breakage for FILEIO (sebastian)
All told, hch was able to shave off another ~1K LOC by killing the
se_subsystem_dev abstraction, along with a number of PR + ALUA
simplifications. Also, a nice patch by Roland is the refactoring of
MODE_SENSE handling, along with the addition of initial MODE_SELECT
emulation support for virtual backends.
Sebastian found a long-standing issue wrt to allocation of full
config_group instead of pointers for config_group->default_group[]
setup in a number of areas, which ends up saving memory with big
configurations. He also managed to fix another long-standing BUG wrt
to broken 32-bit highmem support within the FILEIO backend driver.
Thank you again to everyone who contributed this round!"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
target/iscsi_target: Add NodeACL tags for initiator group support
target/tcm_fc: fix the lockdep warning due to inconsistent lock state
sbp-target: fix error path in sbp_make_tpg()
sbp-target: use simple assignment in tgt_agent_rw_agent_state()
iscsi-target: use kstrdup() for iscsi_param
target/file: merge fd_do_readv() and fd_do_writev()
target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping
target: Add link_magic for fabric allow_link destination target_items
ib_srpt: Convert TMR path to target_submit_tmr
ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
target: Make spc_get_write_same_sectors return sector_t
target/configfs: use kmalloc() instead of kzalloc() for default groups
target/configfs: allocate only 6 slots for dev_cg->default_groups
target/configfs: allocate pointers instead of full struct for default_groups
target: update error handling for sbc_setup_write_same()
iscsit: use GFP_ATOMIC under spin lock
iscsi_target: Remove redundant null check before kfree
target/iblock: Forward declare bio helpers
target: Clean up flow in transport_check_aborted_status()
target: Clean up logic in transport_put_cmd()
...
Make it possible to disconnect the IB RC connection used by the SRP
protocol to communicate with a target.
Have the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Now that SRP recreates the CM ID, QP, and CQ for each connection,
there is no need to wait for the timewait state to complete.
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
HW QP FATAL errors persist over a reset operation, but we can recover
from that by recreating the QP and associated CQs for each connection.
Creating a new QP/CQ also completely forecloses any possibility of
getting stale completions or packets on the new connection.
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
[ updated to current code from OFED, cleaned up commit message ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Only queue removal work after having changed the target state
into SRP_TARGET_REMOVED and not if that state was already equal
to SRP_TARGET_REMOVED. That allows us to remove the state
SRP_TARGET_DEAD. Add a call to srp_disconnect_target() in
srp_remove_target() -- due to previous changes it is now safe to
invoke that function even if the IB connection has already
been disconnected. This change allows us to replace the target
removal code in srp_remove_one() by an (indirect) call to
srp_remove_target(). Rename srp_target_port.work into
srp_target_port.remove_work to reflect its usage.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Keep track of the connection state. Only report QP errors while
connected. Only invoke ib_send_cm_dreq() when connected so that
invoking srp_disconnect_target() after having received a DREQ does not
cause an error message to be printed.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If the RDMA RC connection is closed, tell the SCSI mid-layer to
terminate all pending commands instead of only the first.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce the function srp_handle_qp_err(), change the type of
qp_in_error from int into bool and move the initialization of that
variable from srp_reconnect_target() to srp_connect_target().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Since scsi_remove_host() has been modified so that SCSI error handling
functions will no longer be invoked after scsi_remove_host() returns,
the test at the start of srp_send_tsk_mgmt() is now superfluous.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from
inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()).
Make sure that these commands have a chance to reach the SCSI device.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Block the SCSI host while reconnecting instead of representing the
reconnection activity as a distinct SRP target state. This allows us
to eliminate the target state SRP_TARGET_CONNECTING.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Increase the block layer timeout for disks so that it is above the
InfiniBand transport layer timeout.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch converts the TMR path in srpt_handle_tsk_mgmt() to use
target_submit_tmr() with TARGET_SCF_ACK_KREF flag usage.
v2: Drop ununused res in target_submit_tmr (Fengguang.Wu)
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch converts the main srpt_handle_cmd() I/O path to use modern
target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage. This includes
dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage
in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt
response callbacks.
It also updates srpt_abort_cmd() to call target_put_sess_cmd() for
completion of aborted commands, and adds target_wait_for_sess_cmds() into
srpt_release_channel_work() to allow outstanding I/O to complete during
session shutdown.
Also, go ahead and update srpt_handle_tsk_mgmt() to make the remaining
transport_init_se_cmd() to setup the ioctx->cmd with se_tmr_req.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pass the sense reason as an explicit return value from the I/O submission
path instead of storing it in struct se_cmd and using negative return
values. This cleans up a lot of the code pathes, and with the sparse
annotations for the new sense_reason_t type allows for much better
error checking.
(nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use
sense_reason_t with Roland's MODE SELECT changes)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull scsi target updates from Nicholas Bellinger:
"Things have been calm for the most part with no new fabric drivers in
flight for v3.7 (we're up to eight now !), so this update is primarily
focused on addressing a few long-standing items within target-core and
iscsi-target fabric code.
The highlights include:
- target: Simplify fabric sense data length handling (roland)
- qla2xxx: Fix endianness of task management response code (roland)
- target: fix truncation of mode data, support zero allocation length
(paolo)
- target: Properly support zero-length commands in normal processing
path (paolo)
- iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT
PDU (ronnie + nab)
- iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG
demo-mode (ronnie + nab)
- target/file: Re-enable optional fd_buffered_io=1 operation (nab +
hch)
- iscsi-target: Add MaxXmitDataSegmenthLength forr target ->
initiator MDRSL declaration (nab)
- target: Add target_submit_cmd_map_sgls for SGL fabric memory
passthrough (nab + hch)
- tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch +
nab)
- tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab
+ hch)
The last series for adding a new target_submit_cmd_map_sgls() fabric
caller (as requested by hch) that accepts pre-allocated SGL memory
(using existing logic), along with converting tcm_loop + tcm_vhost has
only been in -next for the last days, but has gotten enough review
+testing and is clear enough a mechanical change that I think it's
reasonable to merge for -rc1 code.
Thanks again to everyone who contributed this round! Extra special
thanks to Roland (PureStorage) for tracking down the qla2xxx target
TMR response code endian issue, and to Paolo (Redhat) for resolving
the long standing zero-length CDB issues within target-core between
virtual and pSCSI backends."
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits)
iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values
iscsit: proper endianess conversions
iscsit: use the itt_t abstract type
iscsit: add missing endianess conversion in iscsit_check_inaddr_any
iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp
iscsit: mark various functions static
target/iscsi: precedence bug in iscsit_set_dataout_sequence_values()
target/usb-gadget: strlen() doesn't count the terminator
target/usb-gadget: remove duplicate initialization
tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls
target: Add control CDB READ payload zero work-around
tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls
target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough
iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode
iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength
iscsi-target: Add MaxXmitDataSegmentLength connection recovery check
iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength
iscsi-target: Enable MaxXmitDataSegmentLength operation in login path
iscsi-target: Add base MaxXmitDataSegmentLength code
target/file: Re-enable optional fd_buffered_io=1 operation
...
RX/TX CQs will now be selected from a per HCA pool. For the RX flow
this has the effect of using different interrupt vectors when using
low level drivers (such as mlx4) that map the "vector" param provided
by the ULP on CQ creation to a dedicated IRQ/MSI-X vector. This
allows the RX flow processing of IO responses to be distributed across
multiple CPUs.
QPs (--> iSER sessions) are assigned to CQs in round robin order using
the CQ with the minimum number of sessions attached to it.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
With the new netlink support in commit 862096a8bb ("IB/ipoib: Add more
rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even
if connected mode isn't built. Move the function from ipoib_cm.c to
ipoib_main.c (and make a few CM-related macros available unconditonally).
This fixes the build error
drivers/built-in.o: In function 'ipoib_changelink':
ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode'
ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode'
when CONFIG_INFINIBAND_IPOIB_CM isn't set.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
/Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
fgN5n987wru91ewdX4gW
=PAcy
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes"
This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.
That said, I suspect the sequence in question should probably use
"mod_delayed_work()". I just did the minimal "don't use deprecated
functions" fixup, though.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
IB/qib: Fix local access validation for user MRs
mlx4_core: Disable SENSE_PORT for multifunction devices
mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
IB/srp: Avoid having aborted requests hang
IB/srp: Fix use-after-free in srp_reset_req()
IB/qib: Add a qib driver version
RDMA/nes: Fix compilation error when nes_debug is enabled
RDMA/nes: Print hardware resource type
RDMA/nes: Fix for crash when TX checksum offload is off
RDMA/nes: Cosmetic changes
RDMA/nes: Fix for incorrect MSS when TSO is on
RDMA/nes: Fix incorrect resolving of the loopback MAC address
mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
mlx4_core: Trivial cleanups to driver log messages
mlx4_core: Trivial readability fix: "0X30" -> "0x30"
IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
mlx4: Paravirtualize Node Guids for slaves
mlx4: Activate SR-IOV mode for IB
...
Add the rtnl_link_ops changelink and fill_info callbacks, through
which the admin can now set/get the driver mode, etc policies.
Maintain the proprietary sysfs entries only for legacy childs.
For child devices, set dev->iflink to point to the parent
device ifindex, such that user space tools can now correctly
show the uplink relation as done for vlan, macvlan, etc
devices. Pointed out by Patrick McHardy <kaber@trash.net>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to call scsi_done() for commands after we abort them.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz)
Commit c8c2afe360 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().
In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries. This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.
Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called. To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rtnl_link_ops to IPoIB, with the first usage being child device
create/delete through them. Childs devices are now either legacy ones,
created/deleted through the ipoib sysfs entries, or RTNL ones.
Adding support for RTNL childs involved refactoring of ipoib_vlan_add
which is now used by both the sysfs and the link_ops code.
Also, added ndo_uninit entry to support calling unregister_netdevice_queue
from the rtnl dellink entry. This required removal of calls to
ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
since the networking core will invoke ipoib_uninit which does exactly that.
Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every fabric driver has to supply a se_tfo->set_fabric_sense_len()
method, just so iSCSI can return an offset of 2. However, every fabric
driver is already allocating a sense buffer and passing it into the
target core, either via transport_init_se_cmd() or target_submit_cmd().
So instead of having iSCSI pass the start of its sense buffer into the
core and then later tell the core to skip the first 2 bytes, it seems
easier for iSCSI just to do the offset of 2 when it passes the sense
buffer into the core. Then we can drop the se_tfo->set_fabric_sense_len()
everywhere, and just add a couple of lines of code to iSCSI to set the
sense data length to the beginning of the buffer right before it sends
it over the network.
(nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops +
change transport_get_sense_buffer to follow v3.6-rc6 code w/o
->set_fabric_sense_len usage)
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
There are no callers of se_tfo->get_fabric_sense_len(), so we should
stop having every fabric driver implement it.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Lockdep points out a circular locking dependency betwwen the ipoib
device priv spinlock (priv->lock) and the neighbour table rwlock
(ntbl->rwlock).
In the normal path, ie neigbour garbage collection task, the neigh
table rwlock is taken first and then if the neighbour needs to be
deleted, priv->lock is taken.
However in some error paths, such as in ipoib_cm_handle_tx_wc(),
priv->lock is taken first and then ipoib_neigh_free routine is called
which in turn takes the neighbour table ntbl->rwlock.
The solution is to get rid the neigh table rwlock completely and use
only priv->lock.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If the neighbours hash table is empty when unloading the module, then
ipoib_flush_neighs(), the cleanup routine, isn't called and the
memory used for the hash table itself leaked.
To fix this, ipoib_flush_neighs() is allways called, and another
completion object is added to signal when the table is freed.
Once invoked, ipoib_flush_neighs() flushes all the neighbours (if
there are any), calls the the hash table RCU free routine, which now
signals completion of the deletion process, and waits for the last
neighbour to be freed.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Avoid a crash caused by the scmnd->scsi_done(scmnd) call in
srp_process_rsp() being invoked with scsi_done == NULL. This can
happen if a reply is received during or after a command abort.
Reported-by: Joseph Glanville <joseph.glanville@orionvm.com.au>
Reference: http://marc.info/?l=linux-rdma&m=134314367801595
Cc: <stable@vger.kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Correct spelling typos in comments in drivers/infiniband.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>