This patch addresses an issue where the legacy diagpacket is sent in
from the user, but the driver operates on only the extended
diagpkt. This patch specifically initializes the extended diagpkt
based on the legacy packet.
Cc: <stable@vger.kernel.org>
Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE.
As of the dual port qib QDR card, this is not necessarily correct.
Change to use the port as specified in the call.
Cc: <stable@vger.kernel.org>
Reported-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The cpl_abort_req struct has several reserved members which need to be
cleared to avoid disclosing kernel information. I have added a memset()
so now it matches the cxgb4 version of this function.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The i386 ABI disagrees with most other ABIs regarding alignment of
data type larger than 4 bytes: on most ABIs a padding must be added at
end of the structures, while it is not required on i386.
So for most ABIs struct mlx5_ib_create_srq gets implicitly padded to be
aligned on a 8 bytes multiple, while for i386, such padding is not
added.
Tool pahole could be used to find such implicit padding:
$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name mlx5_ib_create_srq \
drivers/infiniband/hw/mlx5/mlx5_ib.o
Then, structure layout can be compared between i386 and x86_64:
+++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100
--- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100
@@ -69,7 +68,6 @@ struct mlx5_ib_create_srq {
__u64 db_addr; /* 8 8 */
__u32 flags; /* 16 4 */
- /* size: 20, cachelines: 1, members: 3 */
- /* last cacheline: 20 bytes */
+ /* size: 24, cachelines: 1, members: 3 */
+ /* padding: 4 */
+ /* last cacheline: 24 bytes */
};
ABI disagreement will make an x86_64 kernel try to read past
the buffer provided by an i386 binary.
When boundary check will be implemented, the x86_64 kernel will
refuse to read past the i386 userspace provided buffer and the
uverb will fail.
Anyway, if the structure lay in memory on a page boundary and
next page is not mapped, ib_copy_from_udata() will fail and the
uverb will fail.
This patch makes create_srq_user() takes care of the input
data size to handle the case where no padding was provided.
This way, x86_64 kernel will be able to handle struct mlx5_ib_create_srq
as sent by unpatched and patched i386 libmlx5.
Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The i386 ABI disagrees with most other ABIs regarding alignment of
data type larger than 4 bytes: on most ABIs a padding must be added at
end of the structures, while it is not required on i386.
So for most ABI struct mlx5_ib_create_cq get padded to be aligned on a
8 bytes multiple, while for i386, such padding is not added.
The tool pahole can be used to find such implicit padding:
$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name mlx5_ib_create_cq \
drivers/infiniband/hw/mlx5/mlx5_ib.o
Then, structure layout can be compared between i386 and x86_64:
+++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100
--- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100
@@ -34,9 +34,8 @@ struct mlx5_ib_create_cq {
__u64 db_addr; /* 8 8 */
__u32 cqe_size; /* 16 4 */
- /* size: 20, cachelines: 1, members: 3 */
- /* last cacheline: 20 bytes */
+ /* size: 24, cachelines: 1, members: 3 */
+ /* padding: 4 */
+ /* last cacheline: 24 bytes */
};
This ABI disagreement will make an x86_64 kernel try to read past the
buffer provided by an i386 binary.
When boundary check will be implemented, a x86_64 kernel will refuse
to read past the i386 userspace provided buffer and the uverb will
fail.
Anyway, if the structure lies in memory on a page boundary and next
page is not mapped, ib_copy_from_udata() will fail when trying to read
the 4 bytes of padding and the uverb will fail.
This patch makes create_cq_user() takes care of the input data size to
handle the case where no padding is provided.
This way, x86_64 kernel will be able to handle struct
mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5.
Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Instead of having the UMR context part of each memory region, allocate
a struct on the stack. This allows queuing multiple UMRs that access
the same memory region.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
For user QPs, the creation process does not currently initialize the fields:
* qp->rq.offset
* qp->sq.offset
* qp->sq.wqe_shift
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The patch stores iova, pd and size during mr creation and after UMRs
that modify them. It removes the unused access flags field.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
For memory regions that are allocated using reg_umr, the suffix of
mlx5_core_create_mkey isn't being called. Instead the creation is
completed in a callback function (reg_mr_callback). This means that
these MRs aren't being added to the MR radix tree. Add them in the
callback.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If ib_post_send fails when posting the UMR work request in reg_umr,
the code doesn't release the temporary pas buffer allocated, and
doesn't dma_unmap it.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Some DIF implementations (SCSI initiator/target) may want to use different
input/output values for application tag and/or reference tag. So in
case memory/wire domain values don't match HW must not copy them.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
No need for repetition format pattern in case the data and protection
are already interleaved in the memory domain since the pattern
already exists. A single key entry is sufficient and may save some
extra fetch ops.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When the data and protection are interleaved in the memory domain, no
need to expand the mkey total length.
At the moment no Linux user works (iSER initiator & target) in
interleaved mode. This may change in the future as for SCSI
pass-through devices there is no real point in target performing
de-interleaving and re-interleaving of the protection data in the PT
stage. Regardless, signature verbs support this mode.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
RDMA connections over a vlan interface don't work due to
import_ep() not using the correct egress device.
- use the real device in import_ep()
- use rdma_vlan_dev_real_dev() in get_real_dev().
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This removes an open-coded duplicate of simple_open().
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
c4iw_alloc() bails out without freeing the storage that 'devp' points to.
Picked up by Coverity - CID 1204241.
Fixes: fa658a98a2 ("RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices")
Signed-off-by: Christoph Jaeger <christophjaeger@linux.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When we receive a netdev event indicating a netdev change and/or
a netdev address change, we must change the MAC index used by the
proxy QP1 (in the QP context), otherwise RoCE CM packets sent by the
VF will not carry the same source MAC address as the non-CM packets.
We use the UPDATE_QP command to perform this change.
In order to avoid modifying a QP context based on netdev event,
while the driver attempts to destroy this QP (e.g either the mlx4_ib
or ib_mad modules are unloaded), we use mutex locking in both flows.
Since the relevant mlx4 proxy GSI QP is created indirectly by the
mad module when they create their GSI QP, the mlx4 didn't need to
keep track on that QP prior to this change.
Now, when QP modifications are needed to this QP from within the
driver, we added refernece to it.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole db drop avoidance stuff is for T4 only. So we cannot allow
that to be enabled for T5 devices.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This is required to work around a T5 HW issue.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
mutex held, they must be called with internal == 1. rx_data() and
process_mpa_reply() are not doing this. This causes a deadlock
because c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some
!internal cases, and c4iw_ep_disconnect() acquires the endpoint mutex.
The design was intended to only do the disconnect for !internal calls.
Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
internal == 1, and then disconnect only after releasing the mutex.
Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
internal == 1 and set a new attr flag telling it to send a TERMINATE
message. Previously this was implied by !internal.
Change process_mpa_reply() to return whether the caller should
disconnect after releasing the endpoint mutex. Now rx_data() will do
the disconnect in the cases where process_mpa_reply() wants to
disconnect after the TERMINATE is sent.
Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal,
and to send a TERMINATE message if attrs->send_term is 1.
Change abort_connection() to not aquire the ep mutex for setting the
state, and make all calls to abort_connection() do so with the mutex
held.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Need to get the endpoint reference before calling rdma_fini(), which
might fail causing us to not get the reference.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The max depth of a fastreg mr depends on whether the device supports
DSGL or not. So compute it dynamically based on the device support
and the module use_dsgl option.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
There is a race when moving a QP from RTS->CLOSING where a SQ work
request could be posted after the FW receives the RDMA_RI/FINI WR.
The SQ work request will never get processed, and should be completed
with FLUSHED status. Function c4iw_flush_sq(), however was dropping
the oldest SQ work request when in CLOSING or IDLE states, instead of
completing the pending work request. If that oldest pending work
request was actually complete and has a CQE in the CQ, then when that
CQE is proceessed in poll_cq, we'll BUG_ON() due to the inconsistent
SQ/CQ state.
This is a very small timing hole and has only been hit once so far.
The fix is two-fold:
1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED
status regardless of the QP state.
2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue
before moving the state out of RTS. This ensures that the state
transition will not happen while another thread is in
post_rc_send(), because set_state() and post_rc_send() both aquire
the qp spinlock. Also, once we transition the state out of RTS,
subsequent calls to post_rc_send() will fail because the "in error"
bit is set. I don't think this fully closes the race where the FW
can get a FINI followed a SQ work request being posted (because
they are posted to differente EQs), but the #1 fix will handle the
issue by flushing the SQ work request.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Some HW platforms can reorder read operations, so we must rmb() after
we see a valid gen bit in a CQE but before we read any other fields
from the CQE.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1) timedout endpoint processing can be starved. If there are continual
CPL messages flowing into the driver, the endpoint timeout
processing can be starved. This condition exposed the other bugs
below.
Solution: In process_work(), call process_timedout_eps() after each CPL
is processed.
2) Connection events can be processed even though the endpoint is on
the timeout list. If the endpoint is scheduled for timeout
processing, then we must ignore MPA Start Requests and Replies.
Solution: Change stop_ep_timer() to return 1 if the ep has already been
queued for timeout processing. All the callers of stop_ep_timer() need
to check this and act accordingly. There are just a few cases where
the caller needs to do something different if stop_ep_timer() returns 1:
1) in process_mpa_reply(), ignore the reply and process_timeout()
will abort the connection.
2) in process_mpa_request, ignore the request and process_timeout()
will abort the connection.
It is ok for callers of stop_ep_timer() to abort the connection since
that will leave the state in ABORTING or DEAD, and process_timeout()
now ignores timeouts when the ep is in these states.
3) Double insertion on the timeout list. Since the endpoint timers
are used for connection setup and teardown, we need to guard
against the possibility that an endpoint is already on the timeout
list. This is a rare condition and only seen under heavy load and
in the presense of the above 2 bugs.
Solution: In ep_timeout(), don't queue the endpoint if it is already on
the queue.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[ Fix cast from u64* to integer. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add support for the block multicast loopback QP creation flag along
the proper firmware API for that.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
As result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers using these
two interfaces need to be updated to use the new
pci_enable_msi_range() or pci_enable_msi_exact() and
pci_enable_msix_range() or pci_enable_msix_exact() interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
As result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers using these
two interfaces need to be updated to use the new pci_enable_msi_range()
and pci_enable_msix_range() interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- The biggest change is core API extensions and mlx5 low-level driver
support for handling DIF/DIX-style protection information, and the
addition of PI support to the iSER initiator. Target support will be
arriving shortly through the SCSI target tree.
- A nice simplification to the "umem" memory pinning library now that
we have chained sg lists. Kudos to Yishai Hadas for realizing our
code didn't have to be so crazy.
- Another nice simplification to the sg wrappers used by qib, ipath and
ehca to handle their mapping of memory to adapter.
- The usual batch of fixes to bugs found by static checkers etc. from
intrepid people like Dan Carpenter and Yann Droneaud.
- A large batch of cxgb4, ocrdma, qib driver updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJTPYBnAAoJEENa44ZhAt0hGI4P/29eotGwpkANUQE6FQvxCUL2
CXJtSg52lmYvGJrPK4IhihpbtQmHJz3iXEzlOOWidTw1dJgObR6vFaRymh7+vDLs
CdzybMcXdasarqTuYeJbFzhkimpwtWWrMy/8Ik/Jj/5glGQ6cUSpdYZzVtFhYNqf
hCGE8iLi+tuekJJj1htut5D6apXM7udcdc2yLJNOdsSj/VUXt1oqG1x9xAi9R8Tq
7o8eFSStdlja0EBQ6Hli2zauCSnQkaUtr8h6EAFbcCtvBK8HqsHSc2gfq2ViFUiN
ztt167oWoQnVkR0qCPL5nVt+CRQHHROprVXvbpcTI3aW61gNIl6OrUUOXefzHXac
TNi+fdMpiEB/JQ4Z04Jzd1dGCSjYeTqPj4rO4meFjBmxRDdTgZHu7FWwejT1nYJ5
d2abVdCOT+QWlIlM7m/pjdWJII5OYM+4/jtTayGepEaR4fTUzKtPZPBLNUBDBKE+
4f92PC8LiuPkwJgb6XT96onPz1bDCOnPSEdwoKUFKPeGUcwgVOM/Wx5NU4Yf7rfg
RxQwZ7mJXbjCYFlmGGo/0QDy6UEGkIFYlJSzooP+wlK1JvZ5h2M+9QKX2FtwzR+R
I2kBxcTXWsM/h88R7MkNqbNIllmhssrJwmAE46OneZbfoBOB+JZjb4nLRTu0jEcS
zn6f16GmJ37BKn2/qYY/
=Ww6H
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"Main batch of InfiniBand/RDMA changes for 3.15:
- The biggest change is core API extensions and mlx5 low-level driver
support for handling DIF/DIX-style protection information, and the
addition of PI support to the iSER initiator. Target support will
be arriving shortly through the SCSI target tree.
- A nice simplification to the "umem" memory pinning library now that
we have chained sg lists. Kudos to Yishai Hadas for realizing our
code didn't have to be so crazy.
- Another nice simplification to the sg wrappers used by qib, ipath
and ehca to handle their mapping of memory to adapter.
- The usual batch of fixes to bugs found by static checkers etc.
from intrepid people like Dan Carpenter and Yann Droneaud.
- A large batch of cxgb4, ocrdma, qib driver updates"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits)
RDMA/ocrdma: Unregister inet notifier when unloading ocrdma
RDMA/ocrdma: Fix warnings about pointer <-> integer casts
RDMA/ocrdma: Code clean-up
RDMA/ocrdma: Display FW version
RDMA/ocrdma: Query controller information
RDMA/ocrdma: Support non-embedded mailbox commands
RDMA/ocrdma: Handle CQ overrun error
RDMA/ocrdma: Display proper value for max_mw
RDMA/ocrdma: Use non-zero tag in SRQ posting
RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr()
RDMA/ocrdma: Increment abi version count
RDMA/ocrdma: Update version string
be2net: Add abi version between be2net and ocrdma
RDMA/ocrdma: ABI versioning between ocrdma and be2net
RDMA/ocrdma: Allow DPP QP creation
RDMA/ocrdma: Read ASIC_ID register to select asic_gen
RDMA/ocrdma: SQ and RQ doorbell offset clean up
RDMA/ocrdma: EQ full catastrophe avoidance
RDMA/cxgb4: Disable DSGL use by default
RDMA/cxgb4: rx_data() needs to hold the ep mutex
...
Unregister the inet notifier during ocrdma unload to avoid a panic after
driver unload.
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Clean up code. Also modifying GSI QP to error during ocrdma_close is fixed.
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Issue mailbox commands to query ocrdma controller information and phy
information and print them while adding ocrdma device.
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
As part of SRQ receive buffers posting we populate a non-zero tag
which will be returned in SRQ receive completions.
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Increment the ABI version count for driver/library interface.
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
While loading RoCE driver be2net driver should check for ABI version
to catch functional incompatibilities.
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Allow creating DPP QP even if inline-data is not requested. This is an
optimization to lower latency.
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
ocrdma driver selects execution path based on sli_family and asic
generation number. This introduces code to read the asic gen number
from pci register instead of obtaining it from the Emulex NIC driver.
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Stale entries in the CQ being destroyed causes hardware to generate
EQEs indefinitely for a given CQ. Thus causing uncontrolled execution
of irq_handler. This patch fixes this using following sementics:
* irq_handler will ring EQ doorbell atleast once and implement budgeting scheme.
* cq_destroy will count number of valid entires during destroy and ring
cq-db so that hardware does not generate uncontrolled EQE.
* cq_destroy will synchronize with last running irq_handler instance.
* arm_cq will always defer arming CQ till poll_cq, except for the first arm_cq call.
* poll_cq will always ring cq-db with arm=SET if arm_cq was called prior to enter poll_cq.
* poll_cq will always ring cq-db with arm=UNSET if arm_cq was not called prior to enter poll_cq.
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Current hardware doesn't correctly support DSGL.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
To avoid racing with other threads doing close/flush/whatever, rx_data()
should hold the endpoint mutex.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
There is a race between ULP threads doing an accept/reject, and the
ingress processing thread handling close/abort for the same connection.
The accept/reject path needs to hold the lock to serialize these paths.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[ Fold in locking fix found by Dan Carpenter <dan.carpenter@oracle.com>.
- Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
These methods appear to only mimic the sg_dma_address() and
sg_dma_len() behavior.
They can be safely removed.
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The removal of these methods is compensated for by code changes to
.map_sg to insure that the vanilla sg_dma_address() and sg_dma_len()
will do the same thing as the equivalent former ib_sg_dma_address()
and ib_sg_dma_len() calls into the drivers.
The introduction of this patch required that the struct
ipath_dma_mapping_ops be converted to a C99 initializer.
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Remove the overload for .dma_len and .dma_address
The removal of these methods is compensated for by code changes to
.map_sg to insure that the vanilla sg_dma_address() and sg_dma_len()
will do the same thing as the equivalent former ib_sg_dma_address()
and ib_sg_dma_len() calls into the drivers.
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Vinod Kumar <vinod.kumar@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
My static checker complains that the sprintf() here can overflow.
drivers/infiniband/hw/mlx4/main.c:1836 mlx4_ib_alloc_eqs()
error: format string overflow. buf_size: 32 length: 69
This seems like a valid complaint. The "dev->pdev->bus->name" string
can be 48 characters long. I just made the buffer 80 characters instead
of 69 and I changed the sprintf() to snprintf().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The code was indented too far and also kernel style says we should have
curly braces.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case of error when writing to userspace, function ehca_create_cq()
does not set an error code before following its error path.
This patch sets the error code to -EFAULT when ib_copy_to_udata()
fails.
This was caught when using spatch (aka. coccinelle)
to rewrite call to ib_copy_{from,to}_udata().
Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case of error when writing to userspace, the function mthca_create_cq()
does not set an error code before following its error path.
This patch sets the error code to -EFAULT when ib_copy_to_udata() fails.
This was caught when using spatch (aka. coccinelle)
to rewrite call to ib_copy_{from,to}_udata().
Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If kmalloc() fails in c4iw_alloc_ucontext(), the function
leaves but does not set an error code in ret variable:
it will return 0 to the caller.
This patch set ret to -ENOMEM in such case.
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@chelsio.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code which is dealing with SRIOV alias GUIDs in the mlx4 IB driver has some
logic which operated according to the maximal possible active functions (PF + VFs).
After the single port VFs code integration this resulted in a flow of false-positive
warnings going to the kernel log after the PF driver started the alias GUID work.
Fix it by referring to the actual number of functions.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When processing an MPA Start Request, if the listening endpoint is
DEAD, then abort the connection.
If the IWCM returns an error, then we must abort the connection and
release resources. Also abort_connection() should not post a CLOSE
event, so clean that up too.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
These are generated by HW in some error cases and need to be
silently discarded.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If cxgb4_ofld_send() returns < 0, then send_fw_pass_open_req() must
free the request skb and the saved skb with the tcp header.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Adds support for N-Port VFs, this includes:
1. Adding support in the wrapped FW command
In wrapped commands, we need to verify and convert
the slave's port into the real physical port.
Furthermore, when sending the response back to the slave,
a reverse conversion should be made.
2. Adjusting sqpn for QP1 para-virtualization
The slave assumes that sqpn is used for QP1 communication.
If the slave is assigned to a port != (first port), we need
to adjust the sqpn that will direct its QP1 packets into the
correct endpoint.
3. Adjusting gid[5] to modify the port for raw ethernet
In B0 steering, gid[5] contains the port. It needs
to be adjusted into the physical port.
4. Adjusting number of ports in the query / ports caps in the FW commands
When a slave queries the hardware, it needs to view only
the physical ports it's assigned to.
5. Adjusting the sched_qp according to the port number
The QP port is encoded in the sched_qp, thus in modify_qp we need
to encode the correct port in sched_qp.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some code in the mlx4 IB driver stack assumed MLX4_MAX_PORTS ports.
Instead, we should only loop until the number of actual ports in i
the device, which is stored in dev->caps.num_ports.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Returning directly is easier to read than do-nothing gotos. Remove the
duplicative check on "olp" and pull the code in one indent level.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Improve performance by changing the behavour of the driver when all
SDMA descriptors are in use, and the processes adding new descriptors
are single- or multi-rail.
For single-rail processes, the driver will block the call and finish
posting all SDMA descriptors onto the hardware queue before returning
back to PSM. Repeated kernel calls are slower than blocking.
For multi-rail processes, the driver will return to PSM as quick as
possible so PSM can feed packets to other rail. If all hardware
queues are full, PSM will buffer the remaining SDMA descriptors until
notified by interrupt that space is available.
This patch builds a red-black tree to track the number rails opened by
a particular PID. If the number is more than one, it is a multi-rail
PSM process, otherwise, it is a single-rail process.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: John A Gregor <john.a.gregor@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We cannot save the mapped length using the rdma max_page_list_len field
of the ib_fast_reg_page_list struct because the core code uses it. This
results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl().
I found this with dma mapping debugging enabled in the kernel. The
fix is to save the length in the c4iw_fr_page_list struct.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Based on original work from Jay Hernandez <jay@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Always release the neigh entry in rx_pkt().
Based on original work by Santosh Rastapur <santosh@chelsio.com>.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
find_route() must treat loopback as a valid egress interface.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
There is a four byte hole at the end of the "uresp" struct after the
->qid_mask member.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
These sizes should be unsigned so we don't allow negative values and
have underflow bugs. These can come from the user so there may be
security implications, but I have not tested this.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix the following warning for the mlx4 driver:
$ make M=drivers/infiniband C=2 CF=-D__CHECK_ENDIAN__
drivers/infiniband/hw/mlx4/qp.c:1885:31: warning: restricted __be16 degrades to integer
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function ‘_ocrdma_modify_qp’:
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1299:31: error: ‘old_qps’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
status = ocrdma_mbx_modify_qp(dev, qp, attr, attr_mask, old_qps);
ocrdma_mbx_modify_qp() (and subsequent calls) doesn't appear to use old_qps
so it doesn't need to be passed on. Removing the variable results in the
warning going away.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Devesh Sharma (Devesh.sharma@emulex.com)
Signed-off-by: Roland Dreier <roland@purestorage.com>
We don't need to test "ret" twice and also the white space is messed up.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We already know "pusable" is non-zero, no need to check again.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
%pa format already prints in hexadecimal format, so remove the '0x' annotation
to avoid a double '0x0x' pattern.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case of error while accessing to userspace memory, function
nes_create_qp() returns NULL instead of an error code wrapped through
ERR_PTR(). But NULL is not expected by ib_uverbs_create_qp(), as it
check for error with IS_ERR().
As page 0 is likely not mapped, it is going to trigger an Oops when
the kernel will try to dereference NULL pointer to access to struct
ib_qp's fields.
In some rare cases, page 0 could be mapped by userspace, which could
turn this bug to a vulnerability that could be exploited: the function
pointers in struct ib_device will be under userspace total control.
This was caught when using spatch (aka. coccinelle)
to rewrite calls to ib_copy_{from,to}_udata().
Link: https://www.gitorious.org/opteya/ib-hw-nes-create-qp-null
Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In qib_create_ctxts() we allocate an array to hold recv contexts. Then attempt
to create data for those recv contexts. If that call to qib_create_ctxtdata()
fails then an error is returned but the previously allocated memory is not
freed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit af061a644a add some code in qib_ib_rcv() which
trigger a warning from coccicheck (coccinelle/spatch):
$ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/
CHECK drivers/infiniband/hw/qib/qib_verbs.c
drivers/infiniband/hw/qib/qib_verbs.c:679:5-32: code aligned with following code on line 681
CC [M] drivers/infiniband/hw/qib/qib_verbs.o
In fact, according to similar code in qib_kreceive(),
qib_ib_rcv() code is correct but improperly indented.
This patch fix indentation for the misaligned portion.
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: infinipath@intel.com
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: cocci@systeme.lip6.fr
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit c804f07248 moved qib_assign_ctxt() to
do_qib_user_sdma_queue_create() but dropped the braces
around the statements.
This was spotted by coccicheck (coccinelle/spatch):
$ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/
CHECK drivers/infiniband/hw/qib/qib_file_ops.c
drivers/infiniband/hw/qib/qib_file_ops.c:1583:2-23: code aligned with following code on line 1587
This patch adds braces back.
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: infinipath@intel.com
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: cocci@systeme.lip6.fr
Cc: stable@vger.kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv
are now maintained as percpu variables.
The mad code is modified to add a z_ latch so that the percpu counters
monotonically increase with appropriate adjustments in the reset,
read logic to maintain the z_ latch.
This patch also corrects the fact the unitcast_xmit wasn't handled
at all for UC and RC QPs.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch replaces the dd->int_counter with a percpu counter.
The maintanance of qib_stats.sps_ints and int_counter are
combined into the new counter.
There are two new functions added to read the counter:
- qib_int_counter (for a particular qib_devdata)
- qib_sps_ints (for all HCAs)
A z_int_counter is added to allow the interrupt detection logic
to determine if interrupts have occured since z_int_counter
was "reset".
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The debugfs init code was incorrectly called before the idr mechanism
is used to get the unit number, so the dd->unit hasn't been
initialized. This caused the unit relative directory creation to fail
after the first.
This patch moves the init for the debugfs stuff until after all of the
failures and after the unit number has been determined.
A bug in unwind code in qib_alloc_devdata() is also fixed.
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Guard against a potential buffer overrun. The size to read from the
user is passed in, and due to the padding that needs to be taken into
account, as well as the place holder for the ICRC it is possible to
overflow the 32bit value which would cause more data to be copied from
user space than is allocated in the buffer.
Cc: <stable@vger.kernel.org>
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Guard against a potential buffer overrun. Right now the qib driver is
protected by the fact that the data structure in question is only 16
bits. Should that ever change the problem will be exposed. There is a
similar defect in the ipath driver and this brings the two code paths
into sync.
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>