Commit Graph

5768 Commits

Author SHA1 Message Date
Markus Elfring
3491ab63b4 IB/mthca: NULL arg to pci_dev_put is OK
The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:35 -04:00
Markus Elfring
f7ca535ba0 IB/hfi1: NULL arg to sc_return_credits is OK
The sc_return_credits() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:35 -04:00
Mark Bloch
3f85f2aaab IB/mlx4: Add diagnostic hardware counters
Expose IB diagnostic hardware counters.
The counters count IB events and are applicable for IB and RoCE.

The counters can be divided into two groups, per device and per port.
Device counters are always exposed.
Port counters are exposed only if the firmware supports per port counters.

rq_num_dup and sq_num_to are only exposed if we have firmware support
for them, if we do, we expose them per device and per port.
rq_num_udsdprd and num_cqovf are device only counters.

rq - denotes responder.
sq - denotes requester.

|-----------------------|---------------------------------------|
|	Name		|	Description			|
|-----------------------|---------------------------------------|
|rq_num_lle		| Number of local length errors		|
|-----------------------|---------------------------------------|
|sq_num_lle		| number of local length errors		|
|-----------------------|---------------------------------------|
|rq_num_lqpoe		| Number of local QP operation errors	|
|-----------------------|---------------------------------------|
|sq_num_lqpoe		| Number of local QP operation errors	|
|-----------------------|---------------------------------------|
|rq_num_lpe		| Number of local protection errors	|
|-----------------------|---------------------------------------|
|sq_num_lpe		| Number of local protection errors	|
|-----------------------|---------------------------------------|
|rq_num_wrfe		| Number of CQEs with error		|
|-----------------------|---------------------------------------|
|sq_num_wrfe		| Number of CQEs with error		|
|-----------------------|---------------------------------------|
|sq_num_mwbe		| Number of Memory Window bind errors	|
|-----------------------|---------------------------------------|
|sq_num_bre		| Number of bad response errors		|
|-----------------------|---------------------------------------|
|sq_num_rire		| Number of Remote Invalid request	|
|			| errors				|
|-----------------------|---------------------------------------|
|rq_num_rire		| Number of Remote Invalid request	|
|			| errors				|
|-----------------------|---------------------------------------|
|sq_num_rae		| Number of remote access errors	|
|-----------------------|---------------------------------------|
|rq_num_rae		| Number of remote access errors	|
|-----------------------|---------------------------------------|
|sq_num_roe		| Number of remote operation errors	|
|-----------------------|---------------------------------------|
|sq_num_tree		| Number of transport retries exceeded	|
|			| errors				|
|-----------------------|---------------------------------------|
|sq_num_rree		| Number of RNR NAK retries exceeded	|
|			| errors				|
|-----------------------|---------------------------------------|
|rq_num_rnr		| Number of RNR NAKs sent		|
|-----------------------|---------------------------------------|
|sq_num_rnr		| Number of RNR NAKs received		|
|-----------------------|---------------------------------------|
|rq_num_oos		| Number of Out of Sequence requests	|
|			| received				|
|-----------------------|---------------------------------------|
|sq_num_oos		| Number of Out of Sequence NAKs	|
|			| received				|
|-----------------------|---------------------------------------|
|rq_num_udsdprd		| Number of UD packets silently		|
|			| discarded on the Receive Queue due to	|
|			| lack of receive descriptor		|
|-----------------------|---------------------------------------|
|rq_num_dup		| Number of duplicate requests received	|
|-----------------------|---------------------------------------|
|sq_num_to		| Number of time out received		|
|-----------------------|---------------------------------------|
|num_cqovf		| Number of CQ overflows		|
|-----------------------|---------------------------------------|

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:34 -04:00
Mustafa Ismail
e972dabf9c Use smaller 512 byte messages for portmapper messages
Portmapper messages are short and do not occupy more than 512 bytes.
Lower portmapper message size to 512 bytes. This change significantly
reduces the amount of memory needed when trying to establish a large
number of connections simultaneously. The old value is based on page
size.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:33 -04:00
Yuval Shaia
5faba54695 IB/ipoib: Report SG feature regardless of HW UD CSUM capability
Decouple SG support from HW ability to do UD checksum.
This coupling is for historical reasons and removed with 'commit
ec5f061564 ("net: Kill link between CSUM and SG features.")'

During driver load it is assumed that device does not supports SG. The
final decision is taken after creating UD QP based on device capability.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:32 -04:00
Roland Dreier
0c87b67209 IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
We allocate a small tracking structure as part of mlx4_ib_resize_cq().
However, we don't need to use GFP_ATOMIC -- immediately after the
allocation, we call mlx4_cq_resize(), which allocates a command
mailbox with GFP_KERNEL and then sleeps on a firmware command, so we
better not be in an atomic context.

This actually has a real impact, because when this GFP_ATOMIC
allocation fails (and GFP_ATOMIC does fail in practice) then a
userspace consumer resizing a CQ will get a spurious failure that we
can easily avoid.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:32 -04:00
Bart Van Assche
a154a8cd08 IB/hfi1: Disable by default
There is a strict policy in the Linux kernel that new drivers must be
disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: f48ad614c1 ("IB/hfi1: Move driver out of staging")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jubin John <jubin.john@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org> # v4.7+
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:32 -04:00
Bart Van Assche
378fc3201e IB/rdmavt: Disable by default
There is a strict policy in the Linux kernel that new drivers must be
disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: 0194621b22 ("IB/rdmavt: Create module framework and handle driver registration")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jubin John <jubin.john@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:31 -04:00
Doug Ledford
6a89d89d85 Merge branch 'i40iw' into k.o/for-4.8 2016-08-03 21:00:16 -04:00
Doug Ledford
3e5e8e8a9a Merge branches 'cxgb4' and 'mlx5' into k.o/for-4.8 2016-08-03 20:58:45 -04:00
Dean Luick
0636e9ab83 IB/hfi1: Add cache evict LRU list
The original code used a LRU list to evict nodes which were least
recently used.  For correctness the evict code was moved under the
handler->lock, now add back the LRU list.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
2677a7680e IB/hfi1: Fix memory leak during unexpected shutdown
During an unexpected shutdown, references to tid_rb_node were NULL'ed out
without properly being released.

Fix this by calling clear_tid_node in the mmu notifier remove callback
rather than after these callbacks are called.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
082b353291 IB/hfi1: Remove unneeded mm argument in remove function
The reworked mmu_rb interface allows the unused mm argument to be removed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
b85ced9151 IB/hfi1: Consistently call ops->remove outside spinlock
The ops->remove() callback was called by hfi1_mmu_unregister() with a
NULL mm argument while holding a spinlock.  In the case of sdma_rb_remove()
this caused it to pass current->mm to hfi1_release_user_pages()

This had 2 problems.  First this would attempt to acquire the mmap_sem
under a spin lock.  Second the use of current->mm is not always guaranteed
to be the proper mm when the fd is being closed.

Rather than depend on this implicit behavior we move all calls to
ops->remove outside of the spinlock.  This also allows the correct
mm to be used in the remove callback without fear of deadlock.

Because the MMU notifier is not guaranteed to hold mm->mmap_sem, but
usually does, we must delay all remove callbacks until out of the notifier,
when the callbacks can take the mmap_sem if they need to.

Code comments were added to clarify what the expectations are for the
users of the mmu rb tree.

Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
b7df192f74 IB/hfi1: Use evict mmu rb operation
Use the new cache evict operation in the SDMA code.  This allows the cache
to properly coordinate evicts and removes, preventing any race.  With this
change, the separate list, lock, and race flag are not needed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
1034599805 IB/hfi1: Add evict operation to the mmu rb handler
Allow users to clear nodes from the rb tree based on their evict callback.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
622c202c4a IB/hfi1: Fix TID caching actions
Per file descriptor TID caching actions depend on a global that can
change midway through the lifetime of that file descriptor.

Make the use of caching consistent for the life of the file descriptor
by using the presence of the cache handler to decide when to use the cache
functions.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
e0b09ac55d IB/hfi1: Make the cache handler own its rb tree root
The objects which use cache handling should reference their own handler
object not the internal data structure it uses to track the nodes.

Have the "users" of the mmu notifier code pass opaque objects which can
then be properly used in the mmu callbacks depending on the owners needs.

This patch has the additional benefit that operations no longer require a
look up in a list to find the handlers.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
3faa3d9a30 IB/hfi1: Make use of mm consistent
The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is
opened, and unregisters it when the device is closed.  The driver
incorrectly assumes that the close will always happen from the same
context as the open.  In particular, closes due to SIGKILL or OOM killer
activity may happen from a different context.  In these cases, the wrong
mm is passed to mmu_notifier_unregister(), which causes improper reference
counting for the victim mm, and eventual memory corruption.

Preserve the mm for all open file descriptors and use this mm rather than
current->mm for memory operations for the lifetime of that fd.  Note: this
patch leaves 1 use of current->mm in place.  This use is removed in a
follow on patch because other functional changes were required prior to
that use being removed.

If registration fails, there is no reason to keep the handler object
around.  Free the handler object rather than add it to the list to
prevent any mmu_notifier operations, including unregister, when
registration fails.

Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
7b3256e331 IB/hfi1: Fix user SDMA racy user request claim
The user SDMA in-use claim bit is in the structure that gets zeroed out
once the claim is made.  Move the request in-use flag into its own bit
array and use that for atomic claims.  This cleans up the claim code and
removes any race possibility.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
9da7e9a711 IB/hfi1: Fix error condition that needs to clean up
If input validation fails, properly free the request before returning.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
a383f8ec55 IB/hfi1: Release node on insert failure
If unable to insert node into the RB tree cache, node will be freed
before returning from the function.  Null out iovec's pointer to node
so iovec does not try to free it later.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
9ff73c8715 IB/hfi1: Validate SDMA user iovector count
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
4fa0d22c9a IB/hfi1: Validate SDMA user request index
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
bdf7752e07 IB/hfi1: Use the same capability state for all shared contexts
Save the current capability state at user context creation
time.  Report this saved value for all shared contexts.

Also get rid of unnecessary hfi1_get_base_kinfo function.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
53445bb32d IB/hfi1: Prevent null pointer dereference
If a context has not been assigned or assignment failed, pq may be NULL.
Move the unregister within the protection of the null check.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
a7cd2dc5d4 IB/hfi1: Rename TID mmu_rb_* functions
Clarify the names of the TID mmu functions.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
20a42d0833 IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
Checking if the rb tree is empty is redundant with the while loop which is
emptying the rb tree.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
ea3a0ee52d IB/hfi1: Restructure hfi1_file_open
Rearrange the file open call in prep for new changes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
ff4ce9bde9 IB/hfi1: Make iovec loop index easy to understand
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
639297b4f0 IB/hfi1: Use "false" not 0
For bool parameters "false" should be used

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
5ed3b15b05 IB/hfi1: Remove unused sub-context parameter
subctxt is not used, just remove it.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
3c1091aa94 IB/hfi1: Consolidate __mmu_rb_remove and hfi1_mmu_rb_remove
__mmu_rb_remove was called in only 1 place which was a very simple
call site.  Combine this function into its caller.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
c0946642e5 IB/hfi1: Always expect ops functions
Remove, insert, and invalidate are always provided.  No
need to test.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
862548dace IB/hfi1: Add parameter names to callback declarations
This makes it more clear what these functions are
operating on.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
ac335e7e80 IB/hfi1: Add parameter names to function declarations
Parameter names to function declarations make it more clear
what those parameters do.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
fc87879ae2 IB/hfi1: Remove unused function hfi1_mmu_rb_search
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dean Luick
8e1f52df97 IB/hfi1: Remove unused uctxt->subpid and uctxt->pid
These are no longer needed.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
72720ddfc6 IB/hfi1: Fix minor format error
Brackets should be on the next line of a function

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
fc0b76c016 IB/hfi1: Expand reported serial number
Expand the serial number space by using more bits
from the GUID.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
c492980269 IB/hfi1: Allow for non-double word multiple message sizes for user SDMA
The driver pads non-double word multiple message sizes but it doesn't
account for this padding when the packet length is calculated. Also, the
data length is miscalculated for message sizes less than 4 bytes due to
the bit representation in LRH. And there's a check for non-double word
multiple message sizes that prevents these messages from being sent.
This patch fixes length miscalculations and enables the functionality to
send non-double word multiple message sizes.

Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
fe508272c9 IB/rdmavt: Eliminate redundant opcode test in mr ref clear
The use of the specific opcode test is redundant since
all ack entry users correctly manipulate the mr pointer
to selectively trigger the reference clearing.

The overly specific test hinders the use of implementation
specific operations.

The change needs to get rid of the union to insure that
an atomic value is not seen as an MR pointer.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Ira Weiny
042b0159aa IB/hfi1: Handle kzalloc failure in init_pervl_scs
Checking the return value of the memory allocation call in
init_pervl_scs() was missed.  Recently the kmalloc() was changed to
kzalloc() which identified the problem.

While fixing this issue 2 other bugs were noticed.  First, the array
being allocated is accessed in the nomem path which can be reached before
it is allocated.  Second, kernel_send_context was not released on error.
Fix both of these by creating a more common memory unwind label structure.

Fixes: 35f6befc84 ("staging/rdma/hfi1: Add qp to send context mapping for PIO")
Reported-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Dasaratharaman Chandramouli
527dbf12e0 IB/qib, IB/hfi1: Fix grh creation in ud loopback
Instead of copying the actual GRH of type struct ib_grh, existing code
copies the struct ib_global_route into the sge. This patch fixes that
and constructs the actual GRH from ib_global_route and copies the GRH
into the sge.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli
b736a469f9 IB/hfi1: Use hdr2sc function to calculate 5-bit SC
The interface is used to compute the 5-bit SC field from the
LRH and the RHF bits. Modify code to use the interface instead.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli
89c057cae4 IB/hfi1: Cleanup UD packet handler.
Cleanup hfi1_ud_rcv to not have to look at the packet
header fields multiple times. The fields are looked up
once and used throughout the function. Also fix sc
computation when validating MAD packets.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Don Hiatt
d4d602e9a3 IB/hfi1: Rename hfi1_pio_header to hfi1_sdma_header.
hfi1_pio_header should really be called hfi1_sdma_header
as it is only used for sdma transmits.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli
a9b6b3bc29 IB/hfi1: Rename struct ahg_ib_header to struct hfi1_ahg_info
struct ahg_ib_header has no header specific information.
Rename it to struct hfi1_ahg_info

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli
bd24ef5eca IB/hfi1: Remove unused elements from struct ahg_ib_header
sde and hfi1_ib_header are not used anymore.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Easwar Hariharan
b5e7101954 IB/hfi1: Reset QSFP on every run through channel tuning
Active QSFP cables were reset only every alternate iteration of the
channel tuning algorithm instead of every iteration due to incorrect
reset of the flag that controlled QSFP reset, resulting in using stale
QSFP status in the channel tuning algorithm.

Fixes: 8ebd4cf185 ("Add active and optical cable support")
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Easwar Hariharan
5fbd98dd20 IB/hfi1: Ignore QSFP interrupts until power stabilizes
Some QSFP cables assert the interrupt line as a side effect of module
plug-in and power up. This causes the SerDes and QSFP tuning algorithm
to begin cable initialization by reading the QSFP memory map over I2C,
which fails. This patch ignores any interrupt line assertion until
the module has completed power up and voltage rails have stabilized,
which can take a maximum of 500 ms per the SFF-8679 specification.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Easwar Hariharan
3ca5f4c068 IB/hfi1: Disable external device configuration requests
QSFP CDR enablement is now controlled by determining power class
and the configuration file. We disable the DC 8051 from requesting
enablement or disabling of TX and RX CDRs by removing the code
that allowed the DC 8051 to request changes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
d9b13c2030 IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled
Hanging has been observed while writing a file over NFSoRDMA. Dmesg on
the server contains messages like these:

[  931.992501] svcrdma: Error -22 posting RDMA_READ
[  952.076879] svcrdma: Error -22 posting RDMA_READ
[  982.154127] svcrdma: Error -22 posting RDMA_READ
[ 1012.235884] svcrdma: Error -22 posting RDMA_READ
[ 1042.319194] svcrdma: Error -22 posting RDMA_READ

Here is why:

With the base memory management extension enabled, FRMR is used instead
of FMR. The xprtrdma server issues each RDMA read request as the following
bundle:

(1)IB_WR_REG_MR, signaled;
(2)IB_WR_RDMA_READ, signaled;
(3)IB_WR_LOCAL_INV, signaled & fencing.

These requests are signaled. In order to generate completion, the fast
register work request is processed by the hfi1 send engine after being
posted to the work queue, and the corresponding lkey is not valid until
the request is processed. However, the rdmavt driver validates lkey when
the RDMA read request is posted and thus it fails immediately with error
-EINVAL (-22).

This patch changes the work flow of local operations (fast register and
local invalidate) so that fast register work requests are always
processed immediately to ensure that the corresponding lkey is valid
when subsequent work requests are posted. Local invalidate requests are
processed immediately if fencing is not required and no previous local
invalidate request is pending.

To allow completion generation for signaled local operations that have
been processed before posting to the work queue, an internal send flag
RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag
and only generates completion for such requests.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Mike Marciniszyn
856cc4c237 IB/hfi1: Add the capability for reserved operations
This fix allows for support of in-kernel reserved operations
without impacting the ULP user.

The low level driver can register a non-zero value which
will be transparently added to the send queue size and hidden
from the ULP in every respect.

ULP post sends will never see a full queue due to a reserved
post send and reserved operations will never exceed that
registered value.

The s_avail will continue to track the ULP swqe availability
and the difference between the reserved value and the reserved
in use will track reserved availabity.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Grzegorz Heldt
23002d5b08 IB/hfi1: Fix trace message units
Trace shows incorrect amount of allocated memory.
Fix trace to display memory in KB.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Heldt <grzegorz.heldt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Tadeusz Struk
b14db1f0aa IB/hfi1: Add sysfs entry to override SDMA interrupt affinity
Add sysfs entry to allow user to override affinity for SDMA
engine interrupts.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Dean Luick
c3f8de0b33 IB/hfi1: Add static PCIe Gen3 CTLE tuning
Enhance the PCIe Gen3 recipe to support static CTLE tuning,
and add a switch to choose between static and dynamic
approaches.  Make discrete chips default to static CTLE
tuning.

Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
8adf71fa14 IB/hfi1: Fix "suspicious rcu_dereference_check() usage" warnings
This fixes the following warnings with PROVE_LOCKING and PROVE_RCU
enabled in the kernel:

case (1):
[ INFO: suspicious RCU usage. ]
drivers/infiniband/hw/hfi1/init.c:532
suspicious rcu_dereference_check() usage!

case (2):
[ INFO: suspicious RCU usage. ]
drivers/infiniband/hw/hfi1/hfi.h:1624
suspicious rcu_dereference_check() usage!

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
a6580f4310 IB/rdmavt: Add missing spin_lock_init call for rdi->n_cqs_lock
This fixes the following warning with PROV_LOCKING enabled kernel:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 15 PID: 12286 Comm: modprobe Not tainted 4.7.0-rc5.prove_rcu+ #1
Hardware name: Intel Corporation S2600WT2R/S2600WT2R,
......
Call Trace:
[<ffffffff8139ec0d>] dump_stack+0x85/0xc8
[<ffffffff810eb765>] register_lock_class+0x415/0x4b0
[<ffffffff810ede1c>] ? __lock_acquire+0x40c/0x1960
[<ffffffff810edaa9>] __lock_acquire+0x99/0x1960
[<ffffffff8120ab62>] ? find_vmap_area+0x42/0x60
[<ffffffff8120ab39>] ? find_vmap_area+0x19/0x60
[<ffffffff810ef9d3>] lock_acquire+0xd3/0x200
[<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffff81763391>] _raw_spin_lock+0x31/0x40
[<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffffa049d598>] rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffff810ead46>] ? static_obj+0x36/0x50
[<ffffffffa0469e39>] ib_alloc_cq+0x49/0x180 [ib_core]
[<ffffffffa047bed4>] ib_mad_init_device+0x204/0x6d0 [ib_core]
[<ffffffff810e968f>] ? up_write+0x1f/0x40
[<ffffffffa046e2c0>] ib_register_device+0x3d0/0x510 [ib_core]
[<ffffffffa0752410>] ? read_cc_setting_bin+0x200/0x200 [hfi1]
[<ffffffff810ead46>] ? static_obj+0x36/0x50
[<ffffffff810eb888>] ? lockdep_init_map+0x88/0x200
[<ffffffffa049cbff>] rvt_register_device+0x17f/0x320 [rdmavt]
[<ffffffffa0766caa>] hfi1_register_ib_device+0x6ca/0x7c0 [hfi1]
[<ffffffffa0733de4>] init_one+0x2b4/0x430 [hfi1]
[<ffffffff813e40a5>] local_pci_probe+0x45/0xa0
[<ffffffff813e5110>] ? pci_match_device+0xe0/0x110
[<ffffffff813e550c>] pci_device_probe+0xfc/0x140
[<ffffffff814daee9>] driver_probe_device+0x239/0x460
[<ffffffff814db1dd>] __driver_attach+0xcd/0xf0
[<ffffffff814db110>] ? driver_probe_device+0x460/0x460
[<ffffffff814d89b3>] bus_for_each_dev+0x73/0xc0
[<ffffffff814da74e>] driver_attach+0x1e/0x20
[<ffffffff814da1b3>] bus_add_driver+0x1d3/0x290
[<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1]
[<ffffffff814dbf60>] driver_register+0x60/0xe0
[<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1]
[<ffffffff813e39d0>] __pci_register_driver+0x60/0x70
[<ffffffffa04cc2aa>] hfi1_mod_init+0x196/0x1fe [hfi1]
[<ffffffff81002190>] do_one_initcall+0x50/0x190
[<ffffffff8110be72>] ? rcu_read_lock_sched_held+0x62/0x70
[<ffffffff8122d4aa>] ? kmem_cache_alloc_trace+0x23a/0x2a0
[<ffffffff811c1881>] ? do_init_module+0x27/0x1dc
[<ffffffff811c18ba>] do_init_module+0x60/0x1dc
[<ffffffff811360cc>] load_module+0x132c/0x1ac0
[<ffffffff81132c40>] ? __symbol_put+0x60/0x60
[<ffffffff8133e50d>] ? ima_post_read_file+0x3d/0x80

Cc: Stable <stable@vger.kernel.org> # 4.6+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Dean Luick
b3bf270bed IB/hfi1: Read all firmware versions
Read the version of the SBus, PCIe SerDes, and Fabric Serdes
firmwares at driver load time.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Dean Luick
6854c6925d IB/hfi1: Explain state complete frame details
When link up fails in LNI, the local and peer state complete
frames are reported as numbers.  Explain what the values mean
so the operator can better diagnose the problem.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Harish Chegondi
8784ac0243 IB/hfi1: Modify the default number of kernel receive conexts
Currently, the default number of kernel receive contexts is set to the
number of NUMA nodes on the system plus one for control context. However,
the systems that have a single socket and/or have NUMA disabled in the BIOS
will have only one receive context by default. This patch would ensure that
by default there will be at least two kernel receive contexts plus one for
control context regardless of the number of NUMA nodes on the system. The
user can override the default number of kernel receive contexts with the
krcvqs module parameter.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
c72cfe3e38 IB/hfi1: Add support for extended memory management
Advertise and add the capability of handing all aspects of IBTA extended
memory management support in post send.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
0db3dfa03c IB/hfi1: Work request processing for fast register mr and invalidate
In order to support extended memory management support, add send side
processing of work requests of type IB_WR_REG_MR, IB_WR_LOCAL_INV, and
IB_WR_SEND_WITH_INV. The first two are local operations and are supported
for both RC and UC. Send with invalidate is only supported for RC because
the corresponding IB opcodes are not defined for UC.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
a2df0c8332 IB/hfi1: Handle send with invalidate opcode in the RC recv path
As part of enabling extended memory management support, add the processing
of the RC send with invalidate.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
d9f8723924 IB/rdmavt: Handle local operations in post send
Some work requests are local operations, such as IB_WR_REG_MR and
IB_WR_LOCAL_INV. They differ from non-local operations in that:

(1) Local operations can be processed immediately without being posted
to the send queue if neither fencing nor completion generation is needed.
However, to ensure correct ordering, once a local operation is posted to
the work queue due to fencing or completion requiement, all subsequent
local operations must also be posted to the work queue until all the
local operations on the work queue have completed.

(2) Local operations don't send packets over the wire and thus don't
need (and shouldn't update) the packet sequence numbers.

Define a new a flag bit for the post send table to identify local
operations.

Add a new field to the QP structure to track the number of local
operations on the send queue to determine if direct processing of new
local operations should be enabled/disabled.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
e8f8b098a4 IB/rdmavt: Add mechanism to invalidate MR keys
In order to support extended memory management, add the mechanism to
invalidate MR keys. This includes a flag "lkey_invalid" in the MR data
structure that is to be checked when validating access to the MR via
the associated key, and two utility functions to perform fast memory
registration and memory key invalidate operations.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
a41081aa59 IB/rdmavt: Add support for ib_map_mr_sg
This implements the device specific function needed by the verbs
API function ib_map_mr_sg().

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Mitko Haralanov
5fd2b562ed IB/hfi1: Pull FECN/BECN processing to a common place
There were multiple places where FECN/BECN processing was
being done for the different types of QPs. All of that code
was very similar, which meant that it could be pulled into
a single function used by the different QP types.

To retain the performance in the fastpath, the common code
starts with an inline function, which only calls the slow
path if the packet has any of the [FB]ECN bits set.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Tymoteusz Kielan
1b23f02cf4 IB/hfi1: Fix to fully initialize send context area
While handling buffer control MAD, partially initialized
dd->kernel_send_context area may cause potential dereference
of uninitialized pointers. Fix by using kzalloc_node()
instead of kmalloc_node().

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jakub Pawlak
3210314ad3 IB/hfi1: Fix integrity errors counter value calculation
PMA should not sum TX and RX replay counts when reporting
local link integrity errors. Fixed by removing C_DC_TX_REPLAY
counter from calculation of the link integrity errors counter
value.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Mike Marciniszyn
2821c509fc IB/rdmavt: Use new driver specific post send table
Change rvt_post_one_wr to use the new table mechanism for
post send.

Validate that each low level driver specifies the table.

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:45 -04:00
Mike Marciniszyn
9ec4faa391 IB/qib: Add qib post send table
Add initial table for table driven post_send support.

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:44 -04:00
Mike Marciniszyn
1ac57c50e9 IB/hfi1: Add hfi1 post send tables
Add initial table for table driven post_send support.

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:44 -04:00
Mike Marciniszyn
afcf8f7647 IB/rdmavt: Add data structures and routines for table driven post send
Add flexibility for driver dependent operations in post send
because different drivers will have differing post send
operation support.

This includes data structure definitions to support a table
driven scheme along with the necessary validation routine
using the new table.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:44 -04:00
Jakub Pawlak
71e68e3db8 IB/hfi1: Correct receive packet handler assignment
Prevent processing receive packet in case when opcode is
accepted by QP but handler for this type of packet is not
defined.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:43 -04:00
Jianxin Xiong
14833b8c52 IB/hfi1: Improve SDMA engine assignment for user SDMA
Currently each user context is assigned a single SDMA engine
based on the VL, context id, and subcontext id. That means for
MPI applications, each rank can only use one SDMA engine for
all messages. This may create unwanted backup for independent
messages going to different destinations upon congestion at one
destination.

This patch adds the packet "dlid" to the formula of SDMA engine
selection for user SDMA requests. A simple hash table is used
to maintain even distribution among the available SDMA engines
regardless how the "dlid" values are distributed.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:43 -04:00
Dean Luick
e014991d07 IB/hfi1: Remove TWSI references
Remove the TWSI code.  The driver now uses the kernel's built-in
i2c bit bus module.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:42 -04:00
Dean Luick
dba715f0c8 IB/hfi1: Use built-in i2c bit-shift bus adapter
Use built-in i2c bit-shift bus adapter to control the
i2c busses on the chip.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:42 -04:00
Sebastian Sanchez
b094a36f90 IB/hfi1: Refine user process affinity algorithm
When performing process affinity recommendations for MPI ranks, the current
algorithm doesn't take into account multiple HFI units. Also, real
cores and HT cores are not distinguished from one another. Therefore,
all HT cores are recommended to be assigned first within the local NUMA
node before recommending the assignments of cores in other NUMA nodes.
It's ideal to assign all real cores across all NUMA nodes first, then all
HT 1 cores, then all HT 2 cores, and so on to balance CPU workload. CPU
cores in other NUMA nodes could be running interrupt handlers, and this is
not taken into account.

To balance the CPU workload for user processes, the following
recommendation algorithm is used:

 For each user process that is opening a context on HFI Y:
  a) If all cores are assigned to user processes, start assignments all
	 over from the first core
  b) Assign real cores first, then HT cores (First set of HT cores on
	 all physical cores, then second set of HT cores, and, so on) in the
	 following order:

	 1. Same NUMA node as HFI Y and not running an IRQ handler
	 2. Same NUMA node as HFI Y and running an IRQ handler
	 3. Different NUMA node to HFI Y and not running an IRQ handler
	 4. Different NUMA node to HFI Y and running an IRQ handler
  c) Mark core as assigned in the global affinity structure. As user
	 processes are done, remove core assignments from global affinity
	 structure.

This implementation allows an arbitrary number of HT cores and provides
support for multiple HFIs.

This is being included in the kernel rather than user space due to the
fact that user space has no way of knowing the CPU recommendations for
contexts running as part of other jobs.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:33 -04:00
Sebastian Sanchez
d63730192f IB/hfi1: Reserve and collapse CPU cores for contexts
Kernel receive queues oversubscribe CPU cores on multi-HFI systems.
To prevent this, the kernel receive queues are separated onto
different cores, and the SDMA engine interrupts are constrained to
a lesser number of cores.

hfi1s_on_numa_node*krcvqs is the number of CPU cores that are
reserved for kernel receive queues for all HFIs. Each HFI initializes
its kernel receive queues to one of the reserved CPU cores. If there
ends up being 0 CPU cores leftover for SDMA engines, use the same
CPU cores as receive contexts.

In addition, general and control contexts are assigned to their own
CPU core, however, both types of contexts tend to have low traffic.
To save CPU cores, collapse general and control contexts to one CPU
core for all HFI units. This change prevents SDMA engine interrupts
from wrapping around general contexts.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:07 -04:00
Dennis Dalessandro
4197344ba5 IB/hfi1: Add global structure for affinity assignments
When HFI units get initialized, they each use their own mask copy for
affinity assignments. On a multi-HFI system, affinity assignments
overbook CPU cores as each HFI doesn't have knowledge of affinity
assignments for other HFI units. Therefore, some CPU cores are never
used for interrupt handlers in systems with high number of CPU cores
per NUMA node.

For multi-HFI systems, SDMA engine interrupt assignments start all over
from the first CPU in the local NUMA node after the first HFI
initialization. This change allows assignments to continue where the
last HFI unit left off.

Add global structure for affinity assignments for multiple HFIs to share
affinity mask.

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:45:14 -04:00
Alex Vesker
321a9e3ebc IB/mlx5: Fix port counter ID association to QP offset
The q-counter-id is given in modify-QP command associates
the QP with the counter. The offset to which the counter
ID was set is incorrect, causing IB port counters not to
count on QP.

Fixes: 0837e86a7a ('IB/mlx5: Add per port counters')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:40:49 -04:00
Slava Shwartsman
b0ffeb537f IB/mlx5: Fix iteration overrun in GSI qps
Number of outstanding_pi may overflow and as a result may indicate that
there are no elements in the queue. The effect of doing this is that the
MAD layer will get stuck waiting for completions. The MAD layer will
think that the QP is full - because it didn't receive these completions.

This fix changes it so the outstanding_pi number is increased
with 32-bit wraparound and is not limited to max_send_wr so
that the difference between outstanding_pi and outstanding_ci will
really indicate the number of outstanding completions.

Cc: Stable <stable@vger.kernel.org>
Fixes: ea6dc20362 ('IB/mlx5: Reorder GSI completions')
Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:32:51 -04:00
Mustafa Ismail
da5c138185 i40iw: Add NULL check for puda buffer
i40iw_puda_get_listbuf may return NULL if the list is empty.
Add NULL check prior to accessing the pointer.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:17:38 -04:00
Mustafa Ismail
8c1ea86d44 i40iw: Change dup_ack_thresh to u8
Change dup_ack_thressh to u8 since it is a 3 bit field.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:17:38 -04:00
Mustafa Ismail
c5d057d32b i40iw: Remove unnecessary check for moving CQ head
In i40iw_cq_poll_completion, we always move the tail. So there is
no reason to check for overflow everytime we move the head.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:17:38 -04:00
Mustafa Ismail
b494c3e677 i40iw: Simplify code to set fragments in SQ WQE
Replace a subtract and multiply with an add; while populating fragments
in SQ wqe.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:17:38 -04:00
Mustafa Ismail
b54143bea9 i40iw: Remove unnecessary parameter to i40iw_cq_poll_completion
Post_cq parameter passed to i40iw_cq_poll_completion is always
true; so remove.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:17:38 -04:00
Mustafa Ismail
5ec11ed23e i40iw: Do not access pointer after free
Child_listen_node pointer is used in a debug print after kfree.
Move the print before kfree.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:17:38 -04:00
Mustafa Ismail
342c387b0d i40iw: Correct and use size parameter to i40iw_reg_phys_mr
Fix size parameter passed to i40iw_reg_phys_mr and use it to
register memory.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:17:38 -04:00
Shiraz Saleem
fe5d6e625d i40iw: Fix return codes
Fix incorrect usage of ENOSYS and other return codes.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 14:17:38 -04:00
Steve Wise
59c68ac31e iw_cm: free cm_id resources on the last deref
Remove the complicated logic to free the iw_cm_id inside iw_cm
event handlers vs when an application thread destroys the cm_id.
Also remove the block in iw_destroy_cm_id() to block the application
until all references are removed.  This block can cause a deadlock when
disconnecting or destroying cm_ids inside an rdma_cm event handler.
Simply allowing the last deref of the iw_cm_id to free the memory
is cleaner and avoids this potential deadlock. Also a flag is added,
IW_CM_DROP_EVENTS, that is set when the cm_id is marked for destruction.
If any events are pending on this iw_cm_id, then as they are processed
they will be dropped vs posted upstream if IW_CM_DROP_EVENTS is set.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:15:18 -04:00
Steve Wise
ad61a4c7a9 iw_cxgb4: don't block in destroy_qp awaiting the last deref
Blocking in c4iw_destroy_qp() causes a deadlock when apps destroy a qp
or disconnect a cm_id from their cm event handler function.  There is
no need to block here anyway, so just replace the refcnt atomic with a
kref object and free the memory on the last put.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:15:17 -04:00
Steve Wise
1b1a889dbb iw_cxgb4: explicitly move the qp to ERROR state during flush
This forces the connection to abort if the application failed to
disconnect before flushing.  This is aligned with how the common
flush services work.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:15:17 -04:00
Steve Wise
12eb5137ed iw_cxgb4: stop MPA_REPLY timer when disconnecting
There exists a race where the application can setup a connection
and then disconnect it before iw_cxgb4 processes the fw4_ack
message.  For passive side connections, the fw4_ack message is
used to know when to stop the ep timer for MPA_REPLY messages.

If the application disconnects before the fw4_ack is handled then
c4iw_ep_disconnect() needs to clean up the timer state and stop the
timer before restarting it for the disconnect timer.  Failure to do this
results in a "timer already started" message and a premature stopping
of the disconnect timer.

Fixes: e4b76a2 ("RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:15:17 -04:00
Mustafa Ismail
cea05eadde IB/core: Add flow control to the portmapper netlink calls
During connection establishment with a large number of
connections, it is possible that the connection requests
might fail. Adding flow control prevents this failure.
Change ibnl_unicast to use blocking to enable flow control.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:14:27 -04:00
Amitoj Kaur Chawla
3b8fb4b86c RDMA/cxgb3: Use AF_INET for sin_family field
Elsewhere the sin_family field holds a value with a name of the form
AF_..., so it seems reasonable to do so here as well.  Also the values
of PF_INET and AF_INET are the same.

The Coccinelle semantic patch that makes this change is as follows:

// <smpl>
@@
struct sockaddr_in sip;
@@

(
sip.sin_family ==
- PF_INET
+ AF_INET
|
sip.sin_family !=
- PF_INET
+ AF_INET
|
sip.sin_family =
- PF_INET
+ AF_INET
)
// </smpl>

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:04:42 -04:00
Hariprasad S
56b2eca3fe RDMA/iw_cxgb4: Use kfree_skb instead of kfree
The commit 0f8ab0b6e9 ("RDMA/iw_cxgb4: Low resource fixes for Memory
registration") from Jun 10, 2016, leads to the following static checker
warning:

        drivers/infiniband/hw/cxgb4/mem.c:612 c4iw_alloc_mw()
        error: use kfree_skb() here instead of kfree(mhp->dereg_skb)

Also fixes skb leak in c4iw_dealloc_mw

Fixes: 0f8ab0b6e9 ("RDMA/iw_cxgb4: Low resource fixes for Memory registration")

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:04:42 -04:00
Wei Yongjun
619615005e IB/mlx5: Fix duplicate const warning
Fixes the following sparse warning:

drivers/infiniband/hw/mlx5/main.c:2574:25: warning: duplicate const

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:00:33 -04:00