Commit Graph

5297 Commits

Author SHA1 Message Date
Raed Salem
1a1e03dc15 IB/mlx5: Add counters read support
This patch implements the uverbs counters read API, it will use the
specific read counters function to the given type to accomplish its
task.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:35:37 +03:00
Raed Salem
5e95af5f7b IB/mlx5: Add flow counters read support
Implements the flow counters read wrapper.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:35:37 +03:00
Raed Salem
3b3233fbf0 IB/mlx5: Add flow counters binding support
Associates a counters with a flow when IB_FLOW_SPEC_ACTION_COUNT is part
of the flow specifications.

The counters user space placements of location and description (index,
description) pairs are passed as private data of the counters flow
specification.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:35:32 +03:00
Raed Salem
b29e2a1309 IB/mlx5: Add counters create and destroy support
This patch implements the device counters create and destroy APIs and
introducing some internal management structures.

Downstream patches in this series will add the functionality to support
flow counters binding and reading.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:33:57 +03:00
Matan Barak
59082a327d IB/core: Support passing uhw for create_flow
This is required when user-space drivers need to pass extra information
regarding how to handle this flow steering specification.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:33:55 +03:00
Doug Ledford
008fba465d RDMA/hns_roce: Don't check return value of zap_vma_ptes()
There is no need to check return value of zap_vma_ptes()
because there is nothing to do with this knowledge.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01 11:20:43 -04:00
Leon Romanovsky
7fc8ff267d RDMA/mlx4: Don't crash machine if zap_vma_ptes() fails
The failure reported by zap_vma_ptes() means that wrong VMA pages
were supplied, however it is impossible for this type of address.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01 11:16:58 -04:00
Leon Romanovsky
2cb4079188 RDMA/mlx5: Don't check return value of zap_vma_ptes()
There is no need to check return value of zap_vma_ptes()
because there is nothing to do with this knowledge.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01 11:16:24 -04:00
Leon Romanovsky
cd13a399e6 RDMA/cxgb3: Don't crash kernel just because IDR is full
cxgb3 driver properly handles errors returned by IDR, so there is no
need to have special case (kernel crash) just because IDR is full.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01 11:16:23 -04:00
Leon Romanovsky
6b1ca7ece1 RDMA/mlx4: Discard unknown SQP work requests
There is no need to crash the machine if unknown work request was
received in SQP MAD.

Cc: <stable@vger.kernel.org> # 3.6
Fixes: 37bfc7c1e8 ("IB/mlx4: SR-IOV multiplex and demultiplex MADs")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01 11:16:23 -04:00
Leon Romanovsky
f77f303626 RDMA/mlx4: Catch FW<->SW misalignment without machine crash
Any steering QP is supposed be above steering_qp_base,
see function mlx4_ib_steer_qp_alloc() for it, however in case
of misalignment between SW and FW, this qp_base can be wrong.

Use WARN() to catch such situation without killing the machine.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01 11:16:23 -04:00
Colin Ian King
367d2f0787 RDMA/qedr: fix spelling mistake: "adrresses" -> "addresses"
Trivial fix to spelling mistake in DP_ERR error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-31 14:00:58 -04:00
Wei Hu(Xavier)
fedc3abe7b RDMA/hns: Implement the disassociate_ucontext API
This patch implemented the IB core disassociate_ucontext API.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-30 20:45:03 -04:00
Wei Hu(Xavier)
a0976f418d RDMA/uverbs: Hoist the common process of disassociate_ucontext into ib core
This patch hoisted the common process of disassociate_ucontext
callback function into ib core code, and these code are common
to ervery ib_device driver.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-30 20:45:03 -04:00
Wei Hu(Xavier)
0b25c9cc53 RDMA/hns: Fix the illegal memory operation when cross page
This patch fixed the potential illegal operation when using the
extend sge buffer cross page in post send operation. The bug
will cause the calltrace as below.

[ 3302.922107] Unable to handle kernel paging request at virtual address ffff00003b3a0004
[ 3302.930009] Mem abort info:
[ 3302.932790]   Exception class = DABT (current EL), IL = 32 bits
[ 3302.938695]   SET = 0, FnV = 0
[ 3302.941735]   EA = 0, S1PTW = 0
[ 3302.944863] Data abort info:
[ 3302.947729]   ISV = 0, ISS = 0x00000047
[ 3302.951551]   CM = 0, WnR = 1
[ 3302.954506] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff000009ea5000
[ 3302.961279] [ffff00003b3a0004] *pgd=00000023dfffe003, *pud=00000023dfffd003, *pmd=00000022dc84c003, *pte=0000000000000000
[ 3302.972224] Internal error: Oops: 96000047 [#1] SMP
[ 3302.999509] CPU: 9 PID: 19628 Comm: roce_test_main Tainted: G           OE   4.14.10 #1
[ 3303.007498] task: ffff80234df78000 task.stack: ffff00000f640000
[ 3303.013412] PC is at hns_roce_v2_post_send+0x690/0xe20 [hns_roce_pci]
[ 3303.019843] LR is at hns_roce_v2_post_send+0x658/0xe20 [hns_roce_pci]
[ 3303.026269] pc : [<ffff0000020694f8>] lr : [<ffff0000020694c0>] pstate: 804001c9
[ 3303.033649] sp : ffff00000f643870
[ 3303.036951] x29: ffff00000f643870 x28: ffff80232bfa9c00
[ 3303.042250] x27: ffff80234d909380 x26: ffff00003b37f0c0
[ 3303.047549] x25: 0000000000000000 x24: 0000000000000003
[ 3303.052848] x23: 0000000000000000 x22: 0000000000000000
[ 3303.058148] x21: 0000000000000101 x20: 0000000000000001
[ 3303.063447] x19: ffff80236163f800 x18: 0000000000000000
[ 3303.068746] x17: 0000ffff86b76fc8 x16: ffff000008301600
[ 3303.074045] x15: 000020a51c000000 x14: 3128726464615f65
[ 3303.079344] x13: 746f6d6572202c29 x12: 303035312879656b
[ 3303.084643] x11: 723a6f666e692072 x10: 573a6f666e693a5d
[ 3303.089943] x9 : 0000000000000004 x8 : ffff8023ce38b000
[ 3303.095242] x7 : ffff8023ce38b320 x6 : 0000000000000418
[ 3303.100541] x5 : ffff80232bfa9cc8 x4 : 0000000000000030
[ 3303.105839] x3 : 0000000000000100 x2 : 0000000000000200
[ 3303.111138] x1 : 0000000000000320 x0 : ffff00003b3a0000
[ 3303.116438] Process roce_test_main (pid: 19628, stack limit = 0xffff00000f640000)
[ 3303.123906] Call trace:
[ 3303.126339] Exception stack(0xffff00000f643730 to 0xffff00000f643870)
[ 3303.215790] [<ffff0000020694f8>] hns_roce_v2_post_send+0x690/0xe20 [hns_roce_pci]
[ 3303.223293] [<ffff0000021c3750>] rt_ktest_post_send+0x5d0/0x8b8 [rdma_test]
[ 3303.230261] [<ffff0000021b3234>] exec_send_cmd+0x664/0x1350 [rdma_test]
[ 3303.236881] [<ffff0000021b8b30>] rt_ktest_dispatch_cmd_3+0x1510/0x3790 [rdma_test]
[ 3303.244455] [<ffff0000021bae54>] rt_ktest_dispatch_cmd_2+0xa4/0x118 [rdma_test]
[ 3303.251770] [<ffff0000021bafec>] rt_ktest_dispatch_cmd+0x124/0xaa8 [rdma_test]
[ 3303.258997] [<ffff0000021bbc3c>] rt_ktest_dev_write+0x2cc/0x568 [rdma_test]
[ 3303.265947] [<ffff0000082ad688>] __vfs_write+0x60/0x18c
[ 3303.271158] [<ffff0000082ad998>] vfs_write+0xa8/0x198
[ 3303.276196] [<ffff0000082adc7c>] SyS_write+0x6c/0xd4
[ 3303.281147] Exception stack(0xffff00000f643ec0 to 0xffff00000f644000)
[ 3303.287573] 3ec0: 0000000000000003 0000fffffc85faa8 0000000000004e60 0000000000000000
[ 3303.295388] 3ee0: 0000000021fb2000 000000000000ffff eff0e3efe4e58080 0000fffffcc724fe
[ 3303.303204] 3f00: 0000000000000040 1999999999999999 0101010101010101 0000000000000038
[ 3303.311019] 3f20: 0000000000000005 ffffffffffffffff 0d73757461747320 ffffffffffffffff
[ 3303.318835] 3f40: 0000000000000000 0000000000459b00 0000fffffc85e360 000000000043d788
[ 3303.326650] 3f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3303.334465] 3f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3303.342281] 3fa0: 0000000000000000 0000fffffc85e570 0000000000438804 0000fffffc85e570
[ 3303.350096] 3fc0: 0000ffff8553f618 0000000080000000 0000000000000003 0000000000000040
[ 3303.357911] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3303.365729] [<ffff000008083808>] __sys_trace_return+0x0/0x4
[ 3303.371288] Code: b94008e9 34000129 b9400ce2 110006b5 (b9000402)
[ 3303.377377] ---[ end trace fd5ab98b3325cf9a ]---

Reported-by: Jie Chen <chenjie103@huawei.com>
Reported-by: Xiping Zhang (Francis) <zhangxiping3@huawei.com>
Fixes: b1c158350968("RDMA/hns: Get rid of virt_to_page and vmap calls after dma_alloc_coherent")
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-30 20:45:03 -04:00
Wei Hu(Xavier)
cb7a94c9c8 RDMA/hns: Add reset process for RoCE in hip08
This patch added reset process for RoCE in hip08.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-30 20:45:02 -04:00
Jason Gunthorpe
f3ca0ab114 Merge branch 'mini_cqe' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-next
Leon Romanovsky says:

====================
Introduce new internal to mlx5 CQE format - mini-CQE. It is a CQE in
compressed form that holds data needed to extra a single full CQE.

It is a stride index, byte count and packet checksum.
====================

* mini_cqe:
  IB/mlx5: Introduce a new mini-CQE format
  IB/mlx5: Refactor CQE compression response
  net/mlx5: Exposing a new mini-CQE format

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-29 15:23:18 -06:00
Yonatan Cohen
6f1006a438 IB/mlx5: Introduce a new mini-CQE format
The new mini-CQE format includes the stride index, byte count and
packet checksum.
Stride index is needed for striding WQ feature.
This patch exposes this capability and enables its setting
via mlx5 UHW data as part of query device and cq creation.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-29 15:18:38 -06:00
Yonatan Cohen
572f46bf94 IB/mlx5: Refactor CQE compression response
Refactor CQE compression response to be fully set only
when it`s really supported. There is no change from user
perspective because anyway resp.cqe_comp_caps.max_num was
set to zero.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>W
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-29 15:18:38 -06:00
Jason Gunthorpe
0394808d9e Merge branch 'mr_fix' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-next
Update mlx4 to support user MR creation against read-only memory, previously
it required the memory to be writable.

Based on rdma for-rc due to dependencies.

* mr_fix: (2 commits)
  IB/mlx4: Mark user MR as writable if actual virtual memory is writable
  IB/core: Make testing MR flags for writability a static inline function
2018-05-28 11:44:35 -06:00
Jack Morgenstein
d8f9cc328c IB/mlx4: Mark user MR as writable if actual virtual memory is writable
To allow rereg_user_mr to modify the MR from read-only to writable without
using get_user_pages again, we needed to define the initial MR as writable.
However, this was originally done unconditionally, without taking into
account the writability of the underlying virtual memory.

As a result, any attempt to register a read-only MR over read-only
virtual memory failed.

To fix this, do not add the writable flag bit when the user virtual memory
is not writable (e.g. const memory).

However, when the underlying memory is NOT writable (and we therefore
do not define the initial MR as writable), the IB core adds a
"force writable" flag to its user-pages request. If this succeeds,
the reg_user_mr caller gets a writable copy of the original pages.

If the user-space caller then does a rereg_user_mr operation to enable
writability, this will succeed. This should not be allowed, since
the original virtual memory was not writable.

Cc: <stable@vger.kernel.org>
Fixes: 9376932d0c ("IB/mlx4_ib: Add support for user MR re-registration")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-05-28 11:41:39 -06:00
David S. Miller
5b79c2af66 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of easy overlapping changes in the confict
resolutions here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26 19:46:15 -04:00
Devesh Sharma
6e04b10356 RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes
The recent changes in Broadcom's ethernet driver(L2 driver) broke
RoCE functionality in terms of MSIx vector allocation and
de-allocation.

There is a possibility that L2 driver would initiate MSIx vector
reallocation depending upon the requests coming from administrator.
In such cases L2 driver needs to free up all the MSIx vectors
allocated previously and reallocate/initialize those.

If RoCE driver is loaded and reshuffling is attempted, there will be
kernel crashes because RoCE driver would still be holding the MSIx
vectors but L2 driver would attempt to free in-use vectors. Thus
leading to a kernel crash.

Making changes in roce driver to fix crashes described above.
As part of solution L2 driver tells RoCE driver to release
the MSIx vector whenever there is a need. When RoCE driver
get message it sync up with all the running tasklets and IRQ
handlers and releases the vectors. L2 driver send one more
message to RoCE driver to resume the MSIx vectors. L2 driver
guarantees that RoCE vector do not change during reshuffling.

Fixes: ec86f14ea5 ("bnxt_en: Add ULP calls to stop and restart IRQs.")
Fixes: 08654eb213 ("bnxt_en: Change IRQ assignment for RDMA driver.")
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-25 11:03:47 -06:00
Wei Hu(Xavier)
d59fcacc4b RDMA/hns: Increase checking CMQ status timeout value
This patch increases checking CMQ status timeout value and
uses the same value with NIC driver to avoid deficiency of
time.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 15:41:52 -06:00
Wei Hu(Xavier)
5b6eb54f58 RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust
This patch modified uar allocation algorithm in hns_roce_uar_alloc
function to avoid bitmap exhaust.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 15:41:52 -06:00
Linus Torvalds
34b48b8789 Merge candidates for 4.17-rc
- Remove bouncing addresses from the MAINTAINERS file
 - Kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and hns drivers
 - Various small LOC behavioral/operational bugs in mlx5, hns, qedr and i40iw drivers
 - Two fixes for patches already sent during the merge window
 - A long standing bug related to not decreasing the pinned pages count in the right
   MM was found and fixed
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbByPQAAoJEDht9xV+IJsa164P/AihB/vbn9MBdK3pe1OSUGTm
 tKZJ/Y6nY/Q/XTJSeM2wNECk8fOrZbKuLBz2XlPRsB2djp4ugC5WWfK9YbwWMGXG
 I5B/lB8VTorQr8E5i9lqqMDQc8aF8VcGJtdqVE3nD4JsVTrQSGiSnw45/BARDUm3
 OycJJMDOWhDj2wnNSa+JfjPemIMDM1jse7DnsJfDsGfTMS/G+6nyzjKIlEnnFZ8/
 PBxhq0q7C5viNDwwn2GsAVUrATTlW48SY0WYhkgMdSl20d2th9wMZqNMqtniz8NP
 lg87SrhzsAPOTlbSWlYYkAnzE7nEhfJyIfYUp2piNJeYuOohYPtO6w99Tqjl/GmU
 uLIYIXtZCxAK1Zb/znc49HkRVL5YFDsQGXdtYy7tvRZPwwR32kowUtpKIWaZFz8O
 BA/x+Zgqu9AlwqSWwQwxmMbUX42RRwhNJDVyTYlXQSSzhfgFaLIZARqb4K6HxeNN
 vZN0BK+x6pX6FI7hpdsqNRtH1oo4SNUBxiuUsrZ7cy7GqYNdUJ6piygDgmERaJxU
 svIUJof/+OoU1QyErQ0JgUEK/3jOHbjxSPb/rjQeqxAnCqhaGOuNGMtdfsGqgvBU
 x/u3eDcbfi/LBErXR46gYtxnOQ8I2BB+m8erUc/GVvCzWrX+R7ELZYpBrP5Pcu/6
 mr2D7hDqgZHbeU8aB8+D
 =uFZh
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is pretty much just the usual array of smallish driver bugs.

   - remove bouncing addresses from the MAINTAINERS file

   - kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and
     hns drivers

   - various small LOC behavioral/operational bugs in mlx5, hns, qedr
     and i40iw drivers

   - two fixes for patches already sent during the merge window

   - a long-standing bug related to not decreasing the pinned pages
     count in the right MM was found and fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
  RDMA/hns: Move the location for initializing tmp_len
  RDMA/hns: Bugfix for cq record db for kernel
  IB/uverbs: Fix uverbs_attr_get_obj
  RDMA/qedr: Fix doorbell bar mapping for dpi > 1
  IB/umem: Use the correct mm during ib_umem_release
  iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
  RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
  RDMA/i40iw: Avoid reference leaks when processing the AEQ
  RDMA/i40iw: Avoid panic when objects are being created and destroyed
  RDMA/hns: Fix the bug with NULL pointer
  RDMA/hns: Set NULL for __internal_mr
  RDMA/hns: Enable inner_pa_vld filed of mpt
  RDMA/hns: Set desc_dma_addr for zero when free cmq desc
  RDMA/hns: Fix the bug with rq sge
  RDMA/hns: Not support qp transition from reset to reset for hip06
  RDMA/hns: Add return operation when configured global param fail
  RDMA/hns: Update convert function of endian format
  RDMA/hns: Load the RoCE dirver automatically
  RDMA/hns: Bugfix for rq record db for kernel
  RDMA/hns: Add rq inline flags judgement
  ...
2018-05-24 14:12:05 -07:00
Jason Gunthorpe
c62091bcd9 mlx5-updates-2018-05-17
mlx5 core dirver updates for both net-next and rdma-next branches.
 
 From Christophe JAILLET, first three patche to use kvfree where needed.
 
 From: Or Gerlitz <ogerlitz@mellanox.com>
 
 Next six patches from Roi and Co adds support for merged
 sriov e-switch which comes to serve cases where both PFs, VFs set
 on them and both uplinks are to be used in single v-switch SW model.
 When merged e-switch is supported, the per-port e-switch is logically
 merged into one e-switch that spans both physical ports and all the VFs.
 
 This model allows to offload TC eswitch rules between VFs belonging
 to different PFs (and hence have different eswitch affinity), it also
 sets the some of the foundations needed for uplink LAG support.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJa/fLEAAoJEEg/ir3gV/o+7jUH/3n5/Uw1LLt3TfeKArx6i0F1
 3G4U5B0ha03qiDqXprwhyQ3I6lgYmRBmjcxnqmvcqOAqO4/hSsjtTR+A/mgbEDhJ
 YtdekFNEX+72h/N2GIpZwChIWSE3EcMPaLYnV8TwLUgh9YSust2sCLSBbJCjxOKc
 j78M8ept/bXZwTm/iJhEjtmqw0xl91rl011chCAua0iEpH3wxteDARmKABFHMQxl
 I3N/x/e/astgcSCNgpO4uDf9zEIRkNdzcHPzSMJ6C2Oo5W9XiZEekfw7WKj9nXfa
 G+eGckkAyCOQ/r2lZ9nA0ZUvQ2X6JISvxgohuaCNwTgsz3acTxbLnQK4YWHzQCQ=
 =iHi6
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into for-next

mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patche to use kvfree where needed.

From: Or Gerlitz <ogerlitz@mellanox.com>

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.

* tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5e: Explicitly set source e-switch in offloaded TC rules
  net/mlx5: Add source e-switch owner
  net/mlx5e: Explicitly set destination e-switch in FDB rules
  net/mlx5: Add destination e-switch owner
  net/mlx5: Properly handle a vport destination when setting FTE
  net/mlx5: Add merged e-switch cap
  IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
  net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()'
  net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
2018-05-24 09:40:43 -06:00
Parav Pandit
25e62655c7 IB/core: Reduce the places that use zgid
Instead of open coding memcmp() to check whether a given GID is zero or
not, use a helper function to do so, and replace instances of
memcpy(z,&zgid) with memset.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Erez Shitrit
7b74a83cf5 IB/mlx5: Fetch soft WQE's on fatal error state
On fatal error the driver simulates CQE's for ULPs that rely on
completion of all their posted work-request.

For the GSI traffic, the mlx5 has its own mechanism that sends the
completions via software CQE's directly to the relevant CQ.

This should be kept in fatal error too, so the driver should simulate
such CQE's with the specified error state in order to complete GSI QP
work requests.

Without the fix the next deadlock might appears:
        schedule_timeout+0x274/0x350
        wait_for_common+0xec/0x240
        mcast_remove_one+0xd0/0x120 [ib_core]
        ib_unregister_device+0x12c/0x230 [ib_core]
        mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
        mlx5_detach_device+0x184/0x1a0 [mlx5_core]
        mlx5_unload_one+0x308/0x340 [mlx5_core]
        mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 89ea94a7b6 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Leon Romanovsky
8f06228733 RDMA/mlx5: Remove debug prints of VMA pointers
Remove various prints of VMA pointers.

Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
oulijun
cc3391cb53 RDMA/hns: Rename the idx field of db
The lower 15 bit of paramter of db structure means different
meanings when db type is sq, rq and srq.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Mike Marciniszyn
0252f73334 IB/qib: Fix DMA api warning with debug kernel
The following error occurs in a debug build when running MPI PSM:

[  307.415911] WARNING: CPU: 4 PID: 23867 at lib/dma-debug.c:1158
check_unmap+0x4ee/0xa20
[  307.455661] ib_qib 0000:05:00.0: DMA-API: device driver failed to check map
error[device address=0x00000000df82b000] [size=4096 bytes] [mapped as page]
[  307.517494] Modules linked in:
[  307.531584]  ib_isert iscsi_target_mod ib_srpt target_core_mod rpcrdma
sunrpc ib_srp scsi_transport_srp scsi_tgt ib_iser libiscsi ib_ipoib
scsi_transport_iscsi rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
ib_qib intel_powerclamp coretemp rdmavt intel_rapl iosf_mbi kvm_intel kvm
irqbypass crc32_pclmul ghash_clmulni_intel ipmi_ssif ib_core aesni_intel sg
ipmi_si lrw gf128mul dca glue_helper ipmi_devintf iTCO_wdt gpio_ich hpwdt
iTCO_vendor_support ablk_helper hpilo acpi_power_meter cryptd ipmi_msghandler
ie31200_edac shpchp pcc_cpufreq lpc_ich pcspkr ip_tables xfs libcrc32c sd_mod
crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul crct10dif_common
drm crc32c_intel libahci tg3 libata serio_raw ptp i2c_core
[  307.846113]  pps_core dm_mirror dm_region_hash dm_log dm_mod
[  307.866505] CPU: 4 PID: 23867 Comm: mpitests-IMB-MP Kdump: loaded Not
tainted 3.10.0-862.el7.x86_64.debug #1
[  307.911178] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
[  307.944206] Call Trace:
[  307.956973]  [<ffffffffbd9e915b>] dump_stack+0x19/0x1b
[  307.982201]  [<ffffffffbd2a2f58>] __warn+0xd8/0x100
[  308.005999]  [<ffffffffbd2a2fdf>] warn_slowpath_fmt+0x5f/0x80
[  308.034260]  [<ffffffffbd5f667e>] check_unmap+0x4ee/0xa20
[  308.060801]  [<ffffffffbd41acaa>] ? page_add_file_rmap+0x2a/0x1d0
[  308.090689]  [<ffffffffbd5f6c4d>] debug_dma_unmap_page+0x9d/0xb0
[  308.120155]  [<ffffffffbd4082e0>] ? might_fault+0xa0/0xb0
[  308.146656]  [<ffffffffc07761a5>] qib_tid_free.isra.14+0x215/0x2a0 [ib_qib]
[  308.180739]  [<ffffffffc0776bf4>] qib_write+0x894/0x1280 [ib_qib]
[  308.210733]  [<ffffffffbd540b00>] ? __inode_security_revalidate+0x70/0x80
[  308.244837]  [<ffffffffbd53c2b7>] ? security_file_permission+0x27/0xb0
[  308.266025] qib_ib0.8006: multicast join failed for
ff12:401b:8006:0000:0000:0000:ffff:ffff, status -22
[  308.323421]  [<ffffffffbd46f5d3>] vfs_write+0xc3/0x1f0
[  308.347077]  [<ffffffffbd492a5c>] ? fget_light+0xfc/0x510
[  308.372533]  [<ffffffffbd47045a>] SyS_write+0x8a/0x100
[  308.396456]  [<ffffffffbd9ff355>] system_call_fastpath+0x1c/0x21

The code calls a qib_map_page() which has never correctly tested for a
mapping error.

Fix by testing for pci_dma_mapping_error() in all cases and properly
handling the failure in the caller.

Additionally, streamline qib_map_page() arguments to satisfy just
the single caller.

Cc: <stable@vger.kernel.org>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Tested-by: Don Dutile <ddutile@redhat.com>
Reviewed-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Mike Marciniszyn
3ce459cd68 IB/{rdmavt,hfi1}: Change hrtimer add to use pinned version
Given we are dealing with nano-second level timers, when the timer
pops, ensure it happens on the CPU which caused the timer to be set
in the first place.  This avoids excessive jitter from the desired
expiration time by avoiding the cost of switching our context to
another CPU that is cache cold for this given timer.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Michael J. Ruhl
5938d94cf0 IB/hfi1: Set port number for errorinfo MAD response
For errorinfo MAD requests, the response has a 0 port number left over
from a memset. Instead we should always set the port number in the
response.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Mike Marciniszyn
c8314811f9 IB/hfi1: Cleanup of exp_rcv
The knowledge of the internal workings of the expect receive
is too distributed.

Fix by:
- right size several rcd fields associated with
  expect receive
- making an init entrance to init all the lists
- consolidate all the allocations into an array anchored
  in the rcd

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Don Hiatt
43a68c35c7 IB/hfi1: Add 16B Management Packet trace support
Add trace support for 16B Management Packets.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Don Hiatt
81cd3891f0 IB/hfi1: Add support for 16B Management Packets
16B Management Packets (L4=0x08) replace the BTH and DETH
of normal MAD packet packets with a header containing the
the source and destination queue pair numbers; fields that
were originally retrieved from the BTH/DETH are now populated
from this header as well as from the 16B LRH (e.g. pkey).

16B Management Packets are used as an optimized management
format on 16B fabrics.

These management packets have an opcode of IB_OPCODE_UD_SEND_ONLY,
a fixed 3Byte pad, and a header length of 24Bytes.

The decision as to when we send a management packet is based
upon either the source or destination queue pair number being
0 or 1.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Don Hiatt
4171a693a5 IB/hfi1: Define 16B Management Packets
Add 16B Management Packet definition. This optimized packet
format replaces the ib_other_headers and BTH with a source
and destination QP number.

To support these packets we introduce struct opa_16b_mgmt
into the struct hfi1_16b_header.

This packet format is only used for MAD packets using the
IB_OPCODE_UD_SEND_ONLY opcode on QP0/1.

The original 16B implementation failed to use 16B management
packets so now we add their definition.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Steve Wise
013f64a880 iw_cxgb4: provide detailed driver-specific MR information
Add a table of important fields from the fw_ri_tpte structure to the mr
resource tracking table.  This is helpful in debugging.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Steve Wise
54e7688e54 iw_cxgb4: provide detailed driver-specific CQ information
Add a table of important fields from the c4iw_cq* structures to the cq
resource tracking table.  This is helpful in debugging.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
Steve Wise
116aeb8873 iw_cxgb4: provide detailed provider-specific CM_ID information
Add a table of important fields from the c4iw_ep* structures to the cm_id
resource tracking table.  This is helpful in debugging.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-24 09:39:25 -06:00
oulijun
55ba49cbce RDMA/hns: Move the location for initializing tmp_len
When posted work request, it need to compute the length of
all sges of every wr and fill it into the msg_len field of
send wqe. Thus, While posting multiple wr,
tmp_len should be reinitialized to zero.

Fixes: 8b9b8d143b ("RDMA/hns: Fix the endian problem for hns")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-23 15:45:44 -06:00
oulijun
05d6a4ddb6 RDMA/hns: Bugfix for cq record db for kernel
When use cq record db for kernel, it needs to set the hr_cq->db_en
to 1 and configure the dma address of record cq db of qp context.

Fixes: 86188a8810 ("RDMA/hns: Support cq record doorbell for kernel space")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-23 15:45:44 -06:00
Kalderon, Michal
30bf066cd9 RDMA/qedr: Fix doorbell bar mapping for dpi > 1
Each user_context receives a separate dpi value and thus a different
address on the doorbell bar. The qedr_mmap function needs to validate
the address and map the doorbell bar accordingly.
The current implementation always checked against dpi=0 doorbell range
leading to a wrong mapping for doorbell bar. (It entered an else case
that mapped the address differently). qedr_mmap should only be used
for doorbells, so the else was actually wrong in the first place.
This only has an affect on arm architecture and not an issue on a
x86 based architecture.
This lead to doorbells not occurring on arm based systems and left
applications that use more than one dpi (or several applications
run simultaneously ) to hang.

Fixes: ac1b36e55a ("qedr: Add support for user context verbs")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-23 15:12:14 -06:00
Steve Wise
b06f2efd3b iw_cxgb4: always set iw_cm_id.provider_data
In active side connections, the provider_data field is not
getting set.  This will be used in a subsequent patch to dump
state, so always set it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-22 14:32:30 -04:00
David S. Miller
3888ea4e2f mlx5-updates-2018-05-17
mlx5 core dirver updates for both net-next and rdma-next branches.
 
 From Christophe JAILLET, first three patche to use kvfree where needed.
 
 From: Or Gerlitz <ogerlitz@mellanox.com>
 
 Next six patches from Roi and Co adds support for merged
 sriov e-switch which comes to serve cases where both PFs, VFs set
 on them and both uplinks are to be used in single v-switch SW model.
 When merged e-switch is supported, the per-port e-switch is logically
 merged into one e-switch that spans both physical ports and all the VFs.
 
 This model allows to offload TC eswitch rules between VFs belonging
 to different PFs (and hence have different eswitch affinity), it also
 sets the some of the foundations needed for uplink LAG support.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJa/fLEAAoJEEg/ir3gV/o+7jUH/3n5/Uw1LLt3TfeKArx6i0F1
 3G4U5B0ha03qiDqXprwhyQ3I6lgYmRBmjcxnqmvcqOAqO4/hSsjtTR+A/mgbEDhJ
 YtdekFNEX+72h/N2GIpZwChIWSE3EcMPaLYnV8TwLUgh9YSust2sCLSBbJCjxOKc
 j78M8ept/bXZwTm/iJhEjtmqw0xl91rl011chCAua0iEpH3wxteDARmKABFHMQxl
 I3N/x/e/astgcSCNgpO4uDf9zEIRkNdzcHPzSMJ6C2Oo5W9XiZEekfw7WKj9nXfa
 G+eGckkAyCOQ/r2lZ9nA0ZUvQ2X6JISvxgohuaCNwTgsz3acTxbLnQK4YWHzQCQ=
 =iHi6
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patche to use kvfree where needed.

From: Or Gerlitz <ogerlitz@mellanox.com>

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 13:00:08 -04:00
Yixian Liu
5e6e78dbd3 RDMA/hns: Add 64KB page size support for hip08
This patch adds the support of 64KB page size for hip08
in kernel.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:44:05 -06:00
Ariel Levkovich
e818e255a5 IB/mlx5: Expose MPLS related tunneling offloads
This patch reports the device's capbilities to offload
encapsulated MPLS tunnel protocols to user-space:
- Capability to offload MPLS over GRE.
- Capability to offload MPLS over UDP.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Ariel Levkovich
71c6e8638c IB/mlx5: Add support for MPLS flow specification
This patch introduces support for the MPLS flow spec and
allows the creation of rules that are matching on the
MPLS label.

Applying the rule matching depends on the flow specs order and
the location of the MPLS in the spec list as there are different
configurations to be made in the device in the cases of MPLSoGRE
and MPLSoUDP vs. non-encapsulated MPLS.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Ariel Levkovich
da2f22ae77 IB/mlx5: Add support for GRE flow specification
This patch introduces support for the GRE flow spec and
allowing the creation of rules based on the protocol and
key fields that are part of GRE protocol header.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Christophe JAILLET
909d434406 IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Fixes: 1cbe6fc86c ("IB/mlx5: Add support for CQE compressing")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-16 17:50:19 -07:00
Shiraz Saleem
f43c00c04b i40iw: Extend port reuse support for listeners
If two listeners are created with different IP's but
same port, the second rdma_listen fails due to a
duplicate port entry being added from the CQP add
APBVT OP. commit f16dc0aa5e ("i40iw: Add support
for port reuse on active side connections") does not
account for listener side port reuse.

Check for duplicate port before invoking the CQP command
to add APBVT entry and delete the entry only if the port
is not in use. Additionally, consolidate all port-reuse
logic into i40iw_manage_apbvt.

Fixes: f16dc0aa5e ("i40iw: Add support for port reuse on active side connections")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 13:13:20 -06:00
Steve Wise
2d478b2859 iw_cxgb4: remove wr_id attributes
Remove sq/rq wr_id attributes because typically they are pointers and
we don't want to pass up kernel pointers.

Fixes: 056f9c7f39 ("iw_cxgb4: dump detailed driver-specific QP information")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-15 16:18:03 -06:00
Steve Wise
1ea62e8164 iw_cxgb4: fix uninitialized variable warnings
Fixes: 056f9c7f39 ("iw_cxgb4: dump detailed driver-specific QP information")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-15 16:14:25 -06:00
Doug Ledford
009669cfad RDMA/hfi1: Fix build error with debugfs disabled
A recent patch set to rework the usage of debugfs and to add fault
injection capabilities via debugfs files to the hfi1 driver introduced a
build error that only shows up when debugfs is fully disabled.  The
patchset mistakenly defines some empty stub functions in two different
headers when debugfs is disabled.  Remove the set that shouldn't have
been there to resolve the issue.

Fixes: a74d5307ca ("IB/hfi1: Rework fault injection machinery")
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-15 14:24:18 -04:00
Brian Welty
832369fa64 IB/{hfi1, qib, rdmavt}: Move logic to allocate receive WQE into rdmavt
Moving receive-side WQE allocation logic into rdmavt will allow
further code reuse between qib and hfi1 drivers.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:30 -04:00
Sebastian Sanchez
5d18ee67d4 IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support
Currently the driver doesn't support completion vectors. These
are used to indicate which sets of CQs should be grouped together
into the same vector. A vector is a CQ processing thread that
runs on a specific CPU.

If an application has several CQs bound to different completion
vectors, and each completion vector runs on different CPUs, then
the completion queue workload is balanced. This helps scale as more
nodes are used.

Implement CQ completion vector support using a global workqueue
where a CQ entry is queued to the CPU corresponding to the CQ's
completion vector. Since the workqueue is global, it's guaranteed
to always be there when queueing CQ entries; Therefore, the RCU
locking for cq->rdi->worker in the hot path is superfluous.

Each completion vector is assigned to a different CPU. The number of
completion vectors available is computed by taking the number of
online, physical CPUs from the local NUMA node and subtracting the
CPUs used for kernel receive queues and the general interrupt.
Special use cases:

  * If there are no CPUs left for completion vectors, the same CPU
    for the general interrupt is used; Therefore, there would only
    be one completion vector available.

  * For multi-HFI systems, the number of completion vectors available
    for each device is the total number of completion vectors in
    the local NUMA node divided by the number of devices in the same
    NUMA node. If there's a division remainder, the first device to
    get initialized gets an extra completion vector.

Upon a CQ creation, an invalid completion vector could be specified.
Handle it as follows:

  * If the completion vector is less than 0, set it to 0.

  * Set the completion vector to the result of the passed completion
    vector moded with the number of device completion vectors
    available.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:30 -04:00
Sebastian Sanchez
cf38ea100e IB/hfi1: Create common functions for affinity CPU mask operations
CPU masks are used to keep track of affinity assignments for IRQs
and processes. Operations performed on these affinity CPU masks are
duplicated throughout the code.

Create common functions for affinity CPU mask operations to remove
duplicate code.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:30 -04:00
Kamenee Arumugam
c872a1f9e3 IB/Hfi1: Read CCE Revision register to verify the device is responsive
When Hfi1 device is unresponsive, reading the RcvArrayCnt register
will return all 1's. This value is then used to remap chip's RcvArray.
The incorrect all ones value used in remapping RcvArray
will cause warn on as shown by trace below:

[<ffffffff81685eac>] dump_stack+0x19/0x1b
[<ffffffff81085820>] warn_slowpath_common+0x70/0xb0
[<ffffffff810858bc>] warn_slowpath_fmt+0x5c/0x80
[<ffffffff81065c29>] __ioremap_caller+0x279/0x320
[<ffffffff8142873c>] ? _dev_info+0x6c/0x90
[<ffffffffa021d155>] ? hfi1_pcie_ddinit+0x1d5/0x330 [hfi1]
[<ffffffff81065d62>] ioremap_wc+0x32/0x40
[<ffffffffa021d155>] hfi1_pcie_ddinit+0x1d5/0x330 [hfi1]
[<ffffffffa0204851>] hfi1_init_dd+0x1d1/0x2440 [hfi1]
[<ffffffff813503dc>] ? pci_write_config_word+0x1c/0x20

Read CCE revision register first to verify that WFR device is
responsive. If the read return "all ones", bail out from init
and fail the driver load.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:30 -04:00
Mitko Haralanov
a74d5307ca IB/hfi1: Rework fault injection machinery
The packet fault injection code present in the HFI1 driver had some
issues which not only fragment the code but also created user
confusion. Furthermore, it suffered from the following issues:

  1. The fault_packet method only worked for received packets. This
     meant that the only fault injection mode available for sent
     packets is fault_opcode, which did not allow for random packet
     drops on all egressing packets.
  2. The mask available for the fault_opcode mode did not really work
     due to the fact that the opcode values are not bits in a bitmask but
     rather sequential integer values. Creating a opcode/mask pair that
     would successfully capture a set of packets was nearly impossible.
  3. The code was fragmented and used too many debugfs entries to
     operate and control. This was confusing to users.
  4. It did not allow filtering fault injection on a per direction basis -
     egress vs. ingress.

In order to improve or fix the above issues, the following changes have
been made:

   1. The fault injection methods have been combined into a single fault
      injection facility. As such, the fault injection has been plugged
      into both the send and receive code paths. Regardless of method used
      the fault injection will operate on both egress and ingress packets.
   2. The type of fault injection - by packet or by opcode - is now controlled
      by changing the boolean value of the file "opcode_mode". When the value
      is set to True, fault injection is done by opcode. Otherwise, by
      packet.
   2. The masking ability has been removed in favor of a bitmap that holds
      opcodes of interest (one bit per opcode, a total of 256 bits). This
      works in tandem with the "opcode_mode" value. When the value of
      "opcode_mode" is False, this bitmap is ignored. When the value is
      True, the bitmap lists all opcodes to be considered for fault injection.
      By default, the bitmap is empty. When the user wants to filter by opcode,
      the user sets the corresponding bit in the bitmap by echo'ing the bit
      position into the 'opcodes' file. This gets around the issue that the set
      of opcodes does not lend itself to effective masks and allow for extremely
      fine-grained filtering by opcode.
   4. fault_packet and fault_opcode methods have been combined. Hence, there
      is only one debugfs directory controlling the entire operation of the
      fault injection machinery. This reduces the number of debugfs entries
      and provides a more unified user experience.
   5. A new control files - "direction" - is provided to allow the user to
      control the direction of packets, which are subject to fault injection.
   6. A new control file - "skip_usec" - is added that would allow the user
      to specify a "timeout" during which no fault injection will occur.

In addition, the following bug fixes have been applied:

   1. The fault injection code has been split into its own header and source
      files. This was done to better organize the code and support conditional
      compilation without littering the code with #ifdef's.
   2. The method by which the TX PIO packets were being marked for drop
      conflicted with the way send contexts were being setup. As a result,
      the send context was repeatedly being reset.
   3. The fault injection only makes sense when the user can control it
      through the debugfs entries. However, a kernel configuration can
      enable fault injection but keep fault injection debugfs entries
      disabled. Therefore, it makes sense that the HFI fault injection
      code depends on both.
   4. Error suppression did not take into account the method by which PIO
      packets were being dropped. Therefore, even with error suppression
      turned on, errors would still be displayed to the screen. A larger
      enough packet drop percentage would case the kernel to crash because
      the driver would be stuck printing errors.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:30 -04:00
Alex Estrin
8d3e71136a IB/{hfi1, qib}: Add handling of kernel restart
A warm restart will fail to unload the driver, leaving link state
potentially flapping up to the point the BIOS resets the adapter.
Correct the issue by hooking the shutdown pci method,
which will bring port down.

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Michael J. Ruhl
a93a0a3111 IB/hfi1: Reorder incorrect send context disable
User send context integrity bits are cleared before the context is
disabled.  If the send context is still processing data, any packets
that need those integrity bits will cause an error and halt the send
context.

During the disable handling, the driver waits for the context to drain.
If the context is halted, the driver will eventually timeout because
the context won't drain and then incorrectly bounce the link.

Reorder the bit clearing and the context disable.

Examine the software state and send context status as well as the
egress status to determine if a send context is in the halted state.

Promote the check macros to static functions for consistency with the
new check and to follow kernel style.

Remove an unused define that refers to the egress timeout.

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Michael J. Ruhl
e4607073ff IB/hfi1: Return correct value for device state
The driver_pstate() function is used to map internal driver state
information to externally defined states.

The VERIFY_CAP and GOING_UP states are config/training states, but
the mapping routing returns the POLLING value.

Update the return values for VERIFY_CAP and GOING_UP to return the
correct value: TRAINING.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Mike Marciniszyn
8c79d8223b IB/hfi1: Fix fault injection init/exit issues
There are config dependent code paths that expose panics in unload
paths both in this file and in debugfs_remove_recursive() because
CONFIG_FAULT_INJECTION and CONFIG_FAULT_INJECTION_DEBUG_FS can be
set independently.

Having CONFIG_FAULT_INJECTION set and CONFIG_FAULT_INJECTION_DEBUG_FS
reset causes fault_create_debugfs_attr() to return an error.

The debugfs.c routines tolerate failures, but the module unload panics
dereferencing a NULL in the two exit routines.  If that is fixed, the
dir passed to debugfs_remove_recursive comes from a memory location
that was freed and potentially reused causing a segfault or corrupting
memory.

Here is an example of the NULL deref panic:

[66866.286829] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[66866.295602] IP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1]
[66866.301138] PGD 858496067 P4D 858496067 PUD 8433a7067 PMD 0
[66866.307452] Oops: 0000 [#1] SMP
[66866.310953] Modules linked in: hfi1(-) rdmavt rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm iw_cm ib_cm ib_core rpcsec_gss_krb5 nfsv4 dns_resolver nfsv3 nfs fscache sb_edac x86_pkg_temp_thermal intel_powerclamp vfat fat coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel iTCO_wdt iTCO_vendor_support crypto_simd mei_me glue_helper cryptd mxm_wmi ipmi_si pcspkr lpc_ich sg mei ioatdma ipmi_devintf i2c_i801 mfd_core shpchp ipmi_msghandler wmi acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt igb fb_sys_fops ttm ahci ptp crc32c_intel libahci pps_core drm dca libata i2c_algo_bit i2c_core [last unloaded: opa_vnic]
[66866.385551] CPU: 8 PID: 7470 Comm: rmmod Not tainted 4.14.0-mam-tid-rdma #2
[66866.393317] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
[66866.405252] task: ffff88084f28c380 task.stack: ffffc90008454000
[66866.411866] RIP: 0010:hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1]
[66866.417984] RSP: 0018:ffffc90008457da0 EFLAGS: 00010202
[66866.423812] RAX: 0000000000000000 RBX: ffff880857de0000 RCX: 0000000180040001
[66866.431773] RDX: 0000000180040002 RSI: ffffea0021088200 RDI: 0000000040000000
[66866.439734] RBP: ffffc90008457da8 R08: ffff88084220e000 R09: 0000000180040001
[66866.447696] R10: 000000004220e001 R11: ffff88084220e000 R12: ffff88085a31c000
[66866.455657] R13: ffffffffa07c9820 R14: ffffffffa07c9890 R15: ffff881059d78100
[66866.463618] FS:  00007f6876047740(0000) GS:ffff88085f800000(0000) knlGS:0000000000000000
[66866.472644] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66866.479053] CR2: 0000000000000088 CR3: 0000000856357006 CR4: 00000000001606e0
[66866.487013] Call Trace:
[66866.489747]  remove_one+0x1f/0x220 [hfi1]
[66866.494221]  pci_device_remove+0x39/0xc0
[66866.498596]  device_release_driver_internal+0x141/0x210
[66866.504424]  driver_detach+0x3f/0x80
[66866.508409]  bus_remove_driver+0x55/0xd0
[66866.512784]  driver_unregister+0x2c/0x50
[66866.517164]  pci_unregister_driver+0x2a/0xa0
[66866.521934]  hfi1_mod_cleanup+0x10/0xaa2 [hfi1]
[66866.526988]  SyS_delete_module+0x171/0x250
[66866.531558]  do_syscall_64+0x67/0x1b0
[66866.535644]  entry_SYSCALL64_slow_path+0x25/0x25
[66866.540792] RIP: 0033:0x7f6875525c27
[66866.544777] RSP: 002b:00007ffd48528e78 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[66866.553224] RAX: ffffffffffffffda RBX: 0000000001cc01d0 RCX: 00007f6875525c27
[66866.561185] RDX: 00007f6875596000 RSI: 0000000000000800 RDI: 0000000001cc0238
[66866.569146] RBP: 0000000000000000 R08: 00007f68757e9060 R09: 00007f6875596000
[66866.577120] R10: 00007ffd48528c00 R11: 0000000000000206 R12: 00007ffd48529db4
[66866.585080] R13: 0000000000000000 R14: 0000000001cc01d0 R15: 0000000001cc0010
[66866.593040] Code: 90 0f 1f 44 00 00 48 83 3d a3 8b 03 00 00 55 48 89 e5 53 48 89 fb 74 4e 48 8d bf 18 0c 00 00 e8 9d f2 ff ff 48 8b 83 20 0c 00 00 <48> 8b b8 88 00 00 00 e8 2a 21 b3 e0 48 8b bb 20 0c 00 00 e8 0e
[66866.614127] RIP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1] RSP: ffffc90008457da0
[66866.621885] CR2: 0000000000000088
[66866.625618] ---[ end trace c4817425783fb092 ]---

Fix by insuring that upon failure from fault_create_debugfs_attr() the
parent pointer for the routines is always set to NULL and guards added
in the exit routines to insure that debugfs_remove_recursive() is not
called when when the parent pointer is NULL.

Fixes: 0181ce31b2 ("IB/hfi1: Add receive fault injection feature")
Cc: <stable@vger.kernel.org> # 4.14.x
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Alex Estrin
959f2d172d IB/hfi1: Complete check for locally terminated smp
For lid routed packets 'hop_cnt' is zero, therefore current
test is incomplete. Fix it by using local mad check for
both lid routed and direct routed MADs.

Reviewed-by: Mike Mariciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Michael J. Ruhl
48e0a6559d IB/hfi1: Return actual error value from program_rcvarray()
A failure of program_rcvarray() is treated inconsistently by the
calling function.  In one case the error is returned, in a second
case, the error is overwritten with EFAULT.  In both cases the
code path is doing the same thing, allocating memory for groups,
so it should be consistent.

Make the error path consistent and return the error generated by
program_rcvarray().

Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Fixes: 7e7a436ecb ("staging/hfi1: Add TID entry program function body")
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Sebastian Sanchez
254361c189 IB/hfi1: Prevent LNI hang when LCB can't obtain lanes
When the LCB isn't able to get any lanes operational on the
first transition into mission mode, the link transfer active
never happens and the LNI stays in the polling state indefinitely.

Reset LCB upon receiving an 8051 interrupt for LCB to try to obtain
lanes with firmware version 1.25.0 or later. Also, update the LCB
reset value in other parts of the code with a macro defined to make
the code more maintainable and rename functions with the link_width
label to link_mode to reflect the fact that those functions set and
read link related data not just the link width.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Doug Ledford
f5e27a203f Merge branch 'k.o/for-rc' into k.o/wip/dl-for-next
Several items of conflict have arisen between the RDMA stack's for-rc
branch and upcoming for-next work:

9fd4350ba8 ("IB/rxe: avoid double kfree_skb") directly conflicts with
2e47350789 ("IB/rxe: optimize the function duplicate_request")

Patches already submitted by Intel for the hfi1 driver will fail to
apply cleanly without this merge

Other people on the mailing list have notified that their upcoming
patches also fail to apply cleanly without this merge

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:48:48 -04:00
Idan Burstein
064e526247 IB/mlx5: posting klm/mtt list inline in the send queue for reg_wr
As most kernel RDMA ULPs, (e.g. NVMe over Fabrics in its default
"register_always=Y" mode) registers and invalidates user buffer
upon each IO.

Today the mlx5 driver is posting the registration work
request using scatter/gather entry for the MTT/KLM list.
The fetch of the MTT/KLM list becomes the bottleneck in
number of IO operation could be done by NVMe over Fabrics
host driver on a single adapter as shown below.

This patch is adding the support for inline registration
work request upon MTT/KLM list of size <=64B.

The result for NVMe over Fabrics is increase of > x3.5 for small
IOs as shown below, I expect other ULPs (e.g iSER, SRP, NFS over RDMA)
performance to be enhanced as well.

The following results were taken against a single NVMe-oF (RoCE link layer)
subsystem with a single namespace backed by null_blk using fio benchmark
(with rw=randread, numjobs=48, iodepth={16,64}, ioengine=libaio direct=1):

ConnectX-5 (pci Width x16)
---------------------------

Block Size       s/g reg_wr            inline reg_wr
++++++++++     +++++++++++++++        ++++++++++++++++
512B            1302.8K/34.82%         4951.9K/99.02%
1KB             1284.3K/33.86%         4232.7K/98.09%
2KB             1238.6K/34.1%          2797.5K/80.04%
4KB             1169.3K/32.46%         1941.3K/61.35%
8KB             1013.4K/30.08%         1236.6K/39.47%
16KB            695.7K/20.19%          696.9K/20.59%
32KB            350.3K/9.64%           350.6K/10.3%
64KB            175.86K/5.27%          175.9K/5.28%

ConnectX-4 (pci Width x8)
---------------------------

Block Size       s/g reg_wr            inline reg_wr
++++++++++     +++++++++++++++        ++++++++++++++++
512B            1285.8K/42.66%          4242.7K/98.18%
1KB             1254.1K/41.74%          3569.2K/96.00%
2KB             1185.9K/39.83%          2173.9K/75.58%
4KB             1069.4K/36.46%          1343.3K/47.47%
8KB             755.1K/27.77%           748.7K/29.14%

Tested-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Idan Burstein <idanb@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 12:08:21 -04:00
Leon Romanovsky
ed3dd9b017 RDMA/hns: Drop local zgid in favor of core defined variable
The zgid is already provided by IB/core, so there is no need in locally
defined variable, let's drop it and reuse common one.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 12:08:21 -04:00
Christophe Jaillet
3d69191086 iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
The error handling path of 'c4iw_get_dma_mr()' does not free resources
in the correct order.
If an error occures, it can leak 'mhp->wr_waitp'.

Fixes: a3f12da0e9 ("iw_cxgb4: allocate wait object for each memory object")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:19 -04:00
Andrew Boyer
43731753c4 RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
The current code sets an affinity hint with a cpumask_t stored on the
stack. This value can then be accessed through /proc/irq/*/affinity_hint/,
causing a segfault or returning corrupt data.

Move the cpumask_t into struct i40iw_msix_vector so it is available later.

Backtrace:
BUG: unable to handle kernel paging request at ffffb16e600e7c90
IP: irq_affinity_hint_proc_show+0x60/0xf0
PGD 17c0c6d067
PUD 17c0c6e067
PMD 15d4a0e067
PTE 0

Oops: 0000 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 172543 Comm: grep Tainted: G           OE   ... #1
Hardware name: ...
task: ffff9a5caee08000 task.stack: ffffb16e659d8000
RIP: 0010:irq_affinity_hint_proc_show+0x60/0xf0
RSP: 0018:ffffb16e659dbd20 EFLAGS: 00010086
RAX: 0000000000000246 RBX: ffffb16e659dbd20 RCX: 0000000000000000
RDX: ffffb16e600e7c90 RSI: 0000000000000003 RDI: 0000000000000046
RBP: ffffb16e659dbd88 R08: 0000000000000038 R09: 0000000000000001
R10: 0000000070803079 R11: 0000000000000000 R12: ffff9a59d1d97a00
R13: ffff9a5da47a6cd8 R14: ffff9a5da47a6c00 R15: ffff9a59d1d97a00
FS:  00007f946c31d740(0000) GS:ffff9a5dc1800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffb16e600e7c90 CR3: 00000016a4339000 CR4: 00000000007406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 seq_read+0x12d/0x430
 ? sched_clock_cpu+0x11/0xb0
 proc_reg_read+0x48/0x70
 __vfs_read+0x37/0x140
 ? security_file_permission+0xa0/0xc0
 vfs_read+0x96/0x140
 SyS_read+0x58/0xc0
 do_syscall_64+0x5a/0x190
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f946bbc97e0
RSP: 002b:00007ffdd0c4ae08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000000000096b000 RCX: 00007f946bbc97e0
RDX: 000000000096b000 RSI: 00007f946a2f0000 RDI: 0000000000000004
RBP: 0000000000001000 R08: 00007f946a2ef011 R09: 000000000000000a
R10: 0000000000001000 R11: 0000000000000246 R12: 00007f946a2f0000
R13: 0000000000000004 R14: 0000000000000000 R15: 00007f946a2f0000
Code: b9 08 00 00 00 49 89 c6 48 89 df 31 c0 4d 8d ae d8 00 00 00 f3 48 ab 4c 89 ef e8 6c 9a 56 00 49 8b 96 30 01 00 00 48 85 d2 74 3f <48> 8b 0a 48 89 4d 98 48 8b 4a 08 48 89 4d a0 48 8b 4a 10 48 89
RIP: irq_affinity_hint_proc_show+0x60/0xf0 RSP: ffffb16e659dbd20
CR2: ffffb16e600e7c90

Fixes: 8e06af711b ("i40iw: add main, hdr, status")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:19 -04:00
Andrew Boyer
9f7b16afab RDMA/i40iw: Avoid reference leaks when processing the AEQ
In this switch there is a reference held on the QP. 'continue' will grab
the next event without releasing the reference, causing a leak.

Change it to 'break' to drop the reference before grabbing the next event.

Fixes: 4e9042e647 ("i40iw: add hw and utils files")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
Andrew Boyer
a75895b1eb RDMA/i40iw: Avoid panic when objects are being created and destroyed
A panic occurs when there is a newly-registered element on the QP/CQ MR
list waiting to be attached, but a different MR is deregistered. The
current code only checks for whether the list is empty, not whether the
element being deregistered is actually on the list.

Fix the panic by adding a boolean to track if the object is on the list.

Fixes: d374984179 ("i40iw: add files for iwarp interface")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
a0403be8af RDMA/hns: Fix the bug with NULL pointer
When the last QP of eight QPs is not exist in
hns_roce_v1_mr_free_work_fn function, the
print for qpn of hr_qp may introduce a
calltrace for NULL pointer.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
79d442071a RDMA/hns: Set NULL for __internal_mr
This patch mainly configure value for __internal_mr of mr_free_pd.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
85e0274dc6 RDMA/hns: Enable inner_pa_vld filed of mpt
When enabled inner_pa_vld field of mpt, The pa0 and
pa1 will be valid and the hardware will use it
directly and not use base address of pbl. As a
result, it can reduce the delay.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
90e7a4d506 RDMA/hns: Set desc_dma_addr for zero when free cmq desc
In order to avoid illegal use for desc_dma_addr of ring,
it needs to set it zero when free cmq desc.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
778cc5a8b7 RDMA/hns: Fix the bug with rq sge
When received multiply rq sge, it should tag the
invalid lkey for the last non-zero length sge
when have some sges' length are zero. This patch
fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
391bd5fc7d RDMA/hns: Not support qp transition from reset to reset for hip06
Because hip06 hardware is not support for qp transition from
reset to reset state, it need to return errno when qp
transited from reset to reset. This patch fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
2349fdd483 RDMA/hns: Add return operation when configured global param fail
When configure global param function run fail, it should directly return
and the initial flow will stop.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
ad18e20ba2 RDMA/hns: Update convert function of endian format
Because the sys_image_guid of ib_device_attr structure is __be64, it
need to use cpu_to_be64 for converting.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
f97a62c394 RDMA/hns: Load the RoCE dirver automatically
To enable the linux-kernel system to load the hns-roce-hw-v2 driver
automatically when hns-roce-hw-v2 is plugged in pci bus, it need to
create a MODULE_DEVICE_TABLE for expose the pci_table of
hns-roce-hw-v2 to user.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reported-by: Zhou Wang <wangzhou1@hisilicon.com>
Tested-by: Xiaojun Tan <tanxiaojun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
3a39bbecc8 RDMA/hns: Bugfix for rq record db for kernel
When used rq record db for kernel, it needs to set the rdb_en of
hr_qp to 1 and configures the dma address of record rq db of qp
context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
ecaaf1e26a RDMA/hns: Add rq inline flags judgement
It needs to set the rqie field of qp context by configured rq inline
flags. Besides, it need to decide whether posting inline rqwqe by
judged rq inline flags.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
Mustafa Ismail
eeb1af4f53 i40iw: Use correct address in dst_neigh_lookup for IPv6
Use of incorrect structure address for IPv6 neighbor lookup
causes connections to IPv6 addresses to fail. Fix this by
using correct address in call to dst_neigh_lookup.

Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:51 -04:00
Mustafa Ismail
5a7189d529 i40iw: Fix memory leak in error path of create QP
If i40iw_allocate_dma_mem fails when creating a QP, the
memory allocated for the QP structure using kzalloc is not
freed because iwqp->allocated_buffer is used to free the
memory and it is not setup until later. Fix this by setting
iwqp->allocated_buffer before allocating the dma memory.

Fixes: d374984179 ("i40iw: add files for iwarp interface")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Daria Velikovsky
37da2a03c0 RDMA/mlx5: Use proper spec flow label type
Flow label is defined as u32 in the in ipv6 flow spec, but
used internally in the flow specs parsing as u8. That was
causing loss of part of flow_label value.

Fixes: 2d1e697e9b ('IB/mlx5: Add support to match inner packet fields')
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Yishai Hadas
18b0362e87 RDMA/mlx5: Don't assume that medium blueFlame register exists
User can leave system without medium BlueFlames registers,
however the code assumed that at least one such register exists.

This patch fixes that assumption.

Fixes: c1be5232d2 ("IB/mlx5: Fix micro UAR allocator")
Reported-by: Rohit Zambre <rzambre@uci.edu>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Michael J. Ruhl
f9e76ca377 IB/hfi1: Use after free race condition in send context error path
A pio send egress error can occur when the PSM library attempts to
to send a bad packet.  That issue is still being investigated.

The pio error interrupt handler then attempts to progress the recovery
of the errored pio send context.

Code inspection reveals that the handling lacks the necessary locking
if that recovery interleaves with a PSM close of the "context" object
contains the pio send context.

The lack of the locking can cause the recovery to access the already
freed pio send context object and incorrectly deduce that the pio
send context is actually a kernel pio send context as shown by the
NULL deref stack below:

[<ffffffff8143d78c>] _dev_info+0x6c/0x90
[<ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1]
[<ffffffff816ab124>] ? __schedule+0x424/0x9b0
[<ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1]
[<ffffffff810aa3ba>] process_one_work+0x17a/0x440
[<ffffffff810ab086>] worker_thread+0x126/0x3c0
[<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffff810b252f>] kthread+0xcf/0xe0
[<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[<ffffffff816b8798>] ret_from_fork+0x58/0x90
[<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40

This is the best case scenario and other scenarios can corrupt the
already freed memory.

Fix by adding the necessary locking in the pio send context error
handler.

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Linus Torvalds
eb4f959b26 First pull request for 4.17-rc
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
 - SPDX license tag cleanups (new tag Linux-OpenIB)
 - RoCE GID fixes related to default GIDs
 - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
   mlx4, mlx5, and hfi1 (medium batch)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJa7JXPAAoJELgmozMOVy/dc0AP/0i7EajAmgl1ihka6BYVj2pa
 DV8iSrVMDPulh9AVnAtwLJSbdwmgN/HeVzLzcutHyMYk6tAf8RCs6TsyoB36XiOL
 oUh5+V2GyNnyh9veWPwyGTgZKCpPJc3uQFV6502lZVDYwArMfGatumApBgQVKiJ+
 YdPEXEQZPNIs6YZB1WXkNYV/ra9u0aBByQvUrxwVZ2AND+srJYO82tqZit2wBtjK
 UXrhmZbWXGWMFg8K3/lpfUkQhkG3Arj+tMKkCfqsVzC7wUPhlTKBHR9NmvdLIiy9
 5Vhv7Xp78udcxZKtUeTFsbhaMqqK7x7sKHnpKAs7hOZNZ/Eg47BrMwMrZVLOFuDF
 nBLUL1H+nJ1mASZoMWH5xzOpVew+e9X0cot09pVDBIvsOIh97wCG7hgptQ2Z5xig
 fcDiMmg6tuakMsaiD0dzC9JI5HR6Z7+6oR1tBkQFDxQ+XkkcoFabdmkJaIRRwOj7
 CUhXRgcm0UgVd03Jdta6CtYXsjSODirWg4AvSSMt9lUFpjYf9WZP00/YojcBbBEH
 UlVrPbsKGyncgrm3FUP6kXmScESfdTljTPDLiY9cO9+bhhPGo1OHf005EfAp178B
 jGp6hbKlt+rNs9cdXrPSPhjds+QF8HyfSlwyYVWKw8VWlh/5DG8uyGYjF05hYO0q
 xhjIS6/EZjcTAh5e4LzR
 =PI8v
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "This is our first pull request of the rc cycle. It's not that it's
  been overly quiet, we were just waiting on a few things before sending
  this off.

  For instance, the 6 patch series from Intel for the hfi1 driver had
  actually been pulled in on Tuesday for a Wednesday pull request, only
  to have Jason notice something I missed, so we held off for some
  testing, and then on Thursday had to respin the series because the
  very first patch needed a minor fix (unnecessary cast is all).

  There is a sizable hns patch series in here, as well as a reasonably
  largish hfi1 patch series, then all of the lines of uapi updates are
  just the change to the new official Linux-OpenIB SPDX tag (a bunch of
  our files had what amounts to a BSD-2-Clause + MIT Warranty statement
  as their license as a result of the initial code submission years ago,
  and the SPDX folks decided it was unique enough to warrant a unique
  tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
  and core/cache/cma updates to round out the bunch.

  None of it was overly large by itself, but in the 2 1/2 weeks we've
  been collecting patches, it has added up :-/.

  As best I can tell, it's been through 0day (I got a notice about my
  last for-next push, but not for my for-rc push, but Jason seems to
  think that failure messages are prioritized and success messages not
  so much). It's also been through linux-next. And yes, we did notice in
  the context portion of the CMA query gid fix patch that there is a
  dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
  and remove it anywhere we can.

  Summary:

   - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)

   - SPDX license tag cleanups (new tag Linux-OpenIB)

   - RoCE GID fixes related to default GIDs

   - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
     mlx4, mlx5, and hfi1 (medium batch)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
  RDMA/cma: Do not query GID during QP state transition to RTR
  IB/mlx4: Fix integer overflow when calculating optimal MTT size
  IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
  IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
  IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
  IB/hfi1: Fix loss of BECN with AHG
  IB/hfi1 Use correct type for num_user_context
  IB/hfi1: Fix handling of FECN marked multicast packet
  IB/core: Make ib_mad_client_id atomic
  iw_cxgb4: Atomically flush per QP HW CQEs
  IB/uverbs: Fix kernel crash during MR deregistration flow
  IB/uverbs: Prevent reregistration of DM_MR to regular MR
  RDMA/mlx4: Add missed RSS hash inner header flag
  RDMA/hns: Fix a couple misspellings
  RDMA/hns: Submit bad wr
  RDMA/hns: Update assignment method for owner field of send wqe
  RDMA/hns: Adjust the order of cleanup hem table
  RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
  RDMA/hns: Remove some unnecessary attr_mask judgement
  RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
  ...
2018-05-04 20:51:10 -10:00
Steve Wise
056f9c7f39 iw_cxgb4: dump detailed driver-specific QP information
Provide a cxgb4-specific function to fill in qp state details.
This allows dumping important c4iw_qp state useful for debugging.

Included in the dump are the t4_sq, t4_rq structs, plus a dump
of the t4_swsqe and t4swrqe descriptors for the first and last
pending entries.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:51:27 -04:00
Jack Morgenstein
b03bcde962 IB/mlx4: Fix integer overflow when calculating optimal MTT size
When the kernel was compiled using the UBSAN option,
we saw the following stack trace:

[ 1184.827917] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx4/mr.c:349:27
[ 1184.828114] signed integer overflow:
[ 1184.828247] -2147483648 - 1 cannot be represented in type 'int'

The problem was caused by calling round_up in procedure
mlx4_ib_umem_calc_optimal_mtt_size (on line 349, as noted in the stack
trace) with the second parameter (1 << block_shift) (which is an int).
The second parameter should have been (1ULL << block_shift) (which
is an unsigned long long).

(1 << block_shift) is treated by the compiler as an int (because 1 is
an integer).

Now, local variable block_shift is initialized to 31.
If block_shift is 31, 1 << block_shift is 1 << 31 = 0x80000000=-214748368.
This is the most negative int value.

Inside the round_up macro, there is a cast applied to ((1 << 31) - 1).
However, this cast is applied AFTER ((1 << 31) - 1) is calculated.
Since (1 << 31) is treated as an int, we get the negative overflow
identified by UBSAN in the process of calculating ((1 << 31) - 1).

The fix is to change (1 << block_shift) to (1ULL << block_shift) on
line 349.

Fixes: 9901abf583 ("IB/mlx4: Use optimal numbers of MTT entries")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:31:43 -04:00
Sebastian Sanchez
59482a1491 IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
When IRQ affinity is set and the interrupt type is unknown, a cpu
mask allocated within the function is never freed. Fix this memory
leak by allocating memory within the scope where it is used.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:48 -04:00
Sebastian Sanchez
e9777ad439 IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
When allocating device data, if there's an allocation failure, the
already allocated memory won't be freed such as per-cpu counters.

Fix memory leaks in exception path by creating a common reentrant
clean up function hfi1_clean_devdata() to be used at driver unload
time and device data allocation failure.

To accomplish this, free_platform_config() and clean_up_i2c() are
changed to be reentrant to remove dependencies when they are called
in different order. This helps avoid NULL pointer dereferences
introduced by this patch if those two functions weren't reentrant.

In addition, set dd->int_counter, dd->rcv_limit,
dd->send_schedule and dd->tx_opstats to NULL after they're freed in
hfi1_clean_devdata(), so that hfi1_clean_devdata() is fully reentrant.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:48 -04:00
Sebastian Sanchez
45d924571a IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
When an invalid num_vls is used as a module parameter, the code
execution follows an exception path where the macro dd_dev_err()
expects dd->pcidev->dev not to be NULL in hfi1_init_dd(). This
causes a NULL pointer dereference.

Fix hfi1_init_dd() by initializing dd->pcidev and dd->pcidev->dev
earlier in the code. If a dd exists, then dd->pcidev and
dd->pcidev->dev always exists.

BUG: unable to handle kernel NULL pointer dereference
at 00000000000000f0
IP: __dev_printk+0x15/0x90
Workqueue: events work_for_cpu_fn
RIP: 0010:__dev_printk+0x15/0x90
Call Trace:
 dev_err+0x6c/0x90
 ? hfi1_init_pportdata+0x38d/0x3f0 [hfi1]
 hfi1_init_dd+0xdd/0x2530 [hfi1]
 ? pci_conf1_read+0xb2/0xf0
 ? pci_read_config_word.part.9+0x64/0x80
 ? pci_conf1_write+0xb0/0xf0
 ? pcie_capability_clear_and_set_word+0x57/0x80
 init_one+0x141/0x490 [hfi1]
 local_pci_probe+0x3f/0xa0
 work_for_cpu_fn+0x10/0x20
 process_one_work+0x152/0x350
 worker_thread+0x1cf/0x3e0
 kthread+0xf5/0x130
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x10/0x10
 ? do_syscall_64+0x6e/0x1a0
 ? SyS_exit_group+0x10/0x10
 ret_from_fork+0x35/0x40

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:47 -04:00
Mike Marciniszyn
0a0bcb046b IB/hfi1: Fix loss of BECN with AHG
AHG may be armed to use the stored header, which by design is limited
to edits in the PSN/A 32 bit word (bth2).

When the code is trying to send a BECN, the use of the stored header
will lose the BECN bit.

Fix by avoiding AHG when getting ready to send a BECN. This is
accomplished by always claiming the packet is not a middle packet which
is an AHG precursor.  BECNs are not a normal case and this should not
hurt AHG optimizations.

Cc: <stable@vger.kernel.org> # 4.14.x
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:47 -04:00
Michael J. Ruhl
5da9e742be IB/hfi1 Use correct type for num_user_context
The module parameter num_user_context is defined as 'int' and
defaults to -1.  The module_param_named() says that it is uint.

Correct module_param_named() type information and update the modinfo
text to reflect the default value.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:47 -04:00
Mike Marciniszyn
f59fb9e051 IB/hfi1: Fix handling of FECN marked multicast packet
The code for handling a marked UD packet unconditionally returns the
dlid in the header of the FECN marked packet.  This is not correct
for multicast packets where the DLID is in the multicast range.

The subsequent attempt to send the CNP with the multicast lid will
cause the chip to halt the ack send context because the source
lid doesn't match the chip programming.   The send context will
be halted and flush any other pending packets in the pio ring causing
the CNP to not be sent.

A part of investigating the fix, it was determined that the 16B work
broke the FECN routine badly with inconsistent use of 16 bit and 32 bits
types for lids and pkeys.  Since the port's source lid was correctly 32
bits the type mixmatches need to be dealt with at the same time as
fixing the CNP header issue.

Fix these issues by:
- Using the ports lid for as the SLID for responding to FECN marked UD
  packets
- Insure pkey is always 16 bit in this and subordinate routines
- Insure lids are 32 bits in this and subordinate routines

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 88733e3b84 ("IB/hfi1: Add 16B UD support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:44 -04:00
Colin Ian King
ffab8c89ba RDMA/qedr: fix spelling mistake: "failes" -> "fails"
Trivial fix to spelling mistake in DP_ERR error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-01 11:36:24 -04:00
YueHaibing
ecb238f6a7 IB/cxgb4: use skb_put_zero()/__skb_put_zero
Use the recently introduced helper to replace the pattern of
skb_put_zero/__skb_put() && memset().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-01 11:27:37 -04:00
Bharat Potnuri
2df19e19ae iw_cxgb4: Atomically flush per QP HW CQEs
When a CQ is shared by multiple QPs, c4iw_flush_hw_cq() needs to acquire
corresponding QP lock before moving the CQEs into its corresponding SW
queue and accessing the SQ contents for completing a WR.
Ignore CQEs if corresponding QP is already flushed.

Cc: stable@vger.kernel.org
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:38:44 -04:00
Leon Romanovsky
4f9ca2d868 RDMA/mlx4: Add missed RSS hash inner header flag
Despite being advertised to user space application, the RSS inner
header flag was filtered by checks at the beginning of QP creation
routine.

Cc: <stable@vger.kernel.org> # 4.15
Fixes: 4d02ebd9bb ("IB/mlx4: Fix RSS hash fields restrictions")
Fixes: 07d84f7b6a ("IB/mlx4: Add support to RSS hash for inner headers")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:22:23 -04:00
oulijun
ab17884903 RDMA/hns: Fix a couple misspellings
This patch fixes two spelling errors.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:21:51 -04:00
oulijun
137ae32084 RDMA/hns: Submit bad wr
When generated bad work reqeust, it needs to
report to user. This patch mainly fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:48 -04:00
oulijun
634f639022 RDMA/hns: Update assignment method for owner field of send wqe
When posting a work reqeust, it need to update the owner bit of send
wqe. This patch mainly fix the bug when posting multiply work
request.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
ae25db0028 RDMA/hns: Adjust the order of cleanup hem table
This patch update the order of cleaning hem table for trrl_table and irrl_table
as well as mtt_cqe_table and mtt_table.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
b6dd9b3483 RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
Only when the IB_QP_PATH_DEST_QPN flag of attr_mask is set
is it valid to assign the dqpn field of qp context

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
734f38638d RDMA/hns: Remove some unnecessary attr_mask judgement
This patch deletes some unnecessary attr_mask if condition
in hip08 according to the IB protocol.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
6852af8662 RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
Only when the IB_QP_PATH_MTU flag of attr_mask is set
it is valid to assign the mtu field of qp context when
qp type is not GSI and UD.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
6e1a70943c RDMA/hns: Fix the qp context state diagram
According to RoCE protocol, it is possible to
transition from error to error state for modifying
qp in hip08. This patch fix it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
328d405b3d RDMA/hns: Intercept illegal RDMA operation when use inline data
RDMA read operation is not supported inline data. If user cofigures
issue a RDMA read and use inline data, it will happen a hardware
error.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
215a8c09e5 RDMA/hns: Bugfix for init hem table
During init hem table, type should be used instead of
table->type which is finally initializaed with type.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
Luc Van Oostenryck
c192a12ce8 IB/nes: fix nes_netdev_start_xmit()'s return type
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:07:30 -04:00
Frederick Lawler
8a7d1b16bc IB/hfi1: Replace custom hfi1 macros with PCIe macros
IB/hfi1 contains custom macros for PCIe link configuration. Remove the
custom macros in favor of the PCIe link macros. No functional change
intended.

Signed-off-by: Frederick Lawler <fred@fredlawl.com>
[bhelgaas: use "GT" instead of "GB"]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-04-27 12:53:40 -05:00
Raju Rangoju
26bff1bd74 RDMA/cxgb4: release hw resources on device removal
The c4iw_rdev_close() logic was not releasing all the hw
resources (PBL and RQT memory) during the device removal
event (driver unload / system reboot). This can cause panic
in gen_pool_destroy().

The module remove function will wait for all the hw
resources to be released during the device removal event.

Fixes c12a67fe(iw_cxgb4: free EQ queue memory on last deref)
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 13:52:31 -04:00
Souptick Joarder
7991d96dd1 infiniband: hw: qib: Change return type to vm_fault_t
Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

Reference id -> 1c8f422059 ("mm: change return type to
vm_fault_t")

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 12:10:11 -04:00
Souptick Joarder
d819734126 infiniband: hw: hfi1: Change return type to vm_fault_t
Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

Reference id -> 1c8f422059 ("mm: change return type to
vm_fault_t")

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 12:10:11 -04:00
Leon Romanovsky
444261ca6f RDMA/mlx5: Properly check return value of mlx5_get_uars_page
Starting from commit 72f36be061 ("net/mlx5: Fix mlx5_get_uars_page to
return error code") the mlx5_get_uars_page() call returns error in case
of failure, but it was mistakenly overlooked in the merge commit.

Fixes: e7996a9a77 ("Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git")
Reported-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Parav Pandit
84a6a7a99c IB/mlx5: Fix represent correct netdevice in dual port RoCE
In commit bcf87f1dbb ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode")
incorrectly mapped primary device's netdevice to 2nd port netdevice.
It always represented primary port's netdevice for 2nd port netdevice
when ib representors were not used.

This results into failing to process CM request arriving on 2nd port due
to incorrect mapping of netdevice.

This fix corrects it by considering the right mdev.

Cc: <stable@vger.kernel.org> # 4.16
Fixes: bcf87f1dbb ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Danit Goldberg
4f32ac2e45 IB/mlx5: Use unlimited rate when static rate is not supported
Before the change, if the user passed a static rate value different
than zero and the FW doesn't support static rate,
it would end up configuring rate of 2.5 GBps.

Fix this by using rate 0; unlimited, in cases where FW
doesn't support static rate configuration.

Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Leon Romanovsky
002bf2282b RDMA/mlx5: Protect from shift operand overflow
Ensure that user didn't supply values too large that can cause overflow.

UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:263:23
shift exponent -2147483648 is negative
CPU: 0 PID: 292 Comm: syzkaller612609 Not tainted 4.16.0-rc1+ #131
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call
Trace:
dump_stack+0xde/0x164
ubsan_epilogue+0xe/0x81
set_rq_size+0x7c2/0xa90
create_qp_common+0xc18/0x43c0
mlx5_ib_create_qp+0x379/0x1ca0
create_qp.isra.5+0xc94/0x2260
ib_uverbs_create_qp+0x21b/0x2a0
ib_uverbs_write+0xc2c/0x1010
vfs_write+0x1b0/0x550
SyS_write+0xc7/0x1a0
do_syscall_64+0x1aa/0x740
entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x433569
RSP: 002b:00007ffc6e62f448 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433569
RDX: 0000000000000070 RSI: 00000000200042c0 RDI: 0000000000000003
RBP: 00000000006d5018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
R13: 000000000040c9f0 R14: 000000000040ca80 R15: 0000000000000006

Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Leon Romanovsky
b4bd701ac4 RDMA/mlx5: Fix multiple NULL-ptr deref errors in rereg_mr flow
Failure in rereg MR releases UMEM but leaves the MR to be destroyed
by the user. As a result the following scenario may happen:
"create MR -> rereg MR with failure -> call to rereg MR again" and
hit "NULL-ptr deref or user memory access" errors.

Ensure that rereg MR is only performed on a non-dead MR.

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.5
Fixes: 395a8e4c32 ("IB/mlx5: Refactoring register MR code")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Israel Rukshin
6082d9c9c9 net/mlx5: Fix mlx5_get_vector_affinity function
Adding the vector offset when calling to mlx5_vector2eqn() is wrong.
This is because mlx5_vector2eqn() checks if EQ index is equal to vector number
and the fact that the internal completion vectors that mlx5 allocates
don't get an EQ index.

The second problem here is that using effective_affinity_mask gives the same
CPU for different vectors.
This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
This doesn't happen when using affinity_hint mask.

Fixes: 2572cf57d7 ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
Fixes: 05e0cc84e0 ("net/mlx5: Fix get vector affinity helper function")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-04-26 12:43:19 -07:00
Jia-Ju Bai
4e56569cee infiniband: i40iw: Replace GFP_ATOMIC with GFP_KERNEL in i40iw_l2param_change
i40iw_l2param_change() is never called in atomic context.

i40iw_make_listen_node() is only set as ".l2_param_change"
in struct i40e_client_ops, and this function pointer is not called
in atomic context.

Despite never getting called from atomic context,
i40iw_l2param_change() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-17 19:57:12 -06:00
Jia-Ju Bai
f9af873014 infiniband: i40iw: Replace GFP_ATOMIC with GFP_KERNEL in i40iw_make_listen_node
i40iw_make_listen_node() is never called in atomic context.

i40iw_make_listen_node() is only called by i40iw_create_listen, which is
set as ".create_listen" in struct iw_cm_verbs.

Despite never getting called from atomic context,
i40iw_make_listen_node() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-17 19:57:11 -06:00
Jia-Ju Bai
39e487faaf infiniband: i40iw: Replace GFP_ATOMIC with GFP_KERNEL in i40iw_add_mqh_4
i40iw_add_mqh_4() is never called in atomic context, because it
calls rtnl_lock() that can sleep.

Despite never getting called from atomic context,
i40iw_add_mqh_4() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-17 19:57:11 -06:00
Randy Dunlap
b3fe6c62bc infiniband: mlx5: fix build errors when INFINIBAND_USER_ACCESS=m
Fix build errors when INFINIBAND_USER_ACCESS=m and MLX5_INFINIBAND=y.
The build error occurs when the mlx5 driver code attempts to use
USER_ACCESS interfaces, which are built as a loadable module.

Fixes these build errors:

drivers/infiniband/hw/mlx5/main.o: In function `populate_specs_root':
../drivers/infiniband/hw/mlx5/main.c:4982: undefined reference to `uverbs_default_get_objects'
../drivers/infiniband/hw/mlx5/main.c:4994: undefined reference to `uverbs_alloc_spec_tree'
drivers/infiniband/hw/mlx5/main.o: In function `depopulate_specs_root':
../drivers/infiniband/hw/mlx5/main.c:5001: undefined reference to `uverbs_free_spec_tree'

Build-tested with multiple config combinations.

Fixes: 8c84660bb4 ("IB/mlx5: Initialize the parsing tree root without the help of uverbs")
Cc: stable@vger.kernel.org # reported against 4.16
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-17 19:26:43 -06:00
Zhu Yanjun
25d0e2db3d IB/mlx5: remove duplicate header file
The header file fs_helpers.h is included twice. So it should be removed.

Fixes: 802c212568 ("IB/mlx5: Add IPsec support for egress and ingress")
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-16 16:23:02 -06:00
Linus Torvalds
19fd08b85b Merge candidates for 4.17 merge window
- Fix RDMA uapi headers to actually compile in userspace and be more
   complete
 
 - Three shared with netdev pull requests from Mellanox:
 
    * 7 patches, mostly to net with 1 IB related one at the back). This
      series addresses an IRQ performance issue (patch 1), cleanups related to
      the fix for the IRQ performance problem (patches 2-6), and then extends
      the fragmented completion queue support that already exists in the net
      side of the driver to the ib side of the driver (patch 7).
 
    * Mostly IB, with 5 patches to net that are needed to support the remaining
      10 patches to the IB subsystem. This series extends the current
      'representor' framework when the mlx5 driver is in switchdev mode from
      being a netdev only construct to being a netdev/IB dev construct. The IB
      dev is limited to raw Eth queue pairs only, but by having an IB dev of
      this type attached to the representor for a switchdev port, it enables
      DPDK to work on the switchdev device.
 
    * All net related, but needed as infrastructure for the rdma driver
 
 - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
 
 - SRP performance updates
 
 - IB uverbs write path cleanup patch series from Leon
 
 - Add RDMA_CM support to ib_srpt. This is disabled by default.  Users need to
   set the port for ib_srpt to listen on in configfs in order for it to be
   enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
 
 - TSO and Scatter FCS support in mlx4
 
 - Refactor of modify_qp routine to resolve problems seen while working on new
   code that is forthcoming
 
 - More refactoring and updates of RDMA CM for containers support from Parav
 
 - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory'
   user API features
 
 - Infrastructure updates for the new IOCTL interface, based on increased usage
 
 - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit
   kernel as was originally intended. See the commit messages for
   extensive details
 
 - Syzkaller bugs and code cleanups motivated by them
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJax5Z0AAoJEDht9xV+IJsacCwQAJBIgmLCvVp5fBu2kJcXMMVI
 y3l2YNzAUJvDDKv1r5yTC9ugBXEkDtgzi/W/C2/5es2yUG/QeT/zzQ3YPrtsnN68
 5FkiXQ35Tt7+PBHMr0cacGRmF4M3Td3MeW0X5aJaBKhqlNKwA+aF18pjGWBmpVYx
 URYCwLb5BZBKVh4+1Leebsk4i0/7jSauAqE5M+9notuAUfBCoY1/Eve3DipEIBBp
 EyrEnMDIdujYRsg4KHlxFKKJ1EFGItknLQbNL1+SEa0Oe0SnEl5Bd53Yxfz7ekNP
 oOWQe5csTcs3Yr4Ob0TC+69CzI71zKbz6qPDILTwXmsPFZJ9ipJs4S8D6F7ra8tb
 D5aT1EdRzh/vAORPC9T3DQ3VsHdvhwpUMG7knnKrVT9X/g7E+gSji1BqaQaTr/xs
 i40GepHT7lM/TWEuee/6LRpqdhuOhud7vfaRFwn2JGRX9suqTcvwhkBkPUDGV5XX
 5RkHcWOb/7KvmpG7S1gaRGK5kO208LgmAZi7REaJFoZB74FqSneMR6NHIH07ha41
 Zou7rnxV68CT2bgu27m+72EsprgmBkVDeEzXgKxVI/+PZ1oadUFpgcZ3pRLOPWVx
 rEqjHu65rlA/YPog4iXQaMfSwt/oRD3cVJS/n8EdJKXi4Qt2RDDGdyOmt74w4prM
 QuLEdvJIFmwrND1KDoqn
 =Ku8g
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Doug and I are at a conference next week so if another PR is sent I
  expect it to only be bug fixes. Parav noted yesterday that there are
  some fringe case behavior changes in his work that he would like to
  fix, and I see that Intel has a number of rc looking patches for HFI1
  they posted yesterday.

  Parav is again the biggest contributor by patch count with his ongoing
  work to enable container support in the RDMA stack, followed by Leon
  doing syzkaller inspired cleanups, though most of the actual fixing
  went to RC.

  There is one uncomfortable series here fixing the user ABI to actually
  work as intended in 32 bit mode. There are lots of notes in the commit
  messages, but the basic summary is we don't think there is an actual
  32 bit kernel user of drivers/infiniband for several good reasons.

  However we are seeing people want to use a 32 bit user space with 64
  bit kernel, which didn't completely work today. So in fixing it we
  required a 32 bit rxe user to upgrade their userspace. rxe users are
  still already quite rare and we think a 32 bit one is non-existing.

   - Fix RDMA uapi headers to actually compile in userspace and be more
     complete

   - Three shared with netdev pull requests from Mellanox:

      * 7 patches, mostly to net with 1 IB related one at the back).
        This series addresses an IRQ performance issue (patch 1),
        cleanups related to the fix for the IRQ performance problem
        (patches 2-6), and then extends the fragmented completion queue
        support that already exists in the net side of the driver to the
        ib side of the driver (patch 7).

      * Mostly IB, with 5 patches to net that are needed to support the
        remaining 10 patches to the IB subsystem. This series extends
        the current 'representor' framework when the mlx5 driver is in
        switchdev mode from being a netdev only construct to being a
        netdev/IB dev construct. The IB dev is limited to raw Eth queue
        pairs only, but by having an IB dev of this type attached to the
        representor for a switchdev port, it enables DPDK to work on the
        switchdev device.

      * All net related, but needed as infrastructure for the rdma
        driver

   - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers

   - SRP performance updates

   - IB uverbs write path cleanup patch series from Leon

   - Add RDMA_CM support to ib_srpt. This is disabled by default. Users
     need to set the port for ib_srpt to listen on in configfs in order
     for it to be enabled
     (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)

   - TSO and Scatter FCS support in mlx4

   - Refactor of modify_qp routine to resolve problems seen while
     working on new code that is forthcoming

   - More refactoring and updates of RDMA CM for containers support from
     Parav

   - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
     memory' user API features

   - Infrastructure updates for the new IOCTL interface, based on
     increased usage

   - ABI compatibility bug fixes to fully support 32 bit userspace on 64
     bit kernel as was originally intended. See the commit messages for
     extensive details

   - Syzkaller bugs and code cleanups motivated by them"

* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
  IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
  IB/mlx5: Device memory mr registration support
  net/mlx5: Mkey creation command adjustments
  IB/mlx5: Device memory support in mlx5_ib
  net/mlx5: Query device memory capabilities
  IB/uverbs: Add device memory registration ioctl support
  IB/uverbs: Add alloc/free dm uverbs ioctl support
  IB/uverbs: Add device memory capabilities reporting
  IB/uverbs: Expose device memory capabilities to user
  RDMA/qedr: Fix wmb usage in qedr
  IB/rxe: Removed GID add/del dummy routines
  RDMA/qedr: Zero stack memory before copying to user space
  IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
  IB/mlx5: Add information for querying IPsec capabilities
  IB/mlx5: Add IPsec support for egress and ingress
  {net,IB}/mlx5: Add ipsec helper
  IB/mlx5: Add modify_flow_action_esp verb
  IB/mlx5: Add implementation for create and destroy action_xfrm
  IB/uverbs: Introduce ESP steering match filter
  IB/uverbs: Add modify ESP flow_action
  ...
2018-04-06 17:35:43 -07:00
Ariel Levkovich
6c29f57ea4 IB/mlx5: Device memory mr registration support
Adding mlx5_ib driver implementation for reg_dm_mr callback
which allows registering device memory (DM) as an MR for
local and remote access.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Ariel Levkovich
cdbd0d2bae net/mlx5: Mkey creation command adjustments
This change updates the mlx5 interface to create mkey
on the device.

The updates in the command mailbox include increasing the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device
memory access type and will be used when registering MR on allocated
device memory.

All the places that use the old access mode format are adjusted as
well.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Ariel Levkovich
24da00164f IB/mlx5: Device memory support in mlx5_ib
This patch adds the mlx5_ib driver implementation for the device
memory allocation API.
It implements the ib_device callbacks for allocation and deallocation
operations as well as a new mmap command support which allows mapping
an allocated device memory to a VMA.

The change also adds reporting of device memory maximum size and
alignment parameters reported in device capabilities.

The allocation/deallocation operations are using new firmware
commands to allocate MEMIC memory on the device.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Kalderon, Michal
09c4854fde RDMA/qedr: Fix wmb usage in qedr
This patch comes as a result of Sinan Kaya's work and the decision that
writel() must be a strong enough barrier for DMA.

wmb usages in qedr driver have either been removed where they were there
only to order DMA accesses, and replaced with smp_wmb and comments for the
places that the barrier was there for SMP reasons.

Fixes: 561e5d4896 ("RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 11:10:14 -06:00
Jason Gunthorpe
57939021e8 RDMA/qedr: Zero stack memory before copying to user space
The fact this struct was not init'd like all the others was missed when
the padding reserved field was added.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 71e80a4781 ("RDMA/qedr: Fix uABI structure layouts for 32/64 compat")
Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 10:11:37 -06:00
Matan Barak
2d93fc8569 IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
When a Raw Ethernet QP is created, we actually create a few objects.
One of these objects is a TIR. Currently, a TIR could hash (and spread
the traffic) by IP or port only. Adding a hashing by IPSec SPI to TIR
creation with the required UAPI bit.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:28 -06:00
Matan Barak
c03faa562d IB/mlx5: Add information for querying IPsec capabilities
Users should be able to query for IPSec support. Adding a few
capabilities bits as part of the driver specific part in
alloc_ucontext:
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_REQ_METADATA
	Payload's header is returned with metadata representing the
	IPSec decryption state.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_RX
	Support ESP_AES_GCM in ingress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_TX
	Support ESP_AES_GCM in egress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_SPI_RSS_ONLY
	Hardware doesn't support matching SPI in flow steering rules
	but just hashing and spreading the traffic accordingly.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:28 -06:00
Aviad Yehezkel
802c212568 IB/mlx5: Add IPsec support for egress and ingress
This commit introduces support for the esp_aes_gcm flow
specification for the Innova device. To that end we add
support for egress steering and some validations that an
IPsec rule is indeed valid.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Matan Barak
349705c193 IB/mlx5: Add modify_flow_action_esp verb
Adding implementation in mlx5 driver to modify action_xfrm object. This
merely call the accel layer. Currently a user can modify only the
ESN parameters.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Aviad Yehezkel
c6475a0bca IB/mlx5: Add implementation for create and destroy action_xfrm
Adding implementation in mlx5 driver to create and destroy action_xfrm
object. This merely call the accel layer.

A user may pass MLX5_IB_XFRM_FLAGS_REQUIRE_METADATA flag which states
that [s]he expects a metadata header to be added to the payload. This
header represents information regarding the transformation's state.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Boris Pismenny
8510020d29 IB/mlx4: Check for egress flow steering
ConnectX3 doesn't support egress flow steering. Return an EOPNOTSUPP
error when such a flow is being created.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:24 -06:00
Matan Barak
8c84660bb4 IB/mlx5: Initialize the parsing tree root without the help of uverbs
In order to have a custom parsing tree, a provider driver needs to
assign its parsing tree to ib_device specs_tree field. Otherwise, the
uverbs client assigns a common default parsing tree for it.
In downstream patches, the mlx5_ib driver gains a custom parsing tree,
which contains both the common objects and a new flags field for the
UVERBS_FLOW_ACTION_ESP_CREATE command.
This patch makes mlx5_ib assign its own tree to specs_root, which
later on will be extended.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:23 -06:00
Parav Pandit
414448d249 RDMA: Use ib_gid_attr during GID modification
Now that ib_gid_attr contains device, port and index, simplify the
provider APIs add_gid() and del_gid() to use device, port and index
fields from the ib_gid_attr attributes structure.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:34:16 -06:00
Parav Pandit
3e44e0ee08 IB/providers: Avoid null netdev check for RoCE
Now that IB core GID cache ensures that all RoCE entries have an
associated netdev remove null checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Parav Pandit
14169e333e IB/providers: Avoid zero GID check for RoCE
Now that the IB core GID cache ensures that a zero GID doesn't exist in
the GID table remove zero GID checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Parav Pandit
0e1f9b9244 RDMA/providers: Simplify query_gid callback of RoCE providers
ib_query_gid() fetches the GID from the software cache maintained in
ib_core for RoCE ports.

Therefore, simplify the provider drivers for RoCE to treat query_gid()
callback as never called for RoCE, and only require non-RoCE devices to
implement it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:47 -06:00
Parav Pandit
ca486a3b33 IB/qedr: Remove GID add/del dummy routines
qedr driver's add_gid() and del_gid() callbacks are doing simple
checks which are already done by the ib core before invoking these
callback routines.

Therefore, code is simplified to skip implementing add_gid() and
del_gid() callback functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:46:29 -06:00
Shiraz Saleem
3e64f8d6f5 i40iw: Remove pre-production workaround for resource profile 1
Support for resource profile 1 is currenlty deprecated due to
a pre-production errata. Remove this workaround as its no longer
needed.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:40:39 -06:00
Jason Gunthorpe
41d902cb7c RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp
This structure is pushed down the ex and the non-ex path, so it needs to be
aligned to 8 bytes to go through ex without implicit padding.

Old user space will provide 4 bytes of resp on !ex and 8 bytes on ex, so
take the approach of just copying the minimum length.

New user space will consistently provide 8 bytes in both cases.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:38:40 -06:00
Gustavo A. R. Silva
689a8e3193 IB/ocrdma_hw: Remove redundant checks and goto labels
Check on return values and goto label mbx_err are unnecessary.

Addresses-Coverity-ID: 1271151 ("Identical code for different branches")
Addresses-Coverity-ID: 1268788 ("Identical code for different branches")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 10:43:35 -06:00