Commit Graph

7552 Commits

Author SHA1 Message Date
Kamenee Arumugam
0e31a2e195 IB/hfi1: Remove wrapper function in mmu_rb
Wrapper functions were used to call the same function
mmu_notifier_mem_invalidate for 2 callbacks in
mmu_notifier.
The commit 7def96f0a9 ("IB/hfi1: update to new mmu_notifier semantic")
removed the invalidate_page callback.
Therefore, the wrapper function is no longer needed.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:53:56 -05:00
Jakub Byczkowski
22a3ffa780 IB/hfi1: Reduce 8051 command timeout
Timeout of 20 seconds is too long for active wait performed
for 8051 command completion. It was required for scenarios
when transition to polling was requested before offline.quiet
state was reached. Currently wait for offline.quiet is
properly implemented and timeout can be reduced to 1 second.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Duane McCrory <duane.mccrory@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:53:56 -05:00
Jan Sokolowski
641f348bbd IB/hfi1: Allow MgmtAllowed on B2B setups
HFI's are hard-wired to send Device Info frames with
MgmtAllowed bit set to 0. This means in B2B setups,
MgmtAllowed would never be allowed, which prevents
remote opa management tools from working properly.

Assume MgmtAllowed if a neighbor is also an HFI.

Fixes: 98b9ee2002 ("IB/hfi1: Cache neighbor secure data after link up")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:53:56 -05:00
Bart Van Assche
57b0c46054 IB/srpt: Ensure that modifying the use_srq configfs attribute works
The use_srq configfs attribute is created after it is read. Hence
modify srpt_tpg_attrib_use_srq_store() such that this function
switches dynamically between non-SRQ and SRQ mode.

Reported-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Bart Van Assche
8b6dc529ea IB/srpt: Wait until channel release has finished during module unload
Introduce the helper function srpt_set_enabled(). Protect
sport->enabled changes with sdev->mutex. Makes configfs writes
into 'enabled' wait until all channel resources have been freed.
Wait until channel release has finished during kernel module
unload.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Bart Van Assche
01b3ee13c2 IB/srpt: Introduce srpt_disconnect_ch_sync()
Except for changing a BUG_ON() call into a WARN_ON_ONCE() call, this
patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Bart Van Assche
c76d7d64f8 IB/srpt: Introduce helper functions for SRQ allocation and freeing
The only functional change in this patch is in the srpt_add_one()
error path: if allocating the ring buffer for the SRQ fails, fall
back to non-SRQ mode instead of disabling SRP target functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Mike Marciniszyn
321e329b9c IB/srpt: Post receive work requests after qp transition to INIT state
IB and iWARP specs both spell out that posting a receive work request
to a queue pair in the RESET state is an invalid operation and required
to fail.  Postpone posting receive work requests until after the
transition to the INIT state.

Fixes: commit dea262094c ("IB/srpt: Change default behavior from using SRQ to using RC")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:25:16 -05:00
Steve Wise
ba97b74997 iw_cxgb4: remove BUG_ON() usage.
iw_cxgb4 has many BUG_ON()s that were left over from various enhancemnets
made over the years.  Almost all of them should just be removed.  Some,
however indicate a ULP usage error and can be handled w/o bringing down
the system.

If the condition cannot happen with correctly implemented cxgb4 sw/fw,
then remove the BUG_ON.

If the condition indicates a misbehaving ULP (like CQ overflows), add
proper recovery logic.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:01:25 -05:00
Sriharsha Basavapatna
063fb5bd1a bnxt_re: changing the ip address shouldn't affect new connections
While adding a new gid, the driver currently does not return the context
back to the stack. A subsequent del_gid() (e.g, when ip address is changed)
doesn't find the right context in the driver and it ends up dropping that
request. This results in the HW caching a stale gid entry and traffic fails
because of that. Fix by returning the proper context in bnxt_re_add_gid().

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:01:25 -05:00
Sriharsha Basavapatna
d6d5c59905 bnxt_re: fix a crash in qp error event processing
In bnxt_qplib_process_qp_event(), for qp error events we look up the
qp-handle and pass it for further processing. But we don't check if the
handle is NULL. This could lead to a crash in the called functions when
that qp-handle is dereferenced, if the qp is destroyed in the meantime.
Fix this by checking for a valid qp-handle in that function.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:01:25 -05:00
Parav Pandit
2e4c85c6ed IB/core: Avoid unnecessary return value check
Since there is nothing done with non zero return value, such check is
avoided.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 14:42:04 -05:00
Mark Bloch
5f22a1d87c IB/mlx4: Increase maximal message size under UD QP
Maximal message should be used as a limit to the max message payload allowed,
without the headers. The ConnectX-3 check is done against this value includes
the headers. When the payload is 4K this will cause the NIC to drop packets.

Increase maximal message to 8K as workaround, this shouldn't change current
behaviour because we continue to set the MTU to 4k.

To reproduce;
set MTU to 4296 on the corresponding interface, for example:
ifconfig eth0 mtu 4296 (both server and client)

On server:
ib_send_bw -c UD -d mlx4_0 -s 4096 -n 1000000 -i1 -m 4096

On client:
ib_send_bw -d mlx4_0 -c UD <server_ip> -s 4096 -n 1000000 -i 1 -m 4096

Fixes: 6e0d733d92 ("IB/mlx4: Allow 4K messages for UD QPs")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 14:42:04 -05:00
Guy Levi
ed8637d361 IB/mlx4: Add contig support for control objects
Taking advantage of the optimization which was introduced in previous
commit ("IB/mlx4: Use optimal numbers of MTT entries") to optimize the
MTT usage for QP and CQ.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 14:42:04 -05:00
Guy Levi
9901abf583 IB/mlx4: Use optimal numbers of MTT entries
Optimize the device performance by assigning multiple physical pages,
which are contiguous, to a single MTT. As a result, the number of MTTs
is reduced and in turn save cache misses of MTTs.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 14:42:04 -05:00
Majd Dibbiny
2b621851ac IB/mlx5: Fix RoCE Address Path fields
When working over a RoCE network, the UDP source port should be set only
for statically connected QPs (RC, UC and XRC).

Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 13:53:22 -05:00
Majd Dibbiny
31fde034a8 IB/mlx5: Assign send CQ and recv CQ of UMR QP
The UMR's QP is created by calling mlx5_ib_create_qp directly, and
therefore the send CQ and the recv CQ on the ibqp weren't assigned.

Assign them right after calling the mlx5_ib_create_qp to assure
that any access to those pointers will work as expected and won't
crash the system as might happen as part of reset flow.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 13:53:22 -05:00
Leon Romanovsky
9950acf945 RDMA/cxgb4: Protect from possible dereference
Smatch tool reports the following error:
  drivers/infiniband/hw/cxgb4/qp.c:1886
	c4iw_create_qp() error: we previously assumed 'ucontext'
	could be null (see line 1804)

Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 13:53:22 -05:00
Leon Romanovsky
e32d2d7144 RDMA/bnxt_re: Remove unused vlan_tag variable
The Broadcom driver produces the following compilation warning

drivers/infiniband/hw/bnxt_re/ib_verbs.c:
	In function ‘bnxt_re_create_ah’:
drivers/infiniband/hw/bnxt_re/ib_verbs.c:668:6:
	warning: variable ‘vlan_tag’ set but not used [-Wunused-but-set-variable]
	u16 vlan_tag;

Let's remove it till vlan_tag will be implemented properly.

Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 13:53:22 -05:00
Noa Osherovich
b1383aa641 IB/mlx5: Add PCI write end padding support
Add the PCI write end padding flag to device_cap_flags enum and set it
during mlx5_ib_query_device so it will be reported to user-space.

During WQ/QP creation, set that capability for WQ/QP if user requested
it and HW supports it.

PCI write end padding modification is not supported for now. There's no
such flag for a QP but for a WQ, create and modify use the same flag.
Return an error if PCI write end padding flag is set during modify_wq.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:50:27 -05:00
Noa Osherovich
e1d2e88733 IB/core: Add PCI write end padding flags for WQ and QP
There are root complexes that are able to optimize their
performance when incoming data is multiple full cache lines.

PCI write end padding is the device's ability to pad the ending of
incoming packets (scatter) to full cache line such that the last
upstream write generated by an incoming packet will be a full cache
line.

Add a relevant entry to ib_device_cap_flags to report such capability
of an RDMA device.

Add the QP and WQ create flags:
 * A QP/WQ created with a scatter end padding flag will cause
   HW to pad the last upstream write generated by a packet to cache line.

User should consider several factors before activating this feature:
- In case of high CPU memory load (which may cause PCI back pressure in
  turn), if a large percent of the writes are partial cache line, this
  feature should be checked as an optional solution.
- This feature might reduce performance if most packets are between one
  and two cache lines and PCIe throughput has reached its maximum
  capacity. E.g. 65B packet from the network port will lead to 128B
  write on PCIe, which may cause traffic on PCIe to reach high
  throughput.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:50:27 -05:00
Thomas Bogendoerfer
3192c53e5a IB/rxe: don't crash, if allocation of crc algorithm failed
Following crash happens, if crc algorithm couldn't be allocated:

[ 1087.989072] rdma_rxe: loaded
[ 1097.855397] PCLMULQDQ-NI instructions are not detected.
[ 1097.901220] rdma_rxe: failed to allocate crc algorithmi err:-2
[ 1097.901248] BUG: unable to handle kernel
[ 1097.901249] NULL pointer dereference
[ 1097.901250]  at 0000000000000046
[...]

Reason is that rxe->tfm is assigned the error return, which will then
be used for crypto_free_shash() in rxe_cleanup. Fix by using a
temporary variable and assigning it rxe->tfm after allocation succeeded.

Fixes: cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:43:50 -05:00
Parav Pandit
89548bcafe IB/core: Avoid crash on pkey enforcement failed in received MADs
Below kernel crash is observed when Pkey security enforcement fails on
received MADs. This issue is reported in [1].

ib_free_recv_mad() accesses the rmpp_list, whose initialization is
needed before accessing it.
When security enformcent fails on received MADs, MAD processing avoided
due to security checks failed.

OpenSM[3770]: SM port is down
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: PGD 0
kernel: P4D 0
kernel:
kernel: Oops: 0002 [#1] SMP
kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P          IO    4.13.4-1-pve #1
kernel: Hardware name: Dell       XS23-TY3        /9CMP63, BIOS 1.71 09/17/2013
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
kernel: FS:  0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
kernel: Call Trace:
kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
kernel:  process_one_work+0x1e9/0x410
kernel:  worker_thread+0x4b/0x410
kernel:  kthread+0x109/0x140
kernel:  ? process_one_work+0x410/0x410
kernel:  ? kthread_create_on_node+0x70/0x70
kernel:  ? SyS_exit_group+0x14/0x20
kernel:  ret_from_fork+0x25/0x30
kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
kernel: CR2: 0000000000000008

[1] : https://www.spinics.net/lists/linux-rdma/msg56190.html

Fixes: 47a2b338fe ("IB/core: Enforce security on management datagrams")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reported-by: Chris Blake <chrisrblake93@gmail.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:26:00 -05:00
Leon Romanovsky
7d7d065a5e RDMA/cxgb4: Annotate r2 and stag as __be32
Chelsio cxgb4 HW is big-endian, hence there is need to properly
annotate r2 and stag fields as __be32 and not __u32 to fix the
following sparse warnings.

  drivers/infiniband/hw/cxgb4/qp.c:614:16:
    warning: incorrect type in assignment (different base types)
      expected unsigned int [unsigned] [usertype] r2
      got restricted __be32 [usertype] <noident>
  drivers/infiniband/hw/cxgb4/qp.c:615:18:
    warning: incorrect type in assignment (different base types)
      expected unsigned int [unsigned] [usertype] stag
      got restricted __be32 [usertype] <noident>

Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:04:09 -05:00
Guy Levi
108809a057 IB/mlx4: Fix RSS's QPC attributes assignments
In the modify QP handler the base_qpn_udp field in the RSS QPC is
overwrite later by irrelevant value assignment. Hence, ingress packets
which gets to the RSS QP will be steered then to a garbage QPN.

The patch fixes this by skipping the above assignment when a RSS QP is
modified, also, the RSS context's attributes assignments are relocated
just before the context is posted to avoid future issues like this.

Additionally, this patch takes the opportunity to change the code to be
disciplined to the device's manual and assigns the RSS QP context just at
RESET to INIT transition.

Fixes:3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:04:09 -05:00
Guy Levi
09d208b258 IB/mlx4: Add report for RSS capabilities by vendor channel
The mlx4's RSS patches submission missed a report of RSS capabilities
which should be reported by the vendor channel in query_device.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:04:09 -05:00
Leon Romanovsky
fec99ededf RDMA/umem: Avoid partial declaration of non-static function
The RDMA/umem uses generic RB-trees macros to generate various ib_umem
access functions. The generation is performed with INTERVAL_TREE_DEFINE
macro, which allows one of two modes: declare all functions as static or
declare none of the function to be static.

The second mode of operation produces the following sparse errors:
 drivers/infiniband/core/umem_rbtree.c:69:1:
	warning: symbol 'rbt_ib_umem_iter_first' was not declared.
	Should it be static?
 drivers/infiniband/core/umem_rbtree.c:69:1:
	warning: symbol 'rbt_ib_umem_iter_next' was not declared.
	Should it be static?

Code relocation together with declaration of such functions to be
"static" solves the issue.

Because there is no need to have separate file for two functions,
let's consolidate umem_rtree.c and umem_odp.c into one file.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:02:12 -05:00
oulijun
26beb85f41 RDMA/hns: Modify the usage of cmd_sn in hip08
The cmd_sn field of CQ doorbell inits for 0. It should be
increment on each first db rung after a completion Event.
if the cmd_sn of notify doorbell Adjacent two times is the
same, the hardware will distinguish it for the same notify
request and update its type according to the priority level
of next event and solicited event.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:32:43 -05:00
oulijun
0203b14c4f RDMA/hns: Unify the calculation for hem index in hip08
The calculation of hem index are different between hns_roce_table_get
and hns_roce_table_find. When the table chunk size of TRRL is not
divisible by object size, it will faile to find the trrl table.

This patch is to update the calculation of the hem index in the
hns_roce_table_find to the same as which in the hns_roce_table_get.

Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:32:18 -05:00
oulijun
e8d1853357 RDMA/hns: Set the owner field of SQWQE in hip08 RoCE
the owner need to be set when posting sqwqe in hip08 RoCE.
The owner be used according to the below algorithm:
The value of owner should be 1 in the first lap, it
should be 0 in the second lap and in turn.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:32:02 -05:00
oulijun
b5fddb7ce7 RDMA/hns: Add sq_invld_flg field in QP context
In hip08 RoCE, it need to add the sq_invld_flg field
in QP context for RoCE hardware.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:31:52 -05:00
oulijun
2872646134 RDMA/hns: Update the usage of ack timeout in hip08
The ack timeout's value in qp context shall be a 5-bit value
and be assgined by users. When at of qpc is set zero, the
timer is disabled.

When attr_mask set for IB_QP_TIMEOUT, The ack timeout field
is effective.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:31:34 -05:00
oulijun
befb63b43d RDMA/hns: Set sq_cur_sge_blk_addr field in QPC in hip08
If the extend sges exist, the sq_cur_sge_blk_addr field in QPC
(qp context) should be configured.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:31:25 -05:00
oulijun
a49d761fc1 RDMA/hns: Enable the cqe field of sqwqe of RC
When sig_type of qpc is non-selectable, all sq's wqes will produce
cqe and not depend on the cqe attribute of wqe. When sig_type of
qpc is selectable, The cqe attribute of wqe will decide whether to
produce the cqe.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:30:45 -05:00
oulijun
492b2bd026 RDMA/hns: Set se attribute of sqwqe in hip08
When send flags is IB_SEND_SOLICITED, the se(solicated event)
field of sqwqe will be set.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:30:35 -05:00
oulijun
651487c229 RDMA/hns: Configure fence attribute in hip08 RoCE
When post wr for mixed rdma operation, we need to use fence
mechanism to keep the correct execute order.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:30:10 -05:00
oulijun
e92f2c182b RDMA/hns: Configure TRRL field in hip08 RoCE device
The TRRL(Target RDMA Read/aTOMIC List) record the information
of receiving RDMA READ or ATOMIC operation in hip08. It will
be used the hardware. The driver need to assign a continuous
physical address for trrl_ba field of qp context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:29:47 -05:00
oulijun
d551424617 RDMA/hns: Update calculation of irrl_ba field for hip08
The irrl(initiator RDMA Read/Atomic list) base address of qp
context is assigned for addr[63:6]. This patch mainly fixed
it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:15:11 -05:00
Wei Hu(Xavier)
b5ff0f610b RDMA/hns: Configure sgid type for hip08 RoCE
The hardware vendors need to generate RoCEv1 or RoCEv2
packet according to the sgid type configured.

Besides, update the gid table size for hip08 RoCE
device.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:14:27 -05:00
Wei Hu(Xavier)
023c1477b0 RDMA/hns: Generate gid type of RoCEv2
HNS_ROCE_CAP_FALG_ROCE_V1_V2 is added for selecting capability of
RoCE in hns driver. When HNS_ROCE_CAP_FALG_ROCE_V1_V2 is set,
driver will inform ib core that the related hns device can support
RoCEv2, and ib core can generate the gid of the related type.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:14:27 -05:00
Wei Hu(Xavier)
a2c80b7b41 RDMA/hns: Add rereg mr support for hip08
This patch adds rereg mr support for hip08.

Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 12:14:27 -05:00
Doug Ledford
5c08681b48 Merge branch 'k.o/for-rc' into k.o/for-next
Pick up the missing netlink oops fix

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-01 15:25:27 -04:00
Mike Marciniszyn
31acd18b61 IB/hfi1: Take advantage of kvzalloc_node in sdma initialization
The code that allocates the tx ring in the sdma code fails to take
advantage of kvzalloc variations.

Fix by converting to use kvzalloc_node.

Reported-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Kamenee Arumugam
45a041cce7 IB/hfi1: Don't modify num_user_contexts module parameter
The driver parameter num_user_contexts controls global behavior and
should not be modified by the driver.
This patch eliminates modification of num_user_contexts by using a
local variable to keep track of the value.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Mike Marciniszyn
2d9544aacf IB/hfi1: Insure int mask for in-kernel receive contexts is clear
The only use for the urg interrupt is for priority PSM packets.

There is no reason for this interrupt to be enabled for kernel
contexts.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Mike Marciniszyn
1b311f8931 IB/hfi1: Add tx_opcode_stats like the opcode_stats
This patch adds tx_opcode_stats to parallel the
(rx)opcode_stats in the debugfs.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Sebastian Sanchez
406310c66d IB/hfi1: Validate PKEY for incoming GSI MAD packets
These are the use-cases where the pkey needs to be tested to see
if a packet needs to be dropped.

a) Check if pkey is not FULL_MGMT_P_KEY or LIM_MGMT_P_KEY,
   drop the packet as it's not part of the management partition.
   Self-originated packets are an exception.

b) If pkey index points to FULL_MGMT_P_KEY and LIM_MGMT_P_KEY is
   in the table, the packet is coming from a management node,
   and the receiving node is also a management node, so it is safe
   for the packet to go through.

c) If pkey index points to FULL_MGMT_P_KEY and LIM_MGMT_P_KEY is
   NOT in the table, drop the packet as LIM_MGMT_P_KEY should
   always be in the pkey table. It could be a misconfiguration.

d) If pkey index points to LIM_MGMT_P_KEY and FULL_MGMT_P_KEY is
   NOT in the table, it is safe for the packet to go through
   since a non-management node is talking to another non-managment
   node.

e) If pkey index points to LIM_MGMT_P_KEY and FULL_MGMT_P_KEY is in
   the table, drop the packet because a non-management node is
   talking to a management node, and it could be an attack.

For the implementation, these rules can be simplied to only checking
for (a) and (e). There's no need to check for rule (b) as
the packet doesn't need to be dropped. Rule (c) is not possible in
the driver as LIM_MGMT_P_KEY is always in the pkey table.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Patel Jay P
00f9203119 Ib/hfi1: Return actual operational VLs in port info query
__subn_get_opa_portinfo stores value returned by hfi1_get_ib_cfg() as
operational vls. hfi1_get_ib_cfg() returns vls_operational field in
hfi1_pportdata. The problem with this is that the value is always equal
to vls_supported field in hfi1_pportdata.

The logic to calculate operational_vls is to set value passed by FM
(in  __subn_set_opa_portinfo routine). If no value is passed then
default value is stored in operational_vls.

Field actual_vls_operational is calculated on the basis of buffer
control table. Hence, modifying hfi1_get_ib_cfg() to return
actual_operational_vls when used with HFI1_IB_CFG_OP_VLS parameter

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Patel Jay P <jay.p.patel@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Michael J. Ruhl
4061f3a4da IB/hfi1: Race condition between user notification and driver state
The handler for link init state (HLS_UP_INIT) notifies userspace
(update_statusp()) before enabling the device
(RCV_CTRL_RCV_PORT_ENABLE_SMASK) or setting the device state
(ppd->host_link_state).  This causes a race condition where the
userspace thinks the interface is in the INIT state before the driver
has set that state.

Rework the code path to eliminate the race.

Delay setting the init state until after a HW settling period.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Somnath Kotur
5455e73a76 bnxt_re: Implement the shutdown hook of the L2-RoCE driver interface
When host is shutting down, it invokes the shutdown hook of the
L2 driver where it would attempt to free the MSI-X vectors, but would fail
because some vectors are held by the RoCE driver.
Implement the new hook in the L2 -> RoCE interface which will be invoked so that
the RoCE driver can unregister the device and free up the MSI-X vectors it had
claimed so that L2 can proceed with it's shutdown without failure.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 15:25:08 -04:00