Commit Graph

713 Commits

Author SHA1 Message Date
Eran Ben Elisha
da7525d2a9 IB/mlx5: Advertise atomic capabilities in query device
In order to ensure IB spec atomic correctness in atomic operations, if
HW is configured to host endianness, advertise IB_ATOMIC_HCA.  if not,
advertise IB_ATOMIC_NONE.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-24 00:17:31 -05:00
Leon Romanovsky
051f263098 IB/mlx5: Add driver cross-channel support
Add support of cross-channel functionality to mlx5
driver. This includes ability to ignore overrun for CQ
which intended for cross-channel, export device capability and
configure the QP to be sync master/slave queues.

The cross-channel enabled QP supports combination of
three possible properties:
* WQE processing on the receive queue of this QP
* WQE processing on the send queue of this QP
* WQE are supported on the send queue

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:33:14 -05:00
Matan Barak
d69e3bcf79 IB/mlx5: Mmap the HCA's core clock register to user-space
In order to read the HCA's current cycles register, we need
to map it to user-space. Add support to map this register
via mmap command.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:25:59 -05:00
Matan Barak
b368d7cb8c IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext
Pass hca_core_clock_offset to user-space is mandatory in order to
let the user-space read the free-running clock register from the
right offset in the memory mapped page.
Passing this value is done by changing the vendor's command
and response of init_ucontext to be in extensible form.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:25:59 -05:00
Matan Barak
7c60bcbb68 IB/mlx5: Add support for hca_core_clock and timestamp_mask
Reporting the hca_core_clock (in kHZ) and the timestamp_mask in
query_device extended verb. timestamp_mask is used by users in order
to know what is the valid range of the raw timestamps, while
hca_core_clock reports the clock frequency that is used for
timestamps.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:25:59 -05:00
Matan Barak
972ecb8213 IB/mlx5: Add create_cq extended command
In order to create a CQ that supports timestamp, mlx5 needs to
support the extended create CQ command with the timestamp flag.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 23:23:02 -05:00
Christoph Hellwig
feb7c1e38b IB: remove in-kernel support for memory windows
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 14:29:04 -05:00
Achiad Shochat
e53505a802 IB/mlx5: Support RoCE
Advertise RoCE support for IB/core layer and set the hardware to
work in RoCE mode.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat
2811ba51b0 IB/mlx5: Add RoCE fields to Address Vector
Set the address handle and QP address path fields according to the
link layer type (IB/Eth).

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat
3cca26069a IB/mlx5: Support IB device's callbacks for adding/deleting GIDs
These callbacks write into the mlx5 RoCE address table.
Upon del_gid we write a zero'd GID.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat
cb34be6da2 IB/mlx5: Set network_hdr_type upon RoCE responder completion
When handling a responder completion, if the link layer is Ethernet,
set the work completion network_hdr_type field according to CQE's
info and the IB_WC_WITH_NETWORK_HDR_TYPE flag.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat
3f89a643eb IB/mlx5: Extend query_device/port to support RoCE
Using the vport access functions to retrieve the Ethernet
specific information and return this information in
ib_query_device and ib_query_port.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:37 -05:00
Achiad Shochat
fc24fc5e95 IB/mlx5: Support IB device's callback for getting its netdev
For Eth ports only:
Maintain a net device pointer in mlx5_ib_device and update it
upon NETDEV_REGISTER and NETDEV_UNREGISTER events if the
net-device and IB device have the same PCI parent device.
Implement the get_netdev callback to return this net device.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:36 -05:00
Achiad Shochat
ebd61f68e1 IB/mlx5: Support IB device's callback for getting the link layer
Make the existing mlx5_ib_port_link_layer() signature match
the ib device callback signature (add port_num parameter).
Refactor it to use a sub function so that the link layer could
be queried also before the ibdev is created.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23 12:07:36 -05:00
Leon Romanovsky
ab5cdc3163 IB/mlx5: Postpone remove_keys under knowledge of coming preemption
The remove_keys() logic is performed as garbage collection task. Such
task is intended to be run when no other active processes are running.

The need_resched() will return TRUE if there are user tasks to be
activated in near future.

In such case, we don't execute remove_keys() and postpone
the garbage collection work to try to run in next cycle,
in order to free CPU resources to other tasks.

The possible pseudo-code to trigger such scenario:
1. Allocate a lot of MR to fill the cache above the limit.
2. Wait a small amount of time "to calm" the system.
3. Start CPU extensive operations on multi-node cluster.
4. Expect performance degradation during MR cache shrink operation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08 16:55:31 -05:00
Linus Torvalds
ab9f2faf8f Initial 4.4 merge window submission
- "Checksum offload support in user space" enablement
 - Misc cxgb4 fixes, add T6 support
 - Misc usnic fixes
 - 32 bit build warning fixes
 - Misc ocrdma fixes
 - Multicast loopback prevention extension
 - Extend the GID cache to store and return attributes of GIDs
 - Misc iSER updates
 - iSER clustering update
 - Network NameSpace support for rdma CM
 - Work Request cleanup series
 - New Memory Registration API
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWPO5UAAoJELgmozMOVy/dSCQP/iX2ImMZOS3VkOYKhLR3dSv8
 4vTEiYIoAT1JEXiPpiabuuACwotcZcMRk9kZ0dcWmBoFusTzKJmoDOkgAYd95XqY
 EsAyjqtzUGNNMjH5u5W+kdbaFdH9Ktq7IJvspRlJuvzC47Srax+qBxX01jrAkDgh
 4PoA3hEa2KkvkDjY2Mhvk9EWd/uflO9Ky6o0D8jUQkWtEvKBRyDjQLk30oW6wHX9
 pTWqww3dD0EXTrR+PDA88v2saKH1kZFU1Nt2eU8Bw+zlJM8hcX6U7PfRX0g3HT/J
 o+7ejTdLPWFDH35gJOU+KE519f1JbwfRjPJCqbOC9IttBB7iHSbhcpQLpWv4JV1x
 agdBeDA3TGQj3dHb2SkYMlWXCBp7q8UCbVGvvirTFzGSGU73sc6hhP+vCKvPQIlE
 Ah5tUqD7Y3mOBjvuDeIzKMLXILd5d3cH+m7Laytrf5e7fJPmBRZyOkcMh0QVElyl
 mKo+PFjghgeTFb405J7SDDw/vThVyN9HyIt7AGEzObaajzOOk9R1hkQr46XVy9TK
 yi58fl85yQ2n6TWV6NRnvkQoMy/N2HAEuXk/7HtO0PabV5w3Lo0zvXB9SnVrrVEm
 58FWRBYCWorVSdSacuDnPm0iz45WSRIb9G9sBlhEC93eXRq2rSBoy4RvyLeliHFH
 hllyhNNolI6FJ64j07Xm
 =bBIY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is my initial round of 4.4 merge window patches.  There are a few
  other things I wish to get in for 4.4 that aren't in this pull, as
  this represents what has gone through merge/build/run testing and not
  what is the last few items for which testing is not yet complete.

   - "Checksum offload support in user space" enablement
   - Misc cxgb4 fixes, add T6 support
   - Misc usnic fixes
   - 32 bit build warning fixes
   - Misc ocrdma fixes
   - Multicast loopback prevention extension
   - Extend the GID cache to store and return attributes of GIDs
   - Misc iSER updates
   - iSER clustering update
   - Network NameSpace support for rdma CM
   - Work Request cleanup series
   - New Memory Registration API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
  IB/core, cma: Make __attribute_const__ declarations sparse-friendly
  IB/core: Remove old fast registration API
  IB/ipath: Remove fast registration from the code
  IB/hfi1: Remove fast registration from the code
  RDMA/nes: Remove old FRWR API
  IB/qib: Remove old FRWR API
  iw_cxgb4: Remove old FRWR API
  RDMA/cxgb3: Remove old FRWR API
  RDMA/ocrdma: Remove old FRWR API
  IB/mlx4: Remove old FRWR API support
  IB/mlx5: Remove old FRWR API support
  IB/srp: Dont allocate a page vector when using fast_reg
  IB/srp: Remove srp_finish_mapping
  IB/srp: Convert to new registration API
  IB/srp: Split srp_map_sg
  RDS/IW: Convert to new memory registration API
  svcrdma: Port to new memory registration API
  xprtrdma: Port to new memory registration API
  iser-target: Port to new memory registration API
  IB/iser: Port to new fast registration API
  ...
2015-11-07 13:33:07 -08:00
Sagi Grimberg
dd01e66a6c IB/mlx5: Remove old FRWR API support
No ULP uses it anymore, go ahead and remove it.
Keep only the local invalidate part of the handlers.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg
8a187ee52b IB/mlx5: Support the new memory registration API
Support the new memory registration API by allocating a
private page list array in mlx5_ib_mr and populate it when
mlx5_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by setting the exact WQE as IB_WR_FAST_REG_MR, just take the
needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx5_ib_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:17 -04:00
Sagi Grimberg
a706000916 IB/mlx5: Remove dead fmr code
Just function declarations - no need for those
laying arround. If for some reason someone will want
FMR support in mlx5, it should be easy enough to restore
a few structs.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:17 -04:00
Christoph Hellwig
adec640e03 mlx5: stop including <asm-generic/kmap_types.h>
<linux/highmem.h> is the placace the get the kmap type flags, asm-generic
files are generic implementations only to be used by architecture code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-10-15 00:21:10 +02:00
Christoph Hellwig
25556ae6b9 IB: remove xrc_remote_srq_num from struct ib_send_wr
The field is only initialized in mlx, but never used.

If we want to add proper XRC support it should be done with a new
struct ib_xrc_wr.

This shrinks the various WR structures by another 4 bytes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Haggai Eran <haggaie@mellanox.com>
2015-10-08 11:09:11 +01:00
Christoph Hellwig
e622f2f4ad IB: split struct ib_send_wr
This patch split up struct ib_send_wr so that all non-trivial verbs
use their own structure which embedds struct ib_send_wr.  This dramaticly
shrinks the size of a WR for most common operations:

sizeof(struct ib_send_wr) (old):	96

sizeof(struct ib_send_wr):		48
sizeof(struct ib_rdma_wr):		64
sizeof(struct ib_atomic_wr):		96
sizeof(struct ib_ud_wr):		88
sizeof(struct ib_fast_reg_wr):		88
sizeof(struct ib_bind_mw_wr):		96
sizeof(struct ib_sig_handover_wr):	80

And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:

sizeof(struct ib_fastreg_wr):		64

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
2015-10-08 11:09:10 +01:00
Sagi Grimberg
81fb5e26a9 IB/mlx5: Remove pa_lkey usages
Since mlx5 driver cannot rely on registration using the
reserved lkey (global_dma_lkey) it used to allocate a private
physical address lkey for each allocated pd.
Commit 96249d70dd ("IB/core: Guarantee that a local_dma_lkey
is available") just does it in the core layer so we can go ahead
and use that.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 10:46:51 -04:00
Sagi Grimberg
c6790aa9f4 IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY
Commit 96249d70dd ("IB/core: Guarantee that a local_dma_lkey
is available") allows ULPs that make use of the local dma key to keep
working as before by allocating a DMA MR with local permissions and
converted these consumers to use the MR associated with the PD
rather then device->local_dma_lkey.

ConnectIB has some known issues with memory registration
using the local_dma_lkey (SEND, RDMA, RECV seems to work ok).

Thus don't expose support for it (remove device->local_dma_lkey
setting), and take advantage of the above commit such that no regression
is introduced to working systems.

The local_dma_lkey support will be restored in CX4 depending on FW
capability query.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 10:46:51 -04:00
Sagi Grimberg
b636401f0e mlx5: Fix incorrect wc pkey_index assignment for GSI messages
Since patch series "Demux IB CM requests in the rdma_cm module" the
P_Key index is taken from the work completion rather than the message
itself.

The HCA provides us with the message P_Key. In order to provide the
P_Key index, we need to look it up. Given that this is relevant only
for GSI messages (session establishments) which is less performance critical,
micro-optimize against the GSI (is_qp1) branch.

Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 14:50:06 -04:00
Haggai Eran
11d748045c IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
The mlx5_ib_reg_user_mr() function will attempt to call clean_mr() in
its error flow even though there is never a case where the error flow
occurs with a valid MR pointer to destroy.

Remove the clean_mr() call and the incorrect comment above it.

Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding
support for MMU notifiers")
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 14:42:54 -04:00
Jason Gunthorpe
b37c788f59 IB/mlx5: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:34 -04:00
Sagi Grimberg
b3778ba8de mlx5: Drop mlx5_ib_alloc_fast_reg_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:46 -04:00
Sagi Grimberg
9bee178b4f IB: Modify ib_create_mr API
Use ib_alloc_mr with specific parameters.
Change the existing callers.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:44 -04:00
Sagi Grimberg
8b91ffc1cf IB/core: Get rid of redundant verb ib_destroy_mr
This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.

And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).

And, fixup the only callers (iser/isert) accordingly.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:44 -04:00
Sagi Grimberg
18ebd40773 mlx4, mlx5, mthca: Expose max_sge_rd correctly
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.

Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 23:02:10 -04:00
Roland Dreier
d6c7276be1 IB/mlx5: Remove dead code from alloc_cached_mr()
The only place that assigns mr inside the loop already does a break.
So "if (mr)" will never be true here since the function initializes mr
to NULL at the top.  We can just drop the extra if and break here.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Eli Cohen  <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:47 -04:00
Sagi Grimberg
e0238a6a36 mlx5: Expose correct page_size_cap in device attributes
Should be all the page sizes that are supported by the
device.

Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:46 -04:00
Sagi Grimberg
a3c874200c mlx5: Fix missing device local_dma_lkey
The mlx5 driver exposes device capability IB_DEVICE_LOCAL_DMA_LKEY
but does not set the the device local_dma_lkey. This breaks
rpcrdma drivers.

Query and set this lkey when creating the device resources.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:46 -04:00
Ira Weiny
3b8ab700af IB/mad: Remove improper use of BUG_ON
We recently added BUG_ON's which were inappropriate for a condition which
should never happen. Change these to be WARN_ON_ONCE as a debugging aid.

Fixes: 4cd7c9479a ('IB/mad: Add support for additional MAD info to/from drivers')

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14 13:20:08 -04:00
Linus Torvalds
e0456717e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add TX fast path in mac80211, from Johannes Berg.

 2) Add TSO/GRO support to ibmveth, from Thomas Falcon

 3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

 4) Lots of new rhashtable tests, from Thomas Graf.

 5) Run ingress qdisc lockless, from Alexei Starovoitov.

 6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting.  From Eric Dumazet.

 7) Add mode parameter to pktgen, for testing receive.  From Alexei
    Starovoitov.

 8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

 9) Move page frag allocator under mm/, also from Alexander.

10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

12) Extern flow dissector to be programmable and use it in new "Flower"
    classifier.  From Jiri Pirko.

13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics.  From Willem de Bruijn.

14) Add netdev driver for GENEVE tunnels, from John W Linville.

15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

18) Add tail call support to BPF, from Alexei Starovoitov.

19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion.  From Eric Dumazet.

22) Add Cavium ThunderX driver, from Sunil Goutham.

23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

25) Double TCP Small Queues default to 256K to accomodate situations
    like the XEN driver and wireless aggregation.  From Wei Liu.

26) Add more entropy inputs to flow dissector, from Tom Herbert.

27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
  bridge: vlan: flush the dynamically learned entries on port vlan delete
  bridge: multicast: add a comment to br_port_state_selection about blocking state
  net: inet_diag: export IPV6_V6ONLY sockopt
  stmmac: troubleshoot unexpected bits in des0 & des1
  net: ipv4 sysctl option to ignore routes when nexthop link is down
  net: track link-status of ipv4 nexthops
  net: switchdev: ignore unsupported bridge flags
  net: Cavium: Fix MAC address setting in shutdown state
  drivers: net: xgene: fix for ACPI support without ACPI
  ip: report the original address of ICMP messages
  net/mlx5e: Prefetch skb data on RX
  net/mlx5e: Pop cq outside mlx5e_get_cqe
  net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
  net/mlx5e: Remove extra spaces
  net/mlx5e: Avoid TX CQE generation if more xmit packets expected
  net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
  net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
  net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
  net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
  net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
  ...
2015-06-24 16:49:49 -07:00
Ira Weiny
4cd7c9479a IB/mad: Add support for additional MAD info to/from drivers
In order to support alternate sized MADs (and variable sized MADs on OPA
devices) add in/out MAD size parameters to the process_mad core call.

In addition, add an out_mad_pkey_index to communicate the pkey index the driver
wishes the MAD stack to use when sending OPA MAD responses.

The out MAD size and the out MAD PKey index are required by the MAD
stack to generate responses on OPA devices.

Furthermore, the in and out MAD parameters are made generic by specifying them
as ib_mad_hdr rather than ib_mad.

Drivers are modified as needed and are protected by BUG_ON flags if the MAD
sizes passed to them is incorrect.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12 14:49:17 -04:00
Ira Weiny
337877a466 IB/core: Add ability for drivers to report an alternate MAD size.
Add max MAD size to the device immutable data set and have all drivers that
support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core.

Verify MAD size data in both the MAD core and when reading the immutable data.

OPA drivers will report alternate MAD sizes in subsequent patches.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12 14:49:17 -04:00
Matan Barak
2528e33e68 IB/core: Pass hardware specific data in query_device
Vendors should be able to pass vendor specific data to/from
user-space via query_device uverb. In order to do this,
we need to pass the vendors' specific udata.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12 14:49:10 -04:00
Matan Barak
8e37210b38 IB/core: Change ib_create_cq to use struct ib_cq_init_attr
Currently, ib_create_cq uses cqe and comp_vecotr instead
of the extendible ib_cq_init_attr struct.

Earlier patches already changed the vendors to work with
ib_cq_init_attr. This patch changes the consumers too.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12 14:49:10 -04:00
Matan Barak
bcf4c1ea58 IB/core: Change provider's API of create_cq to be extendible
Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.

This commit does not change any functionality.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12 14:49:10 -04:00
Saeed Mahameed
facc9699f0 net/mlx5e: Fix HW MTU settings
Previously we configured HW MTU to be netdev->mtu, actually we
need to configure netdev->mtu + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN).

Also, query MTU can not fail, hence make the relevant helper a
void functionm, add mlx5e_set_dev_port_mtu, helper function to
handle MTU setting.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:55:25 -07:00
Haggai Abramonvsky
4aa17b2879 mlx5: Enable mutual support for IB and Ethernet
Ethernet functionality is only available when working in ISSI > 0 mode.

Previously, the IB driver wasn't ready to work on that mode, and hence
building both the IB driver and the Ethernet functionality in the core
driver were disallowed by Kconfigs.

Now, once we have all the pre-steps in place, we can remove this limitation.

The last steps in the IB driver for getting that setup to work are:
create dummy SRQ for the driver's use (until now we could use XRC_SRQ
as SRQ and XRC_SRQ, after moving to ISSI > 0, we separate XRC SRQs from
basic SRQs) and adapt the create QP function to be compatible with ISSI > 0.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:02 -07:00
Majd Dibbiny
647241ea10 IB/mlx5: Don't create IB instance over Ethernet ports
Since we still don't have RoCE support in mlx5, avoid
creating IB driver instance over Ethernet ports.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:02 -07:00
Majd Dibbiny
1b5daf11b0 IB/mlx5: Avoid using the MAD_IFC command under ISSI > 0 mode
In ISSI > 0 mode, most of the MAD_IFC command features are deprecated, and can't
be used. Therefore, when in that mode, we replace all of them with other commands
that provide the required functionality.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:02 -07:00
Haggai Abramonvsky
01949d0109 net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0
When working in ISSI > 0 mode, the model exposed by the device for
XRCs and SRQs is different. XRCs use XRC SRQs and plain SRQs are based
on RPM (Receive Memory Pool).

Add helper functions to create, modify, query, and arm XRC SRQs and RMPs.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:01 -07:00
Ira Weiny
a97e2d86a9 IB/core cleanup: Add const on args - device->process_mad
The process_mad device function declares some parameters as "in".  Make those
parameters const and adjust the call tree under process_mad in the various
drivers accordingly.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-02 09:33:13 -04:00
Amir Vadai
f62b8bb8f2 net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality
This is the Ethernet part of the driver for the Mellanox ConnectX(R)-4
Single/Dual-Port Adapter supporting 100Gb/s with VPI.  The driver
extends the existing mlx5 driver with Ethernet functionality.

This patch contains the driver entry points but does not include
transmit and receive (see the previous patch in the series) routines.

It also adds the option MLX5_CORE_EN to Kconfig to enable/disable the
Ethernet functionality. Currently, Kconfig is programmed to make
Ethernet and Infiniband functionality mutally exclusive.
Also changed MLX5_INFINIBAND to be depandant on MLX5_CORE instead of
selecting it, since MLX5_CORE could be selected without MLX5_INFINIBAND
being selected.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:24:51 -07:00
Saeed Mahameed
938fe83c8d net/mlx5_core: New device capabilities handling
- Query all supported types of dev caps on driver load.
- Store the Cap data outbox per cap type into driver private data.
- Introduce new Macros to access/dump stored caps (using the auto
  generated data types).
- Obsolete SW representation of dev caps (no need for SW copy for each
  cap).
- Modify IB driver to use new macros for checking caps.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:22 -07:00
Amir Vadai
64ffaa2159 net/mlx5_core,mlx5_ib: Do not use vmap() on coherent memory
As David Daney pointed in mlx4_core driver [1], mlx5_core is also
misusing the DMA-API.

This patch is removing the code that vmap() memory allocated by
dma_alloc_coherent().

After this patch, users of this drivers might fail allocating resources
on memory fragmeneted systems.  This will be fixed later on.

[1] - https://patchwork.ozlabs.org/patch/458531/

CC: David Daney <david.daney@cavium.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:22:37 -07:00
Ira Weiny
f9b22e355d IB/core: Convert core to use bitfield for caps
Remove query_protocol callback

Use the new Core Capability bits for:

rdma_protocol_*
rdma_cap_ib_mad
rdma_cap_ib_smi
rdma_cap_ib_cm
rdma_cap_iw_cm
rdma_cap_ib_sa
rdma_cap_ib_mcast
rdma_cap_af_ib
rdma_cap_eth_ah

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-20 12:38:43 -04:00
Ira Weiny
7738613e7c IB/core: Add per port immutable struct to ib_device
As of commit 5eb620c81c "IB/core: Add helpers for uncached GID and P_Key
searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in
the ib_device.

The per port core capability flags to be added later are also immutable data to
be stored in the ib_device object.

In preparation for this create a structure for per port immutable data and
place the pkey and gid table lengths within this structure.

"get_port_immutable" is added as a mandatory device function to allow the
drivers to fill in this data.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-20 12:38:13 -04:00
Michael Wang
6b90a6d66b IB/Verbs: Implement new callback query_protocol()
Add new callback query_protocol() and implement for each HW.

Mapping List:
		node-type	link-layer	transport	protocol
nes		RNIC		ETH		IWARP		IWARP
amso1100	RNIC		ETH		IWARP		IWARP
cxgb3   	RNIC		ETH		IWARP		IWARP
cxgb4   	RNIC		ETH		IWARP		IWARP
usnic   	USNIC_UDP	ETH		USNIC_UDP	USNIC_UDP
ocrdma  	IB_CA		ETH		IB		IBOE
mlx4    	IB_CA		IB/ETH		IB		IB/IBOE
mlx5    	IB_CA		IB		IB		IB
ehca    	IB_CA		IB		IB		IB
ipath   	IB_CA		IB		IB		IB
mthca   	IB_CA		IB		IB		IB
qib     	IB_CA		IB		IB		IB

Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18 13:35:03 -04:00
Joe Perches
f4f01b542c infiniband: Remove duplicated KERN_<LEVEL> from pr_<level> uses
These KERN_<LEVEL> uses are unnecessary with pr_<level> and cause
bad logging output so remove them.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-12 15:52:37 -04:00
Saeed Mahameed
64613d9499 net/mlx5_core: Extend struct mlx5_interface to support multiple protocols
Preparation for ethernet driver.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:43 -04:00
Saeed Mahameed
ce0f750932 net/mlx5_core: Modify arm CQ in preparation for upcoming Ethernet driver
Pass consumer index as a parameter to arm CQ

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:43 -04:00
Saeed Mahameed
233d05d28a net/mlx5_core: Move completion eqs from mlx5_ib to mlx5_core
Preparation for ethernet driver.
These functions will be used in drivers other than mlx5_ib.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:42 -04:00
Saeed Mahameed
6cf0a15f07 IB/mlx5: Fix Mellanox copyright note
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:42 -04:00
Saeed Mahameed
b812b5441e net/mlx5_core: Clear doorbell record inside mlx5_db_alloc()
Do it in one place instead of every where the function is invoked

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:41 -04:00
Eli Cohen
7bef7ad24b net/mlx5_core: Coding style fix
Put a line of space before return and next statement.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:40 -04:00
Haggai Abramonvsky
c3c6c9c810 net/mlx5_core: Fix call to mlx5_core_qp_modify
Pass 0 in the sqd_event parameter.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:40 -04:00
Linus Torvalds
b5ccb078c8 InfiniBand/RDMA changes for 3.20 merge window:
- Re-enable on-demand paging changes with stable ABI
  - Fairly large set of ocrdma HW driver fixes
  - Some qib HW driver fixes
  - Other miscellaneous changes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJU52nHAAoJEENa44ZhAt0hkycP/0vwYNl0JJadasSUrLm2AWje
 iAePU+K7uiyxSuIU+eLnyyDV/iQ2sCjXfhQNE6FnFhH3JjwBYbhS2Q8WcjwfRWYX
 iFhORQltb3spSeLdT7N1QfkMF4MtAw3ENPE3QKIgNaIQSso+J256BHrSiqr9Akwe
 hwIVr0TLDO59ggPs3c083uUhC+5AMViMVgRR+N9+/59WHz2vG2WsTunpg1sYOAgC
 KpUhfZW4za5xYJ0c8BOnPiSAfJasD1UdDg3oX0RPD4j/diiWAO6EOR+jkOLtODpd
 8uhD8HM7ZvCKE1io5QnVdjTR/n51NaVGHk+yfWJRSvV1sw1AALtU4NVniI+E5mFs
 Fwe+zUzbQ3j08TtrU0VjaeteBh2oGC0pJlkJS9+HXeDmH30/LQCMCiA5mBSSEPDg
 0cPEAJgGAZqOIyyZe8jSslW/iN0cE6FDDb8+/1AZ80IfbdMxXh9FVwi7cdXn8+OQ
 nXmtnSa7yzKWS9VVXDrvSzI0Y2oDv8DSDpCWGCm7bUcDXd/T5LpPR3RdSorGkYAr
 O2zmuJZlCdkuKgW91O7cztNj2hTlDnJ+e+5P05KwPOb86Muum6tphLJd5b1h970X
 Actx/6EX/ycSQ3ukiKW7Ksn2bYD3RLZvdbPYQM5xUjlGGWPF6nvmbZ8MqnyaLFfE
 WKvx39DMGq3ktofiMkbf
 =JLsF
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA updates from Roland Dreier:
 - Re-enable on-demand paging changes with stable ABI
 - Fairly large set of ocrdma HW driver fixes
 - Some qib HW driver fixes
 - Other miscellaneous changes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (43 commits)
  IB/qib: Add blank line after declaration
  IB/qib: Fix checkpatch warnings
  IB/mlx5: Enable the ODP capability query verb
  IB/core: Add on demand paging caps to ib_uverbs_ex_query_device
  IB/core: Add support for extended query device caps
  RDMA/cxgb4: Don't hang threads forever waiting on WR replies
  RDMA/ocrdma: Fix off by one in ocrdma_query_gid()
  RDMA/ocrdma: Use unsigned for bit index
  RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit
  RDMA/ocrdma: Update the ocrdma module version string
  RDMA/ocrdma: set vlan present bit for user AH
  RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure
  RDMA/ocrdma: Add support for interrupt moderation
  RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac
  RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP
  RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE
  RDMA/ocrdma: Host crash on destroying device resources
  RDMA/ocrdma: Report correct state in ibv_query_qp
  RDMA/ocrdma: Debugfs enhancments for ocrdma driver
  RDMA/ocrdma: Report correct count of interrupt vectors while registering ocrdma device
  ...
2015-02-21 12:53:21 -08:00
Roland Dreier
147d1da951 Merge branches 'core', 'cxgb4', 'iser', 'mlx4', 'mlx5', 'ocrdma', 'odp', 'qib' and 'srp' into for-next 2015-02-20 09:04:40 -08:00
Haggai Eran
1707cb4ab7 IB/mlx5: Enable the ODP capability query verb
Re-enable the on-demand paging capability query through the
extended query device verb.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-18 08:36:26 -08:00
Majd Dibbiny
7eae20db6a IB/mlx5: Update the dev in reg_create
When we create an MR using reg_create, the mlx5_ib_dev pointer is not
updated on the new MR. This results in a kernel panics for ODP MRs
while handling page faults, when the mlx5_ib_update_mtt function uses
the invalid device pointer.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-17 22:13:50 -08:00
Dan Carpenter
f614fc15ae IB/mlx5: Fix error code in get_port_caps()
The current code returns success when kmalloc() fails.  It should
return an error code, -ENOMEM.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-17 10:22:59 -08:00
Linus Torvalds
c5ce28df0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
      wrong, and this pull actually adds an extra commit on top of the
      branch I'm pulling to fix that up, so that the pre-merge state is
      ok.   - Linus ]

 2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation.  From Alexander Duyck.

 3) Remove sock_iocb altogether, from CHristoph Hellwig.

 4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

 7) Add xmit_more support to r8169, e1000, and e1000e drivers.  From
    Florian Westphal.

 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

 9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.

11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms.  From Neal Cardwell.

13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

14) Support xmit_more in be2net, from Sathya Perla.

15) Group Policy extensions for vxlan, from Thomas Graf.

16) Remove Checksum Offload support for vxlan, from Tom Herbert.

17) Like ipv4, support lockless transmit over ipv6 UDP sockets.  From
    Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
  crypto: fix af_alg_make_sg() conversion to iov_iter
  ipv4: Namespecify TCP PMTU mechanism
  i40e: Fix for stats init function call in Rx setup
  tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
  openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
  ipv6: Make __ipv6_select_ident static
  ipv6: Fix fragment id assignment on LE arches.
  bridge: Fix inability to add non-vlan fdb entry
  net: Mellanox: Delete unnecessary checks before the function call "vunmap"
  cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
  ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
  net: dsa: Remove redundant phy_attach()
  IB/mlx4: Reset flow support for IB kernel ULPs
  IB/mlx4: Always use the correct port for mirrored multicast attachments
  net/bonding: Fix potential bad memory access during bonding events
  tipc: remove tipc_snprintf
  tipc: nl compat add noop and remove legacy nl framework
  tipc: convert legacy nl stats show to nl compat
  tipc: convert legacy nl net id get to nl compat
  tipc: convert legacy nl net id set to nl compat
  ...
2015-02-10 20:01:30 -08:00
Yann Droneaud
43c6116573 Revert "IB/core: Add support for extended query device caps"
While commit 7e36ef8205 ("IB/core: Temporarily disable
ex_query_device uverb") is correct as it makes the extended
QUERY_DEVICE uverb (which came as part of commit 5a77abf9a9
("IB/core: Add support for extended query device caps") and commit
860f10a799 ("IB/core: Add flags for on demand paging support")) not
available to userspace, it doesn't address the initial issue regarding
ib_copy_to_udata() [1][2].

Additionally, further discussions around this new uverb seems to
conclude it would require a different data structure than the one
currently described in <rdma/ib_user_verbs.h> [3].

Both of these issues require a revert of the changes, so this patch
partially reverts commit 8cdd312cfe ("IB/mlx5: Implement the ODP
capability query verb") and commit 860f10a799 ("IB/core: Add flags
for on demand paging support") and fully reverts commit 5a77abf9a9
("IB/core: Add support for extended query device caps").

[1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps"
    http://mid.gmane.org/1418733236.2779.26.camel@opteya.com

[2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb"
    http://mid.gmane.org/1423067503.3030.83.camel@opteya.com

[3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask"
    http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com

Cc: Eli Cohen <eli@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-06 00:54:33 -08:00
Arnd Bergmann
7835bfb526 infiniband: mlx5: avoid a compile-time warning
The return type of find_first_bit() is architecture specific,
on ARM it is 'unsigned int', while the asm-generic code used
on x86 and a lot of other architectures returns 'unsigned long'.

When building the mlx5 driver on ARM, we get a warning about
this:

infiniband/hw/mlx5/mem.c: In function 'mlx5_ib_cont_pages':
infiniband/hw/mlx5/mem.c:84:143: warning: comparison of distinct pointer types lacks a cast
     m = min(m, find_first_bit(&tmp, sizeof(tmp)));

This patch changes the driver to use min_t to make it behave
the same way on all architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:08:21 -05:00
Haggai Eran
b4cfe447d4 IB/mlx5: Implement on demand paging by adding support for MMU notifiers
* Implement the relevant invalidation functions (zap MTTs as needed)
* Implement interlocking (and rollback in the page fault handlers) for
  cases of a racing notifier and fault.
* With this patch we can now enable the capability bits for supporting RC
  send/receive/RDMA read/RDMA write, and UD send.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:04 -08:00
Haggai Eran
eab668a6d0 IB/mlx5: Add support for RDMA read/write responder page faults
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:03 -08:00
Haggai Eran
7bdf65d411 IB/mlx5: Handle page faults
This patch implement a page fault handler (leaving the pages pinned as
of time being).  The page fault handler handles initiator and responder
page faults for UD/RC transports, for send/receive operations, as well
as RDMA read/write initiator support.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:03 -08:00
Haggai Eran
6aec21f6a8 IB/mlx5: Page faults handling infrastructure
* Refactor MR registration and cleanup, and fix reg_pages accounting.
* Create a work queue to handle page fault events in a kthread context.
* Register a fault handler to get events from the core for each QP.

The registered fault handler is empty in this patch, and only a later
patch implements it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:03 -08:00
Haggai Eran
832a6b06ab IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation
The new function allows updating the page tables of a memory region
after it was created. This can be used to handle page faults and page
invalidations.

Since mlx5_ib_update_mtt will need to work from within page invalidation,
so it must not block on memory allocation. It employs an atomic memory
allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails.

In order to reuse code from mlx5_ib_populate_pas, the patch splits
this function and add the needed parameters.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:02 -08:00
Haggai Eran
cc149f751b IB/mlx5: Changes in memory region creation to support on-demand paging
This patch wraps together several changes needed for on-demand paging support
in the mlx5_ib_populate_pas function, and when registering memory regions.

* Instead of accepting a UMR bit telling the function to enable all
  access flags, the function now accepts the access flags themselves.
* For on-demand paging memory regions, fill the memory tables from the
  correct list, and enable/disable the access flags per-page according
  to whether the page is present.
* A new bit is set to enable writing of access flags when using the
  firmware create_mkey command.
* Disable contig pages when on-demand paging is enabled.

In addition the patch changes the UMR code to use PTR_ALIGN instead of
our own macro.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:02 -08:00
Haggai Eran
8cdd312cfe IB/mlx5: Implement the ODP capability query verb
The patch adds infrastructure to query ODP capabilities in the mlx5
driver. The code will read the capabilities from the device, and
enable only those capabilities that both the driver and the device
supports.  At this point ODP is not supported, so no capability is
copied from the device, but the patch exposes the global ODP device
capability bit.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:02 -08:00
Haggai Eran
c1395a2a8c IB/mlx5: Add function to read WQE from user-space
Add a helper function mlx5_ib_read_user_wqe to read information from
user-space owned work queues.  The function will be used in a later
patch by the page-fault handling code in mlx5_ib.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>

[ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran
968e78dd96 IB/mlx5: Enhance UMR support to allow partial page table update
The current UMR interface doesn't allow partial updates to a memory
region's page tables. This patch changes the interface to allow that.

It also changes the way the UMR operation validates the memory
region's state.  When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR
operation to fail if the MKEY is in the free state. When it is
unchecked the operation will check that it isn't in the free state.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran
21af2c3ebf IB/mlx5: Remove per-MR pas and dma pointers
Since UMR code now uses its own context struct on the stack, the pas
and dma pointers for the UMR operation that remained in the mlx5_ib_mr
struct are not necessary.  This patch removes them.

Fixes: a74d24168d ("IB/mlx5: Refactor UMR to have its own context struct")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Eli Cohen
d14e71103b mlx5: Fix error flow in add_keys
If mlx5_core_create_mkey fails, decrease the pending counter to undo the
previous increment.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:56 -05:00
Eli Cohen
6a4f139aae mlx5: Fix sparse warnings
1. Add required __acquire/__release statements to balance spinlock usage.
2. Change the index parameter of begin_wqe() to be unsigned to match supplied
argument type.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:56 -05:00
Al Viro
479163f460 mlx5: don't duplicate kvfree()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:58:18 -05:00
Linus Torvalds
2eb7f910c1 Main set of InfiniBand/RDMA updates for 3.18 merge window:
- Large set of iSER initiator improvements
  - Hardware driver fixes for cxgb4, mlx5 and ocrdma
  - Small fixes to core midlayer
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUQEczAAoJEENa44ZhAt0h3n8P/RqklU+JJiF1eWRgvdf3fOPC
 WDzOzdKUHvv3Lm5qSv8V6q23oYzf2QC/vjuZJgyM5156vj/qSf3iw1ueZTwYSQ3v
 B6bV7/ptSpBlxRx/sI9/ks5yqT869jww7QAO+wtvzuq7JDxQr+t4Yw1j3WOM8DGd
 F/rBFWJgLCD3zFSeJVY+AgZwIeDpNvBO0/QVnchs9iPUY0jSBvhDLsWegGhs92Uv
 wfeiV36f8hPVnbYVMV+xA2t9NkBV21r1sUK1l+CfPDgL/unDoXXqriuqb401l4cj
 zR4/Xwro9WzOC0gey2a5KkyX9wQUW9+Y4TnHRLnJ5shO/yzqxmc+/FMksxnaoWtI
 koF5LqyfXxNhq0VZBpoy+astY4vv4h34WlyBC2lxDCJBEw8VYzO+Wg9QJAzWOxlq
 JXtY9l9zRxfgTLe78xjl2n9LEeOysbYJemp3YFpZVh7a4JvUK4L9Kh9mXKZu8tqt
 zd7YniNNJSdDfF5+Gx1kSK4kE1r/89f04ED6hg/eIf/IqhNYJ/vD5joQuJ4RqQNx
 5G0wdCfKMy9cCwbx1/eCRJOuP+dbGR73UskgXc41s5/VtXCdq8JHvBcWw5CgxQ5I
 AYGUrCuu/ZsQSMNM1i7Pocz8m4uLlnpXF7Qv62ULt/NQJQHj5Xfdvb9M3BK0urIf
 +aBpf+LiZahaUEfo5eYt
 =+ABw
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/RDMA updates from Roland Dreier:
 - large set of iSER initiator improvements
 - hardware driver fixes for cxgb4, mlx5 and ocrdma
 - small fixes to core midlayer

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (47 commits)
  RDMA/cxgb4: Fix ntuple calculation for ipv6 and remove duplicate line
  RDMA/cxgb4: Add missing neigh_release in find_route
  RDMA/cxgb4: Take IPv6 into account for best_mtu and set_emss
  RDMA/cxgb4: Make c4iw_wr_log_size_order static
  IB/core: Fix XRC race condition in ib_uverbs_open_qp
  IB/core: Clear AH attr variable to prevent garbage data
  RDMA/ocrdma: Save the bit environment, spare unncessary parenthesis
  RDMA/ocrdma: The kernel has a perfectly good BIT() macro - use it
  RDMA/ocrdma: Don't memset() buffers we just allocated with kzalloc()
  RDMA/ocrdma: Remove a unused-label warning
  RDMA/ocrdma: Convert kernel VA to PA for mmap in user
  RDMA/ocrdma: Get vlan tag from ib_qp_attrs
  RDMA/ocrdma: Add default GID at index 0
  IB/mlx5, iser, isert: Add Signature API additions
  Target/iser: Centralize ib_sig_domain setting
  IB/iser: Centralize ib_sig_domain settings
  IB/mlx5: Use extended internal signature layout
  IB/iser: Set IP_CSUM as default guard type
  IB/iser: Remove redundant assignment
  IB/mlx5: Use enumerations for PI copy mask
  ...
2014-10-19 12:29:23 -07:00
Roland Dreier
7b909bb49a Merge branches 'core', 'cxgb4', 'iser', 'mlx5' and 'ocrdma' into for-next 2014-10-14 14:09:12 -07:00
Sagi Grimberg
78eda2bb65 IB/mlx5, iser, isert: Add Signature API additions
Expose more signature setting parameters. We modify the signature API
to allow usage of some new execution parameters relevant to data
integrity feature.

This patch modifies ib_sig_domain structure by:

- Deprecate DIF type in signature API (operation will
  be determined by the parameters alone, no DIF type awareness)
- Add APPTAG check bitmask (for input domain)
- Add REFTAG remap (increment) flag for each domain
- Add APPTAG/REFTAG escape options for each domain

The mlx5 driver is modified to follow the new parameters in HW
signature setup.

At the moment the callers (iser/isert) hard-code new parameters (by
DIF type). In the future, callers will retrieve them from the scsi
command structure.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:10:53 -07:00
Sagi Grimberg
142537f4e5 IB/mlx5: Use extended internal signature layout
Rather than using the basic BSF layout which utilizes a pre-configured
signature settings (sufficient for current DIF implementation), we use
the extended BSF layout to expose advanced signature settings. These
settings will also be exposed to the user later.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:10:53 -07:00
Sagi Grimberg
fd22f78cf7 IB/mlx5: Use enumerations for PI copy mask
In case input and output space parameters match, we can use a copy
mask from input and output space.  Use enums for those.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:10:53 -07:00
Yishai Hadas
f39f86971c IB/mlx5: Modify to work with arbitrary page size
When dealing with umem objects, the driver assumed host page sizes
defined by PAGE_SHIFT.  Modify the code to use arbitrary page shift
provided by umem->page_shift to support different page sizes.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:08:40 -07:00
Eli Cohen
f83b42636a IB/mlx5: Remove duplicate code from mlx5_set_path
Some of the fields were set twice. Re-organize to avoid that.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:08:40 -07:00
Eli Cohen
1c3ce90d0a IB/mlx5: Fix possible array overflow
The check to verify that userspace does not provide an invalid index to the
micro UAR was placed too late. Fix this by moving the check before using the
index.

Reported by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:08:40 -07:00
Eli Cohen
900a6d7917 IB/mlx5: Improve debug prints in mlx5_ib_reg_user_mr
Print access flags and error code from ib_umem_get.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:08:40 -07:00
Eli Cohen
eefd56e589 IB/mlx5: Clear umr resources after ib_unregister_device
Some ULPs may make use of resources created in create_umr_res so make sure to
call destroy_umrc_res after returning from ib_unregister_device, which makes
sure all ULPs have closed their resources.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:08:40 -07:00
Eli Cohen
c7a08ac7ee net/mlx5_core: Update device capabilities handling
Rearrange struct mlx5_caps so it has a "gen" field to represent the current
capabilities configured for the device. Max capabilities can also be queried
from the device. Also update capabilities struct to contain more fields as per
the latest revision if firmware specification.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:42:31 -07:00
Linus Torvalds
e3b1fd56f1 Main set of InfiniBand/RDMA updates for 3.17 merge window:
- MR reregistration support
  - MAD support for RMPP in userspace
  - iSER and SRP initiator updates
  - ocrdma hardware driver updates
  - other fixes...
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJT7N2GAAoJEENa44ZhAt0hUiIQAKBqYIjpB3QY6Z/B19mxDxku
 I81B3OkirumbAaCLoLckvq4gnwQ+BAD+YUmXgP08TCIgrABZAYYw+WvMEY9WNyQB
 x3Pv+BzX+wKKNaQkSnB9JVdku+BSI76eW0YYrIX0F0x1o0Jq9JpSkia91KmRvqmX
 YSFy2R7BjEZ4lfo/uscydHT26Q6EdT3od4iv48K8qq5rKdjtyYNgD/75DLCN599Z
 uI1f98e6Tl+7nHWaioQB61zlYPkNPLAnZtMrY2j4tarTwYwX1KhF3eV7z39L1l81
 nhMIXr+qBXtYuZw3I9rKqw3VVCCqB9e6E8FIA5K/d2jWqO+0TqIMUYOuZXuCezWg
 o+uGgwbDOweBqwrKRmiR2M0lk2I1Z16jBxYuaUBbLImG0/NPtuUB22t6RbPAojTa
 EjDkb9XBA7uFUMrYYnou+HxEzmJUYkin6wgGxtklYEKUqvh8G9ccGt6httWrCSrV
 mpjwJv+S4LdFM49wdP993lYthpCoZ42yxEzZ7zJ/KTt17/Wb5F4RtHIROGVFkHdT
 mo8RUz5DGDznfNQgn0m/jC3woFnMNpLOduI+CivhrwrWCwMpSUxIjqWu3263pIJ7
 +H0kNOKDAbp6+F27j+AVznlyXnaEtYyM8EZnysG1Hkz24gCSWBYo5Ep8eSH8LgY4
 9VPc75KVG6uddx5mhg5h
 =wOm0
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 "Main set of InfiniBand/RDMA updates for 3.17 merge window:

   - MR reregistration support
   - MAD support for RMPP in userspace
   - iSER and SRP initiator updates
   - ocrdma hardware driver updates
   - other fixes..."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (52 commits)
  IB/srp: Fix return value check in srp_init_module()
  RDMA/ocrdma: report asic-id in query device
  RDMA/ocrdma: Update sli data structure for endianness
  RDMA/ocrdma: Obtain SL from device structure
  RDMA/uapi: Include socket.h in rdma_user_cm.h
  IB/srpt: Handle GID change events
  IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0]
  IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0]
  RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf()
  IPoIB: Remove unnecessary test for NULL before debugfs_remove()
  IB/mad: Add user space RMPP support
  IB/mad: add new ioctl to ABI to support new registration options
  IB/mad: Add dev_notice messages for various umad/mad registration failures
  IB/mad: Update module to [pr|dev]_* style print messages
  IB/ipoib: Avoid multicast join attempts with invalid P_key
  IB/umad: Update module to [pr|dev]_* style print messages
  IB/ipoib: Avoid flushing the workqueue from worker context
  IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
  IB/ipath: Add P_Key change event support
  mlx4_core: Add support for secure-host and SMP firewall
  ...
2014-08-14 11:09:05 -06:00
Fabian Frederick
a8f731ebd1 IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0]
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-12 22:00:58 -07:00
Jack Morgenstein
4d2f9bbb65 mlx5: Adjust events to use unsigned long param instead of void *
In the event flow, we currently pass only a port number in the
void *data argument.  Rather than pass a pointer to the event handlers,
we should use an "unsigned long" parameter, and pass the port number
value directly.

In the future, if necessary for some events, we can use the unsigned long
parameter to pass a pointer.

Based on a patch by Eli Cohen <eli@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30 14:00:06 -07:00
Jack Morgenstein
f241e7497e mlx5: minor fixes (mainly avoidance of hidden casts)
There were many places where parameters which should be u8/u16 were
integer type.

Additionally, in 2 places, a check for a non-null pointer was added
before dereferencing the pointer (this is actually a bug fix).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30 14:00:06 -07:00
Jack Morgenstein
9603b61de1 mlx5: Move pci device handling from mlx5_ib to mlx5_core
In preparation for a new mlx5 device which is VPI (i.e., ports can be
either IB or ETH), move the pci device functionality from mlx5_ib
to mlx5_core.

This involves the following changes:
1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev
   is now an independent structure maintained by mlx5_core.
   mlx5_ib_dev now has a pointer to that struct.
   This requires changing a lot of places where the core_dev
   struct was accessed via mlx5_ib_dev (now, this needs to
   be a pointer dereference).
2. All PCI initializations are now done in mlx5_core. Thus,
   it is now mlx5_core which does pci_register_device (and not
   mlx5_ib, as was previously).
3. mlx5_ib now registers itself with mlx5_core as an "interface"
   driver. This is very similar to the mechanism employed for
   the mlx4 (ConnectX) driver. Once the HCA is initialized
   (by mlx5_core), it invokes the interface drivers to do
   their initializations.
4. There is a new event handler which the core registers:
   mlx5_core_event(). This event handler invokes the
   event handlers registered by the interfaces.

Based on a patch by Eli Cohen <eli@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30 14:00:06 -07:00
Or Gerlitz
652c1a0517 IB/mlx5: Enable "block multicast loopback" for kernel consumers
In commit f360d88a2e, we advertise blocking multicast loopback to both
kernel and userspace consumers, but don't allow kernel consumers (e.g IPoIB)
to use it with their UD QPs.  Fix that.

Fixes: f360d88a2e ("IB/mlx5: Add block multicast loopback support")
Reported-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-16 23:14:26 -07:00
Roland Dreier
6c9b5d9b00 IB/mlx5: Fix warning about cast of wr_id back to pointer on 32 bits
We need to cast wr_id to unsigned long before casting to a pointer.
This fixes:

       drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_umr_cq_handler':
    >> drivers/infiniband/hw/mlx5/mr.c:724:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
          context = (struct mlx5_ib_umr_context *)wc.wr_id;

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28 09:23:03 -07:00
Yann Droneaud
43bc889380 IB/mlx5: add missing padding at end of struct mlx5_ib_create_srq
The i386 ABI disagrees with most other ABIs regarding alignment of
data type larger than 4 bytes: on most ABIs a padding must be added at
end of the structures, while it is not required on i386.

So for most ABIs struct mlx5_ib_create_srq gets implicitly padded to be
aligned on a 8 bytes multiple, while for i386, such padding is not
added.

Tool pahole could be used to find such implicit padding:

  $ pahole --anon_include \
           --nested_anon_include \
           --recursive \
           --class_name mlx5_ib_create_srq \
           drivers/infiniband/hw/mlx5/mlx5_ib.o

Then, structure layout can be compared between i386 and x86_64:

  +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt    2014-03-28 11:43:07.386413682 +0100
  --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt  2014-03-27 13:06:17.788472721 +0100
  @@ -69,7 +68,6 @@ struct mlx5_ib_create_srq {
          __u64                      db_addr;              /*     8     8 */
          __u32                      flags;                /*    16     4 */

  -       /* size: 20, cachelines: 1, members: 3 */
  -       /* last cacheline: 20 bytes */
  +       /* size: 24, cachelines: 1, members: 3 */
  +       /* padding: 4 */
  +       /* last cacheline: 24 bytes */
   };

ABI disagreement will make an x86_64 kernel try to read past
the buffer provided by an i386 binary.

When boundary check will be implemented, the x86_64 kernel will
refuse to read past the i386 userspace provided buffer and the
uverb will fail.

Anyway, if the structure lay in memory on a page boundary and
next page is not mapped, ib_copy_from_udata() will fail and the
uverb will fail.

This patch makes create_srq_user() takes care of the input
data size to handle the case where no padding was provided.

This way, x86_64 kernel will be able to handle struct mlx5_ib_create_srq
as sent by unpatched and patched i386 libmlx5.

Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:16 -07:00
Yann Droneaud
a8237b32a3 IB/mlx5: add missing padding at end of struct mlx5_ib_create_cq
The i386 ABI disagrees with most other ABIs regarding alignment of
data type larger than 4 bytes: on most ABIs a padding must be added at
end of the structures, while it is not required on i386.

So for most ABI struct mlx5_ib_create_cq get padded to be aligned on a
8 bytes multiple, while for i386, such padding is not added.

The tool pahole can be used to find such implicit padding:

  $ pahole --anon_include \
  	 --nested_anon_include \
  	 --recursive \
  	 --class_name mlx5_ib_create_cq \
  	 drivers/infiniband/hw/mlx5/mlx5_ib.o

Then, structure layout can be compared between i386 and x86_64:

  +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt    2014-03-28 11:43:07.386413682 +0100
  --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt  2014-03-27 13:06:17.788472721 +0100
  @@ -34,9 +34,8 @@ struct mlx5_ib_create_cq {
          __u64                      db_addr;              /*     8     8 */
          __u32                      cqe_size;             /*    16     4 */

  -       /* size: 20, cachelines: 1, members: 3 */
  -       /* last cacheline: 20 bytes */
  +       /* size: 24, cachelines: 1, members: 3 */
  +       /* padding: 4 */
  +       /* last cacheline: 24 bytes */
   };

This ABI disagreement will make an x86_64 kernel try to read past the
buffer provided by an i386 binary.

When boundary check will be implemented, a x86_64 kernel will refuse
to read past the i386 userspace provided buffer and the uverb will
fail.

Anyway, if the structure lies in memory on a page boundary and next
page is not mapped, ib_copy_from_udata() will fail when trying to read
the 4 bytes of padding and the uverb will fail.

This patch makes create_cq_user() takes care of the input data size to
handle the case where no padding is provided.

This way, x86_64 kernel will be able to handle struct
mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5.

Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:13 -07:00
Shachar Raindel
a74d24168d IB/mlx5: Refactor UMR to have its own context struct
Instead of having the UMR context part of each memory region, allocate
a struct on the stack.  This allows queuing multiple UMRs that access
the same memory region.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:09 -07:00
Haggai Eran
48fea837bb IB/mlx5: Set QP offsets and parameters for user QPs and not just for kernel QPs
For user QPs, the creation process does not currently initialize the fields:

 * qp->rq.offset
 * qp->sq.offset
 * qp->sq.wqe_shift

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:08 -07:00
Haggai Eran
b475598aec mlx5_core: Store MR attributes in mlx5_mr_core during creation and after UMR
The patch stores iova, pd and size during mr creation and after UMRs
that modify them.  It removes the unused access flags field.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:06 -07:00
Haggai Eran
8605933a22 IB/mlx5: Add MR to radix tree in reg_mr_callback
For memory regions that are allocated using reg_umr, the suffix of
mlx5_core_create_mkey isn't being called.  Instead the creation is
completed in a callback function (reg_mr_callback).  This means that
these MRs aren't being added to the MR radix tree.  Add them in the
callback.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:05 -07:00
Haggai Eran
096f7e72c6 IB/mlx5: Fix error handling in reg_umr
If ib_post_send fails when posting the UMR work request in reg_umr,
the code doesn't release the temporary pas buffer allocated, and
doesn't dma_unmap it.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:05 -07:00
Sagi Grimberg
c7f44fbda6 mlx5_core: Copy DIF fields only when input and output space values match
Some DIF implementations (SCSI initiator/target) may want to use different
input/output values for application tag and/or reference tag. So in
case memory/wire domain values don't match HW must not copy them.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:02 -07:00
Sagi Grimberg
5c273b1677 mlx5_core: Simplify signature handover wqe for interleaved buffers
No need for repetition format pattern in case the data and protection
are already interleaved in the memory domain since the pattern
already exists. A single key entry is sufficient and may save some
extra fetch ops.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:52:58 -07:00
Sagi Grimberg
8524867b9c mlx5_core: Fix signature handover operation for interleaved buffers
When the data and protection are interleaved in the memory domain, no
need to expand the mkey total length.

At the moment no Linux user works (iSER initiator & target) in
interleaved mode. This may change in the future as for SCSI
pass-through devices there is no real point in target performing
de-interleaving and re-interleaving of the protection data in the PT
stage. Regardless, signature verbs support this mode.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:52:54 -07:00
Eli Cohen
f360d88a2e IB/mlx5: Add block multicast loopback support
Add support for the block multicast loopback QP creation flag along
the proper firmware API for that.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-10 18:43:32 -07:00
Linus Torvalds
877f075aac Main batch of InfiniBand/RDMA changes for 3.15:
- The biggest change is core API extensions and mlx5 low-level driver
    support for handling DIF/DIX-style protection information, and the
    addition of PI support to the iSER initiator.  Target support will be
    arriving shortly through the SCSI target tree.
 
  - A nice simplification to the "umem" memory pinning library now that
    we have chained sg lists.  Kudos to Yishai Hadas for realizing our
    code didn't have to be so crazy.
 
  - Another nice simplification to the sg wrappers used by qib, ipath and
    ehca to handle their mapping of memory to adapter.
 
  - The usual batch of fixes to bugs found by static checkers etc. from
    intrepid people like Dan Carpenter and Yann Droneaud.
 
  - A large batch of cxgb4, ocrdma, qib driver updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTPYBnAAoJEENa44ZhAt0hGI4P/29eotGwpkANUQE6FQvxCUL2
 CXJtSg52lmYvGJrPK4IhihpbtQmHJz3iXEzlOOWidTw1dJgObR6vFaRymh7+vDLs
 CdzybMcXdasarqTuYeJbFzhkimpwtWWrMy/8Ik/Jj/5glGQ6cUSpdYZzVtFhYNqf
 hCGE8iLi+tuekJJj1htut5D6apXM7udcdc2yLJNOdsSj/VUXt1oqG1x9xAi9R8Tq
 7o8eFSStdlja0EBQ6Hli2zauCSnQkaUtr8h6EAFbcCtvBK8HqsHSc2gfq2ViFUiN
 ztt167oWoQnVkR0qCPL5nVt+CRQHHROprVXvbpcTI3aW61gNIl6OrUUOXefzHXac
 TNi+fdMpiEB/JQ4Z04Jzd1dGCSjYeTqPj4rO4meFjBmxRDdTgZHu7FWwejT1nYJ5
 d2abVdCOT+QWlIlM7m/pjdWJII5OYM+4/jtTayGepEaR4fTUzKtPZPBLNUBDBKE+
 4f92PC8LiuPkwJgb6XT96onPz1bDCOnPSEdwoKUFKPeGUcwgVOM/Wx5NU4Yf7rfg
 RxQwZ7mJXbjCYFlmGGo/0QDy6UEGkIFYlJSzooP+wlK1JvZ5h2M+9QKX2FtwzR+R
 I2kBxcTXWsM/h88R7MkNqbNIllmhssrJwmAE46OneZbfoBOB+JZjb4nLRTu0jEcS
 zn6f16GmJ37BKn2/qYY/
 =Ww6H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.15:

   - The biggest change is core API extensions and mlx5 low-level driver
     support for handling DIF/DIX-style protection information, and the
     addition of PI support to the iSER initiator.  Target support will
     be arriving shortly through the SCSI target tree.

   - A nice simplification to the "umem" memory pinning library now that
     we have chained sg lists.  Kudos to Yishai Hadas for realizing our
     code didn't have to be so crazy.

   - Another nice simplification to the sg wrappers used by qib, ipath
     and ehca to handle their mapping of memory to adapter.

   - The usual batch of fixes to bugs found by static checkers etc.
     from intrepid people like Dan Carpenter and Yann Droneaud.

   - A large batch of cxgb4, ocrdma, qib driver updates"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits)
  RDMA/ocrdma: Unregister inet notifier when unloading ocrdma
  RDMA/ocrdma: Fix warnings about pointer <-> integer casts
  RDMA/ocrdma: Code clean-up
  RDMA/ocrdma: Display FW version
  RDMA/ocrdma: Query controller information
  RDMA/ocrdma: Support non-embedded mailbox commands
  RDMA/ocrdma: Handle CQ overrun error
  RDMA/ocrdma: Display proper value for max_mw
  RDMA/ocrdma: Use non-zero tag in SRQ posting
  RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr()
  RDMA/ocrdma: Increment abi version count
  RDMA/ocrdma: Update version string
  be2net: Add abi version between be2net and ocrdma
  RDMA/ocrdma: ABI versioning between ocrdma and be2net
  RDMA/ocrdma: Allow DPP QP creation
  RDMA/ocrdma: Read ASIC_ID register to select asic_gen
  RDMA/ocrdma: SQ and RQ doorbell offset clean up
  RDMA/ocrdma: EQ full catastrophe avoidance
  RDMA/cxgb4: Disable DSGL use by default
  RDMA/cxgb4: rx_data() needs to hold the ep mutex
  ...
2014-04-03 16:57:19 -07:00
Sagi Grimberg
2dea909444 IB/mlx5: Expose support for signature MR feature
Currently support only T10-DIF types of signature handover operations
(types 1|2|3).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:40:04 -08:00
Sagi Grimberg
d5436ba010 IB/mlx5: Collect signature error completion
This commit takes care of the generated signature error CQE generated
by the HW (if happened).  The underlying mlx5 driver will handle
signature error completions and will mark the relevant memory region
as dirty.

Once the consumer gets the completion for the transaction, it must
check for signature errors on signature memory region using a new
lightweight verb ib_check_mr_status().

In case the user doesn't check for signature error (i.e. doesn't call
ib_check_mr_status() with status check IB_MR_CHECK_SIG_STATUS), the
memory region cannot be used for another signature operation
(REG_SIG_MR work request will fail).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:40:04 -08:00
Sagi Grimberg
e6631814fb IB/mlx5: Support IB_WR_REG_SIG_MR
This patch implements IB_WR_REG_SIG_MR posted by the user.

Baisically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post UMR WR to register the sig_mr in one of two possible ways:
    * In case the user registered a single MR for data so the UMR data segment
      consists of:
      - single klm (data MR) passed by the user
      - BSF with signature attributes requested by the user.
    * In case the user registered 2 MRs, one for data and one for protection,
      the UMR consists of:
      - strided block format which includes data and protection MRs and
        their repetitive block format.
      - BSF with signature attributes requested by the user.

2. post SET_PSV in order to set the memory domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

3. post SET_PSV in order to set the wire domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

* After this compound WR we place a small fence for next WR to come.

This patch also introduces some helper functions to set the BSF correctly
and determining the signature format selectors.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:39:51 -08:00
Sagi Grimberg
2ac45934f8 IB/mlx5: Remove MTT access mode from umr flags helper function
get_umr_flags helper function might be used for types of access modes
other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM.  So remove it from
helper, and callers will add their own access mode flag.

This commit does not add/change functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg
6e5eadace1 IB/mlx5: Break up wqe handling into begin & finish routines
As a preliminary step for signature feature which will require posting
multiple (3) WQEs for a single WR, we break post_send routine WQE
indexing into begin and finish routines.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg
e1e66cc264 IB/mlx5: Initialize mlx5_ib_qp signature-related members
If user requested signature enable we initialize relevant mlx5_ib_qp
members.  We mark the qp as sig_enable and we increase the effective
SQ size, but still limit the user max_send_wr to original size
computed.  We also allow the create_qp routine to accept sig_enable
create flag.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg
3121e3c441 mlx5: Implement create_mr and destroy_mr
Support create_mr and destroy_mr verbs.  Creating ib_mr may be done
for either ib_mr that will register regular page lists like
alloc_fast_reg_mr routine, or indirect ib_mrs that can register other
(pre-registered) ib_mrs in an indirect manner.

In addition user may request signature enable, that will mean that the
created ib_mr may be attached with signature attributes (BSF, PSVs).

Currently we only allow direct/indirect registration modes.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Yishai Hadas
eeb8461e36 IB: Refactor umem to use linear SG table
This patch refactors the IB core umem code and vendor drivers to use a
linear (chained) SG table instead of chunk list.  With this change the
relevant code becomes clearer—no need for nested loops to build and
use umem.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-04 10:34:28 -08:00
Amir Vadai
169a1d85d0 net,IB/mlx: Bump all Mellanox driver versions
Bump all Mellanox driver versions.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-25 17:34:44 -05:00
Eli Cohen
0861565f50 IB/mlx5: Remove dependency on X86
Remove Kconfig dependency of mlx5_ib/mlx5_core on X86, since there is
no such dependency in reality.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 20:48:02 -08:00
Eli Cohen
1a4c3a3dc5 IB/mlx5: Don't set "block multicast loopback" capability
Currently Connect-IB does not support blocking multicast loopback, so
don't set IB_DEVICE_BLOCK_MULTICAST_LOOPBACK in the device caps.

Reported by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:09:48 -08:00
Eli Cohen
78c0f98cc9 IB/mlx5: Fix binary compatibility with libmlx5
Commit c1be5232d2 ("Fix micro UAR allocator") broke binary compatibility
between libmlx5 and mlx5_ib since it defines a different value to the number
of micro UARs per page, leading to wrong calculation in libmlx5. This patch
defines struct mlx5_ib_alloc_ucontext_req_v2 as an extension to struct
mlx5_ib_alloc_ucontext_req.  The extended size is determined in mlx5_ib_alloc_ucontext()
and in case of old library we use uuarn 0 which works fine -- this is
acheived due to create_user_qp() falling back from high to medium then to
low class where low class will return 0.  For new libraries we use the
more sophisticated allocation algorithm.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:00:48 -08:00
Eli Cohen
9e65dc371b IB/mlx5: Fix RC transport send queue overhead computation
Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:00:48 -08:00
Roland Dreier
fb1b5034e4 Merge branch 'ip-roce' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
2014-01-22 23:24:21 -08:00
Eli Cohen
57761d8df8 IB/mlx5: Verify reserved fields are cleared
Verify that reserved fields in struct mlx5_ib_resize_cq are cleared
before continuing execution of the verb. This is required to allow
making use of this area in future revisions.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:54 -08:00
Eli Cohen
9e9c47d07d IB/mlx5: Allow creation of QPs with zero-length work queues
The current code attmepts to call ib_umem_get() even if the length is
zero, which causes a failure. Since the spec allows zero length work
queues, change the code so we don't call ib_umem_get() in those cases.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:53 -08:00
Eli Cohen
bde51583f4 IB/mlx5: Add support for resize CQ
Implement resize CQ which is a mandatory verb in mlx5.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:50 -08:00
Eli Cohen
3bdb31f688 IB/mlx5: Implement modify CQ
Modify CQ is used by ULPs like IPoIB to change moderation parameters.  This
patch adds support in mlx5.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:49 -08:00
Eli Cohen
ada388f7af IB/mlx5: Make sure doorbell record is visible before doorbell
Put a wmb() to make sure the doorbell record is visible to the HCA before we
hit doorbell.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:49 -08:00
Matan Barak
dd5f03beb4 IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills. its internal structures from the
path provided by the ULP.  We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message.  We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB.  Vendor drivers are modified to support the new
function signature.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:20:54 -08:00
Eli Cohen
c1be5232d2 IB/mlx5: Fix micro UAR allocator
The micro UAR (uuar) allocator had a bug which resulted from the fact
that in each UAR we only have two micro UARs avaialable, those at
index 0 and 1.  This patch defines iterators to aid in traversing the
list of available micro UARs when allocating a uuar.

In addition, change the logic in create_user_qp() so that if high
class allocation fails (high class means lower latency), we revert to
medium class and not to the low class.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 13:54:23 -08:00
Eli Cohen
d9fe409163 IB/mlx5: Remove unused code in mr.c
The variable start in struct mlx5_ib_mr is never used. Remove it.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 13:54:23 -08:00
Eli Cohen
cf1c5e1f1c IB/mlx5: Fix page shift in create CQ for userspace
When creating a CQ, we must use mlx5 adapter page shift.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15 14:36:36 -08:00
Eli Cohen
7e2e19210a IB/mlx5: Remove dead code
The value of the local variable index is never used in reg_mr_callback().

Signed-off-by: Eli Cohen <eli@mellanox.com>

[ Remove now-unused variable delta too.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15 14:36:14 -08:00
Eli Cohen
1b77d2bd75 mlx5: Use enum to indicate adapter page size
The Connect-IB adapter has an inherent page size which equals 4K.
Define an new enum that equals the page shift and use it instead of
using the value 12 throughout the code.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:01 -08:00
Eli Cohen
c2a3431e61 IB/mlx5: Update opt param mask for RTS2RTS
RTS to RTS transition should allow update of alternate path.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:01 -08:00
Eli Cohen
07c9113fe8 IB/mlx5: Remove "Always false" comparison
mlx5_cur and mlx5_new cannot have negative values so remove the
redundant condition.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:01 -08:00
Eli Cohen
2d036fad94 IB/mlx5: Remove dead code in mr.c
In mlx5_mr_cache_init() the size variable is not used so remove it to
avoid compiler warnings when running with make W=1.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:00 -08:00
Eli Cohen
bf0bf77f65 mlx5: Support communicating arbitrary host page size to firmware
Connect-IB firmware requires 4K pages to be communicated with the
driver. This patch breaks larger pages to 4K units to enable support
for architectures utilizing larger page size, such as PowerPC.  This
patch also fixes several places that referred to PAGE_SHIFT instead of
explicit 12 which is the inherent page shift on Connect-IB.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:00 -08:00
Moshe Lazer
cfd8f1d49b IB/mlx5: Fix srq free in destroy qp
On destroy QP the driver walks over the relevant CQ and removes CQEs
reported for the destroyed QP.  It also frees the related SRQ entry
without checking that this is actually an SRQ-related CQE.  In case of
a CQ used for both send and receive QP, we could free SRQ entries for
send CQEs.  This patch resolves this issue by verifying that this is a
SRQ related CQE by checking the SRQ number in the CQE is not zero.

Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:59 -08:00
Eli Cohen
1faacf82df IB/mlx5: Simplify mlx5_ib_destroy_srq
Make use of destroy_srq_kernel() to clear SRQ resouces.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:59 -08:00
Eli Cohen
9641b74ebe IB/mlx5: Fix overflow check in IB_WR_FAST_REG_MR
Make sure not to overflow when reading the page list from struct
ib_fast_reg_page_list.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:59 -08:00
Eli Cohen
746b5583c1 IB/mlx5: Multithreaded create MR
Use asynchronous commands to execute up to eight concurrent create MR
commands. This is to fill memory caches faster so we keep consuming
from there.  Also, increase timeout for shrinking caches to five
minutes.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:59 -08:00
Eli Cohen
51ee86a4af IB/mlx5: Fix check of number of entries in create CQ
Verify that the value is non negative before rounding up to power of 2.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:58 -08:00
Eli Cohen
5431390707 IB/mlx5: Ensure proper synchronization accessing memory
Call mlx5_ib_populate_pas() before mapping the DMA buffer to ensure
the hardware reads the values written by the CPU.

Found by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:24:00 -07:00
Eli Cohen
fe45f82704 IB/mlx5: Fix alignment of reg umr gather buffers
The hardware requires that gather buffers for UMR work requests be
aligned to 2K.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:59 -07:00
Sagi Grimberg
ada9f5d007 IB/mlx5: Fix eq names to display nicely in /proc/interrupts
It's helpful for a driver to put the pci slot name in its interrupt
names, so /proc/interrupts will show the pci slot of the device.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:59 -07:00
Eli Cohen
a4774e9095 IB/mlx5: Fix opt param mask according to firmware spec
Failed to configure opt mask to configure rre from init to rtr.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:58 -07:00
Eli Cohen
75959f56fe mlx5: Fix opt param mask for sq err to rts transition
Add missing entry in the table for UC transport.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:58 -07:00
Eli Cohen
81bea28ffd IB/mlx5: Disable atomic operations
Currently Atomic operations don't work properly.  Disable them for the
time being.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:58 -07:00
Eli Cohen
a0c84c326f IB/mlx5: Avoid async events on invalid port number
On a single ported Connect-IB, its possible for the firmware to issue
events on the non-existing 2nd port.  Make sure to ignore events
generated for such ports.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:57 -07:00
Eli Cohen
203099fd73 IB/mlx5: Decrease memory consumption of mr caches
Change the logic so we do not allocate memory nor map the device
before actually posting to the REG_UMR QP. In addition, unmap and free
the memory after we get completion.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:56 -07:00
Moshe Lazer
56e1ab0f13 IB/mlx5: Fix memory leak in mlx5_ib_create_srq
The patch fixes the rollback in case of failure in creating SRQ.

Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:55 -07:00
Moshe Lazer
3c4619114c IB/mlx5: Flush cache workqueue before destroying it
Destroying the workqueue without flushing it first can lead to a case
in which the kernel tries to push a delayed work to the workqueue
which does not exist anymore.

Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:55 -07:00
Eli Cohen
b125a54bfd IB/mlx5: Fix send work queue size calculation
1. Make sure wqe_cnt does not exceed the limit published by firmware.

2. There is no requirement that the number of outstanding work
   requests will be a power of two. Remove the ilog2 in the
   calculation of sq.max_post to fix that.

3. Add case for IB_QPT_XRC_TGT in sq_overhead and return 0 as XRC
   target QPs do not have a send queue.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:55 -07:00
Andi Shyti
618af3846b mlx5_core: Variable may be used uninitialized
In the sq_overhead() function, if qp_typ is equal to IB_QPT_RC, size
will be used uninitialized.

Signed-off-by: Andi Shyti <andi@etezian.org>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:12:33 -07:00
Dan Carpenter
92b0ca7cb1 IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()
We don't set "resp.reserved".  Since it's at the end of the struct
that means we don't have to copy it to the user.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:12:08 -07:00
Wei Yongjun
281d1a9211 IB/mlx5: Fix error return code in init_one()
Fix to return a negative error code from the error handling case
instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:12:07 -07:00
Dan Carpenter
5e631a03af mlx5: Return -EFAULT instead of -EPERM
For copy_to/from_user() failure, the correct error code is -EFAULT not
-EPERM.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-11 16:48:45 -07:00
Roland Dreier
ad32b95f82 IB/mlx5: Make profile[] static in main.c
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-08 10:32:32 -07:00
Eli Cohen
e126ba97db mlx5: Add driver for Mellanox Connect-IB adapters
The driver is comprised of two kernel modules: mlx5_ib and mlx5_core.
This partitioning resembles what we have for mlx4, except that mlx5_ib
is the pci device driver and not mlx5_core.

mlx5_core is essentially a library that provides general functionality
that is intended to be used by other Mellanox devices that will be
introduced in the future.  mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

[ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>.
  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-08 10:32:24 -07:00