Fix covertiy warning of passing "&flow_attr" to function
"create_flow_rule" which uses it as an array.
In addition pass flow attributes argument as const.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Saving the flow table priority object's pointer in the flow handle
is necessary for downstream patches since the sniffer flow table isn't
placed at the standard flow_db structure but in a different database.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Counters weren't updated due to raw packet QPs' traffic since the
counter-id was not associated with the QP. Added support for
associating the q-counter-id with the raw packet QP. The attachment
is done only when changing RQ raw packet QP state from RST to INIT
in modify-RQ command. FW support is required for the above, without
this support raw packet QP counters will not count.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Added a struct for modifying raw QP, this will allow modifying
multiple parameters in raw packet QP RQ and can also be used for
SQ in the future.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Expose RSS related capabilities on both IB and vendor channels.
In addition to the IB capabilities the driver reports some extra
capabilities on its vendor channel:
- Bit mask of the supported types of hash functions.
- Bit mask of the supported RX fields that can participate
in the RX hashing.
Those capabilities are applicable only when the link layer
is Ethernet.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or
less unchecked, this moves the capability of creating a global rkey into
the RDMA core, where it can be easily audited. It also prints a warning
everytime this feature is used as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Set the source mac address in the FTE when L2 specification
is provided.
Fixes: 038d2ef875 ('IB/mlx5: Add flow steering support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
MAD_IFC command is supported only for physical functions (PF)
and when physical port is IB. The proposed fix enforces it.
Fixes: d603c809ef ("IB/mlx5: Fix decision on using MAD_IFC")
Reported-by: David Chang <dchang@suse.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/qlogic/qed/qed_dcbx.c
drivers/net/phy/Kconfig
All conflicts were cases of overlapping commits.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove returning errors from mlx5 poll_cq function. Polling CQ
operation in kernel never fails by Mellanox HCA architecture and
respective driver design.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use TIR number based on selector, it should be done to differentiate
between RSS QP to RAW one.
Reported-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Return variable was set in a line before the
actual return was called in begin_wqe function.
This patch removes such variable and simplifies the code.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The returned value should be EINVAL, because it is caused by wrong
caller and not by internal overflow event.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The 2nd parameter of 'find_first_bit' is the number of bits to search.
In this case, we are passing 'sizeof(tmp)' which is likely to be 4 or 8
because 'tmp' is an 'unsigned long'.
It is likely that the number of bits of 'tmp' was expected here. So use
BITS_PER_LONG instead.
It has been spotted by the following coccinelle script:
@@
expression ret, x;
@@
* ret = \(find_first_bit \| find_first_zero_bit\) (x, sizeof(...));
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Majd Dibbiny <majd@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This file does not use any structs or functions defined by io-mapping.h
(nor does it directly use iomap, ioremap, iounamp or friends). Remove it
to simplify verification of changes to io-mapping.h
The include existed since its inception in
commit e126ba97db
Author: Eli Cohen <eli@mellanox.com>
Date: Sun Jul 7 17:25:49 2013 +0300
mlx5: Add driver for Mellanox Connect-IB adapters
which looks like a copy across from the Mellanox ethernet driver.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eli Cohen <eli@mellanox.com>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Matan Barak <matanb@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Replaced mlx5_query_port_proto_oper with separate functions per link
type. The functions should take different arguments so no point in
trying to unite them.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Now as all commands use mlx5 ifc interface, instead of doing two calls
for executing a command we embed command status checking into
mlx5_cmd_exec to simplify the interface.
Also we do here some cleanup for redundant software structures
(inbox/outbox) and functions and improved command failure output.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Prior to this patch we assumed that modify QP commands have the
same layout.
In ConnectX-4 for each QP transition there is a specific command
and their layout can vary.
e.g: 2err/2rst commands don't have QP context in their layout and before
this patch we posted the QP context in those commands.
Fortunately the FW only checks the suffix of the commands and executes
them, while ignoring all invalid data sent after the valid command
layout.
This patch removes mlx5_modify_qp_mbox_in and changes
mlx5_core_qp_modify to receive the required transition and QP context
with opt_param_mask if needed. This way the caller is not required to
provide the command inbox layout and it will be generated automatically.
mlx5_core_qp_modify will generate the command inbox/outbox layouts
according to the requested transition and will fill the requested
parameters.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Remove old representation of manually created QP/XRCD commands layout
amd use mlx5_ifc canonical structures and defines.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Remove old representation of manually created MKey/PSV commands layout,
and use mlx5_ifc canonical structures and defines.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Remove old representation of manually created CQ commands layout,
and use mlx5_ifc canonical structures and defines.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
- Updates/fixes for iw_cxgb4 driver
- Updates/fixes for mlx5 driver
- Add flow steering and RSS API
- Add hardware stats to mlx4 and mlx5 drivers
- Add firmware version API for RDMA driver use
- Add the rxe driver (this is a software RoCE driver that makes any
Ethernet device a RoCE device)
- Fixes for i40iw driver
- Support for send only multicast joins in the cma layer
- Other minor fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXo1vCAAoJELgmozMOVy/d0HcQAJqMi7siD9cSaMViYbu812pq
3kNkHZbLNB/947uShDPhhFAWFXU0nRxEnTNSvYxRo+nxnDE/9hEEXpx8OzzKLNU+
GXyDeHsEEriSFcaSne5Tak/QuiFm3PJv73ttXQROCtHG7KxLG9ieVbfusz42Xwiu
5R21qfp6PZEOC+j7L/fTZh/kEN3cfaDYrGnCgmU3z0ka9xG5Qe2/+uWGNkuioRA5
phFUR4MS+1n/VrnxPHrLXTrqv3sw8YfCfRImaXSBrxFVMqhno+cDDtEJQCRnmNrq
7KcJO2KqDMl/QqsjxdwqojNpUTh2t7SeOeQuzUsfXl15yyyetq2Zu7ZurkCGjNtQ
NtTt6hv5eXq3mNuBmOPKYDDgakSYyYjS0zueoi8wFFqIeSYxRJv4wx4xoeJ/Bsz8
2LplpaPMQaTM65FhzYXGhYNBKaRkqjL9ihbIl1OcLNvfXAqLElfONM17/Yc/hgVw
xfDtvNFrZcl7/exIpBBNOnxwbs4h78vvXsXoBiVoN7V/hBnMzDhkiBHNxNCfZXA0
REGs/cnyy6cpiJOnVCWs77NqL75oK/qb1mEwe1M+A2kaxe/tLixUdYXo/zclDPm8
3DLTL9lCgJIBIEiZT4q/alxLK+yUKD+SHtQT3lmF2Bfsmv/I38Uy55SXAiFO4yOq
kwy96TvYtT43SkyNmmBf
=oZOO
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull base rdma updates from Doug Ledford:
"Round one of 4.8 code: while this is mostly normal, there is a new
driver in here (the driver was hosted outside the kernel for several
years and is actually a fairly mature and well coded driver). It
amounts to 13,000 of the 16,000 lines of added code in here.
Summary:
- Updates/fixes for iw_cxgb4 driver
- Updates/fixes for mlx5 driver
- Add flow steering and RSS API
- Add hardware stats to mlx4 and mlx5 drivers
- Add firmware version API for RDMA driver use
- Add the rxe driver (this is a software RoCE driver that makes any
Ethernet device a RoCE device)
- Fixes for i40iw driver
- Support for send only multicast joins in the cma layer
- Other minor fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
Soft RoCE driver
IB/core: Support for CMA multicast join flags
IB/sa: Add cached attribute containing SM information to SA port
IB/uverbs: Fix race between uverbs_close and remove_one
IB/mthca: Clean up error unwind flow in mthca_reset()
IB/mthca: NULL arg to pci_dev_put is OK
IB/hfi1: NULL arg to sc_return_credits is OK
IB/mlx4: Add diagnostic hardware counters
net/mlx4: Query performance and diagnostics counters
net/mlx4: Add diagnostic counters capability bit
Use smaller 512 byte messages for portmapper messages
IB/ipoib: Report SG feature regardless of HW UD CSUM capability
IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
IB/hfi1: Disable by default
IB/rdmavt: Disable by default
IB/mlx5: Fix port counter ID association to QP offset
IB/mlx5: Fix iteration overrun in GSI qps
i40iw: Add NULL check for puda buffer
i40iw: Change dup_ack_thresh to u8
i40iw: Remove unnecessary check for moving CQ head
...
The q-counter-id is given in modify-QP command associates
the QP with the counter. The offset to which the counter
ID was set is incorrect, causing IB port counters not to
count on QP.
Fixes: 0837e86a7a ('IB/mlx5: Add per port counters')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Number of outstanding_pi may overflow and as a result may indicate that
there are no elements in the queue. The effect of doing this is that the
MAD layer will get stuck waiting for completions. The MAD layer will
think that the QP is full - because it didn't receive these completions.
This fix changes it so the outstanding_pi number is increased
with 32-bit wraparound and is not limited to max_send_wr so
that the difference between outstanding_pi and outstanding_ci will
really indicate the number of outstanding completions.
Cc: Stable <stable@vger.kernel.org>
Fixes: ea6dc20362 ('IB/mlx5: Reorder GSI completions')
Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Reduce the set of arguments passed to mlx5_add_flow_rule
by introducing flow_spec structure.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And remove sysfs entry in favor of the common code.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Expose new counters using the get protocol stats callback.
We expose the following counters:
|------------------------------------------------------------------------|
| Name | IB | EN | Description |
|------------------------------------------------------------------------|
|rx_write_requests | + | - | Number of received WRITE requests for |
| | | | the associated QP. |
|------------------------------------------------------------------------|
|rx_read_requests | + | - | Number of received READ requests for |
| | | | the associated QP. |
|------------------------------------------------------------------------|
|rx_atomic_requests | + | - | Number of received ATOMIC requests for |
| | | | the associated QP. |
|------------------------------------------------------------------------|
|out_of_buffer | + | + | Number of drops occurred due to lack |
| | | | of WQE for the associated QPs/RQs. |
|------------------------------------------------------------------------|
|out_of_sequence | + | - | Number of errors in the packet |
| | | | transport sequence number |
|------------------------------------------------------------------------|
|duplicate_request | + | + | Number of received duplicated packets. |
| | | | A request that previously executed is |
| | | | named duplicated. |
|------------------------------------------------------------------------|
|rnr_nak_retry_err | + | + | Number of received RNR NAC packets. |
| | | | The QP retry limit did not exceed. |
|------------------------------------------------------------------------|
|packet_seq_err | + | + | Number of received NAK - sequence error|
| | | | packets. The QP retry limit did not |
| | | | exceed. |
|------------------------------------------------------------------------|
|implied_nak_err | + | + | Number of times the requester detected |
| | | | an ACK with a PSN larger than expected |
| | | | PSN for RDMA READ or ATOMIC response |
| | | | The QP retry limit did not exceed. |
|------------------------------------------------------------------------|
|local_ack_timeout_err| + | - | Number of NO ACK responses from |
| | | | responder within timer interval. |
| | | | The QP retry limit did not exceed. |
|------------------------------------------------------------------------|
Counters are available if all of them are supported.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to support statistics for ports, we attach
each QP to a counter set which is dedicate to this port.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently, the SRQ API uses the obsolete mlx5_*_srq_mbox_{in,out}
structs which limit the ability to pass the SRQ attributes between
net and IB parts of the driver.
This patch changes the SRQ API so as to use auto-generated structs
and provides a better way to pass attributes which will be in use by
coming features.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enable mlx5 based hardware to report TCP segmentation offload (TSO)
capabilities from kernel to user space. A TSO enabled NIC will accept
big chunks of data with sizes greater than MTU for TCP traffic. The TSO
engine will break the data into separate packets and will insert headers
automatically.
The capabilities are exposed to user space through query_device by uhw
directly. The following capabilities are reported:
1. The maximum payload size in bytes supported for segmentation by TSO
engine.
2. Bitmap showing which QP types are supported by TSO operation. The bitmap
is built by members from 'enmu ib_qp_type'. For example, similar code
should be performed if UD QP is supported:
supported_qpts |= 1 << IB_QPT_UD;
To make user-space library aware of whether kernel supports uhw or not, a
new flag: cmds_supp_uhw will be returned back to user-space through
alloc_ucontext.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enable flow steering for IPv6 traffic by using an IPv6 spec.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver exposes interfaces that directly relate to HW state.
Upon fatal error, consumers of these interfaces (ULPs) that rely
on completion of all their posted work-request could hang, thereby
introducing dependencies in shutdown order. To prevent this from
happening, we manage the relevant resources (CQs, QPs) that are used
by the device. Upon a fatal error, we now generate simulated
completions for outstanding WQEs that were not completed at the
time the HW was reset.
It includes invoking the completion event handler for all involved
CQs so that the ULPs will poll those CQs. When polled we return
simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs
to clean up their resources and not wait forever for completions
upon receiving remove_one.
The above change requires an extra check in the data path to make
sure that when device is in error state, the simulated CQEs will
be returned and no further WQEs will be posted.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Implements the IB core disassociate_ucontext API.
The driver detaches the HW resources for a given user context to
prevent a dependency between application termination and device
disconnect. This is done by managing the VMAs that were mapped
to the HW bars such as doorbell and blueflame. When need to detach,
remap them to an arbitrary kernel page returned by the zap API.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add support for Raw Ethernet RX HASH QP. Currently, creation and
destruction of such a QP are supported. This QP is implemented as
a simple TIR object which points to the receive RQ indirection table.
The given hashing configuration is used to configure the TIR and by
that it chooses the right RQ from the RQ indirection table.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Some mlx5 based hardwares support a RQ table object. This RQ table
points to a few RQ objects. We implement the receive work queue
indirection table API (create and destroy) by using this hardware
object.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A QP can be created without internal WQs "packaged" inside it,
this QP can be configured to use "external" WQ object as its
receive/send queue.
WQ is a necessary component for RSS technology since RSS mechanism
is supposed to distribute the traffic between multiple
Receive Work Queues
Receive WQs are implemented by RQs.
Implement the WQ creation, modification and destruction verbs.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
port_xmit_data is written instead of port_rcv_data.
Fixes: 3efd9a1121 ('IB/mlx5: Modify MAD reading counters method to use counter registers')
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If the caller specified IB_SEND_FENCE in the send flags of the work
request and no previous work request stated that the successive one
should be fenced, the work request would be executed without a fence.
This could result in RDMA read or atomic operations failure due to a MR
being invalidated. Fix this by adding the mlx5 enumeration for fencing
RDMA/atomic operations and fix the logic to apply this.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Userspace flag IBV_QP_ALT_PATH is supposed to set the alternate path
including fields alt_pkey_index and alt_timeout.
Added IB_QP_PKEY_INDEX and IB_QP_TIMEOUT to the attribute mask when
calling mlx5_set_path for the alternate path to force setting the
alt_pkey_index and alt_timeout values.
Fixes: bf24481a3a7c4 ('IB/mlx5: Consider alternate path in pkey ...')
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pkey index fields in the QP context path record are extended to 16
bits, as required by IB spec (version 1.3).
This change affects all QP commands which include path records.
To enable this change, moved the free adaptive routing flag bit
(free_ar) to the most significant byte of the QP path record.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB ...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Verify that number of entries is less than device capability.
Add an appropriate warning message for error flow.
Fixes: bde51583f4 ('IB/mlx5: Add support for resize CQ')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Number of entries shouldn't be greater than the device's max
capability. This should be checked before rounding the entries number
to power of two.
Fixes: 51ee86a4af ('IB/mlx5: Fix check of number of entries...')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
BlueFlame support is reported only for PFs when the HCA capability is
on.
Fixes: 938fe83c8d ('net/mlx5_core: New device capabilities...')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When PAGE_SIZE is larger than 4K, the user shouldn't be able to query
the HCA core clock. This counter is within 4KB boundary and the
user-space shall not read information that's after this boundary.
Fixes: b368d7cb8c ('IB/mlx5: Add hca_core_clock_offset to...')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a 4-digit padding to show FW version in proper format.
Fixes: 9603b61de1 ('mlx5: Move pci device handling from...')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
FW port-change events are fired on Active <-> non Active port state
transitions only.
When the port state changes from Active to Initializing (Active ->
Down -> Initializing), a single event is fired.
The HCA transitions from Down to Initializing unless prevented from
doing so, hence the driver should also propagate events when the port
state is Initializing to consumers so they'll be aware that the port
is no longer Active and act accordingly.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Flow steering is supported by mlx5 device when the following
features are supported by firmware:
1. NIC RX flow table.
2. Device has enough flow steering levels.
3. Atomic modification of flow table entry.
4. Flow tables chaining.
To check if flow steering is supported it's enough to check
if the driver opened the mlx5 bypass namespace.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- Updates to the new Intel X722 iWARP driver
- Updates to the hfi1 driver
- Fixes for the iw_cxgb4 driver
- Misc core fixes
- Generic RDMA READ/WRITE API addition
- SRP updates
- Misc ipoib updates
- Minor mlx5 updates
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXPIAnAAoJELgmozMOVy/dEYUP/0A6NH5ptzUrwPuLrWoz8h+e
KBfJE7H09mfKBx0Rq8YmnU+pz4lk8vrMLLaqpbGN57mwO0a1lK9bgc3E6KUhQPhc
dpGEX/NG1+aILomD7M4l1yAKkG17kxFLD75cLCeaxhO76jBRWsunukqk5mT/u0EG
fUYZs1fRb9t2LDtTNhPfSFR1+dgP5S17xLhpl9ttn87hTmIuiGWR6ig2nTC7azZD
G0d7RVjohfY2sDD28YgQiUEJ+q+1ymp3XTaCZhPCVl9VCRPweEdtLKcbNWZIvClx
ewuTCgADXg8tAL/6zu6bEKqZlC17UrmVJee3csKQLT09PJkSEICeFR/ld6ONUKAF
nDhi3ySa75Xxg9VDcCnuRkXKK+/zi7oDelZuh9mvMG0JJqPK9rTZDD29j2kBf7C0
bdx4R5cI4KJWQ/GlCyi/nLiuYkmAiCugzcGnRho4ub+EJ0yX1w6n8KVYr37kFsFu
q6MCnEfArEgDpbq1wo0+9MWtqBYrnOI/XtG81Zd+6X2MW975qU85wUdUSjg6OOb1
v1osyAmFDy9A0Y80yY+l1HHrSVIvI0IAWZDfxsbCLQY8O03ZNcvxE2RsrzWd5CKL
iZsX24tjV0WR9+lORHLfAKB3DL9CcfHv/tHo7q+5iAHmIuWZGrEN22ELkwS/4X7x
d/V0XDjzs6lgQeTJ7R4B
=e+Zv
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"Primary 4.7 merge window changes
- Updates to the new Intel X722 iWARP driver
- Updates to the hfi1 driver
- Fixes for the iw_cxgb4 driver
- Misc core fixes
- Generic RDMA READ/WRITE API addition
- SRP updates
- Misc ipoib updates
- Minor mlx5 updates"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (148 commits)
IB/mlx5: Fire the CQ completion handler from tasklet
net/mlx5_core: Use tasklet for user-space CQ completion events
IB/core: Do not require CAP_NET_ADMIN for packet sniffing
IB/mlx4: Fix unaligned access in send_reply_to_slave
IB/mlx5: Report Scatter FCS device capability when supported
IB/mlx5: Add Scatter FCS support for Raw Packet QP
IB/core: Add Scatter FCS create flag
IB/core: Add Raw Scatter FCS device capability
IB/core: Add extended device capability flags
i40iw: pass hw_stats by reference rather than by value
i40iw: Remove unnecessary synchronize_irq() before free_irq()
i40iw: constify i40iw_vf_cqp_ops structure
IB/mlx5: Add UARs write-combining and non-cached mapping
IB/mlx5: Allow mapping the free running counter on PROT_EXEC
IB/mlx4: Use list_for_each_entry_safe
IB/SA: Use correct free function
IB/core: Fix a potential array overrun in CMA and SA agent
IB/core: Remove unnecessary check in ibnl_rcv_msg
IB/IWPM: Fix a potential skb leak
RDMA/nes: replace custom print_hex_dump()
...
Previously, mlx5_ib_cq_comp was executed from interrupt context.
Under heavy load, this could cause the CPU core to be in an interrupt
context too long.
Instead of executing the handler from the interrupt context we
execute it from a much friendly tasklet context.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Report Scatter FCS support when the Firmware supports as well.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enable Scatter FCS in the RQ context when the user passes
Scatter FCS create flag.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
By this patch, the user space library will be able to improve
performance using appropriate ringing DoorBell method according to the
memory type it asked for.
Currently only one mapping command is allowed for UARs:
MLX5_IB_MMAP_REGULAR_PAGE. Using this mapping, the kernel maps the
UARs to write-combining (WC) if the system supports it.
If the system is not supporting WC the UARs are mapped to
non-cached(NC). In this case the user space library can't tell which
mapping is applied.
This patch adds 2 new mapping commands: MLX5_IB_MMAP_WC_PAGE and
MLX5_IB_MMAP_NC_PAGE. For these commands the kernel maps exactly as
requested and fails if it can't.
Since there is no generic way to check if the requested memory region
can be mapped as WC, driver enables conclusive WC mapping only for
x86, PowerPC and ARM which support WC for the device's memory region.
Signed-off-by: Guy Levy <guyle@mellanox.com>
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current mlx5 code disallows mapping the free running counter of
mlx5 based hardwares when PROT_EXEC is set.
Although this behaviour is correct, Linux does add an implicit VM_EXEC
to the vm_flags if the READ_IMPLIES_EXEC bit is set in the process
personality. This happens for example if the process stack is
executable.
This causes libmlx5 to output a warning and prevents the user from
reading the free running clock.
Executing the init segment of the hardware isn't a security risk
(at least no more than executing a process own stack), so we just
prevent writes to there.
Fixes: d69e3bcf79 ('IB/mlx5: Mmap the HCA's core clock register to
user-space')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The SRP initiator allows to set max_sectors to a value that exceeds
the largest amount of data that can be mapped at once with an mlx4
HCA using fast registration and a page size of 4 KB. Hence modify
ib_map_mr_sg() such that it can map partial sg-elements. If an
sg-element has been mapped partially, let the caller know
which fraction has been mapped by adjusting *sg_offset.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
net/ipv4/ip_gre.c
Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
- A number of collected fixes for oopses, memory corruptions, deadlocks,
etc. All of these fixes are small (many only 5-10 lines), obvious,
and tested.
- Fix for the security issue related to the use of write for
bi-directional communications.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXIrZVAAoJELgmozMOVy/d/XEP/1A4Ohm7WiZMN09wvlFGSgLe
2z2tY9ILvFuiAF++VZRfYyRmHorVKHYB1tk0JTsW1Ts1DrkjExgr4LS1/YDLOC42
q8YlBZw2x7pdnD5W1MJm+HK6oNj7aZVVjEHG7QnfLUIXr57a2rBQIeeWLx24M+OS
j1yvaY/v39qvf7dwHwVjs07rh2WW9QCZn2c/552G4xz1YDdTkYBTc2WNnl0eng1f
1NqqMXhnajmNyR+Q+0+Vbcp4YWv551l5E6j9M+5nebehNtPSRb0GEIjxT3KnSGEg
AjFev3XwnRF9EkQOwgbsg7a784+UHXe15vbr1MvbzGygcQeq4NVzLl04WEHExQe1
Om0ES/i8zfRs6d5XYB5zMY8pJbdjSVM/20d+h21SQs//4JXXJrN35WVAyy8lgwrX
M3oY4t21eBQlV7oezfEZQgEEbdtccr8LILfZZmRUWPHd2ymaTWg6e4pZwtn45rlD
O/Gb11G/UT7SXgw+XiPLBj5xlQk7nRn0kGuaStR7PonkLQ9Zy1wSSptvJeGj0VWE
W6TEJnIqtv0aiJLhIQn8Ee1pCxE/ds7UPW6wT5O9R8ccEdDeIB3BDgskTNg1xPuS
I8e1o7iA9752YS3wMDhLA8PifwbmVbkGHxJUQecOBPiDcdukkM0q4/YoNYqXKLCc
nLDv3fMztpinG1L1T2LM
=MFdC
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Final set of -rc fixes for 4.6.
I've collected up a number of patches that are all pretty small with
the exception of only a couple. The hfi1 driver has a number of
important patches, and it is what really drives the line count of this
pull request up. These are all small and I've got this kernel built
and running in the test lab (I have most of the hardware, I think nes
is the only thing in this patch set that I can't say I've personally
tested and have up and running).
Summary:
- A number of collected fixes for oopses, memory corruptions,
deadlocks, etc. All of these fixes are small (many only 5-10
lines), obvious, and tested.
- Fix for the security issue related to the use of write for
bi-directional communications"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA/nes: don't leak skb if carrier down
IB/security: Restrict use of the write() interface
IB/hfi1: Use kernel default llseek for ui device
IB/hfi1: Don't attempt to free resources if initialization failed
IB/hfi1: Fix missing lock/unlock in verbs drain callback
IB/rdmavt: Fix send scheduling
IB/hfi1: Prevent unpinning of wrong pages
IB/hfi1: Fix deadlock caused by locking with wrong scope
IB/hfi1: Prevent NULL pointer deferences in caching code
MAINTAINERS: Update iser/isert maintainer contact info
IB/mlx5: Expose correct max_sge_rd limit
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
iw_cxgb4: handle draining an idle qp
iw_cxgb3: initialize ibdev.iwcm->ifname for port mapping
iw_cxgb4: initialize ibdev.iwcm->ifname for port mapping
IB/core: Don't drain non-existent rq queue-pair
IB/core: Fix oops in ib_cache_gid_set_default_gid
Currently, consumers of the flow steering infrastructure can't
choose their own flow table levels and are limited to one
flow table per level. This just waste levels.
Instead, we introduce here the possibility to use multiple
flow tables in a level. The user is free to connect these
flow tables, while following the rule (FTEs in FT of level x
could only point to FTs of level y where y > x).
In addition this patch switch the order of the create/destroy
flow tables of the NIC(vlan and main).
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For set/query MTU port firmware commands the MTU field
is 16 bits, here I changed all the "int mtu" parameters
of the functions wrapping those firmware commands to be u16.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch that added a couple of callback functions put
the declarations inside of an #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING,
which causes the build to fail if that option is disabled:
drivers/infiniband/hw/mlx5/main.c: In function 'mlx5_ib_add':
drivers/infiniband/hw/mlx5/main.c:2358:31: error: 'mlx5_ib_get_vf_config' undeclared (first use in this function)
This moves the four declarations below the #ifdef section so they
are always available.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: eff901d30e ("IB/mlx5: Implement callbacks for manipulating VFs")
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- A few minor core fixups needed for the next patch series
- The IB SRIOV series. This has bounced around for several versions.
Of note is the fact that the first patch in this series effects
the net core. It was directed to netdev and DaveM for each iteration
of the series (three versions total). Dave did not object, but did
not respond either. I've taken this as permission to move forward
with the series.
- The new Intel X722 iWARP driver
- A huge set of updates to the Intel hfi1 driver. Of particular interest
here is that we have left the driver in staging since it still has an
API that people object to. Intel is working on a fix, but getting
these patches in now helps keep me sane as the upstream and Intel's
trees were over 300 patches apart.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW8HR9AAoJELgmozMOVy/dDYMP+wSBALhIdV/pqVzdLCGfIUbK
H5agonm/3b/Oj74W30w2JYqXBFfZC2LGVJy6OwocJ3wK04v/KfZbA9G+QsOuh2hQ
Db+tFn1eoltvzrcx3k/a7x6zHGC4YyxyH9OX2B3QfRsNHeE7PG9KGp5dfEs2OH1r
WGp3jMLAsHf7o8uKpa0jyTEUEErATaTlG+YoaJ+BGHwurgCNy8ni+wAn+EAFiJ3w
iEJhcXB6KY69vkLsrLYuT9xxJn4udFJ3QEk8xdPkpLKsu+6Ue5i/eNQ19VfbpZgR
c6fTc8genfIv5S+fis+0P44u1oA7Kl2JT6IZYLi35gJ60ZmxTD+7GruWP3xX/wJ2
zuR3sTj5fjcFWenk087RSIU/EK87ONPD4g9QPdZpf3FtgleTVKk3YDlqwjqf8pgv
cO6gQ1BcOBnixJvhjNFiX1c2hvNhb3CkgObly1JBwhcCzZhLkV7BNFPbZuDHAeAx
VqzNEUse4hupkgiiuiGgudcJ4fsSxMW37kyfX9QC/qyk6YVuUDbrekcWI+MAKot7
5e5dHqFExpbn1Zgvc8yfvh88H2MUQAgaYwjanWF/qpppOPRd01nTisVQIOJn7s5C
arcWzvocpQe0GL2UsvDoWwAABXznL3bnnAoCyTWOES2RhOOcw0Ibw46Jl8FQ8gnl
2IRxQ+ltNEscb2cwi5wE
=t2Ko
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma updates from Doug Ledford:
"Round two of 4.6 merge window patches.
This is a monster pull request. I held off on the hfi1 driver updates
(the hfi1 driver is intimately tied to the qib driver and the new
rdmavt software library that was created to help both of them) in my
first pull request. The hfi1/qib/rdmavt update is probably 90% of
this pull request. The hfi1 driver is being left in staging so that
it can be fixed up in regards to the API that Al and yourself didn't
like. Intel has agreed to do the work, but in the meantime, this
clears out 300+ patches in the backlog queue and brings my tree and
their tree closer to sync.
This also includes about 10 patches to the core and a few to mlx5 to
create an infrastructure for configuring SRIOV ports on IB devices.
That series includes one patch to the net core that we sent to netdev@
and Dave Miller with each of the three revisions to the series. We
didn't get any response to the patch, so we took that as implicit
approval.
Finally, this series includes Intel's new iWARP driver for their x722
cards. It's not nearly the beast as the hfi1 driver. It also has a
linux-next merge issue, but that has been resolved and it now passes
just fine.
Summary:
- A few minor core fixups needed for the next patch series
- The IB SRIOV series. This has bounced around for several versions.
Of note is the fact that the first patch in this series effects the
net core. It was directed to netdev and DaveM for each iteration
of the series (three versions total). Dave did not object, but did
not respond either. I've taken this as permission to move forward
with the series.
- The new Intel X722 iWARP driver
- A huge set of updates to the Intel hfi1 driver. Of particular
interest here is that we have left the driver in staging since it
still has an API that people object to. Intel is working on a fix,
but getting these patches in now helps keep me sane as the upstream
and Intel's trees were over 300 patches apart"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
IB/ipoib: Allow mcast packets from other VFs
IB/mlx5: Implement callbacks for manipulating VFs
net/mlx5_core: Implement modify HCA vport command
net/mlx5_core: Add VF param when querying vport counter
IB/ipoib: Add ndo operations for configuring VFs
IB/core: Add interfaces to control VF attributes
IB/core: Support accessing SA in virtualized environment
IB/core: Add subnet prefix to port info
IB/mlx5: Fix decision on using MAD_IFC
net/core: Add support for configuring VF GUIDs
IB/{core, ulp} Support above 32 possible device capability flags
IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
net/mlx5_core: Introduce offload arithmetic hardware capabilities
net/mlx5_core: Refactor device capability function
net/mlx5_core: Fix caching ATOMIC endian mode capability
ib_srpt: fix a WARN_ON() message
i40iw: Replace the obsolete crypto hash interface with shash
IB/hfi1: Add SDMA cache eviction algorithm
IB/hfi1: Switch to using the pin query function
IB/hfi1: Specify mm when releasing pages
...
Implement the IB defined callbacks used to manipulate the policy for the
link state, set GUIDs or get statistics information. This functionality
is added into a new file that will be used to add any SRIOV related
functionality to the mlx5 IB layer.
The following callbacks have been added:
mlx5_ib_get_vf_config
mlx5_ib_set_vf_link_state
mlx5_ib_get_vf_stats
mlx5_ib_set_vf_guid
In addition, publish whether this device is based on a virtual function.
In mlx5 supported devices, virtual functions are implemented as vHCAs.
vHCAs have their own QP number space so it is possible that two vHCAs
will use a QP with the same number at the same time.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a vf parameter to mlx5_core_query_vport_counter so we can call it to
query counters of virtual functions. Also update current users of the
API.
PFs may call mlx5_core_query_vport_counter with other_vport set to
indicate that they are querying a virtual function. The virtual
function to be queried is given by the vf parameter. Virtual function
numbering is zero based so the first VF is 0 and so on. When a PF
queries its own function, the other_vport parameter is cleared.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix the condition that dictates when MAD_IFC should be used. According
to firmware specifications, MAD_IFC commands must be used only if the
ib_virt capability is off.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorenson.
2) New BPF types for per-cpu hash and arrap maps, from Alexei
Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing
of incoming TCP/UDP connections. The muxing can be done using a
BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a messaged based
interface. BPF programs can be used to determine the message
boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface
with lots of configured addresses. We were doing things like
traversing the entire address less for each address removed, and
flushing the entire netfilter conntrack table for every address as
well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for
ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis,
from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported.
From David Decotigny.
12) Introduce devlink, which can be used to configure port link types
(ethernet vs Infiniband, etc.), port splitting, and switch device
level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet
the checksum of the outer header is 'constant' (because with the
checksum field filled into the inner protocol header, the payload
of the outer frame checksums to 'zero'), and we can take advantage
of that in various ways. From Edward Cree"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
bonding: fix bond_get_stats()
net: bcmgenet: fix dma api length mismatch
net/mlx4_core: Fix backward compatibility on VFs
phy: mdio-thunder: Fix some Kconfig typos
lan78xx: add ndo_get_stats64
lan78xx: handle statistics counter rollover
RDS: TCP: Remove unused constant
RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
net: smc911x: convert pxa dma to dmaengine
team: remove duplicate set of flag IFF_MULTICAST
bonding: remove duplicate set of flag IFF_MULTICAST
net: fix a comment typo
ethernet: micrel: fix some error codes
ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
bpf, dst: add and use dst_tclassid helper
bpf: make skb->tc_classid also readable
net: mvneta: bm: clarify dependencies
cls_bpf: reset class and reuse major in da
ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
ldmvsw: Add ldmvsw.c driver code
...
Each bypass flow steering priority will be split into two priorities:
1. Priority for don't trap rules.
2. Priority for normal rules.
When user creates a flow using IB_FLOW_ATTR_FLAGS_DONT_TRAP flag, the
driver creates two flow rules, one used for receiving the traffic and
the other one for forwarding the packet to continue matching in lower
or equal priorities.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate proper context for arbitrary scatterlist registration
If ib_alloc_mr is called with IB_MR_MAP_ARB_SG, the driver
allocate a private klm list instead of a private page list.
Set the UMR wqe correctly when posting the fast registration.
Also, expose device cap IB_DEVICE_MAP_ARB_SG according to the
device id (until we have a FW bit that correctly exposes it).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
While documentation indicates that the number of translation
entries per memory key is unlimited, in practice, we can
only fit a finite amount of translation entries in a single
registration wqe (which is log_max_klm_list_size).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
These three related functions can't agree whether to put the
umrwr on the stack dirty and then memset it, or to initialize
it on the stack. Make them all agree.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Simplifies the code, and makes it more fair vs other users by using a
softirq for polling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently, the inlen field of the vendor's part of the command
doesn't match the command buffer. This happens because the inlen
accommodates ib_uverbs_cmd_hdr which is deducted from the in buffer.
This is problematic since the vendor function could be called either
from the legacy verb (where the input length mismatches the actual
length) or by the extended verb (where the length matches). The vendor
has no idea which function calls it and therefore has no way to know
how the length variable should be treated.
Fixing this by aligning the inlen to the correct length.
All vendor drivers either assumed that inlen >= sizeof(vendor_uhw_cmd)
or just failed wrongly (mlx5) and fixed in this patch.
Fixes: cfb5e088e2 ('IB/mlx5: Add CQE version 1 support to user QPs and SRQs')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Normal SRQs, unlike XRC SRQs, don't have user-index, therefore
avoid verifying it and using it.
Fixes: cfb5e088e2 ('IB/mlx5: Add CQE version 1 support to user QPs and SRQs')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds user-space support for memory windows allocation and
deallocation. It also exposes the supported types via
query_device_caps verb.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Tested-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mlx5's mkey mechanism is also used for memory windows.
The current code base uses MR (memory region) naming, which is
inaccurate. Changing MR to mkey in order to represent its different
usages more accurately.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds support for re-registration of memory regions in MLX5.
The functionality is basically the same as deregister followed by
register, but attempts to reuse the existing resources as much as
possible.
Original memory keys are kept if possible, saving the need to
communicate new ones to remote peers.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to add re-registration of memory region, some logic was
extracted to separate functions:
- ODP related logic.
- Some of the UMR WQE preparation code.
- DMA mapping.
- Umem creation.
- Creating MKey using FW interface.
- MR fields assignments after successful creation.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Now that the transmission of GSI MADs is done with the special transmission
QPs, eliminate the send buffers in the GSI receive QP.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pick the QP to use according to the wr.ud.pkey_index field in the work
request. If the QP doesn't exist, it means the P_Key is zero and the packet
would have been dropped, so just generate a completion and move on.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The emulated GSI QP's send completions are generated by multiple hardware
QPs, so their completions could arrive out of order with respect to the
order their work request were submitted.
Reorder the completions by keeping a list of the posted work request and
their completions. A newly received completion from the hardware updates
the list and marks its work request as completed. However, the completions
are only reported to the client according to the list order.
In order to support that, create a new private CQ to handle the hardware
completions.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The GSI QP emulation requires also emulating completions for transmitted
MADs. The CQ on which these completions are generated can also be used by
the hardware, and the MAD layer is free to use any CQ of the device for the
GSI QP.
Add a method for generating software completions to each mlx5 CQ. Software
completions are polled first, and generate calls to the completion handler
callback if necessary.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Whenever the P_Key table is changed, we create the required GSI
transmission QPs on-demand.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to send GSI MADs on different P_Keys, mlx5 needs different QPs to
be created, each with a different P_Key set when the QP is modified to the
INIT state.
Create QPs for each non-zero P_Key in the P_Key table.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
mlx5 creates special GSI QPs that has limited ability to control the P_Key
of transmitted packets. The sent P_Key is taken from the QP object,
similarly to what happens with regular UD QPs.
Create a software wrapper around GSI QPs that with the following patches
will be able to emulate the functionality of a GSI QP including control of
the P_Key per work request.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add debugging prints to the modify QP verb to help understand the cause a
returned error.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to create multiple GSI QPs, we need to set the source QP number to
one on all these QPs. Add the necessary definitions and infrastructure to
do that.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver checks the csum from the HW when completion arrived and marks
it in the wc->wc_flags field for the ulp drivers.
These is for packets from type IB_WC_RECV only.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to support LSO and CSUM in the TX flow the driver does the
following:
* LSO bit for the enum mlx5_ib_qp_flags was added, indicates QP that
supports LSO offloads.
* Enables the special offload when the QP is created, and enable the
special work request id (IB_WR_LSO) when comes.
* Calculates the size of the WQE according to the new WQE format that
support these offloads.
* Handles the new WQE format when arrived, sets the relevant
fields, and copies the needed data.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Modify mlx5_ib_process_mad to use PPCNT and query_vport commands
instead of MAD_IFC, as MAD_IFC is deprecated on new firmware
versions (and doesn't support RoCE anyway).
Traffic counters exist in both 32-bit and 64-bit forms.
Declaring support of extended coutners results in traffic counters
to be read in their 64-bit form only via the query_vport command.
Error counters exist only in 32-bit form and read via PPCNT command.
This commit also adds counters support in RoCE.
Signed-off-by: Meny Yossefi <menyy@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
All the device physical port access functions are implemented in the
port.c file.
We just extract the exposure of these functions from driver.h into a
dedicated header file called port.h.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.
The ATOMIC and UMR segments can't coexist together, so chose maximum out
of them.
The commit 9e65dc371b ("IB/mlx5: Fix RC transport send queue overhead
computation") was intended to update RC transport as commit messages
states, but added the code to UC transport.
Fixes: 9e65dc371b ("IB/mlx5: Fix RC transport send queue overhead computation")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
MLX5_GET64 was used on end_padding_mode, which is a 2-bit field.
This is wrong as the calculated offset is incorrect. Using MLX5_GET
instead of MLX5_GET64 to fix that.
Fixes: 0fb2ed66a1 ('IB/mlx5: Add create and destroy functionality
for Raw Packet QP')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When a Raw Ethernet QP is created, a NULL pointer PD could be used.
Fixing that by only using the PD after validating it's valid.
smatch also reported this error:
drivers/infiniband/hw/mlx5/qp.c:1629 mlx5_ib_create_qp()
error: we previously assumed 'pd' could be null (see line 1616)
Fixes: 0fb2ed66a1 ('IB/mlx5: Add create and destroy functionality for Raw Packet QP')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Older libraries that don't have all the new req_v2 fields
should be able to work as well. Today, if the library uses v2, it
will fail to allocate context since the size of reqlen is smaller
than the req_v2 size.
Fix the validation to be with the original req_v2 size and not
the current.
Fixes: f72300c56c ('IB/mlx5: Expose CQE version to user-space')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx5_ib driver supports the extended create_cq and create_qp user
verbs. In the current mechanism, a vendor supporting an extended uverb
should set the appropriate bit in the uverbs_ex_cmd_mask field.
Adding the actual support by setting the required bits in order to
support features like completion time-stamping and cross-channel.
Fixes: 972ecb8213 ('IB/mlx5: Add create_cq extended command')
Fixes: ddf9529be1 ('IB/core: Allow setting create flags in QP init
attribute')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed through the
RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to dependencies,
acknowledged by Bruce)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWoSygAAoJELgmozMOVy/dDjsP/2vbTda2MvQfkfkGEZBQdJSg
095RN0gQgCJdg78lAl8yuaK8r4VN/7uefpDtFdudH1I/Pei7X0wxN9R1UzFNG4KR
AD53lz92IVPs15328SbPR2kvNWISR9aBFQo3rlElq3Grqlp0EMn2Ou1vtu87rekF
aMllxr8Nl0uZhP+eWusOsYpJUUtwirLgRnrAyfqo2UxZh/TMIroT0TCx1KXjVcAg
dhDARiZAdu3OgSc6OsWqmH+DELEq6dFVA5F+DDBGAb8bFZqlJc7cuMHWInwNsNXT
so4bnEQ835alTbsdYtqs5DUNS8heJTAJP4Uz0ehkTh/uNCcvnKeUTw1c2P/lXI1k
7s33gMM+0FXj0swMBw0kKwAF2d9Hhus9UAN7NwjBuOyHcjGRd5q7SAnfWkvKx000
s9jVW19slb2I38gB58nhjOh8s+vXUArgxnV1+kTia1+bJSR5swvVoWRicRXdF0vh
TvLX/BjbSIU73g1TnnLNYoBTV3ybFKQ6bVdQW7fzSTDs54dsI1vvdHXi3bYZCpnL
HVwQTZRfEzkvb0AdKbcvf8p/TlaAHem3ODqtO1eHvO4if1QJBSn+SptTEeJVYYdK
n4B3l/dMoBH4JXJUmEHB9jwAvYOpv/YLAFIvdL7NFwbqGNsC3nfXFcmkVORB1W3B
KEMcM2we4bz+uyKMjEAD
=5oO7
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed
through the RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to
dependencies, acknowledged by Bruce)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
IB/mlx5: Unify CQ create flags check
IB/mlx5: Expose Raw Packet QP to user space consumers
{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
IB/mlx5: Add Raw Packet QP query functionality
IB/mlx5: Add create and destroy functionality for Raw Packet QP
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
IB/mlx5: Allocate a Transport Domain for each ucontext
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
net/mlx5_core: Add RQ and SQ event handling
net/mlx5_core: Export transport objects
IB/mlx5: Expose CQE version to user-space
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
IB/sa: Fix netlink local service GFP crash
IB/srpt: Remove redundant wc array
IB/qib: Improve ipoib UD performance
IB/mlx4: Advertise RoCE v2 support
IB/mlx4: Create and use another QP1 for RoCEv2
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
...
The create_cq() can receive creation flags which were used
differently by two commits which added create_cq extended
command and cross-channel. The merged code caused to not
accept any flags at all.
This patch unifies the check into one function and one return
error code.
Fixes: 972ecb8213 ("IB/mlx5: Add create_cq extended command")
Fixes: 051f263098 ("IB/mlx5: Add driver cross-channel support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Added Raw Packet QP modify functionality which will enable user
space consumers to use it.
Since Raw Packet QP is built of SQ and RQ sub-objects, therefore
Raw Packet QP state changes are implemented by changing the state
of the sub-objects.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When modifying a QP, the desired operation was determined in
the mlx5_core using a transition table that takes the current
state, the final state, and returns the desired operation.
Since this logic will be used for Raw Packet QP, move the
operation table to the mlx5_ib.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When the user changes the Address Vector(AV) in the modify QP, he
provides an SL. This SL should be translated to Ethernet Priority
by taking the 3 LSB bits, and modify the QP's TIS according to this
Ethernet priority.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since Raw Packet QP is composed of RQ and SQ, the IB QP's
state is derived from the sub-objects. Therefore we need
to query each one of the sub-objects, and decide on the
IB QP's state.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds support for Raw Packet QP for the mlx5 device.
Raw Packet QP, unlike other QP types, has no matching mlx5_core_qp
object but rather it is built of RQ/SQ/TIR/TIS/TD mlx5_core object.
Since the SQ and RQ work-queue (WQ) buffers are not contiguous like
other QPs, we allocate separate buffers in the user-space and pass
the address of each one of them separately to the kernel.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Extract specific IB QP fields to mlx5_ib_qp_trans structure.
The mlx5_core QP object resides in mlx5_ib_qp_base, which all QP types
inherit from. When we need to find mlx5_ib_qp using mlx5_core QP
(event handling and co), we use a pointer that resides in
mlx5_ib_qp_base.
In addition, we delete all redundant fields that weren't used anywhere
in the code:
-doorbell_qpn
-sq_max_wqes_per_wr
-sq_spare_wqes
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Transport Domain groups several TIS and TIR object. By grouping
these object, it defines wheather local loopback packets that
are sent from the TIS objects in the group are received by the
TIR objects in the same group.
Allocate a Transport Domain(TD) for each user context to be used
in the future by Raw Packet QP for Self-Loopback Control.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Per user context, work with CQE version that both the user-space
and the kernel support. Report this CQE version via the response of
the alloc_ucontext command.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enforce working with CQE version 1 when the user supports CQE
version 1 and asked to work this way.
If the user still works with CQE version 0, then use the default
CQE version to tell the Firmware that the user still works in the
older mode.
After this patch, the kernel still reports CQE version 0.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix the following sparse warning:
drivers/infiniband/hw/mlx5/main.c:1061:29: warning: symbol 'pfn' shadows
an earlier one
drivers/infiniband/hw/mlx5/main.c:1030:21: originally declared here
Fixes: d69e3bcf79 ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Maximum number of EQE capacity per CQ was mistakenly exposed
as CQE. Fix that.
Fixes: 938fe83c8d ("net/mlx5_core: New device capabilities handling")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
With several ConnectX-4 cards installed on a server, one may receive
irqn > 255 from the kernel API, which we mistakenly trim to 8bit.
This causes EQ creation failure with the following stack trace:
[<ffffffff812a11f4>] dump_stack+0x48/0x64
[<ffffffff810ace21>] __setup_irq+0x3a1/0x4f0
[<ffffffff810ad7e0>] request_threaded_irq+0x120/0x180
[<ffffffffa0923660>] ? mlx5_eq_int+0x450/0x450 [mlx5_core]
[<ffffffffa0922f64>] mlx5_create_map_eq+0x1e4/0x2b0 [mlx5_core]
[<ffffffffa091de01>] alloc_comp_eqs+0xb1/0x180 [mlx5_core]
[<ffffffffa091ea99>] mlx5_dev_init+0x5e9/0x6e0 [mlx5_core]
[<ffffffffa091ec29>] init_one+0x99/0x1c0 [mlx5_core]
[<ffffffff812e2afc>] local_pci_probe+0x4c/0xa0
Fixing it by changing of the irqn type from u8 to unsigned int to
support values > 255
Fixes: 61d0e73e0a ('net/mlx5_core: Use the the real irqn in eq->irqn')
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding flow steering support by creating a flow-table per
priority (if rules exist in the priority). mlx5_ib uses
autogrouping and thus only creates the required destinations.
Also includes adding of these flow steering utilities
1. Parsing verbs flow attributes hardware steering specs.
2. Check if flow is multicast - this is required in order to decide
to which flow table will we add the steering rule.
3. Set outer headers in flow match criteria to zeros.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch will report the tx/rx checksum cap for raw qp via the
query device results.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to ensure IB spec atomic correctness in atomic operations, if
HW is configured to host endianness, advertise IB_ATOMIC_HCA. if not,
advertise IB_ATOMIC_NONE.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add support of cross-channel functionality to mlx5
driver. This includes ability to ignore overrun for CQ
which intended for cross-channel, export device capability and
configure the QP to be sync master/slave queues.
The cross-channel enabled QP supports combination of
three possible properties:
* WQE processing on the receive queue of this QP
* WQE processing on the send queue of this QP
* WQE are supported on the send queue
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to read the HCA's current cycles register, we need
to map it to user-space. Add support to map this register
via mmap command.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pass hca_core_clock_offset to user-space is mandatory in order to
let the user-space read the free-running clock register from the
right offset in the memory mapped page.
Passing this value is done by changing the vendor's command
and response of init_ucontext to be in extensible form.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Reporting the hca_core_clock (in kHZ) and the timestamp_mask in
query_device extended verb. timestamp_mask is used by users in order
to know what is the valid range of the raw timestamps, while
hca_core_clock reports the clock frequency that is used for
timestamps.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to create a CQ that supports timestamp, mlx5 needs to
support the extended create CQ command with the timestamp flag.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Advertise RoCE support for IB/core layer and set the hardware to
work in RoCE mode.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Set the address handle and QP address path fields according to the
link layer type (IB/Eth).
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
These callbacks write into the mlx5 RoCE address table.
Upon del_gid we write a zero'd GID.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When handling a responder completion, if the link layer is Ethernet,
set the work completion network_hdr_type field according to CQE's
info and the IB_WC_WITH_NETWORK_HDR_TYPE flag.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Using the vport access functions to retrieve the Ethernet
specific information and return this information in
ib_query_device and ib_query_port.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
For Eth ports only:
Maintain a net device pointer in mlx5_ib_device and update it
upon NETDEV_REGISTER and NETDEV_UNREGISTER events if the
net-device and IB device have the same PCI parent device.
Implement the get_netdev callback to return this net device.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make the existing mlx5_ib_port_link_layer() signature match
the ib device callback signature (add port_num parameter).
Refactor it to use a sub function so that the link layer could
be queried also before the ibdev is created.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The remove_keys() logic is performed as garbage collection task. Such
task is intended to be run when no other active processes are running.
The need_resched() will return TRUE if there are user tasks to be
activated in near future.
In such case, we don't execute remove_keys() and postpone
the garbage collection work to try to run in next cycle,
in order to free CPU resources to other tasks.
The possible pseudo-code to trigger such scenario:
1. Allocate a lot of MR to fill the cache above the limit.
2. Wait a small amount of time "to calm" the system.
3. Start CPU extensive operations on multi-node cluster.
4. Expect performance degradation during MR cache shrink operation.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- "Checksum offload support in user space" enablement
- Misc cxgb4 fixes, add T6 support
- Misc usnic fixes
- 32 bit build warning fixes
- Misc ocrdma fixes
- Multicast loopback prevention extension
- Extend the GID cache to store and return attributes of GIDs
- Misc iSER updates
- iSER clustering update
- Network NameSpace support for rdma CM
- Work Request cleanup series
- New Memory Registration API
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWPO5UAAoJELgmozMOVy/dSCQP/iX2ImMZOS3VkOYKhLR3dSv8
4vTEiYIoAT1JEXiPpiabuuACwotcZcMRk9kZ0dcWmBoFusTzKJmoDOkgAYd95XqY
EsAyjqtzUGNNMjH5u5W+kdbaFdH9Ktq7IJvspRlJuvzC47Srax+qBxX01jrAkDgh
4PoA3hEa2KkvkDjY2Mhvk9EWd/uflO9Ky6o0D8jUQkWtEvKBRyDjQLk30oW6wHX9
pTWqww3dD0EXTrR+PDA88v2saKH1kZFU1Nt2eU8Bw+zlJM8hcX6U7PfRX0g3HT/J
o+7ejTdLPWFDH35gJOU+KE519f1JbwfRjPJCqbOC9IttBB7iHSbhcpQLpWv4JV1x
agdBeDA3TGQj3dHb2SkYMlWXCBp7q8UCbVGvvirTFzGSGU73sc6hhP+vCKvPQIlE
Ah5tUqD7Y3mOBjvuDeIzKMLXILd5d3cH+m7Laytrf5e7fJPmBRZyOkcMh0QVElyl
mKo+PFjghgeTFb405J7SDDw/vThVyN9HyIt7AGEzObaajzOOk9R1hkQr46XVy9TK
yi58fl85yQ2n6TWV6NRnvkQoMy/N2HAEuXk/7HtO0PabV5w3Lo0zvXB9SnVrrVEm
58FWRBYCWorVSdSacuDnPm0iz45WSRIb9G9sBlhEC93eXRq2rSBoy4RvyLeliHFH
hllyhNNolI6FJ64j07Xm
=bBIY
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is my initial round of 4.4 merge window patches. There are a few
other things I wish to get in for 4.4 that aren't in this pull, as
this represents what has gone through merge/build/run testing and not
what is the last few items for which testing is not yet complete.
- "Checksum offload support in user space" enablement
- Misc cxgb4 fixes, add T6 support
- Misc usnic fixes
- 32 bit build warning fixes
- Misc ocrdma fixes
- Multicast loopback prevention extension
- Extend the GID cache to store and return attributes of GIDs
- Misc iSER updates
- iSER clustering update
- Network NameSpace support for rdma CM
- Work Request cleanup series
- New Memory Registration API"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
IB/core, cma: Make __attribute_const__ declarations sparse-friendly
IB/core: Remove old fast registration API
IB/ipath: Remove fast registration from the code
IB/hfi1: Remove fast registration from the code
RDMA/nes: Remove old FRWR API
IB/qib: Remove old FRWR API
iw_cxgb4: Remove old FRWR API
RDMA/cxgb3: Remove old FRWR API
RDMA/ocrdma: Remove old FRWR API
IB/mlx4: Remove old FRWR API support
IB/mlx5: Remove old FRWR API support
IB/srp: Dont allocate a page vector when using fast_reg
IB/srp: Remove srp_finish_mapping
IB/srp: Convert to new registration API
IB/srp: Split srp_map_sg
RDS/IW: Convert to new memory registration API
svcrdma: Port to new memory registration API
xprtrdma: Port to new memory registration API
iser-target: Port to new memory registration API
IB/iser: Port to new fast registration API
...
No ULP uses it anymore, go ahead and remove it.
Keep only the local invalidate part of the handlers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Support the new memory registration API by allocating a
private page list array in mlx5_ib_mr and populate it when
mlx5_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by setting the exact WQE as IB_WR_FAST_REG_MR, just take the
needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx5_ib_mr)
- key (ib_reg_wr)
The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Just function declarations - no need for those
laying arround. If for some reason someone will want
FMR support in mlx5, it should be easy enough to restore
a few structs.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
<linux/highmem.h> is the placace the get the kmap type flags, asm-generic
files are generic implementations only to be used by architecture code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The field is only initialized in mlx, but never used.
If we want to add proper XRC support it should be done with a new
struct ib_xrc_wr.
This shrinks the various WR structures by another 4 bytes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Haggai Eran <haggaie@mellanox.com>
This patch split up struct ib_send_wr so that all non-trivial verbs
use their own structure which embedds struct ib_send_wr. This dramaticly
shrinks the size of a WR for most common operations:
sizeof(struct ib_send_wr) (old): 96
sizeof(struct ib_send_wr): 48
sizeof(struct ib_rdma_wr): 64
sizeof(struct ib_atomic_wr): 96
sizeof(struct ib_ud_wr): 88
sizeof(struct ib_fast_reg_wr): 88
sizeof(struct ib_bind_mw_wr): 96
sizeof(struct ib_sig_handover_wr): 80
And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:
sizeof(struct ib_fastreg_wr): 64
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Since mlx5 driver cannot rely on registration using the
reserved lkey (global_dma_lkey) it used to allocate a private
physical address lkey for each allocated pd.
Commit 96249d70dd ("IB/core: Guarantee that a local_dma_lkey
is available") just does it in the core layer so we can go ahead
and use that.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Commit 96249d70dd ("IB/core: Guarantee that a local_dma_lkey
is available") allows ULPs that make use of the local dma key to keep
working as before by allocating a DMA MR with local permissions and
converted these consumers to use the MR associated with the PD
rather then device->local_dma_lkey.
ConnectIB has some known issues with memory registration
using the local_dma_lkey (SEND, RDMA, RECV seems to work ok).
Thus don't expose support for it (remove device->local_dma_lkey
setting), and take advantage of the above commit such that no regression
is introduced to working systems.
The local_dma_lkey support will be restored in CX4 depending on FW
capability query.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since patch series "Demux IB CM requests in the rdma_cm module" the
P_Key index is taken from the work completion rather than the message
itself.
The HCA provides us with the message P_Key. In order to provide the
P_Key index, we need to look it up. Given that this is relevant only
for GSI messages (session establishments) which is less performance critical,
micro-optimize against the GSI (is_qp1) branch.
Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx5_ib_reg_user_mr() function will attempt to call clean_mr() in
its error flow even though there is never a case where the error flow
occurs with a valid MR pointer to destroy.
Remove the clean_mr() call and the incorrect comment above it.
Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding
support for MMU notifiers")
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use ib_alloc_mr with specific parameters.
Change the existing callers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.
And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).
And, fixup the only callers (iser/isert) accordingly.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.
Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The only place that assigns mr inside the loop already does a break.
So "if (mr)" will never be true here since the function initializes mr
to NULL at the top. We can just drop the extra if and break here.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Should be all the page sizes that are supported by the
device.
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx5 driver exposes device capability IB_DEVICE_LOCAL_DMA_LKEY
but does not set the the device local_dma_lkey. This breaks
rpcrdma drivers.
Query and set this lkey when creating the device resources.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We recently added BUG_ON's which were inappropriate for a condition which
should never happen. Change these to be WARN_ON_ONCE as a debugging aid.
Fixes: 4cd7c9479a ('IB/mad: Add support for additional MAD info to/from drivers')
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmveth, from Thomas Falcon
3) Move away from cached routes in ipv6, just like ipv4, from Martin
KaFai Lau.
4) Lots of new rhashtable tests, from Thomas Graf.
5) Run ingress qdisc lockless, from Alexei Starovoitov.
6) Allow servers to fetch TCP packet headers for SYN packets of new
connections, for fingerprinting. From Eric Dumazet.
7) Add mode parameter to pktgen, for testing receive. From Alexei
Starovoitov.
8) Cache access optimizations via simplifications of build_skb(), from
Alexander Duyck.
9) Move page frag allocator under mm/, also from Alexander.
10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
11) Add a counter guard in case we try to perform endless reclassify
loops in the packet scheduler.
12) Extern flow dissector to be programmable and use it in new "Flower"
classifier. From Jiri Pirko.
13) AF_PACKET fanout rollover fixes, performance improvements, and new
statistics. From Willem de Bruijn.
14) Add netdev driver for GENEVE tunnels, from John W Linville.
15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
Borkmann.
18) Add tail call support to BPF, from Alexei Starovoitov.
19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
21) Favor even port numbers for allocation to connect() requests, and
odd port numbers for bind(0), in an effort to help avoid
ip_local_port_range exhaustion. From Eric Dumazet.
22) Add Cavium ThunderX driver, from Sunil Goutham.
23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
from Alexei Starovoitov.
24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
25) Double TCP Small Queues default to 256K to accomodate situations
like the XEN driver and wireless aggregation. From Wei Liu.
26) Add more entropy inputs to flow dissector, from Tom Herbert.
27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
Jonassen.
28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
29) Track and act upon link status of ipv4 route nexthops, from Andy
Gospodarek.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
bridge: vlan: flush the dynamically learned entries on port vlan delete
bridge: multicast: add a comment to br_port_state_selection about blocking state
net: inet_diag: export IPV6_V6ONLY sockopt
stmmac: troubleshoot unexpected bits in des0 & des1
net: ipv4 sysctl option to ignore routes when nexthop link is down
net: track link-status of ipv4 nexthops
net: switchdev: ignore unsupported bridge flags
net: Cavium: Fix MAC address setting in shutdown state
drivers: net: xgene: fix for ACPI support without ACPI
ip: report the original address of ICMP messages
net/mlx5e: Prefetch skb data on RX
net/mlx5e: Pop cq outside mlx5e_get_cqe
net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
net/mlx5e: Remove extra spaces
net/mlx5e: Avoid TX CQE generation if more xmit packets expected
net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
...
In order to support alternate sized MADs (and variable sized MADs on OPA
devices) add in/out MAD size parameters to the process_mad core call.
In addition, add an out_mad_pkey_index to communicate the pkey index the driver
wishes the MAD stack to use when sending OPA MAD responses.
The out MAD size and the out MAD PKey index are required by the MAD
stack to generate responses on OPA devices.
Furthermore, the in and out MAD parameters are made generic by specifying them
as ib_mad_hdr rather than ib_mad.
Drivers are modified as needed and are protected by BUG_ON flags if the MAD
sizes passed to them is incorrect.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add max MAD size to the device immutable data set and have all drivers that
support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core.
Verify MAD size data in both the MAD core and when reading the immutable data.
OPA drivers will report alternate MAD sizes in subsequent patches.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Vendors should be able to pass vendor specific data to/from
user-space via query_device uverb. In order to do this,
we need to pass the vendors' specific udata.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently, ib_create_cq uses cqe and comp_vecotr instead
of the extendible ib_cq_init_attr struct.
Earlier patches already changed the vendors to work with
ib_cq_init_attr. This patch changes the consumers too.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.
This commit does not change any functionality.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2
Signed-off-by: Doug Ledford <dledford@redhat.com>
Previously we configured HW MTU to be netdev->mtu, actually we
need to configure netdev->mtu + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN).
Also, query MTU can not fail, hence make the relevant helper a
void functionm, add mlx5e_set_dev_port_mtu, helper function to
handle MTU setting.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethernet functionality is only available when working in ISSI > 0 mode.
Previously, the IB driver wasn't ready to work on that mode, and hence
building both the IB driver and the Ethernet functionality in the core
driver were disallowed by Kconfigs.
Now, once we have all the pre-steps in place, we can remove this limitation.
The last steps in the IB driver for getting that setup to work are:
create dummy SRQ for the driver's use (until now we could use XRC_SRQ
as SRQ and XRC_SRQ, after moving to ISSI > 0, we separate XRC SRQs from
basic SRQs) and adapt the create QP function to be compatible with ISSI > 0.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we still don't have RoCE support in mlx5, avoid
creating IB driver instance over Ethernet ports.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ISSI > 0 mode, most of the MAD_IFC command features are deprecated, and can't
be used. Therefore, when in that mode, we replace all of them with other commands
that provide the required functionality.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When working in ISSI > 0 mode, the model exposed by the device for
XRCs and SRQs is different. XRCs use XRC SRQs and plain SRQs are based
on RPM (Receive Memory Pool).
Add helper functions to create, modify, query, and arm XRC SRQs and RMPs.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The process_mad device function declares some parameters as "in". Make those
parameters const and adjust the call tree under process_mad in the various
drivers accordingly.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This is the Ethernet part of the driver for the Mellanox ConnectX(R)-4
Single/Dual-Port Adapter supporting 100Gb/s with VPI. The driver
extends the existing mlx5 driver with Ethernet functionality.
This patch contains the driver entry points but does not include
transmit and receive (see the previous patch in the series) routines.
It also adds the option MLX5_CORE_EN to Kconfig to enable/disable the
Ethernet functionality. Currently, Kconfig is programmed to make
Ethernet and Infiniband functionality mutally exclusive.
Also changed MLX5_INFINIBAND to be depandant on MLX5_CORE instead of
selecting it, since MLX5_CORE could be selected without MLX5_INFINIBAND
being selected.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Query all supported types of dev caps on driver load.
- Store the Cap data outbox per cap type into driver private data.
- Introduce new Macros to access/dump stored caps (using the auto
generated data types).
- Obsolete SW representation of dev caps (no need for SW copy for each
cap).
- Modify IB driver to use new macros for checking caps.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As David Daney pointed in mlx4_core driver [1], mlx5_core is also
misusing the DMA-API.
This patch is removing the code that vmap() memory allocated by
dma_alloc_coherent().
After this patch, users of this drivers might fail allocating resources
on memory fragmeneted systems. This will be fixed later on.
[1] - https://patchwork.ozlabs.org/patch/458531/
CC: David Daney <david.daney@cavium.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of commit 5eb620c81c "IB/core: Add helpers for uncached GID and P_Key
searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in
the ib_device.
The per port core capability flags to be added later are also immutable data to
be stored in the ib_device object.
In preparation for this create a structure for per port immutable data and
place the pkey and gid table lengths within this structure.
"get_port_immutable" is added as a mandatory device function to allow the
drivers to fill in this data.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
These KERN_<LEVEL> uses are unnecessary with pr_<level> and cause
bad logging output so remove them.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Preparation for ethernet driver.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass consumer index as a parameter to arm CQ
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preparation for ethernet driver.
These functions will be used in drivers other than mlx5_ib.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do it in one place instead of every where the function is invoked
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass 0 in the sqd_event parameter.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Re-enable on-demand paging changes with stable ABI
- Fairly large set of ocrdma HW driver fixes
- Some qib HW driver fixes
- Other miscellaneous changes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJU52nHAAoJEENa44ZhAt0hkycP/0vwYNl0JJadasSUrLm2AWje
iAePU+K7uiyxSuIU+eLnyyDV/iQ2sCjXfhQNE6FnFhH3JjwBYbhS2Q8WcjwfRWYX
iFhORQltb3spSeLdT7N1QfkMF4MtAw3ENPE3QKIgNaIQSso+J256BHrSiqr9Akwe
hwIVr0TLDO59ggPs3c083uUhC+5AMViMVgRR+N9+/59WHz2vG2WsTunpg1sYOAgC
KpUhfZW4za5xYJ0c8BOnPiSAfJasD1UdDg3oX0RPD4j/diiWAO6EOR+jkOLtODpd
8uhD8HM7ZvCKE1io5QnVdjTR/n51NaVGHk+yfWJRSvV1sw1AALtU4NVniI+E5mFs
Fwe+zUzbQ3j08TtrU0VjaeteBh2oGC0pJlkJS9+HXeDmH30/LQCMCiA5mBSSEPDg
0cPEAJgGAZqOIyyZe8jSslW/iN0cE6FDDb8+/1AZ80IfbdMxXh9FVwi7cdXn8+OQ
nXmtnSa7yzKWS9VVXDrvSzI0Y2oDv8DSDpCWGCm7bUcDXd/T5LpPR3RdSorGkYAr
O2zmuJZlCdkuKgW91O7cztNj2hTlDnJ+e+5P05KwPOb86Muum6tphLJd5b1h970X
Actx/6EX/ycSQ3ukiKW7Ksn2bYD3RLZvdbPYQM5xUjlGGWPF6nvmbZ8MqnyaLFfE
WKvx39DMGq3ktofiMkbf
=JLsF
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA updates from Roland Dreier:
- Re-enable on-demand paging changes with stable ABI
- Fairly large set of ocrdma HW driver fixes
- Some qib HW driver fixes
- Other miscellaneous changes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (43 commits)
IB/qib: Add blank line after declaration
IB/qib: Fix checkpatch warnings
IB/mlx5: Enable the ODP capability query verb
IB/core: Add on demand paging caps to ib_uverbs_ex_query_device
IB/core: Add support for extended query device caps
RDMA/cxgb4: Don't hang threads forever waiting on WR replies
RDMA/ocrdma: Fix off by one in ocrdma_query_gid()
RDMA/ocrdma: Use unsigned for bit index
RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit
RDMA/ocrdma: Update the ocrdma module version string
RDMA/ocrdma: set vlan present bit for user AH
RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure
RDMA/ocrdma: Add support for interrupt moderation
RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac
RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP
RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE
RDMA/ocrdma: Host crash on destroying device resources
RDMA/ocrdma: Report correct state in ibv_query_qp
RDMA/ocrdma: Debugfs enhancments for ocrdma driver
RDMA/ocrdma: Report correct count of interrupt vectors while registering ocrdma device
...
Re-enable the on-demand paging capability query through the
extended query device verb.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When we create an MR using reg_create, the mlx5_ib_dev pointer is not
updated on the new MR. This results in a kernel panics for ODP MRs
while handling page faults, when the mlx5_ib_update_mtt function uses
the invalid device pointer.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The current code returns success when kmalloc() fails. It should
return an error code, -ENOMEM.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Pull networking updates from David Miller:
1) More iov_iter conversion work from Al Viro.
[ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
wrong, and this pull actually adds an extra commit on top of the
branch I'm pulling to fix that up, so that the pre-merge state is
ok. - Linus ]
2) Various optimizations to the ipv4 forwarding information base trie
lookup implementation. From Alexander Duyck.
3) Remove sock_iocb altogether, from CHristoph Hellwig.
4) Allow congestion control algorithm selection via routing metrics.
From Daniel Borkmann.
5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.
6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.
7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
Florian Westphal.
8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.
9) Add BPF packet actions to packet scheduler, from Jiri Pirko.
10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.
11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
Kwok.
12) More sanely handle out-of-window dupacks, which can result in
serious ACK storms. From Neal Cardwell.
13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
Patrick McHardy, and Thomas Graf.
14) Support xmit_more in be2net, from Sathya Perla.
15) Group Policy extensions for vxlan, from Thomas Graf.
16) Remove Checksum Offload support for vxlan, from Tom Herbert.
17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
crypto: fix af_alg_make_sg() conversion to iov_iter
ipv4: Namespecify TCP PMTU mechanism
i40e: Fix for stats init function call in Rx setup
tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
ipv6: Make __ipv6_select_ident static
ipv6: Fix fragment id assignment on LE arches.
bridge: Fix inability to add non-vlan fdb entry
net: Mellanox: Delete unnecessary checks before the function call "vunmap"
cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
net: dsa: Remove redundant phy_attach()
IB/mlx4: Reset flow support for IB kernel ULPs
IB/mlx4: Always use the correct port for mirrored multicast attachments
net/bonding: Fix potential bad memory access during bonding events
tipc: remove tipc_snprintf
tipc: nl compat add noop and remove legacy nl framework
tipc: convert legacy nl stats show to nl compat
tipc: convert legacy nl net id get to nl compat
tipc: convert legacy nl net id set to nl compat
...
While commit 7e36ef8205 ("IB/core: Temporarily disable
ex_query_device uverb") is correct as it makes the extended
QUERY_DEVICE uverb (which came as part of commit 5a77abf9a9
("IB/core: Add support for extended query device caps") and commit
860f10a799 ("IB/core: Add flags for on demand paging support")) not
available to userspace, it doesn't address the initial issue regarding
ib_copy_to_udata() [1][2].
Additionally, further discussions around this new uverb seems to
conclude it would require a different data structure than the one
currently described in <rdma/ib_user_verbs.h> [3].
Both of these issues require a revert of the changes, so this patch
partially reverts commit 8cdd312cfe ("IB/mlx5: Implement the ODP
capability query verb") and commit 860f10a799 ("IB/core: Add flags
for on demand paging support") and fully reverts commit 5a77abf9a9
("IB/core: Add support for extended query device caps").
[1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps"
http://mid.gmane.org/1418733236.2779.26.camel@opteya.com
[2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb"
http://mid.gmane.org/1423067503.3030.83.camel@opteya.com
[3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask"
http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com
Cc: Eli Cohen <eli@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The return type of find_first_bit() is architecture specific,
on ARM it is 'unsigned int', while the asm-generic code used
on x86 and a lot of other architectures returns 'unsigned long'.
When building the mlx5 driver on ARM, we get a warning about
this:
infiniband/hw/mlx5/mem.c: In function 'mlx5_ib_cont_pages':
infiniband/hw/mlx5/mem.c:84:143: warning: comparison of distinct pointer types lacks a cast
m = min(m, find_first_bit(&tmp, sizeof(tmp)));
This patch changes the driver to use min_t to make it behave
the same way on all architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Implement the relevant invalidation functions (zap MTTs as needed)
* Implement interlocking (and rollback in the page fault handlers) for
cases of a racing notifier and fault.
* With this patch we can now enable the capability bits for supporting RC
send/receive/RDMA read/RDMA write, and UD send.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch implement a page fault handler (leaving the pages pinned as
of time being). The page fault handler handles initiator and responder
page faults for UD/RC transports, for send/receive operations, as well
as RDMA read/write initiator support.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* Refactor MR registration and cleanup, and fix reg_pages accounting.
* Create a work queue to handle page fault events in a kthread context.
* Register a fault handler to get events from the core for each QP.
The registered fault handler is empty in this patch, and only a later
patch implements it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The new function allows updating the page tables of a memory region
after it was created. This can be used to handle page faults and page
invalidations.
Since mlx5_ib_update_mtt will need to work from within page invalidation,
so it must not block on memory allocation. It employs an atomic memory
allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails.
In order to reuse code from mlx5_ib_populate_pas, the patch splits
this function and add the needed parameters.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch wraps together several changes needed for on-demand paging support
in the mlx5_ib_populate_pas function, and when registering memory regions.
* Instead of accepting a UMR bit telling the function to enable all
access flags, the function now accepts the access flags themselves.
* For on-demand paging memory regions, fill the memory tables from the
correct list, and enable/disable the access flags per-page according
to whether the page is present.
* A new bit is set to enable writing of access flags when using the
firmware create_mkey command.
* Disable contig pages when on-demand paging is enabled.
In addition the patch changes the UMR code to use PTR_ALIGN instead of
our own macro.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The patch adds infrastructure to query ODP capabilities in the mlx5
driver. The code will read the capabilities from the device, and
enable only those capabilities that both the driver and the device
supports. At this point ODP is not supported, so no capability is
copied from the device, but the patch exposes the global ODP device
capability bit.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>