The FPGA queue pair (QP) event fires whenever a QP on the FPGA
transitions to the error state.
At this stage, this event is unrecoverable, it may become recoverable
in the future.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Complete the transition of all WQ types to use fragmented
order-0 coherent memory instead of high-order allocations.
CQ-WQ already uses order-0.
Here we do the same for cyclic and linked-list WQs.
This allows the driver to load cleanly on systems with a highly
fragmented coherent memory.
Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet rate of 64B packets, single transmit ring, size 8K.
No degradation is sensed.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If set, the FDB table supports the forward action with a
destination list that includes a flow table.
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add pbmc and pptb in the port_access_reg_cap_mask. These two
bits determine if device supports receive buffer configuration.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.
TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.
The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.
Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5 core dirver updates for both net-next and rdma-next branches.
From Christophe JAILLET, first three patche to use kvfree where needed.
From: Or Gerlitz <ogerlitz@mellanox.com>
Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.
This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJa/fLEAAoJEEg/ir3gV/o+7jUH/3n5/Uw1LLt3TfeKArx6i0F1
3G4U5B0ha03qiDqXprwhyQ3I6lgYmRBmjcxnqmvcqOAqO4/hSsjtTR+A/mgbEDhJ
YtdekFNEX+72h/N2GIpZwChIWSE3EcMPaLYnV8TwLUgh9YSust2sCLSBbJCjxOKc
j78M8ept/bXZwTm/iJhEjtmqw0xl91rl011chCAua0iEpH3wxteDARmKABFHMQxl
I3N/x/e/astgcSCNgpO4uDf9zEIRkNdzcHPzSMJ6C2Oo5W9XiZEekfw7WKj9nXfa
G+eGckkAyCOQ/r2lZ9nA0ZUvQ2X6JISvxgohuaCNwTgsz3acTxbLnQK4YWHzQCQ=
=iHi6
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-05-17
mlx5 core dirver updates for both net-next and rdma-next branches.
From Christophe JAILLET, first three patche to use kvfree where needed.
From: Or Gerlitz <ogerlitz@mellanox.com>
Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.
This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The source e-switch owner allows a vport on one e-switch port be associated
with a rule defined on the second port e-switch.
The role of the source eswitch owner valid bit in the flow group is to
allow the firmware fail driver attempts to wild card the source eswitch
match field. If this bit is not set, the firmware ignores the source
eswitch owner field totally.
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The destination e-switch owner allows a rule in namespace of one e-switch
owner to point to a vport that is natively associated with another
e-switch owner.
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.
Under merged eswitch, both the matching on source vport and setting
destination vport can have a 2nd attribute which is the vhca id of the
eswitch owner.
For example:
esw0: {match: <src vport=1 owner=0> action: fwd to <dst vport=7, owner=1>}
is a flow set on eswitch0 matching on source vport=1 from his eswitch
and the action being fwd to dest vport=7 of eswitch1.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz Klein <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Avoid using the kernel's irq_descriptor and return IRQ vector affinity
directly from the driver.
This fixes the following build break when CONFIG_SMP=n
include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
include/linux/mlx5/driver.h:1299:13: error:
‘struct irq_desc’ has no member named ‘affinity_hint’
Fixes: 6082d9c9c9 ("net/mlx5: Fix mlx5_get_vector_affinity function")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CC: Randy Dunlap <rdunlap@infradead.org>
CC: Guenter Roeck <linux@roeck-us.net>
CC: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add routines for manipulating TLS TX offload contexts.
In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.
Add implementation for Innova TLS (FPGA-based) hardware.
These routines will be used by the TLS offload support in a later patch
mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs
to work directly with mlx5_core rather than Innova FPGA or other mlx5
acceleration providers.
In the future, when IPSec/TLS or any other acceleration gets integrated
into ConnectX chip, mlx5/accel layer will provide the integrated
acceleration, rather than the Innova one.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding the vector offset when calling to mlx5_vector2eqn() is wrong.
This is because mlx5_vector2eqn() checks if EQ index is equal to vector number
and the fact that the internal completion vectors that mlx5 allocates
don't get an EQ index.
The second problem here is that using effective_affinity_mask gives the same
CPU for different vectors.
This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
This doesn't happen when using affinity_hint mask.
Fixes: 2572cf57d7 ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
Fixes: 05e0cc84e0 ("net/mlx5: Fix get vector affinity helper function")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
- Fix RDMA uapi headers to actually compile in userspace and be more
complete
- Three shared with netdev pull requests from Mellanox:
* 7 patches, mostly to net with 1 IB related one at the back). This
series addresses an IRQ performance issue (patch 1), cleanups related to
the fix for the IRQ performance problem (patches 2-6), and then extends
the fragmented completion queue support that already exists in the net
side of the driver to the ib side of the driver (patch 7).
* Mostly IB, with 5 patches to net that are needed to support the remaining
10 patches to the IB subsystem. This series extends the current
'representor' framework when the mlx5 driver is in switchdev mode from
being a netdev only construct to being a netdev/IB dev construct. The IB
dev is limited to raw Eth queue pairs only, but by having an IB dev of
this type attached to the representor for a switchdev port, it enables
DPDK to work on the switchdev device.
* All net related, but needed as infrastructure for the rdma driver
- Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
- SRP performance updates
- IB uverbs write path cleanup patch series from Leon
- Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to
set the port for ib_srpt to listen on in configfs in order for it to be
enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
- TSO and Scatter FCS support in mlx4
- Refactor of modify_qp routine to resolve problems seen while working on new
code that is forthcoming
- More refactoring and updates of RDMA CM for containers support from Parav
- mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory'
user API features
- Infrastructure updates for the new IOCTL interface, based on increased usage
- ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit
kernel as was originally intended. See the commit messages for
extensive details
- Syzkaller bugs and code cleanups motivated by them
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJax5Z0AAoJEDht9xV+IJsacCwQAJBIgmLCvVp5fBu2kJcXMMVI
y3l2YNzAUJvDDKv1r5yTC9ugBXEkDtgzi/W/C2/5es2yUG/QeT/zzQ3YPrtsnN68
5FkiXQ35Tt7+PBHMr0cacGRmF4M3Td3MeW0X5aJaBKhqlNKwA+aF18pjGWBmpVYx
URYCwLb5BZBKVh4+1Leebsk4i0/7jSauAqE5M+9notuAUfBCoY1/Eve3DipEIBBp
EyrEnMDIdujYRsg4KHlxFKKJ1EFGItknLQbNL1+SEa0Oe0SnEl5Bd53Yxfz7ekNP
oOWQe5csTcs3Yr4Ob0TC+69CzI71zKbz6qPDILTwXmsPFZJ9ipJs4S8D6F7ra8tb
D5aT1EdRzh/vAORPC9T3DQ3VsHdvhwpUMG7knnKrVT9X/g7E+gSji1BqaQaTr/xs
i40GepHT7lM/TWEuee/6LRpqdhuOhud7vfaRFwn2JGRX9suqTcvwhkBkPUDGV5XX
5RkHcWOb/7KvmpG7S1gaRGK5kO208LgmAZi7REaJFoZB74FqSneMR6NHIH07ha41
Zou7rnxV68CT2bgu27m+72EsprgmBkVDeEzXgKxVI/+PZ1oadUFpgcZ3pRLOPWVx
rEqjHu65rlA/YPog4iXQaMfSwt/oRD3cVJS/n8EdJKXi4Qt2RDDGdyOmt74w4prM
QuLEdvJIFmwrND1KDoqn
=Ku8g
-----END PGP SIGNATURE-----
Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Doug and I are at a conference next week so if another PR is sent I
expect it to only be bug fixes. Parav noted yesterday that there are
some fringe case behavior changes in his work that he would like to
fix, and I see that Intel has a number of rc looking patches for HFI1
they posted yesterday.
Parav is again the biggest contributor by patch count with his ongoing
work to enable container support in the RDMA stack, followed by Leon
doing syzkaller inspired cleanups, though most of the actual fixing
went to RC.
There is one uncomfortable series here fixing the user ABI to actually
work as intended in 32 bit mode. There are lots of notes in the commit
messages, but the basic summary is we don't think there is an actual
32 bit kernel user of drivers/infiniband for several good reasons.
However we are seeing people want to use a 32 bit user space with 64
bit kernel, which didn't completely work today. So in fixing it we
required a 32 bit rxe user to upgrade their userspace. rxe users are
still already quite rare and we think a 32 bit one is non-existing.
- Fix RDMA uapi headers to actually compile in userspace and be more
complete
- Three shared with netdev pull requests from Mellanox:
* 7 patches, mostly to net with 1 IB related one at the back).
This series addresses an IRQ performance issue (patch 1),
cleanups related to the fix for the IRQ performance problem
(patches 2-6), and then extends the fragmented completion queue
support that already exists in the net side of the driver to the
ib side of the driver (patch 7).
* Mostly IB, with 5 patches to net that are needed to support the
remaining 10 patches to the IB subsystem. This series extends
the current 'representor' framework when the mlx5 driver is in
switchdev mode from being a netdev only construct to being a
netdev/IB dev construct. The IB dev is limited to raw Eth queue
pairs only, but by having an IB dev of this type attached to the
representor for a switchdev port, it enables DPDK to work on the
switchdev device.
* All net related, but needed as infrastructure for the rdma
driver
- Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
- SRP performance updates
- IB uverbs write path cleanup patch series from Leon
- Add RDMA_CM support to ib_srpt. This is disabled by default. Users
need to set the port for ib_srpt to listen on in configfs in order
for it to be enabled
(/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
- TSO and Scatter FCS support in mlx4
- Refactor of modify_qp routine to resolve problems seen while
working on new code that is forthcoming
- More refactoring and updates of RDMA CM for containers support from
Parav
- mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
memory' user API features
- Infrastructure updates for the new IOCTL interface, based on
increased usage
- ABI compatibility bug fixes to fully support 32 bit userspace on 64
bit kernel as was originally intended. See the commit messages for
extensive details
- Syzkaller bugs and code cleanups motivated by them"
* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
IB/mlx5: Device memory mr registration support
net/mlx5: Mkey creation command adjustments
IB/mlx5: Device memory support in mlx5_ib
net/mlx5: Query device memory capabilities
IB/uverbs: Add device memory registration ioctl support
IB/uverbs: Add alloc/free dm uverbs ioctl support
IB/uverbs: Add device memory capabilities reporting
IB/uverbs: Expose device memory capabilities to user
RDMA/qedr: Fix wmb usage in qedr
IB/rxe: Removed GID add/del dummy routines
RDMA/qedr: Zero stack memory before copying to user space
IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
IB/mlx5: Add information for querying IPsec capabilities
IB/mlx5: Add IPsec support for egress and ingress
{net,IB}/mlx5: Add ipsec helper
IB/mlx5: Add modify_flow_action_esp verb
IB/mlx5: Add implementation for create and destroy action_xfrm
IB/uverbs: Introduce ESP steering match filter
IB/uverbs: Add modify ESP flow_action
...
This change updates the mlx5 interface to create mkey
on the device.
The updates in the command mailbox include increasing the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device
memory access type and will be used when registering MR on allocated
device memory.
All the places that use the old access mode format are adjusted as
well.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds the mlx5_ib driver implementation for the device
memory allocation API.
It implements the ib_device callbacks for allocation and deallocation
operations as well as a new mmap command support which allows mapping
an allocated device memory to a VMA.
The change also adds reporting of device memory maximum size and
alignment parameters reported in device capabilities.
The allocation/deallocation operations are using new firmware
commands to allocate MEMIC memory on the device.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds querying of device memory capabilities by the mlx5_core
driver during initialization.
Device memory capabilities is a new capability type and structure
which contains the necessary data that is needed for future device
memory allocation.
The presence of this new capabilities struct is indicated in the
general capabilities struct which is queried first by the driver.
If the presence bit is set, the driver will also query the new
capabilities struct and save it in the device context.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Simple wrapper to understand if we are dealing with IPsec flow.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Current Striding RQ HW feature utilizes the RX buffers so that
there is no wasted room between the strides. This maximises
the memory utilization.
This prevents the use of build_skb() (which requires headroom
and tailroom), and demands to memcpy the packets headers into
the skb linear part.
In this patch, whenever a set of conditions holds, we apply
an RQ configuration that allows combining the use of linear SKB
on top of a Striding RQ.
To use build_skb() with Striding RQ, the following must hold:
1. packet does not cross a page boundary.
2. there is enough headroom and tailroom surrounding the packet.
We can satisfy 1 and 2 by configuring:
stride size = MTU + headroom + tailoom.
This is possible only when:
a. (MTU - headroom - tailoom) does not exceed PAGE_SIZE.
b. HW LRO is turned off.
Using linear SKB has many advantages:
- Saves a memcpy of the headers.
- No page-boundary checks in datapath.
- No filler CQEs.
- Significantly smaller CQ.
- SKB data continuously resides in linear part, and not split to
small amount (linear part) and large amount (fragment).
This saves datapath cycles in driver and improves utilization
of SKB fragments in GRO.
- The fragments of a resulting GRO SKB follow the IP forwarding
assumption of equal-size fragments.
Some implementation details:
HW writes the packets to the beginning of a stride,
i.e. does not keep headroom. To overcome this we make sure we can
extend backwards and use the last bytes of stride i-1.
Extra care is needed for stride 0 as it has no preceding stride.
We make sure headroom bytes are available by shifting the buffer
pointer passed to HW by headroom bytes.
This configuration now becomes default, whenever capable.
Of course, this implies turning LRO off.
Performance testing:
ConnectX-5, single core, single RX ring, default MTU.
UDP packet rate, early drop in TC layer:
--------------------------------------------
| pkt size | before | after | ratio |
--------------------------------------------
| 1500byte | 4.65 Mpps | 5.96 Mpps | 1.28x |
| 500byte | 5.23 Mpps | 5.97 Mpps | 1.14x |
| 64byte | 5.94 Mpps | 5.96 Mpps | 1.00x |
--------------------------------------------
TCP streams: ~20% gain
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
1. This function is not used anywhere in mlx5 driver
2. It has a memcpy statement that makes no sense and produces build
warning with gcc8
drivers/net/ethernet/mellanox/mlx5/core/transobj.c: In function 'mlx5_core_query_xsrq':
drivers/net/ethernet/mellanox/mlx5/core/transobj.c:347:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
Fixes: 01949d0109 ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move mlx5_ib dump error CQE implementation to mlx5 CQ header file in
order to use it in a downstream patch from mlx5e.
In addition, use print_hex_dump instead of manual dumping of the buffer.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move query SQ state function from mlx5_ib to mlx5_core in order to
have it in shared code.
It will be used in a downstream patch from mlx5e.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In some firmware configuration, UMR usage from Virtual Functions is restricted.
This information is published to the driver using new capability bits.
Avoid using UMRs in these cases and use the Firmware slow-path flow to create
mkeys and populate them with Virtual to Physical address translation.
Older drivers that do not have this patch, will end up using memory keys that
aren't populated with Virtual to Physical address translation that is done
part of the UMR work.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Newer NICs (ConnectX-5 and onward) can apply vlan pop or push as an
action taking place during flow steering. Add the core bits for that.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Added the following packets dropped while vport down statistics:
Rx dropped while vport down - counts packets which were steered by
e-switch to a vport, but dropped since the vport was down. This counter
will be shown on ip link tool as part of the vport rx_dropped counter.
Tx dropped while vport down - counts packets which were transmitted by
a vport, but dropped due to vport logical link down. This counter
will be shown on ip link tool as part of the vport tx_dropped counter.
The counters are read from FW by command QUERY_VNIC_ENV.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Added the following packets drop counter:
Rx steering missed dropped packets - counts packets which were dropped
due to miss on NIC rx steering rules.
This counter will be shown on ethtool as a new counter called
rx_steer_missed_packets.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for new FW command QUERY_VNIC_ENV.
The command is used by the driver to query vnic diagnostic statistics
from FW.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Implement set/get functions to configure PFC stall prevention
timeout by tunables api through ethtool.
By default the stall prevention timeout is configured to 8 sec.
Timeout range is: 80-8000 msec.
Enabling stall prevention with the auto timeout will set
the timeout to 100 msec.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add the needed capability bit and counters to device spec description.
Expose the following two counters in ethtool:
tx_pause_storm_warning_events: when the device is stalled for a period
longer than a pre-configured watermark, the counter increase, allowing
the debug utility an insight into current device status.
tx_pause_storm_error_events: when the device is stalled for a period
longer than a pre-configured timeout, the pause transmission is disabled,
and the counter increase.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fun set of conflict resolutions here...
For the mac80211 stuff, these were fortunately just parallel
adds. Trivially resolved.
In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.
In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.
The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.
The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:
====================
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
(IB/mlx5: Fix cleanup order on unload) added to for-rc and
commit b5ca15ad7e (IB/mlx5: Add proper representors support)
add as part of the devel cycle both needed to modify the
init/de-init functions used by mlx5. To support the new
representors, the new functions added by the cleanup patch
needed to be made non-static, and the init/de-init list
added by the representors patch needed to be modified to
match the init/de-init list changes made by the cleanup
patch.
Updates:
drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
prototypes added by representors patch to reflect new function
names as changed by cleanup patch
drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
stage list to match new order from cleanup patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two new parameters: max_burst_sz and typical_pkt_size (both
in bytes) to rate limit configurations.
max_burst_sz: The device will schedule bursts of packets for an
SQ connected to this rate, smaller than or equal to this value.
Value 0x0 indicates packet bursts will be limited to the device
defaults. This field should be used if bursts of packets must be
strictly kept under a certain value.
typical_pkt_size: When the rate limit is intended for a stream of
similar packets, stating the typical packet size can improve the
accuracy of the rate limiter. The expected packet size will be
the same for all SQs associated with the same rate limit index.
Ethernet driver is updated according to this change, but these two
parameters will be kept as 0 due to lacking of proper way to get the
configurations from user space which requires to change
ndo_set_tx_maxrate interface.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
(IB/mlx5: Fix cleanup order on unload) added to for-rc and
commit b5ca15ad7e (IB/mlx5: Add proper representors support)
add as part of the devel cycle both needed to modify the
init/de-init functions used by mlx5. To support the new
representors, the new functions added by the cleanup patch
needed to be made non-static, and the init/de-init list
added by the representors patch needed to be modified to
match the init/de-init list changes made by the cleanup
patch.
Updates:
drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
prototypes added by representors patch to reflect new function
names as changed by cleanup patch
drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
stage list to match new order from cleanup patch
Signed-off-by: Doug Ledford <dledford@redhat.com>
BYPASS namespace is used by the RDMA side to insert flow rules into
the vport RX flow tables. Currently only 8 priorities are exposed,
increase this to 16 to allow more flexibility. This change will also
cause the BYPASS namespace to use 32 levels (as apposed to 16 today) of
flow tables, 16 levels for regular rules and 16 for don't trap rules.
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch validates user provided input to prevent integer overflow due
to integer manipulation in the mlx5_ib_create_srq function.
Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently ESN is not supported with IPSec device offload.
This patch adds ESN support to IPsec device offload.
Implementing new xfrm device operation to synchronize offloading device
ESN with xfrm received SN. New QP command to update SA state at the
following:
ESN 1 ESN 2 ESN 3
|-----------*-----------|-----------*-----------|-----------*
^ ^ ^ ^ ^ ^
^ - marks where QP command invoked to update the SA ESN state
machine.
| - marks the start of the ESN scope (0-2^32-1). At this point move SA
ESN overlap bit to zero and increment ESN.
* - marks the middle of the ESN scope (2^31). At this point move SA
ESN overlap bit to one.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to add a context to the FPGA, we need to get both the software
transform context (which includes the keys, etc) and the
source/destination IPs (which are included in the steering
rule). Therefore, we register new set of firmware like commands for
the FPGA. Each time a rule is added, the steering core infrastructure
calls the FPGA command layer. If the rule is intended for the FPGA,
it combines the IPs information with the software transformation
context and creates the respective hardware transform.
Afterwards, it calls the standard steering command layer.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current code has one layer that executed FPGA commands and
the Ethernet part directly used this code. Since downstream patches
introduces support for IPSec in mlx5_ib, we need to provide some
abstractions. This patch refactors the accel code into one layer
that creates a software IPSec transformation and another one which
creates the actual hardware context.
The internal command implementation is now hidden in the FPGA
core layer. The code also adds the ability to share FPGA hardware
contexts. If two contexts are the same, only a reference count
is taken.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch adds V2 command support.
New fpga devices support extended features (udp encap, esn etc...), this
features require new hardware sadb format therefore we have a new version
of commands to manipulate it.
Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Current hardware decrypts and authenticates incoming ESP packets.
Subsequently, the software extracts the nexthdr field, truncates the
trailer and adjusts csum accordingly.
With this patch and a capable device, the trailer is being removed
by the hardware and the nexthdr field is conveyed via PET. This way
we avoid both the need to access the trailer (cache miss) and to
compute its relative checksum, which significantly improve
the performance.
Experiment shows that trailer removal improves the performance by
2Gbps, (netperf). Both forwarding and host-to-host configurations.
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current code assume only SA QP commands.
Refactor in order to pave the way for new QP commands:
1. Generic cmd response format.
2. SA cmd checks are in dedicated functions.
3. Aligned debug prints.
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add helper functions that check if a protocol is
part of a flow steering match criteria.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, we don't support egress flow steering namespace in mlx5
flow steering core implementation. However, when we want to encrypt
a packet, we model it as a flow steering rule in the egress path.
To overcome this, we add an empty egress namespace to flow steering.
This namespace is initialized only when ipsec support exists.
In the future, this will grow to a full blown full steering
implementation, resembling the ingress path.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The has_tag member will indicate whether a tag action was specified
in flow specification.
A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag
that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0
means that the user doesn't care about flow_tag. HW always provide
a flow_tag = 0 if all flow tags requested on a specific flow are 0.
So we need a way (in the driver) to differentiate between a user really
requesting flow_tag = 0 and a user who does not care, in order to be
able to report conflicting flow tags on a specific flow.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Create a new representor type: REP_IB. which will be initialized by an IB
device that is used as a logical representor of a eswitch vport (VF or
uplink) just like we have a net device today in switchdev mode.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In preparation for IB representors, move representors structs to a global
scope, also expose functions needed for registration, unregistration,
eswitch mode and creating a flow rule to direct traffic from SQs to the
right VF.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current implementation of create CQ requires contiguous
memory, such requirement is problematic once the memory is
fragmented or the system is low in memory, it causes for
failures in dma_zalloc_coherent().
This patch implements new scheme of fragmented CQ to overcome
this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
to allocate fragmented buffers, rather than contiguous ones.
Base the Completion Queues (CQs) on this new fragmented buffer.
It fixes following crashes:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>