Struct assignment looks more clean, and implies resetting
the not assigned fields to zero, instead of holding values
from older ring cycles.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Into the txrx header file.
The mlx5e_sq_wqe_info structure describes WQE info for the ICOSQ,
rename it to better reflect this.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Every single DUMP WQE resides in a single WQEBB.
As the pi is calculated per each one separately, there is
no real need for a contiguous room for them, allow them to populate
different WQ fragments.
This reduces WQ waste and improves its utilization.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For the static and progress context params WQEs, do the edge
filling separately.
This improves the WQ utilization, code readability, and reduces
the chance of future bugs.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
After previous modifications, the offloads are no longer called one by
one, the pi is calculated and the wqe is cleared on between of TLS and
IPSEC offloads, which doesn't quite fit mlx5e_accel_handle_tx's purpose.
This patch splits mlx5e_accel_handle_tx into two functions that
correspond to two logical phases of running offloads:
1. Before fetching a WQE. Here runs the code that can post WQEs on its
own, before the main WQE is fetched. It's the main part of TLS offload.
2. After fetching a WQE. Here runs the code that updates the WQE's
fields, but can't post other WQEs any more. It's a minor part of TLS
offload that sets the tisn field in the cseg, and eseg-based offloads
(currently IPSEC, and later patches will move GENEVE and checksum
offloads there, too).
It allows to make mlx5e_xmit take care of all actions needed to transmit
a packet in the right order, improve the structure of the code and
reduce unnecessary operations. The structure will be further improved in
the following patches (all eseg-based offloads will be moved to a single
place, and reserving space for the main WQE will happen between phase 1
and phase 2 of offloads to eliminate unneeded data movements).
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5e_udp_gso_handle_tx_skb updates the length field in the UDP header
in case of GSO. It doesn't interfere with other offloads, so do it first
to simplify further restructuring of the code. This way we'll make all
independent modifications to the SKB before starting to work with WQEs.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
TLS offload may write a 32-bit field (tisn) to the cseg of the WQE. To
do that, it receives pi and wqe pointers. As TLS offload may also send
additional WQEs, it has to update pi and wqe, and in many cases it even
doesn't use pi calculated before and wqe zeroed before and does it
itself. Also, mlx5e_sq_xmit has to copy the whole cseg if it goes to the
mlx5e_fill_sq_frag_edge flow. This all is not efficient.
It's more efficient to do the following:
1. Just return tisn from TLS offload and make the caller fill it in a
more appropriate place.
2. Calculate pi and clear wqe after calling TLS offload.
3. If TLS offload has to send WQEs, calculate pi and clear wqe just
before that. It's already done in all places anyway, so this commit
allows to remove some redundant memsets and calls.
Copying of cseg will be eliminated in one of the following commits, and
all other stuff is done here.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
IPSEC offload needs to modify the eseg of the WQE that is being filled,
but it receives a pointer to the whole WQE. To make the contract
stricter, pass only the pointer to the eseg of that WQE. This commit is
preparation for the following refactoring of offloads in the TX path and
for the MPWQE support.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5e_sq_xmit and mlx5i_sq_xmit always return NETDEV_TX_OK. Drop the
return value to simplify the code.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Both INNOVA and ConnectX TLS offloads perform the same checks in the
beginning. Unify them to reduce repeating code. Do WARN_ON_ONCE on
netdev mismatch and finish with an error in both offloads, not only in
the ConnectX one.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
TLS and IPSEC offloads currently return struct sk_buff *, but the value
is either NULL or the same skb that was passed as a parameter. Return
bool instead to provide stronger guarantees to the calling code (it
won't need to support handling a different SKB that could be potentially
returned before this change) and to simplify restructuring this code in
the following commits.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This merge includes updates to bonding driver needed for the rdma stack,
to avoid conflicts with the RDMA branch.
Maor Gottlieb Says:
====================
Bonding: Add support to get xmit slave
The following series adds support to get the LAG master xmit slave by
introducing new .ndo - ndo_get_xmit_slave. Every LAG module can
implement it and it first implemented in the bond driver.
This is follow-up to the RFC discussion [1].
The main motivation for doing this is for drivers that offload part
of the LAG functionality. For example, Mellanox Connect-X hardware
implements RoCE LAG which selects the TX affinity when the resources
are created and port is remapped when it goes down.
The first part of this patchset introduces the new .ndo and add the
support to the bonding module.
The second part adds support to get the RoCE LAG xmit slave by building
skb of the RoCE packet based on the AH attributes and call to the new
.ndo.
The third part change the mlx5 driver driver to set the QP's affinity
port according to the slave which found by the .ndo.
====================
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The NL_SET_ERR_MSG_MOD macro is used to report a string describing an
error message to userspace via the netlink extended ACK structure. It
should not have a trailing newline.
Add a cocci script which catches cases where the newline marker is
present. Using this script, fix the handful of cases which accidentally
included a trailing new line.
I couldn't figure out a way to get a patch mode working, so this script
only implements context, report, and org.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following coccicheck warning:
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c:1396:5-8: Unneeded
variable: "err". Return "0" on line 1411
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver
that the frontend does not need counters, this hw stats type request
never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests
the driver to disable the stats, however, if the driver cannot disable
counters, it bails out.
TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_*
except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED
(this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between
TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*.
Fixes: 319a1d1947 ("flow_offload: check for basic action hw stats type")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following coccicheck warning:
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c:1238:5-8: Unneeded
variable: "err". Return "0" on line 1252
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ENOSPC is set the idx is still valid and gets set to the global
MLX4_SINK_COUNTER_INDEX. However gcc's static analysis cannot tell that
ENOSPC is impossible from mlx4_cmd_imm() and gives this warning:
drivers/net/ethernet/mellanox/mlx4/main.c:2552:28: warning: 'idx' may be
used uninitialized in this function [-Wmaybe-uninitialized]
2552 | priv->def_counter[port] = idx;
Also, when ENOSPC is returned mlx4_allocate_default_counters should not
fail.
Fixes: 6de5f7f6a1 ("net/mlx4_core: Allocate default counter per port")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add function to get the device physical port of the lag slave.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The lag lock could be a spin lock, the critical section is short
and there is no need that the thread will sleep.
Change the lock that protects the LAG structure from mutex
to spin lock. It is required for next patch that need to
access this structure from context that we can't sleep.
In addition there is no need to hold this lock when query the
congestion counters.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vregion helpers to get min and max priority depend on the correct
ordering of vchunks in the vregion list. However, the current code
always adds new chunk to the end of the list, no matter what the
priority is. Fix this by finding the correct place in the list and put
vchunk there.
Fixes: 22a677661f ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Add release all pages support, From Eran.
to release all FW pages at once on driver unload, when supported by FW.
2) From Maxim and Tariq, Trivial Data path cleanup and code improvements
in preparation for their next features, TLS offload and TX performance
improvements
3) Multiple cleanups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl6rBpYACgkQSD+KveBX
+j5L4Qf+MA17+ENeqPfMLRKHtSn9D50M3+8uDuYd4VK0uqQIQHbBpxHNM4FLa1sI
WTBx/HnFKq5eZdOvYiZExVxpOcBWk+KVoIq8r4IPHsCU3Y2BqQOc6qqbi8haQ7J8
fgrdi+gbS02N8MD45uUbNP/8JhZxN+4s0uEaH9cQ68sSorZOF1VtExAttTpQoqso
Zur9gQH3MfYmQBPbr7mj4OsKiho7cb17UPadASyiLjvD7QDJ+++73PHF5YCGuSy0
HTSvdBZOesBzaeD7qTwdq3bFILYiuiNmMlLotdvXmuAceXfa7lO8bqvwlc8an+No
UMOBELwlw6tqVjSohMtqlbEMS9PNXQ==
=XSFa
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-04-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-04-30
1) Add release all pages support, From Eran.
to release all FW pages at once on driver unload, when supported by FW.
2) From Maxim and Tariq, Trivial Data path cleanup and code improvements
in preparation for their next features, TLS offload and TX performance
improvements
3) Multiple cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the old SPAN API now that matchall-based and flower-based
mirroring were converted to use the new API.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As previously explained, each port whose outgoing traffic is analyzed
needs to have an egress mirror buffer.
The size of the egress mirror buffer is calculated based on various
parameters, two of which are the speed and the MTU of the port.
Therefore, when the MTU or the speed of a port change, the SPAN code is
called to see if the egress mirror buffer of the port needs to be
adjusted.
Currently, this is done by traversing all the SPAN agents and for each
SPAN agent the list of bound ports is traversed.
Instead of the above, traverse the recently added list of analyzed
ports.
This will later allow us to remove the old SPAN API.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In flower-based mirroring, mirroring is done with ACLs and the SPAN
agent is not bound to a port. Instead its identifier is specified in an
ACL action.
Convert this type of mirroring to use the new API.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In matchall-based mirroring, mirroring is not done with ACLs, but a SPAN
agent is bound to the ingress / egress of a port and all incoming /
outgoing traffic is mirrored.
Convert this type of mirroring to use the new API.
First the SPAN agent is resolved, then the port is marked as analyzed
and its egress mirror buffer is potentially allocated. Lastly, the
binding is performed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, a SPAN agent can only be bound to a per-port trigger where
the trigger is either an incoming packet (INGRESS) or an outgoing packet
(EGRESS) to / from the port.
A follow-up patch set will introduce the concept of global triggers and
per-{port, TC} enablement. With global triggers, the trigger entry is
only keyed by a trigger and not by a port and a trigger. The trigger can
be, for example, a packet that was early dropped.
While the binding between the SPAN agent and the trigger is performed
only once, the trigger entry needs to be reference counted, as the
trigger can be enabled on multiple ports.
Add APIs to bind / unbind a SPAN agent to a trigger and reference count
the trigger entry in preparation for global triggers.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code that adjusts the egress buffer size is not symmetric at the
moment. The update is done via a call to
mlxsw_sp_span_port_buffer_update(), but the disablement is done inline
by invoking the write to SBIB register directly.
Wrap the disablement code in mlxsw_sp_span_port_buffer_disable().
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Next patch will introduce mlxsw_sp_span_port_buffer_disable() function
that disables the egress buffer on an analyzed port. Rename the opposite
function that updates the buffer on an analyzed port accordingly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An analyzed port is a port whose incoming / outgoing traffic is mirrored
to a SPAN agent and analyzed on a remote server.
A port can be analyzed by multiple tc filters and therefore the
corresponding analyzed port entry needs to be reference counted. This is
significant because ports whose outgoing traffic is analyzed need to
have an egress mirror buffer.
Add APIs to get / put an analyzed port. Allocate an egress mirror buffer
on a port when it is first inspected at egress and free the buffer when
it is no longer inspected at egress.
Protect the list of analyzed ports with a mutex, as a later patch will
traverse it from a context in which RTNL lock is not held.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given a netdev that packets should be mirrored to, create a SPAN agent
and return its identifier to the caller.
The SPAN agent is reference counted, as multiple tc-mirred actions can
point to the same destination netdev.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In our fast-path design, a WQE (Work Queue Element) must not cross the
page boundary. To enforce that, for WQEs consisting of more than one BB
(Basic Block), the driver checks the available contiguous space in the
WQ in advance, and if it's not enough, it pads it with NOPs.
This patch modifies the code that calculates the position of next WQE,
considering the padding, and prepares the WQE. This code is common for
all SQ types. In this patch it's reorganized in a way that makes the
usage pattern unified for all SQ types, and makes the implementations
self-contained and look almost the same, preparing the repeating code to
further attempts to deduplicate it.
One place is left as is: mlx5e_sq_xmit and mlx5e_fill_sq_frag_edge call
inside, because it is special in a way that it may also copy WQE's cseg
and eseg when reserving space. This will be eliminated in one of the
following patches, and this place will be converted to the new approach,
too.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Structs mlx5e_txqsq and mlx5e_xdpsq contain wqe_info arrays to store
supplementary information corresponding to WQEs in the queue. Struct
mlx5e_icosq also has such an array, but it's called differently -
ico_wqe. This patch renames it to unify with the other SQs.
In addition, rename the struct to emphasize its specific usage.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There are multiple functions mlx5{e,i}_*_fetch_wqe that contain the same
code, that is repeated, because they operate on different SQ struct
types. mlx5e_sq_fetch_wqe also returns void *, instead of the concrete
WQE type.
This commit generalizes the fetch WQE operation by putting this code
into a single function. To simplify calls of the generic function in
concrete use cases, macros are provided that substitute the right WQE
size and cast the return type.
Before this patch, fetch_wqe used to calculate pi itself, but the value
was often known to the caller. This calculation is moved outside to
eliminate this unnecessary step and prepare for the fill_frag_edge
refactoring in the next patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Upon an error completion on an XDP SQ, print the offending WQE
to ease the debug process.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Error CQE was dumped only for TXQ SQs.
Generalise the function, and add usage for error completions
on ICO SQs and XDP SQs.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Even though some of the WQE control segment's field share
the same memory bits (a union of fields), prefer having the
right field name for every different usage.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If FW sets release_all_pages bit in MLX5_EVENT_TYPE_PAGE_REQUEST,
driver shall release all pages of a given function id, with no further
pages reclaim negotiation with FW nor MANAGE_PAGES commands from driver
towards FW.
Upon receiving this bit as part of pages reclaim event, driver will
initiate release all flow, in which it will iterate and release all
function's pages.
As part of driver <-> FW capabilities handshake, FW will report
release_all_pages max HCA cap bit, and driver will set the
release_all_pages bit in HCA cap.
NIC: ConnectX-4 Lx
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Test case: Simulataniously FLR 4 VFs, and measure FW release pages by
driver.
Before: 3.18 Sec
After: 0.31 Sec
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Thousands of pages are released with free_addr() function. In case of
buggy sync between FW and driver on released address, the log will be
flooded with error messages. Use mlx5_core_warn_rl() to limit it.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Factor out the fwp address release page to an helper function, will be
used in the downstream patch.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The size field in EQ is not in use.
Remove it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use existing helper API to get unique devlink port index for all
devlink port flavours.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The cited commit introduced the following coverity issue at functions
mlx5_fpga_is_ipsec_device() and mlx5_fpga_ipsec_release_sa_ctx():
- bit_and_with_zero:
accel_xfrm->attrs.action & MLX5_ACCEL_ESP_ACTION_DECRYPT is always 0.
As MLX5_ACCEL_ESP_ACTION_DECRYPT is not a bitwise flag and was wrongly
used with bitwise operation, the above expression is always zero value
as MLX5_ACCEL_ESP_ACTION_DECRYPT is zero.
Fix by using "==" comparison operator instead.
Fixes: 7dfee4b1d7 ("net/mlx5: IPsec, Refactor SA handle creation and destruction")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 updates for both net-next and rdma-next:
1) HW bits and definitions for TLS and IPsec offlaods
2) Release all pages capability bits
3) New command interface helpers and some code cleanup as a result
4) Move qp.c out of mlx5 core driver into mlx5_ib rdma driver
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Need to allocate the q counters before init_rx which needs them
when creating the rq.
Fixes: 8520fa57a4 ("net/mlx5e: Create q counters on uplink representors")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Processing commands by cmd_work_handler() while already in Internal
Error State will result in entry leak, since the handler process force
completion without doorbell. Forced completion doesn't release the entry
and event completion will never arrive, so entry should be released.
Fixes: 73dd3a4839 ("net/mlx5: Avoid using pending command interface slots")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5_cmd_flush() will trigger forced completions to all valid command
entries. Triggered by an asynch event such as fast teardown it can
happen at any stage of the command, including command initialization.
It will trigger forced completion and that can lead to completion on an
uninitialized command entry.
Setting MLX5_CMD_ENT_STATE_PENDING_COMP only after command entry is
initialized will ensure force completion is treated only if command
entry is initialized.
Fixes: 73dd3a4839 ("net/mlx5: Avoid using pending command interface slots")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In polling mode, set arm_db member to a value that will avoid CQ
event recovery by the HW.
Otherwise we might get event without completion function.
In addition,empty completion function to was added to protect from
unexpected events.
Fixes: 297cccebdc ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When mlx5_modify_header_alloc() fails, instead of printing the error
value returned, current error log prints 0.
Fix by printing correct error value returned by
mlx5_modify_header_alloc().
Fixes: 6724e66b90 ("net/mlx5: E-Switch, Get reg_c1 value on miss")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Error unwinding is done incorrectly in the cited commit.
When steering init fails, there is no need to perform steering cleanup.
When vport error exists, error cleanup should be mirror of the setup
routine, i.e. to perform steering cleanup before metadata cleanup.
This avoids the call trace in accessing uninitialized objects which are
skipped during steering_init() due to failure in steering_init().
Call trace:
mlx5_cmd_modify_header_alloc:805:(pid 21128): too many modify header
actions 1, max supported 0
E-Switch: Failed to create restore mod header
BUG: kernel NULL pointer dereference, address: 00000000000000d0
[ 677.263079] mlx5_destroy_flow_group+0x13/0x80 [mlx5_core]
[ 677.268921] esw_offloads_steering_cleanup+0x51/0xf0 [mlx5_core]
[ 677.275281] esw_offloads_enable+0x1a5/0x800 [mlx5_core]
[ 677.280949] mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core]
[ 677.287227] mlx5_devlink_eswitch_mode_set+0x1af/0x320
[ 677.293741] devlink_nl_cmd_eswitch_set_doit+0x41/0xb0
[ 677.299217] genl_rcv_msg+0x1eb/0x430
Fixes: 7983a675ba ("net/mlx5: E-Switch, Enable chains only if regs loopback is enabled")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The imm_inval_pkey field can hold four different types of data,
depends on the usage, the data could be one of the below:
- Immediate field of the received message
- Invalidate rkey
- Pkey of the packet
- Flow table metadata
Current implementation doesn't reflect the intended usage of the
field at usage time.
Reflect the different types by replace this field with a union,
modify code where this field is used to reflect its intended
usage.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The alignment value is part of the input structure, so use it and spare
extra memory allocation when is not needed.
Now, using the new ability when allocating icm for Direct-Rule
insertion.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add COPY type to modify_header action. IPsec feature is the first
feature that needs COPY steering action.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Move the code taking case of setup of flow offload into spectrum_flow.c
Do small renaming of callbacks on the way.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there are two callbacks registered: one for matchall,
one for flower. This causes the user to see "in_hw_count 2" in TC filter
dump. Because of this and also as a preparation for future matchall
offload for rules equivalent to flower-all-match, move the processing of
shared block into matchall.c. Leave only one cb for mlxsw driver
per-block.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, only the psample_group is accessed using RCU on RX path.
However, it is possible (unlikely) that other sample values get change
during RX processing. Fix this by having the port->sample struct
accessed as RCU pointer, containing all sample values including
psample_group pointer. That avoids extra alloc per-port, copying the
values and the race condition described above.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the replace/destroy is going to be used later on per-block, push
the per-port rule addition/deletion into separate functions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having it in mirror_entry structure, move it to mall_entry
and set it during rule insertion.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the preparation for future changes, have the
mlxsw_sp_mall_port_sample_add() function to accept mall_entry including
all needed info originally obtained from cls and act pointers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the preparation for future changes, have the
mlxsw_sp_mall_port_mirror_add() function to accept mall_entry including
the "to_dev" originally obtained from act pointer.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On couple of places in mlxsw_sp_acl_rule_del(), block variable is not
used directly as it could be. So do it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to flower, have matchall related code in a separate file.
Do some small renaming on the way (consistent "mall" prefixes,
dropped "_tc_", dropped "_port_" where suitable).
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code around flow_block is currently mixed in spectrum_acl.c.
However, as it really does not directly relate to ACL part only,
push the bits into a separate file.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The acl_block structure is going to be used for non-acl case - matchall
offload. So rename it accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct is defined in the header, no need to have the helpers
in the c file. Move the helpers to the header.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the following coccicheck warning:
drivers/net/ethernet/mellanox/mlx4/crdump.c:200:2-8: ERROR: missing iounmap;
ioremap on line 190 and execution via conditional on line 198
Fixes: 7ef19d3b1d ("devlink: report error once U32_MAX snapshot ids have been used")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a similar fashion to commit e99f8e7f88 ("mlxsw: Replace zero-length
array with flexible-array member"), use a flexible-array member to get a
compiler warning in case the flexible array does not occur last in the
structure.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'refcount_t' is very useful for catching over/under flows. Convert the
SPAN agent objects to use it instead of 'int' for their reference count.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To the best of my knowledge, these debug prints were never used. Remove
them.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a more meaningful name for parms() function.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use early return to avoid unnecessary nesting.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to benefit from the new napi_defer_hard_irqs feature,
we need to use napi_complete_done() variant in this driver.
RX path is already using it, this patch implements TX completion side.
mlx4_en_process_tx_cq() now returns the amount of retired packets,
instead of a boolean, so that mlx4_en_poll_tx_cq() can pass
this value to napi_complete_done().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlxsw_sp_acl_rulei_create() function is supposed to return an error
pointer from mlxsw_afa_block_create(). The problem is that these
functions both return NULL instead of error pointers. Half the callers
expect NULL and half expect error pointers so it could lead to a NULL
dereference on failure.
This patch changes both of them to return error pointers and changes all
the callers which checked for NULL to check for IS_ERR() instead.
Fixes: 4cda7d8d70 ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do mass update of port.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of rl.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of uar.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of pd.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of pagealloc.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of mr.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of mcg.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of main.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of vxlan.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of mpfs.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of gid.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of lag.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of fw.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of fs_core to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of FPGA to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of statistics to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of eq.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of ecpf.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of debugfs.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Do mass update of cq.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
In the switchdev mode, when running "cat
/sys/class/net/NIC/statistics/tx_packets", the ppcnt register is
accessed to get the latest values. But currently this command can
not get the correct values from ppcnt.
From firmware manual, before getting the 802_3 counters, the 802_3
data layout should be set to the ppcnt register.
When the command "cat /sys/class/net/NIC/statistics/tx_packets" is
run, before updating 802_3 data layout with ppcnt register, the
monitor counters are tested. The test result will decide the
802_3 data layout is updated or not.
Actually the monitor counters do not support to monitor rx/tx
stats of 802_3 in switchdev mode. So the rx/tx counters change
will not trigger monitor counters. So the 802_3 data layout will
not be updated in ppcnt register. Finally this command can not get
the latest values from ppcnt register with 802_3 data layout.
Fixes: 5c7e8bbb02 ("net/mlx5e: Use monitor counters for update stats")
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
MLX5_CORE uses the 'imply' keyword to depend on VXLAN, PTP_1588_CLOCK,
MLXFW and PCI_HYPERV_INTERFACE.
This was useful to force vxlan, ptp, etc.. to be reachable to mlx5
regardless of their config states.
Due to the changes in the cited commit below, the semantics of 'imply'
was changed to not force any restriction on the implied config.
As a result of this change, the compilation of MLX5_CORE=y and VXLAN=m
would result in undefined references, as VXLAN now would stay as 'm'.
To fix this we change MLX5_CORE to have a weak dependency on
these modules/configs and make sure they are reachable, by adding:
depend on symbol || !symbol.
For example: VXLAN=m MLX5_CORE=y, this will force MLX5_CORE to m
Fixes: def2fbffe6 ("kconfig: allow symbols implied by y to become m")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
XSK wakeup function triggers NAPI by posting a NOP WQE to a special XSK
ICOSQ. When the application floods the driver with wakeup requests by
calling sendto() in a certain pattern that ends up in mlx5e_trigger_irq,
the XSK ICOSQ may overflow.
Multiple NOPs are not required and won't accelerate the process, so
avoid posting a second NOP if there is one already on the way. This way
we also avoid increasing the queue size (which might not help anyway).
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On s390 FORCE_MAX_ZONEORDER is 9 instead of 11, thus a larger kzalloc()
allocation as done for the firmware tracer will always fail.
Looking at mlx5_fw_tracer_save_trace(), it is actually the driver itself
that copies the debug data into the trace array and there is no need for
the allocation to be contiguous in physical memory. We can therefor use
kvzalloc() instead of kzalloc() and get rid of the large contiguous
allcoation.
Fixes: f53aaa31cc ("net/mlx5: FW tracer, implement tracer logic")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Embedded CPU bit doesn't change with PCI resume/suspend.
Hence read it only once while probing the PCI device.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
netif_set_real_num_tx_queues and netif_set_real_num_rx_queues may fail.
Now that mlx5e supports handling errors in the preactivate hook, this
commit leverages that functionality to handle errors from those
functions and roll back all changes on failure.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We use mapping to save and restore the tunnel options.
Save also the tunnel options mask.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In notify HW (ring doorbell) flow, we set the bit to request a completion
on the TX descriptor.
When doing so, we should not unset other bits in the same byte.
Currently, this does not fix a real issue, as we still don't have a flow
where both MLX5_WQE_CTRL_CQ_UPDATE and any adjacent bit are set together.
Fixes: 542578c679 ("net/mlx5e: Move helper functions to a new txrx datapath header")
Fixes: 864b2d7153 ("net/mlx5e: Generalize tx helper functions for different SQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the SA handle is created and managed as part of the common
code for different IPsec supporting HW, this handle is passed to HW
to be used on Rx to identify the SA handle that was used to
return the xfrm state to stack.
The above implementation pose a limitation on managing this handle.
Refactor by moving management of this field to the specific HW code.
Downstream patches will introduce the Connect-X support for IPsec that
will use this handle differently than current implementation.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current HW counters are supported only by Innova, split the ipsec
stats group into two groups, one for HW and one for SW. And expose
the HW counters to ethtool only if Innova HW is used for IPsec offload.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the FPGA IPsec is the only hw implementation of the IPsec
acceleration api, and so the mlx5_accel_esp_create_hw_context was
wrongly made to suit this HW api, among other in its parameter list
and some of its parameter endianness.
This implementation might not be suitable for different HW.
Refactor by group and pass all function arguments of
mlx5_accel_esp_create_hw_context in common mlx5_accel_esp_xfrm_attrs
struct field of mlx5_accel_esp_xfrm struct and correct the endianness
according to the HW being called.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The cited commit relies on include <net/geneve.h> being included
implicitly prior to include "en_accel/en_accel.h".
This mandates that all files that needs to include en_accel.h
to redantantly include net/geneve.h.
Include net/geneve.h explicitly at "en_accel/en_accel.h" to avoid
undesired constrain as above.
Fixes: e3cfc7e6b7 ("net/mlx5e: TX, Add geneve tunnel stateless offload support")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the IPsec acceleration capability function is also used
at IPsec fpga capable device code.
This could cause a future bug as the acceleration layer is agnostic
to the device implementing its API.
Fix by using the IPsec FPGA capability function instead of acceleration
layer capability function in case of FPGA IPsec only related operations.
Downstream patches will add support for Connect-X IPsec, this can avoid
a future bug.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The Infrastructure Entry Delete Register (IEDR) is used to delete
entries stored in the KVD linear database. Currently, it is only
possible to delete entries of size up to 2048. Future firmware versions
will support deletion of entries of size up to 4096.
Increase the size of the field so that the driver will be able to
perform such deletions in the future, when required.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in commit fc25996e6f ("mlxsw: spectrum_router: Increase
scale of IPv6 nexthop groups"), each nexthop group is hashed by XOR-ing
the interface indexes of all the member nexthop devices.
To avoid many different nexthop groups ending up using the same key, the
above commit started hashing the interface indexes themselves before
they are XOR-ed.
However, in cases in which there are many nexthop groups that all use
the same nexthop device and only differ in the gateway IP, we can still
end up in a situation in which all the groups are using the same key.
This eventually leads to -EBUSY error from rhashtable during insertion.
Improve the situation by also making the gateway IP part of the key.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When this is enabled, UDP source port for RoCEv2 packets are defined
by software instead of firmware.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reduce the amount of kzalloc/kfree cycles by allocating
command structure in the parent function and leverage the
knowledge that set_caps() is called for HCA capabilities
only with specific HW structure as parameter to calculate
mailbox size.
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The mlx5_core doesn't need any functionality coded in qp.c, so move
that file to drivers/infiniband/ be under mlx5_ib responsibility.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The structures defined in the cmd header are not used and can be safely
removed from the driver. This patch removes that file and deletes all
relevant includes.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
mlx5 core users are encouraged to use low level API (mlx5_cmd_exec)
without the need of helper functions, do this for q counters, remove
helper functions and call mlx5_cmd_exec directly from users.
This will help reduce the total amount of code and reduction of the
mlx5_core symbol table.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
By changing debugfs to not use any QP related API, convert the debugfs
code to use MLX5_GET() directly to access QP context instead of hand
written struct.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The FPGA, SW steering and IPoIB need to have only QPN from the
mlx5_core_qp struct, so reduce memory footprint by storing QPN
directly.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Remove dependency on qp.c from the IPoIB by open coding
modify QP interface.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Remove dependency on qp.c from the FPGA by open coding
modify QP interface.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Remove dependency on qp.c from SW steering by open
coding modify QP interface.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The QP and CQ events functions do nothing except printing some debug
messages. There is nothing to do with this knowledge and such events,
so remove them.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
FPGA, IPoIB and SW steering don't need anything from the
mlx5_core_create_qp() and mlx5_core_destroy_qp() except calls
to mlx5_cmd_exec().
Let's open-code it, so we will be able to move qp.c to mlx5_ib.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Commit 9ecc2d8617 ("net/mlx4_en: add xdp forwarding and data write support")
brought another indirect call in fast path.
Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
when/if CONFIG_RETPOLINE=y
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes CT entries list corruption.
After allowing parallel insertion/removals in upper nf flow table
layer, unprotected ct entries list can be corrupted by parallel add/del
on the same flow table.
CT entries list is only used while freeing a ct zone flow table to
go over all the ct entries offloaded on that zone/table, and flush
the table.
As rhashtable already provides an api to go over all the inserted entries,
fix the race by using the rhashtable iteration instead, and remove the list.
Fixes: 7da182a998 ("netfilter: flowtable: Use work entry per offload command")
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cited patch missed to extract PCI pf number accurately for PF and VF
port flavour. It considered PCI device + function number.
Due to this, device having non zero device number shown large pfnum.
Hence, use only PCI function number; to avoid similar errors, derive
pfnum one time for all port flavours.
Fixes: f60f315d33 ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With ct clear action we should not allocate the action in hw
and not release the mod_acts parsed in advance.
It will be done when handling the ct clear action.
Fixes: 1ef3018f5a ("net/mlx5e: CT: Support clear action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Current value of nest_level, assigned from net_device lower_level value,
does not reflect the actual number of vlan headers, needed to pop.
For ex., if we have untagged ingress traffic sended over vlan devices,
instead of one pop action, driver will perform two pop actions.
To fix that, calculate nest_level as difference between vlan device and
parent device lower_levels.
Fixes: f3b0a18bb6 ("net: remove unnecessary variables and callback")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Once driver finishes flashing the firmware image, it should release it.
Fixes: 9c8bca2637 ("mlx5: Move firmware flash implementation to devlink")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When we destroy rules from slow path we need to avoid destroying
termination tables since termination tables are never created in slow
path. By doing so we avoid destroying the termination table created for the
slow path.
Fixes: d8a2034f15 ("net/mlx5: Don't use termination tables in slow path")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull networking fixes from David Miller:
1) Slave bond and team devices should not be assigned ipv6 link local
addresses, from Jarod Wilson.
2) Fix clock sink config on some at803x PHY devices, from Oleksij
Rempel.
3) Uninitialized stack space transmitted in slcan frames, fix from
Richard Palethorpe.
4) Guard HW VLAN ops properly in stmmac driver, from Jose Abreu.
5) "=" --> "|=" fix in aquantia driver, from Colin Ian King.
6) Fix TCP fallback in mptcp, from Florian Westphal. (accessing a plain
tcp_sk as if it were an mptcp socket).
7) Fix cavium driver in some configurations wrt. PTP, from Yue Haibing.
8) Make ipv6 and ipv4 consistent in the lower bound allowed for
neighbour entry retrans_time, from Hangbin Liu.
9) Don't use private workqueue in pegasus usb driver, from Petko
Manolov.
10) Fix integer overflow in mlxsw, from Colin Ian King.
11) Missing refcnt init in cls_tcindex, from Cong Wang.
12) One too many loop iterations when processing cmpri entries in ipv6
rpl code, from Alexander Aring.
13) Disable SG and TSO by default in r8169, from Heiner Kallweit.
14) NULL deref in macsec, from Davide Caratti.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
macsec: fix NULL dereference in macsec_upd_offload()
skbuff.h: Improve the checksum related comments
net: dsa: bcm_sf2: Ensure correct sub-node is parsed
qed: remove redundant assignment to variable 'rc'
wimax: remove some redundant assignments to variable result
mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLE
mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_PRIORITY
r8169: change back SG and TSO to be disabled by default
net: dsa: bcm_sf2: Do not register slave MDIO bus with OF
ipv6: rpl: fix loop iteration
tun: Don't put_page() for all negative return values from XDP program
net: dsa: mt7530: fix null pointer dereferencing in port5 setup
mptcp: add some missing pr_fmt defines
net: phy: micrel: kszphy_resume(): add delay after genphy_resume() before accessing PHY registers
net_sched: fix a missing refcnt in tcindex_init()
net: stmmac: dwmac1000: fix out-of-bounds mac address reg setting
mlxsw: spectrum_trap: fix unintention integer overflow on left shift
pegasus: Remove pegasus' own workqueue
neigh: support smaller retrans_time settting
net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entry
...
The handler for FLOW_ACTION_VLAN_MANGLE ends by returning whatever the
lower-level function that it calls returns. If there are more actions lined
up after this action, those are never offloaded. Fix by only bailing out
when the called function returns an error.
Fixes: a150201a70 ("mlxsw: spectrum: Add support for vlan modify TC action")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The handler for FLOW_ACTION_PRIORITY ends by returning whatever the
lower-level function that it calls returns. If there are more actions lined
up after this action, those are never offloaded. Fix by only bailing out
when the called function returns an error.
Fixes: 463957e3fb ("mlxsw: spectrum_flower: Offload FLOW_ACTION_PRIORITY")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shifting the integer value 1 is evaluated using 32-bit
arithmetic and then used in an expression that expects a 64-bit
value, so there is potentially an integer overflow. Fix this
by using the BIT_ULL macro to perform the shift and avoid the
overflow.
Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 13f2e64b94 ("mlxsw: spectrum_trap: Add devlink-trap policer support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The majority of the patches are cleanups, refactorings and clarity
improvements
- Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
- Lots of cleanup patches for hns
- Convert more places to use refcount
- Aggressively lock the RDMA CM code that syzkaller says isn't working
- Work to clarify ib_cm
- Use the new ib_device lifecycle model in bnxt_re
- Fix mlx5's MR cache which seems to be failing more often with the new
ODP code
- mlx5 'dynamic uar' and 'tx steering' user interfaces
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6CSr0ACgkQOG33FX4g
mxrtKg//XovbOfYAO7nC05FtGz9iEkIUBiwQOjUojgNSi6RMNDqRW1bmqKUugm1o
9nXA6tw+fueEvUNSD541SCcxkUZJzWvubO9wHB6N3Fgy68N3Vf2rKV3EBBTh99rK
Cb7rnmTTN6izRyI1wdyP2sjDJGyF8zvsgIrG2sibzLnqlyScrnD98YS0FdPZUfOa
a1mmXBN/T7eaQ4TbE3lLLzGnifRlGmZ5vxEvOQmAlOdqAfIKQdbbW7oCRLVIleso
gfQlOOvIgzHRwQ3VrFa3i6ETYtywXq7EgmQxCjqPVJQjCA79n5TLBkP1iRhvn8xi
3+LO4YCkiSJ/NjTA2d9KwT6K4djj3cYfcueuqo2MoXXr0YLiY6TLv1OffKcUIq7c
LM3d4CSwIAG+C2FZwaQrdSGa2h/CNfLAEeKxv430zggeDNKlwHJPV5w3rUJ8lT56
wlyT7Lzosl0O9Z/1BSLYckTvbBCtYcmanVyCfHG8EJKAM1/tXy5LS8btJ3e51rPu
XekR9ELrTTA2CTuoSCQGP6J0dBD2U7qO4XRCQ9N5BYLrI6RdP7Z4xYzzey49Z3Cx
JaF86eurM7nS5biUszTtwww8AJMyYicB+0VyjBfk+mhv90w8tS1vZ1aZKzaQ1L6Z
jWn8WgIN4rWY0YGQs6PiovT1FplyGs3p1wNmjn92WO0wZZ3WsmQ=
=ae+a
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"The majority of the patches are cleanups, refactorings and clarity
improvements.
This cycle saw some more activity from Syzkaller, I think we are now
clean on all but one of those bugs, including the long standing and
obnoxious rdma_cm locking design defect. Continue to see many drivers
getting cleanups, with a few new user visible features.
Summary:
- Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
- Lots of cleanup patches for hns
- Convert more places to use refcount
- Aggressively lock the RDMA CM code that syzkaller says isn't
working
- Work to clarify ib_cm
- Use the new ib_device lifecycle model in bnxt_re
- Fix mlx5's MR cache which seems to be failing more often with the
new ODP code
- mlx5 'dynamic uar' and 'tx steering' user interfaces"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
RDMA/bnxt_re: make bnxt_re_ib_init static
IB/qib: Delete struct qib_ivdev.qp_rnd
RDMA/hns: Fix uninitialized variable bug
RDMA/hns: Modify the mask of QP number for CQE of hip08
RDMA/hns: Reduce the maximum number of extend SGE per WQE
RDMA/hns: Reduce PFC frames in congestion scenarios
RDMA/mlx5: Add support for RDMA TX flow table
net/mlx5: Add support for RDMA TX steering
IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
IB/hfi1: Fix memory leaks in sysfs registration and unregistration
IB/mlx5: Move to fully dynamic UAR mode once user space supports it
IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
IB/mlx5: Extend QP creation to get uar page index from user space
IB/mlx5: Extend CQ creation to get uar page index from user space
IB/mlx5: Expose UAR object and its alloc/destroy commands
IB/hfi1: Get rid of a warning
RDMA/hns: Remove redundant judgment of qp_type
RDMA/hns: Remove redundant assignment of wc->smac when polling cq
RDMA/hns: Remove redundant qpc setup operations
RDMA/hns: Remove meaningless prints
...
Implement support for setting of packet trap group parameters by
invoking the trap_group_init() callback with the new parameters.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some packet traps are currently exposed to user space as being member of
"l3_drops" trap group, but internally they are member of a different
group.
Switch these traps to use the correct group so that they are all subject
to the same policer, as exposed to user space.
Set the trap priority of packets trapped due to loopback error during
routing to the lowest priority. Such packets are not routed again by the
kernel and therefore should not mask other traps (e.g., host miss) that
should be routed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The policer is now initialized as part of the registration with devlink,
so there is no need to initialize it before the registration.
Remove the initialization.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register supported packet trap policers with devlink and implement
callbacks to change their parameters and read their counters.
Prevent user space from passing invalid policer parameters down to the
device by checking their validity and communicating the failure via an
appropriate extack message.
v2:
* Remove the max/min validity checks from __mlxsw_sp_trap_policer_set()
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare an array of policer IDs to register with devlink and their
associated parameters.
The array is composed from both policers that are currently bound to
exposed trap groups and policers that are not bound to any trap group.
v2:
* Provide max/min rate/burst size when registering policers
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During initialization the driver configures various packet trap groups
and binds policers to them.
Currently, most of these groups are not exposed to user space and
therefore their policers should not be exposed as well. Otherwise, user
space will be able to alter policer parameters without knowing which
packet traps are policed by the policer.
Use a bitmap to track the used policer IDs so that these policers will
not be registered with devlink in a subsequent patch.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QoS Policer Configuration Register (QPCR) is used to configure
hardware policers. Extend this register with following fields and
defines which will be used by subsequent patches:
1. Violate counter: reads number of packets dropped by the policer
2. Clear counter: to ensure we start counting from 0
3. Rate and burst size limits
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet trap groups are used to aggregate logically related packet traps.
Currently, these groups allow user space to batch operations such as
setting the trap action of all member traps.
In order to prevent the CPU from being overwhelmed by too many trapped
packets, it is desirable to bind a packet trap policer to these groups.
For example, to limit all the packets that encountered an exception
during routing to 10Kpps.
Allow device drivers to bind default packet trap policers to packet trap
groups when the latter are registered with devlink.
The next patch will enable user space to change this default binding.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cited commit extended the enums 'hwtstamp_tx_types' and
'hwtstamp_rx_filters' with values that were not accounted for in the
switch statements, resulting in the build warnings below.
Fix by adding a default case.
drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c: In function ‘mlxsw_sp_ptp_get_message_types’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:915:2: warning: enumeration value ‘__HWTSTAMP_TX_CNT’ not handled in switch [-Wswitch]
915 | switch (tx_type) {
| ^~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:927:2: warning: enumeration value ‘__HWTSTAMP_FILTER_CNT’ not handled in switch [-Wswitch]
927 | switch (rx_filter) {
| ^~~~~~
Fixes: f76510b458 ("ethtool: add timestamping related string sets")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When health reporter is registered to devlink, devlink will implicitly set
auto recover if and only if the reporter has a recover method. No reason
to explicitly get the auto recover flag from the driver.
Remove this flag from all drivers that called
devlink_health_reporter_create.
All existing health reporters set auto recovery to true if they have a
recover method.
Yet, administrator can unset auto recover via netlink command as prior to
this patch.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.
$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
dst_ip 192.168.1.1
in_hw in_hw_count 2
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 10 sec used 10 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add mlx5e_rep_indr_setup_ft_cb to support indr block setup
in FT mode.
Both tc rules and flow table rules are of the same format,
It can re-use tc parsing for that, and move the flow table rules
to their steering domain(the specific chain_index), the indr
block offload in FT also follow this scenario.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Refactor indr setup block for support ft indr setup in the
next patch. The function mlx5e_rep_indr_offload exposes
'flags' in order set additional flag for FT in next patch.
Rename mlx5e_rep_indr_setup_tc_block to mlx5e_rep_indr_setup_block
and add flow_setup_cb_t callback parameters in order set the
specific callback for FT in next patch.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
eswitch_offloads_chains.{c,h} were just introduced this kernel release
cycle, eswitch is in high development demand right now and many
features are planned to be added to it. eswitch deserves its own
directory and here we move these new files to there, in preparation for
upcoming eswitch features and new files.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
In VF lag mode when remove the bonding module without bring down the
bond device first, we could potentially have circular dependency when we
unload IB devices and also handle fib events:
1. The bond work starts first;
2. The "modprobe -rv bonding" process tries to release the bond device,
with the "pernet_ops_rwsem" lock hold;
3. The bond work blocks in unregister_netdevice_notifier() and waits for
the lock because fib event came right before;
4. The kernel fib module tries to free all the fib entries by broadcasting
the "FIB_EVENT_NH_DEL" event;
5. Upon the fib event this lag_mp module holds the fib lock and queue a
fib work.
So:
bond work -> modprobe task -> kernel fib module -> lag_mp -> bond work
Today we either reload IB devices in roce lag in nic mode or either handle
fib events in switchdev mode, but a new feature could change that we'll
need to reload IB devices also in switchdev mode so this is a future proof
fix as one may not notice this later.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
mlx5: Remove uninitialized use of key in mlx5_core_create_mkey
{IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
{IB,net}/mlx5: Assign mkey variant in mlx5_ib only
{IB,net}/mlx5: Setup mkey variant before mr create command invocation
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A recent commit e893768179 ("devlink: prepare to support region
operations") used the region_cr_space_str and region_fw_health_str
variables as initializers for the devlink_region_ops structures.
This can result in compiler errors:
drivers/net/ethernet/mellanox//mlx4/crdump.c:45:10: error: initializer
element is not constant
.name = region_cr_space_str,
^
drivers/net/ethernet/mellanox//mlx4/crdump.c:45:10: note: (near
initialization for ‘region_cr_space_ops.name’)
drivers/net/ethernet/mellanox//mlx4/crdump.c:50:10: error: initializer
element is not constant
.name = region_fw_health_str,
The variables were made to be "const char * const", indicating that both
the pointer and data were constant. This was enough to resolve this on
recent GCC (gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1) for this author).
Unfortunately this is not enough for older compilers to realize that the
variable can be treated as a constant expression.
Fix this by introducing macros for the string and use those instead of
the variable name in the region ops structures.
Reported-by: tanhuazhong <tanhuazhong@huawei.com>
Fixes: e893768179 ("devlink: prepare to support region operations")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppress the following smatch errors. None of these are actually
possible with current code paths.
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1220
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol 'saddrp'.
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1220
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol
'saddr_len'.
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1221
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol
'saddr_prefix_len'.
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:1390
mlxsw_sp_netdevice_ipip_ol_reg_event() error: uninitialized symbol
'ipipt'.
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3255
mlxsw_sp_nexthop_group_update() error: uninitialized symbol 'err'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppress following warning from coccinelle:
drivers/net/ethernet/mellanox/mlxsw//switchx2.c:183:63-68: WARNING:
conversion to bool not needed here
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The static array 'mlxsw_afk_element_infos' in 'core_acl_flex_keys.h' is
copied to each file that includes the header, but not all use it. This
results in the following warnings when compiling with W=1:
drivers/net/ethernet/mellanox/mlxsw//core_acl_flex_keys.h:76:44:
warning: ‘mlxsw_afk_element_infos’ defined but not used
[-Wunused-const-variable=]
One way to suppress the warning is to mark the array with
'__maybe_unused', but another option is to remove it from the header
file entirely.
Change 'struct mlxsw_afk_element_inst' to store the key to the array
('element') instead of the array value keyed by 'element'. Adjust the
different users accordingly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In merge commit 50853808ff ("Merge branch
'mlxsw-Prepare-for-VLAN-aware-bridge-w-VxLAN'") I flipped mlxsw to use
emulated 802.1Q FIDs and correspondingly emulated VLAN RIFs. This means
that the non-emulated variants are no longer used. Remove them and
suppress the following warnings when compiling with W=1:
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:7572:38: warning:
‘mlxsw_sp_rif_vlan_ops’ defined but not used [-Wunused-const-variable=]
drivers/net/ethernet/mellanox/mlxsw//spectrum_fid.c:584:41: warning:
‘mlxsw_sp_fid_8021q_family’ defined but not used
[-Wunused-const-variable=]
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppress following warnings when compiling with W=1:
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'mlxsw_sp' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'ipip_entry' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'extack' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppress following warning when compiling with W=1:
drivers/net/ethernet/mellanox/mlxsw//i2c.c:78: warning: Function
parameter or member 'cmd' not described in 'mlxsw_i2c'
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky says:
====================
Those two patches from Michael extends mlx5_core and mlx5_ib flow steering
to support RDMA TX in similar way to already supported RDMA RX.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_tx_steering':
RDMA/mlx5: Add support for RDMA TX flow table
net/mlx5: Add support for RDMA TX steering
Add new RDMA TX flow steering namespace. Flow steering rules in
this namespace are used to filter transmitted RDMA traffic.
Link: https://lore.kernel.org/r/20200324061425.1570190-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Each snapshot created for a devlink region must have an id. These ids
are supposed to be unique per "event" that caused the snapshot to be
created. Drivers call devlink_region_snapshot_id_get to obtain a new id
to use for a new event trigger. The id values are tracked per devlink,
so that the same id number can be used if a triggering event creates
multiple snapshots on different regions.
There is no mechanism for snapshot ids to ever be reused. Introduce an
xarray to store the count of how many snapshots are using a given id,
replacing the snapshot_id field previously used for picking the next id.
The devlink_region_snapshot_id_get() function will use xa_alloc to
insert an initial value of 1 value at an available slot between 0 and
U32_MAX.
The new __devlink_snapshot_id_increment() and
__devlink_snapshot_id_decrement() functions will be used to track how
many snapshots currently use an id.
Drivers must now call devlink_snapshot_id_put() in order to release
their reference of the snapshot id after adding region snapshots.
By tracking the total number of snapshots using a given id, it is
possible for the decrement() function to erase the id from the xarray
when it is not in use.
With this method, a snapshot id can become reused again once all
snapshots that referred to it have been deleted via
DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.
This work also paves the way to introduce a mechanism for userspace to
request a snapshot.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The devlink_snapshot_id_get() function returns a snapshot id. The
snapshot id is a u32, so there is no way to indicate an error code.
A future change is going to possibly add additional cases where this
function could fail. Refactor the function to return the snapshot id in
an argument, so that it can return zero or an error value.
This ensures that snapshot ids cannot be confused with error values, and
aids in the future refactor of snapshot id allocation management.
Because there is no current way to release previously used snapshot ids,
add a simple check ensuring that an error is reported in case the
snapshot_id would over flow.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It does not makes sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.
This operation will replace the data_destructor for the snapshot
creation, and makes snapshot creation easier.
Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify the devlink region code in preparation for adding new operations
on regions.
Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).
This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.
In order to re-use the constant strings in the mlx4 driver their
declaration must be changed to 'const char * const' to ensure the
compiler realizes that both the data and the pointer cannot change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
list_for_each_entry_from_reverse() iterates backwards over the list from
the current position, but in the error path we should start from the
previous position.
Fix this by using list_for_each_entry_continue_reverse() instead.
This suppresses the following error from coccinelle:
drivers/net/ethernet/mellanox/mlxsw//spectrum_mr.c:655:34-38: ERROR:
invalid reference to the index variable of the iterator on line 636
Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offload action pedit ex munge when used with a flower classifier. Only
allow setting of DSCP, ECN, or the whole DSField in IPv4 and IPv6 packets.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QOS_ACTION is used for manipulating the QOS attributes of the packet.
Add the defines and helpers related to DSCP and ECN fields, and dscp_rw.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original idea was to reuse this set of actions for ECN rewrite as well,
but on second look, it's not such a great idea. These two items should each
have its own command. Rename the existing enum to make it obvious that it
belongs to switch_prio_cmd.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Cleanups from Dan Carpenter and wenxu.
2) Paul and Roi, Some minor updates and fixes to E-Switch to address
issues introduced in the previous reg_c0 updates series.
3) Eli Cohen simplifies and improves flow steering matching group searches
and flow table entries version management.
4) Parav Pandit, improves devlink eswitch mode changes thread safety.
By making devlink rely on driver for thread safety and introducing mlx5
eswitch mode change protection.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl58SW8ACgkQSD+KveBX
+j4AxQf8DdrFrBD0NFTcAILS4bnTJC0I3xKRPb/2oYtWLVyJ9G5XAZqHC0DAG7xs
jy8xhIFbeUxgLEdcx0la5vdR1mPlzs4XBHTe99YwzwK/jojrA7YXrlb3kv+RXWVY
uNVAby68wh4EnO61R51ahIBXLPNbiYpo/wAWKvvBKRkOcYMVTKIFiP157AnJWObY
fxnt06I0NFaIX8Va4MEqkrmUYrI4jJcqOJC9FwRBLDhFHcFkLh0Gav3vJJ7M4BCB
ggPJpuZ4pu43qX9TtSOm8V/GlWWN0RB7PdbvliFBEHYG21hf9MfE8bPPKRlB7CO+
B5+9ULhpvbjX7yRrkZ7fd4zlQ1siew==
=Flln
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-03-25
1) Cleanups from Dan Carpenter and wenxu.
2) Paul and Roi, Some minor updates and fixes to E-Switch to address
issues introduced in the previous reg_c0 updates series.
3) Eli Cohen simplifies and improves flow steering matching group searches
and flow table entries version management.
4) Parav Pandit, improves devlink eswitch mode changes thread safety.
By making devlink rely on driver for thread safety and introducing mlx5
eswitch mode change protection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently eswitch mode change is occurring from 2 different execution
contexts as below.
1. sriov sysfs enable/disable
2. devlink eswitch set commands
Both of them need to access eswitch related data structures in
synchronized manner.
Without any synchronization below race condition exist.
SR-IOV enable/disable with devlink eswitch mode change:
cpu-0 cpu-1
----- -----
mlx5_device_disable_sriov() mlx5_devlink_eswitch_mode_set()
mlx5_eswitch_disable() esw_offloads_stop()
esw_offloads_disable() mlx5_eswitch_disable()
esw_offloads_disable()
Hence, they are synchronized using a new mode_lock.
eswitch's state_lock is not used as it can lead to a deadlock scenario
below and state_lock is only for vport and fdb exclusive access.
ip link set vf <param>
netlink rcv_msg() - Lock A
rtnl_lock
vfinfo()
esw->state_lock() - Lock B
devlink eswitch_set
devlink_mutex
esw->state_lock() - Lock B
attach_netdev()
register_netdev()
rtnl_lock - Lock A
Alternatives considered:
1. Acquiring rtnl lock before taking esw->state_lock to follow similar
locking sequence as ip link flow during eswitch mode set.
rtnl lock is not good idea for two reasons.
(a) Holding rtnl lock for several hundred device commands is not good
idea.
(b) It leads to below and more similar deadlocks.
devlink eswitch_set
devlink_mutex
rtnl_lock - Lock A
esw->state_lock() - Lock B
eswitch_disable()
reload()
ib_register_device()
ib_cache_setup_one()
rtnl_lock()
2. Exporting devlink lock may lead to undesired use of it in vendor
driver(s) in future.
3. Unloading representors outside of the mode_lock requires
serialization with other process trying to enable the eswitch.
4. Differing the representors life cycle to a different workqueue
requires synchronization with func_change_handler workqueue.
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Subsequent patch protects eswitch mode changes across sriov and devlink
interfaces. It is desirable for eswitch to provide thread safe eswitch
enable and disable APIs.
Hence, extend eswitch enable API to optionally update num_vfs when
requested.
In subsequent patch, eswitch num_vfs are updated after all the eswitch
users eswitch drops its reference count.
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to check eswitch state under a lock, prepare code to split
capability check and eswitch state check into two helper functions.
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5_register_device() doesn't check for any error and always returns 0.
Simplify mlx5_register_device() to return void and its caller, remove
dead code related to it.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Group version is used when modifying a rule is allowed
(FLOW_ACT_NO_APPEND is clear) to detect a case where the rule was found
but while the groups where unlocked a new FTE was added. In this case,
the added FTE could be one with identical match value so we need to
attempt again with group lock held.
Change the code so version is retrieved only when FLOW_ACT_NO_APPEND is
cleared. As result, later compare can also be avoided if FLOW_ACT_NO_APPEND
is cleared.
Also improve comments text.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
FTE version is not used anywhere in the code so avoid incrementing it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When adding a rule to a flow group we need increment the version of the
group. Current code fails to do that and as a result, when trying to add
a rule, we will fail to discover a case where an FTE with the same match
value was added while we scanned the groups of the same match criteria,
thus we may try to add an FTE that was already added.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of using two different structs for searching groups with the
same match, use a single struct and thus simplify the code, make it more
readable and smaller size which means less code cache misses.
text data bss dec hex
before: 35524 2744 0 38268 957c
after: 35038 2744 0 37782 9396
When testing add 70000 rules, delete all the rules, and repeat three
times taking the average, we get (time in seconds):
Before the change: insert 16.80, delete 11.02
After the change: insert 16.55, delete 10.95
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The correct type is u32.
Fixes: d18296ffd9 ("net/mlx5: E-Switch, Introduce global tables")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We allocate a temporary memory but forget to free it.
Fixes: 11b717d615 ("net/mlx5: E-Switch, Get reg_c0 value on CQE")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Register c0 loopback is needed to fully support chains and prios.
Enable chains and prio only if loopback (of reg c1 which came together
with c0), is enabled. To be able to check that, move enabling of loopback
before eswitch chains init.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reg c0/c1 matching, rewrite of regs c0/c1, and copy header of regs c1,B
is needed for the restore table to function, might not be supported by
firmware, and creation of the restore table or the copy header will
fail.
Check reg_c1 loopback support, as firmware which supports this,
should have all of the above.
Fixes: 11b717d615 ("net/mlx5: E-Switch, Get reg_c0 value on CQE")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The function mlx5e_rep_setup_ft_cb check chain_index is zero twice.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The actions_match_supported() function returns a bool, true for success
and false for failure. This error path is returning a negative which
is cast to true but it should return false.
Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Overlapping header include additions in macsec.c
A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c
Overlapping test additions in selftests Makefile
Overlapping PCI ID table adjustments in iwlwifi driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
For non-fatal syndromes like LOCAL_LENGTH_ERR, recovery shouldn't be
triggered. In these scenarios, the RQ is not actually in ERR state.
This misleads the recovery flow which assumes that the RQ is really in
error state and no more completions arrive, causing crashes on bad page
state.
Fixes: 8276ea1353 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In striding RQ mode, the buffers of an RX WQE are first
prepared and posted to the HW using a UMR WQEs via the ICOSQ.
We maintain the state of these in-progress WQEs in the RQ
SW struct.
In the flow of ICOSQ recovery, the corresponding RQ is not
in error state, hence:
- The buffers of the in-progress WQEs must be released
and the RQ metadata should reflect it.
- Existing RX WQEs in the RQ should not be affected.
For this, wrap the dealloc of the in-progress WQEs in
a function, and use it in the ICOSQ recovery flow
instead of mlx5e_free_rx_descs().
Fixes: be5323c837 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When resetting the RQ (moving RQ state from RST to RDY), the driver
resets the WQ's SW metadata.
In striding RQ mode, we maintain a field that reflects the actual
expected WQ head (including in progress WQEs posted to the ICOSQ).
It was mistakenly not reset together with the WQ. Fix this here.
Fixes: 8276ea1353 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add number of WQEBBs (WQE's Basic Block) to WQE info struct. Set the
number of WQEBBs on WQE post, and increment the consumer counter (cc)
on completion.
In case of error completions, the cc was mistakenly not incremented,
keeping a gap between cc and pc (producer counter). This failed the
recovery flow on the ICOSQ from a CQE error which timed-out waiting for
the cc and pc to meet.
Fixes: be5323c837 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The cap_mask1 isn't protected by field_select and not listed among RW
fields, but it is required to be written to properly initialize ports
in IB virtualization mode.
Link: https://lore.kernel.org/linux-rdma/88bab94d2fd72f3145835b4518bc63dda587add6.camel@redhat.com
Fixes: ab118da4c1 ("net/mlx5: Don't write read-only fields in MODIFY_HCA_VPORT_CONTEXT command")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Packet trap groups are now explicitly registered by drivers and not
implicitly registered when the packet traps are registered. Therefore,
there is no need to encode entire group structure the trap is associated
with inside the trap structure.
Instead, only pass the group identifier. Refer to it as initial group
identifier, as future patches will allow user space to move traps
between groups.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the previously added API to explicitly register / unregister
supported packet trap groups. This is in preparation for future patches
that will enable drivers to pass additional group attributes, such as
associated policer identifier.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When building arm32 allyesconfig:
ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by spectrum_cnt.c
>>> net/ethernet/mellanox/mlxsw/spectrum_cnt.o:(mlxsw_sp_counter_resources_register) in archive drivers/built-in.a
>>> did you mean: __aeabi_uidivmod
>>> defined in: arch/arm/lib/lib.a(lib1funcs.o)
pool_size and bank_size are u64; use div64_u64 so that 32-bit platforms
do not error.
Fixes: ab8c4cc604 ("mlxsw: spectrum_cnt: Move config validation along with resource register")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 53eca1f347 ("net: rename flow_action_hw_stats_types* ->
flow_action_hw_stats*") renamed just the flow action types and
helpers. For consistency rename variables, enums, struct members
and UAPI too (note that this UAPI was not in any official release,
yet).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5hh6wACgkQSD+KveBX
+j6qvQf9HQsiQ+cE1UIbM/IzyTWXeBMzjljCWFgQfvyKQjSFnoATeVl6GMQJNk7M
ovQ7XHOlN36E/tQW/ypnwbX+btjl/mDEJsxEcvVf4gnw/QH1AwUjo291vPfrE5md
DcWWe9Jrq2MkHeZAgIt/Tnw4GYIwQOBVmYgky3A0+azzuvxK+nXX6JJG5JqRR8Wd
tBsN9UKok1nOt73d+TyHjGnzQHWzGBdS0vlxl0MYcD1QD66UzA0Atgz5aUQLGvTK
Nk/IcHr3SF+0kvbtpjRxlrpi4ywD2gBNHXMJX1DDnCkqg9nWjJ5DByYDASo+I6uL
0fMNsimp/gZG8HD0oFEVTFgaqPACUA==
=DEqe
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2020-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2020-03-05
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v5.4
('net/mlx5: DR, Fix postsend actions write length')
For -stable v5.5
('net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow')
('net/mlx5e: Fix endianness handling in pedit mask')
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offload action skbedit priority when keyed to a flower classifier. The
skb->priority field in Linux is very generic, so only allow setting the
bottom 8 priorities and bounce anything else.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QOS_ACTION is used for manipulating the QoS attributes of a packet.
Add the corresponding defines and helpers, in particular for the
switch_priority override.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During initialization the driver issues a software reset command and
then waits for the system status to change back to "ready" state.
However, before issuing the reset command the driver does not check that
the system is actually in "ready" state. On Spectrum-{1,2} systems this
was always the case as the hardware initialization time is very short.
On Spectrum-3 systems this is no longer the case. This results in the
software reset command timing-out and the driver failing to load:
[ 6.347591] mlxsw_spectrum3 0000:06:00.0: Cmd exec timed-out (opcode=40(ACCESS_REG),opcode_mod=0,in_mod=0)
[ 6.358382] mlxsw_spectrum3 0000:06:00.0: Reg cmd access failed (reg_id=9023(mrsr),type=write)
[ 6.368028] mlxsw_spectrum3 0000:06:00.0: cannot register bus device
[ 6.375274] mlxsw_spectrum3: probe of 0000:06:00.0 failed with error -110
Fix this by waiting for the system to become ready both before issuing
the reset command and afterwards. In case of failure, print the last
system status to aid in debugging.
Fixes: da382875c6 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Compiler warnings and cleanup for the connection tracking series
2) Bug fixes for the connection tracking series
3) Fix devlink port register sequence
4) Last five patches in the series, By Eli cohen
Add the support for forwarding traffic between two eswitch uplink
representors (Hairpin for eswitch), using mlx5 termination tables
to change the direction of a packet in hw from RX to TX pipeline.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5ximgACgkQSD+KveBX
+j4iGwf9FrTxtGjgVXuwmc5LmSU6tak5SjK+dW5PdCw4mNorN2hSJeV/f9evLrf7
7Cxfm4OH8/ivOSpVQz6XZEF0aYTq9T3JakGzAWESWwo/s+i7iwA+lVPYKhcvHXeg
C9ImWbnyDCZkPZM6jz4KNpSMRWkyB7sEtQ51hYF0bdiSzcLSDaLCoKPEljp7sNKb
f1456/yDuOIZ3sb6rYPH6e8EqqfUMiyYAyY3bBu09sl3deXopyueYVqPSPgjOoC7
SfM5K+9nnuQJdvSqwUJLexxDZo1Z7fizz73LwUp0SBLk5zvZdn2bhxbt4wPS/xH/
CSjsWFAs1eu2rDqRH48G3jKqTZeq2w==
=LkML
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-03-17
1) Compiler warnings and cleanup for the connection tracking series
2) Bug fixes for the connection tracking series
3) Fix devlink port register sequence
4) Last five patches in the series, By Eli cohen
Add the support for forwarding traffic between two eswitch uplink
representors (Hairpin for eswitch), using mlx5 termination tables
to change the direction of a packet in hw from RX to TX pipeline.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement occupancy counting for counters and expose over devlink
resource API.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put all init operations related to subpools into
mlxsw_sp_counter_sub_pools_init().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the validation of subpools configuration, to avoid possible over
commitment to resource registration. Add WARN_ON to indicate bug
in the code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement devlink resources support for counter pools. Move the subpool
sizes calculations into the new resources register function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new field to subpool struct that would indicate which
resource id should be used to query the entry size for
the subpool from the device.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the global static array of subpools is used. Make it
per-instance as multiple instances of the mlxsw driver can have
different values.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bank size is different between Spectrum versions. Also it is
a resource that can be queried. So instead of hard coding the value in
code, query it from the firmware.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_action_hw_stats_types_check() helper takes one of the
FLOW_ACTION_HW_STATS_*_BIT values as input. If we align
the arguments to the opening bracket of the helper there
is no way to call this helper and stay under 80 characters.
Remove the "types" part from the new flow_action helpers
and enum values.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not allow forwarding of encapsulated traffic received from one eswtich's
uplink to another eswtich's uplink.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add dependencny on cap termination_table_raw_traffic to allow non
encapsulated packets received from uplink to be forwarded back to the
received uplink port.
Refactor the conditions into a separate function.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Termination tables change the direction of a packet in hw from RX to SX
pipeline. Use that to offload hairpin flows received from uplink and
sent back to uplink.
Currently termination tables are used for pushing VLAN to packets
received from uplink and targeting a VF. Extend the implementation to
allow forwarding packets to uplink. These packets can either be
encapsulated or not.
In case encapsulation is needed before forwarding, move the reformat
object to the termination table as required.
Extend the hash table key to include tunnel information for the sake of
reusing reformat objects.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Don't use termination tables for packets that are steered to the slow path,
as a pre-step for supporting packet encap (packet reformat) action on
termination tables. Packet encap (reformat action) actions steer the packet
to the slow path until outer arp entries are resolved.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Check if QoS is enabled for the eswitch before attempting to configure
QoS parameters and emit a netlink error if not supported.
Introduce an API to check if QoS is supported for the eswitch.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If udevd is configured to rename interfaces according to persistent
naming rules and if a network interface has phys_port_name in sysfs,
its contents will be appended to the interface name.
However, register_netdev creates device in sysfs and if
devlink_port_register is called after that, there is a timeframe in
which udevd may read an empty phys_port_name value. The consequence is
that the interface will lose this suffix and its name will not be
really persistent.
The solution is to register the port before registering a netdev.
Fixes: c6acd629ee ("net/mlx5e: Add support for devlink-port in non-representors mode")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The original condition rejected all egress rules that
are not on tunnel device.
Also, the whole point of this egress reject was to disallow bad
rules because of egdev which doesn't exists today, so remove
this check entirely.
Fixes: 0a7fcb78cc ("net/mlx5e: Support inner header rewrite with goto action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Register loopback which is needed for tunnel restoration, is now always
enabled if supported and not just with metadata enabled, check for
that instead.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix the following warnings: [-Werror=frame-larger-than=]
In function ‘mlx5_tc_ct_entry_add_rule’:
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:541:1:
error: the frame size of 1136 bytes is larger than 1024 bytes
In function ‘__mlx5_tc_ct_flow_offload’:
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:1049:1:
error: the frame size of 1168 bytes is larger than 1024 bytes
Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
If CONFIG_MLX5_TC_CT isn't enabled, all offloading of eswitch tc rules
fails on parsing ct match, even if there is no ct match.
Return success if there is no ct match, regardless of config.
Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:
In function mlx5_tc_ct_parse_match:
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:699:36: warning:
variable unnew set but not used [-Wunused-but-set-variable]
Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Restore modify header writes the chain mapping on the packet.
This modify header and action is added on all prios connections,
and gets overwritten with the same value consecutively in prios
of the same chain.
Use the chain's modify header only for the last prio of a given tc
chain.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, if firmware doesn't support fwd and modify, driver fails
initializing eswitch chains while entering switchdev mode.
Instead, on such cases, disable the chains and prio feature (as we can't
restore the chain on miss) and the usage of fwd and modify.
Fixes: 8f1e0b97cc ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When CONFIG_MLX5_ESWITCH is unset, clang warns:
In file included from drivers/net/ethernet/mellanox/mlx5/core/main.c:58:
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:670:1: warning: unused
function 'esw_add_restore_rule' [-Wunused-function]
esw_add_restore_rule(struct mlx5_eswitch *esw, u32 tag)
^
1 warning generated.
This stub function is missing inline; add it to suppress the warning.
Fixes: 11b717d615 ("net/mlx5: E-Switch, Get reg_c0 value on CQE")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Clang warns:
../drivers/net/ethernet/mellanox/mlx5/core/mr.c:63:21: warning: variable
'key' is uninitialized when used here [-Wuninitialized]
mkey_index, key, mkey->key);
^~~
../drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h:54:6: note:
expanded from macro 'mlx5_core_dbg'
##__VA_ARGS__)
^~~~~~~~~~~
../include/linux/dev_printk.h:114:39: note: expanded from macro
'dev_dbg'
dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~
../include/linux/dynamic_debug.h:158:19: note: expanded from macro
'dynamic_dev_dbg'
dev, fmt, ##__VA_ARGS__)
^~~~~~~~~~~
../include/linux/dynamic_debug.h:143:56: note: expanded from macro
'_dynamic_func_call'
__dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
^~~~~~~~~~~
../include/linux/dynamic_debug.h:125:15: note: expanded from macro
'__dynamic_func_call'
func(&id, ##__VA_ARGS__); \
^~~~~~~~~~~
../drivers/net/ethernet/mellanox/mlx5/core/mr.c:47:8: note: initialize
the variable 'key' to silence this warning
u8 key;
^
= '\0'
1 warning generated.
key's initialization was removed in commit fc6a9f86f0 ("{IB,net}/mlx5:
Assign mkey variant in mlx5_ib only") but its use was not fully removed.
Remove it now so that there is no more warning.
Fixes: fc6a9f86f0 ("{IB,net}/mlx5: Assign mkey variant in mlx5_ib only")
Link: https://github.com/ClangBuiltLinux/linux/issues/932
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Tariq Toukan <tariqt@mellanox.com>
To: netdev@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cited commit set a value of 2^31-1 in order to "disable" the shaper
on a given a port. However, the length of the maximum shaper rate field
was not updated from 28 bits to 31 bits, which means ports are still
limited to ~268Gbps despite supporting speeds of 400Gbps.
Fix this by increasing the field's length.
Fixes: 92afbfedb7 ("mlxsw: reg: Increase MLXSW_REG_QEEC_MAS_DIS")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RED ECN nodrop mode means that non-ECT traffic should not be early-dropped,
but enqueued normally instead. In Spectrum systems, this is achieved by
disabling CWTPM.ew (enable WRED) for a given traffic class.
So far CWTPM.ew was unconditionally enabled. Instead disable it when the
RED qdisc is in nodrop mode.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove dummy functions declaration, the dummy functions are not needed
since fs_dr is the only one to call mlx5dr and both fs_dr and dr files
depend on the same config flag (MLX5_SW_STEERING).
Fixes: 70605ea545 ("net/mlx5: DR, Expose APIs for direct rule managing")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This action allows to go to a flow table based on the table id.
Goto flow table id is required for supporting user space SW.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
All callers needs to work on mlx5_core_dev and it is already derived
before calling mlx5_devlink_eswitch_check().
Hence, accept mlx5_core_dev in mlx5_devlink_eswitch_check().
Given that it works on mlx5_core_dev change helper function name to
drop devlink prefix.
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Annotate mutex destroy to keep it symmetric to init sequence.
It should be destroyed after its users (representor netdevices) are
destroyed in below flow.
esw_offloads_disable()
esw_offloads_unload_rep()
Hence, initialize the mutex before creating the representors which uses
it.
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Allow passing NULL spec when creating a flow rule. Such rules will act
as "catch all" flow rules.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Following introduction of per vport configuration of vport and rep,
unload all reps per rep type is still needed as IB reps can be
unloaded individually. However, a few internal functions exist purely
for this purpose, merge them to a single function.
This patch doesn't change any existing functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, ECPF eswitch manager does one-time only configuration for
VF vports when device switches to offloads mode. However, when num of
VFs changed from host side, driver doesn't update VF vports
configurations.
Hence, perform VFs vport configuration update whenever num_vfs change
event occurs.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Both legacy and offload modes require vport setup, only offload mode
requires rep setup. Before this patch, vport and rep operations are
separated applied to all relevant vports in different stages.
Change to use per vport configuration, so that vport and rep operations
are modularized per vport.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Vport enable and disable sequence is incorrect. It should be:
enable()
esw_vport_setup_acl,
esw_vport_setup,
esw_vport_enable_qos.
disable()
esw_vport_disable_qos,
esw_vport_cleanup,
esw_vport_cleanup_acl.
Instead of having two setup functions for port and acl, merge
acl setup to port setup function.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rename esw_apply_vport_config() to esw_vport_setup(), and add new
helper function esw_vport_cleanup() to make them symmetric.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
esw_vport_enable_qos can return error in cases below:
1. QoS is already enabled. Warnning is useless in this case.
2. Create scheduling element cmd failed. There is already a warning.
Remove the redundant warnning if esw_vport_enable_qos returns err.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Consider scenario below, CPU 1 is at risk to query already destroyed
drop counters. Need to apply the same state mutex when disabling vport.
+-------------------------------+-------------------------------------+
| CPU 0 | CPU 1 |
+-------------------------------+-------------------------------------+
| mlx5_device_disable_sriov | mlx5e_get_vf_stats |
| mlx5_eswitch_disable | mlx5_eswitch_get_vport_stats |
| esw_disable_vport | mlx5_eswitch_query_vport_drop_stats |
| mlx5_fc_destroy(drop_counter) | mlx5_fc_query(drop_counter) |
+-------------------------------+-------------------------------------+
Fixes: b8a0dbe3a9 ("net/mlx5e: E-switch, Add steering drop counters")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
esw_vport_create_legacy_acl_tables bails out immediately for eswitch
manager, hence remove all the check of esw manager cap after.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Leon Romanovsky says:
====================
This series fixes various corner cases in the mlx5_ib MR cache
implementation, see specific commit messages for more information.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_mr-cache':
RDMA/mlx5: Allow MRs to be created in the cache synchronously
RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
RDMA/mlx5: Fix locking in MR cache work queue
RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
RDMA/mlx5: Fix MR cache size and limit debugfs
RDMA/mlx5: Always remove MRs from the cache before destroying them
RDMA/mlx5: Simplify how the MR cache bucket is located
RDMA/mlx5: Rename the tracking variables for the MR cache
RDMA/mlx5: Replace spinlock protected write with atomic var
{IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
{IB,net}/mlx5: Assign mkey variant in mlx5_ib only
{IB,net}/mlx5: Setup mkey variant before mr create command invocation
As mlx5_ib is the only user of the mlx5_core_create_mkey_cb, move the
logic inside mlx5_ib and cleanup the code in mlx5_core.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
mkey variant is not required for mlx5_core use, move the mkey variant
counter to mlx5_ib.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
On reg_mr_callback() mlx5_ib is recalculating the mkey variant which is
wrong and will lead to using a different key variant than the one
submitted to firmware on create mkey command invocation.
To fix this, we store the mkey variant before invoking the firmware
command and use it later on completion (reg_mr_callback).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Clear action, as with software, removes all ct metadata from
the packet.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark packets with a unique tupleid, and on miss use that id to get
the act ct restore_cookie. Using that restore cookie, we ask CT to
restore the relevant info on the SKB.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register driver callbacks with the nf flow table platform.
FT add/delete events will create/delete FTE in the CT/CT_NAT tables.
Restoring the CT state on miss will be added in the following patch.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for offloading tc ct action and ct matches.
We translate the tc filter with CT action the following HW model:
+-------------------+ +--------------------+ +--------------+
+ pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct +----->
+ original match + | + tuple + zone match + | + fte_id match + |
+-------------------+ | +--------------------+ | +--------------+ |
v v v
set chain miss mapping set mark original
set fte_id set label filter
set zone set established actions
set tunnel_id do nat (if needed)
do decap
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we write chain register mapping on miss from the the last
prio of a chain. It is used to restore the chain in software.
To support re-using the chain register mapping from global tables (such
as CT tuple table) misses, export the chain mapping.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FTEs in global tables may match on packets from multiple in_ports.
Provide the capability to omit the in_port match condition.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, flow tables are automatically connected according to their
<chain,prio,level> tuple.
Introduce global tables which are flow tables that are detached from the
eswitch chains processing, and will be connected by explicitly referencing
them from multiple chains.
Add this new table type, and allow connecting them by refenece.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The eswitch offloads table, which has the reps (vport) rx miss rules,
was moved from OFFLOADS namespace [0,0] (prio, level), to [1,0], so
the restore table (the new [0,0]) can come before it. The destinations
of these miss rules is the rep root ft (ttc for non uplink reps).
Uplink rep root ft is created as OFFLOADS namespace [0,1], and is used
as a hook to next RX prio (either ethtool or ttc), but this fails to
pass fs_core level's check.
Move uplink rep root ft to OFFLOADS prio 1, level 1 ([1,1]), so it
will keep the same relative position after the restore table
change.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable reg c1 loopback if firmware reports it's supported,
as this is needed for restoring packet metadata (e.g chain).
Also define helper to query if it is enabled.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver did not previously reject unsupported parameters.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The intention of this helper was to allow driver to specify one type
that it supports, so not only "any" value would pass. So make the API
more strict and allow driver to pass only 1 bit that is going
to be checked.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yishai Hadas Says:
====================
Expose raw packet pacing APIs to be used by DEVX based applications. The
existing code was refactored to have a single flow with the new raw APIs.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_packet_pacing':
IB/mlx5: Introduce UAPIs to manage packet pacing
net/mlx5: Expose raw packet pacing APIs
Reuse infrastructure that already exists for pf in legacy mode to show/set
Rx network flow classification rules for uplink representors.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
During transition to uplink representors the code responsible for
initializing ethtool steering functionality wasn't added to representor
init rx routine. This causes NULL pointer dereference during configuration
of network flow classification rule with ethtool (only possible to
reproduce with next commit in this series which registers necessary ethtool
callbacks).
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reuse infrastructure that already exists for pf in legacy mode to show/set
Rx flow hash indirection table and RSS hash key for uplink representors.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Uplink representor traffic will be redirected to an empty root ft rather
than directly to a direct tir or ttc table, this root ft will be empty and
will be used as a link for auto-chaining with ttc table or ethtool tables
in downstream patches.
On load, fs core will connect uplink rep root_ft with ttc table. In case
ethtool steering will be used, fs core will auto connect root_ft with
the ethtool bypass tables, which will be connected with the ttc table.
vport_rx_rule[uplink_rep]->root_ft->ethtool->ttc.
For non-uplink representors, for simplicity root_ft will always point at
ttc table, hence the replace vport_rx rule logic is removed.
vport_rx_rule[non_uplink_rep]->root_ft(ttc).
For now ethtool steering support can only be available on uplink rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
mlx5_eswitch_inline_mode_get() is used only in eswitch_offloads.c.
Hence, make it static and adjacent to its caller function.
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of giving ft tables one of the largest tables available - 4M,
give it a more reasonable size - 64k. Especially since it will
always be created as a miss hook in the following patch.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The esw_vport_tbl_get() function returns error pointers on error.
Fixes: 96e326878f ("net/mlx5e: Eswitch, Use per vport tables for mirroring")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
According to PRM, forward to flow table along with either packet
reformat or decap is supported only if reformat_and_fwd_to_table
capability is set for the flow table.
Add dependency on the capability and pack all the conditions for "goto
chain" in a single function.
Fix language in error message in case of not supporting forward to a
lower numbered flow table.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Multi-port RoCE mode requires tagging traffic that passes through the
vport.
This matching can cause performance degradation, therefore disable it
and use the legacy matching on vhca_id and source_port when possible.
Fixes: 92ab1eb392 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use reverse chirstmas tree inside mlx5e_ethtool_get_link_ksettings.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When port speed can't be reported based on ext_eth_proto_capability
or eth_proto_capability instead of reporting speed as unknown check
if the port's speed can be inferred based on the data_rate_oper field.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This series adds some HW bits and definitions for mlx5 driver, to be
used by downstream features in both rdma and netdev branches.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: HW bit for goto chain offload support
net/mlx5: Expose link speed directly
net/mlx5: Introduce TLS and IPSec objects enums
net/mlx5: Introduce egress acl forward-to-vport capability
net/mlx5: Expose raw packet pacing APIs
net/mlx5e: Replace zero-length array with flexible-array member
net/mlx5: fix spelling mistake "reserverd" -> "reserved"
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce new type for disabled HW stats and allow the value in
mlxsw offload.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set a flag in case rule counter was created. Only query the device for
stats of a rule, which has the valid counter assigned.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new type for delayed HW stats and allow the value in
mlx5 offload.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new type for immediate HW stats and allow the value in
mlxsw offload.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently don't allow actions with any other type to be inserted.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As there is one set of counters for the whole action chain, forbid to
mix the HW stats types.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce flow_action_basic_hw_stats_types_check() helper and use it
in drivers. That sanitizes the drivers which do not have support
for action HW stats types.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose the TLS encryption key general object type enum correctly,
and add the IPSec encryption key general object type enum.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
After returning from unregister_netdevice_notifier_dev_net(), set the
notifier_call field to NULL so successive call to mlx5_lag_add() will
function as expected.
Fixes: 7907f23adc ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The mask value is provided as 64 bit and has to be casted in
either 32 or 16 bit. On big endian systems the wrong half was
casted which resulted in an all zero mask.
Fixes: 2b64beba02 ("net/mlx5e: Support header re-write of partial fields in TC pedit offload")
Signed-off-by: Sebastian Hense <sebastian.hense1@ibm.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix to match the HW spec: TRACKING state is 1, SEARCHING is 2.
No real issue for now, as these values are not currently used.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We have an off-by-1 issue in the TCP seq comparison.
The last sequence number that belongs to the TCP packet's payload
is not "start_seq + len", but one byte before it.
Fix it so the 'ends_before' is evaluated properly.
This fixes a bug that results in error completions in the
kTLS HW offload flows.
Fixes: ffbd9ca94e ("net/mlx5e: kTLS, Fix corner-case checks in TX resync flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix the send info write length to be (actions x action) size in bytes.
Fixes: 297cccebdc ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There are two peculiarities about offloading FIFO:
- sometimes the qdisc has an unspecified handle (it is "invisible")
- it may be created before the qdisc that it will be a child of
These features make the offload a bit more tricky. The approach chosen in
this patch is to make note of all the FIFOs that needed to be rejected
because their parents were not known. Later when the parent is created,
they are offloaded
FIFO is only offloaded for its counters, queue length is ignored.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PRIO and ETS will need to check the value of qdisc handle in their
handlers. Add it to the callback and propagate through.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to have a tidy structure where to put information related to Qdisc
offloads, introduce a new structure. Move there the two existing pieces of
data: root_qdisc and tclass_qdiscs. Embed them directly, because there's no
reason to go through pointer anymore. Convert users, update init/fini
functions.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver did not previously reject unsupported parameters.
v3: adjust commit message for new member name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose raw packet pacing APIs to be used by DEVX based applications.
The existing code was refactored to have a single flow with the new raw
APIs.
The new raw APIs considered the input of 'pp_rate_limit_context', uid,
'dedicated', upon looking for an existing entry.
This raw mode enables future device specification data in the raw
context without changing the existing logic and code.
The ability to ask for a dedicated entry gives control for application
to allocate entries according to its needs.
A dedicated entry may not be used by some other process and it also
enables the process spreading its resources to some different entries
for use different hardware resources as part of enforcing the rate.
The counter per entry was changed to be u64 to prevent any option to
overflow.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use newly introduce 'virtual' port flavour for devlink
port of PCI VF devlink device in non-representors mode.
While at it, remove recently introduced empty lines at end of the file.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5 misc updates and minor cleanups:
1) Use per vport tables for mirroring
2) Improve log messages for SW steering (DR)
3) Add devlink fdb_large_groups parameter
4) E-Switch, Allow goto earlier chain
5) Don't allow forwarding between uplink representors
6) Add support for devlink-port in non-representors mode
7) Minor misc cleanups
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5YYYwACgkQSD+KveBX
+j7vIQgAsFQSjJGdyUzVDfeHu94Ze79OMRstuEh9fbcNXeyawMTShPLPE94E91qo
Nj6AfZp8sH9M6fLtFcOMrPou87ITslP3mQ/C2cYWWJn/iq9RN13kO+howZNeRhlk
iettT8qHGYR7aU3tBU38pc9/bWVHRPt+FZGBeCAcvwgQ1zfjBEtPcMJ8svjsfETA
0bzEDEq0eKEaynHr3phJpzbcnvzm3154HkfZtDZFAApgr4tpEGSlFa4n+8ctcmqy
zf82HH8GnB+GhTUPU3W33NwyH2otHOcE3dPAepQNmS3f8EemMJebAtjkexo6SfjY
rwUs5qTU05myOy5RPmzwhQsnzRyPBw==
=z2AA
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-02-27
mlx5 misc updates and minor cleanups:
1) Use per vport tables for mirroring
2) Improve log messages for SW steering (DR)
3) Add devlink fdb_large_groups parameter
4) E-Switch, Allow goto earlier chain
5) Don't allow forwarding between uplink representors
6) Add support for devlink-port in non-representors mode
7) Minor misc cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The mptcp conflict was overlapping additions.
The SMC conflict was an additional and removal happening at the same
time.
Signed-off-by: David S. Miller <davem@davemloft.net>
The code is self explanatory and makes the comment redundant.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5e_tc_offload_to_slow_path() and mlx5e_tc_unoffload_from_slow_path()
take an extra argument allocated on the stack of the caller but not used
by the caller. Avoid the extra argument and use local variable in the
function itself.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
parse_attr is not used by parse_tc_pedit_action() so revmove it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This to be consistent and adds the module name to the error message.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This is for added netdev prefix that helps identify
the source of the message.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This helps identify the source of the message.
If netdev still doesn't exists use mlx5_core_warn().
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Few print messages are in debug level where they should be in error, and
few messages are missing.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Change matcher priority parameter type from u16 to u32,
this change is needed since sometimes upper levels
create a matcher with priority bigger than 2^16.
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a devlink parameter to control the number of large groups in a
autogrouped flow table. The default value is 15, and the range is between 1
and 1024.
The size of each large group can be calculated according to the following
formula: size = 4M / (fdb_large_groups + 1).
Examples:
- Set the number of large groups to 20.
$ devlink dev param set pci/0000:82:00.0 name fdb_large_groups \
cmode driverinit value 20
Then run devlink reload command to apply the new value.
$ devlink dev reload pci/0000:82:00.0
- Read the number of large groups in flow table.
$ devlink dev param show pci/0000:82:00.0 name fdb_large_groups
pci/0000:82:00.0:
name fdb_large_groups type driver-specific
values:
cmode driverinit value 20
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The prefix should be "MLX5_DEVLINK_PARAM_ID_" for all in
mlx5_devlink_param_id enum.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Added devlink_port field to mlx5e_priv structure and a callback to
netdev ops to enable devlink to get info about the port. The port
registration happens at driver initialization.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rename representor's mlx5e_get_devlink_port() to
mlx5e_rep_get_devlink_port().
The downstream patch will add a non-representor mlx5e function called
mlx5e_get_devlink_phy_port().
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mellanox FW can support this if ignore_flow_level capability exists.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When using port mirroring, we forward the traffic to another table and
use that table to forward to the mirrored vport. Since the hardware
loses the values of reg c, and in particular reg c0, we fail the match
on the input vport which previously existed in reg c0. To overcome this
situation, we use a set of per vport tables, positioned at the lowest
priority, and forward traffic to those tables. Since these tables are
per vport, we can avoid matching on reg c0.
Fixes: c01cfd0f11 ("net/mlx5: E-Switch, Add match on vport metadata for rule in fast path")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
misc_params.source_port is a 16 bit field already so no need for
redundant masking against 0xffff. Also change local variables type to
u16.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We can install forwarding packets rule between uplink
in switchdev mode, as show below. But the hardware does
not do that as expected (mlnx_perf -i $PF1, we can't get
the counter of the PF1). By the way, if we add the uplink
PF0, PF1 to Open vSwitch and enable hw-offload, the rules
can be offloaded but not work fine too. This patch add a
check and if so return -EOPNOTSUPP.
$ tc filter add dev $PF0 protocol all parent ffff: prio 1 handle 1 \
flower skip_sw action mirred egress redirect dev $PF1
$ tc -d -s filter show dev $PF0 ingress
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device enp130s0f1) stolen
...
Sent hardware 408954 bytes 4173 pkt
...
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
During initialization the driver issues a reset to the device and waits
for 100ms before checking if the firmware is ready. The waiting is
necessary because before that the device is irresponsive and the first
read can result in a completion timeout.
While 100ms is sufficient for Spectrum-1 and Spectrum-2, it is
insufficient for Spectrum-3.
Fix this by increasing the timeout to 200ms.
Fixes: da382875c6 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are couple new values that PMTM register can return
in module_type field. Add them and map them to module width in
mlxsw_core_module_max_width(). Fix the existing names on the way.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code causes problems when the unregistering netdevice could
be different then the registering one.
Since the check in mlx5_lag_netdev_event() does not allow any other
network namespace anyway, fix this by registerting the lag notifier
per init network namespace only.
Fixes: d48834f9d4 ("mlx5: Use dev_net netdevice notifier registrations")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Aya Levin <ayal@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The buffer factor on Spectrum-3 is larger than on Spectrum-2. Add a new
callback and use it for mlxsw_sp->span_ops on Spectrum-3.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During port initialization the driver instructs the device to only
advertise speeds that can be supported by the port's current width.
Since the device now returns the supported speeds based on the port's
current width, the driver no longer needs to compute the speeds that can
be advertised.
Simplify port initialization by setting the advertised speeds to the
queried supported speeds.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum-1 and Spectrum-2 do not have a per-TC counter of number of packets
marked by ECN. The value reported currently is the total number of marked
packets. Showing this value at individual TC Qdiscs is misleading.
Move the counter to ethtool instead.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, only one SFN query is done from repetitive work at a time,
processing 64 entries. Another work iteration is scheduled in 100ms,
that means that the max rate of learned FDB entries is limited to 6400/s.
That is slow. Fix this by doing 2 optimizations:
1) Run 10 SFN queries at a time.
2) In case the SFN is not drained, schedule work with 0 delay to allow
to continue processing rest of the records.
On a testing setup with 500K entries the time to process decreased
from 870secs to 10secs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang warns:
In file included from
../drivers/net/ethernet/mellanox/mlx5/core/main.c:73:
../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:4:9: warning:
'__MLX5_RSC_DUMP_H' is used as a header guard here, followed by #define
of a different macro [-Wheader-guard]
#ifndef __MLX5_RSC_DUMP_H
^~~~~~~~~~~~~~~~~
../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:5:9: note:
'__MLX5_RSC_DUMP__H' is defined here; did you mean '__MLX5_RSC_DUMP_H'?
#define __MLX5_RSC_DUMP__H
^~~~~~~~~~~~~~~~~~
__MLX5_RSC_DUMP_H
1 warning generated.
Make them match to get the intended behavior and remove the warning.
Fixes: 12206b1723 ("net/mlx5: Add support for resource dump")
Link: https://github.com/ClangBuiltLinux/linux/issues/897
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We can avoid an indirect call per compressed completion wrapping the
completion handling call with the appropriate helper.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We can avoid an indirect call per NAPI cycle wrapping the RX descriptors
posting call with the appropriate helper.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current steps that are performed when the trust state changes, if
the channels are active:
1. The trust state is changed in hardware.
2. The new inline mode is calculated.
3. If the new inline mode is different, the channels are recreated using
the new inline mode.
This approach has some issues:
1. There is a time gap between changing trust state in hardware and
starting sending enough inline headers (the latter happens after
recreation of channels). It leads to failed transmissions and error
CQEs.
2. If the new channels fail to open, we'll be left with the old ones,
but the hardware will be configured for the new trust state, so the
interval when we can see TX errors never ends.
This patch fixes the issues above by moving the trust state change into
the preactivate hook that runs during the recreation of the channels
when no channels are active, so it eliminates the gap of partially
applied configuration. If the inline mode doesn't change with the change
of the trust state, the channels won't be recreated, just like before
this patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Sometimes the preactivate hook of mlx5e_safe_switch_channels needs more
parameters than just struct mlx5e_priv *. For such cases, a new
parameter (void *context) is added to preactivate hooks.
Some of the existing normal functions are currently used as preactivate
callbacks. To avoid adding an extra unused parameter, they are wrapped
in an automatic way using the MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX
macro.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently mlx5e_switch_priv_channels expects that the preactivate hook
doesn't fail, however, it can fail, because it may set hardware
parameters. This commit addresses this issue and provides a way to
recover from failures of the preactivate hook: the old channels are not
closed until the point where nothing can fail anymore, so in case
preactivate fails, the driver can roll back the old channels and
activate them again.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The number of queues is now updated by mlx5e_update_netdev_queues in a
centralized way, when no channels are active. Remove an extra occurrence
of netif_set_real_num_tx_queues to prepare it for the next commit.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, mlx5e notifies the kernel about the number of queues and sets
the default XPS cpumasks when channels are activated. This
implementation has several corner cases, in which the kernel may not be
updated on time, or XPS cpumasks may be reset when not directly touched
by the user.
This commit fixes these corner cases to match the following expected
behavior:
1. The number of queues always corresponds to the number of channels
configured.
2. XPS cpumasks are set to driver's defaults on netdev attach.
3. XPS cpumasks set by user are not reset, unless the number of channels
changes. If the number of channels changes, they are reset to driver's
defaults. (In general case, when the number of channels increases or
decreases, it's not possible to guess how to convert the current XPS
cpumasks to work with the new number of channels, so we let the user
reconfigure it if they change the number of channels.)
XPS cpumasks are no longer stored per channel. Only one temporary
cpumask is used. The old stored cpumasks didn't reflect the user's
changes and were not used after applying them.
A scratchpad area is added to struct mlx5e_priv. As cpumask_var_t
requires allocation, and the preactivate hook can't fail, we need to
preallocate the temporary cpumask in advance. It's stored in the
scratchpad.
Fixes: 149e566fef ("net/mlx5e: Expand XPS cpumask to cover all online cpus")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5e_ethtool_set_channels updates the indirection table before
switching to the new channels. If the switch fails, the indirection
table is new, but the channels are old, which is wrong. Fix it by using
the preactivate hook of mlx5e_safe_switch_channels to update the
indirection table at the stage when nothing can fail anymore.
As the code that updates the indirection table is now encapsulated into
a new function, use that function in the attach flow when the driver has
to reduce the number of channels, and prepare the code for the next
commit.
Fixes: 85082dba0a ("net/mlx5e: Correctly handle RSS indirection table when changing number of channels")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5e_safe_switch_channels accepts a callback to be called before
activating new channels. It is intended to configure some hardware
parameters in cases where channels are recreated because some
configuration has changed.
Recently, this callback has started being used to update the driver's
internal MLX5E_STATE_XDP_OPEN flag, and the following patches also
intend to use this callback for software preparations. This patch
renames the hw_modify callback to preactivate, so that the name fits
better.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
As a preparation for one of the following commits, create a function to
encapsulate the code that notifies the kernel about the new amount of
RX and TX queues. The code will be called multiple times in the next
commit.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The LRO boolean state in params->lro_en must not be set in case
the NIC is not capable.
Enforce this check and remove the TODO comment.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We shall always extract channel index out of the txq, regardless
of the relation between txq_ix and num channels. The extraction is
always valid, as if txq is smaller than number of channels,
txq_ix == priv->txq2sq[txq_ix]->ch_ix.
By doing so, we can remove an if clause from the select queue method,
and have one flow for all packets.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the cookie index received along with the packet to lookup original
flow_offload cookie binary and pass it down to devlink_trap_report().
Add "fa_cookie" metadata to the ACL trap.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the received packet comes in due to one of ACL discard traps,
take the user_def_val_orig_pkt_len field from CQE and store it
in skb->cb as ACL cookie index.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Track cookies coming down to driver by flow_offload.
Assign a cookie_index to each unique cookie binary. Use previously
defined "Trap with userdef" flex action to ask HW to pass cookie_index
alongside with the dropped packets.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose "Trap action with userdef". It is the same as already
defined "Trap action" with a difference that it would ask the policy
engine to pass arbitrary value (userdef) alongside with received packets.
This would be later on used to carry cookie index.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cookie argument to devlink_trap_report() allowing driver to pass in
the user cookie. Pass on the cookie down to drop monitor code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the trap group used to report ACL drops. Setup the trap IDs for
ingress/egress flow action drop. Register the two packet traps
associated with ACL trap group with devlink during driver
initialization. As these are "source traps", set the disabled
trap group to be the dummy, discarding as many packets in HW
as possible.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For "source traps" it is not possible to change HPKT action to discard.
But there is still need to disallow packets arriving to CPU as much as
possible. Handle this by introduction of a "dummy group". It has a
"thin" policer, which passes as less packets to CPU as possible. The
rest is going to be discarded there. The "dummy group" is to be used
later on by ACL trap (which is a "source trap").
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the mlxsw_listener struct to contain trap group for disabled
traps too. Rename the original "trap_group" item to "en_trap_group" as
it represents enabled state. Let both groups be the same for MLXSW_RXL
however extend MLXSW_RXL_DIS to register separate groups for enable and
disable.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For source traps, the "thin policer" is going to be used in order
to reduce the amount of trapped packets to minimum. However, there
will be still small number of packets coming in that need to be dropped
in the driver. Allow to enable/disable rx_listener related to specific
trap in order to prevent unwanted packets to go up the stack.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new set of traps:
DISCARD_INGRESS_ACL and DISCARD_EGRESS_ACL
Set the trap_action from NOP to TRAP which causes the packets dropped
by the TRAP action to be trapped under new trap IDs, depending on the
ingress/egress binding point.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>