Add common code to handle all policer-related functionality in mlxsw.
Currently, only policer for policy engines are supported, but it in the
future more policer families will be added such as CPU (trap) policers
and storm control policers.
The API allows different modules to add / delete policers and read their
drop counter.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a resource identifier for maximum global policers so that it could
be later used to query the information from firmware.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add policer bandwidth limits for both rate and burst size so that they
could be enforced by a later patch.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Rx listener abstraction allows the switch driver (e.g.,
mlxsw_spectrum) to register a function that is called when a packet is
received (trapped) for a specific reason.
Up until now, the Rx listener lookup was solely based on the trap
identifier. However, when a packet is mirrored to the CPU the trap
identifier merely indicates that the packet was mirrored, but not why it
was mirrored. This makes it impossible for the switch driver to register
different Rx listeners for different mirror reasons.
Solve this by allowing the switch driver to register a Rx listener with
a mirror reason and by extending the Rx listener lookup to take the
mirror reason into account.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the mirror reason is valid, retrieve it into the Rx information
so that it could be used during listener lookup in a later patch.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Completion Queue Element version 2 (CQEv2) includes a field called
'mirror_reason' which indicates why the packet was mirrored to the CPU.
Add the field so that it can be used by a later patch.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets that are mirrored to the CPU port are trapped with one of eight
trap identifiers. Add them.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The trap identifier was increased to 10 bits in new versions of the
Programmer's Reference Manual (PRM).
Increase it accordingly in the Host PacKet Trap (HPKT) register and in
the Completion Queue Element (CQE).
This is significant for subsequent patches that will introduce trap
identifiers which utilize the extended range.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When mirroring packets to the CPU port the mirrored packets are trapped
to the CPU. However, unlike other traps, it is not possible to set a
policer on the associated trap group. Instead, the policer needs to be
set on the SPAN agent.
Moreover, the policer ID must be within a specified range: From a
configurable (even) base ID to this base plus the maximum number of SPAN
agents.
While the immediate use case is to set the policer on a SPAN agent that
mirrors to the CPU port, a policer can be set on any SPAN agent.
Therefore, the operation is implemented for all SPAN agent types.
Extend the SPAN agent request API to allow passing the desired policer
ID that should be bound to the SPAN agent. Return an error for
Spectrum-1, as it does not support policer setting on a SPAN agent.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the only parameter of a SPAN agent is the netdev which
the SPAN agent should mirror to.
The next patch will add the ability to request a SPAN agent that mirrors
to a specific netdev and has a specific policer ID bound to it. This is
required when mirroring packets to the CPU port.
Therefore, encapsulate the sole parameter to mlxsw_sp_span_agent_get()
in a structure, so that it could later be extended with policer
information.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Spectrum-2 and Spectrum-3 ASICs are able to mirror packets towards
the CPU. These packets are then trapped like any other packet, but with
a special packet trap and additional metadata such as why the packet was
mirrored.
The ability to mirror packets towards the CPU will be utilized by a
subsequent patch set that will mirror packets that were dropped by the
ASIC for various buffer-related reasons, such as tail-drop and
early-drop.
Add mirroring towards the CPU as a new SPAN agent type and re-use the
functions that mirror to a physical port where possible.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the destination netdev to which we mirror must be a valid
netdev. However, this is going to change with the introduction of
mirroring towards the CPU port, as the CPU port does not have a backing
netdev.
Avoid dereferencing the destination netdev when it is not clear if it is
valid or not.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The parms_set() callback is supposed to fill in the parameters for the
SPAN agent, such as the destination port and encapsulation info, if any.
When mirroring to the CPU port we cannot resolve the destination port
(the CPU port) without access to the driver private info.
Pass the driver private info to parms_set() callback so that it could be
used later on to resolve the CPU port.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The various SPAN agent types differ in their mirror targets (i.e.,
physical port netdev vs. VLAN netdev) and the encapsulation headers that
they need to encapsulate the mirrored packets with.
The Spectrum-2 and Spectrum-3 ASICs support a SPAN agent type that is
able to mirror towards the CPU, whereas the Spectrum-1 ASIC does not.
Prepare for the addition of this new SPAN agent type by splitting the
SPAN agent operations to be per-ASIC.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow setting mirroring_pid_base using MOGCR register.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow setting session_id and pid as part of port analyzer
configurations.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RED qevents early_drop and mark can be offloaded under the following
fairly strict conditions:
- At most one filter is configured at the qevent block
- The protocol is "any"
- The classifier is matchall
- The action is trap, sample, or mirror with the same conditions as
with other SPAN offloads
- The hw_counters type is none
In this patchset, implement offload of mirror for early_drop qevent.
The ECN trigger is currently not implemented in the FW and therefore
the mark qevent is not supported.
The qevent notifications look exactly like regular block binding
notifications with a binder type that identifies them as qevents.
Therefore the details of processing this binding are fairly similar
to the matchall offload.
struct flow_block_offload.sch points at the qdisc in question. Use it to
figure out if the qdisc is offloaded at all and what TC it configures.
Bounce bindings on not-offloaded qdiscs.
Individual bindings are kept in a list so that several qevents can share
the same block and all binding points get configured as the configured
filters change.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two RED qevents have been introduced recently. From the point of view of a
driver, qevents are simply blocks with unusual binder types. However they
need to be handled by different logic than ACL-like flows.
Thus rename mlxsw_sp_setup_tc_block() to mlxsw_sp_setup_tc_block_clsact()
and move the binder-type dispatch from there to spectrum.c into a new
function of the original name. The new dispatcher is easier to extend with
new binder types.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A following patch introduces offloading of filters attached to blocks bound
to the RED tail_drop qevent. The only classifier that mlxsw will permit in
this role is matchall. mlxsw currently offloads matchall filters used with
clsact qdisc. The data structures used for that offload will come handy for
the qevent offload as well. Publish them in spectrum.h.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The field "dev" in struct mlxsw_sp_flow_block_binding is not used. Drop it.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No clean-up is performed at the target label of this goto. Convert it to a
direct return.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the binding of global mirroring triggers to a SPAN agent is
global, packets are only mirrored if they belong to a port and TC on
which the trigger was enabled. This allows, for example, to mirror
packets that were tail-dropped on a specific netdev.
Implement the operations that allow to enable / disable a global
mirroring trigger on a specific port and TC.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Global mirroring triggers are triggers that are only keyed by their
trigger, as opposed to per-port triggers, which are keyed by their
trigger and port.
Such triggers allow mirroring packets that were tail/early dropped or
ECN marked to a SPAN agent.
Implement the previously added trigger operations for these global
triggers. Since such triggers are only supported from Spectrum-2
onwards, have the Spectrum-1 operations return an error.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, a SPAN agent can only be bound to a per-port trigger where
the trigger is either an incoming packet (INGRESS) or an outgoing packet
(EGRESS) to / from the port.
The subsequent patch will introduce the concept of global mirroring
triggers. The binding / unbinding of global triggers is different than
that of per-port triggers. Such triggers also need to be enabled /
disabled on a per-{port, TC} basis and are only supported from
Spectrum-2 onwards.
Add trigger operations that allow us to abstract these differences. Only
implement the operations for per-port triggers. Next patch will
implement the operations for global triggers.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The per-ASIC SPAN operations are relevant to the SPAN module and
therefore should be implemented there and not in the main driver file.
Move them.
These operations will be extended later on.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register is used for global port analyzer configurations.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register is used to configure the mirror enable for different
mirror reasons.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, shared blocks were only relevant for the pseudo-qdiscs ingress
and clsact. Recently, a qevent facility was introduced, which allows to
bind blocks to well-defined slots of a qdisc instance. RED in particular
got two qevents: early_drop and mark. Drivers that wish to offload these
blocks will be sent the usual notification, and need to know which qdisc it
is related to.
To that end, extend flow_block_offload with a "sch" pointer, and initialize
as appropriate. This prompts changes in the indirect block facility, which
now tracks the scheduler in addition to the netdevice. Update signatures of
several functions similarly.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case devlink reload failed, it is possible to trigger a
use-after-free when querying the kernel for device info via 'devlink dev
info' [1].
This happens because as part of the reload error path the PCI command
interface is de-initialized and its mailboxes are freed. When the
devlink '->info_get()' callback is invoked the device is queried via the
command interface and the freed mailboxes are accessed.
Fix this by initializing the command interface once during probe and not
during every reload.
This is consistent with the other bus used by mlxsw (i.e., 'mlxsw_i2c')
and also allows user space to query the running firmware version (for
example) from the device after a failed reload.
[1]
BUG: KASAN: use-after-free in memcpy include/linux/string.h:406 [inline]
BUG: KASAN: use-after-free in mlxsw_pci_cmd_exec+0x177/0xa60 drivers/net/ethernet/mellanox/mlxsw/pci.c:1675
Write of size 4096 at addr ffff88810ae32000 by task syz-executor.1/2355
CPU: 1 PID: 2355 Comm: syz-executor.1 Not tainted 5.8.0-rc2+ #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
memcpy+0x39/0x60 mm/kasan/common.c:106
memcpy include/linux/string.h:406 [inline]
mlxsw_pci_cmd_exec+0x177/0xa60 drivers/net/ethernet/mellanox/mlxsw/pci.c:1675
mlxsw_cmd_exec+0x249/0x550 drivers/net/ethernet/mellanox/mlxsw/core.c:2335
mlxsw_cmd_access_reg drivers/net/ethernet/mellanox/mlxsw/cmd.h:859 [inline]
mlxsw_core_reg_access_cmd drivers/net/ethernet/mellanox/mlxsw/core.c:1938 [inline]
mlxsw_core_reg_access+0x2f6/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1985
mlxsw_reg_query drivers/net/ethernet/mellanox/mlxsw/core.c:2000 [inline]
mlxsw_devlink_info_get+0x17f/0x6e0 drivers/net/ethernet/mellanox/mlxsw/core.c:1090
devlink_nl_info_fill.constprop.0+0x13c/0x2d0 net/core/devlink.c:4588
devlink_nl_cmd_info_get_dumpit+0x246/0x460 net/core/devlink.c:4648
genl_lock_dumpit+0x85/0xc0 net/netlink/genetlink.c:575
netlink_dump+0x515/0xe50 net/netlink/af_netlink.c:2245
__netlink_dump_start+0x53d/0x830 net/netlink/af_netlink.c:2353
genl_family_rcv_msg_dumpit.isra.0+0x296/0x300 net/netlink/genetlink.c:638
genl_family_rcv_msg net/netlink/genetlink.c:733 [inline]
genl_rcv_msg+0x78d/0x9d0 net/netlink/genetlink.c:753
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469
genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:672
____sys_sendmsg+0x6d8/0x840 net/socket.c:2363
___sys_sendmsg+0xff/0x170 net/socket.c:2417
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: a9c8336f65 ("mlxsw: core: Add support for devlink info command")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not trigger a warning when a memory allocation fails. Remove
the WARN_ON().
The warning is constantly triggered by syzkaller when it is injecting
faults:
[ 2230.758664] FAULT_INJECTION: forcing a failure.
[ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0
[ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28
...
[ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0
[ 2230.898179] Kernel panic - not syncing: panic_on_warn set ...
[ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28
[ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Fixes: 3057224e01 ("mlxsw: spectrum_router: Implement FIB offload in deferred work")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Utilize new devlink-health port reporters API to move rx and tx
reporters from device to port.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register devlink ports upon NIC init. TX and RX health reporters handle
errors which may occur early on at driver initialization. And because
these reporters are to be moved to port context, they require devlink
ports to be already registered.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5 connection tracking offloads updates:
1) Restore CT state from lookup in zone instead of tupleid
On a miss, Use this zone + 5 tuple taken from the skb, to lookup the CT
entry and restore it, instead of the driver allocated tuple id.
This improves flow insertion rate by avoiding the allocation of a header
rewrite context to maintain the tupleid.
2) Re-use modify header HW objects for identical modify actions.
3) Expand tunnel register mappings
Reg_c1 is 32 bits wide. Before this patchset, 24 bit were allocated
for the tuple_id, 6 bits for tunnel mapping and 2 bits for tunnel
options mappings.
Restoring the ct state from zone lookup instead of tuple id requires
reg_c1 to store 8 bits mapping the ct zone, leaving 24 bits for tunnel
mappings.
Expand tunnel and tunnel options register mappings to 12 bit each.
4) Trivial cleanup and fixes.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl8H16UACgkQSD+KveBX
+j7yTwf/eza7ftn9Jq1f6yyTM9qQZ64oC0cboDZQ3EyJtY++frWzo4bNbHFbQ26Y
EDjRGqG0Hiby95dgTrGtRzf9PQuDwWfdNavLKyV1D//cPeTDYpHkwKVF4sozfd5Q
g1RB6rySvYfx8BKALaJBclYlRoiVevLoIEfuMSrmstR1/tBCvmMLiB0p1VsLIS0+
XBDEezO4rqDyNJwuznMYIX44w8Xa4IzIb9/YwEubMPs52WjktXAmTPTChcO8cu/9
4VLsTkFKUDlm3TDXg99Lpk8L+0dfo7dUcHsqaoXMs5eER6kw8bjK/f7muSSIiIcd
Nnba/UaU+FYzA4EF98xQD0bFQJNrmQ==
=NMLY
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2020-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-07-09
This series provides updates to mlx5 CT (connection tracking) offloads
For more information please see tag log below.
Please pull and let me know if there is any problem.
The following conflict is expected when net is merged into net-next:
to resolve just use the hunks from net-next.
<<<<<<< HEAD (net-next)
mlx5_tc_ct_del_ft_entry(ct_priv, entry);
kfree(entry);
======= (net)
mlx5_tc_ct_entry_del_rules(ct_priv, entry);
kfree(entry);
>>>>>>> b1a7d5bdfe54c98eca46e2c997d4e3b1484a49af
mlx5 connection tracking offloads updates:
1) Restore CT state from lookup in zone instead of tupleid
On a miss, Use this zone + 5 tuple taken from the skb, to lookup the CT
entry and restore it, instead of the driver allocated tuple id.
This improves flow insertion rate by avoiding the allocation of a header
rewrite context to maintain the tupleid.
2) Re-use modify header HW objects for identical modify actions.
3) Expand tunnel register mappings
Reg_c1 is 32 bits wide. Before this patchset, 24 bit were allocated
for the tuple_id, 6 bits for tunnel mapping and 2 bits for tunnel
options mappings.
Restoring the ct state from zone lookup instead of tuple id requires
reg_c1 to store 8 bits mapping the ct zone, leaving 24 bits for tunnel
mappings.
Expand tunnel and tunnel options register mappings to 12 bit each.
4) Trivial cleanup and fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to new infra, make use of the ability to sleep in the callback.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this commit, on ft flush, ft entries were not removed
from the ct_tuple hashtables. Fix it.
Fixes: ac991b48d4 ("net/mlx5e: CT: Offload established flows")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
"flow" parameter is not used in __mlx5_tc_ct_flow_offload_clear(),
remove it.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Instead of having to deal with converting between int and ERR_PTR for
return values in mlx5_tc_ct_flow_offload(), make the internal helper
functions return a ptr to mlx5_flow_handle instead of passing it as
output param, this will also avoid gcc confusion and false alarms,
thus we remove the redundant ERR_PTR rule initialization.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reg_c1 is 32 bits wide. Originally, 24 bit were allocated for the tuple_id,
6 bits for tunnel mapping and 2 bits for tunnel options mappings.
Restoring the ct state from zone lookup instead of tuple id requires
reg_c1 to store 8 bits mapping the ct zone, leaving 24 bits for tunnel
mappings.
Expand tunnel and tunnel options register mappings to 12 bit each.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use a single byte mapping for zone restore register (zone matching
remains 16 bit).
This makes room for using the freed 8 bits on register C1 for
mapping more tunnels and tunnel options.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
After removing the tupleid register which changed per tuple,
tuple modify headers set the ct_state, zone, mark, and label registers.
For non-natted tuples going through the same tc rules path, their values
will be the same, and all their modify headers will be the same.
Re-use tuple modify header when possible, by adding each new modify
header to an hahstable, and looking up identical ones before creating
a new one.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Refactor sharing of mod headers to new file and while there,
remove spin lock and flows list, as this is only used for warn on.
Use the generic API in the next patch to re-use tuple modify headers
for identical modify actions,
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Remove tupleid, and replace it with zone_restore, which is the zone an
established tuple sets after match. On miss, Use this zone + tuple
taken from the skb, to lookup the ct entry and restore it.
This improves flow insertion rate by avoiding the allocation of a header
rewrite context.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Next patches will remove the tupleid registers that is used
to restore the ct state on miss, and instead use the tuple on
the missed packet to lookup which state to restore.
Disable tuple rewrites after connection tracking.
For tuple rewrites, inject a ct_state=-trk match so it won't
change the tuple for established flows (+trk) that passed connection
tracking, and instead miss to software.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The next patch will pass the mlx5e_priv struct to the
modify_header_match_supported method. Use this opportunity to refactor
the existing pr_info call to a netdev_info call.
Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With ct clear we don't jump to the ct tables, so header rewrite
of 5-tuple can be done in place (and not moved to after the CT action).
Check for ct clear action, and if so, allow 5-tuple header
rewrite.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Save original tuple and natted tuple in two new hashtables.
This is a pre-step for restoring ct state after hw miss by performing a
5-tuple lookup on the hash tables.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When eswitch is unsupported, currently -EPERM error code is returned
instead of -EOPNOTSUPP.
Due to this VF device's devlink virtual port is not enumerated because
port_function_get() callback returned -EPERM instead of -EOPNOTSUPP.
Hence, return the error code -EOPNOTSUPP when eswitch is unsupported.
Fixes: bd93975353 ("net/mlx5: E-switch, Introduce and use eswitch support check helper")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CT entries are deleted via a workqueue from netfilter. If removing the
module before that, the rules are cleaned by the driver itself, but the
memory entries for them are not freed. Fix that.
Fixes: ac991b48d4 ("net/mlx5e: CT: Offload established flows")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Device unit for port buffers size, xoff_threshold and xon_threshold is
cells. Fix a bug in driver where cell unit size was hard-coded to
128 bytes. This hard-coded value is buggy, as it is wrong for some hardware
versions.
Driver to read cell size from SBCAM register and translate bytes to cell
units accordingly.
In order to fix the bug, this patch exposes SBCAM (Shared buffer
capabilities mask) layout and defines.
If SBCAM.cap_cell_size is valid, use it for all bytes to cells
calculations. If not valid, fallback to 128.
Cell size do not change on the fly per device. Instead of issuing SBCAM
access reg command every time such translation is needed, cache it in
mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz
as a param to every function that needs bytes to cells translation.
While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to
en_dcbnl.c, as it is only used by that file.
Fixes: 0696d60853 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>