One conflict in the BPF samples Makefile, some fixes in 'net' whilst
we were converting over to Makefile.target rules in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. A use-after-free can easily be triggered by running:
while true; do
echo 10 > /sys/bus/netdevsim/new_device
devlink dev reload netdevsim/netdevsim10 &
echo 10 > /sys/bus/netdevsim/del_device
done
Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.
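
The ordering described above can be sketched in plain C. This is a
single-threaded illustration only: struct demo_dev and the demo_*
functions are hypothetical names, not the devlink API, and the locking
a real driver needs is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; not the real struct devlink. */
struct demo_dev {
	bool reload_enabled;	/* true only between setup and cleanup */
	bool resources_ready;	/* stands in for state that reload touches */
};

/* Setup: initialize everything first, enable reload as the last step. */
static void demo_dev_setup(struct demo_dev *dev)
{
	dev->resources_ready = true;
	dev->reload_enabled = true;
}

/* Cleanup: disable reload as the first step, then tear down. */
static void demo_dev_cleanup(struct demo_dev *dev)
{
	dev->reload_enabled = false;
	dev->resources_ready = false;
}

/* Reload is refused whenever the device is not fully set up. */
static int demo_dev_reload(struct demo_dev *dev)
{
	if (!dev->reload_enabled)
		return -EOPNOTSUPP;
	return dev->resources_ready ? 0 : -EINVAL;
}
```

With this ordering a reload request can only observe a fully
initialized device; it is rejected before setup finishes and after
cleanup starts.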
Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the trap IDs used to report layer 3 exceptions.
Trapped packets are first reported to devlink and then injected to the
kernel's receive path. All the packets have 'offload_fwd_mark' set in
order to prevent them from potentially being forwarded by the bridge
again.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlxsw does not differentiate between these two cases of
routes with invalid nexthops:
1. Nexthops whose nexthop device is a mlxsw upper (has a RIF), but whose
neighbour could not be resolved
2. Nexthops whose nexthop device is not a mlxsw upper (e.g., management
interface)
Up until now this did not matter and mlxsw trapped packets for both
cases using the same trap ID. However, packets that should have been
routed in hardware (case 1), but incurred a problem are considered
exceptions and should be reported to the user. The two cases should
therefore be split between two different trap IDs.
Allocate a new adjacency entry during initialization and upon the
insertion of the first route with an invalid mlxsw nexthop, program this
entry to discard packets. Packets hitting this entry will be reported
using a new trap ID - "DISCARD_ROUTER3".
In the future, the entry could be written during initialization, but
currently firmware requires a valid RIF, which is not available at this
stage.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, packets that cannot be routed in hardware (e.g., nexthop
device is not upper of mlxsw), are trapped to the kernel for forwarding.
Such packets are trapped using "RTR_INGRESS0" trap. This trap also traps
packets that hit reject routes (e.g., "unreachable") so that the kernel
will generate the appropriate ICMP error message for them.
A subsequent patch will need to report to devlink only packets that hit
a reject route, which is impossible as long as "RTR_INGRESS0" is
overloaded like that.
Solve this by using "RTR_INGRESS1" trap for packets that hit reject
routes.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the trap IDs and trap group used to report layer 3 drops. Register
layer 3 packet traps and associated layer 3 trap group with devlink
during driver initialization.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.
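
A minimal sketch of this class of bug, assuming a hypothetical
demo_init() function with a goto-based error path (the names and the
alloc_fail parameter are illustrative, not taken from the driver):

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical init function; alloc_fail simulates a failed allocation. */
static int demo_init(int alloc_fail, int **out)
{
	int *buf;
	int err = 0;

	buf = alloc_fail ? NULL : malloc(sizeof(*buf));
	if (!buf) {
		/* Without this assignment the error path would return 0. */
		err = -ENOMEM;
		goto err_alloc;
	}
	*buf = 0;
	*out = buf;
	return 0;

err_alloc:
	*out = NULL;
	return err;
}
```

The caller can then tell a failed allocation apart from success by the
negative return value.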
Fixes: 4a7f970f12 ("mlxsw: spectrum: Replace port_to_module array with array of structs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the vlan push action, if the eswitch flow source capability is
enabled, the flow source value is compared with the MLX5_VPORT_UPLINK
enum to determine the uplink port. This leads to a syndrome in dmesg
when trying to add a vlan push action.
For example:
$ tc filter add dev vxlan0 ingress protocol ip prio 1 flower \
enc_dst_port 4789 \
action tunnel_key unset pipe \
action vlan push id 20 pipe \
action mirred egress redirect dev ens1f0_0
$ dmesg
...
[ 2456.883693] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 5273): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xa9c090)
Use the correct enum value MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK.
Fixes: bb204dcf39fe ("net/mlx5e: Determine source port properly for vlan push action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The rewrite data was not freed.
Fixes: 9db810ed2d ("net/mlx5: DR, Expose steering action functionality")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The value is already the calculated result, so remove the log prefix.
Fixes: e52c280240 ("net/mlx5: E-Switch, Add chains and priorities")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The reason for the pre-allocation of one CQE is to enable resizing of
the CQ.
Fix comment accordingly.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.
The rest were (relatively) trivial in nature.
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now SW steering supported matchers that are IPv4 and IPv6.
The limitation was mixed matchers in which the outer header IP version
was different from the inner header IP version.
To support the mixed matcher we create all the possible ste_builder
combinations; when a rule is created, we select the correct one to
be used for rule creation.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of using explicit indexes, simply use affinity
type enumerators to make the code more readable.
Fixes: 544fe7c2e6 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of using explicit array indexes, simply use
ports enumerators to make the code more readable.
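
As an illustration of the readability gain, a small sketch assuming
hypothetical enumerators DEMO_LAG_P1 and DEMO_LAG_P2 (the demo_* names
are invented for this example and are not the driver's identifiers):

```c
/* Hypothetical port enumerators replacing the bare indexes 0 and 1. */
enum demo_lag_port {
	DEMO_LAG_P1,
	DEMO_LAG_P2,
	DEMO_LAG_PORTS_NUM,	/* number of ports, usable as array size */
};

struct demo_lag {
	int tx_enabled[DEMO_LAG_PORTS_NUM];
};

/* Count ports with TX enabled; the indexes now name what they select. */
static int demo_lag_active_ports(const struct demo_lag *lag)
{
	return lag->tx_enabled[DEMO_LAG_P1] + lag->tx_enabled[DEMO_LAG_P2];
}
```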
Fixes: 7907f23adc ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When debugging a bug that triggers a TX hang, the kernel log is
spammed with the following info message:
[ 1172.044764] mlx5_core 0000:21:00.0: cmd_work_handler:930:(pid 8):
failed to allocate command entry
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for rewriting of DSCP part of ToS field.
The following commands, for example, can be used to offload the rewrite action:
OVS:
$ ovs-ofctl add-flow ovs-sriov "ip, in_port=REP, \
actions=mod_nw_tos:68, output:NIC"
iproute2 (a retain mask is used, as the tc command rewrites the whole ToS field):
$ tc filter add dev REP ingress protocol ip prio 1 flower skip_sw \
ip_proto icmp action pedit munge ip tos set 68 retain 0xfc pipe \
action mirred egress redirect dev NIC
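
The effect of "set 68 retain 0xfc" on the ToS byte can be worked through
in a short sketch. It assumes the mask selects the bits that are taken
from the new value (the six DSCP bits here) while the remaining two ECN
bits keep their old value; the demo_* helpers are hypothetical and only
model the arithmetic.

```c
#include <stdint.h>

/* Bits selected by mask come from val; all other bits keep their old value. */
static uint8_t demo_rewrite_tos(uint8_t old_tos, uint8_t val, uint8_t mask)
{
	return (uint8_t)((old_tos & (uint8_t)~mask) | (val & mask));
}

/* DSCP is the upper six bits of the ToS byte. */
static uint8_t demo_tos_to_dscp(uint8_t tos)
{
	return tos >> 2;
}
```

For an old ToS of 0x03 (DSCP 0, both ECN bits set), the result is 0x47:
DSCP becomes 68 >> 2 = 17 and the ECN bits stay untouched.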
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch doesn't change any functionality, but is a pre-step for
adding support for rewriting of bit-sized fields, like DSCP and ECN
in IPv4 header, similar fields in IPv6, etc.
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move short Work Queue API getter functions into the WQ
header file.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
During connection tracking offloads with a high number of connections
(40K connections per second), flow table group lock contention is
observed.
To improve performance by reducing lock contention, a lockless
FTE read lookup is performed as described below.
Each flow table entry is refcounted.
Flow table entry is removed when refcount drops to zero.
rhash table allows rcu protected lookup.
Each hash table entry insertion and removal is write lock protected.
Hence, it is possible to perform a lockless lookup in the rhash table
using the following scheme.
(a) Guard FTE entry lookup per group using rcu read lock.
(b) Before freeing the FTE entry, wait for all readers to finish
accessing the FTE.
The example below, with one reader and one writer racing in parallel,
shows the protection in effect with the rcu lock.
lookup_fte_locked()
  rcu_read_lock();
  search_hash_table()
                                  existing_flow_group_write_lock();
                                  tree_put_node(fte)
                                    drop_ref_cnt(fte)
                                    del_sw_fte(fte)
                                    del_hash_table_entry();
                                    call_rcu();
                                  existing_flow_group_write_unlock();
  get_ref_cnt(fte) fails
  rcu_read_unlock();
                                  rcu grace period();
                                    [..]
                                    kmem_cache_free(fte);
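
The "get_ref_cnt(fte) fails" step can be modelled in userspace with C11
atomics. This is a sketch of the reference counting only: RCU and the
hash table are not modelled, and struct demo_fte with its helpers is
hypothetical rather than the mlx5 implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flow table entry carrying only its reference count. */
struct demo_fte {
	atomic_int refcount;
};

/* Take a reference only if the entry is still alive (refcount > 0). */
static bool demo_fte_get(struct demo_fte *fte)
{
	int old = atomic_load(&fte->refcount);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&fte->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool demo_fte_put(struct demo_fte *fte)
{
	return atomic_fetch_sub(&fte->refcount, 1) == 1;
}
```

A reader that finds the entry after the writer dropped the last
reference gets false from demo_fte_get() and must treat the entry as
gone; in the scheme above the memory stays valid until the grace period
ends, so the failed attempt itself is safe.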
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
FTE memory allocation using alloc_fte() doesn't have any dependency
on the flow group.
Hence, do not hold flow group lock while performing alloc_fte().
This helps to reduce contention of flow group lock.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, the mlx5 tc layer doesn't verify that a rule has at least one
forward or drop action, which leads to the following firmware syndrome
when the user tries to offload such a rule:
[ 1824.860501] mlx5_core 0000:81:00.0: mlx5_cmd_check:753:(pid 29458): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a)
Add a check at the end of parse_tc_fdb_actions() that verifies that the
resulting attribute has the fwd or drop action flag set.
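
A minimal sketch of such a check, assuming hypothetical flag names and
an assumed -EOPNOTSUPP return value (neither is taken from the driver):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical action flags; the real driver uses its own definitions. */
#define DEMO_ACTION_FWD		(1u << 0)
#define DEMO_ACTION_DROP	(1u << 1)
#define DEMO_ACTION_VLAN_PUSH	(1u << 2)

/* Reject a rule whose accumulated actions neither forward nor drop. */
static int demo_validate_actions(uint32_t action)
{
	if (!(action & (DEMO_ACTION_FWD | DEMO_ACTION_DROP)))
		return -EOPNOTSUPP;
	return 0;
}
```

Rejecting the rule in software keeps the invalid request from reaching
firmware at all.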
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When setting number of VFs to 0 (disable SRIOV), clear VF's
configuration.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5_unload_one() does not need a local variable to store a different
value, hence just remove it.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Not all mlx5 cards with FPGA device use it for network processing.
mlx5_core driver configures network connection to FPGA device
for all mlx5 cards with installed FPGA. If FPGA is not a part of
network path, the driver crashes in this case.
Check the FPGA name in function mlx5_fpga_device_start() and continue
integrating the FPGA into the packet flow only for dedicated cards.
Currently these are the Newton and Edison cards.
Signed-off-by: Igor Leshenko <igorle@mellanox.com>
Reviewed-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the kernel function to calculate crc32 instead of the dr
implementation, since it uses the same "slice by 8" algorithm.
Fixes: 26d688e33f ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When building for 32-bit ARM, there is a link time error because of a
64-bit division:
ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by spectrum_buffers.c
>>> net/ethernet/mellanox/mlxsw/spectrum_buffers.o:(mlxsw_sp_buffers_init) in archive drivers/built-in.a
>>> did you mean: __aeabi_uidivmod
>>> defined in: arch/arm/lib/lib.a(lib1funcs.o
Avoid this by using div_u64, which is designed to avoid this problem.
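
To illustrate why a helper sidesteps the missing symbol, here is a
userspace sketch of a 64-by-32 division that never uses the 64-bit '/'
operator. demo_div_u64() is a hypothetical stand-in with the same
argument shape as div_u64(); the kernel helper is not implemented this
way, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

/*
 * Shift-subtract long division: 64-bit dividend, 32-bit divisor.
 * No 64-bit '/' or '%' is used, so the compiler has no reason to
 * emit a call to a library division routine for this function.
 * The divisor must be non-zero.
 */
static uint64_t demo_div_u64(uint64_t dividend, uint32_t divisor)
{
	uint64_t quotient = 0;
	uint64_t rem = 0;
	int i;

	for (i = 63; i >= 0; i--) {
		/* Bring down the next bit of the dividend. */
		rem = (rem << 1) | ((dividend >> i) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quotient |= (uint64_t)1 << i;
		}
	}
	return quotient;
}
```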
Fixes: bc9f6e94bc ("mlxsw: spectrum_buffers: Calculate the size of the main pool")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the check generic for any possible value, not only 2 and 4.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During recreation of original unsplit ports, just simply iterate over
the whole gap and recreate whatever originally existed.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code considers only split by 2 or 4. Make the base port
getting generic and allow split by 8 to be handled correctly. Generalize
the used port checks as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using constant value, use port_module_max_width which is
aligned with the cluster size.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't compute the original base local port during unsplit, rather
remember it in mlxsw_sp_port structure during split port creation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum-3 the modules have 8 lanes, so split by count 2 results in
two split ports each of 4 lanes. Add a resource that can be used to
obtain local port offset in that case.
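
The lane arithmetic in this example can be written out as a small
sketch. The helpers are hypothetical and cover only the lane split; the
local port offset itself comes from the new resource and is not
computed here. The lane numbering is an assumption of this sketch.

```c
/* Lanes per split port: the module width divided evenly among the ports. */
static unsigned int demo_split_width(unsigned int module_width,
				     unsigned int count)
{
	return module_width / count;
}

/*
 * First lane of split port number index (0-based), assuming the split
 * ports occupy consecutive lane ranges. This numbering is an assumption
 * of the sketch, not something stated by the driver.
 */
static unsigned int demo_split_first_lane(unsigned int module_width,
					  unsigned int count,
					  unsigned int index)
{
	return index * demo_split_width(module_width, count);
}
```

For an 8-lane module split by 2, each port gets 4 lanes and the second
port starts at lane 4.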
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get the local port offsets of a split port in a separate helper function
and use it in both the split and unsplit functions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver assumes certain values in the PMLP register. Add checks that
verify that PMLP register provides fitting values.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the port mapping structure down to create, module_map and other
functions instead of individual values.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't use a constant max width value; use the actual width of the port
instead. Also don't pass the module value; use the value stored in the
same structure.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Store the initial PMLP register configuration into array of structures
instead of just simple array of module numbers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when the user performs a split, they cannot tell whether the
port cannot be split because it is already split or because it cannot
be split at all. Add another check for the split flag to distinguish
these cases. Also add a check forbidding split when the maximal width
is 1.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fact that the port cannot be split further should be checked before
checking the count, so move it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the max module width is hard-coded according to ASIC type.
That is not entirely correct, as the max module width might differ
per-board. Use PMTM register to query FW for maximal width of a module.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PMTM allows query or configuration of module types.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tx/rx lane fields got extended to 4 bits, update the reg field
description accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a similar fashion to Spectrum-1, enforce a specific firmware version
for Spectrum-2 so that the driver and firmware are always in sync with
regards to new features.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The version adds support for querying port module type. It will be used
by a followup patch set from Jiri to make port split code more generic.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SN3800 Spectrum-2 based systems have gearboxes that need to be
initialized by the firmware during its initialization flow. In certain
cases, the firmware might need to flash these gearboxes, which is
currently a time-consuming process.
In newer firmware versions, the firmware will not signal to the driver
that it is ready until the gearboxes are flashed. Increase the PCI reset
timeout for these situations. In normal cases, the driver will need to
wait no longer than 5 seconds.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In new firmware versions this register is extended with a sampling rate
for Spectrum-2 and future ASICs.
Increase the size of the register to ensure the field is initialized to
0 which means every packet is mirrored.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>