Up until now we only supported bridged interfaces. Packets ingressing
through the switch ports were either classified to FIDs (in the case of
the VLAN-aware bridge) or vFIDs (in the case of VLAN-unaware bridges).
The packets were then forwarded according to the FDB. Routing was done
entirely in slowpath, by splitting the vFID range in two and using the
lower 0.5K vFIDs as dummy bridges that simply flooded all incoming
traffic to the CPU.
Instead, allow packets to be routed in the device by creating router
interfaces (RIFs) that will direct them to the router block.
Specifically, the RIFs introduced here are Sub-port RIFs used for VLAN
devices and port netdevs. Packets ingressing from the {Port / LAG ID, VID}
with which the RIF was programmed with will be assigned to a special
kind of FIDs called rFIDs and from there directed to the router.
Create a RIF whenever the first IPv4 address was programmed on a VLAN /
LAG / port netdev. Destroy it upon removal of the last IPv4 address.
Receive these notifications by registering for the 'inetaddr'
notification chain. A non-zero (10) priority is used for the
notification block, so that RIFs will be created before routes are
offloaded via FIB code.
Note that another trigger for RIF destruction are CHANGEUPPER
notifications causing the underlying FID's reference count to go down to
zero. This can happen, for example, when a VLAN netdev with an IP address
is put under bridge. While this configuration doesn't make sense it does
cause the device and the kernel to get out of sync when the netdev is
unbridged. We intend to address this in the future, hopefully in current
cycle.
Finally, Remove the lower 0.5K vFIDs, as they are deprecated by the RIFs,
which will trap packets according to their DIP.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are just about to introduce router interfaces (RIFs), but before that
we need to be able update the device with the correct RIF attributes
whenever they change for the netdev the RIF is backing. Two such
attributes are MTU and MAC.
The MAC is used both to set the source MAC of packets egressing from the
RIF and also to program an FDB rule that will direct packets to the
router block.
Use the existing netdevice notification block and respond to CHANGEADDR
and CHANGEMTU accordingly. Store both attributes in the RIF struct
in case we need to revert to old attributes following a failed update.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add functions that iterate over lower devices and find port device.
As a dependency add netdev_for_each_all_lower_dev and
netdev_for_each_all_lower_dev_rcu macro with
netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers.
Also, add functions to return mlxsw struct according to lower device
found and mlxsw_port struct with a reference to lower device.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement ipv4 FIB entries addition and removal. Initially, we support
local and broadcast routes using "ip2me" trap action.
Also, unicast routes without nexthop are supported using "local" action.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Serves for adding, updating and removing fib entries.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Virtual router is a construct used inside HW. In this implementation
we map kernel tables to virtual routers one to one. Introduce management
logic to create virtual routers when needed and destroy in case they are
no longer in use. According to that, call into LPM tree management.
Each virtual router is always bound to one LPM tree.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce basic LPM tree management allowing to share the trees in
between tables if the used prefixes in the tables are the same.
Build the tree structure according to the used prefixes. Although it is
not optimal for many use cases, this initial implementation does only
simple linear left-tree. More advanced structures will be introduced
later on, possibly including mechanisms to change trees on the fly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register is used to bind virtual router and protocol to an
allocated LPM tree.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Serves to build LPM tree structure.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register serves for allocation and deallocation of LPM search tree.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shadow FIB is needed in order to hold additional information for FIB
entries and keep track of used prefixes. That is needed for the LPM tree
construction to be introduced later on in this set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip2me:
To instruct HW to send trapped ip2me traffic to kernel, we have to add
this trap. Selection ip2me traffic is introduced later on in this set.
ARPs:
We are going to stop flooding to CPU port when netdev isn't bridged and
only get packets destined to the netdev's IP address and certain control
packets.
Add traps for ARP request (broadcast) and response (unicast) in order to
get these to the CPU and resolve neighbours.
host miss:
If a packet is routed through a directly connected route and its
destination IP is not in the device's neighbour table, then we need to
trap it to CPU. This will cause the host to resolve the MAC of the
neighbour, which will be eventually programmed to the device's table.
router ingress:
In order to trap packets in router part.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing packet traps we should use action 'discard' instead of
'forward', as some trap IDs we'll add cannot be configured with the
later. However, result is the same, as packets are not trapped to the
CPU.
In the future we will be able to reverse the operation properly by
detaching the trap group from the CPU.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the Router Interface Table Register (RITR), which allows us to
create and configure router interfaces (RIFs).
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Incoming packets are directed to the router when they match an FDB
entry with action forward to IP router.
Add this action, which was mistakenly named "TRAP".
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When enabling the router in the device we will represent L3 netdevs
using router interfaces (RIFs). These will be specified whenever
programming routes or neighbours on the netdev.
Introduce the basic RIF infrastructure which allows one to lookup a RIF
by its netdev. Later patches in the series will extend this, but the
basic routines are needed now in order to direct traffic to CPU.
Pointers to the RIF structs are stored in an array indexed by the RIF's
number. This will allow us to efficiently update the kernel's neighbour
table when regularly dumping the device's table.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a skeleton router file and do basic HW initialization of router.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During ports initialization a net device is registered for each
available port, which implies the port is usable. However, a port is
only usable after the different parts of the device (e.g. flooding,
buffers) are initialized. This is especially important now, when we must
initialize the router before the ports, as otherwise the device can't be
initialized.
Solve that by initializing the switch ports at the end of init sequence.
Also, remove an unnecessary warning about port up/down events, which
would otherwise be invoked whenever removing the driver, as ports are
removed before unregistering the listener for these events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the Router General Configuration Register (RGCR), which allows us to
enable the router in the device and configure its various parameters.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are going to assign router interfaces (RIFs) to netdevs if an IPv4
address was assigned to them. If one was assigned to a port netdev, this
will translate to the PVID vPort being member in a RIF.
While it's possible for a LAG slave to have an IP address, we can't have
a vPort being member in two FIDs (assuming the LAG device will be
put in bridge / assigned an IP address).
Solve that by making the PVID vPort leave any FID it might be a member
in when joining / leaving LAG.
Note that the PVID vPort is the only vPort that can be present on the
port when it's put under LAG.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VLAN devices are created on top of LAG, their underlying vPorts are
configured correctly with LAG membership.
However, the PVID vPort is implicit and already present when the port
netdev is put under LAG, so its LAG membership is never set. Set it
correctly when joining / leaving LAG.
This didn't matter until now, but we are going to introduce support for
router interfaces (RIFs), which need to take into account LAG membership.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When port isn't bridged it is still possible to invoke switchdev ops and
configure the device's VLAN filters.
However, this will require us to use different Router InterFaces (RIFs)
for the same netdev, instead of one per-netdev as with any other
configuration.
Taking the above into account and the fact that this functionality is
questionable with regards to the device's normal use-case, remove it and
instead return an error.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port netdevs (e.g. swXpY) that are not bridged are represented in the
device using a vPort with VID=PVID=1 (the PVID vPort), as untagged
packets entering the switch are internally tagged with the PVID VLAN.
When these packets are routed through a different port netdev they
should egress untagged.
This wasn't a problem until now, as non-bridged traffic only originated
from the CPU, which transmits packets out of the port as-is.
When a vPort is created with VID 1 mark it as egress untagged.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().
Signed-off-by: David S. Miller <davem@davemloft.net>
For debug purposes, it's useful to know the order in which the driver
responds to changes in the topology of its upper devices.
Add debug prints to signal these events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are situations in which a vPort is destroyed while still holding
references to device's resources such as FIDs and FDB records. This can
happen, for example, when a VLAN device is deleted while still being
bridged.
Instead of trying to make sure vPort destruction is invoked when it no
longer uses device's resources, just free them upon destruction. This
simplifies the code, as we no longer need to take different situations
into account when events are received - cleanup is taken care of in one
place.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FDB entries are learned using {Port / LAG ID, FID} and therefore should
be flushed whenever a port (vPort) leaves its FID (vFID).
However, when the bridge port is a LAG device (or a VLAN device on top),
then FDB flushing is conditional. Ports removed from such LAG
configurations must not trigger flushing, as other ports might still be
members in the LAG and therefore the bridge port is still active.
The decision whether to flush or not was previously computed in the
netdevice notification block, but in order to flush the entries when a
port leaves its FID this decision should be computed there.
Strip the notification block from this logic and instead move it to one
FDB flushing function that is invoked from both the FID / vFID leave
functions.
When port isn't member in LAG, FDB flushing should always occur.
Otherwise, it should occur only when the last port (vPort) member in the
LAG leaves the FID (vFID).
This will allow us - in the next patch - to simplify the cleanup code
paths that are hit whenever the topology above the port netdevs changes.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not all vPorts will have FIDs assigned to them, so make sure functions
first test for FID presence.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As previously explained, not all vPorts will be assigned FIDs, so instead
of returning the FID index of a vPort, return a pointer to its FID
struct. This will allow us to know whether it's legal to access the
vPort's FID parameters such as index and device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When L3 interfaces will be introduced a vPort won't necessarily have a
FID assigned to it. This can happen if it's not member in a bridge (in
which case it's assigned a vFID) or doesn't have an IP address (in which
case it's assigned an rFID).
Therefore, instead check the VID parameter to test whether a port is a
vPort or not.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a very similar way to the vFIDs, make the first 4K FIDs - used in the
VLAN-aware bridge - use the new FID struct.
Upon first use of the FID by any of the ports do the following:
1) Create the FID
2) Setup a matching flooding entry
3) Create a mapping for the FID
Unlike vFIDs, upon creation of a FID we always create a global
VID-to-FID mapping, so that ports without upper vPorts can use it
instead of creating an explicit {Port, VID} to FID mapping.
When a port leaves a FID the reverse is performed. Whenever the FID's
reference count reaches zero the FID is deleted along with the global
mapping.
The per-FID struct will later allow us to configure L3 interfaces on top
of the VLAN-aware bridge.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a vPort is created or when it joins a bridge we always do the same
set of operations:
1) Create the vFID, if not already created
2) Setup flooding for the vFID
3) Map the {Port, VID} to the vFID
When a vPort is destroyed or when it leaves a bridge the reverse is
performed.
Encapsulate the above in join / leave functions and simplify the code.
FIDs and rFIDs will use a similar set of functions.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now we had a dedicated struct only for vFIDs, but before
introducing support for L3 interfaces we need to make it generic and
use it for all three types of FIDs:
1) FIDs - 0..4K-1, used for the VLAN-aware bridge
2) vFIDs - 4K..15K-1, used for VLAN-unaware bridges
3) rFIDs - 15K..16K-1, used to direct traffic to / from the router in
the device. Will be introduced later in the series.
The three types of L3 interfaces - Router InterFaces, RIFs - that will
be introduced correspond to the three types of FIDs and are configured
using them. Therefore, we'll need to store the links between them as
well as a reference count on the underlying FID, so that the
corresponding RIF will be destroyed when it reaches zero.
Note that the lower 0.5K vFIDs are currently used for for non-bridged
netdevs, so that traffic could be flooded to the CPU port. However, when
rFIDs will be introduced we'll no longer need these and they too will be
used for VLAN-unaware bridges.
Make the vFID struct generic by renaming it and some of its fields. FIDs
will be converted to use it later in the series.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a FID index instead of vFID and ease the transition towards a
generic FID struct.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A FID used by a vPort (vFID, but also rFID later in the series) is
always mapped using {Port, VID} and not only VID as with the 4K FIDs of
the VLAN-aware bridge.
Instead of specifying all the arguments each time, just wrap this
operation using a dedicated function and simplify the code.
As before, the function takes FID as its argument in preparation for a
generic FID struct.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the code and use only one function for vFID creation /
destruction.
Unlike before, the function receives a FID index as its argument and not
a vFID index. Instead of passing 0, now one would need to pass 4K, which
is the first vFID.
This is the first step in creating a generic FID struct that will be
used for all three types of FIDs.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In all call sites 'only_uc' is set to false, so strip it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a macro to do this kind of declarations, so use it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We hold a reference count on the number of ports member in the
VLAN-aware bridge, as we only support one.
Instead of always incrementing / decrementing the reference count after
joining / leaving the bridge, simply do this accounting in the join /
leave functions.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The argument 'br_dev' is never used, so remove it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When responding to unlinking CHANGEUPPER notifications we shouldn't
return any value, as it's not checked by upper layers.
In addition, there's nothing the driver can do in case of failure, so it
should simply continue and try to free as much resources as possible and
not stop on first error.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of checking for a condition and then issue the warning, just do
it in one go and simplify the code.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When upper device of a VLAN device changes we already made sure it's
a bridge device in PRECHANGEUPPER, so no need to check it's a master
device in CHANGEUPPER.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a port netdev is put under LAG it cannot have VLAN upper devices,
so forbid that. The LAG device itself can have VLAN upper devices.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently only support the following upper devices for port netdevs:
1) Bridge
2) LAG (bond / team)
3) VLAN
Any other device is forbidden, so return an error.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of checking the error value and returning NOTIFY_BAD, just use
notifier_from_errno() and simplify the code.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop the SW TX counter from counting the TX header bytes
since they are not being sent out.
Fixes: e577516b9d ("mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop the SW TX counter from counting the TX header bytes
since they are not being sent out.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/sched/act_police.c
net/sched/sch_drr.c
net/sched/sch_hfsc.c
net/sched/sch_prio.c
net/sched/sch_red.c
net/sched/sch_tbf.c
In net-next the drop methods of the packet schedulers got removed, so
the bug fixes to them in 'net' are irrelevant.
A packet action unload crash fix conflicts with the addition of the
new firstuse timestamp.
Signed-off-by: David S. Miller <davem@davemloft.net>