linux_dsm_epyc7002/include/net/dsa.h

814 lines
23 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0-or-later */
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 20:44:02 +07:00
/*
* include/net/dsa.h - Driver for Distributed Switch Architecture switch chips
dsa: add switch chip cascading support The initial version of the DSA driver only supported a single switch chip per network interface, while DSA-capable switch chips can be interconnected to form a tree of switch chips. This patch adds support for multiple switch chips on a network interface. An example topology for a 16-port device with an embedded CPU is as follows: +-----+ +--------+ +--------+ | |eth0 10| switch |9 10| switch | | CPU +----------+ +-------+ | | | | chip 0 | | chip 1 | +-----+ +---++---+ +---++---+ || || || || ||1000baseT ||1000baseT ||ports 1-8 ||ports 9-16 This requires a couple of interdependent changes in the DSA layer: - The dsa platform driver data needs to be extended: there is still only one netdevice per DSA driver instance (eth0 in the example above), but each of the switch chips in the tree needs its own mii_bus device pointer, MII management bus address, and port name array. (include/net/dsa.h) The existing in-tree dsa users need some small changes to deal with this. (arch/arm) - The DSA and Ethertype DSA tagging modules need to be extended to use the DSA device ID field on receive and demultiplex the packet accordingly, and fill in the DSA device ID field on transmit according to which switch chip the packet is heading to. (net/dsa/tag_{dsa,edsa}.c) - The concept of "CPU port", which is the switch chip port that the CPU is connected to (port 10 on switch chip 0 in the example), needs to be extended with the concept of "upstream port", which is the port on the switch chip that will bring us one hop closer to the CPU (port 10 for both switch chips in the example above). - The dsa platform data needs to specify which ports on which switch chips are links to other switch chips, so that we can enable DSA tagging mode on them. (For inter-switch links, we always use non-EtherType DSA tagging, since it has lower overhead. The CPU link uses dsa or edsa tagging depending on what the 'root' switch chip supports.) This is done by specifying "dsa" for the given port in the port array. - The dsa platform data needs to be extended with information on via which port to reach any given switch chip from any given switch chip. This info is specified via the per-switch chip data struct ->rtable[] array, which gives the nexthop ports for each of the other switches in the tree. For the example topology above, the dsa platform data would look something like this: static struct dsa_chip_data sw[2] = { { .mii_bus = &foo, .sw_addr = 1, .port_names[0] = "p1", .port_names[1] = "p2", .port_names[2] = "p3", .port_names[3] = "p4", .port_names[4] = "p5", .port_names[5] = "p6", .port_names[6] = "p7", .port_names[7] = "p8", .port_names[9] = "dsa", .port_names[10] = "cpu", .rtable = (s8 []){ -1, 9, }, }, { .mii_bus = &foo, .sw_addr = 2, .port_names[0] = "p9", .port_names[1] = "p10", .port_names[2] = "p11", .port_names[3] = "p12", .port_names[4] = "p13", .port_names[5] = "p14", .port_names[6] = "p15", .port_names[7] = "p16", .port_names[10] = "dsa", .rtable = (s8 []){ 10, -1, }, }, }, static struct dsa_platform_data pd = { .netdev = &foo, .nr_switches = 2, .sw = sw, }; Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Gary Thomas <gary@mlbassoc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-20 16:52:09 +07:00
* Copyright (c) 2008-2009 Marvell Semiconductor
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 20:44:02 +07:00
*/
#ifndef __LINUX_NET_DSA_H
#define __LINUX_NET_DSA_H
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/list.h>
#include <linux/notifier.h>
#include <linux/timer.h>
#include <linux/workqueue.h>
#include <linux/of.h>
#include <linux/ethtool.h>
#include <linux/net_tstamp.h>
#include <linux/phy.h>
#include <linux/platform_data/dsa.h>
net: phylink: Add struct phylink_config to PHYLINK API The phylink_config structure will encapsulate a pointer to a struct device and the operation type requested for this instance of PHYLINK. This patch does not make any functional changes, it just transitions the PHYLINK internals and all its users to the new API. A pointer to a phylink_config structure will be passed to phylink_create() instead of the net_device directly. Also, the same phylink_config pointer will be passed back to all phylink_mac_ops callbacks instead of the net_device. Using this mechanism, a PHYLINK user can get the original net_device using a structure such as 'to_net_dev(config->dev)' or directly the structure containing the phylink_config using a container_of call. At the moment, only the PHYLINK_NETDEV is defined as a valid operation type for PHYLINK. In this mode, a valid reference to a struct device linked to the original net_device should be passed to PHYLINK through the phylink_config structure. This API changes is mainly driven by the necessity of adding a new operation type in PHYLINK that disconnects the phy_device from the net_device and also works when the net_device is lacking. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-29 00:38:12 +07:00
#include <linux/phylink.h>
#include <net/devlink.h>
#include <net/switchdev.h>
struct tc_action;
struct phy_device;
struct fixed_phy_status;
struct phylink_link_state;
#define DSA_TAG_PROTO_NONE_VALUE 0
#define DSA_TAG_PROTO_BRCM_VALUE 1
#define DSA_TAG_PROTO_BRCM_PREPEND_VALUE 2
#define DSA_TAG_PROTO_DSA_VALUE 3
#define DSA_TAG_PROTO_EDSA_VALUE 4
#define DSA_TAG_PROTO_GSWIP_VALUE 5
#define DSA_TAG_PROTO_KSZ9477_VALUE 6
#define DSA_TAG_PROTO_KSZ9893_VALUE 7
#define DSA_TAG_PROTO_LAN9303_VALUE 8
#define DSA_TAG_PROTO_MTK_VALUE 9
#define DSA_TAG_PROTO_QCA_VALUE 10
#define DSA_TAG_PROTO_TRAILER_VALUE 11
net: dsa: Optional VLAN-based port separation for switches without tagging This patch provides generic DSA code for using VLAN (802.1Q) tags for the same purpose as a dedicated switch tag for injection/extraction. It is based on the discussions and interest that has been so far expressed in https://www.spinics.net/lists/netdev/msg556125.html. Unlike all other DSA-supported tagging protocols, CONFIG_NET_DSA_TAG_8021Q does not offer a complete solution for drivers (nor can it). Instead, it provides generic code that driver can opt into calling: - dsa_8021q_xmit: Inserts a VLAN header with the specified contents. Can be called from another tagging protocol's xmit function. Currently the LAN9303 driver is inserting headers that are simply 802.1Q with custom fields, so this is an opportunity for code reuse. - dsa_8021q_rcv: Retrieves the TPID and TCI from a VLAN-tagged skb. Removing the VLAN header is left as a decision for the caller to make. - dsa_port_setup_8021q_tagging: For each user port, installs an Rx VID and a Tx VID, for proper untagged traffic identification on ingress and steering on egress. Also sets up the VLAN trunk on the upstream (CPU or DSA) port. Drivers are intentionally left to call this function explicitly, depending on the context and hardware support. The expected switch behavior and VLAN semantics should not be violated under any conditions. That is, after calling dsa_port_setup_8021q_tagging, the hardware should still pass all ingress traffic, be it tagged or untagged. For uniformity with the other tagging protocols, a module for the dsa_8021q_netdev_ops structure is registered, but the typical usage is to set up another tagging protocol which selects CONFIG_NET_DSA_TAG_8021Q, and calls the API from tag_8021q.h. Null function definitions are also provided so that a "depends on" is not forced in the Kconfig. This tagging protocol only works when switch ports are standalone, or when they are added to a VLAN-unaware bridge. It will probably remain this way for the reasons below. When added to a bridge that has vlan_filtering 1, the bridge core will install its own VLANs and reset the pvids through switchdev. For the bridge core, switchdev is a write-only pipe. All VLAN-related state is kept in the bridge core and nothing is read from DSA/switchdev or from the driver. So the bridge core will break this port separation because it will install the vlan_default_pvid into all switchdev ports. Even if we could teach the bridge driver about switchdev preference of a certain vlan_default_pvid (task difficult in itself since the current setting is per-bridge but we would need it per-port), there would still exist many other challenges. Firstly, in the DSA rcv callback, a driver would have to perform an iterative reverse lookup to find the correct switch port. That is because the port is a bridge slave, so its Rx VID (port PVID) is subject to user configuration. How would we ensure that the user doesn't reset the pvid to a different value (which would make an O(1) translation impossible), or to a non-unique value within this DSA switch tree (which would make any translation impossible)? Finally, not all switch ports are equal in DSA, and that makes it difficult for the bridge to be completely aware of this anyway. The CPU port needs to transmit tagged packets (VLAN trunk) in order for the DSA rcv code to be able to decode source information. But the bridge code has absolutely no idea which switch port is the CPU port, if nothing else then just because there is no netdevice registered by DSA for the CPU port. Also DSA does not currently allow the user to specify that they want the CPU port to do VLAN trunking anyway. VLANs are added to the CPU port using the same flags as they were added on the user port. So the VLANs installed by dsa_port_setup_8021q_tagging per driver request should remain private from the bridge's and user's perspective, and should not alter the VLAN semantics observed by the user. In the current implementation a VLAN range ending at 4095 (VLAN_N_VID) is reserved for this purpose. Each port receives a unique Rx VLAN and a unique Tx VLAN. Separate VLANs are needed for Rx and Tx because they serve different purposes: on Rx the switch must process traffic as untagged and process it with a port-based VLAN, but with care not to hinder bridging. On the other hand, the Tx VLAN is where the reachability restrictions are imposed, since by tagging frames in the xmit callback we are telling the switch onto which port to steer the frame. Some general guidance on how this support might be employed for real-life hardware (some comments made by Florian Fainelli): - If the hardware supports VLAN tag stacking, it should somehow back up its private VLAN settings when the bridge tries to override them. Then the driver could re-apply them as outer tags. Dedicating an outer tag per bridge device would allow identical inner tag VID numbers to co-exist, yet preserve broadcast domain isolation. - If the switch cannot handle VLAN tag stacking, it should disable this port separation when added as slave to a vlan_filtering bridge, in that case having reduced functionality. - Drivers for old switches that don't support the entire VLAN_N_VID range will need to rework the current range selection mechanism. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 17:19:22 +07:00
#define DSA_TAG_PROTO_8021Q_VALUE 12
#define DSA_TAG_PROTO_SJA1105_VALUE 13
#define DSA_TAG_PROTO_KSZ8795_VALUE 14
#define DSA_TAG_PROTO_OCELOT_VALUE 15
#define DSA_TAG_PROTO_AR9331_VALUE 16
enum dsa_tag_protocol {
DSA_TAG_PROTO_NONE = DSA_TAG_PROTO_NONE_VALUE,
DSA_TAG_PROTO_BRCM = DSA_TAG_PROTO_BRCM_VALUE,
DSA_TAG_PROTO_BRCM_PREPEND = DSA_TAG_PROTO_BRCM_PREPEND_VALUE,
DSA_TAG_PROTO_DSA = DSA_TAG_PROTO_DSA_VALUE,
DSA_TAG_PROTO_EDSA = DSA_TAG_PROTO_EDSA_VALUE,
DSA_TAG_PROTO_GSWIP = DSA_TAG_PROTO_GSWIP_VALUE,
DSA_TAG_PROTO_KSZ9477 = DSA_TAG_PROTO_KSZ9477_VALUE,
DSA_TAG_PROTO_KSZ9893 = DSA_TAG_PROTO_KSZ9893_VALUE,
DSA_TAG_PROTO_LAN9303 = DSA_TAG_PROTO_LAN9303_VALUE,
DSA_TAG_PROTO_MTK = DSA_TAG_PROTO_MTK_VALUE,
DSA_TAG_PROTO_QCA = DSA_TAG_PROTO_QCA_VALUE,
DSA_TAG_PROTO_TRAILER = DSA_TAG_PROTO_TRAILER_VALUE,
net: dsa: Optional VLAN-based port separation for switches without tagging This patch provides generic DSA code for using VLAN (802.1Q) tags for the same purpose as a dedicated switch tag for injection/extraction. It is based on the discussions and interest that has been so far expressed in https://www.spinics.net/lists/netdev/msg556125.html. Unlike all other DSA-supported tagging protocols, CONFIG_NET_DSA_TAG_8021Q does not offer a complete solution for drivers (nor can it). Instead, it provides generic code that driver can opt into calling: - dsa_8021q_xmit: Inserts a VLAN header with the specified contents. Can be called from another tagging protocol's xmit function. Currently the LAN9303 driver is inserting headers that are simply 802.1Q with custom fields, so this is an opportunity for code reuse. - dsa_8021q_rcv: Retrieves the TPID and TCI from a VLAN-tagged skb. Removing the VLAN header is left as a decision for the caller to make. - dsa_port_setup_8021q_tagging: For each user port, installs an Rx VID and a Tx VID, for proper untagged traffic identification on ingress and steering on egress. Also sets up the VLAN trunk on the upstream (CPU or DSA) port. Drivers are intentionally left to call this function explicitly, depending on the context and hardware support. The expected switch behavior and VLAN semantics should not be violated under any conditions. That is, after calling dsa_port_setup_8021q_tagging, the hardware should still pass all ingress traffic, be it tagged or untagged. For uniformity with the other tagging protocols, a module for the dsa_8021q_netdev_ops structure is registered, but the typical usage is to set up another tagging protocol which selects CONFIG_NET_DSA_TAG_8021Q, and calls the API from tag_8021q.h. Null function definitions are also provided so that a "depends on" is not forced in the Kconfig. This tagging protocol only works when switch ports are standalone, or when they are added to a VLAN-unaware bridge. It will probably remain this way for the reasons below. When added to a bridge that has vlan_filtering 1, the bridge core will install its own VLANs and reset the pvids through switchdev. For the bridge core, switchdev is a write-only pipe. All VLAN-related state is kept in the bridge core and nothing is read from DSA/switchdev or from the driver. So the bridge core will break this port separation because it will install the vlan_default_pvid into all switchdev ports. Even if we could teach the bridge driver about switchdev preference of a certain vlan_default_pvid (task difficult in itself since the current setting is per-bridge but we would need it per-port), there would still exist many other challenges. Firstly, in the DSA rcv callback, a driver would have to perform an iterative reverse lookup to find the correct switch port. That is because the port is a bridge slave, so its Rx VID (port PVID) is subject to user configuration. How would we ensure that the user doesn't reset the pvid to a different value (which would make an O(1) translation impossible), or to a non-unique value within this DSA switch tree (which would make any translation impossible)? Finally, not all switch ports are equal in DSA, and that makes it difficult for the bridge to be completely aware of this anyway. The CPU port needs to transmit tagged packets (VLAN trunk) in order for the DSA rcv code to be able to decode source information. But the bridge code has absolutely no idea which switch port is the CPU port, if nothing else then just because there is no netdevice registered by DSA for the CPU port. Also DSA does not currently allow the user to specify that they want the CPU port to do VLAN trunking anyway. VLANs are added to the CPU port using the same flags as they were added on the user port. So the VLANs installed by dsa_port_setup_8021q_tagging per driver request should remain private from the bridge's and user's perspective, and should not alter the VLAN semantics observed by the user. In the current implementation a VLAN range ending at 4095 (VLAN_N_VID) is reserved for this purpose. Each port receives a unique Rx VLAN and a unique Tx VLAN. Separate VLANs are needed for Rx and Tx because they serve different purposes: on Rx the switch must process traffic as untagged and process it with a port-based VLAN, but with care not to hinder bridging. On the other hand, the Tx VLAN is where the reachability restrictions are imposed, since by tagging frames in the xmit callback we are telling the switch onto which port to steer the frame. Some general guidance on how this support might be employed for real-life hardware (some comments made by Florian Fainelli): - If the hardware supports VLAN tag stacking, it should somehow back up its private VLAN settings when the bridge tries to override them. Then the driver could re-apply them as outer tags. Dedicating an outer tag per bridge device would allow identical inner tag VID numbers to co-exist, yet preserve broadcast domain isolation. - If the switch cannot handle VLAN tag stacking, it should disable this port separation when added as slave to a vlan_filtering bridge, in that case having reduced functionality. - Drivers for old switches that don't support the entire VLAN_N_VID range will need to rework the current range selection mechanism. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 17:19:22 +07:00
DSA_TAG_PROTO_8021Q = DSA_TAG_PROTO_8021Q_VALUE,
DSA_TAG_PROTO_SJA1105 = DSA_TAG_PROTO_SJA1105_VALUE,
DSA_TAG_PROTO_KSZ8795 = DSA_TAG_PROTO_KSZ8795_VALUE,
DSA_TAG_PROTO_OCELOT = DSA_TAG_PROTO_OCELOT_VALUE,
DSA_TAG_PROTO_AR9331 = DSA_TAG_PROTO_AR9331_VALUE,
};
struct packet_type;
struct dsa_switch;
struct dsa_device_ops {
struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt);
int (*flow_dissect)(const struct sk_buff *skb, __be16 *proto,
int *offset);
net: dsa: Allow drivers to filter packets they can decode source port from Frames get processed by DSA and redirected to switch port net devices based on the ETH_P_XDSA multiplexed packet_type handler found by the network stack when calling eth_type_trans(). The running assumption is that once the DSA .rcv function is called, DSA is always able to decode the switch tag in order to change the skb->dev from its master. However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105, user of DSA_TAG_PROTO_8021Q) where this assumption is not completely true, since switch tagging piggybacks on the absence of a vlan_filtering bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't rely on switch tagging, but on a different mechanism. So it would make sense to at least be able to terminate that. Having DSA receive traffic it can't decode would put it in an impossible situation: the eth_type_trans() function would invoke the DSA .rcv(), which could not change skb->dev, then eth_type_trans() would be invoked again, which again would call the DSA .rcv, and the packet would never be able to exit the DSA filter and would spiral in a loop until the whole system dies. This happens because eth_type_trans() doesn't actually look at the skb (so as to identify a potential tag) when it deems it as being ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer installed (therefore it's a DSA master) and that there exists a .rcv callback (everybody except DSA_TAG_PROTO_NONE has that). This is understandable as there are many switch tags out there, and exhaustively checking for all of them is far from ideal. The solution lies in introducing a filtering function for each tagging protocol. In the absence of a filtering function, all traffic is passed to the .rcv DSA callback. The tagging protocol should see the filtering function as a pre-validation that it can decode the incoming skb. The traffic that doesn't match the filter will bypass the DSA .rcv callback and be left on the master netdevice, which wasn't previously possible. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 17:19:23 +07:00
/* Used to determine which traffic should match the DSA filter in
* eth_type_trans, and which, if any, should bypass it and be processed
* as regular on the master net device.
*/
bool (*filter)(const struct sk_buff *skb, struct net_device *dev);
unsigned int overhead;
const char *name;
enum dsa_tag_protocol proto;
};
#define DSA_TAG_DRIVER_ALIAS "dsa_tag-"
#define MODULE_ALIAS_DSA_TAG_DRIVER(__proto) \
MODULE_ALIAS(DSA_TAG_DRIVER_ALIAS __stringify(__proto##_VALUE))
struct dsa_skb_cb {
struct sk_buff *clone;
};
struct __dsa_skb_cb {
struct dsa_skb_cb cb;
u8 priv[48 - sizeof(struct dsa_skb_cb)];
};
#define DSA_SKB_CB(skb) ((struct dsa_skb_cb *)((skb)->cb))
#define DSA_SKB_CB_PRIV(skb) \
((void *)(skb)->cb + offsetof(struct __dsa_skb_cb, priv))
struct dsa_switch_tree {
struct list_head list;
/* Notifier chain for switch-wide events */
struct raw_notifier_head nh;
/* Tree identifier */
unsigned int index;
/* Number of switches attached to this tree */
struct kref refcount;
/* Has this tree been applied to the hardware? */
bool setup;
/*
* Configuration data for the platform device that owns
* this dsa switch tree instance.
*/
struct dsa_platform_data *pd;
/* List of switch ports */
struct list_head ports;
/* List of DSA links composing the routing table */
struct list_head rtable;
};
/* TC matchall action types */
enum dsa_port_mall_action_type {
DSA_PORT_MALL_MIRROR,
DSA_PORT_MALL_POLICER,
};
/* TC mirroring entry */
struct dsa_mall_mirror_tc_entry {
u8 to_local_port;
bool ingress;
};
/* TC port policer entry */
struct dsa_mall_policer_tc_entry {
s64 burst;
u64 rate_bytes_per_sec;
};
/* TC matchall entry */
struct dsa_mall_tc_entry {
struct list_head list;
unsigned long cookie;
enum dsa_port_mall_action_type type;
union {
struct dsa_mall_mirror_tc_entry mirror;
struct dsa_mall_policer_tc_entry policer;
};
};
struct dsa_port {
/* A CPU port is physically connected to a master device.
* A user port exposed to userspace has a slave device.
*/
union {
struct net_device *master;
struct net_device *slave;
};
/* CPU port tagging operations used by master or slave devices */
const struct dsa_device_ops *tag_ops;
/* Copies for faster access in master receive hot path */
struct dsa_switch_tree *dst;
struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt);
net: dsa: Allow drivers to filter packets they can decode source port from Frames get processed by DSA and redirected to switch port net devices based on the ETH_P_XDSA multiplexed packet_type handler found by the network stack when calling eth_type_trans(). The running assumption is that once the DSA .rcv function is called, DSA is always able to decode the switch tag in order to change the skb->dev from its master. However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105, user of DSA_TAG_PROTO_8021Q) where this assumption is not completely true, since switch tagging piggybacks on the absence of a vlan_filtering bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't rely on switch tagging, but on a different mechanism. So it would make sense to at least be able to terminate that. Having DSA receive traffic it can't decode would put it in an impossible situation: the eth_type_trans() function would invoke the DSA .rcv(), which could not change skb->dev, then eth_type_trans() would be invoked again, which again would call the DSA .rcv, and the packet would never be able to exit the DSA filter and would spiral in a loop until the whole system dies. This happens because eth_type_trans() doesn't actually look at the skb (so as to identify a potential tag) when it deems it as being ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer installed (therefore it's a DSA master) and that there exists a .rcv callback (everybody except DSA_TAG_PROTO_NONE has that). This is understandable as there are many switch tags out there, and exhaustively checking for all of them is far from ideal. The solution lies in introducing a filtering function for each tagging protocol. In the absence of a filtering function, all traffic is passed to the .rcv DSA callback. The tagging protocol should see the filtering function as a pre-validation that it can decode the incoming skb. The traffic that doesn't match the filter will bypass the DSA .rcv callback and be left on the master netdevice, which wasn't previously possible. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 17:19:23 +07:00
bool (*filter)(const struct sk_buff *skb, struct net_device *dev);
enum {
DSA_PORT_TYPE_UNUSED = 0,
DSA_PORT_TYPE_CPU,
DSA_PORT_TYPE_DSA,
DSA_PORT_TYPE_USER,
} type;
struct dsa_switch *ds;
unsigned int index;
const char *name;
struct dsa_port *cpu_dp;
const char *mac;
struct device_node *dn;
unsigned int ageing_time;
bool vlan_filtering;
u8 stp_state;
struct net_device *bridge_dev;
struct devlink_port devlink_port;
struct phylink *pl;
net: phylink: Add struct phylink_config to PHYLINK API The phylink_config structure will encapsulate a pointer to a struct device and the operation type requested for this instance of PHYLINK. This patch does not make any functional changes, it just transitions the PHYLINK internals and all its users to the new API. A pointer to a phylink_config structure will be passed to phylink_create() instead of the net_device directly. Also, the same phylink_config pointer will be passed back to all phylink_mac_ops callbacks instead of the net_device. Using this mechanism, a PHYLINK user can get the original net_device using a structure such as 'to_net_dev(config->dev)' or directly the structure containing the phylink_config using a container_of call. At the moment, only the PHYLINK_NETDEV is defined as a valid operation type for PHYLINK. In this mode, a valid reference to a struct device linked to the original net_device should be passed to PHYLINK through the phylink_config structure. This API changes is mainly driven by the necessity of adding a new operation type in PHYLINK that disconnects the phy_device from the net_device and also works when the net_device is lacking. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-29 00:38:12 +07:00
struct phylink_config pl_config;
net: dsa: Add support for deferred xmit Some hardware needs to take work to get convinced to receive frames on the CPU port (such as the sja1105 which takes temporary L2 forwarding rules over SPI that last for a single frame). Such work needs a sleepable context, and because the regular .ndo_start_xmit is atomic, this cannot be done in the tagger. So introduce a generic DSA mechanism that sets up a transmit skb queue and a workqueue for deferred transmission. The new driver callback (.port_deferred_xmit) is in dsa_switch and not in the tagger because the operations that require sleeping typically also involve interacting with the hardware, and not simply skb manipulations. Therefore having it there simplifies the structure a bit and makes it unnecessary to export functions from the driver to the tagger. The driver is responsible of calling dsa_enqueue_skb which transfers it to the master netdevice. This is so that it has a chance of performing some more work afterwards, such as cleanup or TX timestamping. To tell DSA that skb xmit deferral is required, I have thought about changing the return type of the tagger .xmit from struct sk_buff * into a enum dsa_tx_t that could potentially encode a DSA_XMIT_DEFER value. But the trailer tagger is reallocating every skb on xmit and therefore making a valid use of the pointer return value. So instead of reworking the API in complicated ways, right now a boolean property in the newly introduced DSA_SKB_CB is set. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 17:19:25 +07:00
struct list_head list;
/*
* Give the switch driver somewhere to hang its per-port private data
* structures (accessible from the tagger).
*/
void *priv;
/*
* Original copy of the master netdev ethtool_ops
*/
const struct ethtool_ops *orig_ethtool_ops;
/*
* Original copy of the master netdev net_device_ops
*/
const struct net_device_ops *orig_ndo_ops;
bool setup;
};
/* TODO: ideally DSA ports would have a single dp->link_dp member,
* and no dst->rtable nor this struct dsa_link would be needed,
* but this would require some more complex tree walking,
* so keep it stupid at the moment and list them all.
*/
struct dsa_link {
struct dsa_port *dp;
struct dsa_port *link_dp;
struct list_head list;
};
struct dsa_switch {
bool setup;
struct device *dev;
/*
* Parent switch tree, and switch index.
*/
struct dsa_switch_tree *dst;
unsigned int index;
/* Listener for switch fabric events */
struct notifier_block nb;
/*
* Give the switch driver somewhere to hang its private data
* structure.
*/
void *priv;
/*
* Configuration data for this switch.
*/
struct dsa_chip_data *cd;
/*
* The switch operations.
*/
const struct dsa_switch_ops *ops;
/*
* Slave mii_bus and devices for the individual ports.
*/
u32 phys_mii_mask;
struct mii_bus *slave_mii_bus;
/* Ageing Time limits in msecs */
unsigned int ageing_time_min;
unsigned int ageing_time_max;
/* devlink used to represent this switch device */
struct devlink *devlink;
/* Number of switch port queues */
unsigned int num_tx_queues;
/* Disallow bridge core from requesting different VLAN awareness
* settings on ports if not hardware-supported
*/
bool vlan_filtering_is_global;
/* In case vlan_filtering_is_global is set, the VLAN awareness state
* should be retrieved from here and not from the per-port settings.
*/
bool vlan_filtering;
/* MAC PCS does not provide link state change interrupt, and requires
* polling. Flag passed on to PHYLINK.
*/
bool pcs_poll;
net: dsa: implement auto-normalization of MTU for bridge hardware datapath Many switches don't have an explicit knob for configuring the MTU (maximum transmission unit per interface). Instead, they do the length-based packet admission checks on the ingress interface, for reasons that are easy to understand (why would you accept a packet in the queuing subsystem if you know you're going to drop it anyway). So it is actually the MRU that these switches permit configuring. In Linux there only exists the IFLA_MTU netlink attribute and the associated dev_set_mtu function. The comments like to play blind and say that it's changing the "maximum transfer unit", which is to say that there isn't any directionality in the meaning of the MTU word. So that is the interpretation that this patch is giving to things: MTU == MRU. When 2 interfaces having different MTUs are bridged, the bridge driver MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it adjusts the MTU of the bridge net device itself (and not that of the slave net devices) to the minimum value of all slave interfaces, in order for forwarded packets to not exceed the MTU regardless of the interface they are received and send on. The idea behind this behavior, and why the slave MTUs are not adjusted, is that normal termination from Linux over the L2 forwarding domain should happen over the bridge net device, which _is_ properly limited by the minimum MTU. And termination over individual slave devices is possible even if those are bridged. But that is not "forwarding", so there's no reason to do normalization there, since only a single interface sees that packet. The problem with those switches that can only control the MRU is with the offloaded data path, where a packet received on an interface with MRU 9000 would still be forwarded to an interface with MRU 1500. And the br_mtu_auto_adjust() function does not really help, since the MTU configured on the bridge net device is ignored. In order to enforce the de-facto MTU == MRU rule for these switches, we need to do MTU normalization, which means: in order for no packet larger than the MTU configured on this port to be sent, then we need to limit the MRU on all ports that this packet could possibly come from. AKA since we are configuring the MRU via MTU, it means that all ports within a bridge forwarding domain should have the same MTU. And that is exactly what this patch is trying to do. >From an implementation perspective, we try to follow the intent of the user, otherwise there is a risk that we might livelock them (they try to change the MTU on an already-bridged interface, but we just keep changing it back in an attempt to keep the MTU normalized). So the MTU that the bridge is normalized to is either: - The most recently changed one: ip link set dev swp0 master br0 ip link set dev swp1 master br0 ip link set dev swp0 mtu 1400 This sequence will make swp1 inherit MTU 1400 from swp0. - The one of the most recently added interface to the bridge: ip link set dev swp0 master br0 ip link set dev swp1 mtu 1400 ip link set dev swp1 master br0 The above sequence will make swp0 inherit MTU 1400 as well. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-28 02:55:43 +07:00
/* For switches that only have the MRU configurable. To ensure the
* configured MTU is not exceeded, normalization of MRU on all bridged
* interfaces is needed.
*/
bool mtu_enforcement_ingress;
size_t num_ports;
};
static inline struct dsa_port *dsa_to_port(struct dsa_switch *ds, int p)
{
struct dsa_switch_tree *dst = ds->dst;
struct dsa_port *dp;
list_for_each_entry(dp, &dst->ports, list)
if (dp->ds == ds && dp->index == p)
return dp;
return NULL;
}
static inline bool dsa_is_unused_port(struct dsa_switch *ds, int p)
{
return dsa_to_port(ds, p)->type == DSA_PORT_TYPE_UNUSED;
}
static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
{
return dsa_to_port(ds, p)->type == DSA_PORT_TYPE_CPU;
}
static inline bool dsa_is_dsa_port(struct dsa_switch *ds, int p)
{
return dsa_to_port(ds, p)->type == DSA_PORT_TYPE_DSA;
}
static inline bool dsa_is_user_port(struct dsa_switch *ds, int p)
{
return dsa_to_port(ds, p)->type == DSA_PORT_TYPE_USER;
}
static inline u32 dsa_user_ports(struct dsa_switch *ds)
{
u32 mask = 0;
int p;
for (p = 0; p < ds->num_ports; p++)
if (dsa_is_user_port(ds, p))
mask |= BIT(p);
return mask;
}
/* Return the local port used to reach an arbitrary switch device */
static inline unsigned int dsa_routing_port(struct dsa_switch *ds, int device)
{
struct dsa_switch_tree *dst = ds->dst;
struct dsa_link *dl;
list_for_each_entry(dl, &dst->rtable, list)
if (dl->dp->ds == ds && dl->link_dp->ds->index == device)
return dl->dp->index;
return ds->num_ports;
}
/* Return the local port used to reach an arbitrary switch port */
static inline unsigned int dsa_towards_port(struct dsa_switch *ds, int device,
int port)
{
if (device == ds->index)
return port;
else
return dsa_routing_port(ds, device);
}
/* Return the local port used to reach the dedicated CPU port */
static inline unsigned int dsa_upstream_port(struct dsa_switch *ds, int port)
{
const struct dsa_port *dp = dsa_to_port(ds, port);
const struct dsa_port *cpu_dp = dp->cpu_dp;
if (!cpu_dp)
return port;
return dsa_towards_port(ds, cpu_dp->ds->index, cpu_dp->index);
}
static inline bool dsa_port_is_vlan_filtering(const struct dsa_port *dp)
{
const struct dsa_switch *ds = dp->ds;
if (ds->vlan_filtering_is_global)
return ds->vlan_filtering;
else
return dp->vlan_filtering;
}
typedef int dsa_fdb_dump_cb_t(const unsigned char *addr, u16 vid,
bool is_static, void *data);
struct dsa_switch_ops {
enum dsa_tag_protocol (*get_tag_protocol)(struct dsa_switch *ds,
int port,
enum dsa_tag_protocol mprot);
int (*setup)(struct dsa_switch *ds);
void (*teardown)(struct dsa_switch *ds);
u32 (*get_phy_flags)(struct dsa_switch *ds, int port);
/*
* Access to the switch's PHY registers.
*/
int (*phy_read)(struct dsa_switch *ds, int port, int regnum);
int (*phy_write)(struct dsa_switch *ds, int port,
int regnum, u16 val);
/*
* Link state adjustment (called from libphy)
*/
void (*adjust_link)(struct dsa_switch *ds, int port,
struct phy_device *phydev);
void (*fixed_link_update)(struct dsa_switch *ds, int port,
struct fixed_phy_status *st);
/*
* PHYLINK integration
*/
void (*phylink_validate)(struct dsa_switch *ds, int port,
unsigned long *supported,
struct phylink_link_state *state);
int (*phylink_mac_link_state)(struct dsa_switch *ds, int port,
struct phylink_link_state *state);
void (*phylink_mac_config)(struct dsa_switch *ds, int port,
unsigned int mode,
const struct phylink_link_state *state);
void (*phylink_mac_an_restart)(struct dsa_switch *ds, int port);
void (*phylink_mac_link_down)(struct dsa_switch *ds, int port,
unsigned int mode,
phy_interface_t interface);
void (*phylink_mac_link_up)(struct dsa_switch *ds, int port,
unsigned int mode,
phy_interface_t interface,
struct phy_device *phydev,
int speed, int duplex,
bool tx_pause, bool rx_pause);
void (*phylink_fixed_state)(struct dsa_switch *ds, int port,
struct phylink_link_state *state);
/*
* ethtool hardware statistics.
*/
void (*get_strings)(struct dsa_switch *ds, int port,
u32 stringset, uint8_t *data);
void (*get_ethtool_stats)(struct dsa_switch *ds,
int port, uint64_t *data);
int (*get_sset_count)(struct dsa_switch *ds, int port, int sset);
void (*get_ethtool_phy_stats)(struct dsa_switch *ds,
int port, uint64_t *data);
/*
* ethtool Wake-on-LAN
*/
void (*get_wol)(struct dsa_switch *ds, int port,
struct ethtool_wolinfo *w);
int (*set_wol)(struct dsa_switch *ds, int port,
struct ethtool_wolinfo *w);
/*
* ethtool timestamp info
*/
int (*get_ts_info)(struct dsa_switch *ds, int port,
struct ethtool_ts_info *ts);
/*
* Suspend and resume
*/
int (*suspend)(struct dsa_switch *ds);
int (*resume)(struct dsa_switch *ds);
/*
* Port enable/disable
*/
int (*port_enable)(struct dsa_switch *ds, int port,
struct phy_device *phy);
void (*port_disable)(struct dsa_switch *ds, int port);
/*
* Port's MAC EEE settings
*/
int (*set_mac_eee)(struct dsa_switch *ds, int port,
struct ethtool_eee *e);
int (*get_mac_eee)(struct dsa_switch *ds, int port,
struct ethtool_eee *e);
/* EEPROM access */
int (*get_eeprom_len)(struct dsa_switch *ds);
int (*get_eeprom)(struct dsa_switch *ds,
struct ethtool_eeprom *eeprom, u8 *data);
int (*set_eeprom)(struct dsa_switch *ds,
struct ethtool_eeprom *eeprom, u8 *data);
/*
* Register access.
*/
int (*get_regs_len)(struct dsa_switch *ds, int port);
void (*get_regs)(struct dsa_switch *ds, int port,
struct ethtool_regs *regs, void *p);
/*
* Bridge integration
*/
int (*set_ageing_time)(struct dsa_switch *ds, unsigned int msecs);
int (*port_bridge_join)(struct dsa_switch *ds, int port,
struct net_device *bridge);
void (*port_bridge_leave)(struct dsa_switch *ds, int port,
struct net_device *bridge);
void (*port_stp_state_set)(struct dsa_switch *ds, int port,
u8 state);
void (*port_fast_age)(struct dsa_switch *ds, int port);
int (*port_egress_floods)(struct dsa_switch *ds, int port,
bool unicast, bool multicast);
/*
* VLAN support
*/
int (*port_vlan_filtering)(struct dsa_switch *ds, int port,
bool vlan_filtering);
int (*port_vlan_prepare)(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan);
void (*port_vlan_add)(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan);
int (*port_vlan_del)(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan);
/*
* Forwarding database
*/
int (*port_fdb_add)(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
int (*port_fdb_del)(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
int (*port_fdb_dump)(struct dsa_switch *ds, int port,
dsa_fdb_dump_cb_t *cb, void *data);
/*
* Multicast database
*/
int (*port_mdb_prepare)(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_mdb *mdb);
void (*port_mdb_add)(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_mdb *mdb);
int (*port_mdb_del)(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_mdb *mdb);
/*
* RXNFC
*/
int (*get_rxnfc)(struct dsa_switch *ds, int port,
struct ethtool_rxnfc *nfc, u32 *rule_locs);
int (*set_rxnfc)(struct dsa_switch *ds, int port,
struct ethtool_rxnfc *nfc);
/*
* TC integration
*/
int (*cls_flower_add)(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress);
int (*cls_flower_del)(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress);
int (*cls_flower_stats)(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress);
int (*port_mirror_add)(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror,
bool ingress);
void (*port_mirror_del)(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror);
int (*port_policer_add)(struct dsa_switch *ds, int port,
struct dsa_mall_policer_tc_entry *policer);
void (*port_policer_del)(struct dsa_switch *ds, int port);
int (*port_setup_tc)(struct dsa_switch *ds, int port,
enum tc_setup_type type, void *type_data);
/*
* Cross-chip operations
*/
net: dsa: permit cross-chip bridging between all trees in the system One way of utilizing DSA is by cascading switches which do not all have compatible taggers. Consider the following real-life topology: +---------------------------------------------------------------+ | LS1028A | | +------------------------------+ | | | DSA master for Felix | | | |(internal ENETC port 2: eno2))| | | +------------+------------------------------+-------------+ | | | Felix embedded L2 switch | | | | | | | | +--------------+ +--------------+ +--------------+ | | | | |DSA master for| |DSA master for| |DSA master for| | | | | | SJA1105 1 | | SJA1105 2 | | SJA1105 3 | | | | | |(Felix port 1)| |(Felix port 2)| |(Felix port 3)| | | +--+-+--------------+---+--------------+---+--------------+--+--+ +-----------------------+ +-----------------------+ +-----------------------+ | SJA1105 switch 1 | | SJA1105 switch 2 | | SJA1105 switch 3 | +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3| +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ The above can be described in the device tree as follows (obviously not complete): mscc_felix { dsa,member = <0 0>; ports { port@4 { ethernet = <&enetc_port2>; }; }; }; sja1105_switch1 { dsa,member = <1 1>; ports { port@4 { ethernet = <&mscc_felix_port1>; }; }; }; sja1105_switch2 { dsa,member = <2 2>; ports { port@4 { ethernet = <&mscc_felix_port2>; }; }; }; sja1105_switch3 { dsa,member = <3 3>; ports { port@4 { ethernet = <&mscc_felix_port3>; }; }; }; Basically we instantiate one DSA switch tree for every hardware switch in the system, but we still give them globally unique switch IDs (will come back to that later). Having 3 disjoint switch trees makes the tagger drivers "just work", because net devices are registered for the 3 Felix DSA master ports, and they are also DSA slave ports to the ENETC port. So packets received on the ENETC port are stripped of their stacked DSA tags one by one. Currently, hardware bridging between ports on the same sja1105 chip is possible, but switching between sja1105 ports on different chips is handled by the software bridge. This is fine, but we can do better. In fact, the dsa_8021q tag used by sja1105 is compatible with cascading. In other words, a sja1105 switch can correctly parse and route a packet containing a dsa_8021q tag. So if we could enable hardware bridging on the Felix DSA master ports, cross-chip bridging could be completely offloaded. Such as system would be used as follows: ip link add dev br0 type bridge && ip link set dev br0 up for port in sw0p0 sw0p1 sw0p2 sw0p3 \ sw1p0 sw1p1 sw1p2 sw1p3 \ sw2p0 sw2p1 sw2p2 sw2p3; do ip link set dev $port master br0 done The above makes switching between ports on the same row be performed in hardware, and between ports on different rows in software. Now assume the Felix switch ports are called swp0, swp1, swp2. By running the following extra commands: ip link add dev br1 type bridge && ip link set dev br1 up for port in swp0 swp1 swp2; do ip link set dev $port master br1 done the CPU no longer sees packets which traverse sja1105 switch boundaries and can be forwarded directly by Felix. The br1 bridge would not be used for any sort of traffic termination. For this to work, we need to give drivers an opportunity to listen for bridging events on DSA trees other than their own, and pass that other tree index as argument. I have made the assumption, for the moment, that the other existing DSA notifiers don't need to be broadcast to other trees. That assumption might turn out to be incorrect. But in the meantime, introduce a dsa_broadcast function, similar in purpose to dsa_port_notify, which is used only by the bridging notifiers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 23:37:41 +07:00
int (*crosschip_bridge_join)(struct dsa_switch *ds, int tree_index,
int sw_index, int port,
struct net_device *br);
void (*crosschip_bridge_leave)(struct dsa_switch *ds, int tree_index,
int sw_index, int port,
struct net_device *br);
/*
* PTP functionality
*/
int (*port_hwtstamp_get)(struct dsa_switch *ds, int port,
struct ifreq *ifr);
int (*port_hwtstamp_set)(struct dsa_switch *ds, int port,
struct ifreq *ifr);
bool (*port_txtstamp)(struct dsa_switch *ds, int port,
struct sk_buff *clone, unsigned int type);
bool (*port_rxtstamp)(struct dsa_switch *ds, int port,
struct sk_buff *skb, unsigned int type);
net: dsa: Add support for deferred xmit Some hardware needs to take work to get convinced to receive frames on the CPU port (such as the sja1105 which takes temporary L2 forwarding rules over SPI that last for a single frame). Such work needs a sleepable context, and because the regular .ndo_start_xmit is atomic, this cannot be done in the tagger. So introduce a generic DSA mechanism that sets up a transmit skb queue and a workqueue for deferred transmission. The new driver callback (.port_deferred_xmit) is in dsa_switch and not in the tagger because the operations that require sleeping typically also involve interacting with the hardware, and not simply skb manipulations. Therefore having it there simplifies the structure a bit and makes it unnecessary to export functions from the driver to the tagger. The driver is responsible of calling dsa_enqueue_skb which transfers it to the master netdevice. This is so that it has a chance of performing some more work afterwards, such as cleanup or TX timestamping. To tell DSA that skb xmit deferral is required, I have thought about changing the return type of the tagger .xmit from struct sk_buff * into a enum dsa_tx_t that could potentially encode a DSA_XMIT_DEFER value. But the trailer tagger is reallocating every skb on xmit and therefore making a valid use of the pointer return value. So instead of reworking the API in complicated ways, right now a boolean property in the newly introduced DSA_SKB_CB is set. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 17:19:25 +07:00
/* Devlink parameters */
int (*devlink_param_get)(struct dsa_switch *ds, u32 id,
struct devlink_param_gset_ctx *ctx);
int (*devlink_param_set)(struct dsa_switch *ds, u32 id,
struct devlink_param_gset_ctx *ctx);
net: dsa: configure the MTU for switch ports It is useful be able to configure port policers on a switch to accept frames of various sizes: - Increase the MTU for better throughput from the default of 1500 if it is known that there is no 10/100 Mbps device in the network. - Decrease the MTU to limit the latency of high-priority frames under congestion, or work around various network segments that add extra headers to packets which can't be fragmented. For DSA slave ports, this is mostly a pass-through callback, called through the regular ndo ops and at probe time (to ensure consistency across all supported switches). The CPU port is called with an MTU equal to the largest configured MTU of the slave ports. The assumption is that the user might want to sustain a bidirectional conversation with a partner over any switch port. The DSA master is configured the same as the CPU port, plus the tagger overhead. Since the MTU is by definition L2 payload (sans Ethernet header), it is up to each individual driver to figure out if it needs to do anything special for its frame tags on the CPU port (it shouldn't except in special cases). So the MTU does not contain the tagger overhead on the CPU port. However the MTU of the DSA master, minus the tagger overhead, is used as a proxy for the MTU of the CPU port, which does not have a net device. This is to avoid uselessly calling the .change_mtu function on the CPU port when nothing should change. So it is safe to assume that the DSA master and the CPU port MTUs are apart by exactly the tagger's overhead in bytes. Some changes were made around dsa_master_set_mtu(), function which was now removed, for 2 reasons: - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to do the same thing in DSA - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method That is to say, there's no need for this function in DSA, we can safely call dev_set_mtu() directly, take the rtnl lock when necessary, and just propagate whatever errors get reported (since the user probably wants to be informed). Some inspiration (mainly in the MTU DSA notifier) was taken from a vaguely similar patch from Murali and Florian, who are credited as co-developers down below. Co-developed-by: Murali Krishna Policharla <murali.policharla@broadcom.com> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com> Co-developed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-28 02:55:42 +07:00
/*
* MTU change functionality. Switches can also adjust their MRU through
* this method. By MTU, one understands the SDU (L2 payload) length.
* If the switch needs to account for the DSA tag on the CPU port, this
* method needs to to do so privately.
*/
int (*port_change_mtu)(struct dsa_switch *ds, int port,
int new_mtu);
int (*port_max_mtu)(struct dsa_switch *ds, int port);
};
#define DSA_DEVLINK_PARAM_DRIVER(_id, _name, _type, _cmodes) \
DEVLINK_PARAM_DRIVER(_id, _name, _type, _cmodes, \
dsa_devlink_param_get, dsa_devlink_param_set, NULL)
int dsa_devlink_param_get(struct devlink *dl, u32 id,
struct devlink_param_gset_ctx *ctx);
int dsa_devlink_param_set(struct devlink *dl, u32 id,
struct devlink_param_gset_ctx *ctx);
int dsa_devlink_params_register(struct dsa_switch *ds,
const struct devlink_param *params,
size_t params_count);
void dsa_devlink_params_unregister(struct dsa_switch *ds,
const struct devlink_param *params,
size_t params_count);
int dsa_devlink_resource_register(struct dsa_switch *ds,
const char *resource_name,
u64 resource_size,
u64 resource_id,
u64 parent_resource_id,
const struct devlink_resource_size_params *size_params);
void dsa_devlink_resources_unregister(struct dsa_switch *ds);
void dsa_devlink_resource_occ_get_register(struct dsa_switch *ds,
u64 resource_id,
devlink_resource_occ_get_t *occ_get,
void *occ_get_priv);
void dsa_devlink_resource_occ_get_unregister(struct dsa_switch *ds,
u64 resource_id);
net: dsa: introduce a dsa_port_from_netdev public helper As its implementation shows, this is synonimous with calling dsa_slave_dev_check followed by dsa_slave_to_port, so it is quite simple already and provides functionality which is already there. However there is now a need for these functions outside dsa_priv.h, for example in drivers that perform mirroring and redirection through tc-flower offloads (they are given raw access to the flow_cls_offload structure), where they need to call this function on act->dev. But simply exporting dsa_slave_to_port would make it non-inline and would result in an extra function call in the hotpath, as can be seen for example in sja1105: Before: 000006dc <sja1105_xmit>: { 6dc: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} 6e0: e1a04000 mov r4, r0 6e4: e591958c ldr r9, [r1, #1420] ; 0x58c <- Inline dsa_slave_to_port 6e8: e1a05001 mov r5, r1 6ec: e24dd004 sub sp, sp, #4 u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index); 6f0: e1c901d8 ldrd r0, [r9, #24] 6f4: ebfffffe bl 0 <dsa_8021q_tx_vid> 6f4: R_ARM_CALL dsa_8021q_tx_vid u8 pcp = netdev_txq_to_tc(netdev, queue_mapping); 6f8: e1d416b0 ldrh r1, [r4, #96] ; 0x60 u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index); 6fc: e1a08000 mov r8, r0 After: 000006e4 <sja1105_xmit>: { 6e4: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} 6e8: e1a04000 mov r4, r0 6ec: e24dd004 sub sp, sp, #4 struct dsa_port *dp = dsa_slave_to_port(netdev); 6f0: e1a00001 mov r0, r1 { 6f4: e1a05001 mov r5, r1 struct dsa_port *dp = dsa_slave_to_port(netdev); 6f8: ebfffffe bl 0 <dsa_slave_to_port> 6f8: R_ARM_CALL dsa_slave_to_port 6fc: e1a09000 mov r9, r0 u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index); 700: e1c001d8 ldrd r0, [r0, #24] 704: ebfffffe bl 0 <dsa_8021q_tx_vid> 704: R_ARM_CALL dsa_8021q_tx_vid Because we want to avoid possible performance regressions, introduce this new function which is designed to be public. Suggested-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 02:20:52 +07:00
struct dsa_port *dsa_port_from_netdev(struct net_device *netdev);
struct dsa_devlink_priv {
struct dsa_switch *ds;
};
struct dsa_switch_driver {
struct list_head list;
const struct dsa_switch_ops *ops;
};
struct net_device *dsa_dev_to_net_device(struct device *dev);
/* Keep inline for faster access in hot path */
net: bridge: allow enslaving some DSA master network devices Commit 8db0a2ee2c63 ("net: bridge: reject DSA-enabled master netdevices as bridge members") added a special check in br_if.c in order to check for a DSA master network device with a tagging protocol configured. This was done because back then, such devices, once enslaved in a bridge would become inoperative and would not pass DSA tagged traffic anymore due to br_handle_frame returning RX_HANDLER_CONSUMED. But right now we have valid use cases which do require bridging of DSA masters. One such example is when the DSA master ports are DSA switch ports themselves (in a disjoint tree setup). This should be completely equivalent, functionally speaking, from having multiple DSA switches hanging off of the ports of a switchdev driver. So we should allow the enslaving of DSA tagged master network devices. Instead of the regular br_handle_frame(), install a new function br_handle_frame_dummy() on these DSA masters, which returns RX_HANDLER_PASS in order to call into the DSA specific tagging protocol handlers, and lift the restriction from br_add_if. Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 23:37:40 +07:00
static inline bool netdev_uses_dsa(const struct net_device *dev)
{
#if IS_ENABLED(CONFIG_NET_DSA)
return dev->dsa_ptr && dev->dsa_ptr->rcv;
#endif
return false;
}
net: dsa: Allow drivers to filter packets they can decode source port from Frames get processed by DSA and redirected to switch port net devices based on the ETH_P_XDSA multiplexed packet_type handler found by the network stack when calling eth_type_trans(). The running assumption is that once the DSA .rcv function is called, DSA is always able to decode the switch tag in order to change the skb->dev from its master. However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105, user of DSA_TAG_PROTO_8021Q) where this assumption is not completely true, since switch tagging piggybacks on the absence of a vlan_filtering bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't rely on switch tagging, but on a different mechanism. So it would make sense to at least be able to terminate that. Having DSA receive traffic it can't decode would put it in an impossible situation: the eth_type_trans() function would invoke the DSA .rcv(), which could not change skb->dev, then eth_type_trans() would be invoked again, which again would call the DSA .rcv, and the packet would never be able to exit the DSA filter and would spiral in a loop until the whole system dies. This happens because eth_type_trans() doesn't actually look at the skb (so as to identify a potential tag) when it deems it as being ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer installed (therefore it's a DSA master) and that there exists a .rcv callback (everybody except DSA_TAG_PROTO_NONE has that). This is understandable as there are many switch tags out there, and exhaustively checking for all of them is far from ideal. The solution lies in introducing a filtering function for each tagging protocol. In the absence of a filtering function, all traffic is passed to the .rcv DSA callback. The tagging protocol should see the filtering function as a pre-validation that it can decode the incoming skb. The traffic that doesn't match the filter will bypass the DSA .rcv callback and be left on the master netdevice, which wasn't previously possible. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 17:19:23 +07:00
static inline bool dsa_can_decode(const struct sk_buff *skb,
struct net_device *dev)
{
#if IS_ENABLED(CONFIG_NET_DSA)
return !dev->dsa_ptr->filter || dev->dsa_ptr->filter(skb, dev);
#endif
return false;
}
void dsa_unregister_switch(struct dsa_switch *ds);
int dsa_register_switch(struct dsa_switch *ds);
struct dsa_switch *dsa_switch_find(int tree_index, int sw_index);
#ifdef CONFIG_PM_SLEEP
int dsa_switch_suspend(struct dsa_switch *ds);
int dsa_switch_resume(struct dsa_switch *ds);
#else
static inline int dsa_switch_suspend(struct dsa_switch *ds)
{
return 0;
}
static inline int dsa_switch_resume(struct dsa_switch *ds)
{
return 0;
}
#endif /* CONFIG_PM_SLEEP */
enum dsa_notifier_type {
DSA_PORT_REGISTER,
DSA_PORT_UNREGISTER,
};
struct dsa_notifier_info {
struct net_device *dev;
};
struct dsa_notifier_register_info {
struct dsa_notifier_info info; /* must be first */
struct net_device *master;
unsigned int port_number;
unsigned int switch_number;
};
static inline struct net_device *
dsa_notifier_info_to_dev(const struct dsa_notifier_info *info)
{
return info->dev;
}
#if IS_ENABLED(CONFIG_NET_DSA)
int register_dsa_notifier(struct notifier_block *nb);
int unregister_dsa_notifier(struct notifier_block *nb);
int call_dsa_notifiers(unsigned long val, struct net_device *dev,
struct dsa_notifier_info *info);
#else
static inline int register_dsa_notifier(struct notifier_block *nb)
{
return 0;
}
static inline int unregister_dsa_notifier(struct notifier_block *nb)
{
return 0;
}
static inline int call_dsa_notifiers(unsigned long val, struct net_device *dev,
struct dsa_notifier_info *info)
{
return NOTIFY_DONE;
}
#endif
/* Broadcom tag specific helpers to insert and extract queue/port number */
#define BRCM_TAG_SET_PORT_QUEUE(p, q) ((p) << 8 | q)
#define BRCM_TAG_GET_PORT(v) ((v) >> 8)
#define BRCM_TAG_GET_QUEUE(v) ((v) & 0xff)
net: dsa: Add support for deferred xmit Some hardware needs to take work to get convinced to receive frames on the CPU port (such as the sja1105 which takes temporary L2 forwarding rules over SPI that last for a single frame). Such work needs a sleepable context, and because the regular .ndo_start_xmit is atomic, this cannot be done in the tagger. So introduce a generic DSA mechanism that sets up a transmit skb queue and a workqueue for deferred transmission. The new driver callback (.port_deferred_xmit) is in dsa_switch and not in the tagger because the operations that require sleeping typically also involve interacting with the hardware, and not simply skb manipulations. Therefore having it there simplifies the structure a bit and makes it unnecessary to export functions from the driver to the tagger. The driver is responsible of calling dsa_enqueue_skb which transfers it to the master netdevice. This is so that it has a chance of performing some more work afterwards, such as cleanup or TX timestamping. To tell DSA that skb xmit deferral is required, I have thought about changing the return type of the tagger .xmit from struct sk_buff * into a enum dsa_tx_t that could potentially encode a DSA_XMIT_DEFER value. But the trailer tagger is reallocating every skb on xmit and therefore making a valid use of the pointer return value. So instead of reworking the API in complicated ways, right now a boolean property in the newly introduced DSA_SKB_CB is set. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 17:19:25 +07:00
netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device *dev);
int dsa_port_get_phy_strings(struct dsa_port *dp, uint8_t *data);
int dsa_port_get_ethtool_phy_stats(struct dsa_port *dp, uint64_t *data);
int dsa_port_get_phy_sset_count(struct dsa_port *dp);
void dsa_port_phylink_mac_change(struct dsa_switch *ds, int port, bool up);
struct dsa_tag_driver {
const struct dsa_device_ops *ops;
struct list_head list;
struct module *owner;
};
void dsa_tag_drivers_register(struct dsa_tag_driver *dsa_tag_driver_array[],
unsigned int count,
struct module *owner);
void dsa_tag_drivers_unregister(struct dsa_tag_driver *dsa_tag_driver_array[],
unsigned int count);
#define dsa_tag_driver_module_drivers(__dsa_tag_drivers_array, __count) \
static int __init dsa_tag_driver_module_init(void) \
{ \
dsa_tag_drivers_register(__dsa_tag_drivers_array, __count, \
THIS_MODULE); \
return 0; \
} \
module_init(dsa_tag_driver_module_init); \
\
static void __exit dsa_tag_driver_module_exit(void) \
{ \
dsa_tag_drivers_unregister(__dsa_tag_drivers_array, __count); \
} \
module_exit(dsa_tag_driver_module_exit)
/**
* module_dsa_tag_drivers() - Helper macro for registering DSA tag
* drivers
* @__ops_array: Array of tag driver strucutres
*
* Helper macro for DSA tag drivers which do not do anything special
* in module init/exit. Each module may only use this macro once, and
* calling it replaces module_init() and module_exit().
*/
#define module_dsa_tag_drivers(__ops_array) \
dsa_tag_driver_module_drivers(__ops_array, ARRAY_SIZE(__ops_array))
#define DSA_TAG_DRIVER_NAME(__ops) dsa_tag_driver ## _ ## __ops
/* Create a static structure we can build a linked list of dsa_tag
* drivers
*/
#define DSA_TAG_DRIVER(__ops) \
static struct dsa_tag_driver DSA_TAG_DRIVER_NAME(__ops) = { \
.ops = &__ops, \
}
/**
* module_dsa_tag_driver() - Helper macro for registering a single DSA tag
* driver
* @__ops: Single tag driver structures
*
* Helper macro for DSA tag drivers which do not do anything special
* in module init/exit. Each module may only use this macro once, and
* calling it replaces module_init() and module_exit().
*/
#define module_dsa_tag_driver(__ops) \
DSA_TAG_DRIVER(__ops); \
\
static struct dsa_tag_driver *dsa_tag_driver_array[] = { \
&DSA_TAG_DRIVER_NAME(__ops) \
}; \
module_dsa_tag_drivers(dsa_tag_driver_array)
net: Distributed Switch Architecture protocol support Distributed Switch Architecture is a protocol for managing hardware switch chips. It consists of a set of MII management registers and commands to configure the switch, and an ethernet header format to signal which of the ports of the switch a packet was received from or is intended to be sent to. The switches that this driver supports are typically embedded in access points and routers, and a typical setup with a DSA switch looks something like this: +-----------+ +-----------+ | | RGMII | | | +-------+ +------ 1000baseT MDI ("WAN") | | | 6-port +------ 1000baseT MDI ("LAN1") | CPU | | ethernet +------ 1000baseT MDI ("LAN2") | |MIImgmt| switch +------ 1000baseT MDI ("LAN3") | +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4") | | | | +-----------+ +-----------+ The switch driver presents each port on the switch as a separate network interface to Linux, polls the switch to maintain software link state of those ports, forwards MII management interface accesses to those network interfaces (e.g. as done by ethtool) to the switch, and exposes the switch's hardware statistics counters via the appropriate Linux kernel interfaces. This initial patch supports the MII management interface register layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and supports the "Ethertype DSA" packet tagging format. (There is no officially registered ethertype for the Ethertype DSA packet format, so we just grab a random one. The ethertype to use is programmed into the switch, and the switch driver uses the value of ETH_P_EDSA for this, so this define can be changed at any time in the future if the one we chose is allocated to another protocol or if Ethertype DSA gets its own officially registered ethertype, and everything will continue to work.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Tested-by: Nicolas Pitre <nico@marvell.com> Tested-by: Byron Bradley <byron.bbradley@gmail.com> Tested-by: Tim Ellis <tim.ellis@mac.com> Tested-by: Peter van Valderen <linux@ddcrew.com> Tested-by: Dirk Teurlings <dirk@upexia.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 20:44:02 +07:00
#endif