In order to benefit from the new napi_defer_hard_irqs feature,
we need to use napi_complete_done() variant in this driver.
RX path is already using it, this patch implements TX completion side.
mlx4_en_process_tx_cq() now returns the amount of retired packets,
instead of a boolean, so that mlx4_en_poll_tx_cq() can pass
this value to napi_complete_done().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gro_flush_timeout and napi_defer_hard_irqs can be read
from napi_complete_done() while other cpus write the value,
whithout explicit synchronization.
Use READ_ONCE()/WRITE_ONCE() to annotate the races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in commit 3b47d30396 ("net: gro: add a per device gro flush timer")
we added the ability to arm one high resolution timer, that we used
to keep not-complete packets in GRO engine a bit longer, hoping that further
frames might be added to them.
Since then, we added the napi_complete_done() interface, and commit
364b605573 ("net: busy-poll: return busypolling status to drivers")
allowed drivers to avoid re-arming NIC interrupts if we made a promise
that their NAPI poll() handler would be called in the near future.
This infrastructure can be leveraged, thanks to a new device parameter,
which allows to arm the napi hrtimer, instead of re-arming the device
hard IRQ.
We have noticed that on some servers with 32 RX queues or more, the chit-chat
between the NIC and the host caused by IRQ delivery and re-arming could hurt
throughput by ~20% on 100Gbit NIC.
In contrast, hrtimers are using local (percpu) resources and might have lower
cost.
The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
than gro_flush_timeout (/sys/class/net/ethX/)
By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
This patch does not change the prior behavior of gro_flush_timeout
if used alone : NIC hard irqs should be rearmed as before.
One concrete usage can be :
echo 20000 >/sys/class/net/eth1/gro_flush_timeout
echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
If at least one packet is retired, then we will reset napi counter
to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
of the queue.
On busy queues, this should avoid NIC hard IRQ, while before this patch IRQ
avoidance was only possible if napi->poll() was exhausting its budget
and not call napi_complete_done().
This feature also can be used to work around some non-optimal NIC irq
coalescing strategies.
Having the ability to insert XX usec delays between each napi->poll()
can increase cache efficiency, since we increase batch sizes.
It also keeps serving cpus not idle too long, reducing tail latencies.
Co-developed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru says:
====================
qed*: Add support for pcie advanced error recovery.
The patch series adds qed/qede driver changes for PCIe Advanced Error
Recovery (AER) support.
Patch (1) adds qed changes to enable the device to send error messages
to root port when detected.
Patch (2) adds qede support for handling the detected errors (AERs).
Changes from previous version:
-------------------------------
v2: use pci_num_vf() instead of caching the value in edev.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The error recovery is handled by management firmware (MFW) with the help of
qed/qede drivers. Upon detecting the errors, driver informs MFW about this
event which in turn starts a recovery process. MFW sends ERROR_RECOVERY
notification to the driver which performs the required cleanup/recovery
from the driver side.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch enables the device to send error messages to root port when
an error is detected.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gro_cells lib is used by different encapsulating netdevices, such as
geneve, macsec, vxlan etc. to speed up decapsulated traffic processing.
CPU tag is a sort of "encapsulation", and we can use the same mechs to
greatly improve overall DSA performance.
skbs are passed to the GRO layer after removing CPU tags, so we don't
need any new packet offload types as it was firstly proposed by me in
the first GRO-over-DSA variant [1].
The size of struct gro_cells is sizeof(void *), so hot struct
dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields
remain in one 32-byte cacheline.
The other positive side effect is that drivers for network devices
that can be shipped as CPU ports of DSA-driven switches can now use
napi_gro_frags() to pass skbs to kernel. Packets built that way are
completely non-linear and are likely being dropped without GRO.
This was tested on to-be-mainlined-soon Ethernet driver that uses
napi_gro_frags(), and the overall performance was on par with the
variant from [1], sometimes even better due to minimal overhead.
net.core.gro_normal_batch tuning may help to push it to the limit
on particular setups and platforms.
iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup
on 1.2 GHz MIPS board:
5.7-rc2 baseline:
[ID] Interval Transfer Bitrate Retr
[ 5] 0.00-120.01 sec 9.00 GBytes 644 Mbits/sec 413 sender
[ 5] 0.00-120.00 sec 8.99 GBytes 644 Mbits/sec receiver
Iface RX packets TX packets
eth0 7097731 7097702
port0 426050 6671829
port1 6671681 425862
port1.218 6671677 425851
With this patch:
[ID] Interval Transfer Bitrate Retr
[ 5] 0.00-120.01 sec 12.2 GBytes 870 Mbits/sec 122 sender
[ 5] 0.00-120.00 sec 12.2 GBytes 870 Mbits/sec receiver
Iface RX packets TX packets
eth0 9474792 9474777
port0 455200 353288
port1 9019592 455035
port1.218 353144 455024
v2:
- Add some performance examples in the commit message;
- No functional changes.
[1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/
Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC4862 5.5.3 e) prevents received Router Advertisements from reducing
the Valid Lifetime of configured addresses to less than two hours, thus
preventing hosts from reacting to the information provided by a router
that has positive knowledge that a prefix has become invalid.
This patch makes hosts honor all Valid Lifetime values, as per
draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help
mitigate the problem discussed in draft-ietf-v6ops-slaac-renum.
Note: Attacks aiming at disabling an advertised prefix via a Valid
Lifetime of 0 are not really more harmful than other attacks
that can be performed via forged RA messages, such as those
aiming at completely disabling a next-hop router via an RA that
advertises a Router Lifetime of 0, or performing a Denial of
Service (DoS) attack by advertising illegitimate prefixes via
forged PIOs. In scenarios where RA-based attacks are of concern,
proper mitigations such as RA-Guard [RFC6105] [RFC7113] should
be implemented.
Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei says:
====================
dpaa2-eth: add support for xdp bulk enqueue
The first patch moves the DEV_MAP_BULK_SIZE macro into the xdp.h header
file so that drivers can take advantage of it and use it.
The following 3 patches are there to setup the scene for using the bulk
enqueue feature. First of all, the prototype of the enqueue function is
changed so that it returns the number of enqueued frames. Second, the
bulk enqueue interface is used but without any functional changes, still
one frame at a time is enqueued. Third, the .ndo_xdp_xmit callback is
split into two stages, create all FDs for the xdp_frames received and
then enqueue them.
The last patch of the series builds on top of the others and instead of
issuing an enqueue operation for each FD it issues a bulk enqueue call
for as many frames as possible. This is repeated until all frames are
enqueued or the maximum number of retries is hit. We do not use the
XDP_XMIT_FLUSH flag since the architecture is not capable to store all
frames dequeued in a NAPI cycle, instead we send out right away all
frames received in a .ndo_xdp_xmit call.
Changes in v2:
- statically allocate an array of dpaa2_fd by frame queue
- use the DEV_MAP_BULK_SIZE as the maximum number of xdp_frames
received in .ndo_xdp_xmit()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Take advantage of the bulk enqueue feature in .ndo_xdp_xmit.
We cannot use the XDP_XMIT_FLUSH since the architecture is not capable
to store all the frames dequeued in a NAPI cycle so we instead are
enqueueing all the frames received in a ndo_xdp_xmit call right away.
After setting up all FDs for the xdp_frames received, enqueue multiple
frames at a time until all are sent or the maximum number of retries is
hit.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having a function that both creates a frame descriptor from
an xdp_frame and enqueues it, split this into two stages.
Add the dpaa2_eth_xdp_create_fd that just transforms an xdp_frame into a
FD while the actual enqueue callback is called directly from the ndo for
each frame.
This is particulary useful in conjunction with bulk enqueue.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the dpaa2-eth driver to use the bulk enqueue function introduced
with the change to QBMAN ring mode. At the moment, no functional changes
are made but rather the driver just transitions to the new interface
while still enqueuing just one frame at a time.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The enqueue dpaa2-eth callback now returns the number of successfully
enqueued frames. This is a preliminary patch necessary for adding
support for bulk ring mode enqueue.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Export the DEV_MAP_BULK_SIZE macro to the header file so that drivers
can directly use it as the maximum number of xdp_frames received in the
.ndo_xdp_xmit() callback.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add nodad when adding IPv6 addresses and remove the sleep.
A recent change to iproute2 moved the 'pref medium' to the prefix
(where it belongs). Change the expected route check to strip
'pref medium' to be compatible with old and new iproute2.
Add IPv4 runtime test with an IPv6 address as the gateway in
the default route.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata says:
====================
Add selftests for pedit ex munge ip6 dsfield
Patch #1 extends the existing generic forwarding selftests to cover pedit
ex munge ip6 traffic_class as well. Patch #2 adds TDC test coverage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a self-test for the IPv6 dsfield munge that iproute2 will support.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the pedit_dsfield forwarding selftest with coverage of "pedit ex
munge ip6 dsfield set".
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel says:
====================
add TJA1102 support
changes v5:
- rename __of_mdiobus_register_phy() to of_mdiobus_phy_device_register()
changes v4:
- remove unused phy_id variable
changes v3:
- export part of of_mdiobus_register_phy() and reuse it in tja11xx
driver
- coding style fixes
changes v2:
- use .match_phy_device
- add irq support
- add add delayed registration for PHY1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
TJA1102 is a dual PHY package with PHY0 having proper PHYID and PHY1
having no ID. On one hand it is possible to for PHY detection by
compatible, on other hand we should be able to reset complete chip
before PHY1 configured it, and we need to define dependencies for proper
power management.
We can solve it by defining PHY1 as child of PHY0:
tja1102_phy0: ethernet-phy@4 {
reg = <0x4>;
interrupts-extended = <&gpio5 8 IRQ_TYPE_LEVEL_LOW>;
reset-gpios = <&gpio5 9 GPIO_ACTIVE_LOW>;
reset-assert-us = <20>;
reset-deassert-us = <2000>;
tja1102_phy1: ethernet-phy@5 {
reg = <0x5>;
interrupts-extended = <&gpio5 8 IRQ_TYPE_LEVEL_LOW>;
};
};
The PHY1 should be a subnode of PHY0 and registered only after PHY0 was
completely reset and initialized.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function will be needed in tja11xx driver for secondary PHY
support.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
TJA1102 is an dual T1 PHY chip. Both PHYs are separately addressable.
Both PHYs are similar but have different amount of functionality. For
example PHY 1 has no PHY ID and no health monitor.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document the NXP TJA11xx PHY bindings.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use IS_ERR() and PTR_ERR() instead of PTR_ZRR_OR_ZERO()
to simplify code, avoid redundant paramenter definitions
and judgements.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for following phy-modes: rgmii, rgmii-id, rgmii-txid, rgmii-rxid.
This PHY has an internal RX delay of 1.2ns and no delay for TX.
The pad skew registers allow to set the total TX delay to max 1.38ns and
the total RX delay to max of 2.58ns (configurable 1.38ns + build in
1.2ns) and a minimal delay of 0ns.
According to the RGMII v1.3 specification the delay provided by PCB traces
should be between 1.5ns and 2.0ns. The RGMII v2.0 allows to provide this
delay by MAC or PHY. So, we configure this PHY to the best values we can
get by this HW: TX delay to 1.38ns (max supported value) and RX delay to
1.80ns (best calculated delay)
The phy-modes can still be fine tuned/overwritten by *-skew-ps
device tree properties described in:
Documentation/devicetree/bindings/net/micrel-ksz90x1.txt
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Philippe Schenker <philippe.schenker@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following coccicheck warning:
net/caif/caif_dev.c:410:2-13: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:445:2-13: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:145:1-12: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:223:1-12: WARNING: Assignment of 0/1 to bool
variable
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For dwmac4, enable VLAN promiscuity when MAC controller is requested to
enter promiscuous mode.
Signed-off-by: Chuah, Kim Tatt <kim.tatt.chuah@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan_hash_lookup() uses list_for_each_entry_rcu() for traversing
should either under RCU in fast path or the protection of rtnl_mutex.
In the case of holding RTNL, we should add the corresponding lockdep
expression to silence the following false-positive warning:
=============================
WARNING: suspicious RCU usage
5.7.0-rc1-next-20200416-00003-ga3b8d28bc #1 Not tainted
-----------------------------
drivers/net/macvlan.c:126 RCU-list traversed in non-reader section!!
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tests for vrf and xfrms with a second round after adding a
qdisc. There are a few known problems documented with the test
cases that fail. The fix is non-trivial; will come back to it
when time allows.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now it can be seen that the VSC9959 (Felix) switch will not flood
frames if they have a VLAN tag with a PCP of 1-7 (nonzero).
It turns out that Felix is quite different from its cousin, Ocelot, in
that frame flooding can be allowed/denied per traffic class. Where
Ocelot has 1 instance of the ANA_FLOODING register, Felix has 8.
The approach that this driver is going to take is "thanks, but no
thanks". We have no use case of limiting the flooding domain based on
traffic class, so we just want to allow packets to be flooded, no matter
what traffic class they have.
So we copy the line of code from ocelot.c which does the one-shot
initialization of the flooding PGIDs, and we add it to felix.c as well -
except replicated 8 times.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tracepoint support for QRTR with NS as the first candidate. Later on
this can be extended to core QRTR and transport drivers.
The trace_printk() used in NS has been replaced by tracepoints.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv6/ila/ila_xlat.c:604:0: warning: macro "ILA_HASH_TABLE_SIZE" is not used [-Wunused-macros]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the act_ct SW offload in flowtable, The counter of the conntrack
entry will never update. So update the nf_conn_acct conuter in act_ct
flowtable software offload.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
net: phy: add device-managed devm_mdiobus_register
If there's no special ordering requirement for mdiobus_unregister(),
then driver code can be simplified by using a device-managed version
of mdiobus_register(). Prerequisite is that bus allocation has been
done device-managed too. Else mdiobus_free() may be called whilst
bus is still registered, resulting in a BUG_ON(). Therefore let
devm_mdiobus_register() return -EPERM if bus was allocated
non-managed.
First user of the new functionality is r8169 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use new function devm_mdiobus_register() to simplify the driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there's no special ordering requirement for mdiobus_unregister(),
then driver code can be simplified by using a device-managed version
of mdiobus_register(). Prerequisite is that bus allocation has been
done device-managed too. Else mdiobus_free() may be called whilst
bus is still registered, resulting in a BUG_ON(). Therefore let
devm_mdiobus_register() return -EPERM if bus was allocated
non-managed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY supports monitoring its die temperature as well as two analog
voltages. Add support for it.
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Broadcom BCM54140 is a Quad SGMII/QSGMII Copper/Fiber Gigabit
Ethernet transceiver.
This also adds support for tunables to set and get downshift and
energy detect auto power-down.
The PHY has four ports and each port has its own PHY address.
There are per-port registers as well as global registers.
Unfortunately, the global registers can only be accessed by reading
and writing from/to the PHY address of the first port. Further,
there is no way to find out what port you actually are by just
reading the per-port registers. We therefore, have to scan the
bus on the PHY probe to determine the port and thus what address
we need to access the global registers.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDB (Register Data Base) registers are used on newer Broadcom PHYs. Add
helper to read, write and modify these registers.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
dt-bindings: net: mdio.yaml fixes
This patch series documents some common MDIO devices properties such as
resets (and delays) and broken-turn-around. The second patch also
rephrases some descriptions to be more general towards MDIO devices and
not specific towards Ethernet PHYs.
Changes in v3:
- corrected wording of 'broken-turn-around' in ethernet-phy.yaml and
mdio.yaml, add Andrew's R-b tag to patch #3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of descriptions assume a PHY device, but since this binding
describes a MDIO bus which can have different kinds of MDIO devices
attached to it, rephrase some descriptions to be more general in that
regard.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the properties pertaining to the broken turn around or resets
were only documented in ethernet-phy.yaml while they are applicable
across all MDIO devices and not Ethernet PHYs specifically which are a
superset.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The turn around bytes (2) are placed between the control phase of the
MDIO transaction and the data phase, correct the wording to be more
exact.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Ocelot MAC_ETYPE tc-flower key improvements
As discussed in the comments surrounding this patch:
https://patchwork.ozlabs.org/project/netdev/patch/20200417190308.32598-1-olteanv@gmail.com/
the restrictions imposed on non-MAC_ETYPE rules were harsher than they
needed to be. IP, IPv6, ARP rules can still be added concurrently with
src_mac and dst_mac rules, as long as those MAC address rules do not ask
for an offending EtherType.
For that to actually be supported, we need to parse the EtherType from
the flower classification rule first.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
An attempt was made in commit fe3490e610 ("net: mscc: ocelot: Hardware
ofload for tc flower filter") to avoid clashes between MAC_ETYPE rules
and IP rules. Because the protocol blacklist should have included
ETH_P_ALL too, it created some confusion, but now the situation should
be dealt with a bit better by the patch immediately previous to this one
("net: mscc: ocelot: refine the ocelot_ace_is_problematic_mac_etype
function").
So now we can remove that check. MAC_ETYPE rules with a protocol of
ETH_P_IP, ETH_P_IPV6, ETH_P_ARP and ETH_P_ALL _are_ supported, with some
restrictions regarding per-port exclusivity which are enforced now.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit mentioned below was a bit too harsh, and while it restricted
the invalid key combinations which are known to not work, such as:
tc filter add dev swp0 ingress proto ip \
flower src_ip 192.0.2.1 action drop
tc filter add dev swp0 ingress proto all \
flower src_mac 00:11:22:33:44:55 action drop
it also restricted some which still should work, such as:
tc filter add dev swp0 ingress proto ip \
flower src_ip 192.0.2.1 action drop
tc filter add dev swp0 ingress proto 0x22f0 \
flower src_mac 00:11:22:33:44:55 action drop
What actually does not match "sanely" is a MAC_ETYPE rule on frames
having an EtherType of ARP, IPv4, IPv6, in addition to SNAP and OAM
frames (which the ocelot tc-flower implementation does not parse yet, so
the function might need to be revisited again in the future).
So just make the function recognize the problematic MAC_ETYPE rules by
EtherType - thus the VCAP IS2 can be forced to match even on those
packets.
This patch makes it possible for IP rules to live on a port together
with MAC_ETYPE rules that are non-all, non-arp, non-ip and non-ipv6.
Fixes: d4d0cb741d7b ("net: mscc: ocelot: deal with problematic MAC_ETYPE VCAP IS2 rules")
Reported-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the filter's protocol is ignored except for a few special
cases (IPv4 and IPv6).
The EtherType can be matched inside VCAP IS2 by using a MAC_ETYPE key.
So there are 2 cases in which EtherType matches are supported:
- As part of a larger MAC_ETYPE rule, such as:
tc filter add dev swp0 ingress protocol ip \
flower skip_sw src_mac 42:be:24:9b:76:20 action drop
- Standalone (matching on protocol only):
tc filter add dev swp0 ingress protocol arp \
flower skip_sw action drop
As before, if the protocol is not specified, is it implicitly "all" and
the EtherType mask in the MAC_ETYPE half key is set to zero.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu says:
====================
Support programmable pins for Ocelot PTP driver
The Ocelot PTP clock driver had been embedded into ocelot.c driver.
It had supported basic gettime64/settime64/adjtime/adjfine functions
by now which were used by both Ocelot switch and Felix switch.
This patch-set is to move current ptp clock code out of ocelot.c driver
maintaining as a single ocelot_ptp.c driver, and to implement 4
programmable pins with only PTP_PF_PEROUT function for now.
The PTP_PF_EXTTS function will be supported in the future, and it should
be implemented separately for Felix and Ocelot, because of different
hardware interrupt implementation in them.
Changes for v2:
- Put PTP driver under drivers/net/ethernet/mscc/.
- Dropped MAINTAINERS patch. Kept original maintaining.
- Initialized PTP separately in ocelot/felix platforms.
- Supported PPS case in programmable pin.
- Supported disabling pin function since deadlock is fixed by Richard.
- Returned -EBUSY if not finding pin available.
Changes for v3:
- Re-sent.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>