Commit Graph

796791 Commits

Author SHA1 Message Date
Stefano Brivio
e7cc082455 udp: Support for error handlers of tunnels with arbitrary destination port
ICMP error handling is currently not possible for UDP tunnels not
employing a receiving socket with local destination port matching the
remote one, because we have no way to look them up.

Add an err_handler tunnel encapsulation operation that can be exported by
tunnels in order to pass the error to the protocol implementing the
encapsulation. We can't easily use a lookup function as we did for VXLAN
and GENEVE, as protocol error handlers, which would be in turn called by
implementations of this new operation, handle the errors themselves,
together with the tunnel lookup.

Without a socket, we can't be sure which encapsulation error handler is
the appropriate one: encapsulation handlers (the ones for FoU and GUE
introduced in the next patch, e.g.) will need to check the new error codes
returned by protocol handlers to figure out if errors match the given
encapsulation, and, in turn, report this error back, so that we can try
all of them in __udp{4,6}_lib_err_encap_no_sk() until we have a match.
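
The "try every handler until one matches" loop described above can be sketched roughly as follows. This is an illustrative Python model rather than the kernel code: the handler functions, the dictionary used as a packet, and the choice of `ENOENT` as the "no match" code are all assumptions made for the example.

```python
import errno

# Hypothetical encapsulation handlers: each returns 0 when the error belongs
# to its encapsulation, or a negative errno-style code when it does not.
def fou_err_handler(pkt):
    return 0 if pkt.get("encap") == "fou" else -errno.ENOENT

def gue_err_handler(pkt):
    return 0 if pkt.get("encap") == "gue" else -errno.ENOENT

def err_encap_no_sk(pkt, handlers):
    """Without a socket, try each registered handler until one matches."""
    for handler in handlers:
        if handler(pkt) == 0:
            return 0
    # No handler recognised the error, so report that back to the caller.
    return -errno.ENOENT
```

The key point is that a handler must return a distinguishable "not mine" code, which is why the protocol error handlers need a return value at all.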

v2:
- Name all arguments in err_handler prototypes (David Miller)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
32bbd8793f net: Convert protocol error handlers from void to int
We'll need this to handle ICMP errors for tunnels without a sending socket
(i.e. FoU and GUE). There, we might have to look up different types of IP
tunnels, registered as network protocols, before we get a match, so we
want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
inet_protos and inet6_protos. These error codes will be used in the next
patch.

For consistency, return sensible error codes in protocol error handlers
whenever handlers can't handle errors because, even if valid, they don't
match a protocol or any of its states.

This has no effect on existing error handling paths.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
ce7336610c selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv4/IPv6
Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
a025fb5f49 geneve: Allow configuration of DF behaviour
draft-ietf-nvo3-geneve-08 says:

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC1981]) be used by setting the DF bit in the IP header when Geneve
   packets are transmitted over IPv4 (this is the default with IPv6).

Now that ICMP error handling is working for GENEVE, we can comply with
this recommendation.

Make this configurable, though, to avoid breaking existing setups. By
default, DF won't be set. It can be set or inherited from inner IPv4
packets. If it's configured to be inherited and we are encapsulating IPv6,
it will be set.

This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.
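
The three DF behaviours described above can be summarised in a small decision function. This is a minimal sketch in Python, assuming hypothetical names (`DfMode`, `outer_df`); it models only the non-lwt case and is not the driver code.

```python
from enum import Enum

class DfMode(Enum):
    UNSET = 0     # default: never set DF on the outer header
    SET = 1       # always set DF
    INHERIT = 2   # copy DF from the inner IPv4 header

def outer_df(mode, inner_is_ipv4, inner_df=False):
    """Return True if the outer IPv4 header should carry the DF bit."""
    if mode is DfMode.SET:
        return True
    if mode is DfMode.INHERIT:
        # An inner IPv6 packet has no DF bit, so inheriting means "set".
        return inner_df if inner_is_ipv4 else True
    return False
```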

v2:
- DF behaviour configuration only applies for non-lwt tunnels, apply DF
  setting only if (!geneve->collect_md) in geneve_xmit_skb()
  (Stephen Hemminger)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
a07966447f geneve: ICMP error lookup handler
Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
582888792f selftests: pmtu: Introduce tests for IPv4/IPv6 over VXLAN over IPv4/IPv6
Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Change all occurrences of VxLAN to VXLAN (Jiri Benc)
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
b4d3069783 vxlan: Allow configuration of DF behaviour
Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
DF is configured to be inherited, always set it.

For IPv4, inheriting DF from the inner header was probably intended from
the very beginning judging by the comment to vxlan_xmit(), but it wasn't
actually implemented -- also because it would have done more harm than
good, without handling for ICMP Fragmentation Needed messages.

According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC
draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe
PMTUD implementation, says that "is a MUST that Vxlan gateways [...]
SHOULD set the DF-bit [...]", whatever that means.

Given this background, the only sane option is probably to let the user
decide, and keep the current behaviour as default.

This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.

v2:
- DF behaviour configuration only applies for non-lwt tunnels, move DF
  setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
c3a43b9fec vxlan: ICMP error lookup handler
Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
a36e185e8c udp: Handle ICMP errors for tunnels with same destination port on both endpoints
For both IPv4 and IPv6, if we can't match errors to a socket, try
tunnels before ignoring them. Look up a socket with the original source
and destination ports as found in the UDP packet inside the ICMP payload;
this will work for tunnels that force the same destination port for both
endpoints, i.e. VXLAN and GENEVE.
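
To illustrate where those ports come from, the sketch below pulls the source and destination ports out of the UDP header quoted in an ICMPv4 error. It is a simplified Python model under stated assumptions: the input is the part of the ICMP message after its 8-byte header, the quoted packet is IPv4, and the function name is made up for the example.

```python
import struct

def inner_udp_ports(icmp_payload: bytes):
    """Return (source_port, dest_port) of the UDP datagram quoted in an
    ICMPv4 error payload (the bytes following the 8-byte ICMP header)."""
    ihl = (icmp_payload[0] & 0x0F) * 4   # length of the quoted IPv4 header
    if icmp_payload[9] != 17:            # IPv4 protocol field, 17 is UDP
        raise ValueError("quoted packet is not UDP")
    # The UDP header starts right after the IPv4 header: sport, then dport.
    sport, dport = struct.unpack_from("!HH", icmp_payload, ihl)
    return sport, dport
```

With symmetric destination ports, the `dport` found this way is also the local port a tunnel socket would be bound to, which is what makes the lookup possible.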

Actually, lwtunnels could break this assumption if they are configured by
an external control plane to have different destination ports on the
endpoints: in this case, we won't be able to trace ICMP messages back to
them.

For IPv6 redirect messages, call ip6_redirect() directly with the output
interface argument set to the interface we received the packet from (as
it's the very interface we should build the exception on), otherwise the
new nexthop will be rejected. There's no such need for IPv4.

Tunnels can now export an encap_err_lookup() operation that indicates a
match. Pass the packet to the lookup function, and if the tunnel driver
reports a matching association, continue with regular ICMP error handling.

v2:
- Added newline between network and transport header sets in
  __udp{4,6}_lib_err_encap() (David Miller)
- Removed redundant skb_reset_network_header(skb); in
  __udp4_lib_err_encap()
- Removed redundant reassignment of iph in __udp4_lib_err_encap()
  (Sabrina Dubroca)
- Edited comment to __udp{4,6}_lib_err_encap() to reflect the fact this
  won't work with lwtunnels configured to use asymmetric ports. By the way,
  it's VXLAN, not VxLAN (Jiri Benc)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Colin Ian King
141b95d551 net: hns3: fix spelling mistake, "assertting" -> "asserting"
Trivial fix to spelling mistake in dev_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:07:56 -08:00
Ganesh Goudar
6d444c4efc cxgb4: Add new T6 PCI device ids 0x608a
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:05:20 -08:00
Li RongQing
1c51dc9ad6 net/ipv6: compute anycast address hash only if dev is null
Avoid computing the hash value if dev is not null, since the
hash value is not used in that case.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:04:43 -08:00
YueHaibing
0db55093b5 net: bcmgenet: return correct value 'ret' from bcmgenet_power_down
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/broadcom/genet/bcmgenet.c: In function 'bcmgenet_power_down':
drivers/net/ethernet/broadcom/genet/bcmgenet.c:1136:6: warning:
 variable 'ret' set but not used [-Wunused-but-set-variable]

bcmgenet_power_down should return 'ret' instead of 0.

Fixes: ca8cf34190 ("net: bcmgenet: propagate errors from bcmgenet_power_down")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:21:41 -08:00
David S. Miller
3ed3857011 Merge branch 'net-sched-prepare-for-more-Qdisc-offloads'
Jakub Kicinski says:

====================
net: sched: prepare for more Qdisc offloads

This series refactors the "switchdev" Qdisc offloads a little.  We have
a few Qdiscs which can be fully offloaded today to the forwarding plane
of switching devices.

First patch adds a helper for handling statistic dumps, the code seems
to be copy pasted between PRIO and RED.  Second patch removes unnecessary
parameter from RED offload function.  Third patch makes the MQ offload
use the dump helper which helps it behave much like PRIO and RED when
it comes to the TCQ_F_OFFLOADED flag.  Patch 4 adds a graft helper,
similar to the dump helper.

Patch 5 is unrelated to offloads, qdisc_graft() code seemed ripe for a
small refactor - no functional changes there.

Last two patches move the qdisc_put() call outside of the sch_tree_lock
section for RED and PRIO.  The child Qdiscs will get removed from the
hierarchy under the lock, but having the put (and potentially destroy)
called outside of the lock helps offload which may choose to sleep,
and it should generally lower the Qdisc change impact.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:48 -08:00
Jakub Kicinski
7b8e0b6e65 net: sched: prio: delay destroying child qdiscs on change
Move destroying of the old child qdiscs outside of the sch_tree_lock()
section.  This should improve the software qdisc replace but is even
more important for offloads.  Calling offloads under a spin lock is
best avoided, and child's destroy would be called under sch_tree_lock().
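
The pattern of detaching children under the lock and destroying them afterwards might look like this in outline. It is a toy Python model, assuming a hypothetical `Prio` class and a `destroy` callback standing in for the qdisc put; it only shows the ordering, not the real locking rules.

```python
import threading

class Prio:
    """Toy model of a qdisc whose children are guarded by a tree lock."""

    def __init__(self, children):
        self.tree_lock = threading.Lock()
        self.children = list(children)

    def change(self, new_children, destroy):
        # Under the lock, only swap the children and remember the old ones.
        with self.tree_lock:
            old = self.children
            self.children = list(new_children)
        # Destroy outside the lock, where the callback is free to sleep.
        for child in old:
            destroy(child)
```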

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:48 -08:00
Jakub Kicinski
0c8d13ac96 net: sched: red: delay destroying child qdisc on replace
Move destroying of the old child qdisc outside of the sch_tree_lock()
section.  This should improve the software qdisc replace but is even
more important for offloads.  Firstly calling offloads under a spin
lock is best avoided.  Secondly the destroy event of existing child
would have been sent to the offload device before the replace, causing
confusion.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:48 -08:00
Jakub Kicinski
9da93ece59 net: sched: refactor grafting Qdiscs with a parent
The code for grafting Qdiscs when there is a parent has two needless
indentation levels, and breaks the "keep the success path unindented"
guideline.  Refactor.
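
As a generic illustration of the guideline (not the actual qdisc_graft() code), the two functions below behave identically, but the second handles the exceptional cases first so the success path stays unindented. The dictionary-based parent and the `EOPNOTSUPP` return value are assumptions for the example.

```python
import errno

def graft_nested(parent, new):
    # Before: the success path sits two levels deep.
    if parent is not None:
        if parent.get("graft") is not None:
            parent["graft"](new)
            return 0
        else:
            return -errno.EOPNOTSUPP
    return 0

def graft_flat(parent, new):
    # After: bail out early on the special cases ...
    if parent is None:
        return 0
    if parent.get("graft") is None:
        return -errno.EOPNOTSUPP
    # ... and keep the success path at the top indentation level.
    parent["graft"](new)
    return 0
```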

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:48 -08:00
Jakub Kicinski
bfaee9113f net: sched: add an offload graft helper
Qdisc graft operation of offload-capable qdiscs performs a few
extra steps which are identical among all the qdiscs.  Add
a helper to share this code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:48 -08:00
Jakub Kicinski
58f8927399 net: sched: set TCQ_F_OFFLOADED flag for MQ
PRIO and RED mark the qdisc with TCQ_F_OFFLOADED upon successful offload,
make MQ do the same.  The consistency will help with consistent
graft callback behaviour.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:48 -08:00
Jakub Kicinski
dad54c0fab net: sched: red: remove unnecessary red_dump_offload_stats parameter
Offload dump helper does not use opt parameter, remove it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:47 -08:00
Jakub Kicinski
b592843c67 net: sched: add an offload dump helper
Qdisc dump operation of offload-capable qdiscs performs a few
extra steps which are identical among all the qdiscs.  Add
a helper to share this code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:47 -08:00
David S. Miller
80b6265c0f Merge branch 'net-phy-improve-and-simplify-phylib-state-machine'
Heiner Kallweit says:

====================
net: phy: improve and simplify phylib state machine

This patch series is based on two axioms:

- During autoneg a PHY always reports the link being down

- Info in clause 22/45 registers doesn't allow differentiating between
  these two states:
  1. Link is physically down
  2. A link partner is connected and PHY is autonegotiating
  In both cases "link up" and "aneg finished" bits aren't set.
  One consequence is that having separate states PHY_NOLINK and PHY_AN
  isn't needed.

By using these two axioms the state machine can be significantly
simplified.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 15:02:06 -08:00
Heiner Kallweit
c8e977bab3 net: phy: use phy_check_link_status in more places in the state machine
Use phy_check_link_status in more places in the state machine.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 15:02:06 -08:00
Heiner Kallweit
85a1f31d63 net: phy: remove state PHY_AN
After the recent changes in the state machine, state PHY_AN isn't used
any longer and can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 15:02:06 -08:00
Heiner Kallweit
74a992b359 net: phy: add phy_check_link_status
In a few places in the state machine the state is set to PHY_RUNNING or
PHY_NOLINK after doing a phy_read_status(). So factor this out to
phy_check_link_status().
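
The factored-out step amounts to "read the status once, then pick one of two states". A minimal sketch in Python, assuming a hypothetical `read_status` callable that returns whether the link is up; the real function also handles errors and link notifications, which are omitted here.

```python
from enum import Enum

class PhyState(Enum):
    NOLINK = "PHY_NOLINK"
    RUNNING = "PHY_RUNNING"

def check_link_status(read_status):
    """Read the link status and map it straight to the resulting state."""
    link_up = read_status()
    return PhyState.RUNNING if link_up else PhyState.NOLINK
```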

First use it in phy_start_aneg(): By setting the state to PHY_RUNNING
or PHY_NOLINK directly we can remove the code to handle the case that
we're using interrupts and aneg was finished already.

Definition of phy_link_up and phy_link_down needs to be moved because
they are called in the new function.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 15:02:05 -08:00
Heiner Kallweit
c96469f830 net: phy: remove useless check in state machine case PHY_RESUMING
If aneg isn't finished yet then the PHY reports the link as down.
There's no benefit in setting the state to PHY_AN because the next
state machine run would set the status to PHY_NOLINK anyway (unless
aneg has finished in the meantime and the link is up). Therefore
we can set the state to PHY_RUNNING or PHY_NOLINK directly.

In addition change the do_carrier parameter in phy_link_down() to true.
If carrier was marked as up before (which should never be the case because
PHY was in state PHY_HALTED before) then we should mark it as down now.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 15:02:05 -08:00
Heiner Kallweit
3b01ea72f3 net: phy: remove useless check in state machine case PHY_NOLINK
If aneg is enabled and the PHY reports the link as up then definitely
aneg finished successfully. Therefore this check is useless and
can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 15:02:05 -08:00
David S. Miller
5867b33014 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-11-07

This series contains updates to almost all of the Intel wired LAN
drivers.

Lance Roy replaces a spin lock with lockdep_assert_held() for the igbvf
driver, as part of a move toward removing spin_is_locked().

Colin Ian King fixes a potential null pointer dereference by adding a
check in ixgbe.  Also fixed the igc driver by properly assigning the
return error code of a function call, so that we can properly check it.

Shannon Nelson updates the ixgbe driver to not block IPsec offload when
in VEPA mode; in VEB mode, IPsec offload is still blocked because the
device drops packets into a black hole.

Jake adds support for software timestamping for packets sent over
ixgbevf.  Also modifies i40e, iavf, igb, igc, and ixgbe to delay calling
skb_tx_timestamp() to the latest point possible, which is just prior to
notifying the hardware of the new Tx packet.

Todd adds the new WoL filter flag so that we properly report that we do
not support this new feature.

YueHaibing from Huawei fixes the igc driver by cleaning up variables
that are not "really" used.

Dan Carpenter cleans up igc whitespace issues.

Miroslav Lichvar fixes a potential underflow issue in the e1000e
timecounter by modifying the driver to use timecounter_cyc2time(), which
allows non-monotonic SYSTIM readings.

Sasha provides additional igc cleanups based on community feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 23:07:04 -08:00
David S. Miller
be08989c4d Merge branch 'nfp-add-and-use-tunnel-netdev-helpers'
John Hurley says:

====================
nfp: add and use tunnel netdev helpers

A recent patch introduced the function netif_is_vxlan() to verify the
tunnel type of a given netdev as vxlan.

Add a similar function to detect geneve netdevs and make use of this
function in the NFP driver. Also make use of the vxlan helper where
applicable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 23:00:24 -08:00
John Hurley
e963e1097a nfp: flower: include geneve as supported offload tunnel type
Offload of geneve decap rules is supported in NFP. Include geneve in the
check for supported types.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 23:00:23 -08:00
John Hurley
83f27d027d nfp: flower: use geneve and vxlan helpers
Make use of the recently added VXLAN and geneve helper functions to
determine the type of the netdev from its rtnl_link_ops.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 23:00:23 -08:00
John Hurley
1d10bd1676 net: add netif_is_geneve()
Add a helper function to determine if the type of a netdev is geneve based
on its rtnl_link_ops. This allows drivers that may wish to offload tunnels
to check the underlying type of the device.
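
In outline, such a helper checks the `kind` string of the device's link ops. The Python sketch below models a netdev as a plain dictionary, which is purely an assumption for illustration; only the comparison against "geneve" reflects the idea described above.

```python
def netif_is_geneve(dev):
    """True if the device's rtnl_link_ops identify it as a geneve netdev."""
    ops = dev.get("rtnl_link_ops")
    # Devices not created through rtnetlink may have no link ops at all.
    return ops is not None and ops.get("kind") == "geneve"
```

A driver deciding whether to offload a tunnel could call this on the device it was handed, falling back to the equivalent VXLAN check otherwise.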

A recent patch added a similar helper to vxlan.h

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 23:00:23 -08:00
Edward Cree
cea0604d3f sfc: add missing NVRAM partition types for EF10
Expose the MUM/SUC Firmware, UEFI Expansion ROM and MC Status partitions
 of the NIC's NVRAM as MTDs if found on the NIC.  The first two are needed
 in order to properly update them when performing firmware updates; the MC
 Status partition is used to determine whether a signed firmware image was
 accepted or rejected by a Secure NIC.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:58:00 -08:00
David S. Miller
7025abb2e4 Merge branch 'vlan-prepare-for-removal-of-VLAN_TAG_PRESENT'
Michał Mirosław says:

====================
net/vlan: prepare for removal of VLAN_TAG_PRESENT

This is a preparatory patchset before removing the use of VLAN_TAG_PRESENT
bit in skb->vlan_tci as indication of VLAN offload. This set includes
only cleanups that allow abstracting of code testing VLAN tag presence
in drivers and networking code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:41:20 -08:00
Michał Mirosław
295d072a42 net/vlan: remove unused #define HAVE_VLAN_GET_TAG
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:41:20 -08:00
Michał Mirosław
9b319148cb net/vlan: include the shift in skb_vlan_tag_get_prio()
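
For context, the priority is the top three bits of the 16-bit VLAN TCI, so a getter that "includes the shift" returns a value in the range 0 to 7 instead of a masked field. A small Python sketch, using the 802.1Q bit layout; the function name is shortened for the example and takes the TCI directly rather than an skb.

```python
VLAN_PRIO_MASK = 0xE000   # top three bits of the TCI (802.1Q priority)
VLAN_PRIO_SHIFT = 13

def vlan_tag_get_prio(vlan_tci):
    """Return the priority as a plain 0..7 value, mask and shift included."""
    return (vlan_tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT
```
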
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:41:19 -08:00
Michał Mirosław
e0a6b80973 net/vlan: introduce __vlan_hwaccel_copy_tag() helper
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:41:19 -08:00
Michał Mirosław
c8accd5a0a net/vlan: introduce __vlan_hwaccel_clear_tag() helper
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:41:19 -08:00
Yafang Shao
1295e2cf30 inet: minor optimization for backlog setting in listen(2)
Set the backlog earlier in inet_dccp_listen() and inet_listen(),
then we can avoid the redundant setting.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:31:07 -08:00
Davide Caratti
7dad9937e0 net: vlan: add support for tunnel offload
GSO tunneled packets are always segmented in software before they are
transmitted by a VLAN, even when the lower device can offload tunnel
encapsulation and VLAN together (i.e., some bits in NETIF_F_GSO_ENCAP_ALL
mask are set in the lower device 'vlan_features'). If we let VLANs have
the same tunnel offload capabilities as their lower device, throughput
can improve significantly when CPU is limited on the transmitter side.

 - set NETIF_F_GSO_ENCAP_ALL bits in the VLAN 'hw_features', to ensure
 that 'features' will have those bits zeroed only when the lower device
 has no hardware support for tunnel encapsulation.
 - for the same reason, copy GSO-related bits of 'hw_enc_features' from
 lower device to VLAN, and ensure to update that value when the lower
 device changes its features.
 - set NETIF_F_HW_CSUM bit in the VLAN 'hw_enc_features' if 'real_dev'
 is able to compute checksums at least for a kind of packets, like done
 with commit 8403debeea ("vlan: Keep NETIF_F_HW_CSUM similar to other
 software devices"). This avoids software segmentation due to mismatching
 checksum capabilities between VLAN's 'features' and 'hw_enc_features'.

Reported-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:23:30 -08:00
Paolo Abeni
f29eb2a96c tun: compute the RFS hash only if needed.
The tun XDP sendmsg code path unconditionally computes the symmetric
hash of each packet for RFS's sake, even when we could skip it, e.g.
when the device has a single queue.

This change adds the check already in-place for the skb sendmsg path
to avoid unneeded hashing.
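
The idea can be sketched as a guard in front of the hash computation. This Python model assumes the single-queue case mentioned above is the only condition and uses hypothetical names (`maybe_rxhash`, `compute_hash`); the actual check in the driver may involve more than the queue count.

```python
def maybe_rxhash(num_queues, compute_hash, packet):
    """Return the flow hash, or 0 when a single queue makes it unnecessary."""
    if num_queues <= 1:
        # Nothing to steer between, so skip the per-packet hashing cost.
        return 0
    return compute_hash(packet)
```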

The above gives a small but measurable performance gain for the VM xmit
path when zerocopy is not enabled.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:22:16 -08:00
Mathias Thore
2e7ad56aa5 net/wan/fsl_ucc_hdlc: add BQL support
Add byte queue limits support in the fsl_ucc_hdlc driver.

Signed-off-by: Mathias Thore <mathias.thore@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:21:25 -08:00
Heiner Kallweit
3b73e842c7 net: phy: realtek: load driver for all PHYs with a Realtek OUI
Instead of listing every single PHYID, load the driver for every PHYID
with a Realtek OUI, independent of model number and revision.

This patch also improves two further aspects:
- constify realtek_tbl[]
- the mask should have been 0xffffffff instead of 0x001fffff so far;
  with some bits masked out, a PHY from another vendor could have been
  matched
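
The effect of the mask can be shown with a small matching function. The sketch below is in Python and uses illustrative ID values: `0x001CC915` stands in for a Realtek PHY ID, `0xFFDCC915` is a made-up ID that differs only in the upper bits, and `OUI_MASK` assumes the standard layout in which the low 10 bits of the PHY ID hold the model number and revision.

```python
def phy_id_matches(phy_id, table_id, mask):
    """A table entry matches when the IDs agree on every bit set in the mask."""
    return (phy_id & mask) == (table_id & mask)

TABLE_ID = 0x001CC915    # illustrative table entry
OLD_MASK = 0x001FFFFF    # ignores the upper 11 bits
FULL_MASK = 0xFFFFFFFF   # compares every bit
OUI_MASK = 0xFFFFFC00    # ignores model (6 bits) and revision (4 bits)

OTHER_VENDOR = 0xFFDCC915   # hypothetical ID, differs only in the upper bits
```

With `OLD_MASK` the hypothetical foreign ID matches the table entry, while `FULL_MASK` rejects it; `OUI_MASK` accepts any model and revision that share the same upper 22 bits.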

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:18:58 -08:00
Heiner Kallweit
a3320bcf28 net: phy: make phy_trigger_machine static
phy_trigger_machine() is used in phy.c only, so we can make it static.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:18:11 -08:00
kbuild test robot
f908620019 net: dsa: bcm_sf2: fix semicolon.cocci warnings
drivers/net/dsa/bcm_sf2_cfp.c:1168:2-3: Unneeded semicolon
drivers/net/dsa/bcm_sf2_cfp.c:532:2-3: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: ae7a5aff78 ("net: dsa: bcm_sf2: Keep copy of inserted rules")
CC: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 22:14:05 -08:00
Justin Chen
8572a1b4db net: phy: bcm7xxx: Add entry for BCM7255
Add support for BCM7255 EPHY.

Signed-off-by: Justin Chen <justinpopo6@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 21:50:27 -08:00
David S. Miller
cab6949bf7 Merge branch 'udp-gro'
Paolo Abeni says:

====================
udp: implement GRO support

This series implements GRO support for UDP sockets, as the RX counterpart
of commit bec1f6f697 ("udp: generate gso with UDP_SEGMENT").
The core functionality is implemented by the second patch, introducing a new
sockopt to enable UDP_GRO, while patch 3 implements support for passing the
segment size to the user space via a new cmsg.
UDP GRO performs a socket lookup for each ingress packet and aggregates datagrams
directed to UDP GRO enabled sockets with constant l4 tuple.

UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT
rules, and that could potentially confuse existing applications.

The solution adopted here is to de-segment the GRO packet before enqueuing
as needed. Since we must cope with packet reinsertion after de-segmentation,
the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed
to UDP usage.
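
A much simplified model of the aggregate-then-de-segment idea is sketched below in Python. It merges consecutive datagrams that share an L4 tuple and can split an aggregate back into the original datagrams. The data layout and function names are assumptions for illustration, and real GRO applies further constraints (segment sizes, limits on the aggregate) that are left out.

```python
def gro_aggregate(datagrams):
    """Merge consecutive (flow, payload) datagrams that share the same flow.

    flow is any hashable L4 tuple, e.g. (saddr, sport, daddr, dport).
    Returns a list of (flow, [payload, ...]) aggregates.
    """
    merged = []
    for flow, payload in datagrams:
        if merged and merged[-1][0] == flow:
            merged[-1][1].append(payload)
        else:
            merged.append((flow, [payload]))
    return merged

def desegment(packet):
    """Split an aggregate back into individual datagrams, as needed when it
    lands on a socket that did not enable GRO."""
    flow, segments = packet
    return [(flow, seg) for seg in segments]
```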

While the current code can probably be improved, this safeguard, implemented in
patches 4-7, allows future enhancements to enable UDP GSO offload on more
virtual devices, eventually even on forwarded packets.

The last 4 patches implement some performance and functional self-tests,
re-using the existing udpgso infrastructure. The problematic scenario described
above is explicitly tested.

This revision of the series tries to address the feedback provided by Willem and
Subash on the previous iteration.
====================

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:05 -08:00
Paolo Abeni
3327a9c463 selftests: add functionals test for UDP GRO
Extends the existing udp programs to allow checking for proper
GRO aggregation/GSO size, and run the tests via a shell script, using
a veth pair with XDP program attached to trigger the GRO code path.

rfc v3 -> v1:
 - use ip route to attach the xdp helper to the veth

rfc v2 -> rfc v3:
 - add missing test program options documentation
 - fix sporadic test failures (receiver faster than sender)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:05 -08:00
Paolo Abeni
e87f53b4fa selftests: add some benchmark for UDP GRO
Run on top of veth pair, using a dummy XDP program to enable the GRO.

 rfc v3 -> v1:
  - use ip route to attach the xdp helper to the veth

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:05 -08:00
Paolo Abeni
bd8e1afe64 selftests: add dummy xdp test helper
This trivial XDP program does nothing, but will be used by the
next patch to test the GRO path in a net namespace, leveraging
the veth XDP implementation.

It's added here, despite its 'net' usage, to avoid the duplication
of the llc-related makefile boilerplate.

rfc v3 -> v1:
 - move the helper implementation into the bpf directory, don't
   touch udpgso_bench_rx

rfc v2 -> rfc v3:
 - move 'x' option handling here

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07 16:23:05 -08:00