Commit Graph

663090 Commits

Author SHA1 Message Date
Russell King
060fbc894b net: phy: clean up mmd_phy_indirect()
Make mmd_phy_indirect() use the same terminology as the rest of the
code, making clear what each address is - phy address, devad, and
register number.

While here, remove the "inline" from this static function, leaving
it to the compiler to decide whether to inline this function, and
get rid of unnecessary parens.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:43:00 -07:00
Russell King
3b85d8df26 net: phy: remove the indirect MMD read/write methods
Remove the indirect MMD read/write methods which are now no longer
necessary.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:43:00 -07:00
Russell King
d11437e0af net: phy: convert micrel to new read_mmd/write_mmd driver methods
Convert micrel to the new read_mmd/write_mmd driver methods.  This
Clause 22 PHY does not support any MMD access method.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:43:00 -07:00
Russell King
a6d99fcd3f net: phy: switch remaining users to phy_(read|write)_mmd()
Switch everyone over to using phy_read_mmd() and phy_write_mmd() now
that they are able to handle both Clause 22 indirect addressing and
Clause 45 direct addressing methods to the MMD registers.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:43:00 -07:00
Russell King
5f61367729 net: lan78xx: update for phy_(read|write)_mmd_indirect() removal
lan78xx appears to use phylib in a rather weird way, accessing the PHY
partly through phylib, and partly by making direct accesses to it,
including to the Clause 45 registers.  As the indirect MMD accessors are
going away, update this driver to use the plain phy_(read|write)_mmd()
accessors instead.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:43:00 -07:00
Russell King
1ee6b9bc62 net: phy: make phy_(read|write)_mmd() generic MMD accessors
Make phy_(read|write)_mmd() generic 802.3 clause 45 register accessors
for both Clause 22 and Clause 45 PHYs, using either the direct register
reading for Clause 45, or the indirect method for Clause 22 PHYs.
Allow this behaviour to be overriden by PHY drivers where necessary.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:43:00 -07:00
Russell King
9860118b58 net: phy: move phy MMD accessors to phy-core.c
Move the phy_(read|write)__mmd() helpers out of line, they will become
our main MMD accessor functions, and so will be a little more complex.
This complexity doesn't belong in an inline function.  Also move the
_indirect variants as well to keep like functionality together.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:42:59 -07:00
Alexander Potapenko
d515684d78 ipv6: make sure to initialize sockc.tsflags before first use
In the case udp_sk(sk)->pending is AF_INET6, udpv6_sendmsg() would
jump to do_append_data, skipping the initialization of sockc.tsflags.
Fix the problem by moving sockc.tsflags initialization earlier.

The bug was detected with KMSAN.

Fixes: c14ac9451c ("sock: enable timestamping using control messages")
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:40:22 -07:00
David S. Miller
1b33c0d2a1 Merge branch 'fjes-fixes'
YASUAKI ISHIMATSU says:

====================
fjes: Do not load fjes driver

The fjes driver is used only by FUJITSU servers and almost of all
servers in the world never use it. But currently if ACPI PNP0C02
is defined in the ACPI table, the following message is always shown:

 "FUJITSU Extended Socket Network Device Driver - version 1.2
  - Copyright (c) 2015 FUJITSU LIMITED"

The message makes users confused because there is no reason that
the message is shown in other vendor servers.

To avoid the confusion, the patch adds several checks.

v3:
  - Rebase on latest net tree.
  - Add _STA method check to avoid loading fjes driver.

v2:
  - Order local variable declarations from longest to shortest line
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:38:17 -07:00
Yasuaki Ishimatsu
2b396d3026 fjes: Do not load fjes driver if extended socket device is not power on.
The extended device socket cannot turn on/off while system is running.
So when system boots up and the device is not power on, the fjes driver
does not need be loaded.

To check the status of the device, the patch adds ACPI _STA method check.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
CC: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:38:17 -07:00
Yasuaki Ishimatsu
ac23d3cac1 fjes: Do not load fjes driver if system does not have extended socket device.
The fjes driver is used only by FUJITSU servers and almost of all
servers in the world never use it. But currently if ACPI PNP0C02
is defined in the ACPI table, the following message is always shown:

 "FUJITSU Extended Socket Network Device Driver - version 1.2
  - Copyright (c) 2015 FUJITSU LIMITED"

The message makes users confused because there is no reason that
the message is shown in other vendor servers.

To avoid the confusion, the patch adds a check that the server
has a extended socket device or not.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
CC: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:38:17 -07:00
Thierry Reding
2d72d5016f net: stmmac: Use AVB mode by default
Prior to the recent multi-queue changes the driver would configure the
queues to use the AVB mode, but the mode then got switched to DCB. The
hardware still works fine in DCB mode, but my testing capabilities are
limited, so it's safer to revert to the prior setting anyway.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:15:15 -07:00
Thierry Reding
33e85b8dd6 net: stmmac: Restore DT backwards-compatibility
Recent changes to support multiple queues in the device tree bindings
resulted in the number of RX and TX queues to be initialized to zero for
device trees not adhering to the new bindings.

Restore backwards-compatibility with those device trees by falling back
to a single RX and TX queues each.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:15:15 -07:00
Thierry Reding
f39768744f net: stmmac: Always enable MAC RX queues
The MAC RX queues always need to be enabled in order to receive network
packets. Remove the condition that this only needs to be done for multi-
queue configurations.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:15:14 -07:00
David S. Miller
efad54a115 Merge branch 'mlx5-fixes'
Saeed Mahameed says:

====================
Mellanox mlx5 fixes 2017-03-21

This series contains some mlx5 core and ethernet driver fixes.

For -stable:
net/mlx5e: Count LRO packets correctly (for kernel >= 4.2)
net/mlx5e: Count GSO packets correctly (for kernel >= 4.2)
net/mlx5: Increase number of max QPs in default profile (for kernel >= 4.0)
net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps (for kernel >= 4.10)
net/mlx5e: Use the proper UAPI values when offloading TC vlan actions (for kernel >= v4.9)
net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured (for kernel >= 4.10)
net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch (for kernel >= 4.10)
net/mlx5: Add missing entries for set/query rate limit commands (for kernel >= 4.8)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:14 -07:00
Gal Pressman
8ab7e2ae15 net/mlx5e: Count LRO packets correctly
RX packets statistics ('rx_packets' counter) used to count LRO packets
as one, even though it contains multiple segments.
This patch will increment the counter by the number of segments, and
align the driver with the behavior of other drivers in the stack.

Note that no information is lost in this patch due to 'rx_lro_packets'
counter existence.

Before, ethtool showed:
$ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets"
     rx_packets: 435277
     rx_lro_packets: 35847
     rx_packets_phy: 1935066

Now, we will see the more logical statistics:
$ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets"
     rx_packets: 1935066
     rx_lro_packets: 35847
     rx_packets_phy: 1935066

Fixes: e586b3b0ba ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:13 -07:00
Gal Pressman
d3a4e4da54 net/mlx5e: Count GSO packets correctly
TX packets statistics ('tx_packets' counter) used to count GSO packets
as one, even though it contains multiple segments.
This patch will increment the counter by the number of segments, and
align the driver with the behavior of other drivers in the stack.

Note that no information is lost in this patch due to 'tx_tso_packets'
counter existence.

Before, ethtool showed:
$ ethtool -S ens6 | egrep "tx_packets|tx_tso_packets"
     tx_packets: 61340
     tx_tso_packets: 60954
     tx_packets_phy: 2451115

Now, we will see the more logical statistics:
$ ethtool -S ens6 | egrep "tx_packets|tx_tso_packets"
     tx_packets: 2451115
     tx_tso_packets: 60954
     tx_packets_phy: 2451115

Fixes: e586b3b0ba ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:13 -07:00
Maor Gottlieb
5f40b4ed97 net/mlx5: Increase number of max QPs in default profile
With ConnectX-4 sharing SRQs from the same space as QPs, we hit a
limit preventing some applications to allocate needed QPs amount.
Double the size to 256K.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:13 -07:00
Paul Blakey
1ad9a00ae0 net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps
This was added to allow the TC offloading code to identify offloading
encap/decap vxlan rules.

The VF reps are effectively related to the same mlx5 PCI device as the
PF. Since the kernel invokes the (say) delete ndo for each netdev, the
FW erred on multiple vxlan dst port deletes when the port was deleted
from the system.

We fix that by keeping the registration to be carried out only by the
PF. Since the PF serves as the uplink device, the VF reps will look
up a port there and realize if they are ok to offload that.

Tested:
 <SETUP VFS>
 <SETUP switchdev mode to have representors>
 ip link add vxlan1 type vxlan id 44 dev ens5f0 dstport 9999
 ip link set vxlan1 up
 ip link del dev vxlan1

Fixes: 4a25730eb2 ('net/mlx5e: Add ndo_udp_tunnel_add to VF representors')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:13 -07:00
Or Gerlitz
09c91ddf2c net/mlx5e: Use the proper UAPI values when offloading TC vlan actions
Currently we use the non UAPI values and we miss erring on
the modify action which is not supported, fix that.

Fixes: 8b32580df1 ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:13 -07:00
Roi Dayan
375f51e2b5 net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured
Changing the eswitch inline mode can potentially cause already configured
flows not to match the policy. E.g. set policy L4, add some L4 rules,
set policy to L2 --> bad! Hence we disallow it.

Keep track of how many offloaded rules are now set and refuse
inline mode changes if this isn't zero.

Fixes: bffaa91658 ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:12 -07:00
Or Gerlitz
d85cdccbb3 net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch
Refactor the code to deal with add/del TC rules to have handler per NIC/E-switch
offloading use case, and push the latter into the e-switch code. This provides
better separation and is to be used in down-stream patch for applying a fix.

Fixes: bffaa91658 ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:12 -07:00
Or Gerlitz
1f30a86c58 net/mlx5: Add missing entries for set/query rate limit commands
The switch cases for the rate limit set and query commands were
missing, which could get us wrong under fw error or driver reset
flow, fix that.

Fixes: 1466cc5b23 ('net/mlx5: Rate limit tables support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:12 -07:00
David S. Miller
bf601fe52e wireless-drivers fixes for 4.11
iwlwifi
 
 * fix a user reported warning in DQA
 
 mwifiex
 
 * fix a potential double free
 * fix lost early debug logs
 * fix init wakeup warning message from device framework
 * add Ganapathi and Xinming as maintainers
 
 ath10k
 
 * fix regression with QCA6174 during resume and firmware crash
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJY0S5DAAoJEG4XJFUm622bYoYH/AxiLF0UleRgeHeQAQd4d1d0
 QsISkT+Xm1p43P3GQ9qoKocHd0S0GvWCqzHC2HAeDmLFAUf5gfo3hhGqEvTVEvCo
 eUiNjIJyezNcvo9JJwmB/P8O2nVJH9tUtSISwKfnvG/XT4olSJfE+vPi0edOXG4s
 IvDdO8AiuoZHoP11utNtMsg4vnA1uvy75ti20fe69PX5g9G/+gcDBkFHh/IpcfML
 GSRZaZjKxsaWKnSUimPJlJsvsJE+mcsXD2rPImEiinQbR2yTfkAcmdxaQR9fkVSP
 1FdT4MvWoxrK4sUaPSn3ZO/eVzNeDXi9bQPGQwBpq6rUZp7r9OPkdRqGWbi/hW8=
 =zzBr
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-for-davem-2017-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.11

iwlwifi

* fix a user reported warning in DQA

mwifiex

* fix a potential double free
* fix lost early debug logs
* fix init wakeup warning message from device framework
* add Ganapathi and Xinming as maintainers

ath10k

* fix regression with QCA6174 during resume and firmware crash
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:06:49 -07:00
Reshetova, Elena
4c355cdfbb net: convert sk_filter.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:06:08 -07:00
Tobias Klauser
726bceca81 net: greth: Utilize of_get_mac_address()
Do not open code getting the MAC address exclusively from the
"local-mac-address" property, but instead use of_get_mac_address() which
looks up the MAC address using the 3 typical property names.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:00:39 -07:00
Ying Xue
557d054c01 tipc: fix nametbl deadlock at tipc_nametbl_unsubscribe
Until now, tipc_nametbl_unsubscribe() is called at subscriptions
reference count cleanup. Usually the subscriptions cleanup is
called at subscription timeout or at subscription cancel or at
subscriber delete.

We have ignored the possibility of this being called from other
locations, which causes deadlock as we try to grab the
tn->nametbl_lock while holding it already.

   CPU1:                             CPU2:
----------                     ----------------
tipc_nametbl_publish
spin_lock_bh(&tn->nametbl_lock)
tipc_nametbl_insert_publ
tipc_nameseq_insert_publ
tipc_subscrp_report_overlap
tipc_subscrp_get
tipc_subscrp_send_event
                             tipc_close_conn
                             tipc_subscrb_release_cb
                             tipc_subscrb_delete
                             tipc_subscrp_put
tipc_subscrp_put
tipc_subscrp_kref_release
tipc_nametbl_unsubscribe
spin_lock_bh(&tn->nametbl_lock)
<<grab nametbl_lock again>>

   CPU1:                              CPU2:
----------                     ----------------
tipc_nametbl_stop
spin_lock_bh(&tn->nametbl_lock)
tipc_purge_publications
tipc_nameseq_remove_publ
tipc_subscrp_report_overlap
tipc_subscrp_get
tipc_subscrp_send_event
                             tipc_close_conn
                             tipc_subscrb_release_cb
                             tipc_subscrb_delete
                             tipc_subscrp_put
tipc_subscrp_put
tipc_subscrp_kref_release
tipc_nametbl_unsubscribe
spin_lock_bh(&tn->nametbl_lock)
<<grab nametbl_lock again>>

In this commit, we advance the calling of tipc_nametbl_unsubscribe()
from the refcount cleanup to the intended callers.

Fixes: d094c4d5f5 ("tipc: add subscription refcount to avoid invalid delete")
Reported-by: John Thompson <thompa.atl@gmail.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:59:16 -07:00
Felix Manlunas
58ad319834 liquidio: fix Coverity scan errors
Fix Coverity scan errors by not dereferencing lio->glists_dma_base pointer
if it's NULL.

See http://marc.info/?l=linux-netdev&m=149002294305614&w=2

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: VSR Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:48:34 -07:00
Gao Feng
cfc62d878f net: tcp: Permit user set TCP_MAXSEG to default value
When user_mss is zero, it means use the default value. But the current
codes don't permit user set TCP_MAXSEG to the default value.
It would return the -EINVAL when val is zero.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:45:13 -07:00
David S. Miller
b2a1674aa1 Merge branch 'ovs-sample-action-optimization'
Andy Zhou says:

====================
net-next sample action optimization v4

The sample action can be used for translating Openflow 'clone' action.
However its implementation has not been sufficiently optimized for this
use case. This series attempts to close the gap.

Patch 3 commit message has more details on the specific optimizations
implemented.

---
v3->v4: Enhance patch 4.
        Fix two bugs pointed out by Pravin,
        Remove 'is_sample' variable.

v2->v3: Enhance patch 4, Rafctor to move more common logic to clone_execute().

v1->v2: Address Pravin's comment, Refactor recirc and sample
        to share more common code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:28:35 -07:00
andy zhou
bef7f7567a Openvswitch: Refactor sample and recirc actions implementation
Added clone_execute() that both the sample and the recirc
action implementation can use.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:28:35 -07:00
andy zhou
798c166173 openvswitch: Optimize sample action for the clone use cases
With the introduction of open flow 'clone' action, the OVS user space
can now translate the 'clone' action into kernel datapath 'sample'
action, with 100% probability, to ensure that the clone semantics,
which is that the packet seen by the clone action is the same as the
packet seen by the action after clone, is faithfully carried out
in the datapath.

While the sample action in the datpath has the matching semantics,
its implementation is only optimized for its original use.
Specifically, there are two limitation: First, there is a 3 level of
nesting restriction, enforced at the flow downloading time. This
limit turns out to be too restrictive for the 'clone' use case.
Second, the implementation avoid recursive call only if the sample
action list has a single userspace action.

The main optimization implemented in this series removes the static
nesting limit check, instead, implement the run time recursion limit
check, and recursion avoidance similar to that of the 'recirc' action.
This optimization solve both #1 and #2 issues above.

One related optimization attempts to avoid copying flow key as
long as the actions enclosed does not change the flow key. The
detection is performed only once at the flow downloading time.

Another related optimization is to rewrite the action list
at flow downloading time in order to save the fast path from parsing
the sample action list in its original form repeatedly.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:28:35 -07:00
andy zhou
4572ef52a0 openvswitch: Refactor recirc key allocation.
The logic of allocating and copy key for each 'exec_actions_level'
was specific to execute_recirc(). However, future patches will reuse
as well.  Refactor the logic into its own function clone_key().

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:28:35 -07:00
andy zhou
47c697aa2d openvswitch: Deferred fifo API change.
add_deferred_actions() API currently requires actions to be passed in
as a fully encoded netlink message. So far both 'sample' and 'recirc'
actions happens to carry actions as fully encoded netlink messages.
However, this requirement is more restrictive than necessary, future
patch will need to pass in action lists that are not fully encoded
by themselves.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:28:34 -07:00
David S. Miller
29dd5ec094 Merge branch 'vrf-perf'
David Ahern says:

====================
net: vrf: performance improvements

Device based features for VRF such as qdisc, netfilter and packet
captures are implemented by switching the dst on skbuffs to its per-VRF
dst. This has the effect of controlling the output function which points
a function in the VRF driver. [1] The skb proceeds down the stack with
dst->dev pointing to the VRF device. Netfilter, qdisc and tc rules and
network taps are evaluated based on this device. Finally, the skb makes
it to the vrf_xmit function which resets the dst based on a FIB lookup.

The feature comes at cost - between 5 and 10% depending on test (TCP vs
UDP, stream vs RR and IPv4 vs IPv6). The main cost is requiring a FIB
lookup in the VRF driver for each packet sent through it. The FIB lookup
is required because the real dst gets dropped so that the skb can
traverse the stack with dst->dev set to the VRF device.

All of that is really driven by the qdisc and not replicating the
processing of __dev_queue_xmit if a qdisc is set up on the device. But,
VRF devices by default do not have a qdisc and really have no need for
multiple Tx queues. This means the performance overhead is inflicted upon
all users for the potential use case of a qdisc being configured.

The overhead can be avoided by checking if the default configuration
applies to a specific VRF device before switching the dst. If a device
does not have a qdisc, the pass through netfilter hooks and packet taps
can be done inline without dropping the dst and thus avoiding the
performance penalty. With this change performance overhead of VRF drops
to neglible (difference with run-over-run variance) to 3% depending on
test type.

netperf performance comparison for 3 cases:
1. L3_MASTER_DEVICE compiled out
2. VRF with this patch set
3. current VRF code

IPv4
----
           no-l3mdev     new-vrf     old-vrf
TCP_RR       28778        28938*       27169
TCP_CRR      10706        10490         9770
UDP_RR       30750        29813        29256

* Although higher in the final run used for submitting this patch set, I
  think what this really represents is a neglible performance overhead for
  VRF with this change (i.e, within the +-1% variance of runs). Most
  notably the FIB lookups in the Tx path are avoided for TCP_RR.

IPv6
----
           no-l3mdev     new-vrf     old-vrf
TCP_RR       29495        29432       27794
TCP_CRR      10520        10338        9870
UDP_RR       26137        27019*      26511

* UDP is consistently better with VRF for two reasons:
  1. Source address selection with L3 domains is considering fewer
     addresses since only addresses on interfaces in the domain are
     considered for the selection. Specifically, perf-top shows
     shows ipv6_get_saddr_eval, ipv6_dev_get_saddr and __ipv6_dev_get_saddr
     running much lower with vrf than without.

  2. The VRF table contains all routes (i.e, there are no separate local
     and main tables per VRF). That means ip6_pol_route_output only has 1
     lookup for VRF where it does 2 without it (1 in the local table and 1
     in the main table).

[1] http://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:19:48 -07:00
David Ahern
a9ec54d1b0 net: vrf: performance improvements for IPv6
The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.

The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.

This patch avoids the redirect for IPv6 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.

Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.

The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) from a +3% improvement for UDP which has a lookup
per packet (VRF being better than no l3mdev) to ~2% loss for TCP_CRR
which connects a socket for each request-response.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:19:48 -07:00
David Ahern
dcdd43c41e net: vrf: performance improvements for IPv4
The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.

The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.

This patch avoids the redirect for IPv4 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.

Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.

The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) to ~3% for UDP which has a lookup per packet and
< 1% overhead for connected sockets that leverage early demux and
avoid FIB lookups.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:19:48 -07:00
Josh Hunt
a2d133b1d4 sock: introduce SO_MEMINFO getsockopt
Allows reading of SK_MEMINFO_VARS via socket option. This way an
application can get all meminfo related information in single socket
option call instead of multiple calls.

Adds helper function, sk_get_meminfo(), and uses that for both
getsockopt and sock_diag_put_meminfo().

Suggested by Eric Dumazet.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:18:58 -07:00
Colin Ian King
c7cd4c9bf8 mlxsw: spectrum: fix swapped order of arguments packets and bytes
The arguments packets and bytes to call mlxsw_sp_acl_rule_get_stats are
in the wrong order. Fix this by swapping them.

Detected by CoverityScan, CID#1419705 ("Arguments in wrong order")

Fixes: 7c1b8eb175 ("mlxsw: spectrum: Add support for TC flower offload statistics")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:59:12 -07:00
Xin Long
581947787e sctp: remove useless err from sctp_association_init
This patch is to remove the unnecessary temporary variable 'err' from
sctp_association_init.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:57:52 -07:00
Xin Long
1511949c61 sctp: declare struct sctp_stream before using it
sctp_stream_free uses struct sctp_stream as a param, but struct sctp_stream
is defined after it's declaration.

This patch is to declare struct sctp_stream before sctp_stream_free.

Fixes: a83863174a ("sctp: prepare asoc stream for stream reconf")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:57:52 -07:00
Arnd Bergmann
07fef36234 cpsw/netcp: cpts depends on posix_timers
With posix timers having become optional, we get a build error with
the cpts time sync option of the CPSW driver:

drivers/net/ethernet/ti/cpts.c: In function 'cpts_find_ts':
drivers/net/ethernet/ti/cpts.c:291:23: error: implicit declaration of function 'ptp_classify_raw';did you mean 'ptp_classifier_init'? [-Werror=implicit-function-declaration]

This adds a hard dependency on PTP_CLOCK to avoid the problem, as
building it without PTP support makes no sense anyway.

Fixes: baa73d9e47 ("posix-timers: Make them configurable")
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:56:42 -07:00
Arnd Bergmann
be9ca0d33c cpsw/netcp: work around reverse cpts dependency
The dependency is reversed: cpsw and netcp call into cpts,
but cpts depends on the other two in Kconfig. This can lead
to cpts being a loadable module and its callers built-in:

drivers/net/ethernet/ti/cpsw.o: In function `cpsw_remove':
cpsw.c:(.text.cpsw_remove+0xd0): undefined reference to `cpts_release'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_rx_handler':
cpsw.c:(.text.cpsw_rx_handler+0x2dc): undefined reference to `cpts_rx_timestamp'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_tx_handler':
cpsw.c:(.text.cpsw_tx_handler+0x7c): undefined reference to `cpts_tx_timestamp'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_ndo_stop':

As a workaround, I'm introducing another Kconfig symbol to
control the compilation of cpts, while making the actual
module controlled by a silent symbol that is =y when necessary.

Fixes: 6246168b4a ("net: ethernet: ti: netcp: add support of cpts")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:56:42 -07:00
Arjun Vynipadath
bb58d07964 cxgb4: Update IngPad and IngPack values
We are using the smallest padding boundary (8 bytes), which isn't
smaller than the Memory Controller Read/Write Size

We get best performance in 100G when the Packing Boundary is a multiple
of the Maximum Payload Size. Its related to inefficient chopping of DMA
packets by PCIe, that causes more overhead on bus. So driver is helping
by making the starting address alignment to be MPS size.

We will try to determine PCIE MaxPayloadSize capabiltiy  and set
IngPackBoundary based on this value. If cache line size is greater than
MPS or determinig MPS fails, we will use cache line size to determine
IngPackBoundary(as before).

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:53:49 -07:00
Arnd Bergmann
3588f29e06 net: dwc-xlgmac: add module license
When building the driver as a module, we get a warning about the
lack of a license:

WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/synopsys/dwc-xlgmac.o
see include/linux/module.h for more information

Curiously the text in the .c files only mentions GPLv2+, while the license
tag in the PCI driver contains both GPL and BSD. I picked the license text
as the more definite reference here and put a GPL tag in there.

Fixes: 65e0ace2c5 ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:52:34 -07:00
Arnd Bergmann
424fa00ea3 net: dwc-xlgmac: include dcbnl.h
Without this header, we can run into a build error:

drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c: In function 'xlgmac_config_queue_mapping':
drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c:1548:36: error: 'IEEE_8021QAZ_MAX_TCS' undeclared (first use in this function)
  prio_queues = min_t(unsigned int, IEEE_8021QAZ_MAX_TCS,

Fixes: 65e0ace2c5 ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:52:33 -07:00
David S. Miller
8fb106b28c Merge branch 'r8152-rx-settings'
Hayes Wang says:

====================
r8152: fix the rx settings of RTL8153

The RMS and the rx early size should base on the same rx size. However,
the RMS is set to 9K bytes now and the rx early depends on mtu. For using
the rx buffer effectively, sync the two settings according to the mtu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:50:37 -07:00
hayeswang
b20cb60e2b r8152: fix the rx early size of RTL8153
revert commit a59e6d8152 ("r8152: correct the rx early size") and
fix the rx early size as

	(rx buffer size - rx packet size - rx desc size - alignment) / 4

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:50:36 -07:00
hayeswang
210c4f70b4 r8152: set the RMS of RTL8153 according to the mtu
Set the received maximum size (RMS) according to the mtu size. It is
unnecessary to receive a packet which is more than the size we could
transmit. Besides, this could let the rx buffer be used effectively.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:50:36 -07:00
Roopa Prabhu
7b8f7a402d neighbour: fix nlmsg_pid in notifications
neigh notifications today carry pid 0 for nlmsg_pid
in all cases. This patch fixes it to carry calling process
pid when available. Applications (eg. quagga) rely on
nlmsg_pid to ignore notifications generated by their own
netlink operations. This patch follows the routing subsystem
which already sets this correctly.

Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 10:48:49 -07:00