Commit Graph

855580 Commits

Author SHA1 Message Date
YueHaibing
566495de16 net: mediatek: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
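
For reference, a minimal sketch of what this kind of conversion looks like in a
probe routine (illustrative code, not the actual mediatek hunk):

    /* before: look up the resource, then map it */
    struct resource *res;
    void __iomem *base;

    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
    base = devm_ioremap_resource(&pdev->dev, res);
    if (IS_ERR(base))
            return PTR_ERR(base);

    /* after: one helper does both steps */
    base = devm_platform_ioremap_resource(pdev, 0);
    if (IS_ERR(base))
            return PTR_ERR(base);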

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
4237678846 net: dsa: bcm_sf2: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
291f4b6de4 net: dsa: b53: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
6551c8c807 net: dsa: lantiq: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
3230a55b36 mvpp2: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
Nikolay Aleksandrov
3247b27204 net: bridge: mcast: add delete due to fast-leave mdb flag
In user-space there's no way to distinguish why an mdb entry was deleted,
which is a problem for daemons that would like to keep the mdb in sync
with remote ends (e.g. mlag) but would also like to converge faster.
In almost all cases we'd like to age out the remote entry for performance
and convergence reasons, except when fast-leave is enabled. In that case we
want an explicit, immediate remote delete, so add an mdb flag which is set
only when the entry is being deleted due to fast-leave.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 19:13:40 -04:00
Lucas Bates
0eba31ef5c tc-testing: Clarify the use of tdc's -d option
The -d command line argument to tdc requires the name of a physical device
on the system where the tests will be run. If -d has not been used, tdc
will skip tests that require a physical device.

This patch is intended to better document what the -d option does and how
it is used.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 18:54:34 -04:00
David S. Miller
21947f467c mlx5-updates-2019-07-29

Merge tag 'mlx5-updates-2019-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-07-29

This series includes updates to the mlx5 driver:
1) Simplifications, cleanups and warning print improvements

2) From Vlad Buslov:
Refactor mlx5 tc flow handling for unlocked execution (Part 1)

Currently, all cls API hardware offload driver callbacks require the
caller to hold the rtnl lock when calling them. The cls API has already
been updated to update software filters in parallel (on classifiers that
support unlocked execution), however the hardware offload code still
obtains the rtnl lock before calling the driver tc callbacks. This set
implements partial support for unlocked execution that is leveraged by
follow-up refactorings in specific mlx5 tc subsystems and by a cls API
patch that allows drivers to register their callbacks as rtnl-unlocked.

In mlx5 tc code, mlx5e_tc_flow is the main structure used to represent a
tc filter. Currently, the structure itself and its handlers in both the
tc and eswitch layers do not implement any kind of synchronization and
instead assume external global synchronization provided by the rtnl
lock. Implement the following changes to remove the dependency on the
rtnl lock in flow handling code, intended as groundwork for subsequent
changes that provide fully rtnl-independent mlx5 tc:

- Extend struct mlx5e_tc_flow with an atomic reference counter and rcu to
  allow concurrent access from multiple tc and neigh update workqueue
  instances without introducing any additional locks specific to the
  structure. Its 'flags' field is changed to use atomic bitmask ops, which
  is necessary for tc to interact with other concurrent tc instances or
  concurrent neigh updates that need to skip flows that are not fully
  initialized (new INIT_DONE flow flag) and can change the flags
  according to neighbor state (flipping the OFFLOADED flag).

- Protect unready flows list by new uplink_priv->unready_flows_lock
  mutex.

- Convert calls to netdev APIs that require rtnl lock in flow handling
  code to their rcu counterparts.

- Modify eswitch code that is called from the tc layer and assumes
  implicit external synchronization to be concurrency safe: change
  esw->offloads.num_flows type to atomic integer and re-arrange
  esw->state_lock usage to protect additional data.

Some of the approaches to synchronization presented in this patch set
are quite complicated (lockless concurrent usage of data structures with
rcu and reference counting, fine-grained locking where necessary, retry
mechanisms to handle concurrent insertion of another instance of a data
structure with the same key, etc.). This is necessary to allow calling
the firmware in parallel in most cases, which is the main motivation of
this change, since firmware calls are a much heavier operation than
atomic operations, a multitude of locks and potential multiple retries
during concurrent accesses to the same elements.
====================
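
A much-simplified sketch of the concurrency primitives described above; the
struct layout and flag names here are illustrative, not the exact mlx5
definitions:

    /* stand-in for struct mlx5e_tc_flow */
    struct tc_flow {
            refcount_t refcnt;       /* shared by tc and neigh update users */
            struct rcu_head rcu;     /* freed only after an RCU grace period */
            unsigned long flags;     /* atomic bit ops instead of a plain int */
    };

    enum { FLOW_FLAG_INIT_DONE, FLOW_FLAG_OFFLOADED };

    /* skip flows that are not fully initialized yet */
    if (!test_bit(FLOW_FLAG_INIT_DONE, &flow->flags))
            return;

    /* neigh update flips the offloaded state atomically */
    set_bit(FLOW_FLAG_OFFLOADED, &flow->flags);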

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 18:48:01 -04:00
YueHaibing
6a7ce95d75 staging/octeon: Fix build error without CONFIG_NETDEVICES
When doing a COMPILE_TEST build without CONFIG_NETDEVICES,
we get a Kconfig warning:

WARNING: unmet direct dependencies detected for PHYLIB
  Depends on [n]: NETDEVICES [=n]
  Selected by [y]:
  - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC && NETDEVICES [=n] || COMPILE_TEST [=y])

Reported-by: Hulk Robot <hulkci@huawei.com>
Reported-by: Mark Brown <broonie@kernel.org>
Fixes: 171a9bae68 ("staging/octeon: Allow test build on !MIPS")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 09:07:05 -07:00
David S. Miller
ac5fe22636 We have a reasonably large number of changes:

Merge tag 'mac80211-next-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
We have a reasonably large number of changes:
 * lots more HE (802.11ax) support, particularly things
   relevant for the AP side, but also mesh support
 * debugfs cleanups from Greg
 * some more work on extended key ID
 * start using genl parallel_ops, as preparation for
   weaning ourselves off RTNL and getting parallelism
 * various other changes all over
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:59:41 -07:00
David S. Miller
164f0de315 Merge branch 'mlxsw-Test-coverage-for-DSCP-leftover-fix'
Petr Machata says:

====================
mlxsw: Test coverage for DSCP leftover fix

This patch set fixes some global scope pollution issues in the DSCP tests
(in patch #1), and then proceeds (in patch #2) to add a new test that
checks whether, after DSCP prioritization rules are removed from a port,
DSCP is consistently rewritten to zero, instead of the last removed rule
still staying in effect.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:47:14 -07:00
Petr Machata
d11786bb96 selftests: mlxsw: Add a test for leftover DSCP rule
Commit dedfde2fe1 ("mlxsw: spectrum_dcb: Configure DSCP map as the last
rule is removed") fixed a problem in mlxsw where the last DSCP rule to be
removed remained in effect when DSCP rewrite was applied.

Add a selftest that covers this problem.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:47:13 -07:00
Petr Machata
7700476f31 selftests: mlxsw: Fix local variable declarations in DSCP tests
These two tests pollute the global scope and, conversely, contain
unnecessary local declarations. Fix them.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:47:13 -07:00
Ding Xiang
7084148854 myri10ge: remove unneeded variable
"error" is unneeded,just return 0

Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:44:00 -07:00
Christophe JAILLET
a9d41e7b8b net: ag71xx: Slightly simplify code in 'ag71xx_rings_init()'
A few lines above, we have:
   tx_size = BIT(tx->order);

So use 'tx_size' directly to be consistent with the way 'rx->descs_cpu' and
'rx->descs_dma' are computed below.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:38:17 -07:00
Shay Bar
f39b07fdfb mac80211: HE STA disassoc due to QOS NULL not sent
In the case of an HE AP-STA link, ieee80211_send_nullfunc() will not
send the QoS NULL packet to check whether the AP is still associated.

In this case, probe_send_count will be non-zero and
ieee80211_sta_work() will later disassociate from the AP, even
though no packet was ever sent.

Fix this by decrementing probe_send_count and not calling
ieee80211_send_nullfunc() in the case of an HE link, so that we
still wait for some time for the AP beacon to reappear and
don't disconnect right away.

Signed-off-by: Shay Bar <shay.bar@celeno.com>
Link: https://lore.kernel.org/r/20190703131848.22879-1-shay.bar@celeno.com
[clarify commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 13:26:41 +02:00
John Crispin
1ced169cc1 mac80211: allow setting spatial reuse parameters from bss_conf
Store the OBSS PD parameters inside bss_conf when bringing up an AP and/or
when a station connects to an AP. This allows the driver to configure the
HW accordingly.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190730163701.18836-3-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Johannes Berg
6d4dd4ef1a nl80211: add strict start type
Add a strict start type so all new attributes starting from
NL80211_ATTR_HE_OBSS_PD are validated strictly.
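
In netlink policy terms, this boils down to something like the following
(sketch; only the strict-start entry is shown):

    static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = {
            /* attributes >= NL80211_ATTR_HE_OBSS_PD get strict validation */
            [0] = { .strict_start_type = NL80211_ATTR_HE_OBSS_PD },
            /* per-attribute policies follow */
    };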

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
John Crispin
796e90f42b cfg80211: add support for parsing OBSS_PD attributes
Add the data structure, policy and parsing code allowing userland to send
the OBSS PD information into the kernel.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190730163701.18836-2-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Karthikeyan Periyasamy
52dba8d7d5 mac80211: reject zero MAC address in add station
This came up in fuzz testing, and really we don't consider
all-zeroes to be a valid MAC address in most places, so
also reject it here to avoid confusion later on.

Signed-off-by: Karthikeyan Periyasamy <periyasa@codeaurora.org>
Link: https://lore.kernel.org/r/1563959770-21570-1-git-send-email-periyasa@codeaurora.org
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:52 +02:00
Johannes Berg
50508d941c cfg80211: use parallel_ops for genl
Over time, we really need to get rid of all of our global locking.
One of the things needed is to use parallel_ops. This isn't really
the most important (RTNL is much more important) but OTOH we just
keep adding uses of genl_family_attrbuf() now. Use .parallel_ops to
disallow this.

Reviewed-By: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190729143109.18683-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 11:00:46 +02:00
Johannes Berg
05d610af3e mac80211_hwsim: fill boottime_ns in netlink RX path
Give a proper boottime_ns value for netlink RX to avoid scan
issues here.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20190729160605.1074-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 10:51:43 +02:00
Colin Ian King
f12cac539f mac80211: add missing null return check from call to ieee80211_get_sband
The return from ieee80211_get_sband can potentially be a null pointer, so
it seems prudent to add a null check to avoid a null pointer dereference
on sband.
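
The added check is just the usual guard (sketch of the shape, not the
literal hunk):

    sband = ieee80211_get_sband(sdata);
    if (!sband)
            return;         /* avoid dereferencing a NULL sband below */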

Addresses-Coverity: ("Dereference null return")
Fixes: 2ab4587675 ("mac80211: add support for the ADDBA extension element")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20190730143205.14261-1-colin.king@canonical.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-31 10:51:17 +02:00
David S. Miller
5133f36cef Merge branch 'net-dsa-ksz-Add-Microchip-KSZ87xx-support'
Marek Vasut says:

====================
net: dsa: ksz: Add Microchip KSZ87xx support

This series adds support for Microchip KSZ87xx switches, which are
slightly simpler than the KSZ9xxx.
====================

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:12:50 -07:00
Tristram Ha
e66f840c08 net: dsa: ksz: Add Microchip KSZ8795 DSA driver
Add Microchip KSZ8795 DSA driver.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:12:50 -07:00
Tristram Ha
016e43a26b net: dsa: ksz: Add KSZ8795 tag code
Add DSA tag code for Microchip KSZ8795 switch. The switch is simpler
and the tag is only 1 byte, instead of 2 as is the case with KSZ9477.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:12:50 -07:00
Marek Vasut
4c173472d0 dt-bindings: net: dsa: ksz: document Microchip KSZ87xx family switches
Document Microchip KSZ87xx family switches. These include
KSZ8765 - 5 port switch
KSZ8794 - 4 port switch
KSZ8795 - 5 port switch

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: devicetree@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:12:50 -07:00
David S. Miller
c69e6eafff Merge branch 'vsock-virtio-optimizations-to-increase-the-throughput'
Stefano Garzarella says:

====================
vsock/virtio: optimizations to increase the throughput

This series tries to increase the throughput of virtio-vsock with slight
changes.
While I was testing v2 of this series I discovered a huge use of memory,
so I added patch 1 to mitigate this issue. I put it in this series in order
to better track the performance trends.

v5:
- rebased all patches on net-next
- added Stefan's R-b and Michael's A-b

v4: https://patchwork.kernel.org/cover/11047717
v3: https://patchwork.kernel.org/cover/10970145
v2: https://patchwork.kernel.org/cover/10938743
v1: https://patchwork.kernel.org/cover/10885431

Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
support. As Michael suggested in the v1, I booted host and guest with 'nosmap'.

A brief description of patches:
- Patch 1:     limit the memory usage with an extra copy for small packets
- Patches 2+3: reduce the number of credit update messages sent to the
               transmitter
- Patches 4+5: allow the host to split packets on multiple buffers and use
               VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed

                    host -> guest [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.032     0.030    0.048    0.051
64         0.061     0.059    0.108    0.117
128        0.122     0.112    0.227    0.234
256        0.244     0.241    0.418    0.415
512        0.459     0.466    0.847    0.865
1K         0.927     0.919    1.657    1.641
2K         1.884     1.813    3.262    3.269
4K         3.378     3.326    6.044    6.195
8K         5.637     5.676   10.141   11.287
16K        8.250     8.402   15.976   16.736
32K       13.327    13.204   19.013   20.515
64K       21.241    21.341   20.973   21.879
128K      21.851    22.354   21.816   23.203
256K      21.408    21.693   21.846   24.088
512K      21.600    21.899   21.921   24.106

                    guest -> host [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.045     0.046    0.057    0.057
64         0.089     0.091    0.103    0.104
128        0.170     0.179    0.192    0.200
256        0.364     0.351    0.361    0.379
512        0.709     0.699    0.731    0.790
1K         1.399     1.407    1.395    1.427
2K         2.670     2.684    2.745    2.835
4K         5.171     5.199    5.305    5.451
8K         8.442     8.500   10.083    9.941
16K       12.305    12.259   13.519   15.385
32K       11.418    11.150   11.988   24.680
64K       10.778    10.659   11.589   35.273
128K      10.421    10.339   10.939   40.338
256K      10.300     9.719   10.508   36.562
512K       9.833     9.808   10.612   35.979

As Stefan suggested in the v1, I also measured the efficiency in this way:
    efficiency = Mbps / (%CPU_Host + %CPU_Guest)

The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
but it's provided for free by iperf3 and can serve as an indication.

        host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.35      0.45     0.79     1.02
64         0.56      0.80     1.41     1.54
128        1.11      1.52     3.03     3.12
256        2.20      2.16     5.44     5.58
512        4.17      4.18    10.96    11.46
1K         8.30      8.26    20.99    20.89
2K        16.82     16.31    39.76    39.73
4K        30.89     30.79    74.07    75.73
8K        53.74     54.49   124.24   148.91
16K       80.68     83.63   200.21   232.79
32K      132.27    132.52   260.81   357.07
64K      229.82    230.40   300.19   444.18
128K     332.60    329.78   331.51   492.28
256K     331.06    337.22   339.59   511.59
512K     335.58    328.50   331.56   504.56

        guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5

32         0.43      0.43     0.53     0.56
64         0.85      0.86     1.04     1.10
128        1.63      1.71     2.07     2.13
256        3.48      3.35     4.02     4.22
512        6.80      6.67     7.97     8.63
1K        13.32     13.31    15.72    15.94
2K        25.79     25.92    30.84    30.98
4K        50.37     50.48    58.79    59.69
8K        95.90     96.15   107.04   110.33
16K      145.80    145.43   143.97   174.70
32K      147.06    144.74   146.02   282.48
64K      145.25    143.99   141.62   406.40
128K     149.34    146.96   147.49   489.34
256K     156.35    149.81   152.21   536.37
512K     151.65    150.74   151.52   519.93

[1] https://github.com/stefano-garzarella/iperf/
====================

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
0038ff357f vsock/virtio: change the maximum packet size allowed
Since we are now able to split packets, we can avoid limiting
their size to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
packet size.
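
Conceptually the change is a clamp against the larger constant (a sketch,
variable name assumed):

    /* limit the packet size to what a single packet may carry */
    if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
            pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;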

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
6dbd3e66e7 vhost/vsock: split packets to send using multiple buffers
If the packets to be sent to the guest are bigger than the available
buffer, we can split them across multiple buffers, fixing the
length in the packet header.
This is safe since virtio-vsock supports only stream sockets.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
9632e9f61b vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock in the TX path as well.

Also move buf_alloc under the same lock.
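
Roughly, the TX-side counter update then looks like this (approximate sketch
of virtio_transport_inc_tx_pkt(), field names as in the commit message):

    spin_lock_bh(&vvs->rx_lock);            /* same lock as the RX path */
    vvs->last_fwd_cnt = vvs->fwd_cnt;
    pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
    pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
    spin_unlock_bh(&vvs->rx_lock);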

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
b89d882dc9 vsock/virtio: reduce credit update messages
In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
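
In sketch form (the helper name here is hypothetical), the receive path only
notifies the sender when the advertised space is about to run out:

    free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
    if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
            send_credit_update(vsk);        /* hypothetical helper */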

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stefano Garzarella
473c7391ce vsock/virtio: limit the memory used per-socket
Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per-socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 of 4 KB
buffers, using up to 1 GB of memory per-socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.

This patch mitigates this issue by copying the payload of small
packets (< 128 bytes) into the buffer of the last packet queued, in
order to avoid wasting memory.
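
A sketch of the mitigation (names are illustrative; the 128-byte threshold is
from the commit message):

    /* small payload: append to the last queued packet's buffer
     * instead of holding a whole 4 KB buffer for it
     */
    if (pkt->len < 128 && last_pkt &&
        last_pkt->buf_len - last_pkt->len >= pkt->len) {
            memcpy(last_pkt->buf + last_pkt->len, pkt->buf, pkt->len);
            last_pkt->len += pkt->len;
            free_pkt(pkt);                  /* hypothetical helper */
    } else {
            list_add_tail(&pkt->list, &rx_queue);
    }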

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 15:00:00 -07:00
Stephen Boyd
d1a55841ab net: Remove dev_err() usage after platform_get_irq()
We don't need dev_err() messages when platform_get_irq() fails now that
platform_get_irq() prints an error message itself when something goes
wrong. Let's remove these prints with a simple semantic patch.

// <smpl>
@@
expression ret;
struct platform_device *E;
@@

ret =
(
platform_get_irq(E, ...)
|
platform_get_irq_byname(E, ...)
);

if ( \( ret < 0 \| ret <= 0 \) )
{
(
-if (ret != -EPROBE_DEFER)
-{ ...
-dev_err(...);
-... }
|
...
-dev_err(...);
)
...
}
// </smpl>

While we're here, remove braces on if statements that only have one
statement (manually).
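
A typical hunk then reduces to the following (illustrative driver code):

    /* before */
    irq = platform_get_irq(pdev, 0);
    if (irq < 0) {
            dev_err(&pdev->dev, "failed to get IRQ\n");
            return irq;
    }

    /* after: platform_get_irq() already logs the failure */
    irq = platform_get_irq(pdev, 0);
    if (irq < 0)
            return irq;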

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:37:35 -07:00
David S. Miller
2d73a6c38d Merge branch 'Finish-conversion-of-skb_frag_t-to-bio_vec'
Jonathan Lemon says:

====================
Finish conversion of skb_frag_t to bio_vec

The recent conversion of skb_frag_t to bio_vec did not include
skb_frag's page_offset.  Add accessor functions for this field,
utilize them, and remove the union, restoring the original structure.

v2:
  - rename accessors
  - follow kdoc conventions
====================

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:21:32 -07:00
Jonathan Lemon
65c84f148e linux: Remove bvec page_offset, use bv_offset
Now that page_offset is referenced through accessors, remove
the union, and use bv_offset.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:21:32 -07:00
Jonathan Lemon
b54c9d5bd6 net: Use skb_frag_off accessors
Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:21:32 -07:00
Jonathan Lemon
7240b60c98 linux: Add skb_frag_t page_offset accessors
Add skb_frag_off(), skb_frag_off_add(), skb_frag_off_set(),
and skb_frag_off_copy() accessors for page_offset.
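
Typical usage of the new accessors (the surrounding variables are
illustrative):

    /* before: direct field access */
    offset = frag->page_offset;
    frag->page_offset += copied;

    /* after: go through the accessors */
    offset = skb_frag_off(frag);
    skb_frag_off_add(frag, copied);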

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:21:31 -07:00
David S. Miller
6ca04afbf9 Merge branch 'sctp-clean-up-sctp_connect-function'
Xin Long says:

====================
sctp: clean up __sctp_connect function

This patchset factors out some common code from
sctp_sendmsg_new_asoc() and __sctp_connect() into 2
new functions.

v1->v2:
  - add the patch 1/5 to avoid a slab-out-of-bounds warning.
  - add some code comment for the check change in patch 2/5.
  - remove unused 'addrcnt' as Marcelo noticed in patch 3/5.
====================

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
a64e59c72c sctp: factor out sctp_connect_add_peer
This function, factored out from sctp_sendmsg_new_asoc() and
__sctp_connect(), adds a peer with another addr into the
asoc after the asoc has been created with the 1st addr.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
f26f995122 sctp: factor out sctp_connect_new_asoc
This function, factored out from sctp_sendmsg_new_asoc() and
__sctp_connect(), creates the asoc and adds a peer with the
1st addr.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
dd8378b3af sctp: clean up __sctp_connect
__sctp_connect() does quite similar things to sctp_sendmsg_new_asoc().
To factor out common functions, this patch cleans up their code
to make them look more similar:

  1. create the asoc and add a peer with the 1st addr.
  2. add peers with the other addrs into this asoc one by one.

While at it, also remove the unused 'addrcnt'.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
f40f1177c3 sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectx
Now __sctp_connect() is called by __sctp_setsockopt_connectx() and
sctp_inet_connect(); the latter has already done the addr_size check
against the size of sa_family_t.

In the next patch, which cleans up __sctp_connect(), we will remove
the addr_size check against the size of sa_family_t from __sctp_connect()
for the 1st address.

So before doing that, __sctp_setsockopt_connectx() should do
this check first, as sctp_inet_connect() does.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
Xin Long
4c31bc6b1e sctp: only copy the available addr data in sctp_transport_init
'addr' passed to sctp_transport_init() is not always a full union
sctp_addr in size, as on the path:

  sctp_sendmsg() ->
  sctp_sendmsg_new_asoc() ->
  sctp_assoc_add_peer() ->
  sctp_transport_new() -> sctp_transport_init()

In the next patches, we will also pass only the length of the address
data to sctp_assoc_add_peer().

So sctp_transport_init() should copy only the available data from
addr to peer->ipaddr, instead of 'peer->ipaddr = *addr', which may
cause a slab-out-of-bounds.
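
A sketch of the safer copy (the 'addr_len' parameter name is assumed here):

    /* copy only the bytes actually provided for this address family */
    memset(&peer->ipaddr, 0, sizeof(peer->ipaddr));
    memcpy(&peer->ipaddr, addr, addr_len);
    /* instead of: peer->ipaddr = *addr;  which reads a full union sctp_addr */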

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 14:18:14 -07:00
David Howells
1db88c5343 rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto
rxkad sometimes triggers a warning about oversized stack frames when
building with clang for a 32-bit architecture:

net/rxrpc/rxkad.c:243:12: error: stack frame size of 1088 bytes in function 'rxkad_secure_packet' [-Werror,-Wframe-larger-than=]
net/rxrpc/rxkad.c:501:12: error: stack frame size of 1088 bytes in function 'rxkad_verify_packet' [-Werror,-Wframe-larger-than=]

The problem is the combination of SYNC_SKCIPHER_REQUEST_ON_STACK() in
rxkad_verify_packet()/rxkad_secure_packet() with the relatively large
scatterlist in rxkad_verify_packet_1()/rxkad_secure_packet_encrypt().

The warning does not show up when using gcc, which does not inline the
functions as aggressively, but the problem is still the same.

Allocate the cipher buffers from the slab instead, caching the allocated
packet crypto request memory used for DATA packet crypto in the rxrpc_call
struct.
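
A hedged sketch of the approach; the real patch caches the request in struct
rxrpc_call, and the surrounding code here is simplified:

    /* before: SYNC_SKCIPHER_REQUEST_ON_STACK(req, ci);  (large stack frame) */

    /* after: allocate from the slab and reuse for DATA packet crypto */
    struct skcipher_request *req;

    req = skcipher_request_alloc(ci, GFP_NOFS);
    if (!req)
            return -ENOMEM;
    /* ... use req, keep it cached in the call ... */
    skcipher_request_free(req);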

Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30 10:32:35 -07:00
Vlad Buslov
b6fac0b46a net/mlx5e: Protect tc flow table with mutex
The TC flow table is created when the first flow is added, and destroyed
when the last flow is removed. This assumes that all accesses to the table
are externally synchronized by the rtnl lock. To remove the dependency on
the rtnl lock, add a new mutex mlx5e_tc_table->t_lock and use it to protect
the flow table.
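
Conceptually the table lifetime is now guarded like this (sketch; helper
names are hypothetical, the mutex is the t_lock mentioned above):

    mutex_lock(&tc->t_lock);
    if (!tc->t)
            tc->t = create_tc_flow_table(priv);     /* created with the first flow */
    /* ... add or delete the flow ... */
    if (flow_table_empty(tc->t)) {                  /* destroyed with the last flow */
            destroy_tc_flow_table(tc->t);
            tc->t = NULL;
    }
    mutex_unlock(&tc->t_lock);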

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-29 16:40:26 -07:00
Vlad Buslov
fa833bd52b net/mlx5e: Rely on rcu instead of rtnl lock when getting upper dev
Function netdev_master_upper_dev_get() generates a warning if the caller
doesn't hold the rtnl lock. Modify the rules update path to use the rcu
version of that function.
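
The rcu variant is called under the RCU read lock instead of asserting rtnl
(sketch):

    rcu_read_lock();
    upper = netdev_master_upper_dev_get_rcu(dev);
    if (upper)
            uplink_dev = upper;     /* only use it inside the read-side section */
    rcu_read_unlock();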

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-29 16:40:26 -07:00
Vlad Buslov
0e18134f4f net/mlx5e: Eswitch, use state_lock to synchronize vlan change
esw->state_lock is already used to protect vlan vport configuration changes.
However, all preparation and correctness checks, and the code that sets
vport data, are not protected by this lock and assume external
synchronization by the rtnl lock. In order to remove the dependency on the
rtnl lock, extend esw->state_lock protection to the whole eswitch vlan
add/del functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-29 16:40:26 -07:00
Vlad Buslov
525e84bea5 net/mlx5e: Eswitch, change offloads num_flows type to atomic64
Eswitch implements its own locking by means of the state_lock mutex and
multiple fine-grained locks in contained data structures, and is supposed
not to rely on the rtnl lock. However, the eswitch offloads num_flows is a
regular long long integer and cannot be modified concurrently. This is an
implicit assumption that mlx5 tc is serialized (by the rtnl lock or any
other means). In order to remove the implicit dependency on the rtnl lock,
change the num_flows type to atomic64 to allow concurrent modifications.
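
This is the usual atomic64 pattern (sketch; the mode-change check is just an
example of where the count might be read):

    atomic64_inc(&esw->offloads.num_flows);         /* flow added */
    atomic64_dec(&esw->offloads.num_flows);         /* flow removed */

    if (atomic64_read(&esw->offloads.num_flows) > 0)
            return -EOPNOTSUPP;                     /* e.g. refuse mode changes */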

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-29 16:40:25 -07:00
Vlad Buslov
ad86755b18 net/mlx5e: Protect unready flows with dedicated lock
In order to remove the dependency on the rtnl lock for protecting the
unready_flows list when reoffloading unready flows on the workqueue, extend
the representor uplink private structure with a dedicated
'unready_flows_lock' mutex. Take the lock in all users of the unready_flows
list before accessing it. Implement helper functions to add and delete an
unready flow.
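
The helpers follow the standard mutex-protected list pattern (sketch; struct
and field names approximate the ones described above):

    static void unready_flow_add(struct mlx5_rep_uplink_priv *uplink_priv,
                                 struct mlx5e_tc_flow *flow)
    {
            mutex_lock(&uplink_priv->unready_flows_lock);
            list_add_tail(&flow->unready, &uplink_priv->unready_flows);
            mutex_unlock(&uplink_priv->unready_flows_lock);
    }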

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-29 16:40:25 -07:00