Commit Graph

725080 Commits

Author SHA1 Message Date
Nathan Fontenot
09fb35ead5 ibmvnic: Don't handle RX interrupts when not up.
Initiating a kdump via the command line can cause a pending interrupt
to be handled by the ibmvnic driver when initializing the sub-CRQ
irqs during driver initialization.

NIP [d000000000ca34f0] ibmvnic_interrupt_rx+0x40/0xd0 [ibmvnic]
LR [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0
Call Trace:
[c000000047fcfde0] [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0
[c000000047fcfea0] [c00000000813317c] handle_irq_event_percpu+0x3c/0x90
[c000000047fcfee0] [c00000000813323c] handle_irq_event+0x6c/0xd0
[c000000047fcff10] [c0000000081385e0] handle_fasteoi_irq+0xf0/0x250
[c000000047fcff40] [c0000000081320a0] generic_handle_irq+0x50/0x80
[c000000047fcff60] [c000000008014984] __do_irq+0x84/0x1d0
[c000000047fcff90] [c000000008027564] call_do_irq+0x14/0x24
[c00000003c92af00] [c000000008014b70] do_IRQ+0xa0/0x120
[c00000003c92af50] [c000000008002594] hardware_interrupt_common+0x114/0x180

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-11 11:32:25 -05:00
Ganesh Goudar
4621ffd604 cxgb4: implement ndo_features_check
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-11 10:58:56 -05:00
Ganesh Goudar
d0a1299c6b cxgb4: add support for vxlan segmentation offload
add changes to t4_eth_xmit to enable vxlan segmentation
offload support.

Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-11 10:58:56 -05:00
Ganesh Goudar
846eac3fcc cxgb4: implement udp tunnel callbacks
Implement ndo_udp_tunnel_add and ndo_udp_tunnel_del
to support vxlan tunnelling.

Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-11 10:58:55 -05:00
Ganesh Goudar
ef0fd85aed cxgb4: add data structures to support vxlan
Add data structures and macros to be used in vxlan
offload.

Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-11 10:58:55 -05:00
David S. Miller
c5e62a2427 Merge branch 'sfc-support-25G-configuration-with-ethtool'
Edward Cree says:

====================
sfc: support 25G configuration with ethtool

Adds support for advertise bits beyond the 32-bit legacy masks, and plumbs in
 translation of the new 25/50/100G bits to/from MCDI.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:23:39 -05:00
Edward Cree
5abb5e7f91 sfc: add bits for 25/50/100G supported/advertised speeds
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:23:38 -05:00
Edward Cree
c2ab85d2da sfc: support the ethtool ksettings API properly so that 25/50/100G works
Store and handle ethtool link mode masks within the driver instead of
 just a single u32.  However, quite a significant amount of existing code
 wants to manipulate the masks directly, and thus now uses the first
 unsigned long (i.e. mask[0]) as though it were a legacy u32 mask.  This
 is ok because all the bits that code is interested in are in the first
 32 bits of the mask; but it might be a good idea to change them in
 future to use the proper bitmap API.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:23:38 -05:00
Edward Cree
702b3d5136 sfc: basic MCDI mapping of 25/50/100G link speeds
Only handles direct speed setting, not autoneg, because the driver is
 still trying to pretend it uses the legacy ethtool API which doesn't
 have advertised/supported bits for 25/50/100G.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:23:38 -05:00
David S. Miller
c92342b0bf Merge branch 'mlxsw-qdisc-refactoring'
Jiri Pirko says:

====================
mlxsw qdisc refactoring

This patchset refactors the qdisc handling in mlxsw driver in order to make
it more object oriented like.
It helps readability, laying the groundwork for the offloading of
additional qdiscs by the driver
This patchset also makes the qdiscs statistics more generic.

Patch 1 moves the qdiscs declaration to the spectrum_qdisc.c
Patches 2-3 clean the offloaded stats requests. Patch 2 changes the RED
generic stats struct to be sharable by other offloaded qdiscs. Patch 3
changes the xstats request to be like the stats. Note that these patches
are outside the driver scope.
Patches 4-5 clean the statistics related functions and structs within the
driver.
Patches 6-7 decrease the need for the same parameters to be sent to many
functions.
Patches 8-11 create a functions pointers struct, to make the qdiscs
handling more object oriented like.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:42 -05:00
Nogah Frankel
56202ca4ed mlxsw: spectrum: qdiscs: Remove qdisc before setting a new one
If a qdisc is being replaced by another qdisc of the same type, it can
simply override over its configuration.
However, if it replaces a qdisc of another type, it needs to be removed
before setting the new qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
9cf6c9c758 mlxsw: spectrum: qdiscs: Create a generic replace function
Create a generic qdisc replace function.
For that goal, add three functions to the qdisc ops struct:
* check_params: Checks if the given parameters are offloadable.
* replace: Offload the given parameters.
* clean_stats: clean the qdisc stats for the offloaded qdisc.
integrate RED offloading into using the new internal replace API.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
9a37a59f71 mlxsw: spectrum: qdiscs: Create a generic destroy function
Add a destroy function to the qdiscs ops struct.
Create a generic qdisc destroy function, that clears the qdisc metadata as
well as calling the specific qdisc destroy function.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
562ffbc4b3 mlxsw: spectrum: qdiscs: Add an ops struct
Qdisc struct have the Qdisc_class_ops struct.
This patch introduces the similar ops struct for the mlxsw_sp_qdisc_ops
struct. It allows better readability as well as code reusability for the
common parts of some functions like destroy.
The first operations to be added are the statistics getters.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
cba7158ff1 mlxsw: spectrum: qdiscs: Unite all handle checks
Every qdisc op gets the qdisc handle ID as well as its location.  Each one
of them, beside replace, checks if the handle doesn't match the qdisc in
the given location, and if so, it returns without running the actual op.
Unite these checks to one comparison function and avoid sending the handle
id to these ops.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
d56c89550b mlxsw: spectrum: qdiscs: Add tclass number to the mlxsw_sp_qdisc
Tclass number is needed for most of the operations related to the qdisc in
the driver. Create a field for it in the mlxsw_sp_qdisc instead of passing
it to every function as parameter.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
c2ed6db765 mlxsw: spectrum: qdiscs: Make the clean stats function to be for RED only
Improve readability by changing the clean stats function to handle only
RED. Qdiscs that will be offloaded in the future will have a clean stats
function of their own.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
4d1a4b8473 mlxsw: spectrum: qdiscs: Clean qdisc statistics structs
Clean RED offloaded stats and make them more generic by breaking the
generic qdisc stats to a struct of their own.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
f8253df553 net: sch: red: Change offloaded xstats to be incremental
Change the value of the xstats requested from the driver for offloaded RED
to be incremental, like the normal stats.
It increases consistency - if a qdisc stops being offloaded its xstats
don't change.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
f34b4aac46 net: sch: red: Change the name of the stats struct to be generic
Change the name of the stats struct to be generic, so it could be used for
other qdisc offload, that will be added in the next patches.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
371b437a32 mlxsw: spectrum: qdiscs: Move qdisc's declarations to its designated file
Move all the qdisc related data from the spectrum.h to spectrum_qdisc.c.
Create an init and fini functions for the qdiscs.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Ido Schimmel
d016e13d80 mlxsw: spectrum: Fix typo in firmware upgrade message
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:05:00 -05:00
Wei Yongjun
809a79e913 tcp: make local function tcp_recv_timestamp static
Fixes the following sparse warning:

net/ipv4/tcp.c:1736:6: warning:
 symbol 'tcp_recv_timestamp' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:55:35 -05:00
Wei Yongjun
e213f5b6dd net/mlx5e: fix error return code in mlx5e_alloc_rq()
Fix to return a negative error code from the xdp_rxq_info_reg() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 0ddf543226 ("xdp/mlx5: setup xdp_rxq_info")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:54:36 -05:00
Arjun Vynipadath
ea0a42109a cxgb4vf: Fix SGE FL buffer initialization logic for 64K pages
We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and
the extant logic would flag that as an error. This was already fixed in
cxgb4 driver with "92ddcc7 cxgb4: Fix some small bugs in
t4_sge_init_soft() when our Page Size is 64KB".

Original Work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:53:58 -05:00
Stephen Rothwell
1125b00871 tuntap: fix for "tuntap: XDP transmission"
Fixes: fc72d1d54d ("tuntap: XDP transmission")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:52:49 -05:00
Jesper Dangaard Brouer
fd3ba21478 net: fix xdp_rxq_info build issue when CONFIG_SYSFS is not set
The commit e817f85652 ("xdp: generic XDP handling of xdp_rxq_info")
removed some ifdef CONFIG_SYSFS in net/core/dev.c, but forgot to
remove the corresponding ifdef's in include/linux/netdevice.h.

Fixes: e817f85652 ("xdp: generic XDP handling of xdp_rxq_info")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:37:00 -05:00
Andrew Lunn
fee2d54641 net: phy: marvell: mv88e6390 temperature sensor reading
The internal PHYs in the mv88e6390 switch have a temperature sensor.
It uses a different register layout to other PHY currently supported.
It also has an errata, in that some reads of the sensor result in bad
values. So a number of reads need to be made, and the average taken.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:35:05 -05:00
David S. Miller
0c959a2e86 Merge branch 'net-create-dynamic-software-irq-moderation-library'
Andy Gospodarek says:

====================
net: create dynamic software irq moderation library

This converts the dynamic interrupt moderation library from the mlx5e
driver into a library so it can be used by any driver.  The penultimate
patch in this set adds support for this new dynamic interrupt moderation
library in the bnxt_en driver and the last patch creates an entry in the
MAINTAINERS file for this library.

The main purpose of this code is to allow an administrator to make sure
that default coalesce settings are optimized for low latency, but
quickly adapt to handle high throughput/bulk traffic by altering how
much time passes before popping an interrupt.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch and Tal Gilboa and Or Gerlitz for
their comments, etc on this set.

v4: Fix build breakage for VF representers noticed by kbuild test robot.
Thanks for being so courteous, kbuild test robot!

v3: bnxt_en fix from Michael Chan, comment suggestion from Vasundhara
Volam, and small mlx5e header file fix from Tal Gilboa.

v2: Spelling fixes from Stephen Hemminger, bnxt_en suggestions from
Michael Chan, spelling and formatting fixes from Or Gerlitz, and
spelling and mlx5e changes suggested by Tal Gilboa.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:28:15 -05:00
Andy Gospodarek
f4e5f0ea7c MAINTAINERS: add entry for Dynamic Interrupt Moderation
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:45 -05:00
Andy Gospodarek
6a8788f256 bnxt_en: add support for software dynamic interrupt moderation
This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:45 -05:00
Andy Gospodarek
8115b750db net/dim: use struct net_dim_sample as arg to net_dim
Simplify the arguments net_dim() by formatting them into a struct
net_dim_sample before calling the function.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Suggested-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:45 -05:00
Andy Gospodarek
4c4dbb4a73 net/mlx5e: Move dynamic interrupt coalescing code to include/linux
This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupts events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:44 -05:00
Andy Gospodarek
9a31742531 net/mlx5e: Change Mellanox references in DIM code
Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation.  Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:44 -05:00
Andy Gospodarek
b9c872f231 net/mlx5e: Move generic functions to new file
These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
f5e7f67d9b net/mlx5e: Move AM logic enums
More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
138968e997 net/mlx5e: Remove rq references in mlx5e_rx_am
This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
f58ee099f3 net/mlx5e: Move interrupt moderation forward declarations
Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
98dd1edffc net/mlx5e: Move interrupt moderation structs to new file
Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
David S. Miller
8448f91fcd Merge branch 'ipv6-Add-support-for-non-equal-cost-multipath'
Ido Schimmel says:

====================
ipv6: Add support for non-equal-cost multipath

This set aims to add support for IPv6 non-equal-cost multipath routes.
The first three patches convert multipath selection to use the
hash-threshold method (RFC 2992) instead of modulo-N. The same method is
employed by the IPv4 routing code since commit 0e884c78ee ("ipv4: L3
hash-based multipath").

Unlike modulo-N, with hash-threshold only the flows near the region
boundaries are affected when a nexthop is added or removed. In addition,
it allows us to easily add support for non-equal-cost multipath in the
last patch by sizing the different regions according to the provided
weights.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:14:45 -05:00
Ido Schimmel
398958ae48 ipv6: Add support for non-equal-cost multipath
The use of hash-threshold instead of modulo-N makes it trivial to add
support for non-equal-cost multipath.

Instead of dividing the multipath hash function's output space equally
between the nexthops, each nexthop is assigned a region size which is
proportional to its weight.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:14:44 -05:00
Ido Schimmel
3d709f69a3 ipv6: Use hash-threshold instead of modulo-N
Now that each nexthop stores its region boundary in the multipath hash
function's output space, we can use hash-threshold instead of modulo-N
in multipath selection.

This reduces the number of checks we need to perform during lookup, as
dead and linkdown nexthops are assigned a negative region boundary. In
addition, in contrast to modulo-N, only flows near region boundaries are
affected when a nexthop is added or removed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:14:44 -05:00
Ido Schimmel
7696c06a18 ipv6: Use a 31-bit multipath hash
The hash thresholds assigned to IPv6 nexthops are in the range of
[-1, 2^31 - 1], where a negative value is assigned to nexthops that
should not be considered during multipath selection.

Therefore, in a similar fashion to IPv4, we need to use the upper
31-bits of the multipath hash for multipath selection.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:14:44 -05:00
Ido Schimmel
d7dedee184 ipv6: Calculate hash thresholds for IPv6 nexthops
Before we convert IPv6 to use hash-threshold instead of modulo-N, we
first need each nexthop to store its region boundary in the hash
function's output space.

The boundary is calculated by dividing the output space equally between
the different active nexthops. That is, nexthops that are not dead or
linkdown.

The boundaries are rebalanced whenever a nexthop is added or removed to
a multipath route and whenever a nexthop becomes active or inactive.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:14:44 -05:00
Jason Wang
e2b3b35eb9 vhost_net: batch used ring update in rx
This patch tries to batched used ring update during RX. This is pretty
fit for the case when guest is much faster (e.g dpdk based
backend). In this case, used ring is almost empty:

- we may get serious cache line misses/contending on both used ring
  and used idx.
- at most 1 packet could be dequeued at one time, batching in guest
  does not make much effect.

Update used ring in a batch can help since guest won't access the used
ring until used idx was advanced for several descriptors and since we
advance used ring for every N packets, guest will only need to access
used idx for every N packet since it can cache the used idx. To have a
better interaction for both batch dequeuing and dpdk batching,
VHOST_RX_BATCH was used as the maximum number of descriptors that
could be batched.

Test were done between two machines with 2.40GHz Intel(R) Xeon(R) CPU
E5-2630 connected back to back through ixgbe. Traffic were generated
on one remote ixgbe through MoonGen and measure the RX pps through
testpmd in guest when do xdp_redirect_map from local ixgbe to
tap. RX pps were increased from 3.05 Mpps to 4.00 Mpps (about 31%
improvement).

One possible concern for this is the implications for TCP (especially
latency sensitive workload). Result[1] does not show obvious changes
for most of the netperf test (RR, TX, and RX). And we do get some
improvements for RX on some specific size.

Guest RX:

size/sessions/+thu%/+normalize%
   64/     1/   +2%/   +2%
   64/     2/   +2%/   -1%
   64/     4/   +1%/   +1%
   64/     8/    0%/    0%
  256/     1/   +6%/   -3%
  256/     2/   -3%/   +2%
  256/     4/  +11%/  +11%
  256/     8/    0%/    0%
  512/     1/   +4%/    0%
  512/     2/   +2%/   +2%
  512/     4/    0%/   -1%
  512/     8/   -8%/   -8%
 1024/     1/   -7%/  -17%
 1024/     2/   -8%/   -7%
 1024/     4/   +1%/    0%
 1024/     8/    0%/    0%
 2048/     1/  +30%/  +14%
 2048/     2/  +46%/  +40%
 2048/     4/    0%/    0%
 2048/     8/    0%/    0%
 4096/     1/  +23%/  +22%
 4096/     2/  +26%/  +23%
 4096/     4/    0%/   +1%
 4096/     8/    0%/    0%
16384/     1/   -2%/   -3%
16384/     2/   +1%/   -4%
16384/     4/   -1%/   -3%
16384/     8/    0%/   -1%
65535/     1/  +15%/   +7%
65535/     2/   +4%/   +7%
65535/     4/    0%/   +1%
65535/     8/    0%/    0%

TCP_RR:

size/sessions/+thu%/+normalize%
    1/     1/    0%/   +1%
    1/    25/   +2%/   +1%
    1/    50/   +4%/   +1%
   64/     1/    0%/   -4%
   64/    25/   +2%/   +1%
   64/    50/    0%/   -1%
  256/     1/    0%/    0%
  256/    25/    0%/    0%
  256/    50/   +4%/   +2%

Guest TX:

size/sessions/+thu%/+normalize%
   64/     1/   +4%/   -2%
   64/     2/   -6%/   -5%
   64/     4/   +3%/   +6%
   64/     8/    0%/   +3%
  256/     1/  +15%/  +16%
  256/     2/  +11%/  +12%
  256/     4/   +1%/    0%
  256/     8/   +5%/   +5%
  512/     1/   -1%/   -6%
  512/     2/    0%/   -8%
  512/     4/   -2%/   +4%
  512/     8/   +6%/   +9%
 1024/     1/   +3%/   +1%
 1024/     2/   +3%/   +9%
 1024/     4/    0%/   +7%
 1024/     8/    0%/   +7%
 2048/     1/   +8%/   +2%
 2048/     2/   +3%/   -1%
 2048/     4/   -1%/  +11%
 2048/     8/   +3%/   +9%
 4096/     1/   +8%/   +8%
 4096/     2/    0%/   -7%
 4096/     4/   +4%/   +4%
 4096/     8/   +2%/   +5%
16384/     1/   -3%/   +1%
16384/     2/   -1%/  -12%
16384/     4/   -1%/   +5%
16384/     8/    0%/   +1%
65535/     1/    0%/   -3%
65535/     2/   +5%/  +16%
65535/     4/   +1%/   +2%
65535/     8/   +1%/   -1%

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:04:30 -05:00
David S. Miller
65d51f2682 mlx5-updates-2018-01-08
Four patches from Or that add Hairpin support to mlx5:
 ===========================================================
 From:  Or Gerlitz <ogerlitz@mellanox.com>
 
 We refer the ability of NIC HW to fwd packet received on one port to
 the other port (also from a port to itself) as hairpin. The application API
 is based
 on ingress tc/flower rules set on the NIC with the mirred redirect
 action. Other actions can apply to packets during the redirect.
 
 Hairpin allows to offload the data-path of various SW DDoS gateways,
 load-balancers, etc to HW. Packets go through all the required
 processing in HW (header re-write, encap/decap, push/pop vlan) and
 then forwarded, CPU stays at practically zero usage. HW Flow counters
 are used by the control plane for monitoring and accounting.
 
 Hairpin is implemented by pairing a receive queue (RQ) to send queue (SQ).
 All the flows that share <recv NIC, mirred NIC> are redirected through
 the same hairpin pair. Currently, only header-rewrite is supported as a
 packet modification action.
 
 I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing this
 functionality
 on HW simulator, before it was avail in the FW so the driver code could be
 tested early.
 ===========================================================
 
 From Feras three patches that provide very small changes that allow IPoIB
 to support RX timestamping for child interfaces, simply by hooking the mlx5e
 timestamping PTP ioctl to IPoIB child interface netdev profile.
 
 One patch from Gal to fix a spilling mistake.
 
 Two patches from Eugenia adds drop counters to VF statistics
 to be reported as part of VF statistics in netlink (iproute2) and
 implemented them in mlx5 eswitch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaVF5WAAoJEEg/ir3gV/o+fRkH/0PxjwJRA3REqhi/H8HOdH9f
 cBLrOzFdqTCYQWQFCLFbMQ/Zgoel3KglpJ0iQMjuVFfjMbybVXOe8FAEVdbWHnfL
 C+2HRMe8dplKrsq5UkxJhbyKhFKhl2XeMFYWonw9dSM7Nz5DyowQ1y1r5SgMlMAv
 t3mYAIa4kZHK18BjDoIsCoAXXwsHiztR2irMp5+DwataTGP7vC7AsrucDxLA/qFf
 I3E15DZk9s1f53PUuY7CYnUnJfMMP3VJdxpyx4k6xt9J2IMuilF4YyD6wpAKsVQU
 /LzRkWI9x/6QindffqlrACeeidimOeY4pC4txIhS5uXgFXulugDHq1/Ih1sgZS8=
 =g5vr
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-updates-2018-01-08

Four patches from Or that add Hairpin support to mlx5:
===========================================================
From:  Or Gerlitz <ogerlitz@mellanox.com>

We refer the ability of NIC HW to fwd packet received on one port to
the other port (also from a port to itself) as hairpin. The application API
is based
on ingress tc/flower rules set on the NIC with the mirred redirect
action. Other actions can apply to packets during the redirect.

Hairpin allows to offload the data-path of various SW DDoS gateways,
load-balancers, etc to HW. Packets go through all the required
processing in HW (header re-write, encap/decap, push/pop vlan) and
then forwarded, CPU stays at practically zero usage. HW Flow counters
are used by the control plane for monitoring and accounting.

Hairpin is implemented by pairing a receive queue (RQ) to send queue (SQ).
All the flows that share <recv NIC, mirred NIC> are redirected through
the same hairpin pair. Currently, only header-rewrite is supported as a
packet modification action.

I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing this
functionality
on HW simulator, before it was avail in the FW so the driver code could be
tested early.
===========================================================

From Feras three patches that provide very small changes that allow IPoIB
to support RX timestamping for child interfaces, simply by hooking the mlx5e
timestamping PTP ioctl to IPoIB child interface netdev profile.

One patch from Gal to fix a spilling mistake.

Two patches from Eugenia adds drop counters to VF statistics
to be reported as part of VF statistics in netlink (iproute2) and
implemented them in mlx5 eswitch.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:57:19 -05:00
David S. Miller
45f8982253 Merge branch 'hns3-next'
Peng Li says:

====================
code improvements in HNS3 driver

This patchset fixes 2 comments for community review.
[patch 1/2] reverts "net: hns3: Add packet statistics of netdev"
reported by Jakub Kicinski and David Miller.
[patch 2/2] reports the function type the same line with
hns3_nic_get_stats64, reported by Andrew Lunn.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:55:52 -05:00
Peng Li
6c88d9d7ee net: hns3: report the function type the same line with hns3_nic_get_stats64
The function type should be on the same line with the function
name, or it may cause display error if a patch edit the
function. There is am example following:
https://www.spinics.net/lists/netdev/msg476141.html

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:55:51 -05:00
Peng Li
bf909456f6 Revert "net: hns3: Add packet statistics of netdev"
This reverts commit 8491000754.

It is duplicate to add statistics of netdev for ethtool -S.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:55:51 -05:00
David S. Miller
68d5c2653b Merge branch 'Socionext-Synquacer-NETSEC-driver'
Jassi Brar says:

====================
Socionext Synquacer NETSEC driver

Changes since v5
	# Removed helper macros
	# Removed 'inline' qualifier
	# Changed multiline empty comment to single line
	# Added 'clock-names' property in DT binding example
	# Ignore 'clock-names' property in driver until f/ws in the wild are
	  upgraded or we support instance that take in more than one clock.
	# Rebased the patchset onto net-next

Changes since v4
        # Fixed ucode indexing as a word, instead of byte
        # Removed redundant clocks, keep only phy rate reference clock
          and expect it to be 'phy_ref_clk'

Changes since v3
        # Discard 'socionext,snq-mdio', and simply use 'mdio' subnode.
        # Use ioremap on ucode region as well, instead of memremap.

Changes since v2
        # Use 'mdio' subnode in DT bindings.
        # Use phy_interface_mode_is_rgmii(), instead of open coding the check.
        # Use readl/b with eeprom_base pointer.
        # Unregister mdio bus upon failure in probe.

Changes since v1
        # Switched from using memremap to ioremap
        # Implemented ndo_do_ioctl callback
        # Defined optional 'dma-coherent' DT property
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:50:30 -05:00