This operation is required for handling ioctl commands like SIOCGMIIREG,
when debugging MDIO registers from userspace.
This commit adds support for this operation.
Signed-off-by: Romain Perier <romain.perier@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta says:
====================
Hisilicon Network Subsystem 3 Ethernet Driver
This patch-set contains the support of the HNS3 (Hisilicon Network Subsystem 3)
Ethernet driver for hip08 family of SoCs and future upcoming SoCs.
Hisilicon's new hip08 SoCs have integrated ethernet based on PCI Express and
hence there was a need of new driver over the previous HNS driver which is
already part of the Linux mainline. This new driver is NOT backward
compatible with HNS.
This current driver is meant to control the Physical Function and there would
soon be a support of a separate driver for Virtual Function once this base PF
driver has been accepted. Also, this driver is the ongoing development work and
HNS3 Ethernet driver would be incrementally enhanced with more new features.
High Level Architecture:
[ Ethtool ]
^ |
| |
[Ethernet Client] [ODP/UIO Client] . . . [ RoCE Client ]
| |
[ HNAE Device ] |
| |
--------------------------------------------- |
| |
[ HNAE3 Framework (Register/unregister) ] |
| |
--------------------------------------------- |
| |
[ HCLGE Layer] |
________________|_________________ |
| | | |
[ MDIO ] [ Scheduler/Shaper ] [ Debugfs* ] |
| | | |
|________________|_________________| |
| |
[ IMP command Interface ] |
--------------------------------------------- |
HIP08 H A R D W A R E *
Current patch-set broadly adds the support of the following PF functionality:
1. Basic Rx and Tx functionality
2. TSO support
3. Ethtool support
4. * Debugfs support -> this patch for now has been taken off.
5. HNAE framework and hardware compatability layer
6. Scheduler and Shaper support in transmit function
7. MDIO support
Change Log:
V5->V6: Addressed below comments:
* Andrew Lunn: Comments on MDIO and ethtool link mode
* Leon Romanvosky: Some comments on HNAE layer tidy-up
* Internal comments on redundant code removal, fixing error types etc.
V4->V5: Addressed below concerns:
* Florian Fanelli: Miscellaneous comments on ethtool & enet layer
* Stephen Hemminger: comment of Netdev stats in ethool layer
* Leon Romanvosky: Comments on Driver Version String, naming & Kconfig
* Rochard Cochran: Redundant function prototype
V3->V4: Addressed below comments:
* Andrew Lunn: Various comments on MDIO, ethtool, ENET driver etc,
* Stephen Hemminger: change access and updation to 64 but statistics
* Bo You: some spelling mistakes and checkpatch.pl errors.
V2->V3: Addressed comments
* Yuval Mintz: Removal of redundant userprio-to-tc code
* Stephen Hemminger: Ethtool & interuupt enable
* Andrew Lunn: On C45/C22 PHy support, HNAE, ethtool
* Florian Fainelli: C45/C22 and phy_connect/attach
* Intel kbuild errors
V1->V2: Addressed some comments by kbuild, Yuval MIntz, Andrew Lunn &
Florian Fainelli in the following patches:
* Add support of HNS3 Ethernet Driver for hip08 SoC
* Add MDIO support to HNS3 Ethernet driver for hip08 SoC
* Add support of debugfs interface to HNS3 driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the MAINTAINERS file with HNS3 Ethernet driver
maintainers names and other details. This also introduces the new
Makefiles required to build the HNS3 Ethernet driver and updates
the existing Kconfig file in the hisilicon folder.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support of the Ethtool interface to
the HNS3 Ethernet driver. Various commands to read the
statistics, configure the offloading, loopback selftest etc.
are supported.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support of MDIO bus interface for HNS3 driver.
Code provides various interfaces to start and stop the PHY layer
and to read and write the MDIO bus or PHY.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
THis patch adds the support of the Scheduling and Shaping
functionalities during the transmit leg. This also adds the
support of Pause at MAC level. (Pause at per-priority level
shall be added later along with the DCB feature).
Hardware as such consists of two types of cofiguration of 6 level
schedulers. Algorithms varies according to the level and type
of scheduler being used. Current patch is used to initialize
the mapping, algorithms(like SP, DWRR etc) and shaper(CIR, PIR etc)
being used.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support of Hisilicon Network Subsystem Accceleration
Engine and common operations to access it. This layer provides access to the
hardware configuration, hardware statistics. This layer is also
responsible for triggering the initialization of the PHY layer through
the below MDIO layer.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support of IMP (Integrated Management Processor)
command interface to the HNS3 driver.
Each PF/VF has support of CQP(Command Queue Pair) ring interface.
Each CQP consis of send queue CSQ and receive queue CRQ.
There are various commands a PF/VF may support, like for Flow Table
manipulation, Device management, Packet buffer allocation, Forwarding,
VLANs config, Tunneling/Overlays etc.
This patch contains code to initialize the command queue, manage the
command queue descriptors and Rx/Tx protocol with the command processor
in the form of various commands/results and acknowledgements.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support of the HNAE3 (Hisilicon Network
Acceleration Engine 3) framework support to the HNS3 driver.
Framework facilitates clients like ENET(HNS3 Ethernet Driver), RoCE
and user-space Ethernet drivers (like ODP etc.) to register with HNAE3
devices and their associated operations.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support of Hisilicon Network Subsystem 3
Ethernet driver to hip08 family of SoCs.
This driver includes basic Rx/Tx functionality. It also includes
the client registration code with the HNAE3(Hisilicon Network
Acceleration Engine 3) framework.
This work provides the initial support to the hip08 SoC and
would incrementally add features or enhancements.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: remove typedefs from structures part 4
As we know, typedef is suggested not to use in kernel, even checkpatch.pl
also gives warnings about it. Now sctp is using it for many structures.
All this kind of typedef's using should be removed. This patchset is the
part 4 to remove it for another 14 basic structures from linux/sctp.h.
After this patchset, all typedefs are cleaned in linux/sctp.h.
Just as the part 1-3, No any code's logic would be changed in these patches,
only cleaning up.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_auth_chunk_t, and
replace with struct sctp_auth_chunk in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_authhdr_t, and
replace with struct sctp_authhdr in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_addip_chunk_t, and
replace with struct sctp_addip_chunk in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_addiphdr_t, and
replace with struct sctp_addiphdr in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_addip_param_t, and
replace with struct sctp_addip_param in the places where it's
using this typedef.
It is to use sizeof(variable) instead of sizeof(type), and
also fix some indent problems.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove this typedef including the struct, there is even no places
using it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_cwrhdr_t, and
replace with struct sctp_cwrhdr in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_ecne_chunk_t, and
replace with struct sctp_ecne_chunk in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_ecnehdr_t, and
replace with struct sctp_ecnehdr in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_error_t, and replace
with enum sctp_error in the places where it's using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_operr_chunk_t, and
replace with struct sctp_operr_chunk in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_errhdr_t, and replace
with struct sctp_errhdr in the places where it's using this
typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to fix the name of struct sctp_shutdown_chunk_t
, replace with struct sctp_initack_chunk in the places where
it's using it.
It is also to fix some indent problem.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to remove the typedef sctp_shutdownhdr_t, and
replace with struct sctp_shutdownhdr in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Allen says:
====================
ibmvnic: Improve ethtool functionality
This patch series improves ibmvnic ethtool functionality by adding support
for ethtool -l and -g options, correcting existing statistics reporting,
and augmenting the existing statistics with counters for each tx and rx
queue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement .get_channels (ethtool -l) functionality
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement .get_ringparam (ethtool -g) functionality
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vnic server reports the statistics buffer in big endian format and must
be converted to cpu endian in order to be displayed correctly on little
endian lpars.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add counters to report number of packets, bytes, and dropped packets for
each transmit queue and number of packets, bytes, and interrupts for each
receive queue. Modify ethtool callbacks to report the new statistics.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 45f119bf93 ("tcp: remove header prediction") introduced a
minor bug: the sk_state_change() and sk_wake_async() notifications for
a completed active connection happen twice: once in this new spot
inside tcp_finish_connect() and once in the existing code in
tcp_rcv_synsent_state_process() immediately after it calls
tcp_finish_connect(). This commit remoes the duplicate POLL_OUT
notifications.
Fixes: 45f119bf93 ("tcp: remove header prediction")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's been a while now since ds->dst is not an array anymore, but a
simple pointer to a dsa_switch_tree.
Fortunately, SF2 does not support multi-chip and thus ds->index is
always 0.
This patch substitutes 'ds->dst[ds->index].' with 'ds->dst->'.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add const to bin_attribute structure as it is only passed to the
functions sysfs_{remove/create}_bin_file. The corresponding
arguments are of type const, so declare the structure to be const.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS over IB does not use multipath RDS, so the array
of additional rds_conn_path structures is always superfluous
in this case. Reduce the memory footprint of the rds module
by making this a dynamic allocation predicated on whether
the transport is mp_capable.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Tested-by: Efrain Galaviz <efrain.galaviz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's convenient to init ipip offload. We will check
the return value, and print KERN_CRIT info on failure.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Save the ifindex before it gets zeroed so the invalid
ifindex can be printed out.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- bump version strings, by Simon Wunderlich
- Remove unnecessary length qualifier, by Joe Perches
- Remove too short %pM field width, by Sven Eckelmann
- Remove return value handling from skb_put_data, by Sven Eckelmann
- Spelling fixes, by Colin Ian King
- Convert batman-adv.txt to reStructuredText, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlmB364WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoUL9EADc1EFFdHxbJdLbVL/FjkDkrwpL
60exCnDNLZ06g2+XhViA/Lo5VMPjJTYvNVciG4RbphtZPGwpnXLO1mM+1EntzHLl
Bkvd8pMS8SbhUIlJ4Ua8ret3GoIf2FLz3tEOr/LGO/aJ5iEOS1N9msI1nYK12E6V
4pgLJiHztoLkoEnsCaB30iF/jxlCXKE+AG2LjD65zUuG95DPR0XGGkgdP0qomlvh
kaSG7G4kVCQEsS2W+6TLqC+CKoO3uGGCF1wc4KDD3wgbUU0YCmqhzehrtstCku66
ksNavOy0e0iA3bURo6md/aWWUyOV6wK2uV6QE5ef1gStgXQKYwXa6MzaHVRu8XYZ
SrzfwKQRFDZXzouHTNJCNSeNhrPGKXPs6JSjeQDR1hzwZ0e+5xOct/mgp2VgHhqP
v4xs0ZFcjOWPZ52Yy0kY0r/f4AKTwS20DJLSQaKEST1E0m4rlzpNUxi+/4T1wUSD
LFtXjf9IonS1Weo6Ro5v6x2db4tXuX6pwmlCpfYcAdAK1FFyKIG7HHWw4UV7s85P
5nNbZP/v6K8DsGVVD3I/HEIeoZyi2DnPzYgFeV8pMJy6gYggnu/axdgmd7mgDL3J
aCEaL3rvSbkmnmq6QG/pC0VwXmTR9j945uEBjGICmPq1nzV2rvUt7KvEv1MWBH28
Qv5VAUe8XsIiwKMgFA==
=3oIU
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20170802' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Remove unnecessary length qualifier, by Joe Perches
- Remove too short %pM field width, by Sven Eckelmann
- Remove return value handling from skb_put_data, by Sven Eckelmann
- Spelling fixes, by Colin Ian King
- Convert batman-adv.txt to reStructuredText, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When deal with low and high throughput, it is hard to achiece both
high performance and low latency. In order to achiece that, this patch
calculates the rx rate, and adjust the interrupt coalesce parameter
accordingly.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Tested-by: Weiwei Deng <dengweiwei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
null_x25_address is only used to access the string it contains, so it can
be const.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
ipv4: fib: Provide per-nexthop offload indication
Ido says:
Offload indication for IPv4 routes is currently set in the FIB info's
flags. When multipath routes are employed, this can lead to a route being
marked as offloaded although only one of its nexthops is actually
offloaded.
Instead, this patchset aims to proivde a higher resolution for the offload
indication and report it on a per-nexthop basis.
Example output from patched iproute:
$ ip route show 192.168.200.0/24
192.168.200.0/24
nexthop via 192.168.100.2 dev enp3s0np7 weight 1 offload
nexthop via 192.168.101.3 dev enp3s0np8 weight 1
And once the second gateway is resolved:
$ ip route show 192.168.200.0/24
192.168.200.0/24
nexthop via 192.168.100.2 dev enp3s0np7 weight 1 offload
nexthop via 192.168.101.3 dev enp3s0np8 weight 1 offload
First patch teaches the kernel to look for the offload indication in the
nexthop flags. Patches 2-5 adjust current capable drivers to provide
offload indication on a per-nexthop basis. Last patch removes no longer
used functions to set offload indication in the FIB info's flags.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patches converted users of these functions to provide offload
indication using the nexthop's flags instead of the FIB info's.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we provide offload indication using the nexthop's flags we must
refresh the offload indication whenever the offload state within the
group changes.
This didn't matter until now, as offload indication was provided using
the FIB info flags and multipath routes were marked as offloaded as long
as one of the nexthops was offloaded.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patch removed the reliance on the counter in the FIB info to
set the offload indication, so we no longer need to keep an offload
state on each FIB entry and can just set or unset the RTNH_F_OFFLOAD
flag in each nexthop.
This is also necessary because we're going to need to refresh the
offload indication whenever the nexthop group associated with the FIB
entry is refreshed. Current check would prevent us from marking a newly
resolved nexthop as offloaded if the FIB entry is already marked as
offloaded.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a similar fashion to previous patch, use the nexthop flags to provide
offload indication instead of the FIB info's flags.
In case a nexthop in a multipath route can't be offloaded (gateway's MAC
can't be resolved, for example), then its offload flag isn't set.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to stop using the FIB info's flags to provide the offlaod
indication and instead do that on a per-nexthop basis.
Convert rocker to do just that. It only supports one nexthop per-route,
so conversion is simple.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're going to have capable drivers indicate route offload using the
nexthop flags, but for non-multipath routes these flags aren't dumped to
user space.
Instead, set the offload indication in the route message flags.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'trans->tid' is only assigned later in the function, resulting in a zero
transaction ID. Use 'tid' instead.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
netvsc: transparent VF support
This patch set changes how SR-IOV Virtual Function devices are managed
in the Hyper-V network driver. This version is rebased onto current net-next.
Background
In Hyper-V SR-IOV can be enabled (and disabled) by changing guest settings
on host. When SR-IOV is enabled a matching PCI device is hot plugged and
visible on guest. The VF device is an add-on to an existing netvsc
device, and has the same MAC address.
How is this different?
The original support of VF relied on using bonding driver in active
standby mode to handle the VF device.
With the new netvsc VF logic, the Linux hyper-V network
virtual driver will directly manage the link to SR-IOV VF device.
When VF device is detected (hot plug) it is automatically made a
slave device of the netvsc device. The VF device state reflects
the state of the netvsc device; i.e. if netvsc is set down, then
VF is set down. If netvsc is set up, then VF is brought up.
Packet flow is independent of VF status; all packets are sent and
received as if they were associated with the netvsc device. If VF is
removed or link is down then the synthetic VMBUS path is used.
What was wrong with using bonding script?
A lot of work went into getting the bonding script to work on all
distributions, but it was a major struggle. Linux network devices
can be configured many, many ways and there is no one solution from
userspace to make it all work. What is really hard is when
configuration is attached to synthetic device during boot (eth0) and
then the same addresses and firewall rules needs to also work later if
doing bonding. The new code gets around all of this.
How does VF work during initialization?
Since all packets are sent and received through the logical netvsc
device, initialization is much easier. Just configure the regular
netvsc Ethernet device; when/if SR-IOV is enabled it just
works. Provisioning and cloud init only need to worry about setting up
netvsc device (eth0). If SR-IOV is enabled (even as a later step), the
address and rules stay the same.
What devices show up?
Both netvsc and PCI devices are visible in the system. The netvsc
device is active and named in usual manner (eth0). The PCI device is
visible to Linux and gets renamed by udev to a persistent name
(enP2p3s0). The PCI device name is now irrelevant now.
The logic also sets the PCI VF device SLAVE flag on the network
device so network tools can see the relationship if they are smart
enough to understand how layered devices work.
This is a lot like how I see Windows working.
The VF device is visible in Device Manager, but is not configured.
Is there any performance impact?
There is no visible change in performance. The bonding
and netvsc driver both have equivalent steps.
Is it compatible with old bonding script?
It turns out that if you use the old bonding script, then everything
still works but in a sub-optimum manner. What happens is that bonding
is unable to steal the VF from the netvsc device so it creates a one
legged bond. Packet flow then is:
bond0 <--> eth0 <- -> VF (enP2p3s0).
In other words, if you get it wrong it still works, just
awkward and slower.
What if I add address or firewall rule onto the VF?
Same problems occur with now as already occur with bonding, bridging,
teaming on Linux if user incorrectly does configuration onto
an underlying slave device. It will sort of work, packets will come in
and out but the Linux kernel gets confused and things like ARP don’t
work right. There is no way to block manipulation of the slave
device, and I am sure someone will find some special use case where
they want it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
No longer needed, now all managed by transparent VF logic.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some background documentation on netvsc device options
and limitations.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>