The current algorithm for deciding on the number of VRSS channels is
not optimal since we open up the min of number of CPUs online and the
number of VRSS channels the host is offering. So on a 32 VCPU guest
we could potentially open 32 VRSS subchannels. Experimentation has
shown that it is best to limit the number of VRSS channels to the number
of CPUs within a NUMA node.
Here is the new algorithm for deciding on the number of sub-channels we
would open up:
1) Pick the minimum of what the host is offering and what the driver
in the guest is specifying as the default value.
2) Pick the minimum of (1) and the numbers of CPUs in the NUMA
node the primary channel is bound to.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the struct netvsc_stats has a member s_sync
of type u64_stats_sync.
This definition will break kernel build as the macro
netdev_alloc_pcpu_stats requires this member name to be syncp.
(see netdev_alloc_pcpu_stats definition in ./include/linux/netdevice.h)
This patch changes netvsc_stats's member name from s_sync to syncp to fix
the build break.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code does not lock anything when calculating the TX and RX stats.
As a result, the RX and TX data reported by ifconfig are not accuracy in a
system with high network throughput and multiple CPUs (in my test,
RX/TX = 83% between 2 HyperV VM nodes which have 8 vCPUs and 40G Ethernet).
This patch fixed the above issue by using per_cpu stats.
netvsc_get_stats64() summarizes TX and RX data by iterating over all CPUs
to get their respective stats.
This v2 patch addressed David's comments on the cleanup path when
netdev_alloc_pcpu_stats() failed.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Four minor merge conflicts:
1) qca_spi.c renamed the local variable used for the SPI device
from spi_device to spi, meanwhile the spi_set_drvdata() call
got moved further up in the probe function.
2) Two changes were both adding new members to codel params
structure, and thus we had overlapping changes to the
initializer function.
3) 'net' was making a fix to sk_release_kernel() which is
completely removed in 'net-next'.
4) In net_namespace.c, the rtnl_net_fill() call for GET operations
had the command value fixed, meanwhile 'net-next' adjusted the
argument signature a bit.
This also matches example merge resolutions posted by Stephen
Rothwell over the past two days.
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on the information given to this driver (via the xmit_more skb flag),
we can defer signaling the host if more packets are on the way. This will help
make the host more efficient since it can potentially process a larger batch of
packets. Implement this optimization.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit b56fc3c536 ("hv_netvsc: Fix a bug in netvsc_start_xmit()"),
skb variable is no longer used in netvsc_send. Remove variable and dead
code that depended on it.
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b08cc79155 eliminated memory
allocation in the packet send path:
"hv_netvsc: Eliminate memory allocation in the packet send path
The network protocol used to communicate with the host is the remote ndis (rndis)
protocol. We need to decorate each outgoing packet with a rndis header and
additional rndis state (rndis per-packet state). To manage this state, we
currently allocate memory in the transmit path. Eliminate this allocation by
requesting additional head room in the skb."
This commit introduced a bug since it did not account for the case if the skb
was cloned. Fix this bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Introduce netif-msg to netvsc to control debug logging output
and keep msg_enable in netvsc_device_context so that it is
kept persistently.
2. Only call dump_rndis_message() when NETIF_MSG_RX_ERR or above
is specified in netvsc module debug param.
In non-debug mode, in current code, dump_rndis_message() will not
dump anything but it still initialize some local variables and
process the switch logic which is unnecessary, especially in
high network throughput situation.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If remaining space in a send buffer slot is too small for the whole message,
we only copy the RNDIS header and PPI data into send buffer, so we can batch
one more packet each time. It reduces the vmbus per-message overhead.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In netvsc_start_xmit() we can handle packets which are scattered around not
more than MAX_PAGE_BUFFER_COUNT-2 pages. It is, however, easy to create a
packet which is not big in size but occupies more pages (e.g. if it uses frags
on compound pages boundaries). When we drop such packet it cases sender to try
resending it but in most cases it will try resending the same packet which will
also get dropped, this will cause the particular connection to stick. To solve
the issue we can try linearizing skb.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
... which validly uses dev_kfree_skb_any() instead of dev_kfree_skb().
Setting ret to -EFAULT and -ENOMEM have no real meaning here (we need to set
it to anything but -EAGAIN) as we drop the packet and return NETDEV_TX_OK
anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the two places changed, we now use netvsc_xmit_completion() which properly
frees hv_netvsc_packet in or not in skb headroom.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sum of RNDIS msg and PPI struct sizes is used in multiple places, so we define
a macro for them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network protocol used to communicate with the host is the remote ndis (rndis)
protocol. We need to decorate each outgoing packet with a rndis header and
additional rndis state (rndis per-packet state). To manage this state, we
currently allocate memory in the transmit path. Eliminate this allocation by
requesting additional head room in the skb.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for embedding the rndis state and other packet state into
the skb, cleanup the test for freeing the skb.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vmbus_are_subchannels_present() also involves opening the channels, which
may be too early at this point. Checking for subchannels is not necessary here.
So this patch removes it. Subchannels will be opened when offer messages arrive.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch, we can send out multiple RNDIS data packets in one send buffer
slot and one VMBus message. It reduces the overhead associated with VMBus messages.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for reporting the actual and maximum combined channels
count of the hv_netvsc driver via 'ethtool --show-channels'.
This required adding 'max_chn' to 'struct netvsc_device', and assigning
it 'rsscap.num_recv_que' in 'rndis_filter_device_add'. Now we can access
the combined maximum channel count via 'struct netvsc_device' in the
ethtool callback.
Signed-off-by: Andrew Schwartzmeyer <andrew@schwartzmeyer.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch fixes following sparse warnings:
netvsc.c:688:5: warning: symbol 'netvsc_copy_to_send_buf' was not declared. Should it be static?
rndis_filter.c:627:5: warning: symbol 'rndis_filter_set_offload_params' was not declared. Should it be static?
rndis_filter.c:702:5: warning: symbol 'rndis_filter_set_rss_param' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/vxlan.c
drivers/vhost/net.c
include/linux/if_vlan.h
net/core/dev.c
The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.
In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.
In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.
In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code frees the skb in EAGAIN case, in which the skb will be
retried from upper layer and used again.
Also, the existing code doesn't free send buffer slot in error case, because
there is no completion message for unsent packets.
This patch fixes these problems.
(Please also include this patch for stable trees. Thanks!)
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
return type of wait_for_completion_timeout is unsigned long not int, this
patch just fixes up the declarations.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
The changed names are union fields with the same size, so the existing code
still works. But, we now update these variables to the correct names.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds proper handling of the vNIC hot removal event, which includes
a rescind-channel-offer message from the host side that triggers vNIC close and
removal. In this case, the notices to the host during close and removal is not
necessary because the channel is rescinded. This patch blocks these unnecessary
messages, and lets vNIC removal process complete normally.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vfree() function performs also input parameter validation.
Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the host uses packet encapsulation feature, the MTU may be reduced by the
host due to headroom reservation for encapsulation. This patch handles this
new MTU value.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will allow the workload spreading via vRSS for IPv6.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
total_data_buflen is used by netvsc_send() to decide if a packet can be put
into send buffer. It should also include the size of RNDIS message before the
Ethernet frame. Otherwise, a messge with total size bigger than send_section_size
may be copied into the send buffer, and cause data corruption.
[Request to include this patch to the Stable branches]
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case that the IP header has optional field at the end, this patch will
get the port numbers after that field, and compute the hash. The general
parser skb_flow_dissect() is used here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the packet is successfully sent, we should not touch the packet
as it may have been freed. This patch is based on the work done by
Long Li <longli@microsoft.com>.
David, please queue this up for stable.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/usb/r8152.c
net/netfilter/nfnetlink.c
Both r8152 and nfnetlink conflicts were simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
After the packet is successfully sent, we should not touch the skb
as it may have been freed. This patch is based on the work done by
Long Li <longli@microsoft.com>.
In this version of the patch I have fixed issues pointed out by David.
David, please queue this up for stable.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Long Li <longli@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We try to call free_netvsc_device(net_device) when "net_device" is NULL.
It leads to an Oops.
Fixes: f90251c8a6 ('hyperv: Increase the buffer length for netvsc_channel_cb()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the buffer is too small for a packet from VMBus, a bigger buffer will be
allocated in netvsc_channel_cb() and retry reading the packet from VMBus.
Increasing this buffer size will reduce the retry overhead.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WS2008R2 is a supported platform and it turns out that the maximum sendbuf
size that ws2008R2 can support is only 15MB. Make the necessary
adjustment.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intel did some benchmarking on our network throughput when Linux on Hyper-V
is as used as a gateway. This fix gave us almost a 1 Gbps additional throughput
on about 5Gbps base throughput we hadi, prior to increasing the sendbuf size.
The sendbuf mechanism is a copy based transport that we have which is clearly
more optimal than the copy-free page flipping mechanism (for small packets).
In the forwarding scenario, we deal only with MTU sized packets,
and increasing the size of the senbuf area gave us the additional performance.
For what it is worth, Windows guests on Hyper-V, I am told use similar sendbuf
size as well.
The exact value of sendbuf I think is less important than the fact that it needs
to be larger than what Linux can allocate as physically contiguous memory.
Thus the change over to allocating via vmalloc().
We currently allocate 16MB receive buffer and we use vmalloc there for allocation.
Also the low level channel code has already been modified to deal with physically
dis-contiguous memory in the ringbuffer setup.
Based on experimentation Intel did, they say there was some improvement in throughput
as the sendbuf size was increased up to 16MB and there was no effect on throughput
beyond 16MB. Thus I have chosen 16MB here.
Increasing the sendbuf value makes a material difference in small packet handling
In this version of the patch, based on David's feedback, I have added
additional details in the commit log.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return -ENOMEM from the kalloc error handling
case instead of 0.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to have at least a netconsole to debug kernel issues on
Windows Azure this patch implements netpoll support.
Sending packets is easy, netvsc_start_xmit() does already everything
needed.
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix checkpatch warning:
WARNING: kfree(NULL) is safe this check is probably not required
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RNDIS_STATUS_NETWORK_CHANGE event is received after the Hyper-V host
sleep or hibernation. We refresh network at this time.
MS-TFS: 135162
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c25aaf814a: "hyperv: Enable sendbuf mechanism on the send path" added
some teardown code that looks like it was copied from the recieve path
above, but missed a variable name replacement.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It passes the hash value as the RNDIS Per-packet info to the Hyper-V host,
so that the send completion notices can be spread across multiple channels.
MS-TFS: 140273
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: get rid of SET_ETHTOOL_OPS
Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.
Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
- SET_ETHTOOL_OPS(dev, ops);
+ dev->ethtool_ops = ops;
Compile tested only, but I'd seriously wonder if this broke anything.
Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/altera/altera_sgdma.c
net/netlink/af_netlink.c
net/sched/cls_api.c
net/sched/sch_api.c
The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces. These were simple transformations from
netlink_capable to netlink_ns_capable.
The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>