Commit Graph

279 Commits

Author SHA1 Message Date
Michael S. Tsirkin
60302ff631 virtio: document queue state logic
commit d631b94e7a
    virtio: change comment in transmit

started clarifying the logic behind queue state management,
but introduced an inaccuracy: TX_BUSY does not cause
a BUG message.

Clean this up some more, explaining the tradeoffs in detail.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06 16:44:24 -04:00
Li RongQing
faadb05f4b virtio: simplify the using of received in virtnet_poll
received is 0, no need to minus it and use "+=" to reassign it

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 12:37:17 -07:00
stephen hemminger
d631b94e7a virtio: change comment in transmit
The original comment was not really informative or funny
as well as sexist. Replace it with a better explanation of
why the driver does stop and what the impacts are.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:22:50 -04:00
Jason Wang
ab3971b1e7 virtio-net: correctly delete napi hash
We don't delete napi from hash list during module exit. This will
cause the following panic when doing module load and unload:

BUG: unable to handle kernel paging request at 0000004e00000075
IP: [<ffffffff816bd01b>] napi_hash_add+0x6b/0xf0
PGD 3c5d5067 PUD 0
Oops: 0000 [#1] SMP
...
Call Trace:
[<ffffffffa0a5bfb7>] init_vqs+0x107/0x490 [virtio_net]
[<ffffffffa0a5c9f2>] virtnet_probe+0x562/0x791815639d880be [virtio_net]
[<ffffffff8139e667>] virtio_dev_probe+0x137/0x200
[<ffffffff814c7f2a>] driver_probe_device+0x7a/0x250
[<ffffffff814c81d3>] __driver_attach+0x93/0xa0
[<ffffffff814c8140>] ? __device_attach+0x40/0x40
[<ffffffff814c6053>] bus_for_each_dev+0x63/0xa0
[<ffffffff814c7a79>] driver_attach+0x19/0x20
[<ffffffff814c76f0>] bus_add_driver+0x170/0x220
[<ffffffffa0a60000>] ? 0xffffffffa0a60000
[<ffffffff814c894f>] driver_register+0x5f/0xf0
[<ffffffff8139e41b>] register_virtio_driver+0x1b/0x30
[<ffffffffa0a60010>] virtio_net_driver_init+0x10/0x12 [virtio_net]

This patch fixes this by doing this in virtnet_free_queues(). And also
don't delete napi in virtnet_freeze() since it will call
virtnet_free_queues() which has already did this.

Fixes 91815639d8 ("virtio-net: rx busy polling support")
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 14:37:17 -04:00
Linus Torvalds
53861af9a1 OK, this has the big virtio 1.0 implementation, as specified by OASIS.
On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to
 double-check the implementation.
 
 Then comes the inevitable fixes and cleanups from that work.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU5B9cAAoJENkgDmzRrbjxPacP/jajliXX353JJ/g/hkZ6oDN5
 o7FhELBKiUMr7enVZYwj2BBYk5OM36nB9pQkiqHMSbjJGoS5IK70enxb4YRxSHBn
 YCLblZMNqutGS0kclZ9DDysztjAhxH7CvLM6pMZ7eHP0f3+FM/QhbxHfbG9DTBUH
 2U/nybvd3M/+YBe7ptwQdrH8aOCAD6RTIsXellfm99dNMK6K/5lqnWQ98WSXmNXq
 vyvdaAQsqqUkmxtajjcBumaCH4/SehOJJjUqojCMsR3aBkgOBWDZJURMek+KA5Dt
 X996fBsTAlvTtCUKRrmLTb2ScDH7fu+jwbWRqMYDk8zpEr3XqiLTTPV4/TiHGmi7
 Wiw3g1wIY1YbETlZyongB5MIoVyUfmDAd+bT8nBsj3KIITD84gOUQFDMl6d63c0I
 z6A9Pu/UzpJGsXZT3WoFLi6TO67QyhOseqZnhS4wBgLabjxffNM7yov9RVKUVH/n
 JHunnpUk2iTtSgscBarOBz5867dstuurnaUIspZthVBo6y6N0z+GrU+agJ8Y4DXx
 mvwzeYLhQH2208PjxPFiah/kA/gHNm1m678TbpS+CUsgmpQiJ4gTwtazDSi4TwZY
 Hs9T9GulkzpZIzEyKL3qG2TsfyDhW5Avn+GvKInAT9+Fkig4BnP3DUONBxcwGZ78
 eI3FDUWsE36NqE5ECWmz
 =ivCe
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "OK, this has the big virtio 1.0 implementation, as specified by OASIS.

  On top of tht is the major rework of lguest, to use PCI and virtio
  1.0, to double-check the implementation.

  Then comes the inevitable fixes and cleanups from that work"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
  virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
  virtio_net: unconditionally define struct virtio_net_hdr_v1.
  tools/lguest: don't use legacy definitions for net device in example launcher.
  virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
  tools/lguest: use common error macros in the example launcher.
  tools/lguest: give virtqueues names for better error messages
  tools/lguest: more documentation and checking of virtio 1.0 compliance.
  lguest: don't look in console features to find emerg_wr.
  tools/lguest: don't start devices until DRIVER_OK status set.
  tools/lguest: handle indirect partway through chain.
  tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: rename virtio_pci_cfg_cap field to match spec.
  tools/lguest: fix features_accepted logic in example launcher.
  tools/lguest: handle device reset correctly in example launcher.
  virtual: Documentation: simplify and generalize paravirt_ops.txt
  lguest: remove NOTIFY call and eventfd facility.
  lguest: remove NOTIFY facility from demonstration launcher.
  lguest: use the PCI console device's emerg_wr for early boot messages.
  lguest: always put console in PCI slot #1.
  ...
2015-02-18 09:24:01 -08:00
David S. Miller
6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
Vlad Yasevich
e3e3c423f8 Revert "drivers/net: Disable UFO through virtio"
This reverts commit 3d0ad09412.

Now that GSO functionality can correctly track if the fragment
id has been selected and select a fragment id if necessary,
we can re-enable UFO on tap/macvap and virtio devices.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03 23:06:43 -08:00
Jacob Keller
074c358219 virtio_net: add software timestamp support
This patch enables the use of software timestamping via the virtio_net
driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-01-22 18:10:16 -08:00
Michael S. Tsirkin
6ba422489b virtio/net: verify device has config space
Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/net needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:47 +10:30
Jason Wang
41f2f1273c virtio-net: don't do header check for dodgy gso packets
There's no need to do header check for virtio-net since:

- Host sets dodgy for all gso packets from guest and check the header.
- Host should be prepared for all kinds of evil packets from guest, since
  malicious guest can send any kinds of packet.

So this patch sets NETIF_F_GSO_ROBUST for virtio-net to skip the check.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:53:20 -05:00
Herbert Xu
8acdf999ac virtio_net: Fix napi poll list corruption
The commit d75b1ade56 (net: less
interrupt masking in NAPI) breaks virtio_net in an insidious way.

It is now required that if the entire budget is consumed when poll
returns, the napi poll_list must remain empty.  However, like some
other drivers virtio_net tries to do a last-ditch check and if
there is more work it will call napi_schedule and then immediately
process some of this new work.  Should the entire budget be consumed
while processing such new work then we will violate the new caller
contract.

This patch fixes this by not touching any work when we reschedule
in virtio_net.

The worst part of this bug is that the list corruption causes other
napi users to be moved off-list.  In my case I was chasing a stall
in IPsec (IPsec uses netif_rx) and I only belatedly realised that it
was virtio_net which caused the stall even though the virtio_net
poll was still functioning perfectly after IPsec stalled.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-22 16:10:12 -05:00
Michael S. Tsirkin
51cdc3815f virtio: drop VIRTIO_F_VERSION_1 from drivers
Core activates this bit automatically now,
drop it from drivers that set it explicitly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:06:32 +02:00
Michael S. Tsirkin
9465a7a6f1 virtio_net: enable v1.0 support
Now that we have completed 1.0 support, enable it in our driver.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:05:28 +02:00
Michael S. Tsirkin
7e93a02fec virtio_net: disable mac write for virtio 1.0
The spec states that mac in config space is only driver-writable in the
legacy case.  Fence writing it in virtnet_set_mac_address() in the
virtio 1.0 case.

Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:28 +02:00
Michael S. Tsirkin
d04302b334 virtio_net: bigger header when VERSION_1 is set
With VERSION_1 virtio_net uses same header size
whether mergeable buffers are enabled or not.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2014-12-09 12:05:28 +02:00
Michael S. Tsirkin
bcff3162f3 virtio_net: stricter short buffer length checks
Our buffer length check is not strict enough for mergeable
buffers: buffer can still be shorter that header + address
by 2 bytes.

Fix that up.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2014-12-09 12:05:28 +02:00
Michael S. Tsirkin
012873d057 virtio_net: get rid of virtio_net_hdr/skb_vnet_hdr
virtio 1.0 doesn't use virtio_net_hdr anymore, and in fact, it's not
really useful since virtio_net_hdr_mrg_rxbuf includes that as the first
field anyway.

Let's drop it, precalculate header len and store within vi instead.

This way we can also remove struct skb_vnet_hdr.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2014-12-09 12:05:28 +02:00
Michael S. Tsirkin
946fa5647b virtio_net: pass vi around
Too many places poke at [rs]q->vq->vdev->priv just to get
the vi structure.  Let's just pass the pointer around: seems
cleaner, and might even be faster.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-12-09 12:05:27 +02:00
Michael S. Tsirkin
fdd819b215 virtio_net: v1.0 endianness
Based on patches by Rusty Russell, Cornelia Huck.
Note: more code changes are needed for 1.0 support
(due to different header size).
So we don't advertize support for 1.0 yet.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09 12:05:26 +02:00
Jason Wang
892d6eb124 virtio-net: validate features during probe
We currently trigger BUG when VIRTIO_NET_F_CTRL_VQ
is not set but one of features depending on it is.
That's not a friendly way to report errors to
hypervisors.
Let's check, and fail probe instead.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 00:26:43 -05:00
Ben Hutchings
3d0ad09412 drivers/net: Disable UFO through virtio
IPv6 does not allow fragmentation by routers, so there is no
fragmentation ID in the fixed header.  UFO for IPv6 requires the ID to
be passed separately, but there is no provision for this in the virtio
net protocol.

Until recently our software implementation of UFO/IPv6 generated a new
ID, but this was a bug.  Now we will use ID=0 for any UFO/IPv6 packet
passed through a tap, which is even worse.

Unfortunately there is no distinction between UFO/IPv4 and v6
features, so disable UFO on taps and virtio_net completely until we
have a proper solution.

We cannot depend on VM managers respecting the tap feature flags, so
keep accepting UFO packets but log a warning the first time we do
this.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 20:01:18 -04:00
Linus Torvalds
0e6e58f941 One cc: stable commit, the rest are a series of minor cleanups which have
been sitting in MST's tree during my vacation.  I changed a function name
 and made one trivial change, then they spent two days in linux-next.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUQFBQAAoJENkgDmzRrbjxJRIP/1yCQRElQewxURSmJelyqCdU
 0mHYB0R9Mf3tfre1xnofqs2lWeSMc/4ptKHsVR6pupoztSwnz7HsLHfEFvFJh4mj
 KsaqYElxkNxTcfyHwLjyJS0/J6tG1tYypXGiimTBS0bvFHL3XZdimVgJ6WvX+gO7
 YSaDEX8/EqCERafslS5+gKJlz3drDOnCZCe9y4BDSmsvl2k7bkpSxIn8vsR6jIC0
 c5JpUy6QVF+3XA/J932M7yRs+xpqxNoUWiyY3ar9o3CtQAaQB0ZAetSxY6hTfvVc
 GlNFzCifdsaQwsl2SVsE2h6tWaRhtMtcGWQuhHThIPyIf8XxhYyBRY2FLo70LMz1
 eqtwy6F/Bg/nzUsdee4PZBMeoKHlAEL12RpsEKgfUoLzj16Aqa8ll+Agbglbkw8G
 f3d2FwzKAlpY5NwHETC1wYy52PJ3efqksRWuhokmYpxNSbHJS/lsiJOE7272/4Qr
 MtXuvRmo22tf34XFd5y7zqWjgZ58eeFOqQWi/K+6ZgpqVOvikjrXXKEuiVdjO0ZD
 kTVR/sQKiR+79rzENk80XBhWaMveECNXF1TiZ/3MmURkmEOBRQMxRQ20BX3exvna
 AJ/WVA5DcfXZc1yyqknE1NLGrvSBMJENH13x2QPwrqNWAryOOKuF1VKKIwWlDw5j
 vtx5nXiJa8YYdxI2TJCN
 =JK6x
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "One cc: stable commit, the rest are a series of minor cleanups which
  have been sitting in MST's tree during my vacation.  I changed a
  function name and made one trivial change, then they spent two days in
  linux-next"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
  virtio-rng: refactor probe error handling
  virtio_scsi: drop scan callback
  virtio_balloon: enable VQs early on restore
  virtio_scsi: fix race on device removal
  virito_scsi: use freezable WQ for events
  virtio_net: enable VQs early on restore
  virtio_console: enable VQs early on restore
  virtio_scsi: enable VQs early on restore
  virtio_blk: enable VQs early on restore
  virtio_scsi: move kick event out from virtscsi_init
  virtio_net: fix use after free on allocation failure
  9p/trans_virtio: enable VQs early
  virtio_console: enable VQs early
  virtio_blk: enable VQs early
  virtio_net: enable VQs early
  virtio: add API to enable VQs early
  virtio_net: minor cleanup
  virtio-net: drop config_mutex
  virtio_net: drop config_enable
  virtio-blk: drop config_mutex
  ...
2014-10-18 10:25:09 -07:00
Michael S. Tsirkin
4b7fd2e688 virtio_net: fix use after free
commit 0b725a2ca6
    net: Remove ndo_xmit_flush netdev operation, use signalling instead.

added code that looks at skb->xmit_more after the skb has
been put in TX VQ. Since some paths process the ring and free the skb
immediately, this can cause use after free.

Fix by storing xmit_more in a local variable.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-15 16:47:45 -04:00
Michael S. Tsirkin
e53fbd11e9 virtio_net: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio net violated this
rule by using receive VQs within restore.

To fix, call virtio_device_ready before using VQs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:25:10 +10:30
Michael S. Tsirkin
0246555550 virtio_net: fix use after free on allocation failure
In the extremely unlikely event that driver initialization fails after
RX buffers are added, virtio net frees RX buffers while VQs are
still active, potentially causing device to use a freed buffer.

To fix, reset device first - same as we do on device removal.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:25:05 +10:30
Michael S. Tsirkin
4baf1e33d0 virtio_net: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio net violated this
rule by using receive VQs within probe.

To fix, call virtio_device_ready before using VQs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:25:02 +10:30
Michael S. Tsirkin
507613bf31 virtio_net: minor cleanup
goto done;
done:
	return;
is ugly, it was put there to make diff review easier.
replace by open-coded return.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:25:00 +10:30
Michael S. Tsirkin
080c637373 virtio-net: drop config_mutex
config_mutex served two purposes: prevent multiple concurrent config
change handlers, and synchronize access to config_enable flag.

Since commit dbf2576e37
    workqueue: make all workqueues non-reentrant
all workqueues are non-reentrant, and config_enable
is now gone.

Get rid of the unnecessary lock.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:24:59 +10:30
Michael S. Tsirkin
102a2786c9 virtio_net: drop config_enable
Now that virtio core ensures config changes don't arrive during probing,
drop config_enable flag in virtio net.
On removal, flush is now sufficient to guarantee that no change work is
queued.

This help simplify the driver, and will allow setting DRIVER_OK earlier
without losing config change notifications.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 10:24:58 +10:30
Rusty Russell
a58354409a virtio_net: pass well-formed sgs to virtqueue_add_*()
This is the only driver which doesn't hand virtqueue_add_inbuf and
virtqueue_add_outbuf a well-formed, well-terminated sg.  Fix it,
so we can make virtio_add_* simpler.

pktgen results:
	modprobe pktgen
	echo 'add_device eth0' > /proc/net/pktgen/kpktgend_0
	echo nowait 1 > /proc/net/pktgen/eth0
	echo count 1000000 > /proc/net/pktgen/eth0
	echo clone_skb 100000 > /proc/net/pktgen/eth0
	echo dst_mac 4e:14:25:a9:30:ac > /proc/net/pktgen/eth0
	echo dst 192.168.1.2 > /proc/net/pktgen/eth0
	for i in `seq 20`; do echo start > /proc/net/pktgen/pgctrl; tail -n1 /proc/net/pktgen/eth0; done

Before:
  746547-793084(786421+/-9.6e+03)pps 346-367(364.4+/-4.4)Mb/sec (346397808-367990976(3.649e+08+/-4.5e+06)bps) errors: 0

After:
  767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:50:46 -04:00
David S. Miller
c89fcfd42c virtio_net: flush when in xmit_more mode and under descriptor pressure
Mirror the changes made to ixgbe in commit 2367a17390
("ixgbe: flush when in xmit_more mode and under descriptor pressure")

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-28 01:39:49 -07:00
David S. Miller
0b725a2ca6 net: Remove ndo_xmit_flush netdev operation, use signalling instead.
As reported by Jesper Dangaard Brouer, for high packet rates the
overhead of having another indirect call in the TX path is
non-trivial.

There is the indirect call itself, and then there is all of the
reloading of the state to refetch the tail pointer value and
then write the device register.

Move to a more passive scheme, which requires very light modifications
to the device drivers.

The signal is a new skb->xmit_more value, if it is non-zero it means
that more SKBs are pending to be transmitted on the same queue as the
current SKB.  And therefore, the driver may elide the tail pointer
update.

Right now skb->xmit_more is always zero.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25 16:29:42 -07:00
David S. Miller
c223a078cb virtio_net: Support netdev_ops->ndo_xmit_flush()
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 23:02:45 -07:00
Jason Wang
91815639d8 virtio-net: rx busy polling support
Add basic support for rx busy polling. Instead of introducing new
states and spinlock to synchronize between NAPI and polling method,
this patch just reuse NAPI state to avoid extra overhead for fast path
and simplified the codes.

Test was done between a kvm guest and an external host. Two hosts were
connected through 40gb mlx4 cards. With both busy_poll and busy_read
are set to 50 in guest, 1 byte netperf tcp_rr shows 127% improvement:
transaction rate was increased from 8353.33 to 18966.87.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23 15:12:02 -07:00
Jason Wang
2ffa75988f virtio-net: introduce virtnet_receive()
Move common receive logic to a new helper virtnet_receive(). It will
also be used by rx busy polling method.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23 15:12:01 -07:00
Wilfried Klaebe
7ad24ea4bf net: get rid of SET_ETHTOOL_OPS
net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.

Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
-       SET_ETHTOOL_OPS(dev, ops);
+       dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.

Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:43:20 -04:00
Zhangjie \(HZ\)
6ebbc1a638 virtio-net: Set needed_headroom for virtio-net when VIRTIO_F_ANY_LAYOUT is true
This is a small supplement for commit e7428e95a0
("virtio-net: put virtio-net header inline with data"). TCP packages have
enough room to put virtio-net header in, but UDP packages do not. By
setting dev->needed_headroom for virtio-net device, UDP packages could have
enough room.

For UDP packages, sk_buff is alloced in fun __ip_append_data. The size is
"alloclen + hh_len + 15", and "hh_len = LL_RESERVED_SPACE(rt-dst.dev);".
The Macro is defined as follows:
#define LL_RESERVED_SPACE(dev) \
     ((((dev)->hard_header_len+(dev)->needed_headroom)\
     &~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
By default, for UDP packages, after skb is allocated, only 16 bytes
reserved. And 2 bytes remained after mac header is set. That is not enough
to put virtio-net header in. If we set dev->needed_headroom to 12 or 10
(according to mergeable_rx_bufs is on or off ), more room can be reserved.
Then there is enough room for UDP packages to put the header in.

test result list as below:
guest and host: suse11sp3, netperf, intel 2.4GHz
+-------+---------+---------+---------+---------+
|       |   old             |   new             |
+-------+---------+---------+---------+---------+
| UDP   |  Gbit/s | pps     |  Gbit/s | pps     |
| 64    |  0.57   | 692232  |  0.61   | 742420  |
| 256   |  1.60   | 686860  |  1.71   | 733331  |
| 512   |  2.92   | 674576  |  3.07   | 710446  |
| 1024  |  4.99   | 598977  |  5.17   | 620821  |
| 1460  |  5.68   | 483757  |  7.16   | 610519  |
| 4096  |  6.98   | 637468  |  7.21   | 658471  |
+-------+---------+---------+---------+---------+

Signed-off-by: Zhang Jie <zhangjie14@huawei.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-30 13:31:26 -04:00
Amos Kong
c18e9cd623 virtio_net: zero is an invald queue_pairs number
Execute "ethtool -L eth0 combined 0" in guest, if multiqueue
is enabled, virtnet_send_command() will return -EINVAL error,
there is a validation in QEMU.

But if multiqueue is disabled, virtnet_set_queues() will just
return zero (success). We should return error for this situation.

Signed-off-by: Amos Kong <akong@redhat.com>
Acked-by:  Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22 16:01:35 -04:00
Linus Torvalds
cd6362befe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Here is my initial pull request for the networking subsystem during
  this merge window:

   1) Support for ESN in AH (RFC 4302) from Fan Du.

   2) Add full kernel doc for ethtool command structures, from Ben
      Hutchings.

   3) Add BCM7xxx PHY driver, from Florian Fainelli.

   4) Export computed TCP rate information in netlink socket dumps, from
      Eric Dumazet.

   5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
      Dichtel.

   6) Convert many drivers to pci_enable_msix_range(), from Alexander
      Gordeev.

   7) Record SKB timestamps more efficiently, from Eric Dumazet.

   8) Switch to microsecond resolution for TCP round trip times, also
      from Eric Dumazet.

   9) Clean up and fix 6lowpan fragmentation handling by making use of
      the existing inet_frag api for it's implementation.

  10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

  11) Auto size SKB lengths when composing netlink messages based upon
      past message sizes used, from Eric Dumazet.

  12) qdisc dumps can take a long time, add a cond_resched(), From Eric
      Dumazet.

  13) Sanitize netpoll core and drivers wrt.  SKB handling semantics.
      Get rid of never-used-in-tree netpoll RX handling.  From Eric W
      Biederman.

  14) Support inter-address-family and namespace changing in VTI tunnel
      driver(s).  From Steffen Klassert.

  15) Add Altera TSE driver, from Vince Bridgers.

  16) Optimizing csum_replace2() so that it doesn't adjust the checksum
      by checksumming the entire header, from Eric Dumazet.

  17) Expand BPF internal implementation for faster interpreting, more
      direct translations into JIT'd code, and much cleaner uses of BPF
      filtering in non-socket ocntexts.  From Daniel Borkmann and Alexei
      Starovoitov"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
  netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
  net: Add a test to see if a skb is freeable in irq context
  qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
  net: ptp: move PTP classifier in its own file
  net: sxgbe: make "core_ops" static
  net: sxgbe: fix logical vs bitwise operation
  net: sxgbe: sxgbe_mdio_register() frees the bus
  Call efx_set_channels() before efx->type->dimension_resources()
  xen-netback: disable rogue vif in kthread context
  net/mlx4: Set proper build dependancy with vxlan
  be2net: fix build dependency on VxLAN
  mac802154: make csma/cca parameters per-wpan
  mac802154: allow only one WPAN to be up at any given time
  net: filter: minor: fix kdoc in __sk_run_filter
  netlink: don't compare the nul-termination in nla_strcmp
  can: c_can: Avoid led toggling for every packet.
  can: c_can: Simplify TX interrupt cleanup
  can: c_can: Store dlc private
  can: c_can: Reduce register access
  can: c_can: Make the code readable
  ...
2014-04-02 20:53:45 -07:00
Linus Torvalds
64056a9425 Nothing exciting: virtio-blk users might see a bit of a boost from the
doubling of the default queue length though.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTOirvAAoJENkgDmzRrbjxpqAP/3114WOubf4MNoi24eXNkU2x
 ZC++orq408A72g8dB7A0XAhatKc8Ay0bDP5hOZNAgTHm2hjYxmhpC9UlKEzzElpW
 yjwR0wYBXA0RoJuZqCp9MNgOtkU54QoQZ0c4EZblagslUZBmKQPUDE7XdgkaqO6o
 A3azfCjAFDu523Azep9Npj1sk+H+VH3OIXyMGY+uyHBw1a2rnmIhn4lQCQ+pX+YO
 3wpqxEragpjBAizs1CAAB9wWm2O8zVACxkoUXVYI8Tu60n99lwr9Abxlc0oHSIig
 E7kBnyzQVHfkDrPXR3EdwTi3Hwd/BaOiW4dPvQ3IJKvPOzoiS4H3IpJCnCV5PfRb
 VHl3q//SzPQ+GXH7WH2Fhb9JxoLxBRzFcy3kdIR1wBHYahAOiQjcLgapOO5mVq3X
 PJy9CDs2L9rjbtxvQWtnl62V3JFGw+ZdhhG/BjeC5Who/aSh/mDoss7/qdfrxKJx
 z5IWYSlJw7ighOuF0dPdCKAX9WiWqENvga31Q2svrH4Hxlx6eGumEmX+YQw0iRAf
 ddMYA+1qLT4myPTN0nORFM+T/mkZHNkNMCr0qylRFH0j6hSiDxwWqQG0eXA661By
 W6nIkW++sj2Vkk4knMGCXSyMmy9Nrv+1R+8unQJCXixYevotP5JEY0DoCQwlGuuq
 xa0UR+2q9Htnbytu8S0K
 =AoMS
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "Nothing exciting: virtio-blk users might see a bit of a boost from the
  doubling of the default queue length though"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio-blk: base queue-depth on virtqueue ringsize or module param
  Revert a02bbb1ccf: MAINTAINERS: add virtio-dev ML for virtio
  virtio: fail adding buffer on broken queues.
  virtio-rng: don't crash if virtqueue is broken.
  virtio_balloon: don't crash if virtqueue is broken.
  virtio_blk: don't crash, report error if virtqueue is broken.
  virtio_net: don't crash if virtqueue is broken.
  virtio_balloon: don't softlockup on huge balloon changes.
  virtio: Use pci_enable_msix_exact() instead of pci_enable_msix()
  MAINTAINERS: virtio-dev is subscribers only
  tools/virtio: add a missing )
  tools/virtio: fix missing kmemleak_ignore symbol
  tools/virtio: update internal copies of headers
2014-04-02 14:43:17 -07:00
David S. Miller
64c27237a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/marvell/mvneta.c

The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:48:54 -04:00
Jason Wang
681daee244 virtio-net: correct error handling of virtqueue_kick()
Current error handling of virtqueue_kick() was wrong in two places:
- The skb were freed immediately when virtqueue_kick() fail during
  xmit. This may lead double free since the skb was not detached from
  the virtqueue.
- try_fill_recv() returns false when virtqueue_kick() fail. This will
  lead unnecessary rescheduling of refill work.

Actually, it's safe to just ignore the kick failure in those two
places. So this patch fixes this by partially revert commit
6797590118.

Fixes 6797590118
(virtio_net: verify if virtqueue_kick() succeeded).

Cc: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:13:21 -04:00
Eric W. Biederman
85e9452539 virtio_net: Call dev_kfree_skb_any instead of dev_kfree_skb.
Replace dev_kfree_skb with dev_kfree_skb_any in start_xmit which can
be called in hard irq and other contexts.

start_xmit only frees skbs that it is dropping.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-03-24 21:19:25 -07:00
Eric W. Biederman
57a7744e09 net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq
Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:41:36 -04:00
Rusty Russell
a7c58146cf virtio_net: don't crash if virtqueue is broken.
A bad implementation of virtio might cause us to mark the virtqueue
broken: we'll dev_err() in that case, and the device is useless, but
let's not BUG_ON().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 11:27:55 +10:30
Jason Wang
0e7ede80d9 virtio-net: alloc big buffers also when guest can receive UFO
We should alloc big buffers also when guest can receive UFO
packets to let the big packets fit into guest rx buffer.

Fixes 5c5167515d
(virtio-net: Allow UFO feature to be set and advertised.)

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-24 18:19:52 -05:00
Michael Dalton
fbf28d78f5 virtio-net: initial rx sysfs support, export mergeable rx buffer size
Add initial support for per-rx queue sysfs attributes to virtio-net. If
mergeable packet buffers are enabled, adds a read-only mergeable packet
buffer size sysfs attribute for each RX queue.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:07 -08:00
Michael Dalton
ab7db91705 virtio-net: auto-tune mergeable rx buffer size for improved performance
Commit 2613af0ed1 ("virtio_net: migrate mergeable rx buffers to page frag
allocators") changed the mergeable receive buffer size from PAGE_SIZE to
MTU-size, introducing a single-stream regression for benchmarks with large
average packet size. There is no single optimal buffer size for all
workloads.  For workloads with packet size <= MTU bytes, MTU + virtio-net
header-sized buffers are preferred as larger buffers reduce the TCP window
due to SKB truesize. However, single-stream workloads with large average
packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers
are used.

This commit auto-tunes the mergeable receiver buffer packet size by
choosing the packet buffer size based on an EWMA of the recent packet
sizes for the receive queue. Packet buffer sizes range from MTU_SIZE +
virtio-net header len to PAGE_SIZE. This improves throughput for
large packet workloads, as any workload with average packet size >=
PAGE_SIZE will use PAGE_SIZE buffers.

These optimizations interact positively with recent commit
ba27524103 ("virtio-net: coalesce rx frags when possible during rx"),
which coalesces adjacent RX SKB fragments in virtio_net. The coalescing
optimizations benefit buffers of any size.

Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs
with all offloads & vhost enabled. All VMs and vhost threads run in a
single 4 CPU cgroup cpuset, using cgroups to ensure that other processes
in the system will not be scheduled on the benchmark CPUs. Trunk includes
SKB rx frag coalescing.

net-next w/ virtio_net before 2613af0ed1 (PAGE_SIZE bufs): 14642.85Gb/s
net-next (MTU-size bufs):  13170.01Gb/s
net-next + auto-tune: 14555.94Gb/s

Jason Wang also reported a throughput increase on mlx4 from 22Gb/s
using MTU-sized buffers to about 26Gb/s using auto-tuning.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Michael Dalton
fb51879dbc virtio-net: use per-receive queue page frag alloc for mergeable bufs
The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC
mergeable rx buffer allocations. This commit migrates virtio-net to use
per-receive queue page frags for GFP_ATOMIC allocation. This change unifies
mergeable rx buffer memory allocation, which now will use skb_refill_frag()
for both atomic and GFP-WAIT buffer allocations.

To address fragmentation concerns, if after buffer allocation there
is too little space left in the page frag to allocate a subsequent
buffer, the remaining space is added to the current allocated buffer
so that the remaining space can be used to store packet data.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Jason Wang
be121f46af virtio-net: drop rq->max and rq->num
It looks like there's no need for those two fields:

- Unless there's a failure for the first refill try, rq->max should be always
  equal to the vring size.
- rq->num is only used to determine the condition that we need to do the refill,
  we could check vq->num_free instead.
- rq->num was required to be increased or decreased explicitly after each
  get/put which results a bad API.

So this patch removes them both to make the code simpler.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:30:42 -08:00