If a frontend not receiving packets it is useful to detect this and
turn off the carrier so packets are dropped early instead of being
queued and drained when they expire.
A to-guest queue is stalled if it doesn't have enough free slots for a
an extended period of time (default 60 s).
If at least one queue is stalled, the carrier is turned off (in the
expectation that the other queues will soon stall as well). The
carrier is only turned on once all queues are ready.
When the frontend connects, all the queues start in the stalled state
and only become ready once the frontend queues enough Rx requests.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netback needs to discard old to-guest skb's (guest Rx queue drain) and
it needs detect guest Rx stalls (to disable the carrier so packets are
discarded earlier), but the current implementation is very broken.
1. The check in hard_start_xmit of the slot availability did not
consider the number of packets that were already in the guest Rx
queue. This could allow the queue to grow without bound.
The guest stops consuming packets and the ring was allowed to fill
leaving S slot free. Netback queues a packet requiring more than S
slots (ensuring that the ring stays with S slots free). Netback
queue indefinately packets provided that then require S or fewer
slots.
2. The Rx stall detection is not triggered in this case since the
(host) Tx queue is not stopped.
3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback
will consider this an Rx purge event which may result in it taking
the carrier down unnecessarily. It also considers a queue with
only 1 slot free as unstalled (even though the next packet might
not fit in this).
The internal guest Rx queue is limited by a byte length (to 512 Kib,
enough for half the ring). The (host) Tx queue is stopped and started
based on this limit. This sets an upper bound on the amount of memory
used by packets on the internal queue.
This allows the estimatation of the number of slots for an skb to be
removed (it wasn't a very good estimate anyway). Instead, the guest
Rx thread just waits for enough free slots for a maximum sized packet.
skbs queued on the internal queue have an 'expires' time (set to the
current time plus the drain timeout). The guest Rx thread will detect
when the skb at the head of the queue has expired and discard expired
skbs. This sets a clear upper bound on the length of time an skb can
be queued for. For a guest being destroyed the maximum time needed to
wait for all the packets it sent to be dropped is still the drain
timeout (10 s) since it will not be sending new packets.
Rx stall detection is reintroduced in a later commit.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frontends that do not provide feature-rx-notify may stall because
netback depends on the notification from frontend to wake the guest Rx
thread (even if can_queue is false).
This could be fixed but feature-rx-notify was introduced in 2006 and I
am not aware of any frontends that do not implement this.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DEFINE_XENBUS_DRIVER() macro looks a bit weird and causes sparse
errors.
Replace the uses with standard structure definitions instead. This is
similar to pci and usb device registration.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The original code is bogus. The function gets called in a loop which
leaks entries created in previous rounds.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enlarge buffer size and check input length properly, so that we don't
misuse -ENOSPC.
Note that command like "kickXXXX" is still allowed, that's one patch for
another day if we really want to be very strict on this.
Reported-by: SeeChen Ng <seechen81@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds debugfs capabilities to netback. There used to be a similar
patch floating around for classic kernel, but it used procfs. It is based on a
very similar blkback patch.
It creates xen-netback/[vifname]/io_ring_q[queueno] files, reading them output
various ring variables etc. Writing "kick" into it imitates an interrupt
happened, it can be useful to check whether the ring is just stalled due to a
missed interrupt.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The original code uses netdev->real_num_tx_queues to bookkeep number of
queues and invokes netif_set_real_num_tx_queues to set the number of
queues. However, netif_set_real_num_tx_queues doesn't allow
real_num_tx_queues to be smaller than 1, which means setting the number
to 0 will not work and real_num_tx_queues is untouched.
This is bogus when xenvif_free is invoked before any number of queues is
allocated. That function needs to iterate through all queues to free
resources. Using the wrong number of queues results in NULL pointer
dereference.
So we bookkeep the number of queues in xen-netback to solve this
problem. This fixes a regression introduced by multiqueue patchset in
3.16-rc1.
There's another bug in original code that the real number of RX queues
is never set. In current Xen multiqueue design, the number of TX queues
and RX queues are in fact the same. We need to set the numbers of TX and
RX queues to the same value.
Also remove xenvif_select_queue and leave queue selection to core
driver, as suggested by David Miller.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Builds on the refactoring of the previous patch to implement multiple
queues between xen-netfront and xen-netback.
Writes the maximum supported number of queues into XenStore, and reads
the values written by the frontend to determine how many queues to use.
Ring references and event channels are read from XenStore on a per-queue
basis and rings are connected accordingly.
Also adds code to handle the cleanup of any already initialised queues
if the initialisation of a subsequent queue fails.
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for multi-queue support in xen-netback, move the
queue-specific data from struct xenvif into struct xenvif_queue, and
update the rest of the code to use this.
Also adds loops over queues where appropriate, even though only one is
configured at this point, and uses alloc_netdev_mq() and the
corresponding multi-queue netif wake/start/stop functions in preparation
for multiple active queues.
Finally, implements a trivial queue selection function suitable for
ndo_select_queue, which simply returns 0 for a single queue and uses
skb_get_hash() to compute the queue index otherwise.
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several files refer to an old address for the Free Software Foundation
in the file header comment. Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Paul Mackerras <paulus@samba.org>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
extra or prefix segments to pass the large packet to the frontend. New
xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
to determine if the frontend is capable of handling such packets.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a xenstore feature flag, festure-gso-tcpv6, to advertise
that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs
if the frontend passes an extra segment with the new type
XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For performance of VM to VM traffic on a single host it is better to avoid
calculation of TCP/UDP checksum in the sending frontend. To allow this this
patch adds the code necessary to set up partial checksum for IPv6 packets
and xenstore flag feature-ipv6-csum-offload to advertise that fact to
frontends.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check xenstore flag feature-ipv6-csum-offload to determine if a
guest is happy to accept IPv6 packets with only partial checksum.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a guest is destroyed without transitioning its frontend to CLOSED,
the domain becomes a zombie as netback was not grant unmapping the
shared rings.
When removing a VIF, transition the backend to CLOSED so the VIF is
disconnected if necessary (which will unmap the shared rings etc).
This fixes a regression introduced by
279f438e36 (xen-netback: Don't destroy
the netdev until the vif is shut down).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Paul Durrant <Paul.Durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the frontend state changes netback now specifies its desired state to
a new function, set_backend_state(), which transitions through any
necessary intermediate states.
This fixes an issue observed with some old Windows frontend drivers where
they failed to transition through the Closing state and netback would not
behave correctly.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this patch, if a frontend cycles through states Closing
and Closed (which Windows frontends need to do) then the netdev
will be destroyed and requires re-invocation of hotplug scripts
to restore state before the frontend can move to Connected. Thus
when udev is not in use the backend gets stuck in InitWait.
With this patch, the netdev is left alone whilst the backend is
still online and is only de-registered and freed just prior to
destroying the vif (which is also nicely symmetrical with the
netdev allocation and registration being done during probe) so
no re-invocation of hotplug scripts is required.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert one printk to pr_<level>.
Add a missing newline in several places to avoid message interleaving,
coalesce formats, reflow modified lines to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netback and netfront only use one event channel to do TX / RX notification,
which may cause unnecessary wake-up of processing routines. This patch adds a
new feature called feature-split-event-channels to netback, enabling it to
handle TX and RX events separately.
Netback will use tx_irq to notify guest for TX completion, rx_irq for RX
notification.
If frontend doesn't support this feature, tx_irq equals to rx_irq.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables user to unload netback module, which is useful when user
wants to upgrade to a newer netback module without rebooting the host.
Netfront cannot handle netback removal event. As we cannot fix all possible
frontends we add module get / put along with vif get / put to avoid
mis-unloading of netback. To unload netback module, user needs to shutdown all
VMs or migrate them to another host or unplug all vifs before hand.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>¬
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'name', 'owner', and 'mod_name' members are redundant with the
identically named fields in the 'driver' sub-structure. Rather than
switching each instance to specify these fields explicitly, introduce
a macro to simplify this.
Eliminate further redundancy by allowing the drvname argument to
DEFINE_XENBUS_DRIVER() to be blank (in which case the first entry from
the ID table will be used for .driver.name).
Also eliminate the questionable xenbus_register_{back,front}end()
wrappers - their sole remaining purpose was the checking of the
'owner' field, proper setting of which shouldn't be an issue anymore
when the macro gets used.
v2: Restore DRV_NAME for the driver name in xen-pciback.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fixes error from sparse:
CHECK drivers/net/xen-netback/xenbus.c
drivers/net/xen-netback/xenbus.c:29:40: error: dubious one-bit signed bitfield
int have_hotplug_status_watch:1;
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netback is the host side counterpart to the frontend driver in
drivers/net/xen-netfront.c. The PV protocol is also implemented by
frontend drivers in other OSes too, such as the BSDs and even Windows.
The patch is based on the driver from the xen.git pvops kernel tree but
has been put through the checkpatch.pl wringer plus several manual
cleanup passes and review iterations. The driver has been moved from
drivers/xen/netback to drivers/net/xen-netback.
One major change from xen.git is that the guest transmit path (i.e. what
looks like receive to netback) has been significantly reworked to remove
the dependency on the out of tree PageForeign page flag (a core kernel
patch which enables a per page destructor callback on the final
put_page). This page flag was used in order to implement a grant map
based transmit path (where guest pages are mapped directly into SKB
frags). Instead this version of netback uses grant copy operations into
regular memory belonging to the backend domain. Reinstating the grant
map functionality is something which I would like to revisit in the
future.
Note that this driver depends on 2e820f58f7 "xen/irq: implement
bind_interdomain_evtchn_to_irqhandler for backend drivers" which is in
linux next via the "xen-two" tree and is intended for the 2.6.39 merge
window:
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/backends
this branch has only that single commit since 2.6.38-rc2 and is safe for
cross merging into the net branch.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>