Netback and netfront only use one event channel to do TX / RX notification,
which may cause unnecessary wake-up of processing routines. This patch adds a
new feature called feature-split-event-channels to netback, enabling it to
handle TX and RX events separately.
Netback will use tx_irq to notify guest for TX completion, rx_irq for RX
notification.
If frontend doesn't support this feature, tx_irq equals to rx_irq.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables user to unload netback module, which is useful when user
wants to upgrade to a newer netback module without rebooting the host.
Netfront cannot handle netback removal event. As we cannot fix all possible
frontends we add module get / put along with vif get / put to avoid
mis-unloading of netback. To unload netback module, user needs to shutdown all
VMs or migrate them to another host or unplug all vifs before hand.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>¬
Signed-off-by: David S. Miller <davem@davemloft.net>
The array mmap_pages is never touched in the initialization function. This is
remnant of mapping mechanism, which does not exist upstream. In current
upstream code this array only tracks usage of pages inside netback. Those
pages are allocated when contructing a SKB and passed directly to network
subsystem.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch only changes some names to avoid confusion.
In this patch we have:
MAX_SKB_SLOTS_DEFAULT -> FATAL_SKB_SLOTS_DEFAULT
max_skb_slots -> fatal_skb_slots
#define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN
The fatal_skb_slots is the threshold to determine whether a packet is
malicious.
XEN_NETBK_LEGACY_SLOTS_MAX is the maximum slots a valid packet can have at
this point. It is defined to be XEN_NETIF_NR_SLOTS_MIN because that's
guaranteed to be supported by all backends.
Suggested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tune xen_netbk_count_requests to not touch working array beyond limit, so that
we can make working array size constant.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tracking down from the caller, first_idx is always equal to vif->tx.req_cons.
Remove it to avoid confusion.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some frontend drivers are sending packets > 64 KiB in length. This length
overflows the length field in the first slot making the following slots have
an invalid length.
Turn this error back into a non-fatal error by dropping the packet. To avoid
having the following slots having fatal errors, consume all slots in the
packet.
This does not reopen the security hole in XSA-39 as if the packet as an
invalid number of slots it will still hit fatal error case.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch tries to coalesce tx requests when constructing grant copy
structures. It enables netback to deal with situation when frontend's
MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
With the help of coalescing, this patch tries to address two regressions
avoid reopening the security hole in XSA-39.
Regression 1. The reduction of the number of supported ring entries (slots)
per packet (from 18 to 17). This regression has been around for some time but
remains unnoticed until XSA-39 security fix. This is fixed by coalescing
slots.
Regression 2. The XSA-39 security fix turning "too many frags" errors from
just dropping the packet to a fatal error and disabling the VIF. This is fixed
by coalescing slots (handling 18 slots when backend's MAX_SKB_FRAGS is 17)
which rules out false positive (using 18 slots is legit) and dropping packets
using 19 to `max_skb_slots` slots.
To avoid reopening security hole in XSA-39, frontend sending packet using more
than max_skb_slots is considered malicious.
The behavior of netback for packet is thus:
1-18 slots: valid
19-max_skb_slots slots: drop and respond with an error
max_skb_slots+ slots: fatal error
max_skb_slots is configurable by admin, default value is 20.
Also change variable name from "frags" to "slots" in netbk_count_requests.
Please note that RX path still has dependency on MAX_SKB_FRAGS. This will be
fixed with separate patch.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to use skb_partial_csum_set() to simplify the codes.
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to use the new help skb_probe_transport_header() to do the l4 header
probing for untrusted sources. For packets with partial csum, the header should
already been set by skb_partial_csum_set().
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, for the packets receives from netback, before doing header check,
kernel just reset the transport header in netif_receive_skb() which pretends non
l4 header. This is suboptimal for precise packet length estimation (introduced
in 1def9238: net_sched: more precise pkt_len computation) which needs correct l4
header for gso packets.
The patch just reuse the header probed by netback for partial checksum packets
and tries to use skb_flow_dissect() for other cases, if both fail, just pretend
no l4 header.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This variable is never used.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit d37204566a.
This change is incorrect, as per Jan Beulich:
====================
But this is wrong from all we can tell, we discussed this before
(Wei pointed to the discussion in an earlier reply). The core of
it is that the put here parallels the one in netbk_tx_err(), and
the one in xenvif_carrier_off() matches the get from
xenvif_connect() (which normally would be done on the path
coming through xenvif_disconnect()).
====================
And a previous discussion of this issue is at:
http://marc.info/?l=xen-devel&m=136084174026977&w=2
Signed-off-by: David S. Miller <davem@davemloft.net>
netbk_fatal_tx_err() calls xenvif_carrier_off(), which does
a xenvif_put(). As callers of netbk_fatal_tx_err should only
have one reference to the vif at this time, then the xenvif_put
in netbk_fatal_tx_err is one too many.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull in 'net' to take in the bug fixes that didn't make it into
3.8-final.
Also, deal with the semantic conflict of the change made to
net/ipv6/xfrm6_policy.c A missing rt6->n neighbour release
was added to 'net', but in 'net-next' we no longer cache the
neighbour entries in the ipv6 routes so that change is not
appropriate there.
Signed-off-by: David S. Miller <davem@davemloft.net>
If the credit timer is left armed after calling
xen_netbk_remove_xenvif(), then it may fire and attempt to schedule
the vif which will then oops as vif->netbk == NULL.
This may happen both in the fatal error path and during normal
disconnection from the front end.
The sequencing during shutdown is critical to ensure that: a)
vif->netbk doesn't become unexpectedly NULL; and b) the net device/vif
is not freed.
1. Mark as unschedulable (netif_carrier_off()).
2. Synchronously cancel the timer.
3. Remove the vif from the schedule list.
4. Remove it from it netback thread group.
5. Wait for vif->refcnt to become 0.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reported-by: Christopher S. Aker <caker@theshore.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
netbk_count_requests() could detect an error, call
netbk_fatal_tx_error() but return 0. The vif may then be used
afterwards (e.g., in a call to netbk_tx_error().
Since netbk_fatal_tx_error() could set vif->refcnt to 1, the vif may
be freed immediately after the call to netbk_fatal_tx_error() (e.g.,
if the vif is also removed).
Netback thread Xenwatch thread
-------------------------------------------
netbk_fatal_tx_err() netback_remove()
xenvif_disconnect()
...
free_netdev()
netbk_tx_err() Oops!
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Christopher S. Aker <caker@theshore.net>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Synchronize with 'net' in order to sort out some l2tp, wireless, and
ipv6 GRE fixes that will be built on top of in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A buggy or malicious frontend should not be able to confuse netback.
If we spot anything which is not as it should be then shutdown the
device and don't try to continue with the ring in a potentially
hostile state. Well behaved and non-hostile frontends will not be
penalised.
As well as making the existing checks for such errors fatal also add a
new check that ensures that there isn't an insane number of requests
on the ring (i.e. more than would fit in the ring). If the ring
contains garbage then previously is was possible to loop over this
insane number, getting an error each time and therefore not generating
any more pending requests and therefore not exiting the loop in
xen_netbk_tx_build_gops for an externded period.
Also turn various netdev_dbg calls which no precipitate a fatal error
into netdev_err, they are rate limited because the device is shutdown
afterwards.
This fixes at least one known DoS/softlockup of the backend domain.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes it is useful to be able to change the MAC address of the
interface for netback devices. For example, when using ebtables it may
be useful to be able to distinguish traffic from different interfaces
without depending on the interface name.
Reported-by: Nikita Borzykh <sample.n@gmail.com>
Reported-by: Paul Harvey <stockingpaul@hotmail.com>
Cc: netdev@vger.kernel.org
Cc: xen-devel@lists.xen.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Matt Wilson <msw@amazon.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.
Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by
iterating over the frames which make up the page.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'xenarm-for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
arm: introduce a DTS for Xen unprivileged virtual machines
MAINTAINERS: add myself as Xen ARM maintainer
xen/arm: compile netback
xen/arm: compile blkfront and blkback
xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree
xen/arm: receive Xen events on ARM
xen/arm: initialize grant_table on ARM
xen/arm: get privilege status
xen/arm: introduce CONFIG_XEN on ARM
xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM
xen/arm: Introduce xen_ulong_t for unsigned long
xen/arm: Xen detection and shared_info page mapping
docs: Xen ARM DT bindings
xen/arm: empty implementation of grant_table arch specific functions
xen/arm: sync_bitops
xen/arm: page.h definitions
xen/arm: hypercalls
arm: initial Xen support
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Since Xen-4.2, hvm domains may have portions of their memory paged out. When a
foreign domain (such as dom0) attempts to map these frames, the map will
initially fail. The hypervisor returns a suitable errno, and kicks an
asynchronous page-in operation carried out by a helper. The foreign domain is
expected to retry the mapping operation until it eventually succeeds. The
foreign domain is not put to sleep because itself could be the one running the
pager assist (typical scenario for dom0).
This patch adds support for this mechanism for backend drivers using grant
mapping and copying operations. Specifically, this covers the blkback and
gntdev drivers (which map foreign grants), and the netback driver (which copies
foreign grants).
* Add a retry method for grants that fail with GNTST_eagain (i.e. because the
target foreign frame is paged out).
* Insert hooks with appropriate wrappers in the aforementioned drivers.
The retry loop is only invoked if the grant operation status is GNTST_eagain.
It guarantees to leave a new status code different from GNTST_eagain. Any other
status code results in identical code execution as before.
The retry loop performs 256 attempts with increasing time intervals through a
32 second period. It uses msleep to yield while waiting for the next retry.
V2 after feedback from David Vrabel:
* Explicit MAX_DELAY instead of wrap-around delay into zero
* Abstract GNTST_eagain check into core grant table code for netback module.
V3 after feedback from Ian Campbell:
* Add placeholder in array of grant table error descriptions for unrelated
error code we jump over.
* Eliminate single map and retry macro in favor of a generic batch flavor.
* Some renaming.
* Bury most implementation in grant_table.c, cleaner interface.
V4 rebased on top of sync of Xen grant table interface headers.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[v5: Fixed whitespace issues]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
After SKB is queued into tx_queue, it will be freed if request_gop is NULL.
However, no dequeue action is called in this situation, it is likely that
tx_queue constains freed SKB. This patch should fix this issue, and it is
based on 3.5.0-rc4+.
This issue is found through code inspection, no bug is seen with it currently.
I run netperf test for several hours, and no network regression was found.
Signed-off-by: Annie Li <annie.li@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calculating the number of slots required for a packet header, the code
was reserving too many slots if the header crossed a page boundary. Since
netbk_gop_skb copies the header to the start of the page, the count of
slots required for the header should be based solely on the header size.
This problem is easy to reproduce if a VIF is bridged to a USB 3G modem
device as the skb->data value always starts near the end of the first page.
Signed-off-by: Simon Graham <simon.graham@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc failures use dump_stack so emitting an additional
out-of-memory message is an unnecessary duplication.
Remove the allocation failure messages.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'stable/for-linus-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (37 commits)
xen/pciback: Expand the warning message to include domain id.
xen/pciback: Fix "device has been assigned to X domain!" warning
xen/pciback: Move the PCI_DEV_FLAGS_ASSIGNED ops to the "[un|]bind"
xen/xenbus: don't reimplement kvasprintf via a fixed size buffer
xenbus: maximum buffer size is XENSTORE_PAYLOAD_MAX
xen/xenbus: Reject replies with payload > XENSTORE_PAYLOAD_MAX.
Xen: consolidate and simplify struct xenbus_driver instantiation
xen-gntalloc: introduce missing kfree
xen/xenbus: Fix compile error - missing header for xen_initial_domain()
xen/netback: Enable netback on HVM guests
xen/grant-table: Support mappings required by blkback
xenbus: Use grant-table wrapper functions
xenbus: Support HVM backends
xen/xenbus-frontend: Fix compile error with randconfig
xen/xenbus-frontend: Make error message more clear
xen/privcmd: Remove unused support for arch specific privcmp mmap
xen: Add xenbus_backend device
xen: Add xenbus device driver
xen: Add privcmd device driver
xen/gntalloc: fix reference counts on multi-page mappings
...
All tables of function pointers should be const to make hacks
more difficult. Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'name', 'owner', and 'mod_name' members are redundant with the
identically named fields in the 'driver' sub-structure. Rather than
switching each instance to specify these fields explicitly, introduce
a macro to simplify this.
Eliminate further redundancy by allowing the drvname argument to
DEFINE_XENBUS_DRIVER() to be blank (in which case the first entry from
the ID table will be used for .driver.name).
Also eliminate the questionable xenbus_register_{back,front}end()
wrappers - their sole remaining purpose was the checking of the
'owner' field, proper setting of which shouldn't be an issue anymore
when the macro gets used.
v2: Restore DRV_NAME for the driver name in xen-pciback.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
"variables a used" should be "variables are used".
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New value for netbk->mmap_pages[pending_idx] is assigned in
xen_netbk_alloc_page(), no need for a second assignment which
exposes internal to the outside world.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original message in netback_init was 'kthread_run() fails', which should be
'kthread_create() fails'.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
v2: add couple missing conversions in drivers
split unexporting netdev_fix_features()
implemented %pNF
convert sock::sk_route_(no?)caps
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
net: xen-netback: use API provided by xenbus module to map rings
block: xen-blkback: use API provided by xenbus module to map rings
xen: use generic functions instead of xen_{alloc, free}_vm_area()
The xenbus module provides xenbus_map_ring_valloc() and
xenbus_map_ring_vfree(). Use these to map the Tx and Rx ring pages
granted by the frontend.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.
Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netback currently uses frag->page to store a temporary index reference while
processing incoming requests. Since frag->page is to become opaque switch
instead to using page_offset. Add a wrapper to tidy this up and propagate the
fact that the indexes are only u16 through the code (this was already true in
practice but unsigned long and in were inconsistently used as variable and
parameter types)
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: xen-devel@lists.xensource.com
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
If a VM is saved and restored (or migrated) the netback driver will no
longer process any Tx packets from the frontend. xenvif_up() does not
schedule the processing of any pending Tx requests from the front end
because the carrier is off. Without this initial kick the frontend
just adds Tx requests to the ring without raising an event (until the
ring is full).
This was caused by 47103041e9 (net:
xen-netback: convert to hw_features) which reordered the calls to
xenvif_up() and netif_carrier_on() in xenvif_connect().
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add xen-backend:vif module alias to the xen-netback module. This allows
automatic loading of the module.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The later causes warnings with gcc 4.5+. __CONST_RING_SIZE was introduced in
667c78afae to fix this but as netback wasn't upstream at the time it did not
benefit, hence:
CC drivers/net/xen-netback/netback.o
drivers/net/xen-netback/netback.c:110:37: warning: variably modified 'grant_copy_op' at file scope [enabled by default]
drivers/net/xen-netback/netback.c:111:30: warning: variably modified 'meta' at file scope [enabled by default]
drivers/net/xen-netback/netback.c: In function 'xen_netbk_rx_action':
drivers/net/xen-netback/netback.c:584:6: warning: variable 'irq' set but not used [-Wunused-but-set-variable]
Thanks to Witold Baryluk for pointing this out.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>