What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.
There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)
common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear. I used
Rusty's lguest.c as the source for developing this part : this supplied
me with witty comments I wouldn't be able to write myself.
What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.
How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device. Backend is also configured by userspace, including vlan/mac
etc.
Status: This works for me, and I haven't see any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.
Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use
Note on RCU usage (this is also documented in vhost.h, near
private_pointer which is the value protected by this variant of RCU):
what is happening is that the rcu_dereference() is being used in a
workqueue item. The role of rcu_read_lock() is taken on by the start of
execution of the workqueue item, of rcu_read_unlock() by the end of
execution of the workqueue item, and of synchronize_rcu() by
flush_workqueue()/flush_work(). In the future we might need to apply
some gcc attribute or sparse annotation to the function passed to
INIT_WORK(). Paul's ack below is for this RCU usage.
(Includes fixes by Alan Cox <alan@linux.intel.com>,
David L Stevens <dlstevens@us.ibm.com>,
Chris Wright <chrisw@redhat.com>)
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vhost net module wants to do copy to/from user from a kernel thread,
which needs use_mm. Export it to modules.
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tun device looks similar to a packet socket
in that both pass complete frames from/to userspace.
This patch fills in enough fields in the socket underlying tun driver
to support sendmsg/recvmsg operations, and message flags
MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
to modules. Regular read/write behaviour is unchanged.
This way, code using raw sockets to inject packets
into a physical device, can support injecting
packets into host network stack almost without modification.
First user of this interface will be vhost virtualization
accelerator.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds error checking of ctrlmode values for CAN devices. As
an example all availabe bits are implemented in the mcp251x driver.
Signed-off-by: Christian Pellegrin <chripell@fsfe.org>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GNU General Public License is in file "COPYING".
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before sending Interrupt coalesce parameters to device,
convert them in little endian.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In netxen_read_mac_addr, mac_addr should be declared
u64 instead of __le64, used by host only.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noticed by H Hartley Sweeten, since change_nexthops() uses 'nh'
as it's iterator variable, it can conflict with other existing
local vars.
Use "nexthop_nh" to avoid the conflict and make it easier to figure
out where this magic variable comes from.
Signed-off-by: David S. Miller <davem@davemloft.net>
In sock_getsockopt the symbol 'lv' is declared as an
unsigned int type, probably due to sizeof returning a
size_t which is really an unsigned int.
This produces a sparse warning for SO_PEERNAME due to
the sock->ops->getname() call:
warning: incorrect type in argument 3 (different signedness)
expected int *sockaddr_len
got unsigned int *<noident>
Quiet the warning by changing the type of 'lv' to an int.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert code away from ->read_proc/->write_proc interfaces. Switch to
proc_create()/proc_create_data() which make addition of proc entries
reliable wrt NULL ->proc_fops, NULL ->data and so on.
Problem with ->read_proc et al is described here commit
786d7e1612 "Fix rmmod/read/write races in
/proc entries"
[akpm@linux-foundation.org: CONFIG_PROC_FS=n build fix]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
there is a unnecessary test which can be replaced by a good initialization in
the 'for' statement
Noticed by Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Samir Bellabes <sam@synack.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set default BLKT values for new OSA/3 hardware.
Signed-off-by: Einar Lueck <elelueck@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a qeth device is set online, several initialisation steps are
performed. If a failure in one of these steps occurs, the qeth
device is reset into DOWN state. If due to the failure a qeth recovery
is scheduled and started in another thread, this might cause all kinds
of conflicts, even a kernel panic. The patch forbids scheduling of a
qeth recovery while online processing is performed till the card is in
state SOFTSETUP.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New feature to trace HiperSockets network traffic for debugging
purposes.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make updating the multicast address list generic for all families and
enforce the requirement to update the entire multicast table array all at
once instead of piecemeal which causes problems on some parts.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide MAC-specific function pointer to determine the LAN ID (PCI func).
The LAN ID is used internally by the driver to determine which h/w lock
to use to protect accessing the PHY on ESB2 as well as help to determine
the alternate MAC address on some parts.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to 82571/2/3 parts that already do this, if ESB2/80003es2lan parts
have an alternate MAC address provided in the EEPROM use it instead of the
default MAC address. This patch makes the the actual code that does this
generic so that it can be better used by both MAC families.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should allow the removal of the #defines and uses
of NIPQUAD and NIPQUAD_FMT
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To prevent the CAN drivers to operate on invalid socketbuffers the skbs are
now checked and silently dropped at the xmit-function consistently.
Also the netdev stats are consistently using the CAN data length code (dlc)
for [rx|tx]_bytes now.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the kernel portions needed to implement
RFC 5082 Generalized TTL Security Mechanism (GTSM).
It is a lightweight security measure against forged
packets causing DoS attacks (for BGP).
This is already implemented the same way in BSD kernels.
For the necessary Quagga patch
http://www.gossamer-threads.com/lists/quagga/dev/17389
Description from Cisco
http://www.cisco.com/en/US/docs/ios/12_3t/12_3t7/feature/guide/gt_btsh.html
It does add one byte to each socket structure, but I did
a little rearrangement to reuse a hole (on 64 bit), but it
does grow the structure on 32 bit
This should be documented on ip(4) man page and the Glibc in.h
file also needs update. IPV6_MINHOPLIMIT should also be added
(although BSD doesn't support that).
Only TCP is supported, but could also be added to UDP, DCCP, SCTP
if desired.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pci_dma_mapping_error should be used to test return value of
pci_map_single or pci_map_page.
Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skip MAC loopback test when the adapter is set to a VT mode such as SR-IOV
or VMDq
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds SR-IOV features supported by the 82599 controller to the main driver
module. If the CONFIG_PCI_IOV kernel option is selected then the SR-IOV
features are enabled. Use the max_vfs module option to allocate up to 63
virtual functions per physical port.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds code to the core 82599 module to support SR-IOV features of the 82599
network controller
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the mailbox and SR-IOV feature modules to the ixgbe driver Makefile.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module and header file add functions to support SR-IOV features of the
82599 10Gbe controller.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds register definitions, bit definitions and structures used by
the driver to support SR-IOV features of the 82599 controller.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 82599 virtual function device and the master 82599 physical function
device implement a mailbox utility for communication between the devices
using some SRAM scratch memory and a doorbell/answering mechanism enabled
via interrupt and/or polling. This C module and accompanying header
file implement the base functions for use of this feature.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modifications for the Kconfig and network device Makefile to add the ixgbevf
driver module to the kernel plus basic driver documentation.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
82599 Virtual Function Device Driver Makefile
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These modules and header contain the Linux OS network interface code and core
interrupt and network send/receive handlers.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 82599 virtual function device and the master 82599 physical function
device implement a mailbox utility for communication between the devices
using some SRAM scratch memory and a doorbell/answering mechanism enabled
via interrupt and/or polling. This C module and accompanying header
file implement the base functions for use of this feature.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module and header file contain the core functions for the 82599
virtual function device.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two headers define the commonly used macros, data structures,
register bits and register offsets used by the ixgbevf driver on the
82599 virtual function device
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanse found a potential null dereference in ircomm_tty_close
and ircomm_tty_hangup. There is a check for tty being NULL,
but it is dereferenced earlier. But it is bogus, the tty cannot
be NULL, so remove the !tty checks.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanse found a potential null dereference in snmp6_unregister_dev.
There is a check for idev being NULL, but it is dereferenced
earlier. But idev cannot be NULL when passed to
snmp6_unregister_dev, so remove the test.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 18c0019102 ("drivers/net/mac8390.c:
Convert printk(KERN_<level> to pr_<level>(") broke the build:
| drivers/net/mac8390.c:306: error: expected ')' before 'version'
as "version" is not a string literal, but a variable.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
normal users are currently allowed to set/modify ebtables rules.
Restrict it to processes with CAP_NET_ADMIN.
Note that this cannot be reproduced with unmodified ebtables binary
because it uses SOCK_RAW.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds support for the global device reset interrupt. Without
this change the drivers will report tx hangs on all ports when a global
device reset occurs.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds documentation for the MSCAN OF device bindings for
the MPC512x and moves the one for the MPC5200 to the new common file
"Documentation/powerpc/dts-bindings/fsl/can.txt".
Signed-off-by: Wolfgang Grandegger <wg@denx.de>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The main differences compared to the MSCAN on the MPC5200 are:
- More flexibility in choosing the CAN source clock and frequency:
Three different clock sources can be selected: "ip", "ref" or "sys".
For the latter two, a clock divider can be defined as well. If the
clock source is not specified by the device tree, we first try to
find an optimal CAN source clock based on the system clock. If that
is not possible, the reference clock will be used.
- The behavior of bus-off recovery is configurable:
To comply with the usual handling of Socket-CAN bus-off recovery,
"recovery on request" is selected (instead of automatic recovery).
Note that only MPC5121 Rev. 2 and later is supported.
Signed-off-by: Wolfgang Grandegger <wg@denx.de>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The start_xmit function of the MSCAN Driver did return improperly if
the CAN dlc check failed (skb not freed and invalid return code). This
patch adds a proper check of the frame lenght and data size and returns
now correctly. The invalid skb packets are dropped silently as suggested
by David Miller in the thread "[RFC] ndo_validate_skb: Let the netdev
check a valid skb content" on the netdev mailing list.
Furthermore, a typo has been fixed.
Signed-off-by: Wolfgang Grandegger <wg@denx.de>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
General cleanup of the ep93xx_eth driver.
1) Use pr_fmt() to prefix the module name and __func__ to the error
messages.
2) <linux/io.h> instead of <asm/io.h>
3) <mach/hardware.h> instead of <mach/ep93xx-regs.h> and <mach/platform.h>
4) Move the ep93xx_mdio_read (and ep93xx_mdio_write) function to eliminate
the function prototype.
5) Change all the printk(<level> messages to pr_<level> and remove the
__func__ argument.
6) Use platform_get_{resource/irq} to get the platform resources and add
an error check.
7) Use resource_size() for request_mem_region() and ioremap().
8) Use %pM to print the MAC address at the end of the probe.
9) Use dev->dev_addr not data->dev_addr for the MAC argument because a
random address could be used if the platform does not supply one.
The message at the end of the probe is left as a printk since it displays
cleaner without the function name that would be displayed with pr_info().
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Lennert Buytenhek <kernel@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>