The lock_kernel/unlock_kernel() in cycle_kernel_lock() which is called
in tun_chr_open() is not serializing against anything and safe to
remove.
tun_chr_fasync() is serialized by get/put_tun() and fasync_helper()
has no dependency on BKL. The modification of tun->flags is racy with
and without the BKL so removing it does not make it worse.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 2b980dbd77
("lsm: Add hooks to the TUN driver") tun_set_iff doesn't
return -EINVAL though neither IFF_TUN nor IFF_TAP is set.
Signed-off-by: Kusanagi Kouichi <slash@ma.neweb.ne.jp>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows subsytems to provide devtmpfs with non-default permissions
for the device node. Instead of the default mode of 0600, null, zero,
random, urandom, full, tty, ptmx now have a mode of 0666, which allows
non-privileged processes to access standard device nodes in case no
other userspace process applies the expected permissions.
This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...
Fixed up trivial conflicts:
- arch/x86/include/asm/socket.h
converted to <asm-generic/socket.h> in the x86 tree. The generic
header has the same new #define's, so that works out fine.
- drivers/net/tun.c
fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.
Noted in 'next' by Stephen Rothwell.
As tun always has an embeedded struct sock,
use sk and sk_receive_queue fields instead of
duplicating them in tun_struct.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are all drivers that don't touch real hardware.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TUN driver lacks any LSM hooks which makes it difficult for LSM modules,
such as SELinux, to enforce access controls on network traffic generated by
TUN users; this is particularly problematic for virtualization apps such as
QEMU and KVM. This patch adds three new LSM hooks designed to control the
creation and attachment of TUN devices, the hooks are:
* security_tun_dev_create()
Provides access control for the creation of new TUN devices
* security_tun_dev_post_create()
Provides the ability to create the necessary socket LSM state for newly
created TUN devices
* security_tun_dev_attach()
Provides access control for attaching to existing, persistent TUN devices
and the ability to update the TUN device's socket LSM state as necessary
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Eric Paris <eparis@parisplace.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Morris <jmorris@namei.org>
As it is, parts of the ioctl runs under the RTNL and parts of
it do not. The unlocked section is still protected by the BKL,
but there can be subtle races. For example, Eric Biederman and
Paul Moore observed that if two threads tried to create two tun
devices on the same file descriptor, then unexpected results
may occur.
As there isn't anything in the ioctl that is expected to sleep
indefinitely, we can prevent this from occurring by extending
the RTNL lock coverage.
This also allows to get rid of the BKL.
Finally, I changed tun_get_iff to take a tun device in order to
avoid calling tun_put which would dead-lock as it also tries to
take the RTNL lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Allow setting UFO on tap device and handle UFO packets.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
---------------------------------------------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is the result of an automatic spatch transformation to convert
all ndo_start_xmit() return values of 0 to NETDEV_TX_OK.
Some occurences are missed by the automatic conversion, those will be
handled in a seperate patch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is currently possible for an asynchronous device unregister
to cause the same tun device to be unregistered twice. This
is because the unregister in tun_chr_close only checks whether
__tun_get(tfile) != NULL. This however has nothing to do with
whether the device has already been unregistered. All it tells
you is whether __tun_detach has been called.
This patch fixes this by using the most obvious thing to test
whether the device has been unregistered.
It also moves __tun_detach outside of rtnl_unlock since nothing
that it does requires that lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for misc devices to report their requested nodename to
userspace. It also updates a number of misc drivers to provide the
needed subdirectory and device name to be used for them.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It is possible for tun_chr_close to race with dellink on the
a tun device. In which case if __tun_get runs before dellink
but dellink runs before tun_chr_close calls unregister_netdevice
we will attempt to unregister the netdevice after it is already
gone.
The two cases are already serialized on the rtnl_lock, so I have
gone for the cheap simple fix of moving rtnl_lock to cover __tun_get
in tun_chr_close. Eliminating the possibility of the tun device
being unregistered between __tun_get and unregister_netdevice in
tun_chr_close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Tested-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the right structure while incrementing the offset in tun_get_user.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As all current versions of virtio_net generate a value for the
header length that's too small, we should optimise this so that
we don't copy it twice. This can be done by ensuring that it is
at least as large as the place where we'll write the checksum.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I added socket accounting to tun I inadvertently introduced
spurious wake-up events that kills qemu performance. The problem
occurs when qemu polls on the tun fd for read, and then transmits
packets. For each packet transmitted, we will wake up qemu even
if it only cares about read events.
Now this affects all sockets, but it is only a new problem for
tun. So this patch tries to fix it for tun first and we can then
look at the problem in general.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds three attribute files in /sys/class/net/$dev/ for tun
devices; allowing userspace to obtain the information which TUNGETIFF
offers, and more, but without having to attach to the device in question
(which may not be possible if it's in use).
It also fixes a bug which has been present in the TUNGETIFF ioctl since
its inception, where it would never set IFF_TUN or IFF_TAP according to
the device type. (Look carefully at the code which I remove from
tun_get_iff() and how the new tun_flags() helper is subtly different).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating a certain types of VPN, NetworkManager will first attempt
to find an available tun device by iterating through 'vpn%d' until it
finds one that isn't already busy. Then it'll set that to be persistent
and owned by the otherwise unprivileged user that the VPN dæmon itself
runs as.
There's a race condition here -- during the period where the vpn%d
device is created and we're waiting for the VPN dæmon to actually
connect and use it, if we try to create _another_ device we could end up
re-using the same one -- because trying to open it again doesn't get
-EBUSY as it would while it's _actually_ busy.
So solve this, we add an IFF_TUN_EXCL flag which causes tun_set_iff() to
fail if it would be opening an existing persistent tundevice -- so that
we can make sure we're getting an entirely _new_ device.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct
iovec * and modifies the iovec. As a result, attempts to use io_submit
to send packets to a tun device fail with weird errors such as EINVAL.
Since tun is the only user of skb_copy_datagram_from_iovec, we can
fix this simply by changing the later so that it does not
touch the iovec passed to it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aio_read gets const struct iovec * but tun_chr_aio_read casts this to struct
iovec * and modifies the iovec. As a result, attempts to use io_submit
to get packets from a tun device fail with weird errors such as EINVAL.
Fix by using the new skb_copy_datagram_const_iovec.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the sk_sleep wait queue actually lives in tfile, which may be
detached from the tun device, bad things will happen when we use
sk_sleep after detaching.
Since the tun device is the persistent data structure here (when
requested by the user), it makes much more sense to have the wait
queue live there. There is no reason to have it in tfile at all
since the only time we can wait is if we have a tun attached.
In fact we already have a wait queue in tun_struct, so we might
as well use it.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit c70f182940 ("tun: Fix
races between tun_net_close and free_netdev") fixed a race where
an asynchronous deletion of a tun device can hose a poll(2) on
a tun fd attached to that device.
However, this came at the cost of moving the tun wait queue into
the tun file data structure. The problem with this is that it
imposes restrictions on when and where the tun device can access
the wait queue since the tun file may change at any time due to
detaching and reattaching.
In particular, now that we need to use the wait queue on the
receive path it becomes difficult to properly synchronise this
with the detachment of the tun device.
This patch solves the original race in a different way. Since
the race is only because the underlying memory gets freed, we
can prevent it simply by ensuring that we don't do that until
all tun descriptors ever attached to the device (even if they
have since be detached because they may still be sitting in poll)
have been closed.
This is done by using reference counting the attached tun file
descriptors. The refcount in tun->sk has been reappropriated
for this purpose since it was already being used for that, albeit
from the opposite angle.
Note that we no longer zero tfile->tun since tun_get will return
NULL anyway after the refcount on tfile hits zero. Instead it
represents whether this device has ever been attached to a device.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I made the tun driver use non-linear packets as the preferred
option, it broke non-GSO users because they would end up allocating
a completely non-linear packet, which triggers a crash when we call
eth_type_trans.
This patch reverts non-GSO users to using linear packets and adds
a check to ensure that GSO users can't cause crashes in the same
way.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When forward-porting the tun accounting patch I managed to break
the send path compltely by dropping the tun_get call.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tap devices can make use of a small MAC filter set via the
TUNSETTXFILTER ioctl. The filter has a set of exact matches
plus a hash for imperfect filtering of additional multicast
addresses. The current code is unbalanced, adding unicast
addresses to the multicast hash, but only checking the hash
against multicast addresses. This results in the filter
dropping unicast addresses that overflow the exact filter.
The fix is simply to disable the filter by leaving count set
to zero if we find non-multicast addresses after the exact
match table is filled.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike a normal socket path, the tuntap device send path does
not have any accounting. This means that the user-space sender
may be able to pin down arbitrary amounts of kernel memory by
continuing to send data to an end-point that is congested.
Even when this isn't an issue because of limited queueing at
most end points, this can also be a problem because its only
response to congestion is packet loss. That is, when those
local queues at the end-point fills up, the tuntap device will
start wasting system time because it will continue to send
data there which simply gets dropped straight away.
Of course one could argue that everybody should do congestion
control end-to-end, unfortunately there are people in this world
still hooked on UDP, and they don't appear to be going away
anywhere fast. In fact, we've always helped them by performing
accounting in our UDP code, the sole purpose of which is to
provide congestion feedback other than through packet loss.
This patch attempts to apply the same bandaid to the tuntap device.
It creates a pseudo-socket object which is used to account our
packets just as a normal socket does for UDP. Of course things
are a little complex because we're actually reinjecting traffic
back into the stack rather than out of the stack.
The stack complexities however should have been resolved by preceding
patches. So this one can simply start using skb_set_owner_w.
For now the accounting is essentially disabled by default for
backwards compatibility. In particular, we set the cap to INT_MAX.
This is so that existing applications don't get confused by the
sudden arrival EAGAIN errors.
In future we may wish (or be forced to) do this by default.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Tokarev wrote:
[]
> 2, and this is the main one: How about supplementary groups?
>
> Here I have a valid usage case: a group of testers running various
> versions of windows using KVM (kernel virtual machine), 1 at a time,
> to test some software. kvm is set up to use bridge with a tap device
> (there should be a way to connect to the machine). Anyone on that group
> has to be able to start/stop the virtual machines.
>
> My first attempt - pretty obvious when I saw -g option of tunctl - is
> to add group ownership for the tun device and add a supplementary group
> to each user (their primary group should be different). But that fails,
> since kernel only checks for egid, not any other group ids.
>
> What's the reasoning to not allow supplementary groups and to only check
> for egid?
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Base versions handle constant folding now.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This greatly simplifies testing to verify I have fixed the problems
with a tun device disappearing when the tun file descriptor is still
held open.
Further it allows removal network namespace operations for the tun
driver. Reducing the network namespace handling in the driver to the
minimum. i.e. When we are creating a tun device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the awkward case between free_netdev and dev_chr_close fixed
there is no longer any need to limit tun and tap devices to the
network namespace they were created in. So remove the
NETIF_F_NETNS_LOCAL flag on the network device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tun code does not cope gracefully if the network device goes away before
the tun file descriptor is closed. It looks like we can trigger this with
rmmod, and moving tun devices between network namespaces will allow this
to be triggered when network namespaces exit.
To fix this I introduce an intermediate data structure tun_file which
holds a count of users and a pointer to the struct tun_struct. tun_get
increments that reference count if it is greater than 0. tun_put decrements
that reference count and detaches from the network device if the count is 0.
While we have a file attached to the network device I hold a reference
to the network device keeping it from going away completely.
When a network device is unregistered I decrement the count of the
attached tun_file and if that was the last user I detach the tun_file,
and all processes on read_wait are woken up to ensure they do not
sleep indefinitely. As some of those sleeps happen with the count on
the tun device elevated waking up the read waiters ensures that
tun_file will be detached in a timely manner.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The poll interface requires that the waitqueue exist while the struct
file is open. In the rare case when a tun device disappears before
the tun file closes we fail to provide this property, so move
read_wait.
This is safe now that tun_net_xmit is atomic with tun_detach.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently this small race allows for a packet to be received when we
detach from an tun device and still be enqueued. Not especially
important but not what the code is trying to do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grabbing namespaces in open, and putting them in close always seems to
be the cleanest approach with the fewest surprises.
So now that we have tun_file so we have somepleace to put the network
namespace, let's grab the network namespace on file open and put on
file close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tun code suffers from only having a single word of
data that exists for the entire life of the tun file descriptor.
This results in peculiar holding of references to the network namespace
as well as races between free_netdevice and tun_chr_close.
Fix this by introducing tun_file which will hold the per file state.
For the moment it still holds just a single word so the differences
are all logic changes with no changes in semantics.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EBADF is meaningless in the context of a poll mask so use POLLERR
instead.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for two different tasks with access to the same file
descriptor to call tun_set_iff on it at the same time and race to
attach to a tap device. Prevent this by placing all of the logic to
attach to a file descriptor in one function and testing the file
descriptor to be certain it is not already attached to another tun
device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tun driver keeps a private list of tun devices for what
appears to be a small gain in performance when reconnecting a file
descriptor to an existing tun or tap device. So simplify the code by
removing it.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
register_pernet_gen_device() expects 'int*', found via sparse.
CHECK drivers/net/tun.c
drivers/net/tun.c:1245:36: warning: incorrect type in argument 1 (different signedness)
drivers/net/tun.c:1245:36: expected int *id
drivers/net/tun.c:1245:36: got unsigned int static [toplevel] *<noident>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the TUN/TAP tunnel driver to net_device_ops.
Split the ops in two, and retain compatability.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrap current->cred and a few other accessors to hide their actual
implementation.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>