Route lookups follow a general pattern in the ipv6 code wherein
we first find the non-IPSEC route, potentially override the
flow destination address due to ipv6 options settings, and then
finally make an IPSEC search using either xfrm_lookup() or
__xfrm_lookup().
__xfrm_lookup() is used when we want to generate a blackhole route
if the key manager needs to resolve the IPSEC rules (in this case
-EREMOTE is returned and the original 'dst' is left unchanged).
Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
resolution is necessary, we simply fail the lookup completely.
All of these cases are encapsulated into two routines,
ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which
handles unconnected UDP datagram sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP transmit path has been running under the socket lock
for a long time because of the corking feature. This means that
transmitting to the same socket in multiple threads does not
scale at all.
However, as most users don't actually use corking, the locking
can be removed in the common case.
This patch creates a lockless fast path where corking is not used.
Please note that this does create a slight inaccuracy in the
enforcement of socket send buffer limits. In particular, we
may exceed the socket limit by up to (number of CPUs) * (packet
size) because of the way the limit is computed.
As the primary purpose of socket buffers is to indicate congestion,
this should not be a great problem for now.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts UDP to use the new ip_finish_skb API. This
would then allows us to more easily use ip_make_skb which allows
UDP to run without a socket lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the helper ip_make_skb which is like ip_append_data
and ip_push_pending_frames all rolled into one, except that it does
not send the skb produced. The sending part is carried out by
ip_send_skb, which the transport protocol can call after it has
tweaked the skb.
It is meant to be called in cases where corking is not used should
have a one-to-one correspondence to sendmsg.
This patch also adds the helper ip_finish_skb which is meant to
be replace ip_push_pending_frames when corking is required.
Previously the protocol stack would peek at the socket write
queue and add its header to the first packet. With ip_finish_skb,
the protocol stack can directly operate on the final skb instead,
just like the non-corking case with ip_make_skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow simultaneous calls to ip_append_data on the same
socket, it must not modify any shared state in sk or inet (other
than those that are designed to allow that such as atomic counters).
This patch abstracts out write references to sk and inet_sk in
ip_append_data and its friends so that we may use the underlying
code in parallel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UFO doesn't really use the sk_sndmsg_* parameters so touching
them is pointless. It can't use them anyway since the whole
point of UFO is to use the original pages without copying.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabling TX timestamps (SO_TIMESTAMPING) for IPv6 UDP packets, in
the same fashion as for IPv4. Necessary in order for NICs such as
Intel 82580 to timestamp IPv6 packets.
Signed-off-by: Anders Berggren <anders@halon.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil pointed out that we can't send ARP reply on behalf of slaves,
we need to move the arp queue to their bond device.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
V4: rebase to net-next-2.6
This patch removes the flag IFF_IN_NETPOLL, we don't need it any more since
we have netpoll_tx_running() now.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This actually pointed out a (seemingly known) bug where we mangle the
pfkey header in a potentially shared SKB, which is fixed here.
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_unicast() return value is not of interest, we can silently ignore
it, save some instructions and four byte on the stack.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
hash is declared and assigned but not used anymore. ipv6_addr_hash()
exhibit no side-effects.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Declaration and assignment of newdp is removed. Usage of dccp_sk()
exhibit no side effects.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Do not fail if the peer supports more or less than 3 algorithms.
* Ignore unknown congestion control algorithms instead of failing.
* Simplify congestion algorithm negotiation (largest is best).
* Do not use a static buffer.
* Fix off-by-two read overflow.
* Avoid extra memory copy (in addition to skb_copy_bits()).
The previous code really made no sense.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_state already contains the pipe state.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Better document choke skb->cb[] use, like we did in netem and sfb
This adds a compile time check to make sure we dont exhaust skb->cb[]
space.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of debug message that are not useful, and enable
the log messages in case of error.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a patch originated with Stefano Salsano and Fabio Ludovici.
It provides several alternative loss models for use with netem.
This patch adds two state machine based loss models.
See: http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many users have wanted the old functionality that was lost
to be able to use pfifo as inner qdisc for netem. The reason that
netem could not be classful with the older API was because of the
limitations of the old dequeue/requeue interface; now that qdisc API has
a peek function, there is no longer a problem with using any
inner qdisc's.
This reverts commit 0220146411.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than magic constant in code, expose the maximum size of
packet distribution table in API. In iproute2, q_netem defines
MAX_DIST as 16K already.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netem probability table can be large (up to 64K bytes)
which may be too large to allocate in one contiguous chunk.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nla_put_nested to update netlink attribute value.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_newports() is the only place in the entire kernel that
cares about the port members in the routing cache entry's lookup
flow key.
Therefore the only reason we store an entire flow inside of the
struct rtentry is for this one special case.
Rewrite ip_route_newports() such that:
1) The caller passes in the original port values, so we don't need
to use the rth->fl.fl_ip_{s,d}port values to remember them.
2) The lookup flow is constructed by hand instead of being copied
from the routing cache entry's flow.
Signed-off-by: David S. Miller <davem@davemloft.net>