This change moves some code handling data sequence numbers into a
separate function to avoid too much indentation. This is to prepare
for some changes to data sequence number handling in subsequent
patches.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Macro get_unused_fd() is used to allocate a file descriptor with
default flags. Those default flags (0) can be "unsafe":
O_CLOEXEC must be used by default to not leak file descriptor
across exec().
Instead of macro get_unused_fd(), functions anon_inode_getfd()
or get_unused_fd_flags() should be used with flags given by userspace.
If not possible, flags should be set to O_CLOEXEC to provide userspace
with a default safe behavor.
In a further patch, get_unused_fd() will be removed so that
new code start using anon_inode_getfd() or get_unused_fd_flags()
with correct flags.
This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.
The hard coded flag value (0) should be reviewed on a per-subsystem basis,
and, if possible, set to O_CLOEXEC.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_forward_skb() assignment of pkt_type should be done
after the call to eth_type_trans().
ip-encapsulated packets can be handled by localhost. But skb->pkt_type
can be PACKET_OTHERHOST when packet comes via veth into ip tunnel device.
In that case, the packet is dropped by ip_rcv().
Although this example uses gretap. l2tp-eth also has same issue.
For l2tp-eth case, add dummy device for ip address and ip l2tp command.
netns A | root netns | netns B
veth<->veth=bridge=gretap <-loop back-> gretap=bridge=veth<->veth
arp packet ->
pkt_type
BROADCAST------------>ip_rcv()------------------------>
<- arp reply
pkt_type
ip_rcv()<-----------------OTHERHOST
drop
sample operations
ip link add tapa type gretap remote 172.17.107.4 local 172.17.107.3
ip link add tapb type gretap remote 172.17.107.3 local 172.17.107.4
ip link set tapa up
ip link set tapb up
ip address add 172.17.107.3 dev tapa
ip address add 172.17.107.4 dev tapb
ip route get 172.17.107.3
> local 172.17.107.3 dev lo src 172.17.107.3
> cache <local>
ip route get 172.17.107.4
> local 172.17.107.4 dev lo src 172.17.107.4
> cache <local>
ip link add vetha type veth peer name vetha-peer
ip link add vethb type veth peer name vethb-peer
brctl addbr bra
brctl addbr brb
brctl addif bra tapa
brctl addif bra vetha-peer
brctl addif brb tapb
brctl addif brb vethb-peer
brctl show
> bridge name bridge id STP enabled interfaces
> bra 8000.6ea21e758ff1 no tapa
> vetha-peer
> brb 8000.420020eb92d5 no tapb
> vethb-peer
ip link set vetha-peer up
ip link set vethb-peer up
ip link set bra up
ip link set brb up
ip netns add a
ip netns add b
ip link set vetha netns a
ip link set vethb netns b
ip netns exec a ip address add 10.0.0.3/24 dev vetha
ip netns exec b ip address add 10.0.0.4/24 dev vethb
ip netns exec a ip link set vetha up
ip netns exec b ip link set vethb up
ip netns exec a arping -I vetha 10.0.0.4
ARPING 10.0.0.4 from 10.0.0.3 vetha
^CSent 2 probes (2 broadcast(s))
Received 0 response(s)
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Hong Zhiguo <honkiko@gmail.com>
Cc: Rami Rosen <ramirose@gmail.com>
Cc: Tom Parkin <tparkin@katalix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Jesse Gross <jesse@nicira.com>
Cc: dev@openvswitch.org
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a regression introduced by
commit fd58156e45 (IPIP: Use ip-tunneling code.)
Similar to GRE tunnel, previously we only check the parameters
for SIOCADDTUNNEL and SIOCCHGTUNNEL, after that commit, the
check is moved for all commands.
So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
Also, the check for i_key, o_key etc. is suspicious too,
which did not exist before, reset them before passing
to ip_tunnel_ioctl().
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing .owner of struct pppox_proto. This prevents the
module from being removed from underneath its users.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the patch "bnx2x: remove zeroing of dump data buffer" showed,
it is too easy implement .get_dump_data incorrectly in a driver.
Let's make sure drivers cannot get confused by userspace requesting
a too big dump.
Also WARN if the driver sets dump->len to something weird and make
sure the length reported to userspace is the actual length of data
copied to userspace.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
After having reworked the debugging framework, Neil and Vlad agreed to
get rid of the leftover SCTP_DBG_TSNS code for a couple of reasons:
We can use systemtap scripts to investigate these things, we now have
pr_debug() helpers that make life easier, and if we really need anything
else besides those tools, we will be forced to come up with something
better than we have there. Therefore, get rid of this ifdef debugging
code entirely for now.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vti module allocates dev->tstats twice: in vti_fb_tunnel_init()
and in vti_tunnel_init(), this lead to a memory leak of
dev->tstats.
Just remove the duplicated operations in vti_fb_tunnel_init().
(candidate for -stable)
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Saurabh Mohan <saurabh.mohan@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When testing GRE tunnel, I got:
# ip tunnel show
get tunnel gre0 failed: Invalid argument
get tunnel gre1 failed: Invalid argument
This is a regression introduced by commit c544193214
("GRE: Refactor GRE tunneling code.") because previously we
only check the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL,
after that commit, the check is moved for all commands.
So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
After this patch I got:
# ip tunnel show
gre0: gre/ip remote any local any ttl inherit nopmtudisc
gre1: gre/ip remote 192.168.122.101 local 192.168.122.45 ttl inherit
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should get rid of all own SCTP debug printk macros and use the ones
that the kernel offers anyway instead. This makes the code more readable
and conform to the kernel code, and offers all the features of dynamic
debbuging that pr_debug() et al has, such as only turning on/off portions
of debug messages at runtime through debugfs. The runtime cost of having
CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
is negligible [1]. If kernel debugging is completly turned off, then these
statements will also compile into "empty" functions.
While we're at it, we also need to change the Kconfig option as it /now/
only refers to the ifdef'ed code portions in outqueue.c that enable further
debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
was enabled with this Kconfig option and has now been removed, we
transform those code parts into WARNs resp. where appropriate BUG_ONs so
that those bugs can be more easily detected as probably not many people
have SCTP debugging permanently turned on.
To turn on all SCTP debugging, the following steps are needed:
# mount -t debugfs none /sys/kernel/debug
# echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control
This can be done more fine-grained on a per file, per line basis and others
as described in [2].
[1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
[2] Documentation/dynamic-debug-howto.txt
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
Here is current updates for vxlan in net-next. It includes Mike's changes
to handle multiple destinations and lots of little cosmetic stuff.
This is a fresh vxlan-next repository which was forked from net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Two of the x25 ioctl cases have error paths that break out of the function without
unlocking the socket, leading to this warning:
================================================
[ BUG: lock held when returning to user space! ]
3.10.0-rc7+ #36 Not tainted
------------------------------------------------
trinity-child2/31407 is leaving the kernel with locks still held!
1 lock held by trinity-child2/31407:
#0: (sk_lock-AF_X25){+.+.+.}, at: [<ffffffffa024b6da>] x25_ioctl+0x8a/0x740 [x25]
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following typical setup to implement a ~100 ms RTT and big
amount of reorders has very poor performance because netem
implements the time queue using a linked list.
-----------------------------------------------------------
ETH=eth0
IFB=ifb0
modprobe ifb
ip link set dev $IFB up
tc qdisc add dev $ETH ingress 2>/dev/null
tc filter add dev $ETH parent ffff: \
protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress \
redirect dev $IFB
ethtool -K $ETH gro off tso off gso off
tc qdisc add dev $IFB root netem delay 50ms 10ms limit 100000
tc qd add dev $ETH root netem delay 50ms limit 100000
---------------------------------------------------------
Switch netem time queue to a rb tree, so this kind of setup can work at
high speed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race in neighbour code, because neigh_destroy() uses
skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
while other parts of the code assume neighbour rwlock is what
protects arp_queue
Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
Use __skb_queue_head_init() instead of skb_queue_head_init()
to make clear we do not use arp_queue.lock
And hold neigh->lock in neigh_destroy() to close the race.
Reported-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no reason to skip ECMP lookup when oif is specified, but this implies
to check oif given by user when selecting another route.
When the new route does not match oif requirement, we simply keep the initial
one.
Spotted-by: dingzhi <zhi.ding@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of commit 218774dc34 ("ipv6: add
anti-spoofing checks for 6to4 and 6rd") the sit driver dropped packets
for 2002::/16 destinations and sources even when configured to work as a
tunnel with fixed endpoint. We may only apply the 6rd/6to4 anti-spoofing
checks if the device is not in pointopoint mode.
This was an oversight from me in the above commit, sorry. Thanks to
Roman Mamedov for reporting this!
Reported-by: Roman Mamedov <rm@romanrm.ru>
Cc: David Miller <davem@davemloft.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Yet one more pull request for wireless updates intended for 3.11...
For the mac80211 bits, Johannes says:
"Here we have a few memory leak fixes related to BSS struct handling
mostly from Ben, including a fix for a more theoretical problem
(associating while a BSS struct times out) from myself, a compilation
warning fix from Arend, mesh fixes from Thomas, tracking the beacon
bitrate (Alex), a bandwidth change event fix (Ilan) and some initial
work for 5/10 MHz channels from Simon."
Regarding the iwlwifi bits, Johannes says:
"Emmanuel removed some unneeded/unsupported module parameters and adds a
Bluetooth 1x1 lookup-table for some upcoming products. From Alex I have
an older patch to add low-power receive support, this depended on a
mac80211 commit that only just came in with the merge from wireless-next
I did. Ilan made beacon timings better, and Eytan added some debug
statements for thermal throttling. I have a few cleanups, a fix for a
long-standing but rare warning, and, arguably the most important patch
here, the firmware API version bump for the 7260/3160 devices."
Also included is a Bluetooth pull -- Gustavo says:
"Here goes a set of patches to 3.11. The biggest work here is from Andre Guedes
on the move of the Discovery to use the new request framework. Other than that
Johan provided a bunch of fixes to the L2CAP code. The rest are just small
fixes and clean ups."
On top of all that, there are a variety of updates and fixes to
brcmfmac, rt2x00, wil6210, ath9k, ath10k, and a few others here and
there. This also includes a pull of the wireless tree, in order to
prevent some merge conflicts.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Openvswitch uses function from NET_IPGRE_DEMUX module.
Add Kconfig dependency to fix following compilation errors:
http://marc.info/?l=linux-netdev&m=137244035226634
CC: Jesse Gross <jesse@nicira.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Pravin Shelar <pshelar@nicira.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following batch contains Netfilter/IPVS updates for net-next,
they are:
* Enforce policy to several nfnetlink subsystem, from Daniel
Borkmann.
* Use xt_socket to match the third packet (to perform simplistic
socket-based stateful filtering), from Eric Dumazet.
* Avoid large timeout for picked up from the middle TCP flows,
from Florian Westphal.
* Exclude IPVS from struct net if IPVS is disabled and removal
of unnecessary included header file, from JunweiZhang.
* Release SCTP connection immediately under load, to mimic current
TCP behaviour, from Julian Anastasov.
* Replace and enhance SCTP state machine, from Julian Anastasov.
* Add tweak to reduce sync traffic in the presence of persistence,
also from Julian Anastasov.
* Add tweak for the IPVS SH scheduler not to reject connections
directed to a server, choose a new one instead, from Alexander
Frolkin.
* Add support for sloppy TCP and SCTP modes, that creates state
information on any packet, not only initial handshake packets,
from Alexander Frolkin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The common case is that TCP/IP checksums have already been
verified, e.g. by hardware (rx checksum offload), or conntrack.
Userspace can use this flag to determine when the checksum
has not been validated yet.
If the flag is set, this doesn't necessarily mean that the packet has
an invalid checksum, e.g. if NIC doesn't support rx checksum.
Userspace that sucessfully enabled NFQA_CFG_F_GSO queue feature flag can
infer that IP/TCP checksum has already been validated if either the
SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED
flag is unset.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
assmued that "locally destined, and routed packets, never trigger
PMTU events or redirects that will be processed by us".
However, it seems that tunnel devices do trigger PMTU events in certain
cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
skb_dst(skb)->ops->update_pmtu to propage mtu information from the
outer flows. These can cause the inner flow mtu to be decreased. If
next hop exceptions are not consulted for pmtu, IP fragmentation will
not be done properly for these routes.
It also seems that we really need to have the PMTU information always
for netfilter TCPMSS clamp-to-pmtu feature to work properly.
So for the time being, cache separate copies of input routes for
each next hop exception.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC3590/RFC3810 specifies we should resend MLD reports as soon as a
valid link-local address is available.
We now use the valid_ll_addr_cnt to check if it is necessary to resend
a new report.
Changes since Flavio Leitner's version:
a) adapt for valid_ll_addr_cnt
b) resend first reports directly in the path and just arm the timer for
mc_qrv-1 resends.
Reported-by: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To reduce the number of unnecessary router solicitations, MLDv2 and IGMPv3
messages we need to track the number of valid (as in non-optimistic,
no-dad-failed and non-tentative) link-local addresses. Therefore, this
patch implements a valid_ll_addr_cnt in struct inet6_dev.
We now only emit router solicitations if the first link-local address
finishes duplicate address detection.
The changes for MLDv2 and IGMPv3 are in a follow-up patch.
While there, also simplify one if statement(one minor nit I made in one
of my previous patches):
if (!...)
do();
else
return;
<<into>>
if (...)
return;
do();
Cc: Flavio Leitner <fbl@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Suggested-by: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since (c05cdb1 netlink: allow large data transfers from user-space),
netlink splats if it invokes skb_clone on large netlink skbs since:
* skb_shared_info was not correctly initialized.
* skb->destructor is not set in the cloned skb.
This was spotted by trinity:
[ 894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
[ 894.991034] IP: [<ffffffff81a212c4>] skb_clone+0x24/0xc0
[...]
[ 894.991034] Call Trace:
[ 894.991034] [<ffffffff81ad299a>] nl_fib_input+0x6a/0x240
[ 894.991034] [<ffffffff81c3b7e6>] ? _raw_read_unlock+0x26/0x40
[ 894.991034] [<ffffffff81a5f189>] netlink_unicast+0x169/0x1e0
[ 894.991034] [<ffffffff81a601e1>] netlink_sendmsg+0x251/0x3d0
Fix it by:
1) introducing a new netlink_skb_clone function that is used in nl_fib_input,
that sets our special skb->destructor in the cloned skb. Moreover, handle
the release of the large cloned skb head area in the destructor path.
2) not allowing large skbuffs in the netlink broadcast path. I cannot find
any reasonable use of the large data transfer using netlink in that path,
moreover this helps to skip extra skb_clone handling.
I found two more netlink clients that are cloning the skbs, but they are
not in the sendmsg path. Therefore, the sole client cloning that I found
seems to be the fib frontend.
Thanks to Eric Dumazet for helping to address this issue.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.
When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The goal of this new function is to perform all needed cleanup before sending
an skb into another netns.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new tokenized address gets installed we send out just one
router solicition. We should send out `rtr_solicits' in case one router
advertisment got lost.
So, rearm the timer as we do in addrconf_dad_complete.
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 32b8a8e59c "sit: add IPv4 over IPv4 support",
tunnel->parms.iph.protocol is 0 when both 4in4 and 6in4 are setup, but
xfrm_lookup() is called only when proto is != 0, thus we need to pass the real
value.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
Just one patch this time.
1) Drop packets when the matching SA is in larval state and add a
statistic counter for that. From Fan Du.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add sync_persist_mode flag to reduce sync traffic
by syncing only persistent templates.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
By default the SH scheduler rejects connections that are hashed onto a
realserver of weight 0. This patch adds a flag to make SH choose a
different realserver in this case, instead of rejecting the connection.
The patch also adds a flag to make SH include the source port (TCP, UDP,
SCTP) in the hash as well as the source address. This basically allows
for deterministic round-robin load balancing (i.e., where any director
in a cluster of directors with identical config will send the same
packet the same way).
The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options
can be set per service. They are set using a new option to ipvsadm.
Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Drop SCTP connections under load (dropentry context) depending
on the protocol state, just like for TCP: INIT conns are
dropped immediately, established are dropped randomly while
connections in progress or shutdown are skipped.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Convert the SCTP state table, so that it is more readable.
Change the states to be according to the diagram in RFC 2960
and add more states suitable for middle box. Still, such
change in states adds incompatibility if systems in sync
setup include this change and others do not include it.
With this change we also have proper transitions in INPUT-ONLY
mode (DR/TUN) where we see packets only from client. Now
we should not switch to 10-second CLOSED state at a time
when we should stay in ESTABLISHED state.
The short names for states are because we have 16-char space
in ipvsadm and 11-char limit for the connection list format.
It is a sequence of the TCP implementation where the longest
state name is ESTABLISHED.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
This adds support for sloppy TCP and SCTP modes to IPVS.
When enabled (sysctls net.ipv4.vs.sloppy_tcp and
net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any
packet, not just a TCP SYN (or SCTP INIT).
This allows connections to fail over from one IPVS director to another
mid-flight.
Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Before now the schedulers needed access only to IP
addresses and it was easy to get them from skb by
using ip_vs_fill_iph_addr_only.
New changes for the SH scheduler will need the protocol
and ports which is difficult to get from skb for the
IPv6 case. As we have all the data in the iph structure,
to avoid the same slow lookups provide the iph to schedulers.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
The check for all-zero ether address was removed from rtnetlink core,
since Vxlan uses all-zero ether address to signify default address.
Need to add check back in for bridge.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
select/poll busy-poll support.
Split sysctl value into two separate ones, one for read and one for poll.
updated Documentation/sysctl/net.txt
Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.
When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.
Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to have an extra ret variable when we directly can return
the value of sctp_get_port_local().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather instead of having the endpoint clean the garbage from the
socket, use a sk_destruct handler sctp_destruct_sock(), that does
the job for that when there are no more references on the socket.
At least do this for our crypto transform through crypto_free_hash()
that is allocated when in listening state.
Also, perform sctp_put_port() only when sk is valid. At a later
point in time we can still determine if there's an option of
placing this into sk_prot->unhash() or sctp_endpoint_free() without
any races. For now, leave it in sctp_endpoint_destroy() though.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A trailing newline has been forgotten to add into the WARN().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.
We might as well remove all those, and operate directly on ktime
structures for a couple of reasons: ktime is available on all archs;
complexity of ktime calculations depending on the arch is less than
(reduces to a simple arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
functions (other archs); code becomes more readable; macros can be
thrown out.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the tokenized ip address is re-set on an interface we depend on the
arrival of a new router advertisment to call addrconf_verify to clean
up the old address (which valid_lft is now set to 0). Old addresses can
linger around for a longer time if e.g. the source of router advertisments
vanishes.
So, call addrconf_verify immediately after setting the new tokenized
address to get rid of the old tokenized addresses.
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reason behind this change is that as soon as we delete
the last ipv6 address of an interface we also lose the
/proc/sys/net/ipv6/conf/<interface> directory. This seems to be a
usability problem for me.
I don't see any reason why we should shutdown ipv6 on that interface in
such cases.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.
The reason behind this patch is to reduce the number of unneeded router
solicitations send out by the host if additional link-local addresses
are created. Currently we send out RS for every link-local address on
an interface.
If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.
Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>