sit.c:118: ERROR: "foo * bar" should be "foo *bar"
sit.c:694: ERROR: "(foo*)" should be "(foo *)"
sit.c:724: ERROR: "(foo*)" should be "(foo *)"
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
Signed-off-by: David S. Miller <davem@davemloft.net>
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
Signed-off-by: David S. Miller <davem@davemloft.net>
In IPv4, if an RTA_IIF attribute is specified within an RTM_GETROUTE
message, then a route is searched as if a packet was received on the
specified 'iif' interface.
However in IPv6, RTA_IIF is not interpreted in the same way:
'inet6_rtm_getroute()' always calls 'ip6_route_output()', regardless the
RTA_IIF attribute.
As a result, in IPv6 there's no way to use RTM_GETROUTE in order to look
for a route as if a packet was received on a specific interface.
Fix 'inet6_rtm_getroute()' so that RTA_IIF is interpreted as "lookup a
route as if a packet was received on the specified interface", similar
to IPv4's 'inet_rtm_getroute()' interpretation.
Reported-by: Ami Koren <amikoren@yahoo.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk
8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m
u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB
ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m
rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl
eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45
HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+
/5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9
Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK
4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR
FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD
NypQthI85pc=
=G9mT
-----END PGP SIGNATURE-----
Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system
Pull "Disintegrate and delete asm/system.h" from David Howells:
"Here are a bunch of patches to disintegrate asm/system.h into a set of
separate bits to relieve the problem of circular inclusion
dependencies.
I've built all the working defconfigs from all the arches that I can
and made sure that they don't break.
The reason for these patches is that I recently encountered a circular
dependency problem that came about when I produced some patches to
optimise get_order() by rewriting it to use ilog2().
This uses bitops - and on the SH arch asm/bitops.h drags in
asm-generic/get_order.h by a circuituous route involving asm/system.h.
The main difficulty seems to be asm/system.h. It holds a number of
low level bits with no/few dependencies that are commonly used (eg.
memory barriers) and a number of bits with more dependencies that
aren't used in many places (eg. switch_to()).
These patches break asm/system.h up into the following core pieces:
(1) asm/barrier.h
Move memory barriers here. This already done for MIPS and Alpha.
(2) asm/switch_to.h
Move switch_to() and related stuff here.
(3) asm/exec.h
Move arch_align_stack() here. Other process execution related bits
could perhaps go here from asm/processor.h.
(4) asm/cmpxchg.h
Move xchg() and cmpxchg() here as they're full word atomic ops and
frequently used by atomic_xchg() and atomic_cmpxchg().
(5) asm/bug.h
Move die() and related bits.
(6) asm/auxvec.h
Move AT_VECTOR_SIZE_ARCH here.
Other arch headers are created as needed on a per-arch basis."
Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that. We'll find out anything that got broken and fix it..
* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
Delete all instances of asm/system.h
Remove all #inclusions of asm/system.h
Add #includes needed to permit the removal of asm/system.h
Move all declarations of free_initmem() to linux/mm.h
Disintegrate asm/system.h for OpenRISC
Split arch_align_stack() out from asm-generic/system.h
Split the switch_to() wrapper out of asm-generic/system.h
Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
Create asm-generic/barrier.h
Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
Disintegrate asm/system.h for Xtensa
Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
Disintegrate asm/system.h for Tile
Disintegrate asm/system.h for Sparc
Disintegrate asm/system.h for SH
Disintegrate asm/system.h for Score
Disintegrate asm/system.h for S390
Disintegrate asm/system.h for PowerPC
Disintegrate asm/system.h for PA-RISC
Disintegrate asm/system.h for MN10300
...
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>
Commit f2c31e32b3 (net: fix NULL dereferences in check_peer_redir() )
added a regression in rt6_fill_node(), leading to rcu_read_lock()
imbalance.
Thats because NLA_PUT() can make a jump to nla_put_failure label.
Fix this by using nla_put()
Many thanks to Ben Greear for his help
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It used to be an int, and it got changed to a bool parameter at least
7 years ago. It happens that NF_ACCEPT and NF_DROP are 0 and 1, so
this works, but it's unclear, and the check that it's in range is not
required.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 299b0767(ipv6: Fix IPsec slowpath fragmentation problem)
In func ip6_append_data,after call skb_put(skb, fraglen + dst_exthdrlen)
the skb->len contains dst_exthdrlen,and we don't reduce dst_exthdrlen at last
This will make fraggap>0 in next "while cycle",and cause the size of skb incorrent
Fix this by reserve headroom for dst_exthdrlen.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.
[ bug introduced in 96b52e61be (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 87a115783 ( ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack()
Many thanks to Dave Jones for providing a very complete bug report.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit d6ddef9e641d(IPv6: Fix not join all-router mcast group
when forwarding set.) I check 'dev' after it's dereference that
leads to a Smatch complaint:
net/ipv6/addrconf.c:438 ipv6_add_dev()
warn: variable dereferenced before check 'dev' (see line 432)
net/ipv6/addrconf.c
431 /* protected by rtnl_lock */
432 rcu_assign_pointer(dev->ip6_ptr, ndev);
^^^^^^^^^^^^
Old dereference.
433
434 /* Join all-node multicast group */
435 ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
436
437 /* Join all-router multicast group if forwarding is set
*/
438 if (ndev->cnf.forwarding && dev && (dev->flags &
IFF_MULTICAST))
^^^
Remove the check to avoid the complaint as 'dev' can't be NULL.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the infrastructure to add fine timeout tuning
over nfnetlink. Now you can use the NFNL_SUBSYS_CTNETLINK_TIMEOUT
subsystem to create/delete/dump timeout objects that contain some
specific timeout policy for one flow.
The follow up patches will allow you attach timeout policy object
to conntrack via the CT target and the conntrack extension
infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch defines a new interface for l4 protocol trackers:
unsigned int *(*get_timeouts)(struct net *net);
that is used to return the array of unsigned int that contains
the timeouts that will be applied for this flow. This is passed
to the l4proto->new(...) and l4proto->packet(...) functions to
specify the timeout policy.
This interface allows per-net global timeout configuration
(although only DCCP supports this by now) and it will allow
custom custom timeout configuration by means of follow-up
patches.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ipt_LOG and ip6_LOG have a lot of common code, merge them
to reduce duplicate code.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When forwarding was set and a new net device is register,
we need add this device to the all-router mcast group.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/sfc/rx.c
Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.
Signed-off-by: David S. Miller <davem@davemloft.net>
Niccolo Belli reported ipsec crashes in case we handle a frame without
mac header (atm in his case)
Before copying mac header, better make sure it is present.
Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809
Reported-by: Niccolò Belli <darkbasic@linuxsystems.it>
Tested-by: Niccolò Belli <darkbasic@linuxsystems.it>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_route_output() never returns NULL, so it is wrong to
check if the return value is NULL.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is only considered for MSG_PEEK flag and the value pointed by
it specifies where to start peeking bytes from. If the offset happens to
point into the middle of the returned skb, the offset within this skb is
put back to this very argument.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is not easily possible to get TOS/DSCP value of packets from
an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt
with IP_RECVTOS set, the same way as incoming TTL can be queried. This is
not actually implemented for TOS, though.
This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6
(IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of
the TOS field is stored from the other party's ACK.
This is needed for proxies which require DSCP transparency. One such example
is at http://zph.bratcheda.org/.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement helper inline function to get traffic class from IPv6 header.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPV6_UNICAST_IF feature is the IPv6 compliment to IP_UNICAST_IF.
Signed-off-by: Erich E. Hoover <ehoover@mines.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It went from unused, to commented out, and never changing after
that.
Just get rid of it, if someone wants it they can unearth it from
the history.
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP RST mechanism is broken in TCP md5(RFC2385). When
connection is gone, md5 key is lost, sending RST
without md5 hash is deem to ignored by peer. This can
be a problem since RST help protocal like bgp to fast
recove from peer crash.
In most case, users of tcp md5, such as bgp and ldp,
have listener on both sides to accept connection from peer.
md5 keys for peers are saved in listening socket.
There are two cases in finding md5 key when connection is
lost:
1.Passive receive RST: The message is send to well known port,
tcp will associate it with listner. md5 key is gotten from
listener.
2.Active receive RST (no sock): The message is send to ative
side, there is no socket associated with the message. In this
case, finding listener from source port, then find md5 key from
listener.
we are not loosing sercuriy here:
packet is checked with md5 hash. No RST is generated
if md5 hash doesn't match or no md5 key can be found.
Signed-off-by: Shawn Lu <shawn.lu@ericsson.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't check for NULL consistently in __xfrm6_output(). If "x" were
NULL here it would lead to an OOPs later. I asked Steffen Klassert
about this and he suggested that we remove the NULL check.
On 10/29/11, Steffen Klassert <steffen.klassert@secunet.com> wrote:
>> net/ipv6/xfrm6_output.c
>> 148
>> 149 if ((x && x->props.mode == XFRM_MODE_TUNNEL) &&
>> ^
>
> x can't be null here. It would be a bug if __xfrm6_output() is called
> without a xfrm_state attached to the skb. I think we can just remove
> this null check.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes sure we use appropriate memory barriers before
publishing tp->md5sig_info, allowing tcp_md5_do_lookup() being used from
tcp_v4_send_reset() without holding socket lock (upcoming patch from
Shawn Lu)
Note we also need to respect rcu grace period before its freeing, since
we can free socket without this grace period thanks to
SLAB_DESTROY_BY_RCU
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shawn Lu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to be able to support proper RST messages for TCP MD5 flows, we
need to allow access to MD5 keys without locking listener socket.
This conversion is a nice cleanup, and shrinks size of timewait sockets
by 80 bytes.
IPv6 code reuses generic code found in IPv4 instead of duplicating it.
Control path uses GFP_KERNEL allocations instead of GFP_ATOMIC.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shawn Lu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We no longer use md5_add() method from struct tcp_sock_af_ops
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC5722 Section 4 was amended by Errata 3089
Our implementation did the right thing anyway...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's only used to get at neigh->primary_key, which in this context is
always going to be the same as rt->rt6i_gateway.
Signed-off-by: David S. Miller <davem@davemloft.net>
In this specific situation we know we are dealing with a gatewayed route
and therefore rt6i_gateway is not going to be in6addr_any even in future
interpretations.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now all code paths grab a local reference to the neigh, so if neigh
is not NULL we unconditionally release it at the end. The old logic
would only release if we didn't have a non-NULL 'rt'.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only semantic difference is that we now hold a reference to the
neighbour and thus have to release it.
Signed-off-by: David S. Miller <davem@davemloft.net>
In the future the ipv4/ipv6 route gateway will take on two types
of values:
1) INADDR_ANY/IN6ADDR_ANY, for local network routes, and in this case
the neighbour must be obtained using the destination address in
ipv4/ipv6 header as the lookup key.
2) Everything else, the actual nexthop route address.
So if the gateway is not inaddr-any we use it, otherwise we must use
the packet's destination address.
Signed-off-by: David S. Miller <davem@davemloft.net>
md5 key is added in socket through remote address.
remote address should be used in finding md5 key when
sending out reset packet.
Signed-off-by: shawnlu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race condition in addrconf_sysctl_forward() and
addrconf_sysctl_disable().
These functions change idev->cnf.forwarding (resp. idev->cnf.disable_ipv6)
and then try to grab the rtnl lock before performing any actions.
If that fails they restore the original value and restart the syscall.
This creates race conditions if ipv6 code tries to access
these parameters, or if multiple instances try to do the same operation.
As an example of the former, if __ipv6_ifa_notify() finds a 0 in
idev->cnf.forwarding when invoked by addrconf_ifdown() it may not free
anycast addresses, ultimately resulting in the net_device not being freed.
This patch reads the user parameters into a temporary location and only
writes the actual parameters when the rtnl lock is acquired.
Tested in 2.6.38.8.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
tg3: Fix single-vector MSI-X code
openvswitch: Fix multipart datapath dumps.
ipv6: fix per device IP snmp counters
inetpeer: initialize ->redirect_genid in inet_getpeer()
net: fix NULL-deref in WARN() in skb_gso_segment()
net: WARN if skb_checksum_help() is called on skb requiring segmentation
caif: Remove bad WARN_ON in caif_dev
caif: Fix typo in Vendor/Product-ID for CAIF modems
bnx2x: Disable AN KR work-around for BCM57810
bnx2x: Remove AutoGrEEEn for BCM84833
bnx2x: Remove 100Mb force speed for BCM84833
bnx2x: Fix PFC setting on BCM57840
bnx2x: Fix Super-Isolate mode for BCM84833
net: fix some sparse errors
net: kill duplicate included header
net: sh-eth: Fix build error by the value which is not defined
net: Use device model to get driver name in skb_gso_segment()
bridge: BH already disabled in br_fdb_cleanup()
net: move sock_update_memcg outside of CONFIG_INET
mwl8k: Fixing Sparse ENDIAN CHECK warning
...
In commit 4ce3c183fc (snmp: 64bit ipstats_mib for all arches), I forgot
to change the /proc/net/dev_snmp6/xxx output for IP counters.
percpu array is 64bit per counter but the folding still used the 'long'
variant, and output garbage on 32bit arches.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
make C=2 CF="-D__CHECK_ENDIAN__" M=net
And fix flowi4_init_output() prototype for sport
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
capabilities: remove __cap_full_set definition
security: remove the security_netlink_recv hook as it is equivalent to capable()
ptrace: do not audit capability check when outputing /proc/pid/stat
capabilities: remove task_ns_* functions
capabitlies: ns_capable can use the cap helpers rather than lsm call
capabilities: style only - move capable below ns_capable
capabilites: introduce new has_ns_capabilities_noaudit
capabilities: call has_ns_capability from has_capability
capabilities: remove all _real_ interfaces
capabilities: introduce security_capable_noaudit
capabilities: reverse arguments to security_capable
capabilities: remove the task from capable LSM hook entirely
selinux: sparse fix: fix several warnings in the security server cod
selinux: sparse fix: fix warnings in netlink code
selinux: sparse fix: eliminate warnings for selinuxfs
selinux: sparse fix: declare selinux_disable() in security.h
selinux: sparse fix: move selinux_complete_init
selinux: sparse fix: make selinux_secmark_refcount static
SELinux: Fix RCU deref check warning in sel_netport_insert()
Manually fix up a semantic mis-merge wrt security_netlink_recv():
- the interface was removed in commit fd77846152 ("security: remove
the security_netlink_recv hook as it is equivalent to capable()")
- a new user of it appeared in commit a38f7907b9 ("crypto: Add
userspace configuration API")
causing no automatic merge conflict, but Eric Paris pointed out the
issue.
release idev when ip6_neigh_lookup failed in icmp6_dst_alloc
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
y).
We miss needed barriers, even on x86, when y is not NULL.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once upon a time netlink was not sync and we had to get the effective
capabilities from the skb that was being received. Today we instead get
the capabilities from the current task. This has rendered the entire
purpose of the hook moot as it is now functionally equivalent to the
capable() call.
Signed-off-by: Eric Paris <eparis@redhat.com>
This ensures a linear behaviour when filling /proc/net/if_inet6 thus making
ifconfig run really fast on IPv6 only addresses. In fact, with this patch and
the IPv4 one sent a while ago, ifconfig will run in linear time regardless of
address type.
IPv4 related patch: f04565ddf5
dev: use name hash for dev_seq_ops
...
Some statistics (running ifconfig > /dev/null on a different setup):
iface count / IPv6 no-patch time / IPv6 patched time / IPv4 time
----------------------------------------------------------------
6250 | 0.23 s | 0.13 s | 0.11 s
12500 | 0.62 s | 0.28 s | 0.22 s
25000 | 2.91 s | 0.57 s | 0.46 s
50000 | 11.37 s | 1.21 s | 0.94 s
128000 | 86.78 s | 3.05 s | 2.54 s
Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
Cc: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently Dave noticed that a test we did in ipv6_add_addr to see if we next hop
route for the interface we're adding an addres to was wrong (see commit
7ffbcecbee). for one, it never triggers, and two,
it was completely wrong to begin with. This test was meant to cover this
section of RFC 4429:
3.3 Modifications to RFC 2462 Stateless Address Autoconfiguration
* (modifies section 5.5) A host MAY choose to configure a new address
as an Optimistic Address. A host that does not know the SLLAO
of its router SHOULD NOT configure a new address as Optimistic.
A router SHOULD NOT configure an Optimistic Address.
This patch should bring us into proper compliance with the above clause. Since
we only add a SLAAC address after we've received a RA which may or may not
contain a source link layer address option, we can pass a pointer to that option
to addrconf_prefix_rcv (which may be null if the option is not present), and
only set the optimistic flag if the option was found in the RA.
Change notes:
(v2) modified the new parameter to addrconf_prefix_rcv to be a bool rather than
a pointer to make its use more clear as per request from davem.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
During some debugging I needed to look into how /proc/net/ipv6_route
operated and in my digging I found its calling fib6_clean_all() which uses
"write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
found this on 2.6.32, but reading the code I believe the same basic idea
exists currently. Looking at the rtnetlink code they are only calling
"read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
reading from proc isn't the recommended way of fetching the ipv6 route
table; taking a write lock seems unnecessary and would probably cause
network performance issues.
To verify this I loaded up the ipv6 route table and then ran iperf in 3
cases:
* doing nothing
* reading ipv6 route table via proc
(while :; do cat /proc/net/ipv6_route > /dev/null; done)
* reading ipv6 route table via rtnetlink
(while :; do ip -6 route show table all > /dev/null; done)
* Load the ipv6 route table up with:
* for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done
* iperf commands:
* client: iperf -i 1 -V -c <ipv6 addr>
* server: iperf -V -s
* iperf results - 3 runs each (in Mbits/sec)
* nothing: client: 927,927,927 server: 927,927,927
* proc: client: 179,97,96,113 server: 142,112,133
* iproute: client: 928,927,928 server: 927,927,927
lock_stat shows taking the write lock is causing the slowdown. Using this
info I decided to write a version of fib6_clean_all() which replaces
write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
this new function I see the same results as with my rtnetlink iperf test.
Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some of the rt6_bind_neighbour() call sites, it hasn't hooked
up the rt->dst.dev pointer yet, so we'd deref a NULL pointer when
obtaining dev->ifindex for the neighbour hash function computation.
Just pass the netdevice explicitly in to fix this problem.
Reported-by: Bjarke Istrup Pedersen <gurligebis@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.
And it makes grepping for dst_entry member uses much harder too.
Signed-off-by: David S. Miller <davem@davemloft.net>
Also, create and use an rt6_bind_neighbour() in net/ipv6/route.c to
consolidate some common logic.
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to perform a proper universal hash on a vector of integers,
we have to use different universal hashes on each vector element.
Which means we need 4 different hash randoms for ipv6.
Signed-off-by: David S. Miller <davem@davemloft.net>
The route we have here is for the address being added to the interface,
ie. for input packet processing.
Therefore using that route to determine whether an output nexthop gateway
is known and resolved doesn't make any sense.
So, simply remove this test, it never triggered anyways.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
RDBG() wasn't even used, and the messages printed by RT6_DEBUG() were
far from useful. Just get rid of all this stuff, we can replace it
with something more suitable if we want.
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/bluetooth/l2cap_core.c
Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.
Signed-off-by: David S. Miller <davem@davemloft.net>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 8e2ec63917 ("ipv6: don't
use inetpeer to store metrics for routes.") the test in rt6_alloc_cow()
for setting the ANYCAST flag is now wrong.
'rt' will always now have a plen of 128, because it is set explicitly
to 128 by ip6_rt_copy.
So to restore the semantics of the test, check the destination prefix
length of 'ort'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't just succeed with a route that has a NULL neighbour attached.
This follows the behavior of addrconf_dst_alloc().
Allowing this kind of route to end up with a NULL neigh attached will
result in packet drops on output until the route is somehow
invalidated, since nothing will meanwhile try to lookup the neigh
again.
A statistic is bumped for the case where we see a neigh-less route on
output, but the resulting packet drop is otherwise silent in nature,
and frankly it's a hard error for this to happen and ipv6 should do
what ipv4 does which is say something in the kernel logs.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is not merged with the ipv4 match into xt_rpfilter.c
to avoid ipv6 module dependency issues.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows each namespace to independently set up
its levels for tcp memory pressure thresholds. This patch
alone does not buy much: we need to make this values
per group of process somehow. This is achieved in the
patches that follows in this patchset.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces all uses of struct sock fields' memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem to acessor
macros. Those macros can either receive a socket argument, or a mem_cgroup
argument, depending on the context they live in.
Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where we don't have cgroups disabled.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same fix as 731abb9cb2 for ipip and sit tunnel.
Commit 1c5cae815d removed an explicit call to dev_alloc_name in
ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
will now create a valid name, however the tunnel keeps a copy of the
name in the private parms structure. Fix this by copying the name back
after register_netdevice has successfully returned.
This shows up if you do a simple tunnel add, followed by a tunnel show:
$ sudo ip tunnel add mode ipip remote 10.2.20.211
$ ip tunnel
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
tunl%d: ip/ip remote 10.2.20.211 local any ttl inherit
$ sudo ip tunnel add mode sit remote 10.2.20.212
$ ip tunnel
sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16
sit%d: ioctl 89f8 failed: No such device
sit%d: ipv6/ip remote 10.2.20.212 local any ttl inherit
Cc: stable@vger.kernel.org
Signed-off-by: Ted Feng <artisdom@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no obvious reason to add a default multicast route for loopback
devices, otherwise there would be a route entry whose dst.error set to
-ENETUNREACH that would blocking all multicast packets.
====================
[ more detailed explanation ]
The problem is that the resulting routing table depends on the sequence
of interface's initialization and in some situation, that would block all
muticast packets. Suppose there are two interfaces on my computer
(lo and eth0), if we initailize 'lo' before 'eth0', the resuting routing
table(for multicast) would be
# ip -6 route show | grep ff00::
unreachable ff00::/8 dev lo metric 256 error -101
ff00::/8 dev eth0 metric 256
When sending multicasting packets, routing subsystem will return the first
route entry which with a error set to -101(ENETUNREACH).
I know the kernel will set the default ipv6 address for 'lo' when it is up
and won't set the default multicast route for it, but there is no reason to
stop 'init' program from setting address for 'lo', and that is exactly what
systemd did.
I am sure there is something wrong with kernel or systemd, currently I preferred
kernel caused this problem.
====================
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP diag get_exact handler will require them to find a
socket by provided net, [sd]addr-s, [sd]ports and device.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
like rt6_lookup, but allows caller to pass in flowi6 structure.
Will be used by the upcoming ipv6 netfilter reverse path filter
match.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It's only used in net/ipv6/route.c and the NULL device check is
superfluous for all of the existing call sites.
Just expand the __ndisc_lookup_errno() call at each location.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) x == NULL --> !x
2) x != NULL --> x
3) (x&BIT) --> (x & BIT)
4) (BIT1|BIT2) --> (BIT1 | BIT2)
5) proper argument and struct member alignment
Signed-off-by: David S. Miller <davem@davemloft.net>
While parsing through IPv6 extension headers, fragment headers are
skipped making them invisible to the caller. This reports the
fragment offset of the last header in order to make it possible to
determine whether the packet is fragmented and, if so whether it is
a first or last fragment.
Signed-off-by: Jesse Gross <jesse@nicira.com>
This reverts commit 81d54ec847.
If we take the "try_again" goto, due to a checksum error,
the 'len' has already been truncated. So we won't compute
the same values as the original code did.
Reported-by: paul bilke <fsmail@conspiracy.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need not to used 'delta' flag when add single-source to interface
filter source list.
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Signed-off-by: David S. Miller <davem@drr.davemloft.net>
Igor Maravic reported an error caused by jump_label_dec() being called
from IRQ context :
BUG: sleeping function called from invalid context at kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
1 lock held by swapper/0:
#0: (&n->timer){+.-...}, at: [<ffffffff8107ce90>] call_timer_fn+0x0/0x340
Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1
Call Trace:
<IRQ> [<ffffffff8104f417>] __might_sleep+0x137/0x1f0
[<ffffffff816b9a2f>] mutex_lock_nested+0x2f/0x370
[<ffffffff810a89fd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8109a37f>] ? local_clock+0x6f/0x80
[<ffffffff810a90a5>] ? lock_release_holdtime.part.22+0x15/0x1a0
[<ffffffff81557929>] ? sock_def_write_space+0x59/0x160
[<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
[<ffffffff810969cd>] atomic_dec_and_mutex_lock+0x5d/0x80
[<ffffffff8112fc1d>] jump_label_dec+0x1d/0x50
[<ffffffff81566525>] net_disable_timestamp+0x15/0x20
[<ffffffff81557a75>] sock_disable_timestamp+0x45/0x50
[<ffffffff81557b00>] __sk_free+0x80/0x200
[<ffffffff815578d0>] ? sk_send_sigurg+0x70/0x70
[<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
[<ffffffff81557cba>] sock_wfree+0x3a/0x70
[<ffffffff8155c2b0>] skb_release_head_state+0x70/0x120
[<ffffffff8155c0b6>] __kfree_skb+0x16/0x30
[<ffffffff8155c119>] kfree_skb+0x49/0x170
[<ffffffff815e936e>] arp_error_report+0x3e/0x90
[<ffffffff81575bd9>] neigh_invalidate+0x89/0xc0
[<ffffffff81578dbe>] neigh_timer_handler+0x9e/0x2a0
[<ffffffff81578d20>] ? neigh_update+0x640/0x640
[<ffffffff81073558>] __do_softirq+0xc8/0x3a0
Since jump_label_{inc|dec} must be called from process context only,
we must defer jump_label_dec() if net_disable_timestamp() is called
from interrupt context.
Reported-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to set np->mcast_hops to it's default value at this moment
otherwise when we use it and found it's value is -1, the logic to
get default hop limit doesn't take multicast into account and will
return wrong hop limit(IPV6_DEFAULT_HOPLIMIT) which is for unicast.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move all mtu handling from dst_mtu() down to the protocol
layer. So each protocol can implement the mtu handling in
a different manner.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We plan to invoke the dst_opt->default_mtu() method unconditioally
from dst_mtu(). So rename the method to dst_opt->mtu() to match
the name with the new meaning.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it is, we return null as the default mtu of blackhole routes.
This may lead to a propagation of a bogus pmtu if the default_mtu
method of a blackhole route is invoked. So return dst->dev->mtu
as the default mtu instead.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since linux 2.6.26 (commit c6aefafb7e : Add IPv6 support to TCP SYN
cookies), we can drop a SYN packet reusing a TIME_WAIT socket.
(As a matter of fact we fail to send the SYNACK answer)
As the client resends its SYN packet after a one second timeout, we
accept it, because first packet removed the TIME_WAIT socket before
being dropped.
This probably explains why nobody ever noticed or complained.
Reported-by: Jesse Young <jlyo@jlyo.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Distributions are using this in their default scripts, so don't hide
them behind the advanced setting.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 72a3effaf6 ([NET]: Size listen hash tables using backlog
hint) added a bug allowing inet6_synq_hash() to return an out of bound
array index, because of u16 overflow.
Bug can happen if system admins set net.core.somaxconn &
net.ipv4.tcp_max_syn_backlog sysctls to values greater than 65536
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
C assignment can handle struct in6_addr copying.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The forcedeth changes had a conflict with the conversion over
to atomic u64 statistics in net-next.
The libertas cfg.c code had a conflict with the bss reference
counting fix by John Linville in net-next.
Conflicts:
drivers/net/ethernet/nvidia/forcedeth.c
drivers/net/wireless/libertas/cfg.c
ipv6: Remove all uses of LL_ALLOCATED_SPACE
The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.
This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.
This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a crash when non existing IPv6 route is tried to be changed.
When new destination node was inserted in middle of FIB6 tree, no relevant
sanity checks were performed. Later route insertion might have been prevented
due to invalid request, causing node with no rt info being left in tree.
When this node was accessed, a crash occurred.
Patch adds missing checks in fib6_add_1()
Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>