The LOG targets print the entire MAC header as one long string, which is not
readable very well:
IN=eth0 OUT= MAC=00:15:f2:24:91:f8:00:1b:24:dc:61:e6:08:00 ...
Add an option to decode known header formats (currently just ARPHRD_ETHER devices)
in their individual fields:
IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=0800 ...
IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=86dd ...
The option needs to be explicitly enabled by userspace to avoid breaking
existing parsers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Remove the comparison within the loop to print the macheader by prepending
the colon to all but the first printk.
Based on suggestion by Jan Engelhardt <jengelh@medozas.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (52 commits)
phylib: Add autoload support for the LXT973 phy.
ISDN: hysdn, fix potential NULL dereference
vxge: fix memory leak in vxge_alloc_msix() error path
isdn/gigaset: correct CAPI connection state storage
isdn/gigaset: encode HLC and BC together
isdn/gigaset: correct CAPI DATA_B3 Delivery Confirmation
isdn/gigaset: correct CAPI voice connection encoding
isdn/gigaset: honor CAPI application's buffer size request
cpmac: do not leak struct net_device on phy_connect errors
smc91c92_cs: fix the problem that lan & modem does not work simultaneously
ipv6: fix NULL reference in proxy neighbor discovery
Bluetooth: Bring back var 'i' increment
xfrm: check bundle policy existance before dereferencing it
sky2: enable rx/tx in sky2_phy_reinit()
cnic: Disable statistics initialization for eth clients that do not support statistics
net: add dependency on fw class module to qlcnic and netxen_nic
snmp: fix SNMP_ADD_STATS()
hso: remove setting of low_latency flag
udp: Fix bogus UFO packet generation
lasi82596: fix netdev_mc_count conversion
...
Allows use of ECN when syncookies are in effect by encoding ecn_ok
into the syn-ack tcp timestamp.
While at it, remove a uneeded #ifdef CONFIG_SYN_COOKIES.
With CONFIG_SYN_COOKIES=nm want_cookie is ifdef'd to 0 and gcc
removes the "if (0)".
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
As pointed out by Fernando Gont there is no need to encode rcv_wscale
into the cookie.
We did not use the restored rcv_wscale anyway; it is recomputed
via tcp_select_initial_window().
Thus we can save 4 bits in the ts option space by removing rcv_wscale.
In case window scaling was not supported, we set the (invalid) wscale
value 0xf.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 9261e53701 (ipv6: making ip and icmp statistics per/namespace)
forgot to remove ipv6_statistics variable.
commit bc417d99bf (ipv6: remove stale MIB definitions) took care of
icmpv6_statistics & icmpv6msg_statistics
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.
Callers can use __alignof__(type) to provide correct
alignment.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is thanks to Andre Noll who reported the issue and helped testing.
The Syn-RTT sampled during the initial handshake currently only works for
the client sending the DCCP-Request. TFRC penalizes the absence of an RTT
sample with a very slow initial speed (1 packet per second), which delays
slow-start significantly, resulting in sluggish performance.
This patch mirrors the "Syn RTT" principle by adding a timestamp also onto
the DCCP-Response, producing an RTT sample when the (Data)Ack completing
the handshake arrives.
Also changed the documentation to 'TFRC' since Syn RTTs are also used by CCID-4.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes an unused 'sk' argument from several option-inserting functions.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Remove "pktgen: " from formats
Convert printks to pr_<level>
Added func_enter() for debugging
Moved version to end of string at module_init
Coalesced long formats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gcc is currenlty not in the ability to optimize the switch statement in
sk_run_filter() because of dense case labels. This patch replace the
OR'd labels with ordered sequenced case labels. The sk_chk_filter()
function is modified to patch/replace the original OPCODES in a
ordered but equivalent form. gcc is now in the ability to transform the
switch statement in sk_run_filter into a jump table of complexity O(1).
Until this patch gcc generates a sequence of conditional branches (O(n) of 567
byte .text segment size (arch x86_64):
7ff: 8b 06 mov (%rsi),%eax
801: 66 83 f8 35 cmp $0x35,%ax
805: 0f 84 d0 02 00 00 je adb <sk_run_filter+0x31d>
80b: 0f 87 07 01 00 00 ja 918 <sk_run_filter+0x15a>
811: 66 83 f8 15 cmp $0x15,%ax
815: 0f 84 c5 02 00 00 je ae0 <sk_run_filter+0x322>
81b: 77 73 ja 890 <sk_run_filter+0xd2>
81d: 66 83 f8 04 cmp $0x4,%ax
821: 0f 84 17 02 00 00 je a3e <sk_run_filter+0x280>
827: 77 29 ja 852 <sk_run_filter+0x94>
829: 66 83 f8 01 cmp $0x1,%ax
[...]
With the modification the compiler translate the switch statement into
the following jump table fragment:
7ff: 66 83 3e 2c cmpw $0x2c,(%rsi)
803: 0f 87 1f 02 00 00 ja a28 <sk_run_filter+0x26a>
809: 0f b7 06 movzwl (%rsi),%eax
80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
813: 44 89 e3 mov %r12d,%ebx
816: e9 43 03 00 00 jmpq b5e <sk_run_filter+0x3a0>
81b: 41 89 dc mov %ebx,%r12d
81e: e9 3b 03 00 00 jmpq b5e <sk_run_filter+0x3a0>
Furthermore, I reordered the instructions to reduce cache line misses by
order the most common instruction to the start.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The addition of TLLAO option created a kernel OOPS regression
for the case where neighbor advertisement is being sent via
proxy path. When using proxy, ipv6_get_ifaddr() returns NULL
causing the NULL dereference.
Change causing the bug was:
commit f7734fdf61
Author: Octavian Purdila <opurdila@ixiacom.com>
Date: Fri Oct 2 11:39:15 2009 +0000
make TLLAO option for NA packets configurable
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_NF_CT_ACCT has been deprecated for awhile and
was originally scheduled for removal by 2.6.29.
Removing support for this config option also stops
this deprecation warning message in the kernel log.
[ 61.669627] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 61.669850] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
[ 61.669852] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
[ 61.669853] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[Patrick: changed default value to 0]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Check at rule install time that CT accounting is enabled. Force it
to be enabled if not while also emitting a warning since this is not
the default state.
This is in preparation for deprecating CONFIG_NF_CT_ACCT upon which
CONFIG_NETFILTER_XT_MATCH_CONNBYTES depended being set.
Added 2 CT accounting support functions:
nf_ct_acct_enabled() - Get CT accounting state.
nf_ct_set_acct() - Enable/disable CT accountuing.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
commit ff6e2163f2 accidentally added a
regression on the bnep code. Fixing it.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: David S. Miller <davem@davemloft.net>
i've found that tcp_close() can be called for an already closed
socket, but still sends reset in this case (tcp_send_active_reset())
which seems to be incorrect. Moreover, a packet with reset is sent
with different source port as original port number has been already
cleared on socket. Besides that incrementing stat counter for
LINUX_MIB_TCPABORTONCLOSE also does not look correct in this case.
Initially this issue was found on 2.6.18-x RHEL5 kernel, but the same
seems to be true for the current mainstream kernel (checked on
2.6.35-rc3). Please, correct me if i missed something.
How that happens:
1) the server receives a packet for socket in TCP_CLOSE_WAIT state
that triggers a tcp_reset():
Call Trace:
<IRQ> [<ffffffff8025b9b9>] tcp_reset+0x12f/0x1e8
[<ffffffff80046125>] tcp_rcv_state_process+0x1c0/0xa08
[<ffffffff8003eb22>] tcp_v4_do_rcv+0x310/0x37a
[<ffffffff80028bea>] tcp_v4_rcv+0x74d/0xb43
[<ffffffff8024ef4c>] ip_local_deliver_finish+0x0/0x259
[<ffffffff80037131>] ip_local_deliver+0x200/0x2f4
[<ffffffff8003843c>] ip_rcv+0x64c/0x69f
[<ffffffff80021d89>] netif_receive_skb+0x4c4/0x4fa
[<ffffffff80032eca>] process_backlog+0x90/0xec
[<ffffffff8000cc50>] net_rx_action+0xbb/0x1f1
[<ffffffff80012d3a>] __do_softirq+0xf5/0x1ce
[<ffffffff8001147a>] handle_IRQ_event+0x56/0xb0
[<ffffffff8006334c>] call_softirq+0x1c/0x28
[<ffffffff80070476>] do_softirq+0x2c/0x85
[<ffffffff80070441>] do_IRQ+0x149/0x152
[<ffffffff80062665>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff80008a2e>] __handle_mm_fault+0x6cd/0x1303
[<ffffffff80008903>] __handle_mm_fault+0x5a2/0x1303
[<ffffffff80033a9d>] cache_free_debugcheck+0x21f/0x22e
[<ffffffff8006a263>] do_page_fault+0x49a/0x7dc
[<ffffffff80066487>] thread_return+0x89/0x174
[<ffffffff800c5aee>] audit_syscall_exit+0x341/0x35c
[<ffffffff80062e39>] error_exit+0x0/0x84
tcp_rcv_state_process()
... // (sk_state == TCP_CLOSE_WAIT here)
...
/* step 2: check RST bit */
if(th->rst) {
tcp_reset(sk);
goto discard;
}
...
---------------------------------
tcp_rcv_state_process
tcp_reset
tcp_done
tcp_set_state(sk, TCP_CLOSE);
inet_put_port
__inet_put_port
inet_sk(sk)->num = 0;
sk->sk_shutdown = SHUTDOWN_MASK;
2) After that the process (socket owner) tries to write something to
that socket and "inet_autobind" sets a _new_ (which differs from
the original!) port number for the socket:
Call Trace:
[<ffffffff80255a12>] inet_bind_hash+0x33/0x5f
[<ffffffff80257180>] inet_csk_get_port+0x216/0x268
[<ffffffff8026bcc9>] inet_autobind+0x22/0x8f
[<ffffffff80049140>] inet_sendmsg+0x27/0x57
[<ffffffff8003a9d9>] do_sock_write+0xae/0xea
[<ffffffff80226ac7>] sock_writev+0xdc/0xf6
[<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
[<ffffffff8001fb49>] __pollwait+0x0/0xdd
[<ffffffff8008d533>] default_wake_function+0x0/0xe
[<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
[<ffffffff800f0b49>] do_readv_writev+0x163/0x274
[<ffffffff80066538>] thread_return+0x13a/0x174
[<ffffffff800145d8>] tcp_poll+0x0/0x1c9
[<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
[<ffffffff800f0dd0>] sys_writev+0x49/0xe4
[<ffffffff800622dd>] tracesys+0xd5/0xe0
3) sendmsg fails at last with -EPIPE (=> 'write' returns -EPIPE in userspace):
F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3
Call Trace:
[<ffffffff80027557>] tcp_sendmsg+0xcb/0xe87
[<ffffffff80033300>] release_sock+0x10/0xae
[<ffffffff8016f20f>] vgacon_cursor+0x0/0x1a7
[<ffffffff8026bd32>] inet_autobind+0x8b/0x8f
[<ffffffff8003a9d9>] do_sock_write+0xae/0xea
[<ffffffff80226ac7>] sock_writev+0xdc/0xf6
[<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
[<ffffffff8001fb49>] __pollwait+0x0/0xdd
[<ffffffff8008d533>] default_wake_function+0x0/0xe
[<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
[<ffffffff800f0b49>] do_readv_writev+0x163/0x274
[<ffffffff80066538>] thread_return+0x13a/0x174
[<ffffffff800145d8>] tcp_poll+0x0/0x1c9
[<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
[<ffffffff800f0dd0>] sys_writev+0x49/0xe4
[<ffffffff800622dd>] tracesys+0xd5/0xe0
tcp_sendmsg()
...
/* Wait for a connection to finish. */
if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
int old_state = sk->sk_state;
if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) {
if (f_d && (err == -EPIPE)) {
printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, "
"sk_err=%d, sk_shutdown=%d\n",
sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state,
sk->sk_err, sk->sk_shutdown);
dump_stack();
}
goto out_err;
}
}
...
4) Then the process (socket owner) understands that it's time to close
that socket and does that (and thus triggers sending reset packet):
Call Trace:
...
[<ffffffff80032077>] dev_queue_xmit+0x343/0x3d6
[<ffffffff80034698>] ip_output+0x351/0x384
[<ffffffff80251ae9>] dst_output+0x0/0xe
[<ffffffff80036ec6>] ip_queue_xmit+0x567/0x5d2
[<ffffffff80095700>] vprintk+0x21/0x33
[<ffffffff800070f0>] check_poison_obj+0x2e/0x206
[<ffffffff80013587>] poison_obj+0x36/0x45
[<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
[<ffffffff80023481>] dbg_redzone1+0x1c/0x25
[<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
[<ffffffff8000ca94>] cache_alloc_debugcheck_after+0x189/0x1c8
[<ffffffff80023405>] tcp_transmit_skb+0x764/0x786
[<ffffffff8025df8a>] tcp_send_active_reset+0xf9/0x14d
[<ffffffff80258ff1>] tcp_close+0x39a/0x960
[<ffffffff8026be12>] inet_release+0x69/0x80
[<ffffffff80059b31>] sock_release+0x4f/0xcf
[<ffffffff80059d4c>] sock_close+0x2c/0x30
[<ffffffff800133c9>] __fput+0xac/0x197
[<ffffffff800252bc>] filp_close+0x59/0x61
[<ffffffff8001eff6>] sys_close+0x85/0xc7
[<ffffffff800622dd>] tracesys+0xd5/0xe0
So, in brief:
* a received packet for socket in TCP_CLOSE_WAIT state triggers
tcp_reset() which clears inet_sk(sk)->num and put socket into
TCP_CLOSE state
* an attempt to write to that socket forces inet_autobind() to get a
new port (but the write itself fails with -EPIPE)
* tcp_close() called for socket in TCP_CLOSE state sends an active
reset via socket with newly allocated port
This adds an additional check in tcp_close() for already closed
sockets. We do not want to send anything to closed sockets.
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove rtnl_unlock() which had no corresponding rtnl_lock().
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the bundle validation code to not assume having a valid policy.
When we have multiple transformations for a xfrm policy, the bundle
instance will be a chain of bundles with only the first one having
the policy reference. When policy_genid is bumped it will expire the
first bundle in the chain which is equivalent of expiring the whole
chain.
Reported-bisected-and-tested-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds transmit power setting type and transmit power level attributes
to NL80211_CMD_SET_WIPHY in order to facilitate adjusting of the transmit power
level of the device.
The added attributes allow selection of automatic, limited or fixed transmit
power level, with the level definable in signed mBm format.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In preparation for a TX power setting interface in the nl80211, change the
.set_tx_power function to use mBm units instead of dBm for greater accuracy and
smaller power levels.
Also, already in advance move the tx_power_setting enumeration to nl80211.
This change affects the .tx_set_power function prototype. As a result, the
corresponding changes are needed to modules using it. These are mac80211,
iwmc3200wifi and rndis_wlan.
Cc: Samuel Ortiz <samuel.ortiz@intel.com>
Cc: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Samuel Ortiz <samuel.ortiz@intel.com>
Acked-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
While mesh_rx_plink_frame holds sta->lock...
mesh_rx_plink_frame ->
mesh_plink_inc_estab_count ->
ieee80211_bss_info_change_notify
...but ieee80211_bss_info_change_notify is allowed to sleep. A driver
taking advantage of that allowance can cause a scheduling while
atomic bug. Similar paths exist for mesh_plink_dec_estab_count,
so work around those as well.
http://bugzilla.kernel.org/show_bug.cgi?id=16099
Also, correct a minor kerneldoc comment error (mismatched function names).
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: stable@kernel.org
net/mac80211/rc80211_minstrel_ht.c:440:46: warning: incorrect type in argument 2 (different signedness)
net/mac80211/rc80211_minstrel_ht.c:440:46: expected int *idx
net/mac80211/rc80211_minstrel_ht.c:440:46: got unsigned int *<noident>
net/mac80211/rc80211_minstrel_ht.c:446:46: warning: incorrect type in argument 2 (different signedness)
net/mac80211/rc80211_minstrel_ht.c:446:46: expected int *idx
net/mac80211/rc80211_minstrel_ht.c:446:46: got unsigned int *<noident>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
net/mac80211/rx.c:2059:39: warning: symbol 'mgmt' shadows an earlier one
net/mac80211/rx.c:1916:31: originally declared here
Signed-off-by: John W. Linville <linville@tuxdriver.com>
this patch is implementing IP_NODEFRAG option for IPv4 socket.
The reason is, there's no other way to send out the packet with user
customized header of the reassembly part.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use u64_stats_sync infrastructure to provide 64bit rx/tx
counters even on 32bit hosts.
It is safe to use a single u64_stats_sync for rx and tx,
because BH is disabled on both, and we use per_cpu data.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_needs_gso() is checked twice in the TX path once,
before submitting the skb to the qdisc and once after
it is dequeued from the qdisc just before calling
ndo_hard_start(). This opens a window for a user to
change the gso/tso or tx checksum settings that can
cause netif_needs_gso to be true in one check and false
in the other.
Specifically, changing TX checksum setting may cause
the warning in skb_gso_segment() to be triggered if
the checksum is calculated earlier.
This consolidates the netif_needs_gso() calls so that
the stack only checks if gso is needed in
dev_hard_start_xmit().
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the attempt to read the calldir fails, then instead of storing the read
bytes, we currently discard them. This leads to a garbage final result when
upon re-entry to the same routine, we read the remaining bytes.
Fixes the regression in bugzilla number 16213. Please see
https://bugzilla.kernel.org/show_bug.cgi?id=16213
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Destination was spelled wrong in KConfig.
Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add header file to fix build error:
net/netfilter/xt_IDLETIMER.c:276: error: implicit declaration of function 'MKDEV'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Allow one-packet scheduling for UDP connections. When the fwmark-based or
normal virtual service is marked with '-o' or '--ops' options all
connections are created only to schedule one packet. Useful to schedule UDP
packets from same client port to different real servers. Recommended with
RR or WRR schedulers (the connections are not visible with ipvsadm -L).
Signed-off-by: Nick Chalk <nick@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
It has been reported that the new UFO software fallback path
fails under certain conditions with NFS. I tracked the problem
down to the generation of UFO packets that are smaller than the
MTU. The software fallback path simply discards these packets.
This patch fixes the problem by not generating such packets on
the UFO path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This mechanism introduced in this patch applies (at least) for hardware
designs using a single shared antenna for both WLAN and BT. In these designs,
the antenna must be toggled between WLAN and BT.
In those hardware, managing WLAN co-existence with Bluetooth requires WLAN
full power save whenever there is Bluetooth activity in order for WLAN to be
able to periodically relinquish the antenna to be used for BT. This is because
BT can only access the shared antenna when WLAN is idle or asleep.
Some hardware, for instance the wl1271, are able to indicate to the host
whenever there is BT traffic. In essence, the hardware will send an indication
to the host whenever there is, for example, SCO traffic or A2DP traffic, and
will send another indication when the traffic is over.
The hardware gets information of Bluetooth traffic via hardware co-existence
control lines - these lines are used to negotiate the shared antenna
ownership. The hardware will give the antenna to BT whenever WLAN is sleeping.
This patch adds the interface to mac80211 to facilitate temporarily disabling
of dynamic power save as per request of the WLAN driver. This interface will
immediately force WLAN to full powersave, hence allowing BT coexistence as
described above.
In these kind of shared antenna desings, when WLAN powersave is fully disabled,
Bluetooth will not work simultaneously with WLAN at all. This patch does not
address that problem. This interface will not change PSM state, so if PSM is
disabled it will remain so. Solving this problem requires knowledge about BT
state, and is best done in user-space.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix the following compile warning:
CC [M] net/mac80211/scan.o
net/mac80211/scan.c: In function 'ieee80211_request_internal_scan':
net/mac80211/scan.c:749:23: warning: comparison between 'enum nl80211_band' and 'enum ieee80211_band'
caused by the local variable band not being of the proper 'ieee80211_band' type.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Added new CAIF protocol type CAIFPROTO_DEBUG for accessing
CAIF debug on the ST Ericsson modems.
There are two debug servers on the modem, one for radio related
debug (CAIF_RADIO_DEBUG_SERVICE) and the other for
communication/application related debug (CAIF_COM_DEBUG_SERVICE).
The debug connection can contain trace debug printouts or
interactive debug used for debugging and test.
Debug connections can be of type STREAM or SEQPACKET.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously CAIF supported maximum transfer size of ~4050.
The transfer size is now calculated dynamically based on the
link layers mtu size.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
CAIF Remote File Manager may send or receive more than 4050 bytes.
Due to this The CAIF RFM service have to support segmentation.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Flow control is not used by all CAIF services.
The usage of flow control is now part of the gerneal
initialization function for CAIF Services.
Signed-off-by: Sjur Braendeland@stericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, detection in hwsim and ath9k can
detect that two sw scans are in flight at the
same time, which isn't really true. It is
caused by a race condition, because the scan
complete callback is called too late, after
the lock has been dropped, so that a new scan
can be started before it is called.
It is also called too early semantically, as
it is currently called _after_ the return to
the operating channel -- it should be before
so that drivers know this is the operating
channel again.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
regulatory_init is only called by cfg80211_init which is in .init.text,
too.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211_exit is only used as module_exit function, so it can go to
.exit.text saving a few bytes when CONFIG_CFG80211=y.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It is common in end-node, non STP bridges to set forwarding
delay to zero; which causes the forwarding database cleanup
to run every clock tick. Change to run only as soon as needed
or at next ageing timer interval which ever is sooner.
Use round_jiffies_up macro rather than attempting round up
by changing value.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2.6.34 introduced 'conntrack zones' to deal with cases where packets
from multiple identical networks are handled by conntrack/NAT. Packets
are looped through veth devices, during which they are NATed to private
addresses, after which they can continue normally through the stack
and possibly have NAT rules applied a second time.
This works well, but is needlessly complicated for cases where only
a single SNAT/DNAT mapping needs to be applied to these packets. In that
case, all that needs to be done is to assign each network to a seperate
zone and perform NAT as usual. However this doesn't work for packets
destined for the machine performing NAT itself since its corrently not
possible to configure SNAT mappings for the LOCAL_IN chain.
This patch adds a new INPUT chain to the NAT table and changes the
targets performing SNAT to be usable in that chain.
Example usage with two identical networks (192.168.0.0/24) on eth0/eth1:
iptables -t raw -A PREROUTING -i eth0 -j CT --zone 1
iptables -t raw -A PREROUTING -i eth0 -j MARK --set-mark 1
iptables -t raw -A PREROUTING -i eth1 -j CT --zone 2
iptabels -t raw -A PREROUTING -i eth1 -j MARK --set-mark 2
iptables -t nat -A INPUT -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A POSTROUTING -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A INPUT -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t nat -A POSTROUTING -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t raw -A PREROUTING -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A OUTPUT -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A PREROUTING -d 10.0.1.0/24 -j CT --zone 2
iptables -t raw -A OUTPUT -d 10.0.1.0/24 -j CT --zone 2
iptables -t nat -A PREROUTING -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A PREROUTING -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
Signed-off-by: Patrick McHardy <kaber@trash.net>
Remove the restriction that only allows connecting to a unix domain
socket identified by unix path that is in the same network namespace.
Crossing network namespaces is always tricky and we did not support
this at first, because of a strict policy of don't mix the namespaces.
Later after Pavel proposed this we did not support this because no one
had performed the audit to make certain using unix domain sockets
across namespaces is safe.
What fundamentally makes connecting to af_unix sockets in other
namespaces is safe is that you have to have the proper permissions on
the unix domain socket inode that lives in the filesystem. If you
want strict isolation you just don't create inodes where unfriendlys
can get at them, or with permissions that allow unfriendlys to open
them. All nicely handled for us by the mount namespace and other
standard file system facilities.
I looked through unix domain sockets and they are a very controlled
environment so none of the work that goes on in dev_forward_skb to
make crossing namespaces safe appears needed, we are not loosing
controll of the skb and so do not need to set up the skb to look like
it is comming in fresh from the outside world. Further the fields in
struct unix_skb_parms should not have any problems crossing network
namespaces.
Now that we handle SCM_CREDENTIALS in a way that gives useable values
across namespaces. There does not appear to be any operational
problems with encouraging the use of unix domain sockets across
containers either.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In unix_skb_parms store pointers to struct pid and struct cred instead
of raw uid, gid, and pid values, then translate the credentials on
reception into values that are meaningful in the receiving processes
namespaces.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start capturing not only the userspace pid, uid and gid values of the
sending process but also the struct pid and struct cred of the sending
process as well.
This is in preparation for properly supporting SCM_CREDENTIALS for
sockets that have different uid and/or pid namespaces at the different
ends.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
scm_send occasionally allocates state in the scm_cookie, so I have
modified netlink_sendmsg to guarantee that when scm_send succeeds
scm_destory will be called to free that state.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use struct pid and struct cred to store the peer credentials on struct
sock. This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.
This removes nasty surprises when using SO_PEERCRED on socket
connetions where the processes on either side are in different pid and
user namespaces.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To keep the coming code clear and to allow both the sock
code and the scm code to share the logic introduce a
fuction to translate from struct cred to struct ucred.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
https://bugzilla.kernel.org/show_bug.cgi?id=16183
The sch_teql module, which can be used to load balance over a set of
underlying interfaces, stopped working after 2.6.30 and has been
broken in all kernels since then for any underlying interface which
requires the addition of link level headers.
The problem is that the transmit routine relies on being able to
access the destination address in the skb in order to do address
resolution once it has decided which underlying interface it is going
to transmit through.
In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by
default for all interfaces, which causes the destination address to be
released before the transmit routine for the interface is called.
The solution is to clear that flag for teql interfaces.
Signed-off-by: Tom Hughes <tom@compton.nu>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discard the ACK if we find options that do not match current sysctl
settings.
Previously it was possible to create a connection with sack, wscale,
etc. enabled even if the feature was disabled via sysctl.
Also remove an unneeded call to tcp_sack_reset() in
cookie_check_timestamp: Both call sites (cookie_v4_check,
cookie_v6_check) zero "struct tcp_options_received", hand it to
tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
and then call cookie_check_timestamp().
Even if num_sacks/dsacks were changed, the structure is allocated on
the stack and after cookie_check_timestamp returns only a few selected
members are copied to the inet_request_sock.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
regression introduced by b8d92c9c14
In function ‘ieee80211_work_rx_queued_mgmt’:
warning: ‘rma’ may be used uninitialized in this function
this re-adds default value WORK_ACT_NONE back to rma
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Addition of rcu_head to struct inet_peer added 16bytes on 64bit arches.
Thats a bit unfortunate, since old size was exactly 64 bytes.
This can be solved, using an union between this rcu_head an four fields,
that are normally used only when a refcount is taken on inet_peer.
rcu_head is used only when refcnt=-1, right before structure freeing.
Add a inet_peer_refcheck() function to check this assertion for a while.
We can bring back SLAB_HWCACHE_ALIGN qualifier in kmem cache creation.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commit aa1039e73c (inetpeer: RCU conversion)
Unused inet_peer entries have a null refcnt.
Using atomic_inc_not_zero() in rcu lookups is not going to work for
them, and slow path is taken.
Fix this using -1 marker instead of 0 for deleted entries.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The version of br_netpoll_send_skb used when netpoll is off is
missing a const thus causing a warning.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge multicast patches introduced an OOM crash in the forward
path, when deliver_clone fails to clone the skb.
Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Third param (work) is unused, remove it.
Remove __inline__ and inline qualifiers.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of doing one atomic operation per frag, we can factorize them.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When syncookies are in effect, req->iif is left uninitialized.
In case of e.g. link-local addresses the route lookup then fails
and no syn-ack is sent.
Rearrange things so ->iif is also initialized in the syncookie case.
want_cookie can only be true when the isn was zero, thus move the want_cookie
check into the "!isn" branch.
Cc: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
inetpeer currently uses an AVL tree protected by an rwlock.
It's possible to make most lookups use RCU
1) Add a struct rcu_head to struct inet_peer
2) add a lookup_rcu_bh() helper to perform lockless and opportunistic
lookup. This is a normal function, not a macro like lookup().
3) Add a limit to number of links followed by lookup_rcu_bh(). This is
needed in case we fall in a loop.
4) add an smp_wmb() in link_to_pool() right before node insert.
5) make unlink_from_pool() use atomic_cmpxchg() to make sure it can take
last reference to an inet_peer, since lockless readers could increase
refcount, even while we hold peers.lock.
6) Delay struct inet_peer freeing after rcu grace period so that
lookup_rcu_bh() cannot crash.
7) inet_getpeer() first attempts lockless lookup.
Note this lookup can fail even if target is in AVL tree, but a
concurrent writer can let tree in a non correct form.
If this attemps fails, lock is taken a regular lookup is performed
again.
8) convert peers.lock from rwlock to a spinlock
9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 ->
128 bytes)
In a future patch, this is probably possible to revert this part, if rcu
field is put in an union to share space with rid, ip_id_count, tcp_ts &
tcp_ts_stamp. These fields being manipulated only with refcnt > 0.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When management frame protection (IEEE 802.11w) is used, we must use a
separate counter for tracking received CCMP packet number for the
management frames. The previously used NUM_RX_DATA_QUEUESth queue was
shared with data frames when QoS was not used and that can cause
problems in detecting replays incorrectly for robust management frames.
Add a new counter just for robust management frames to avoid this issue.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When management frame protection (IEEE 802.11w) is used,
Deauthentication frame needs to be protected when the pairwise key is
configured. mac80211 was removing the station entry (and its keys)
before actually sending out the Deauthentication frame. Fix this by
reordering the code to send the frame before the station entry gets
removed. This matches an earlier change that handled the Disassociation
frame processing, but missed Deauthentication frames.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The ps-qos latency handling is broken. It uses predetermined latency values
to select specific dynamic PS timeouts. With common AP configurations, these
values overlap with beacon interval and are therefore essentially useless
(for network latencies less than the beacon interval, PSM is disabled.)
This patch remedies the problem by replacing the predetermined network latency
values with one high value (1900ms) which is used to go trigger full psm. For
backwards compatibility, the value 2000ms is still mapped to a dynamic ps
timeout of 100ms.
Currently also the mac80211 internal value for storing user space configured
dynamic PSM values is incorrectly in the driver visible ieee80211_conf struct.
Move it to the ieee80211_local struct.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add possibility to register rx_handler data pointer along with a rx_handler.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are multiple problems with the newly added netpoll support:
1) Use-after-free on each netpoll packet.
2) Invoking unsafe code on netpoll/IRQ path.
3) Breaks when netpoll is enabled on the underlying device.
This patch fixes all of these problems. In particular, we now
allocate proper netpoll structures for each underlying device.
We only allow netpoll to be enabled on the bridge when all the
devices underneath it support netpoll. Once it is enabled, we
do not allow non-netpoll devices to join the bridge (until netpoll
is disabled again).
This allows us to do away with the npinfo juggling that caused
problem number 1.
Incidentally this patch fixes number 2 by bypassing unsafe code
such as multicast snooping and netfilter.
Reported-by: Qianfeng Zhang <frzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the functions __netpoll_setup/__netpoll_cleanup
which is designed to be called recursively through ndo_netpoll_seutp.
They must be called with RTNL held, and the caller must initialise
np->dev and ensure that it has a valid reference count.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ndo_netpoll_setup as the initialisation primitive
to complement ndo_netpoll_cleanup.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it stands, netpoll_setup and netpoll_cleanup have no locking
protection whatsoever. So chaos ensures if two entities try to
perform them on the same device.
This patch adds RTNL to the equation. The code has been rearranged so
that bits that do not need RTNL protection are now moved to the top of
netpoll_setup.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of RCU in netpoll is incorrect in a number of places:
1) The initial setting is lacking a write barrier.
2) The synchronize_rcu is in the wrong place.
3) Read barriers are missing.
4) Some places are even missing rcu_read_lock.
5) npinfo is zeroed after freeing.
This patch fixes those issues. As most users are in BH context,
this also converts the RCU usage to the BH variant.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that netpoll always zaps npinfo we no longer need to do it
in bridge.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have to NULL npinfo regardless of whether there is a
ndo_netpoll_cleanup, it makes sense to do this unconditionally
in netpoll_cleanup rather than having every driver do it by
themselves.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.
Timers are identified by labels and are created when a rule is set with a new
label. The rules also take a timeout value (in seconds) as an option. If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules get a hit.
One entry for each timer is created in sysfs. This attribute contains the
timer remaining for the timer to expire. The attributes are located under
the xt_idletimer class:
/sys/class/xt_idletimer/timers/<label>
When the timer expires, the target module sends a sysfs notification to the
userspace, which can then decide what to do (eg. disconnect to save power).
Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
- clusterip_lock becomes a spinlock
- lockless lookups
- kfree() deferred after RCU grace period
- rcu_barrier_bh() inserted in clusterip_tg_exit()
v2)
- As Patrick pointed out, we use atomic_inc_not_zero() in
clusterip_config_find_get().
- list_add_rcu() and list_del_rcu() variants are used.
- atomic_dec_and_lock() used in clusterip_config_entry_put()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Try to reduce cache line contentions in peer management, to reduce IP
defragmentation overhead.
- peer_fake_node is marked 'const' to make sure its not modified.
(tested with CONFIG_DEBUG_RODATA=y)
- Group variables in two structures to reduce number of dirtied cache
lines. One named "peers" for avl tree root, its number of entries, and
associated lock. (candidate for RCU conversion)
- A second one named "unused_peers" for unused list and its lock
- Add a !list_empty() test in unlink_from_unused() to avoid taking lock
when entry is not unused.
- Use atomic_dec_and_lock() in inet_putpeer() to avoid taking lock in
some cases.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU to avoid atomic ops on idev refcnt in ipv6_get_mtu()
and ip6_dst_hoplimit()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use __in6_dev_get() instead of in6_dev_get()/in6_dev_put()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a circular locking dependency when configuring the
hardware ARP filters on association, occurring when flushing the mac80211
workqueue. This is what happens:
[ 92.026800] =======================================================
[ 92.030507] [ INFO: possible circular locking dependency detected ]
[ 92.030507] 2.6.34-04781-g2b2c009 #85
[ 92.030507] -------------------------------------------------------
[ 92.030507] modprobe/5225 is trying to acquire lock:
[ 92.030507] ((wiphy_name(local->hw.wiphy))){+.+.+.}, at: [<ffffffff8105b5c0>] flush_workq
ueue+0x0/0xb0
[ 92.030507]
[ 92.030507] but task is already holding lock:
[ 92.030507] (rtnl_mutex){+.+.+.}, at: [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20
[ 92.030507]
[ 92.030507] which lock already depends on the new lock.
[ 92.030507]
[ 92.030507]
[ 92.030507] the existing dependency chain (in reverse order) is:
[ 92.030507]
[ 92.030507] -> #2 (rtnl_mutex){+.+.+.}:
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff81341754>] mutex_lock_nested+0x44/0x300
[ 92.030507] [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20
[ 92.030507] [<ffffffffa022d47c>] ieee80211_assoc_done+0x6c/0xe0 [mac80211]
[ 92.030507] [<ffffffffa022f2ad>] ieee80211_work_work+0x31d/0x1280 [mac80211]
[ 92.030507] -> #1 ((&local->work_work)){+.+.+.}:
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff8105a51a>] worker_thread+0x22a/0x370
[ 92.030507] [<ffffffff8105ecc6>] kthread+0x96/0xb0
[ 92.030507] [<ffffffff81003a94>] kernel_thread_helper+0x4/0x10
[ 92.030507]
[ 92.030507] -> #0 ((wiphy_name(local->hw.wiphy))){+.+.+.}:
[ 92.030507] [<ffffffff81075fdc>] __lock_acquire+0x1c0c/0x1d50
[ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110
[ 92.030507] [<ffffffff8105b60e>] flush_workqueue+0x4e/0xb0
[ 92.030507] [<ffffffffa023ff7b>] ieee80211_stop_device+0x2b/0xb0 [mac80211]
[ 92.030507] [<ffffffffa0231635>] ieee80211_stop+0x3e5/0x680 [mac80211]
The locking in this case is quite complex. Fix the problem by rewriting the
way the hardware ARP filter list is handled - i.e. make a copy of the address
list to the bss_conf struct, and provide that list to the hardware driver
when needed.
The current patch will enable filtering also in promiscuous mode. This may need
to be changed in the future.
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Remove BSS from cfg80211 BSS list if we are only member in IBSS when
leaving it.
Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add changed basic rates flag to bss_changed while joinig ibss network.
This patch is split from the patch containing support for setting basic
rates when creating ibss network. Original patch was posted by Johannes
Berg on the linux-wireless posting list.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds support to nl80211 and mac80211 to set basic rates when
joining/creating ibss network.
Original patch was posted by Johannes Berg on the linux-wireless posting list.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently, driver tracing is sometimes invoked
after and sometimes before the actual driver
callback. This is fine as long as the driver
has no tracing itself, but as soon as it does
it gets confusing.
To make traces containing such information
easier to read, introduce a return tracer in
mac80211 that essentially brackets any driver
tracing, and invoke the real trace before the
driver's callback, only showing the return
value, if any, afterwards.
Since tracing records the process, there's no
problem with overlapping calls if that should
happen.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The recent change to processing action frames from
the management frame queue had already broken action
frame accounting, and my rework didn't help either.
So add back accounting and simplify the code with a
label rather than duplicating it, and also add
accounting for management frames.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Even before the recent changes, the documentation
for TX aggregation was somewhat out of date. Update
it and also add documentation for the RX side.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow drivers to sleep, and indicate this in
the documentation. ath9k has some locking I
don't understand, so keep it safe and disable
BHs in it, all other drivers look fine with
the context change.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To prepare for allowing drivers to sleep in
ampdu_action, change the locking in the TX
aggregation code to use the mutex the RX part
already uses. The spinlock is still necessary
around some code to avoid races with TX, but
now we can also synchronize_net() to avoid
getting an inconsistent sequence number.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since we want the code to be able to sleep
in the future, it must not be called from
the timer directly. To achieve that, simply
call the function drivers would call, and
also use RCU in the timer to get the struct
so we don't need to rely on the spinlock in
the future.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To prepare for allowing drivers to sleep in
ampdu_action, change the locking in the RX
aggregation code to use a mutex, so that it
would already allow drivers to sleep. But
explicitly disable BHs around the callback
for now since the TX part cannot yet sleep,
and drivers' locking might require it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
I noticed that when there was _no_ traffic at
all on a given aggregation session, it would
never time out. This won't happen unless you
forced creating a session, but fix it anyway.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since we want the code to be able to sleep
in the future, it must not be called from
the timer directly. To prepare, move it out
into the aggregation work.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Move the block-ack session works into common
code, since it will be needed for RX agg too
in the next patches.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the driver or rate control requests starting
or stopping an aggregation session, that currently
causes a direct callback into the driver, which
could potentially cause locking problems. Also,
the functions need to be callable from contexts
that cannot sleep, and thus will interfere with
making the ampdu_action callback sleeping.
To address these issues, add a new work item for
each station that will process any start or stop
requests out of line.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
mac80211 currently maintains the ampdu_lock to
avoid starting a queue due to one aggregation
session while another aggregation session needs
the queue stopped.
We can do better, however, and instead refcount
the queue stops for this particular purpose,
thus removing the need for the lock. This will
help making ampdu_action able to sleep.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The non-irqsafe aggregation start/stop done
callbacks are currently only used by ath9k_htc,
and can cause callbacks into the driver again.
This might lead to locking issues, which will
only get worse as we modify locking. To avoid
trouble, remove the non-irqsafe versions and
change ath9k_htc to use those instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently we allocate some memory for each TX
aggregation session and additionally keep a
state bitmap indicating the state it is in.
By using RCU to protect the pointer, moving
the state into the structure and some locking
trickery we can avoid locking when the TX agg
session is fully operational.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently we allocate some memory for each RX
aggregation session and additionally keep a
flag indicating whether or not it is valid.
By using RCU to protect the pointer and making
sure that the memory is fully set up before it
becomes visible to the RX path, we can remove
the need for the bool that indicates validity,
as well as for locking on the RX path since it
is always synchronised against itself, and we
can guarantee that all other modifications are
done when the structure is not visible to the
RX path.
The net result is that since we remove locking
requirements from the RX path, we can in the
future use any kind of lock for the setup and
teardown code paths.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This moves the aggregation callback processing
to the per-sdata skb queue and a work function
rather than the tasklet.
Unfortunately, this means that it extends the
pkt_type hack to that skb queue. However, it
will enable making ampdu_action API changes
gradually, my current plan is to get rid of
this again by forcing drivers to only return
from ampdu_action() when everything is done,
thus removing the callbacks completely.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There's a corner case where we receive a fragmented
frame during a blockack session, in which case we
will terminate that session. To simplify future work
in this area that will culminate in allowing the
driver callbacks for aggregation to sleep, move the
processing of this case out of the RX path into the
interface work.
This will simplify future work because the new place
for this code doesn't require that the function will
always be atomic, which the RX path needs.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To prepare for making the ampdu_action callback
sleep, make mac80211 always process blockack
action frames from the skb queue. This gets rid
of the current special case for managed mode
interfaces as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some code is duplicated between ibss, mesh and
managed mode regarding the queueing of management
frames. Since all modes now use a common skb
queue and a common work function, we can pull
the queueing code into the rx handler directly
and remove the duplicated length checks etc.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All the management processing functions free the
skb after they are done, so this can be done in
the new common code instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Even with the previous patch, IBSS, managed
and mesh modes all attach their own work
function to the shared work struct, which
means some duplicated code. Change that to
only have a frame processing function and a
further work function for each of them and
share some common code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
IBSS, managed and mesh modes all have their
own work struct, and in the future we want
to also use it in other modes to process
frames from the now common skb queue.
This also makes the skb queue and work safe
to use from other interface types.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
IBSS, managed and mesh modes all have an
skb queue, and in the future we want to
also use it in other modes, so make them
all use a common skb queue already.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A number of places use RCU locking for accessing
the station list, even though they do not need
to. Use mutex locking instead to prepare for the
locking changes I want to make. The mlme code is
also using a WLAN_STA_DISASSOC flag that has the
same meaning as WLAN_STA_BLOCK_BA, so use that.
While doing so, combine places where we loop
over stations twice, and optimise away some of
the loops by checking if the hardware supports
aggregation at all first.
Also fix a more theoretical race condition: right
now we could resume, set up an aggregation session,
and right after tear it down again due to the code
that is needed for hardware reconfiguration here.
Also mark add a comment to that code marking it as
a workaround.
Finally, remove a pointless aggregation disabling
loop when an interface is stopped, directly after
that we remove all stations from it which will also
disable all aggregation sessions that may still be
active, and does so in a race-free way unlike the
current loop that doesn't block new sessions.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When in IBSS mode, currently action frame TX and RX
cannot be used. Allow using it to talk to any peer,
or for public action frames. Also, while at it,
restructure the code in mac80211 to make it easier
to add this for other interface types in the future.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The parameter (work) is unused, remove it.
Reported from Eric Dumazet.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Instead of doing one atomic operation per frag, we can factorize them.
Reported from Eric Dumazet.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
If the returned csum value is 0, We has set ip_summed with
CHECKSUM_UNNECESSARY flag in __skb_checksum_complete_head().
So this patch kills the check and changes to return to upper
caller directly.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
- must use atomic_inc_not_zero() in instance_lookup_get()
- must use hlist_add_head_rcu() instead of hlist_add_head()
- must use hlist_del_rcu() instead of hlist_del()
- Introduce NFULNL_COPY_DISABLED to stop lockless reader from using an
instance, before we do final instance_put() on it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
No need to copy rxhash again in __skb_clone()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
deliver_no_wcard is not being set in skb_copy_header.
In the skb_cloned case it is not being cleared and
may cause the skb to be dropped when the loopback device
pushes it back up the stack.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device statistics have type unsigned long and several of the
device-specific parameters printed here have type __u32.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use struct rtnl_link_stats64 as the statistics structure.
On 32-bit architectures, insert 32 bits of padding after/before each
field of struct net_device_stats to make its layout compatible with
struct rtnl_link_stats64. Add an anonymous union in net_device; move
stats into the union and add struct rtnl_link_stats64 stats64.
Add net_device_ops::ndo_get_stats64, implementations of which will
return a pointer to struct rtnl_link_stats64. Drivers that implement
this operation must not update the structure asynchronously.
Change dev_get_stats() to call ndo_get_stats64 if available, and to
return a pointer to struct rtnl_link_stats64. Change callers of
dev_get_stats() accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch increases the granularity of the rate generated by pktgen.
The previous version of pktgen uses micro seconds (udelay) resolution when it
was delayed causing gaps in the rates. It is changed to nanosecond (ndelay).
Now any rate is possible.
Also it allows to set, the desired rate in Mb/s or packets per second.
The documentation has been updated.
Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
econet lacks proper locking. It holds econet_lock only when inserting or
deleting an entry in econet_sklist, not during lookups.
- convert econet_lock from rwlock to spinlock
- use econet_lock in ec_listening_socket() lookup
- use appropriate sock_hold() / sock_put() to avoid corruptions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gen_kill_estimator() API is incomplete or not well documented, since
caller should make sure an RCU grace period is respected before
freeing stats_lock.
This was partially addressed in commit 5d944c640b
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already RCU
protected.
A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock, already RCU
protected.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove useless union keyword in rtable, rt6_info and dn_route.
Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch correct a bug in the delay of pktgen.
It makes sure the inter-packet interval is accurate.
Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
gen_kill_estimator() / gen_new_estimator() is not always called with
RTNL held.
net/netfilter/xt_RATEEST.c is one user of these API that do not hold
RTNL, so random corruptions can occur between "tc" and "iptables".
Add a new fine grained lock instead of trying to use RTNL in netfilter.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 66018506e1 (ip: Router Alert RCU conversion) introduced RCU
lookups to ip_call_ra_chain(). It missed proper deinit phase :
When ip_ra_control() deletes an ip_ra_chain, it should make sure
ip_call_ra_chain() users can not start to use socket during the rcu
grace period. It should also delay the sock_put() after the grace
period, or we risk a premature socket freeing and corruptions, as
raw sockets are not rcu protected yet.
This delay avoids using expensive atomic_inc_not_zero() in
ip_call_ra_chain().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the accelerated receive path for VLAN's will
drop packets if the real device is an inactive slave and
is not one of the special pkts tested for in
skb_bond_should_drop(). This behavior is different then
the non-accelerated path and for pkts over a bonded vlan.
For example,
vlanx -> bond0 -> ethx
will be dropped in the vlan path and not delivered to any
packet handlers at all. However,
bond0 -> vlanx -> ethx
and
bond0 -> ethx
will be delivered to handlers that match the exact dev,
because the VLAN path checks the real_dev which is not a
slave and netif_recv_skb() doesn't drop frames but only
delivers them to exact matches.
This patch adds a sk_buff flag which is used for tagging
skbs that would previously been dropped and allows the
skb to continue to skb_netif_recv(). Here we add
logic to check for the deliver_no_wcard flag and if it
is set only deliver to handlers that match exactly. This
makes both paths above consistent and gives pkt handlers
a way to identify skbs that come from inactive slaves.
Without this patch in some configurations skbs will be
delivered to handlers with exact matches and in others
be dropped out right in the vlan path.
I have tested the following 4 configurations in failover modes
and load balancing modes.
# bond0 -> ethx
# vlanx -> bond0 -> ethx
# bond0 -> vlanx -> ethx
# bond0 -> ethx
|
vlanx -> --
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 1f8438a853 (icmp: Account for ICMP out errors), I did a typo
on IPV6 side, using ICMP6_MIB_OUTMSGS instead of ICMP6_MIB_OUTERRORS
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- rcu_read_lock() already held by caller
- use __in_dev_get_rcu() instead of in_dev_get() / in_dev_put()
- remove goto out;
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The extra ! character means that these conditions are always false.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use call_rcu rather than synchronize_rcu.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BugLink: http://bugs.launchpad.net/bugs/591416
There are a number of network drivers (bridge, bonding, etc) that are not yet
receive multi-queue enabled and use alloc_netdev(), so don't print a
num_rx_queues imbalance warning in that case.
Also, only print the warning once for those drivers that _are_ multi-queue
enabled.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
- instances_lock becomes a spinlock
- lockless lookups
While nfnetlink_log probably not performance critical, using less
rwlocks in our code is always welcomed...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
- Use an atomic_t for id_sequence to avoid a spin_lock/spin_unlock pair
- Group highly modified struct nfqnl_instance fields together
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Converts queue_lock rwlock to a spinlock.
(readlocked part can be changed by reads of integer values)
One atomic operation instead of four per ipq_enqueue_packet() call.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The code that hashes and unhashes connections from the connection table
is missing locking of the connection being modified, which opens up a
race condition and results in memory corruption when this race condition
is hit.
Here is what happens in pretty verbose form:
CPU 0 CPU 1
------------ ------------
An active connection is terminated and
we schedule ip_vs_conn_expire() on this
CPU to expire this connection.
IRQ assignment is changed to this CPU,
but the expire timer stays scheduled on
the other CPU.
New connection from same ip:port comes
in right before the timer expires, we
find the inactive connection in our
connection table and get a reference to
it. We proper lock the connection in
tcp_state_transition() and read the
connection flags in set_tcp_state().
ip_vs_conn_expire() gets called, we
unhash the connection from our
connection table and remove the hashed
flag in ip_vs_conn_unhash(), without
proper locking!
While still holding proper locks we
write the connection flags in
set_tcp_state() and this sets the hashed
flag again.
ip_vs_conn_expire() fails to expire the
connection, because the other CPU has
incremented the reference count. We try
to re-insert the connection into our
connection table, but this fails in
ip_vs_conn_hash(), because the hashed
flag has been set by the other CPU. We
re-schedule execution of
ip_vs_conn_expire(). Now this connection
has the hashed flag set, but isn't
actually hashed in our connection table
and has a dangling list_head.
We drop the reference we held on the
connection and schedule the expire timer
for timeouting the connection on this
CPU. Further packets won't be able to
find this connection in our connection
table.
ip_vs_conn_expire() gets called again,
we think it's already hashed, but the
list_head is dangling and while removing
the connection from our connection table
we write to the memory location where
this list_head points to.
The result will probably be a kernel oops at some other point in time.
This race condition is pretty subtle, but it can be triggered remotely.
It needs the IRQ assignment change or another circumstance where packets
coming from the same ip:port for the same service are being processed on
different CPUs. And it involves hitting the exact time at which
ip_vs_conn_expire() gets called. It can be avoided by making sure that
all packets from one connection are always processed on the same CPU and
can be made harder to exploit by changing the connection timeouts to
some custom values.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: stable@kernel.org
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Converts queue_lock rwlock to a spinlock.
(readlocked part can be changed by reads of integer values)
One atomic operation instead of four per ipq_enqueue_packet() call.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Use WORD_ROUND to round an int up to the next multiple of 4.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet, slowing down performance.
This patch converts it to a per_cpu variable.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When we receive a deauthentication frame before
having successfully associated, we neither print
a message nor abort assocation. The former makes
it hard to debug, while the latter later causes
a warning in cfg80211 when, as will typically be
the case, association timed out.
This warning was reported by many, e.g. in
https://bugzilla.kernel.org/show_bug.cgi?id=15981,
but I couldn't initially pinpoint it. I verified
the fix by hacking hostapd to send a deauth frame
instead of an association response.
Cc: stable@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet. This is bad for performance.
__read_mostly annotation is also a bad choice.
This patch introduces IPS_UNTRACKED bit so that we can use later a
per_cpu untrack structure more easily.
A new helper, nf_ct_untracked_get() returns a pointer to
nf_conntrack_untracked.
Another one, nf_ct_untracked_status_or() is used by nf_nat_init() to add
IPS_NAT_DONE_MASK bits to untracked status.
nf_ct_is_untracked() prototype is changed to work on a nf_conn pointer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
There's a window for ieee80211_ifa_changed() to get called whilst the
managed mode mutex has not been initialized when opening and stopping the
interface. Currently this causes a kernel BUG like the following:
[ 132.460013] kernel BUG at /home/wifi/iwlwifi-2.6/net/mac80211/main.c:380!
[ 132.460013] invalid opcode: 0000 [#1] SMP
The mutex is initialized during open(), hence once netif_running() is true,
the mutex should be valid. Fix by adding a netif_running() check to the
function.
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
- dev_get_by_flags() changed to dev_get_by_flags_rcu()
- ipv6_sock_ac_join() dont touch dev & idev refcounts
- ipv6_sock_ac_drop() dont touch dev & idev refcounts
- ipv6_sock_ac_close() dont touch dev & idev refcounts
- ipv6_dev_ac_dec() dount touch idev refcount
- ipv6_chk_acast_addr() dont touch idev refcount
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_dev_get() -> __in_dev_get_rcu() in a rcu protected function.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_dev_get() -> __in_dev_get_rcu() in a rcu protected function.
[ Fix build with CONFIG_IP_ROUTE_VERBOSE disabled. -DaveM ]
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_dev_get() -> __in_dev_get_rcu() in a rcu protected function.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Straightforward conversion to RCU.
One rwlock becomes a spinlock, and is static.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"mac80211: make ARP filtering depend on CONFIG_INET" introduced this
potential locking leak.
Reported-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes "iw wlan0 dump survey" work again with
mac80211-based drivers that support it, e.g. ath5k.
Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ipmr_rules_exit() and ip6mr_rules_exit() free a list of items, but
forget to properly remove these items from list. List head is not
changed and still points to freed memory.
This can trigger a fault later when icmpv6_sk_exit() is called.
Fix is to either reinit list, or use list_del() to properly remove items
from list before freeing them.
bugzilla report : https://bugzilla.kernel.org/show_bug.cgi?id=16120
Introduced by commit d1db275dd3 (ipv6: ip6mr: support multiple
tables) and commit f0ad0860d0 (ipv4: ipmr: support multiple tables)
Reported-by: Alex Zhavnerchik <alex.vizor@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The extra assertion to allow packet munging only when there are
no other ptypes listening which may have worked around an old bug
is unnecessary. It is sufficient to check if the skb is cloned before
trampling on it. Thanks to Herbert Xu for being persistent and patient
in getting this across.
[Note that cloning checks and assertions are the general rule used
by tc actions (documentation/networking/tc-actions-env-rules.txt)].
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the core path doesnt set OK to munge we detect
writable skbs by looking to see if they are cloned.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove two unnecessary assignments
we don't need to assign NULL when initialize structure objects.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/sched/sch_htb.c | 2 --
1 file changed, 2 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid two atomic ops per raw_send_hdrinc() call
Avoid two atomic ops per raw6_send_hdrinc() call
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CAIF is using "xxx-AF_MAX" strings for the lock validator. It should use
its own strings.
Signed-off-by: Alex Lorca <alex.lorca@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch address a serious performance issue in reading the
TCP sockets table (/proc/net/tcp).
Reading the full table is done by a number of sequential read
operations. At each read operation, a seek is done to find the
last socket that was previously read. This seek operation requires
that the sockets in the table need to be counted up to the current
file position, and to count each of these requires taking a lock for
each non-empty bucket. The whole algorithm is O(n^2).
The fix is to cache the last bucket value, offset within the bucket,
and the file position returned by the last read operation. On the
next sequential read, the bucket and offset are used to find the
last read socket immediately without needing ot scan the previous
buckets the table. This algorithm t read the whole table is O(n).
The improvement offered by this patch is easily show by performing
cat'ing /proc/net/tcp on a machine with a lot of connections. With
about 182K connections in the table, I see the following:
- Without patch
time cat /proc/net/tcp > /dev/null
real 1m56.729s
user 0m0.214s
sys 1m56.344s
- With patch
time cat /proc/net/tcp > /dev/null
real 0m0.894s
user 0m0.290s
sys 0m0.594s
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With mtu=9000, mld_newpack() use order-2 GFP_ATOMIC allocations, that
are very unreliable, on machines where PAGE_SIZE=4K
Limit allocated skbs to be at most one page. (order-0 allocations)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can avoid an unecessary cache miss by checking if the skb is non-linear
before accessing gso_size/gso_type in skb_warn_if_lro, the same can also be
done to avoid a cache miss on nr_frags if data_len is 0.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- ipv6 msstab: account for ipv6 header size
- ipv4 msstab: add mss for Jumbograms.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
caller: if (!th->rst && !th->syn && th->ack)
callee: if (!th->ack)
make the caller only check for !syn (common for 3whs), and move
the !rst / ack test to the callee.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
both syn_flood_warning functions print a message, but
ipv4 version only prints a warning if CONFIG_SYN_COOKIES=y.
Make the v4 one behave like the v6 one.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Its better to make a route lookup in appropriate namespace.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I believe a moderate SYN flood attack can corrupt RFS flow table
(rps_sock_flow_table), making RPS/RFS much less effective.
Even in a normal situation, server handling short lived sessions suffer
from bad steering for the first data packet of a session, if another SYN
packet is received for another session.
We do following action in tcp_v4_rcv() :
sock_rps_save_rxhash(sk, skb->rxhash);
We should _not_ do this if sk is a LISTEN socket, as about each
packet received on a LISTEN socket has a different rxhash than
previous one.
-> RPS_NO_CPU markers are spread all over rps_sock_flow_table.
Also, it makes sense to protect sk->rxhash field changes with socket
lock (We currently can change it even if user thread owns the lock
and might use rxhash)
This patch moves sock_rps_save_rxhash() to a sock locked section,
and only for non LISTEN sockets.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syncookies default to on since
e994b7c901
(tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm triggers a warning if dst_pop() drops a refcount
on a noref dst. This patch changes dst_pop() to
skb_dst_pop(). skb_dst_pop() drops the refcnt only
on a refcounted dst. Also we don't clone the child
dst_entry, so it is not refcounted and we can use
skb_dst_set_noref() in xfrm_output_one().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Processing an association response could take a bit
of time while we set up the hardware etc. During that
time, the AP might already send a blockack request.
If this happens very quickly on a fairly slow machine,
we can end up processing the blockack request before
the association processing has finished. Since the
blockack processing cannot sleep right now, we also
cannot make it wait in the driver.
As a result, sometimes on slow machines the iwlagn
driver gets totally confused, and no traffic can pass
when the aggregation setup was done before the assoc
setup completed.
I'm working on a proper fix for this, which involves
queuing all blockack category action frames from a
work struct, and also allowing the ampdu_action driver
callback to sleep, which will generally clean up the
code and make things easier.
However, this is a very involved and complex change.
To fix the problem at hand in a way that can also be
backported to stable, I've come up with this patch.
Here, I simply process all aggregation action frames
from the managed interface skb queue, which means
their processing will be serialized with processing
the association response, thereby fixing the problem.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
STA_NOTIFY_ADD and STA_NOTIFY_REMOVE have no users anymore,
and station addition/removal are indicated to drivers
using sta_add() and sta_remove(), which can sleep.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Using vmalloc_node(size, numa_node_id()) for temporary storage is not
needed. vmalloc(size) is more respectful of user NUMA policy.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Avoid two atomic ops in arp_fwd_proxy()
Avoid two atomic ops in arp_process()
Valid optims since arp_rcv() is run under rcu_read_lock()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid two atomic ops on output device refcount
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reduces the binary size by around 25k (measured on MIPS,
with CONFIG_MAC80211_DEBUG_COUNTERS enabled).
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since I recently made station management able
to sleep, I can now rework key management as
well; since it will no longer need a spinlock
and can also use a mutex instead, a bunch of
code to allow drivers' set_key to sleep while
key management is protected by a spinlock can
now be removed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For some odd reason, the plink_state enum is
declared in the middle between aggregation
related structures. Move it down to make the
file easier to read.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
After ieee80211_rx_h_ctrl() processing we only
want to process management (including action)
frames, so there's no point in letting control
frames continue.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ever since we use only cfg80211 for configuration,
there is no configuration that could be pending at
this point, cfg80211 will have the configuration
that is pending and apply it afterwards.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
I suspect the compiler will do this optimisation
anyway, but it seems cleaner to move this into
the WEP switch case.
Also make rx_h_decrypt use a local variable for
the frame_control so that we don't need to reload
the hdr variable for this after linearizing.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There's no sense in letting anything but internal
mac80211 functions set the initiator to anything
but WLAN_BACK_INITIATOR, since WLAN_BACK_RECIPIENT
is only valid when we have received a frame from
the peer, which we react to directly in mac80211.
The debugfs code I recently added got this wrong
as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some hardware allow extended filtering of ARP frames not intended for
the host. To perform such filtering, the hardware needs to know the current
IP address(es) of the host, bound to its interface.
Add support for ARP filtering to mac80211 by adding a new op to the driver
interface, allowing to configure the current IP addresses. This op is called
upon association with the currently configured address(es), and when
associated whenever the IP address(es) change.
This patch adds configuration of IPv4 addresses only, as IPv6 addresses don't
need ARP filtering.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The IBSS code has a bogus mod_timer(..., 0) call,
we shouldn't ever pass a constant value to the
function since any constant value could be in the
future or the past.
However, invoking the timer here is not necessary
at all, since we just finished scanning and just
need to have the IBSS code run again from the
workqueue later, so factor out the work starting
and use that instead.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
access skb->data safely
we should use skb_header_pointer() and skb_store_bits() to access skb->data to
handle small or non-linear skbs.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/sched/act_pedit.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
cleanup patch.
Use new __packed annotation in net/ and include/
(except netfilter)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid two atomic ops on struct in_device refcount per incoming packet,
if slow path taken, (or route cache disabled)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Lameter mentioned that packets could be dropped in input path
because of rp_filter settings, without any SNMP counter being
incremented. System administrator can have a hard time to track the
problem.
This patch introduces a new counter, LINUX_MIB_IPRPFILTER, incremented
each time we drop a packet because Reverse Path Filter triggers.
(We receive an IPv4 datagram on a given interface, and find the route to
send an answer would use another interface)
netstat -s | grep IPReversePathFilter
IPReversePathFilter: 21714
Reported-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending action frames, we want to verify
that we do that on the correct channel. However,
checking the channel type in addition can get in
the way, since the channel type could change on
the fly during an association, and it's not
useful to have the channel type anyway since it
has no effect on the transmission. Therefore,
make it optional to specify so that if wanted,
it can still be checked, but is not required.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Don't refuse HT20 channels on devices that don't support HT40.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
use skb_header_pointer() to dereference data safely
the original skb->data dereference isn't safe, as there isn't any skb->len or
skb_is_nonlinear() check. skb_header_pointer() is used instead in this patch.
And when the skb isn't long enough, we terminate the function u32_classify()
immediately with -1.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For large values of rtt, 2^rho operation may overflow u32. Clamp down the increment to 2^16.
Signed-off-by: Daniele Lacamera <root@danielinux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.
Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are more than a dozen occurrences of following code in the
IPv6 stack:
if (opt && opt->srcrt) {
struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
ipv6_addr_copy(&final, &fl.fl6_dst);
ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
final_p = &final;
}
Replace those with a helper. Note that the helper overrides final_p
in all cases. This is ok as final_p was previously initialized to
NULL when declared.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally dhclient can be configured to send the "host-name" option
in DHCP requests to update the client's DNS record. However for an
NFSROOT system, dhclient shall never be called (which may change the
IP addr and therefore lose your root NFS mount connection).
So enable updating the DNS record with kernel parameter
ip=::::$HOST_NAME::dhcp
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix the wrong checksum when addr isn't in old_addr/mask
For TCP and UDP packets, when addr isn't in old_addr/mask we don't do SNAT or
DNAT, and we should not update layer 4 checksum.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/sched/act_nat.c | 4 ++++
1 file changed, 4 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a setting, PACKET_TIMESTAMP, to specify the packet
timestamp source that is exported to capture utilities like tcpdump by
packet_mmap.
PACKET_TIMESTAMP accepts the same integer bit field as
SO_TIMESTAMPING. However, only the SOF_TIMESTAMPING_SYS_HARDWARE and
SOF_TIMESTAMPING_RAW_HARDWARE values are currently recognized by
PACKET_TIMESTAMP. SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.
If PACKET_TIMESTAMP is not set, a software timestamp generated inside
the networking stack is used (the behavior before this setting was
added).
Signed-off-by: Scott McMillan <scott.a.mcmillan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use read_pnet() and write_pnet() to reduce number of ifdef CONFIG_NET_NS
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When many cpus compete for sending frames on a given qdisc, the qdisc
spinlock suffers from very high contention.
The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire
the lock, and cannot dequeue packets fast enough, since it must wait for
this lock for each dequeued packet.
One solution to this problem is to force all cpus spinning on a second
lock before trying to get the main lock, when/if they see
__QDISC_STATE_RUNNING already set.
The owning cpu then compete with at most one other cpu for the main
lock, allowing for higher dequeueing rate.
Based on a previous patch from Alexander Duyck. I added the heuristic to
avoid the atomic in fast path, and put the new lock far away from the
cache line used by the dequeue worker. Also try to release the busylock
lock as late as possible.
Tests with following script gave a boost from ~50.000 pps to ~600.000
pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
(A single netperf flow can reach ~800.000 pps on this platform)
for j in `seq 0 3`; do
for i in `seq 0 7`; do
netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
done
done
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a skb is received on an inactive bond that does not meet
the special cases checked for by skb_bond_should_drop it should
only be delivered to exact matches as the comment in
netif_receive_skb() says.
However because null_or_bond could also be null this is not
always true. This patch renames null_or_bond to orig_or_bond
and initializes it to orig_dev. This keeps the intent of
null_or_bond to pass frames received on VLAN interfaces stacked
on bonding interfaces without invalidating the statement for
null_or_orig.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vlan device should not copy the slave or master flags from
the real device. It is not in the bond until added nor is it
a master.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a
second patch will move on another location.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets going through __xfrm_route_forward() have a not refcounted dst
entry, since we enabled a noref forwarding path.
xfrm_lookup() might incorrectly release this dst entry.
It's a bit late to make invasive changes in xfrm_lookup(), so lets force
a refcount in this path.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dialog token allocator has apparently been broken
since b83f4e15 ("mac80211: fix deadlock in sta->lock")
because it got moved out under the spinlock. Fix it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Daniel reported that the paged RX changes had
broken blockack request frame processing due
to using data that wasn't really part of the
skb data.
Fix this using skb_copy_bits() for the needed
data. As a side effect, this adds a check on
processing too short frames, which previously
this code could do.
Reported-by: Daniel Halperin <dhalperi@cs.washington.edu>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Daniel Halperin <dhalperi@cs.washington.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use atomic_cmpxchg() to avoid dirtying a shared location.
xt_statistic_priv smp aligned to avoid sharing same cache line with
other stuff.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Avoid dirtying bridge_parent_rtable refcount, using new dst noref
infrastructure.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
As noticed by Julia Lawall, ipip6_tunnel_add_prl() incorrectly calls
kzallloc(..., GFP_KERNEL) while a spinlock is held. She provided
a patch to use GFP_ATOMIC instead.
One possibility would be to convert this spinlock to a mutex, or
preallocate the thing before taking the lock.
After RCU conversion, it appears we dont need this lock, since
caller already holds RTNL
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit: c720c7e838 missed these.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct sk_forward_alloc handling for error_queue would need to use a
backlog of frames that softirq handler could not deliver because socket
is owned by user thread. Or extend backlog processing to be able to
process normal and error packets.
Another possibility is to not use mem charge for error queue, this is
what I implemented in this patch.
Note: this reverts commit 29030374
(net: fix sk_forward_alloc corruptions), since we dont need to lock
socket anymore.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit f3c5c1bfd4 (netfilter: xtables: make ip_tables reentrant)
introduced a performance regression, because stackptr array is shared by
all cpus, adding cache line ping pongs. (16 cpus share a 64 bytes cache
line)
Fix this using alloc_percpu()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-By: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
In xt_register_table, xt_jumpstack_alloc is called first, later
xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
will be used again. Then the memory allocated by previous xt_jumpstack_alloc
will be leaked. We can simply remove the previous xt_jumpstack_alloc because
there aren't any users of newinfo between xt_jumpstack_alloc and
xt_replace_table.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-By: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Currently such notifications are only generated when the device comes up or the
address changes. However one use case for these notifications is to enable
faster network recovery after a virtual machine migration (by causing switches
to relearn their MAC tables). A migration appears to the network stack as a
temporary loss of carrier and therefore does not trigger either of the current
conditions. Rather than adding carrier up as a trigger (which can cause issues
when interfaces a flapping) simply add an interface which the driver can use
to explicitly trigger the notification.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
"phyinfo" can never be null here because we assigned it an address, so I
removed both the assert and the second check inside the if statement. I
removed the "phyinfo->phy_layer != NULL" check as well because that was
asserted earlier.
Walter Harms suggested I move the "phyinfo->phy_ref_count++;" outside
the if condition for readability, so I have done that.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already dereferenced uaddr towards the start of the function when we
checked that "uaddr->sa_family != AF_CAIF". Both the check here and the
earlier check were added in bece7b2398: "caif: Rewritten socket
implementation". Before that patch, we assumed that we recieved a valid
pointer for uaddr, and based on that, I have removed this check.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memdup_user when user data is immediately copied into the
allocated region.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@
- to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+ to = memdup_user(from,size);
if (
- to==NULL
+ IS_ERR(to)
|| ...) {
<+... when != goto l1;
- -ENOMEM
+ PTR_ERR(to)
...+>
}
- if (copy_from_user(to, from, size) != 0) {
- <+... when != goto l2;
- -EFAULT
- ...+>
- }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memdup_user when user data is immediately copied into the
allocated region.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@
- to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+ to = memdup_user(from,size);
if (
- to==NULL
+ IS_ERR(to)
|| ...) {
<+... when != goto l1;
- -ENOMEM
+ PTR_ERR(to)
...+>
}
- if (copy_from_user(to, from, size) != 0) {
- <+... when != goto l2;
- -EFAULT
- ...+>
- }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_md5_hash_skb_data() should handle skb->frag_list, and eventually
recurse.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netpoll does an interesting work in zap_completion_queue(), but this was
before we did skb orphaning before delivering packets to device.
It now makes sense to add a test in dev_kfree_skb_irq() to not queue a
skb if already orphaned, and to remove netpoll zap_completion_queue() as
a bonus.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch saves 224 bytes of text on my machine.
__this_cpu_inc() generates a single instruction, using no scratch
registers :
65 ff 04 25 a8 30 01 00 incl %gs:0x130a8
instead of :
48 c7 c2 80 30 01 00 mov $0x13080,%rdx
65 48 8b 04 25 88 ea 00 00 mov %gs:0xea88,%rax
83 44 10 28 01 addl $0x1,0x28(%rax,%rdx,1)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As David found out, sock_queue_err_skb() should be called with socket
lock hold, or we risk sk_forward_alloc corruption, since we use non
atomic operations to update this field.
This patch adds bh_lock_sock()/bh_unlock_sock() pair to three spots.
(BH already disabled)
1) skb_tstamp_tx()
2) Before calling ip_icmp_error(), in __udp4_lib_err()
3) Before calling ipv6_icmp_error(), in __udp6_lib_err()
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The accept()'d socket need to be unhashed while the (listen()'ing)
socket lock is held. This fixes a race condition that could lead to an
OOPS.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There was an spin_unlock missing on the error path. The spin_lock was
tucked in with the declarations so it was hard to spot. I added a new
line.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Brændeland <sjurbren@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a mutex_unlock missing on the error path. In each case, whenever the
label out is reached from elsewhere in the function, mutex is not locked.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression E1;
@@
* mutex_lock(E1);
<+... when != E1
if (...) {
... when != E1
* return ...;
}
...+>
* mutex_unlock(E1);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f4f914b5 (net: ipv6 bind to device issue) caused
a regression with Mobile IPv6 when it changed the meaning
of fl->oif to become a strict requirement of the route
lookup. Instead, only force strict mode when
sk->sk_bound_dev_if is set on the calling socket, getting
the intended behavior and fixing the regression.
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparse correctly complains that
__ieee80211_get_channel_mode is not static.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
netlink: bug fix: wrong size was calculated for vfinfo list blob
netlink: bug fix: don't overrun skbs on vf_port dump
xt_tee: use skb_dst_drop()
netdev/fec: fix ifconfig eth0 down hang issue
cnic: Fix context memory init. on 5709.
drivers/net: Eliminate a NULL pointer dereference
drivers/net/hamradio: Eliminate a NULL pointer dereference
be2net: Patch removes redundant while statement in loop.
ipv6: Add GSO support on forwarding path
net: fix __neigh_event_send()
vhost: fix the memory leak which will happen when memory_access_ok fails
vhost-net: fix to check the return value of copy_to/from_user() correctly
vhost: fix to check the return value of copy_to/from_user() correctly
vhost: Fix host panic if ioctl called with wrong index
net: fix lock_sock_bh/unlock_sock_bh
net/iucv: Add missing spin_unlock
net: ll_temac: fix checksum offload logic
net: ll_temac: fix interrupt bug when interrupt 0 is used
sctp: dubious bitfields in sctp_transport
ipmr: off by one in __ipmr_fill_mroute()
...
The wrong size was being calculated for vfinfo. In one case, it was over-
calculating using nlmsg_total_size on attrs, in another case, it was
under-calculating by assuming ifla_vf_* structs are packed together, but
each struct is it's own attr w/ hdr (and padding).
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed by Patrick McHardy: was continuing to fill skb after a
nla_put_failure, ignoring the size calculated by upper layer. Now,
return -EMSGSIZE on any overruns, but also allow netdev to
fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding
nest.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 7fee226a (net: add a noref bit on skb dst), its wrong to
use : dst_release(skb_dst(skb)), since we could decrement a refcount
while skb dst was not refcounted.
We should use skb_dst_drop(skb) instead.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we disallow GSO packets on the IPv6 forward path.
This patch fixes this.
Note that I discovered that our existing GSO MTU checks (e.g.,
IPv4 forwarding) are buggy in that they skip the check altogether,
when they really should be checking gso_size + header instead.
I have also been lazy here in that I haven't bothered to segment
the GSO packet by hand before generating an ICMP message. Someone
should add that to be 100% correct.
Reported-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 7fee226ad2 (net: add a noref bit on skb dst) missed one spot
where an skb is enqueued, with a possibly not refcounted dst entry.
__neigh_event_send() inserts skb into arp_queue, so we must make sure
dst entry is refcounted, or dst entry can be freed by garbage collector
after caller exits from rcu protected section.
Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
tracing: Add __used annotation to event variable
perf, trace: Fix !x86 build bug
perf report: Support multiple events on the TUI
perf annotate: Fix up usage of the build id cache
x86/mmiotrace: Remove redundant instruction prefix checks
perf annotate: Add TUI interface
perf tui: Remove annotate from popup menu after failure
perf report: Don't start the TUI if -D is used
perf: Fix getline undeclared
perf: Optimize perf_tp_event_match()
perf: Remove more code from the fastpath
perf: Optimize the !vmalloc backed buffer
perf: Optimize perf_output_copy()
perf: Fix wakeup storm for RO mmap()s
perf-record: Share per-cpu buffers
perf-record: Remove -M
perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
...
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix another nfs_wb_page() deadlock
NFS: Ensure that we mark the inode as dirty if we exit early from commit
NFS: Fix a lock imbalance typo in nfs_access_cache_shrinker
sunrpc: fix leak on error on socket xprt setup
By the previous modification, the cpu notifier can return encapsulate
errno value. This converts the cpu notifiers for iucv.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.
After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a spin_unlock missing on the error path. There seems like no reason
why the lock should continue to be held if the kzalloc fail.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression E1;
@@
* spin_lock(E1,...);
<+... when != E1
if (...) {
... when != E1
* return ...;
}
...+>
* spin_unlock(E1,...);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also collect exit code together while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This fixes a smatch warning:
net/ipv4/ipmr.c +1917 __ipmr_fill_mroute(12) error: buffer overflow
'(mrt)->vif_table' 32 <= 32
The ipv6 version had the same issue.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits)
drivers/net/usb/asix.c: Fix pointer cast.
be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
proc_dointvec: write a single value
hso: add support for new products
Phonet: fix potential use-after-free in pep_sock_close()
ath9k: remove VEOL support for ad-hoc
ath9k: change beacon allocation to prefer the first beacon slot
sock.h: fix kernel-doc warning
cls_cgroup: Fix build error when built-in
macvlan: do proper cleanup in macvlan_common_newlink() V2
be2net: Bug fix in init code in probe
net/dccp: expansion of error code size
ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep
wireless: fix sta_info.h kernel-doc warnings
wireless: fix mac80211.h kernel-doc warnings
iwlwifi: testing the wrong variable in iwl_add_bssid_station()
ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs()
ath9k_htc: dereferencing before check in hif_usb_tx_cb()
rt2x00: Fix rt2800usb TX descriptor writing.
rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.
...
sk_common_release() might destroy our last reference to the socket.
So an extra temporary reference is needed during cleanup.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-4.3.3 produces the warning:
"format not a string literal and no format arguments"
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.
- Make SHRT_MIN of type s16, not int, for consistency.
[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Because MIPS's EDQUOT value is 1133(0x46d).
It's larger than u8.
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sta_info.h kernel-doc warnings:
Warning(net/mac80211/sta_info.h:164): No description found for parameter 'tid_active_rx[STA_TID_NUM]'
Warning(net/mac80211/sta_info.h:164): Excess struct/union/enum/typedef member 'tid_state_rx' description in 'sta_ampdu_mlme'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit 03ceedea97.
This patch was reported to cause a regression in which connectivity is
lost and cannot be reestablished after a suspend/resume cycle.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
uml: Pushdown the bkl from harddog_kern ioctl
sunrpc: Pushdown the bkl from sunrpc cache ioctl
sunrpc: Pushdown the bkl from ioctl
autofs4: Pushdown the bkl from ioctl
uml: Convert to unlocked_ioctls to remove implicit BKL
ncpfs: BKL ioctl pushdown
coda: Clean-up whitespace problems in pioctl.c
coda: BKL ioctl pushdown
drivers: Push down BKL into various drivers
isdn: Push down BKL into ioctl functions
scsi: Push down BKL into ioctl functions
dvb: Push down BKL into ioctl functions
smbfs: Push down BKL into ioctl function
coda/psdev: Remove BKL from ioctl function
um/mmapper: Remove BKL usage
sn_hwperf: Kill BKL usage
hfsplus: Push down BKL into ioctl function
This reverts commit 03ceedea97, since it
breaks resume from suspend-to-ram on Rafael's Acer Ferrari One.
NetworkManager thinks everything is ok, but it can't connect to the AP
to get an IP address after the resume.
In fact, it even breaks resume for non-ath9k chipsets: reverting it also
fixes Rafael's Toshiba Protege R500 with the iwlagn driver. As Johannes
says:
"Indeed, this patch needs to be reverted. That mac80211 change is wrong
and completely unnecessary."
Reported-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Yingqiang Ma <yma.cool@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes tun update its socket classid every time we
inject a packet into the network stack. This is so that any
updates made by the admin to the process writing packets to
tun is effected.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread. This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.
Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.
In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK. This is currently
unclassified.
As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.
This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)
The value is set at socket creation time. So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.
For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.
In order to minimise risks, I have not made this the authoritative
classid. For now it is only used as a backup when we execute
with soft IRQs disabled. Once we're completely happy with its
semantics we can use it as the sole classid.
Footnote: I have rearranged the error path on cls_group module
creation. If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed handling when skb don't fit in user buffer,
instead of returning -EMSGSIZE, the buffer is truncated (just
as unix seqpakcet does).
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Splint found missing spin_unlock.
Corrected this an some other trivial split warnings.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discovered bug when testing async connect.
While connecting poll should not return POLLHUP,
but POLLOUT when connected.
Also fixed the sysfs flow-control-counters.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discovered bugs when injecting slab allocation failures.
Add checks on all memory allocation.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discovered bug when running high number of parallel connect requests.
Replace buggy home brewed list with linux/list.h.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discovered bug when testing on 64bit architecture.
Fixed by using long to store result from wait_event_interruptible_timeout.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the commit:
commit d90310243f
Author: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed Nov 18 02:36:59 2009 +0000
net: device name allocation cleanups
introduced a bug when there is a hash collision making impossible
to rename a device with eth%d. This bug is very hard to reproduce
and appears rarely.
The problem is coming from we don't pass a temporary buffer to
__dev_alloc_name but 'dev->name' which is modified by the function.
A detailed explanation is here:
http://marc.info/?l=linux-netdev&m=127417784011987&w=2
Changelog:
V2 : replaced strings comparison by pointers comparison
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c02db8c629:
Author: Chris Wright <chrisw@sous-sol.org>
Date: Sun May 16 01:05:45 2010 -0700
Subject: rtnetlink: make SR-IOV VF interface symmetric
adds broken error handling to do_setlink() in net/core/rtnetlink.c. The
problem is the following chunk of code:
if (tb[IFLA_VFINFO_LIST]) {
struct nlattr *attr;
int rem;
nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
if (nla_type(attr) != IFLA_VF_INFO)
----> goto errout;
err = do_setvfinfo(dev, attr);
if (err < 0)
goto errout;
modified = 1;
}
}
which can get to errout without setting err, resulting in the following error:
net/core/rtnetlink.c: In function 'do_setlink':
net/core/rtnetlink.c:904: warning: 'err' may be used uninitialized in this function
Change the code to return -EINVAL in this case. Note that this might not be
the appropriate error though.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chris Wright <chrisw@sous-sol.org>
cc: David S. Miller <davem@davemloft.net>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is already a submenu entry that is always displayed, so there is
no need to also show a dedicated CAIF comment.
Drop dead commented code while we're here, and change the submenu text
to better match the style everyone else is using.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Pfaff reported a kernel oops and provided a test program to
reproduce it.
https://kerneltrap.org/mailarchive/linux-netdev/2010/5/21/6277805
tc_fill_qdisc() should not be called for builtin qdisc, or it
dereference a NULL pointer to get device ifindex.
Fix is to always use tc_qdisc_dump_ignore() before calling
tc_fill_qdisc().
Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pushdown the bkl to cache_ioctl_pipefs.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Nfs <linux-nfs@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Pushdown the bkl to rpc_pipe_ioctl.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Nfs <linux-nfs@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
* 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
drivers/char: Eliminate use after free
virtio: console: Accept console size along with resize control message
virtio: console: Store each console's size in the console structure
virtio: console: Resize console port 0 on config intr only if multiport is off
virtio: console: Add support for nonblocking write()s
virtio: console: Rename wait_is_over() to will_read_block()
virtio: console: Don't always create a port 0 if using multiport
virtio: console: Use a control message to add ports
virtio: console: Move code around for future patches
virtio: console: Remove config work handler
virtio: console: Don't call hvc_remove() on unplugging console ports
virtio: console: Return -EPIPE to hvc_console if we lost the connection
virtio: console: Let host know of port or device add failures
virtio: console: Add a __send_control_msg() that can send messages without a valid port
virtio: Revert "virtio: disable multiport console support."
virtio: add_buf_gfp
trans_virtio: use virtqueue_xxx wrappers
virtio-rng: use virtqueue_xxx wrappers
virtio_ring: remove a level of indirection
virtio_net: use virtqueue_xxx wrappers
...
Fix up conflicts in drivers/net/virtio_net.c due to new virtqueue_xxx
wrappers changes conflicting with some other cleanups.
I made a V2 of this patch on top of my patches for VFS switches.
All the changes were due to change in some offsets.
rename - change name of file or directory
size[4] Trename tag[2] fid[4] newdirfid[4] name[s]
size[4] Rrename tag[2]
The rename message is used to change the name of a file, possibly moving it
to a new directory. The 9P wstat message can only rename a file within the
same directory.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
I made a V2 of this patch on top of my patches for VFS switches. The
change was adding v9fs_statfs pointer to v9fs_super_ops_dotl
instead of v9fs_super_ops.
statfs - get file system statistics
size[4] Tstatfs tag[2] fid[4]
size[4] Rstatfs tag[2] type[4] bsize[4] blocks[8] bfree[8] bavail[8]
files[8] ffree[8] fsid[8] namelen[4]
The statfs message is used to request file system information returned
by the statfs(2) system call, which is used by df(1) to report file
system and disk space usage.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for
growing and shrinking the size of a pipe and adjusts pipe.c and splice.c
(and relay and network splice) usage to work with these larger (or smaller)
pipes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Specifying a valid channel type will get
goto out rather than continuing, due to
missing braces. This affects both remain
on channel and action frame TX commands.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since wdev can be NULL, check it before dereferencing it
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
signal_type is enum cfg80211_signal_type.
This fixes the gcc warning:
scan.c: In function `cfg80211_inform_bss':
scan.c:518:6: warning: comparison between `enum cfg80211_signal_type' and `enum nl80211_bss'
scan.c: In function `cfg80211_inform_bss_frame':
scan.c:574:6: warning: comparison between `enum cfg80211_signal_type' and `enum nl80211_bss'
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit aaf8cdc34d.
Drivers like the ipw2100 call device_create_group when they
are initialized and device_remove_group when they are shutdown.
Moving them between namespaces deletes their sysfs groups early.
In particular the following call chain results.
netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
With sysfs_remove_dir recursively deleting all of it's subdirectories,
and nothing adding them back.
Ouch!
Therefore we need to call something that ultimate calls sysfs_mv_dir
as that sysfs function can move sysfs directories between namespaces
without deleting their subdirectories or their contents. Allowing
us to avoid placing extra boiler plate into every driver that does
something interesting with sysfs.
Currently the function that provides that capability is device_rename.
That is the code works without nasty side effects as originally written.
So remove the misguided fix for moving devices between namespaces. The
bug in the kobject layer that inspired it has now been recognized and
fixed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When netlink sockets are used to convey data that is in a namespace
we need a way to select a subset of the listening sockets to deliver
the packet to. For the network namespace we have been doing this
by only transmitting packets in the correct network namespace.
For data belonging to other namespaces netlink_bradcast_filtered
provides a mechanism that allows us to examine the destination
socket and to decide if we should transmit the specified packet
to it.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I had a couple of stupid bugs in:
netns: Teach network device kobjects which namespace they are in.
- I duplicated the Kconfig for the NET_NS
- The build was broken when sysfs was not compiled in
The sysfs breakage is because after I moved the operations
for the sysfs to the kobject layer, to make things cleaner
I forgot to move the ifdefs. Opps.
I'm not quite certain how I got introduced a second NET_NS Kconfig,
but it was probably a 3 way merge somewhere along the way that
did not notice that the NET_NS Kconfig option had mvoed and thout
that was a bug. It probably slipped in because it used to be the
sysfs patches were the first patches in my network namespace patches.
Some things just don't go like you would expect.
Neither of these bugs actually affect anything in the common case
but they should be fixed.
Thanks to Serge for noticing they were present.
Reported-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: David S. Miller <davem@davemloft.net>
The problem. Network devices show up in sysfs and with the network
namespace active multiple devices with the same name can show up in
the same directory, ouch!
To avoid that problem and allow existing applications in network namespaces
to see the same interface that is currently presented in sysfs, this
patch enables the tagging directory support in sysfs.
By using the network namespace pointers as tags to separate out the
the sysfs directory entries we ensure that we don't have conflicts
in the directories and applications only see a limited set of
the network devices.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allows bin_attr->read,write,mmap callbacks to check file specific data
(such as inode owner) as part of any privilege validation.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix some issues introduced in batch skb dequeuing for input_pkt_queue.
The primary issue it that the queue head must be incremented only
after a packet has been processed, that is only after
__netif_receive_skb has been called. This is needed for the mechanism
to prevent OOO packet in RFS. Also when flushing the input_pkt_queue
and process_queue, the process queue should be done first to prevent
OOO packets.
Because the input_pkt_queue has been effectively split into two queues,
the calculation of the tail ptr is no longer correct. The correct value
would be head+input_pkt_queue->len+process_queue->len. To avoid
this calculation we added an explict input_queue_tail in softnet_data.
The tail value is simply incremented when queuing to input_pkt_queue.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When GRO produces fraglist entries, and the resulting skb hits
an interface that is incapable of TSO but capable of FRAGLIST,
we end up producing a bogus packet with gso_size non-zero.
This was reported in the field with older versions of KVM that
did not set the TSO bits on tuntap.
This patch fixes that.
Reported-by: Igor Zhang <yugzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits)
qlcnic: adding co maintainer
ixgbe: add support for active DA cables
ixgbe: dcb, do not tag tc_prio_control frames
ixgbe: fix ixgbe_tx_is_paused logic
ixgbe: always enable vlan strip/insert when DCB is enabled
ixgbe: remove some redundant code in setting FCoE FIP filter
ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp
ixgbe: fix header len when unsplit packet overflows to data buffer
ipv6: Never schedule DAD timer on dead address
ipv6: Use POSTDAD state
ipv6: Use state_lock to protect ifa state
ipv6: Replace inet6_ifaddr->dead with state
cxgb4: notify upper drivers if the device is already up when they load
cxgb4: keep interrupts available when the ports are brought down
cxgb4: fix initial addition of MAC address
cnic: Return SPQ credit to bnx2x after ring setup and shutdown.
cnic: Convert cnic_local_flags to atomic ops.
can: Fix SJA1000 command register writes on SMP systems
bridge: fix build for CONFIG_SYSFS disabled
ARCNET: Limit com20020 PCI ID matches for SOHARD cards
...
Fix up various conflicts with pcmcia tree drivers/net/
{pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and
wireless/orinoco/spectrum_cs.c} and feature removal
(Documentation/feature-removal-schedule.txt).
Also fix a non-content conflict due to pm_qos_requirement getting
renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
vlynq: make whole Kconfig-menu dependant on architecture
add descriptive comment for TIF_MEMDIE task flag declaration.
EEPROM: max6875: Header file cleanup
EEPROM: 93cx6: Header file cleanup
EEPROM: Header file cleanup
agp: use NULL instead of 0 when pointer is needed
rtc-v3020: make bitfield unsigned
PCI: make bitfield unsigned
jbd2: use NULL instead of 0 when pointer is needed
cciss: fix shadows sparse warning
doc: inode uses a mutex instead of a semaphore.
uml: i386: Avoid redefinition of NR_syscalls
fix "seperate" typos in comments
cocbalt_lcdfb: correct sections
doc: Change urls for sparse
Powerpc: wii: Fix typo in comment
i2o: cleanup some exit paths
Documentation/: it's -> its where appropriate
UML: Fix compiler warning due to missing task_struct declaration
UML: add kernel.h include to signal.c
...
This race was triggered by a 'conntrack -F' command running in parallel
to the insertion of a hash for a new connection. Losing this race led to
a dead conntrack entry effectively blocking traffic for a particular
connection until timeout or flushing the conntrack hashes again.
Now the check for an already dying connection is done inside the lock.
Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* 'for-2.6.35' of git://linux-nfs.org/~bfields/linux: (45 commits)
Revert "nfsd4: distinguish expired from stale stateids"
nfsd: safer initialization order in find_file()
nfs4: minor callback code simplification, comment
NFSD: don't report compiled-out versions as present
nfsd4: implement reclaim_complete
nfsd4: nfsd4_destroy_session must set callback client under the state lock
nfsd4: keep a reference count on client while in use
nfsd4: mark_client_expired
nfsd4: introduce nfs4_client.cl_refcount
nfsd4: refactor expire_client
nfsd4: extend the client_lock to cover cl_lru
nfsd4: use list_move in move_to_confirmed
nfsd4: fold release_session into expire_client
nfsd4: rename sessionid_lock to client_lock
nfsd4: fix bare destroy_session null dereference
nfsd4: use local variable in nfs4svc_encode_compoundres
nfsd: further comment typos
sunrpc: centralise most calls to svc_xprt_received
nfsd4: fix unlikely race in session replay case
nfsd4: fix filehandle comment
...
* 'nfs-for-2.6.35' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (78 commits)
SUNRPC: Don't spam gssd with upcall requests when the kerberos key expired
SUNRPC: Reorder the struct rpc_task fields
SUNRPC: Remove the 'tk_magic' debugging field
SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst
NFS: Don't call iput() in nfs_access_cache_shrinker
NFS: Clean up nfs_access_zap_cache()
NFS: Don't run nfs_access_cache_shrinker() when the mask is GFP_NOFS
SUNRPC: Ensure rpcauth_prune_expired() respects the nr_to_scan parameter
SUNRPC: Ensure memory shrinker doesn't waste time in rpcauth_prune_expired()
SUNRPC: Dont run rpcauth_cache_shrinker() when gfp_mask is GFP_NOFS
NFS: Read requests can use GFP_KERNEL.
NFS: Clean up nfs_create_request()
NFS: Don't use GFP_KERNEL in rpcsec_gss downcalls
NFSv4: Don't use GFP_KERNEL allocations in state recovery
SUNRPC: Fix xs_setup_bc_tcp()
SUNRPC: Replace jiffies-based metrics with ktime-based metrics
ktime: introduce ktime_to_ms()
SUNRPC: RPC metrics and RTT estimator should use same RTT value
NFS: Calldata for nfs4_renew_done()
NFS: Squelch compiler warning in nfs_add_server_stats()
...
* 'bkl/procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
sunrpc: Include missing smp_lock.h
procfs: Kill the bkl in ioctl
procfs: Push down the bkl from ioctl
procfs: Use generic_file_llseek in /proc/vmcore
procfs: Use generic_file_llseek in /proc/kmsg
procfs: Use generic_file_llseek in /proc/kcore
procfs: Kill BKL in llseek on proc base
Switch trans_virtio to new virtqueue_xxx wrappers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch ensures that all places that schedule the DAD timer
look at the address state in a safe manner before scheduling the
timer. This ensures that we don't end up with pending timers
after deleting an address.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes use of the new POSTDAD state. This prevents
a race between DAD completion and failure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes use of the new state_lock to synchronise between
updates to the ifa state. This fixes the issue where a remotely
triggered address deletion (through DAD failure) coincides with a
local administrative address deletion, causing certain actions to
be performed twice incorrectly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the boolean dead flag on inet6_ifaddr with
a state enum. This allows us to roll back changes when deleting
an address according to whether DAD has completed or not.
This patch only adds the state field and does not change the logic.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix build when CONFIG_SYSFS is not enabled:
net/bridge/br_if.c:136: error: 'struct net_bridge_port' has no member named 'sysfs_name'
Note: dev->name == sysfs_name except when change name is in
progress, and we are protected from that by RTNL mutex.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.
It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.
Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the user has a bad classification configuration, and gets a packet
that goes through too many steps. Chances are more packets will arrive,
and the message spew will overrun syslog because it is not rate limited.
And because it is not tagged with appropriate priority it can't not be screened.
Added the qdisc to the message to try and give some more context when
the message does arrive.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put severity level on pfkey printk messages
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Serious oh sh*t messages converted to WARN().
Add KERN_NOTICE severity to the unknown policy type messages.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch encourage me to go look at all the messages in
the network scheduler and fix them. Many messages were missing
any severity level. Some serious ones that should never happen
were turned into WARN(), and the random noise messages that were
handled changed to pr_debug().
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kzalloc rather than the combination of kmalloc and memset.
A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,size,flags;
statement S;
@@
-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
if (x == NULL) S
-memset(x, 0, size);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 5fa782c2f5
sctp: Fix skb_over_panic resulting from multiple invalid \
parameter errors (CVE-2010-1173) (v4)
cause 'error cause' never be add the the ERROR chunk due to
some typo when check valid length in sctp_init_cause_fixed().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new netdev ops ndo_{set|get}_vf_port to allow setting of
port-profile on a netdev interface. Extends netlink socket RTM_SETLINK/
RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
(added to end of IFLA_cmd list). These are both nested atrtibutes
using this layout:
[IFLA_NUM_VF]
[IFLA_VF_PORTS]
[IFLA_VF_PORT]
[IFLA_PORT_*], ...
[IFLA_VF_PORT]
[IFLA_PORT_*], ...
...
[IFLA_PORT_SELF]
[IFLA_PORT_*], ...
These attributes are design to be set and get symmetrically. VF_PORTS
is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV
device. PORT_SELF is for the PF of the SR-IOV device, in case it wants
to also have a port-profile, or for the case where the VF==PF, like in
enic patch 2/2 of this patch set.
A port-profile is used to configure/enable the external switch virtual port
backing the netdev interface, not to configure the host-facing side of the
netdev. A port-profile is an identifier known to the switch. How port-
profiles are installed on the switch or how available port-profiles are
made know to the host is outside the scope of this patch.
There are two types of port-profiles specs in the netlink msg. The first spec
is for 802.1Qbg (pre-)standard, VDP protocol. The second spec is for devices
that run a similar protocol as VDP but in firmware, thus hiding the protocol
details. In either case, the specs have much in common and makes sense to
define the netlink msg as the union of the two specs. For example, both specs
have a notition of associating/deassociating a port-profile. And both specs
require some information from the hypervisor manager, such as client port
instance ID.
The general flow is the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
switch, and the switch virtual port backing the host netdev interface is
configured/enabled based on the settings defined by the port-profile. What
those settings comprise, and how those settings are managed is again
outside the scope of this patch, since this patch only deals with the
first step in the flow.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb rxhash should be cleared when a skb is handled by a tunnel before
being delivered again, so that correct packet steering can take place.
There are other cleanups and accounting that we can factorize in a new
helper, skb_tunnel_rx()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 33ad798c92 (tcp: options clean up) introduced a problem
if MD5+SACK+timestamps were used in initial SYN message.
Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
sessions, but since 40 bytes of tcp options space are not enough to
store all the bits needed, we chose to disable timestamps in this case.
We send a SYN-ACK _without_ timestamp option, but socket has timestamps
enabled and all further outgoing messages contain a TS block, all with
the initial timestamp of the remote peer.
Fix is to really disable timestamps option for the whole session.
Reported-by: Bijay Singh <Bijay.Singh@guavus.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The duplicate address check code got broken in the conversion
to hlist (2.6.35). The earlier patch did not fix the case where
two addresses match same hash value. Use two exit paths,
rather than depending on state of loop variables (from macro).
Based on earlier fix by Shan Wei.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When attaching filters to a class pointing to a class higher up in the
hierarchy, classification may enter an endless loop. Currently this is
prevented for filters that are already resolved, but not for filters
resolved at runtime.
Only allow filters to point downwards in the hierarchy, similar to what
CBQ does.
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several netem users use TBF for rate control. But every time the parameters
of TBF are changed it destroys the child qdisc, requiring reconfigation.
Better to just keep child qdisc and just notify it of changed limit.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also added an explicit break; to avoid
a fallthrough in net/ipv4/tcp_input.c
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes the BKL in x25 setsock and getsockopts.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the x25 accept approve flag from char into bitfield.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the x25 interrupt flag from char into bitfield.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the X25 q bit flag from char into a bitfield to allow BKL cleanup.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP outgoing packets can avoid two atomic ops, and dirtying
of previously higly contended cache line using new refdst
infrastructure.
Note 1: loopback device excluded because of !IFF_XMIT_DST_RELEASE
Note 2: UDP packets dsts are built before ip_queue_xmit().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ip_route_input_noref() in ip fast path, to avoid two atomic ops per
incoming packet.
Note: loopback is excluded from this optimization in ip_rcv_finish()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_input() is the version returning a refcounted dst, while
ip_route_input_noref() returns a non refcounted one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use low order bit of skb->_skb_dst to tell dst is not refcounted.
Change _skb_dst to _skb_refdst to make sure all uses are catched.
skb_dst() returns the dst, regardless of noref bit set or not, but
with a lockdep check to make sure a noref dst is not given if current
user is not rcu protected.
New skb_dst_set_noref() helper to set an notrefcounted dst on a skb.
(with lockdep check)
skb_dst_drop() drops a reference only if skb dst was refcounted.
skb_dst_force() helper is used to force a refcount on dst, when skb
is queued and not anymore RCU protected.
Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
!IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
sock_queue_rcv_skb(), in __nf_queue().
Use skb_dst_force() in dev_requeue_skb().
Note: dst_use_noref() still dirties dst, we might transform it
later to do one dirtying per jiffies.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_SMP=y, then we own a queue spinlock, we can avoid the atomic
test_and_set_bit() from napi_schedule_prep().
We now have same number of atomic ops per netif_rx() calls than with
pre-RPS kernel.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
as addrlabels with an interface index are left alone when the
interface gets removed this results in addrlabels that can no
longer be removed.
Restrict validation of index to adding new addrlabels.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that cache_ioctl_procfs() calls the bkl explicitly, we need to
include the relevant header as well.
This fixes the following build error:
net/sunrpc/cache.c: In function 'cache_ioctl_procfs':
net/sunrpc/cache.c:1355: error: implicit declaration of function 'lock_kernel'
net/sunrpc/cache.c:1359: error: implicit declaration of function 'unlock_kernel'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>