linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-25 14:20:52 +07:00

Author	SHA1	Message	Date
Ying Xue	da7c224b1b	net: xfrm: xfrm_policy: silence compiler warning Fix below compiler warning: net/xfrm/xfrm_policy.c:1644:12: warning: ‘xfrm_dst_alloc_copy’ defined but not used [-Wunused-function] Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 22:45:26 -05:00
Jon Paul Maloy	581465fa28	tipc: make link start event synchronous When a link is created we delay the start event by launching it to be executed later in a tasklet. As we hold all the necessary locks at the moment of creation, and there is no risk of deadlock or contention, this delay serves no purpose in the current code. We remove this obsolete indirection step, and the associated function link_start(). At the same time, we rename the function tipc_link_stop() to the more appropriate tipc_link_purge_queues(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:26 -05:00
Ying Xue	f9a2c80b8b	tipc: introduce new spinlock to protect struct link_req Currently, only 'bearer_lock' is used to protect struct link_req in the function disc_timeout(). This is unsafe, since the member fields 'num_nodes' and 'timer_intv' might be accessed by below three different threads simultaneously, none of them grabbing bearer_lock in the critical region: link_activate() tipc_bearer_add_dest() tipc_disc_add_dest() req->num_nodes++; tipc_link_reset() tipc_bearer_remove_dest() tipc_disc_remove_dest() req->num_nodes-- disc_update() read req->num_nodes write req->timer_intv disc_timeout() read req->num_nodes read/write req->timer_intv Without lock protection, the only symptom of a race is that discovery messages occasionally may not be sent out. This is not fatal, since such messages are best-effort anyway. On the other hand, since discovery messages are not time critical, adding a protecting lock brings no serious overhead either. So we add a new, dedicated spinlock in order to guarantee absolute data consistency in link_req objects. This also helps reduce the overall role of the bearer_lock, which we want to remove completely in a later commit series. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
Jon Paul Maloy	b9d4c33935	tipc: remove 'has_redundant_link' flag from STATE link protocol messages The flag 'has_redundant_link' is defined only in RESET and ACTIVATE protocol messages. Due to an ambiguity in the protocol specification it is currently also transferred in STATE messages. Its value is used to initialize a link state variable, 'permit_changeover', which is used to inhibit futile link failover attempts when it is known that the peer node has no working links at the moment, although the local node may still think it has one. The fact that 'has_redundant_link' incorrectly is read from STATE messages has the effect that 'permit_changeover' sometimes gets a wrong value, and permanently blocks any links from being re-established. Such failures can only occur in in dual-link systems, and are extremely rare. This bug seems to have always been present in the code. Furthermore, since commit `b4b5610223` ("tipc: Ensure both nodes recognize loss of contact between them"), the 'permit_changeover' field serves no purpose any more. The task of enforcing 'lost contact' cycles at both peer endpoints is now taken by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN in struct tipc_node to abort unnecessary failover attempts. We therefore remove the 'has_redundant_link' flag from STATE messages, as well as the now redundant 'permit_changeover' variable. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
Jon Paul Maloy	170b3927b4	tipc: rename functions related to link failover and improve comments The functionality related to link addition and failover is unnecessarily hard to understand and maintain. We try to improve this by renaming some of the functions, at the same time adding or improving the explanatory comments around them. Names such as "tipc_rcv()" etc. also align better with what is used in other networking components. The changes in this commit are purely cosmetic, no functional changes are made. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
David S. Miller	a04c0e2c0d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== The following patchset contains two patches: * fix the IRC NAT helper which was broken when adding (incomplete) IPv6 support, from Daniel Borkmann. * Refine the previous bugtrap that Jesper added to catch problems for the usage of the sequence adjustment extension in IPVs in Dec 16th, it may spam messages in case of finding a real bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:38:17 -05:00
Daniel Borkmann	be7928d20b	net: xfrm: xfrm_policy: fix inline not at beginning of declaration Fix three warnings related to: net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] Just removing the inline keyword is sufficient as the compiler will decide on its own about inlining or not. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:34:00 -05:00
Patrick McHardy	9638f33ecf	netfilter: nft_ct: load both IPv4 and IPv6 conntrack modules for NFPROTO_INET The ct expression can currently not be used in the inet family since we don't have a conntrack module for NFPROTO_INET, so nf_ct_l3proto_try_module_get() fails. Add some manual handling to load the modules for both NFPROTO_IPV4 and NFPROTO_IPV6 if the ct expression is used in the inet family. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:57:32 +01:00
Patrick McHardy	4566bf2706	netfilter: nft_meta: add l4proto support For L3-proto independant rules we need to get at the L4 protocol value directly. Add it to the nft_pktinfo struct and use the meta expression to retrieve it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:57:31 +01:00
Patrick McHardy	124edfa9e0	netfilter: nf_tables: add nfproto support to meta expression Needed by multi-family tables to distinguish IPv4 and IPv6 packets. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:57:30 +01:00
Patrick McHardy	1d49144c0a	netfilter: nf_tables: add "inet" table for IPv4/IPv6 This patch adds a new table family and a new filter chain that you can use to attach IPv4 and IPv6 rules. This should help to simplify rule-set maintainance in dual-stack setups. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:57:25 +01:00
Patrick McHardy	115a60b173	netfilter: nf_tables: add support for multi family tables Add support to register chains to multiple hooks for different address families for mixed IPv4/IPv6 tables. Signed-off-by: Patrick McHardy <kaber@trash.net>	2014-01-07 23:55:46 +01:00
Patrick McHardy	c9484874e7	netfilter: nf_tables: add hook ops to struct nft_pktinfo Multi-family tables need the AF from the hook ops. Add a pointer to the hook ops and replace usage of the hooknum member in struct nft_pktinfo. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:50:43 +01:00
Patrick McHardy	3b088c4bc0	netfilter: nf_tables: make chain types override the default AF functions Currently the AF-specific hook functions override the chain-type specific hook functions. That doesn't make too much sense since the chain types are a special case of the AF-specific hooks. Make the AF-specific hook functions the default and make the optional chain type hooks override them. As a side effect, the necessary code restructuring reduces the code size, f.i. in case of nf_tables_ipv4.o: nf_tables_ipv4_init_net \| -24 nft_do_chain_ipv4 \| -113 2 functions changed, 137 bytes removed, diff: -137 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:50:43 +01:00
Pablo Neira Ayuso	688d18636f	netfilter: nft_reject: fix compilation warning if NF_TABLES_IPV6 is disabled net/netfilter/nft_reject.c: In function 'nft_reject_eval': net/netfilter/nft_reject.c:37:14: warning: unused variable 'net' [-Wunused-variable] Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:50:43 +01:00
Jerry Chu	bf5a755f5e	net-gre-gro: Add GRE support to the GRO stack This patch built on top of Commit `299603e837` ("net-gro: Prepare GRO stack for the upcoming tunneling support") to add the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO stack. It also serves as an example for supporting other encapsulation protocols in the GRO stack in the future. The patch supports version 0 and all the flags (key, csum, seq#) but will flush any pkt with the S (seq#) flag. This is because the S flag is not support by GSO, and a GRO pkt may end up in the forwarding path, thus requiring GSO support to break it up correctly. Currently the "packet_offload" structure only contains L3 (ETH_P_IP/ ETH_P_IPV6) GRO offload support so the encapped pkts are limited to IP pkts (i.e., w/o L2 hdr). But support for other protocol type can be easily added, so is the support for GRE variations like NVGRE. The patch also support csum offload. Specifically if the csum flag is on and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE), the code will take advantage of the csum computed by the h/w when validating the GRE csum. Note that commit `60769a5dcd` "ipv4: gre: add GRO capability" already introduces GRO capability to IPv4 GRE tunnels, using the gro_cells infrastructure. But GRO is done after GRE hdr has been removed (i.e., decapped). The following patch applies GRO when pkts first come in (before hitting the GRE tunnel code). There is some performance advantage for applying GRO as early as possible. Also this approach is transparent to other subsystem like Open vSwitch where GRE decap is handled outside of the IP stack hence making it harder for the gro_cells stuff to apply. On the other hand, some NICs are still not capable of hashing on the inner hdr of a GRE pkt (RSS). In that case the GRO processing of pkts from the same remote host will all happen on the same CPU and the performance may be suboptimal. I'm including some rough preliminary performance numbers below. Note that the performance will be highly dependent on traffic load, mix as usual. Moreover it also depends on NIC offload features hence the following is by no means a comprehesive study. Local testing and tuning will be needed to decide the best setting. All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs. (super_netperf 50 -H 192.168.1.18 -l 30) An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1 mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123) is configured. The GRO support for pkts AFTER decap are controlled through the device feature of the GRE device (e.g., ethtool -K gre1 gro on/off). 1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off thruput: 9.16Gbps CPU utilization: 19% 1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off thruput: 5.9Gbps CPU utilization: 15% 1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on thruput: 9.26Gbps CPU utilization: 12-13% 1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on thruput: 9.26Gbps CPU utilization: 10% The following tests were performed on a different NIC that is capable of csum offload. I.e., the h/w is capable of computing IP payload csum (CHECKSUM_COMPLETE). 2.1 ethtool -K gre1 gro on (hence will use gro_cells) 2.1.1 ethtool -K eth0 gro off; csum offload disabled thruput: 8.53Gbps CPU utilization: 9% 2.1.2 ethtool -K eth0 gro off; csum offload enabled thruput: 8.97Gbps CPU utilization: 7-8% 2.1.3 ethtool -K eth0 gro on; csum offload disabled thruput: 8.83Gbps CPU utilization: 5-6% 2.1.4 ethtool -K eth0 gro on; csum offload enabled thruput: 8.98Gbps CPU utilization: 5% 2.2 ethtool -K gre1 gro off 2.2.1 ethtool -K eth0 gro off; csum offload disabled thruput: 5.93Gbps CPU utilization: 9% 2.2.2 ethtool -K eth0 gro off; csum offload enabled thruput: 5.62Gbps CPU utilization: 8% 2.2.3 ethtool -K eth0 gro on; csum offload disabled thruput: 7.69Gbps CPU utilization: 8% 2.2.4 ethtool -K eth0 gro on; csum offload enabled thruput: 8.96Gbps CPU utilization: 5-6% Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 16:21:31 -05:00
Benjamin Poirier	cdb3f4a31b	net: Do not enable tx-nocache-copy by default There are many cases where this feature does not improve performance or even reduces it. For example, here are the results from tests that I've run using 3.12.6 on one Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are from the Xeon, but they're similar on the i7. All numbers report the mean±stddev over 10 runs of 10s. 1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache copy from user on transmit" There is no statistically significant difference between tx-nocache-copy on/off. nic irqs spread out (one queue per cpu) 200x netperf -r 1400,1 tx-nocache-copy off 692000±1000 tps 50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3 tx-nocache-copy on 693000±1000 tps 50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7 200x netperf -r 14000,14000 tx-nocache-copy off 86450±80 tps 50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40 tx-nocache-copy on 86110±60 tps 50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20 2) single stream throughput tests tx-nocache-copy leads to higher service demand throughput cpu0 cpu1 demand (Gb/s) (Gcycle) (Gcycle) (cycle/B) nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send) tx-nocache-copy off 9402±5 9.4±0.2 0.80±0.01 tx-nocache-copy on 9403±3 9.85±0.04 0.838±0.004 nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send) tx-nocache-copy off 9401±5 5.83±0.03 5.0±0.1 0.923±0.007 tx-nocache-copy on 9404±2 5.74±0.03 5.523±0.009 0.958±0.002 As a second example, here are some results from Eric Dumazet with latest net-next. tx-nocache-copy also leads to higher service demand (cpu is Intel(R) Xeon(R) CPU X5660 @ 2.80GHz) lpq83:~# ./ethtool -K eth0 tx-nocache-copy on lpq83:~# perf stat ./netperf -H lpq84 -c MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB 87380 16384 16384 10.00 9407.44 2.50 -1.00 0.522 -1.000 Performance counter stats for './netperf -H lpq84 -c': 4282.648396 task-clock # 0.423 CPUs utilized 9,348 context-switches # 0.002 M/sec 88 CPU-migrations # 0.021 K/sec 355 page-faults # 0.083 K/sec 11,812,797,651 cycles # 2.758 GHz [82.79%] 9,020,522,817 stalled-cycles-frontend # 76.36% frontend cycles idle [82.54%] 4,579,889,681 stalled-cycles-backend # 38.77% backend cycles idle [67.33%] 6,053,172,792 instructions # 0.51 insns per cycle # 1.49 stalled cycles per insn [83.64%] 597,275,583 branches # 139.464 M/sec [83.70%] 8,960,541 branch-misses # 1.50% of all branches [83.65%] 10.128990264 seconds time elapsed lpq83:~# ./ethtool -K eth0 tx-nocache-copy off lpq83:~# perf stat ./netperf -H lpq84 -c MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB 87380 16384 16384 10.00 9412.45 2.15 -1.00 0.449 -1.000 Performance counter stats for './netperf -H lpq84 -c': 2847.375441 task-clock # 0.281 CPUs utilized 11,632 context-switches # 0.004 M/sec 49 CPU-migrations # 0.017 K/sec 354 page-faults # 0.124 K/sec 7,646,889,749 cycles # 2.686 GHz [83.34%] 6,115,050,032 stalled-cycles-frontend # 79.97% frontend cycles idle [83.31%] 1,726,460,071 stalled-cycles-backend # 22.58% backend cycles idle [66.55%] 2,079,702,453 instructions # 0.27 insns per cycle # 2.94 stalled cycles per insn [83.22%] 363,773,213 branches # 127.757 M/sec [83.29%] 4,242,732 branch-misses # 1.17% of all branches [83.51%] 10.128449949 seconds time elapsed CC: Tom Herbert <therbert@google.com> Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 16:20:19 -05:00
Erik Hugne	732256b933	tipc: correctly unlink packets from deferred packet queue When we pull a received packet from a link's 'deferred packets' queue for processing, its 'next' pointer is not cleared, and still refers to the next packet in that queue, if any. This is incorrect, but caused no harm before commit `40ba3cdf54` ("tipc: message reassembly using fragment chain") was introduced. After that commit, it may sometimes lead to the following oops: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: tipc CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W 3.13.0-rc2+ #6 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880017af4880 ti: ffff880017aee000 task.ti: ffff880017aee000 RIP: 0010:[<ffffffff81710694>] [<ffffffff81710694>] skb_try_coalesce+0x44/0x3d0 RSP: 0018:ffff880016603a78 EFLAGS: 00010212 RAX: 6b6b6b6bd6d6d6d6 RBX: ffff880013106ac0 RCX: ffff880016603ad0 RDX: ffff880016603ad7 RSI: ffff88001223ed00 RDI: ffff880013106ac0 RBP: ffff880016603ab8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88001223ed00 R13: ffff880016603ad0 R14: 000000000000058c R15: ffff880012297650 FS: 0000000000000000(0000) GS:ffff880016600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000805b000 CR3: 0000000011f5d000 CR4: 00000000000006e0 Stack: ffff880016603a88 ffffffff810a38ed ffff880016603aa8 ffff88001223ed00 0000000000000001 ffff880012297648 ffff880016603b68 ffff880012297650 ffff880016603b08 ffffffffa0006c51 ffff880016603b08 00ffffffa00005fc Call Trace: <IRQ> [<ffffffff810a38ed>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffa0006c51>] tipc_link_recv_fragment+0xd1/0x1b0 [tipc] [<ffffffffa0007214>] tipc_recv_msg+0x4e4/0x920 [tipc] [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc] [<ffffffffa000177c>] tipc_l2_rcv_msg+0xcc/0x250 [tipc] [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc] [<ffffffff8171e65b>] __netif_receive_skb_core+0x80b/0xd00 [<ffffffff8171df94>] ? __netif_receive_skb_core+0x144/0xd00 [<ffffffff8171eb76>] __netif_receive_skb+0x26/0x70 [<ffffffff8171ed6d>] netif_receive_skb+0x2d/0x200 [<ffffffff8171fe70>] napi_gro_receive+0xb0/0x130 [<ffffffff815647c2>] e1000_clean_rx_irq+0x2c2/0x530 [<ffffffff81565986>] e1000_clean+0x266/0x9c0 [<ffffffff81985f7b>] ? notifier_call_chain+0x2b/0x160 [<ffffffff8171f971>] net_rx_action+0x141/0x310 [<ffffffff81051c1b>] __do_softirq+0xeb/0x480 [<ffffffff819817bb>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810b8c42>] ? handle_fasteoi_irq+0x72/0x100 [<ffffffff81052346>] irq_exit+0x96/0xc0 [<ffffffff8198cbc3>] do_IRQ+0x63/0xe0 [<ffffffff81981def>] common_interrupt+0x6f/0x6f <EOI> This happens when the last fragment of a message has passed through the the receiving link's 'deferred packets' queue, and at least one other packet was added to that queue while it was there. After the fragment chain with the complete message has been successfully delivered to the receiving socket, it is released. Since 'next' pointer of the last fragment in the released chain now is non-NULL, we get the crash shown above. We fix this by clearing the 'next' pointer of all received packets, including those being pulled from the 'deferred' queue, before they undergo any further processing. Fixes: `40ba3cdf54` ("tipc: message reassembly using fragment chain") Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 16:15:24 -05:00
Jiri Pirko	dfd1582d1e	ipv4: loopback device: ignore value changes after device is upped When lo is brought up, new ifa is created. Then, devconf and neigh values bitfield should be set so later changes of default values would not affect lo values. Note that the same behaviour is in ipv6. Also note that this is likely not an issue in many distros (for example Fedora 19) because userspace sets address to lo manually before bringing it up. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 15:55:17 -05:00
FX Le Bail	509aba3b0d	IPv6: add the option to use anycast addresses as source addresses in echo reply This change allows to follow a recommandation of RFC4942. - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses as source addresses for ICMPv6 echo reply. This sysctl is false by default to preserve existing behavior. - Add inline check ipv6_anycast_destination(). - Use them in icmpv6_echo_reply(). Reference: RFC4942 - IPv6 Transition/Coexistence Security Considerations (http://tools.ietf.org/html/rfc4942#section-2.1.6) 2.1.6. Anycast Traffic Identification and Security [...] To avoid exposing knowledge about the internal structure of the network, it is recommended that anycast servers now take advantage of the ability to return responses with the anycast address as the source address if possible. Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 15:51:39 -05:00
Li RongQing	657e5d1965	ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.c initialise pcpu_tstats.syncp to kill the calltrace [ 11.973950] Call Trace: [ 11.973950] [<819bbaff>] dump_stack+0x48/0x60 [ 11.973950] [<819bbaff>] dump_stack+0x48/0x60 [ 11.973950] [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10 [ 11.973950] [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10 [ 11.973950] [<81079fa7>] lock_acquire+0x77/0xa0 [ 11.973950] [<81079fa7>] lock_acquire+0x77/0xa0 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230 [ 11.973950] [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] ? dev_get_stats+0xcb/0x130 [ 11.973950] [<811cf8c1>] ? __nla_reserve+0x21/0xd0 [ 11.973950] [<811cf8c1>] ? __nla_reserve+0x21/0xd0 [ 11.973950] [<817ca7ab>] dev_get_stats+0xcb/0x130 [ 11.973950] [<817ca7ab>] dev_get_stats+0xcb/0x130 [ 11.973950] [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20 [ 11.973950] [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20 [ 11.973950] [<810352e0>] ? kvm_clock_read+0x20/0x30 [ 11.973950] [<810352e0>] ? kvm_clock_read+0x20/0x30 [ 11.973950] [<81008e38>] ? sched_clock+0x8/0x10 [ 11.973950] [<81008e38>] ? sched_clock+0x8/0x10 [ 11.973950] [<8106ba45>] ? sched_clock_local+0x25/0x170 [ 11.973950] [<8106ba45>] ? sched_clock_local+0x25/0x170 [ 11.973950] [<810da6bd>] ? __kmalloc+0x3d/0x90 [ 11.973950] [<810da6bd>] ? __kmalloc+0x3d/0x90 [ 11.973950] [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70 [ 11.973950] [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70 [ 11.973950] [<810da81a>] ? slob_alloc_node+0x2a/0x60 [ 11.973950] [<810da81a>] ? slob_alloc_node+0x2a/0x60 [ 11.973950] [<817b919a>] ? __alloc_skb+0x6a/0x2b0 [ 11.973950] [<817b919a>] ? __alloc_skb+0x6a/0x2b0 [ 11.973950] [<817d8795>] rtmsg_ifinfo+0x65/0xe0 [ 11.973950] [<817d8795>] rtmsg_ifinfo+0x65/0xe0 [ 11.973950] [<817cbd31>] register_netdevice+0x531/0x5a0 [ 11.973950] [<817cbd31>] register_netdevice+0x531/0x5a0 [ 11.973950] [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90 [ 11.973950] [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90 [ 11.973950] [<817cbdb6>] register_netdev+0x16/0x30 [ 11.973950] [<817cbdb6>] register_netdev+0x16/0x30 [ 11.973950] [<81f574a6>] vti6_init_net+0x1c4/0x1d4 [ 11.973950] [<81f574a6>] vti6_init_net+0x1c4/0x1d4 [ 11.973950] [<81f573af>] ? vti6_init_net+0xcd/0x1d4 [ 11.973950] [<81f573af>] ? vti6_init_net+0xcd/0x1d4 [ 11.973950] [<817c16df>] ops_init.constprop.11+0x17f/0x1c0 [ 11.973950] [<817c16df>] ops_init.constprop.11+0x17f/0x1c0 [ 11.973950] [<817c1779>] register_pernet_operations.isra.9+0x59/0x90 [ 11.973950] [<817c1779>] register_pernet_operations.isra.9+0x59/0x90 [ 11.973950] [<817c18d1>] register_pernet_device+0x21/0x60 [ 11.973950] [<817c18d1>] register_pernet_device+0x21/0x60 [ 11.973950] [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4 [ 11.973950] [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4 [ 11.973950] [<81f574c7>] vti6_tunnel_init+0x11/0x68 [ 11.973950] [<81f574c7>] vti6_tunnel_init+0x11/0x68 [ 11.973950] [<81f572a1>] ? mip6_init+0x73/0xb4 [ 11.973950] [<81f572a1>] ? mip6_init+0x73/0xb4 [ 11.973950] [<81f0cba4>] do_one_initcall+0xbb/0x15b [ 11.973950] [<81f0cba4>] do_one_initcall+0xbb/0x15b [ 11.973950] [<811a00d8>] ? sha_transform+0x528/0x1150 [ 11.973950] [<811a00d8>] ? sha_transform+0x528/0x1150 [ 11.973950] [<81f0c544>] ? repair_env_string+0x12/0x51 [ 11.973950] [<81f0c544>] ? repair_env_string+0x12/0x51 [ 11.973950] [<8105c30d>] ? parse_args+0x2ad/0x440 [ 11.973950] [<8105c30d>] ? parse_args+0x2ad/0x440 [ 11.973950] [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 11.973950] [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 11.973950] [<81f0cd27>] kernel_init_freeable+0xe3/0x182 [ 11.973950] [<81f0cd27>] kernel_init_freeable+0xe3/0x182 [ 11.973950] [<81f0c532>] ? do_early_param+0x7a/0x7a [ 11.973950] [<81f0c532>] ? do_early_param+0x7a/0x7a [ 11.973950] [<819b5b1b>] kernel_init+0xb/0x100 [ 11.973950] [<819b5b1b>] kernel_init+0xb/0x100 [ 11.973950] [<819cebf7>] ret_from_kernel_thread+0x1b/0x28 [ 11.973950] [<819cebf7>] ret_from_kernel_thread+0x1b/0x28 [ 11.973950] [<819b5b10>] ? rest_init+0xc0/0xc0 [ 11.973950] [<819b5b10>] ? rest_init+0xc0/0xc0 Before `469bdcefdc` ("ipv6: fix the use of pcpu_tstats in ip6_vti.c"), the pcpu_tstats.syncp is not used to pretect the 64bit elements of pcpu_tstats, so not appear this calltrace. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 14:12:46 -05:00
Thierry Escande	b711ad524b	NFC: digital: Set rf tech and crc functions when receiving a PSL_REQ This patch sets the correct rf tech value and crc functions in target mode when receiving a PSL_REQ, as done when receiving an ATR_REQ. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 18:48:12 +01:00
Thierry Escande	48e1044515	NFC: digital: Set current target active on activate_target() call The curr_protocol field of nfc_digital_dev structure used to determine if a target is currently active was set too soon, immediately when a target is found. This is not good since there is no other way than deactivate_target() to reset curr_protocol and if activate_target() is not called, the target remains active and it's not possible to put the device in poll mode anymore. With this patch curr_protocol is set when nfc core activates a target, puts a device up, or when an ATR_REQ is received in target mode. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 18:48:12 +01:00
Emmanuel Grumbach	349b196044	mac80211: allow to set smps mode to OFF in AP mode In managed mode, we should not ask for OFF mode because the power settings may still require DYNAMIC. In AP mode, this should be allowed since the default settings is OFF and AUTOMATIC is not allowed. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-07 16:25:49 +01:00
Johannes Berg	3c2723f503	mac80211: clean up prepare_for_handlers() return value Using an int with 0/1 is not very common, make the function return a bool instead with the same values (false/true). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-07 16:23:24 +01:00
Emmanuel Grumbach	40791942ec	mac80211: simplify code in ieee80211_prepare_and_rx_handle No need to assign the return value of prepare_for_handlers to a variable if the only usage is to test it. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-07 16:22:15 +01:00
Emmanuel Grumbach	87ee475ef6	mac80211: clean up garbage in comment Not clear how this landed here. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-07 16:21:56 +01:00
Claudio Takahasi	e825eb1d7e	Bluetooth: Fix 6loWPAN peer lookup This patch fixes peer address lookup for 6loWPAN over Bluetooth Low Energy links. ADDR_LE_DEV_PUBLIC, and ADDR_LE_DEV_RANDOM are the values allowed for "dst_type" field in the hci_conn struct for LE links. Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2014-01-07 11:32:15 -02:00
Claudio Takahasi	b071a62099	Bluetooth: Fix setting Universal/Local bit This patch fixes the Bluetooth Low Energy Address type checking when setting Universal/Local bit for the 6loWPAN network device or for the peer device connection. ADDR_LE_DEV_PUBLIC or ADDR_LE_DEV_RANDOM are the values allowed for "src_type" and "dst_type" in the hci_conn struct. The Bluetooth link type can be obtainned reading the "type" field in the same struct. Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2014-01-07 11:32:11 -02:00
Eric Dumazet	438e38fadc	gre_offload: statically build GRE offloading support GRO/GSO layers can be enabled on a node, even if said node is only forwarding packets. This patch permits GSO (and upcoming GRO) support for GRE encapsulated packets, even if the host has no GRE tunnel setup. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 20:28:34 -05:00
David S. Miller	39b6b2992f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== [GIT net-next] Open vSwitch Open vSwitch changes for net-next/3.14. Highlights are: * Performance improvements in the mechanism to get packets to userspace using memory mapped netlink and skb zero copy where appropriate. * Per-cpu flow stats in situations where flows are likely to be shared across CPUs. Standard flow stats are used in other situations to save memory and allocation time. * A handful of code cleanups and rationalization. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 19:48:38 -05:00
Amitkumar Karwar	22c15bf30b	NFC: NCI: Add set_config API This API can be used by drivers to send their custom configuration using SET_CONFIG NCI command to the device. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 01:32:40 +01:00
Amitkumar Karwar	86e8586ed5	NFC: NCI: Add setup handler Some drivers require special configuration while initializing. This patch adds setup handler for this custom configuration. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 01:32:40 +01:00
Amitkumar Karwar	1907299867	NFC: NCI: Don't reverse local general bytes Local general bytes returned by nfc_get_local_general_bytes() are already in correct order. We don't need to reverse them. Remove local_gb[] local array as it's not needed any more. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-07 01:32:40 +01:00
Stephen Hemminger	443cd88c8a	ovs: make functions local Several functions and datastructures could be local Found with 'make namespacecheck' Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:54:39 -08:00
Thomas Graf	09c5e6054e	openvswitch: Compute checksum in skb_gso_segment() if needed The copy & csum optimization is no longer present with zerocopy enabled. Compute the checksum in skb_gso_segment() directly by dropping the HW CSUM capability from the features passed in. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:53:24 -08:00
Thomas Graf	bda56f143c	openvswitch: Use skb_zerocopy() for upcall Use of skb_zerocopy() can avoid the expensive call to memcpy() when copying the packet data into the Netlink skb. Completes checksum through skb_checksum_help() if not already done in GSO segmentation. Zerocopy is only performed if user space supported unaligned Netlink messages. memory mapped netlink i/o is preferred over zerocopy if it is set up. Cost of upcall is significantly reduced from: + 7.48% vhost-8471 [k] memcpy + 5.57% ovs-vswitchd [k] memcpy + 2.81% vhost-8471 [k] csum_partial_copy_generic to: + 5.72% ovs-vswitchd [k] memcpy + 3.32% vhost-5153 [k] memcpy + 0.68% vhost-5153 [k] skb_zerocopy (megaflows disabled) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:53:17 -08:00
Thomas Graf	8055a89cfa	openvswitch: Pass datapath into userspace queue functions Allows removing the net and dp_ifindex argument and simplify the code. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:53:07 -08:00
Thomas Graf	44da5ae5fb	openvswitch: Drop user features if old user space attempted to create datapath Drop user features if an outdated user space instance that does not understand the concept of user_features attempted to create a new datapath. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:53:00 -08:00
Thomas Graf	43d4be9cb5	openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:53 -08:00
Thomas Graf	af2806f8f9	net: Export skb_zerocopy() to zerocopy from one skb to another Make the skb zerocopy logic written for nfnetlink queue available for use by other modules. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:42 -08:00
Wei Yongjun	5f03f47c9c	openvswitch: remove duplicated include from flow_table.c Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:35 -08:00
Daniel Borkmann	11d6c461b3	net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb,acts_callback} As we're only doing a kfree() anyway in the RCU callback, we can simply use kfree_rcu, which does the same job, and remove the function rcu_free_sw_flow_mask_cb() and rcu_free_acts_callback(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:30 -08:00
Pravin B Shelar	e298e50570	openvswitch: Per cpu flow stats. With mega flow implementation ovs flow can be shared between multiple CPUs which makes stats updates highly contended operation. This patch uses per-CPU stats in cases where a flow is likely to be shared (if there is a wildcard in the 5-tuple and therefore likely to be spread by RSS). In other situations, it uses the current strategy, saving memory and allocation time. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:24 -08:00
Thomas Graf	795449d8b8	openvswitch: Enable memory mapped Netlink i/o Use memory mapped Netlink i/o for all unicast openvswitch communication if a ring has been set up. Benchmark * pktgen -> ovs internal port * 5M pkts, 5M flows * 4 threads, 8 cores Before: Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags) 74163pps 5339Mb/sec (5339736000bps) errors: 0 + 2.98% ovs-vswitchd [k] copy_user_generic_string + 2.49% ovs-vswitchd [k] memcpy + 1.84% kpktgend_2 [k] memcpy + 1.81% kpktgend_1 [k] memcpy + 1.81% kpktgend_3 [k] memcpy + 1.78% kpktgend_0 [k] memcpy After: Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags) 206358pps 14857Mb/sec (14857776000bps) errors: 0 + 2.80% ovs-vswitchd [k] memcpy + 1.31% kpktgend_2 [k] memcpy + 1.23% kpktgend_0 [k] memcpy + 1.09% kpktgend_1 [k] memcpy + 1.04% kpktgend_3 [k] memcpy + 0.96% ovs-vswitchd [k] copy_user_generic_string Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:12 -08:00
Thomas Graf	aae9f0e22c	netlink: Avoid netlink mmap alloc if msg size exceeds frame size An insufficent ring frame size configuration can lead to an unnecessary skb allocation for every Netlink message. Check frame size before taking the queue lock and allocating the skb and re-check with lock to be safe. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:52:06 -08:00
Thomas Graf	bb9b18fb55	genl: Add genlmsg_new_unicast() for unicast message allocation Allocates a new sk_buff large enough to cover the specified payload plus required Netlink headers. Will check receiving socket for memory mapped i/o capability and use it if enabled. Will fall back to non-mapped skb if message size exceeds the frame size of the ring. Signed-of-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:51:53 -08:00
Jesse Gross	663efa3696	openvswitch: Silence RCU lockdep checks from flow lookup. Flow lookup can happen either in packet processing context or userspace context but it was annotated as requiring RCU read lock to be held. This also allows OVS mutex to be held without causing warnings. Reported-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Reviewed-by: Thomas Graf <tgraf@redhat.com>	2014-01-06 15:51:48 -08:00
Andy Zhou	5bb506324d	openvswitch: Change ovs_flow_tbl_lookup_xx() APIs API changes only for code readability. No functional chnages. This patch removes the underscored version. Added a new API ovs_flow_tbl_lookup_stats() that returns the n_mask_hits. Reported by: Ben Pfaff <blp@nicira.com> Reviewed-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:51:41 -08:00
Ben Pfaff	8f49ce1135	openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit). We won't normally have a ton of flow masks but using a size_t to store values no bigger than sizeof(struct sw_flow_key) seems excessive. This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit systems. On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but sw_flow_mask only by 8 bytes due to padding. Compile tested only. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:51:27 -08:00
Ben Pfaff	d1211908b9	openvswitch: Correct comment. Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-01-06 15:51:21 -08:00
David S. Miller	56a4342dfe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 17:37:45 -05:00
Gianluca Anzolin	f86772af6a	Bluetooth: Remove rfcomm_carrier_raised() Remove the rfcomm_carrier_raised() definition as that function isn't used anymore. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 13:51:45 -08:00
Gianluca Anzolin	4a2fb3ecc7	Bluetooth: Always wait for a connection on RFCOMM open() This patch fixes two regressions introduced with the recent rfcomm tty rework. The current code uses the carrier_raised() method to wait for the bluetooth connection when a process opens the tty. However processes may open the port with the O_NONBLOCK flag or set the CLOCAL termios flag: in these cases the open() syscall returns immediately without waiting for the bluetooth connection to complete. This behaviour confuses userspace which expects an established bluetooth connection. The patch restores the old behaviour by waiting for the connection in rfcomm_dev_activate() and removes carrier_raised() from the tty_port ops. As a side effect the new code also fixes the case in which the rfcomm tty device is created with the flag RFCOMM_REUSE_DLC: the old code didn't call device_move() and ModemManager skipped the detection probe. Now device_move() is always called inside rfcomm_dev_activate(). Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com> Reported-by: Beson Chow <blc+bluez@mail.vanade.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 13:51:45 -08:00
Gianluca Anzolin	e228b63390	Bluetooth: Move rfcomm_get_device() before rfcomm_dev_activate() This is a preparatory patch which moves the rfcomm_get_device() definition before rfcomm_dev_activate() where it will be used. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 13:51:45 -08:00
Gianluca Anzolin	5b89924187	Bluetooth: Release RFCOMM port when the last user closes the TTY This patch fixes a userspace regression introduced by the commit `29cd718b`. If the rfcomm device was created with the flag RFCOMM_RELEASE_ONHUP the user space expects that the tty_port is released as soon as the last process closes the tty. The current code attempts to release the port in the function rfcomm_dev_state_change(). However it won't get a reference to the relevant tty to send a HUP: at that point the tty is already destroyed and therefore NULL. This patch fixes the regression by taking over the tty refcount in the tty install method(). This way the tty_port is automatically released as soon as the tty is destroyed. As a consequence the check for RFCOMM_RELEASE_ONHUP flag in the hangup() method is now redundant. Instead we have to be careful with the reference counting in the rfcomm_release_dev() function. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reported-by: Alexander Holler <holler@ahsoftware.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 13:51:45 -08:00
Jamal Hadi Salim	805c1f4aed	net_sched: act: action flushing missaccounting action flushing missaccounting Account only for deleted actions Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:46:56 -05:00
Jamal Hadi Salim	63acd6807c	net_sched: Remove unnecessary checks for act->ops Remove unnecessary checks for act->ops (suggested by Eric Dumazet). Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:46:32 -05:00
sfeldma@cumulusnetworks.com	fbf2671bb8	bridge: use DEVICE_ATTR_xx macros Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:40:46 -05:00
Curt Brune	fe0d692bbc	bridge: use spin_lock_bh() in br_multicast_set_hash_max br_multicast_set_hash_max() is called from process context in net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function. br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock), which can deadlock the CPU if a softirq that also tries to take the same lock interrupts br_multicast_set_hash_max() while the lock is held . This can happen quite easily when any of the bridge multicast timers expire, which try to take the same lock. The fix here is to use spin_lock_bh(), preventing other softirqs from executing on this CPU. Steps to reproduce: 1. Create a bridge with several interfaces (I used 4). 2. Set the "multicast query interval" to a low number, like 2. 3. Enable the bridge as a multicast querier. 4. Repeatedly set the bridge hash_max parameter via sysfs. # brctl addbr br0 # brctl addif br0 eth1 eth2 eth3 eth4 # brctl setmcqi br0 2 # brctl setmcquerier br0 1 # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done Signed-off-by: Curt Brune <curt@cumulusnetworks.com> Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:39:47 -05:00
Eric Dumazet	996b175e39	tcp: out_of_order_queue do not use its lock TCP out_of_order_queue lock is not used, as queue manipulation happens with socket lock held and we therefore use the lockless skb queue routines (as __skb_queue_head()) We can use __skb_queue_head_init() instead of skb_queue_head_init() to make this more consistent. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:34:34 -05:00
Hannes Frederic Sowa	88ad31491e	ipv6: don't install anycast address for /128 addresses on routers It does not make sense to create an anycast address for an /128-prefix. Suppress it. As `32019e651c` ("ipv6: Do not leave router anycast address for /127 prefixes.") shows we also may not leave them, because we could accidentally remove an anycast address the user has allocated or got added via another prefix. Cc: François-Xavier Le Bail <fx.lebail@yahoo.com> Cc: Thomas Haller <thaller@redhat.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:32:43 -05:00
Vijay Subramanian	d4b36210c2	net: pkt_sched: PIE AQM scheme Proportional Integral controller Enhanced (PIE) is a scheduler to address the bufferbloat problem. >From the IETF draft below: " Bufferbloat is a phenomenon where excess buffers in the network cause high latency and jitter. As more and more interactive applications (e.g. voice over IP, real time video streaming and financial transactions) run in the Internet, high latency and jitter degrade application performance. There is a pressing need to design intelligent queue management schemes that can control latency and jitter; and hence provide desirable quality of service to users. We present here a lightweight design, PIE(Proportional Integral controller Enhanced) that can effectively control the average queueing latency to a target value. Simulation results, theoretical analysis and Linux testbed results have shown that PIE can ensure low latency and achieve high link utilization under various congestion situations. The design does not require per-packet timestamp, so it incurs very small overhead and is simple enough to implement in both hardware and software. " Many thanks to Dave Taht for extensive feedback, reviews, testing and suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and suggestions. Naeem Khademi and Dave Taht independently contributed to ECN support. For more information, please see technical paper about PIE in the IEEE Conference on High Performance Switching and Routing 2013. A copy of the paper can be found at ftp://ftpeng.cisco.com/pie/. Please also refer to the IETF draft submission at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00 All relevant code, documents and test scripts and results can be found at ftp://ftpeng.cisco.com/pie/. For problems with the iproute2/tc or Linux kernel code, please contact Vijay Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu (mysuryan@cisco.com) Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Mythili Prabhu <mysuryan@cisco.com> CC: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 15:13:01 -05:00
John W. Linville	a50f9d5ee0	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2014-01-06 14:19:18 -05:00
Thomas Pedersen	057d5f4ba1	mac80211: sync dtim_count to TSF On starting a mesh or AP BSS, the interface dtim_count countdown should match that of the driver TSF. Signed-off-by: Thomas Pedersen <twpedersen@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 20:10:47 +01:00
John W. Linville	9d1cd503c7	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2014-01-06 14:08:41 -05:00
Ujjal Roy	60a4fe0ae9	cfg80211: fix wext-compat for getting retry value While getting the retry limit, wext-compat returns the value without updating the flag for retry->flags is 0. Also in this case, it updates long retry flag when short and long retry value are unequal. So, iwconfig never showing "Retry short limit" and showing "Retry long limit" when both values are unequal. Updated the flags and corrected the condition properly. Signed-off-by: Ujjal Roy <royujjal@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 20:00:12 +01:00
David S. Miller	83111e7fe8	netfilter: Fix build failure in nfnetlink_queue_core.c. net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid': net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function) net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1 Just a missing include of net/tcp_states.h Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 13:36:06 -05:00
David S. Miller	9aa28f2b71	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: <pablo@netfilter.org> ==================== nftables updates for net-next The following patchset contains nftables updates for your net-next tree, they are: * Add set operation to the meta expression by means of the select_ops() infrastructure, this allows us to set the packet mark among other things. From Arturo Borrero Gonzalez. * Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel Borkmann. * Add new queue expression to nf_tables. These comes with two previous patches to prepare this new feature, one to add mask in nf_tables_core to evaluate the queue verdict appropriately and another to refactor common code with xt_NFQUEUE, from Eric Leblond. * Do not hide nftables from Kconfig if nfnetlink is not enabled, also from Eric Leblond. * Add the reject expression to nf_tables, this adds the missing TCP RST support. It comes with an initial patch to refactor common code with xt_NFQUEUE, again from Eric Leblond. * Remove an unused variable assignment in nf_tables_dump_set(), from Michal Nazarewicz. * Remove the nft_meta_target code, now that Arturo added the set operation to the meta expression, from me. * Add help information for nf_tables to Kconfig, also from me. * Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is available to other nf_tables objects, requested by Arturo, from me. * Expose the table usage counter, so we can know how many chains are using this table without dumping the list of chains, from Tomasz Bursztyka. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 13:29:30 -05:00
Johan Hedberg	cb6ca8e1ed	Bluetooth: Default to no security with L2CAP RAW sockets L2CAP RAW sockets can be used for things which do not involve establishing actual connection oriented L2CAP channels. One example of such usage is the l2ping tool. The default security level for L2CAP sockets is LOW, which implies that for SSP based connection authentication is still requested (although with no MITM requirement), which is not what we want (or need) for things like l2ping. Therefore, default to one lower level, i.e. BT_SECURITY_SDP, for L2CAP RAW sockets in order not to trigger unwanted authentication requests. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 09:26:23 -08:00
Johan Hedberg	8cef8f50d4	Bluetooth: Fix NULL pointer dereference when disconnecting When disconnecting it is possible that the l2cap_conn pointer is already NULL when bt_6lowpan_del_conn() is entered. Looking at l2cap_conn_del also verifies this as there's a NULL check there too. This patch adds the missing NULL check without which the following bug may occur: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c131e9c7>] bt_6lowpan_del_conn+0x19/0x12a *pde = 00000000 Oops: 0000 [#1] SMP CPU: 1 PID: 52 Comm: kworker/u5:1 Not tainted 3.12.0+ #196 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: hci0 hci_rx_work task: f6259b00 ti: f48c0000 task.ti: f48c0000 EIP: 0060:[<c131e9c7>] EFLAGS: 00010282 CPU: 1 EIP is at bt_6lowpan_del_conn+0x19/0x12a EAX: 00000000 EBX: ef094e10 ECX: 00000000 EDX: 00000016 ESI: 00000000 EDI: f48c1e60 EBP: f48c1e50 ESP: f48c1e34 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 30c65000 CR4: 00000690 Stack: f4d38000 00000000 f4d38000 00000002 ef094e10 00000016 f48c1e60 f48c1e70 c1316bed f48c1e84 c1316bed 00000000 00000001 ef094e10 f48c1e84 f48c1ed0 c1303cc6 c1303c7b f31f331a c1303cc6 f6e7d1c0 f3f8ea16 f3f8f380 f4d38008 Call Trace: [<c1316bed>] l2cap_disconn_cfm+0x3f/0x5b [<c1316bed>] ? l2cap_disconn_cfm+0x3f/0x5b [<c1303cc6>] hci_event_packet+0x645/0x2117 [<c1303c7b>] ? hci_event_packet+0x5fa/0x2117 [<c1303cc6>] ? hci_event_packet+0x645/0x2117 [<c12681bd>] ? __kfree_skb+0x65/0x68 [<c12681eb>] ? kfree_skb+0x2b/0x2e [<c130d3fb>] ? hci_send_to_sock+0x18d/0x199 [<c12fa327>] hci_rx_work+0xf9/0x295 [<c12fa327>] ? hci_rx_work+0xf9/0x295 [<c1036d25>] process_one_work+0x128/0x1df [<c1346a39>] ? _raw_spin_unlock_irq+0x8/0x12 [<c1036d25>] ? process_one_work+0x128/0x1df [<c103713a>] worker_thread+0x127/0x1c4 [<c1037013>] ? rescuer_thread+0x216/0x216 [<c103aec6>] kthread+0x88/0x8d [<c1040000>] ? task_rq_lock+0x37/0x6e [<c13474b7>] ret_from_kernel_thread+0x1b/0x28 [<c103ae3e>] ? __kthread_parkme+0x50/0x50 Code: 05 b8 f4 ff ff ff 8d 65 f4 5b 5e 5f 5d 8d 67 f8 5f c3 57 8d 7c 24 08 83 e4 f8 ff 77 fc 55 89 e5 57 56f EIP: [<c131e9c7>] bt_6lowpan_del_conn+0x19/0x12a SS:ESP 0068:f48c1e34 CR2: 0000000000000000 Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2014-01-06 09:26:23 -08:00
Chun-Yeow Yeoh	6b5895d93d	mac80211: enable WME for peer mesh STA Enable the WME for peer mesh STA so that the driver, such as wcn36xx, will pick this up and enabling it in HW. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 17:43:06 +01:00
Daniel Borkmann	b22f5126a2	netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages Some occurences in the netfilter tree use skb_header_pointer() in the following way ... struct dccp_hdr _dh, *dh; ... skb_header_pointer(skb, dataoff, sizeof(_dh), &dh); ... where dh itself is a pointer that is being passed as the copy buffer. Instead, we need to use &_dh as the forth argument so that we're copying the data into an actual buffer that sits on the stack. Currently, we probably could overwrite memory on the stack (e.g. with a possibly mal-formed DCCP packet), but unintentionally, as we only want the buffer to be placed into _dh variable. Fixes: `2bc780499a` ("[NETFILTER]: nf_conntrack: add DCCP protocol support") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-06 17:40:02 +01:00
Johannes Berg	5b0ec94f9c	mac80211: fix memory leak in register_hw() error path Move the internal scan request allocation below the last sanity check in ieee80211_register_hw() to avoid leaking memory if the sanity check actually triggers. Reported-by: ZHAO Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 16:02:34 +01:00
Johannes Berg	ef04a29737	mac80211: handle station TX latency allocation errors When the station's TX latency data structures need to be allocated, handle failures properly and also free all the structures if there are any other problems. Move the allocation code up so that allocation failures don't trigger rate control algorithm calls. Reported-by: ZHAO Gang <gamerh2o@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 15:58:10 +01:00
Johannes Berg	4cd3c4ecfc	mac80211: clean up netdev debugfs macros a bit Clean up the file macros a bit and use that to remove the unnecessary format function for the tkip MIC test file that really is write-only. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 15:47:15 +01:00
Jesper Dangaard Brouer	f2661adc0c	netfilter: only warn once on wrong seqadj usage Avoid potentially spamming the kernel log with WARN splash messages when catching wrong usage of seqadj, by simply using WARN_ONCE. This is a followup to commit `db12cf2743` (netfilter: WARN about wrong usage of sequence number adjustments) Suggested-by: Flavio Leitner <fbl@redhat.com> Suggested-by: Daniel Borkmann <dborkman@redhat.com> Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-06 14:23:17 +01:00
Daniel Borkmann	138aef7dca	netfilter: nf_conntrack_dccp: use %s format string for buffer Some invocations of nf_log_packet() use arg buffer directly instead of "%s" format string with follow-up buffer pointer. Currently, these two usages are not really critical, but we should fix this up nevertheless so that we don't run into trouble if that changes one day. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-06 14:19:16 +01:00
Daniel Borkmann	2690d97ade	netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper Commit `5901b6be88` attempted to introduce IPv6 support into IRC NAT helper. By doing so, the following code seemed to be removed by accident: ip = ntohl(exp->master->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip); sprintf(buffer, "%u %u", ip, port); pr_debug("nf_nat_irc: inserting '%s' == %pI4, port %u\n", buffer, &ip, port); This leads to the fact that buffer[] was left uninitialized and contained some stack value. When we call nf_nat_mangle_tcp_packet(), we call strlen(buffer) on excatly this uninitialized buffer. If we are unlucky and the skb has enough tailroom, we overwrite resp. leak contents with values that sit on our stack into the packet and send that out to the receiver. Since the rather informal DCC spec [1] does not seem to specify IPv6 support right now, we log such occurences so that admins can act accordingly, and drop the packet. I've looked into XChat source, and IPv6 is not supported there: addresses are in u32 and print via %u format string. Therefore, restore old behaviour as in IPv4, use snprintf(). The IRC helper does not support IPv6 by now. By this, we can safely use strlen(buffer) in nf_nat_mangle_tcp_packet() and prevent a buffer overflow. Also simplify some code as we now have ct variable anyway. [1] http://www.irchelp.org/irchelp/rfc/ctcpspec.html Fixes: `5901b6be88` ("netfilter: nf_nat: support IPv6 in IRC NAT helper") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Harald Welte <laforge@gnumonks.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-06 14:17:17 +01:00
Pablo Neira Ayuso	2a50d805e5	Revert "netfilter: avoid get_random_bytes calls" This reverts commit `a42b99a6e3`. Hannes Frederic Sowa reported some problems with this patch, more specifically that prandom_u32() may not be ready at boot time, see: http://marc.info/?l=linux-netdev&m=138896532403533&w=2 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-06 14:00:55 +01:00
Johannes Berg	e03ad6eade	nl80211: move vendor/testmode event skb functions out of ifdef The vendor/testmode event skb functions are needed outside the ifdef for vendor-specific events, so move them out. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 12:09:09 +01:00
Johannes Berg	1b000789a4	mac80211: add tracing for ieee80211_sta_set_buffered This is useful for debugging issues with drivers using this function (erroneously), so add tracing for the API call. Change-Id: Ice9d7eabb8fecbac188f0a741920d3488de700ec Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-01-06 12:09:01 +01:00
Fengguang Wu	5537a0557c	pktgen_dst_metrics[] can be static CC: Fan Du <fan.du@windriver.com> CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-06 11:00:07 +01:00
Daniel Borkmann	a48d4bb0b0	net: netdev_kobject_init: annotate with __init netdev_kobject_init() is only being called from __init context, that is, net_dev_init(), so annotate it with __init as well, thus the kernel can take this as a hint that the function is used only during the initialization phase and free up used memory resources after its invocation. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-05 20:27:54 -05:00
Daniel Borkmann	965801e1eb	net: 6lowpan: fix lowpan_header_create non-compression memcpy call In function lowpan_header_create(), we invoke the following code construct: struct ipv6hdr hdr; ... hdr = ipv6_hdr(skb); ... if (...) memcpy(hc06_ptr + 1, &hdr->flow_lbl[1], 2); else memcpy(hc06_ptr, &hdr, 4); Where the else path of the condition, that is, non-compression path, calls memcpy() with a pointer to struct ipv6hdr hdr as source, thus two levels of indirection. This cannot be correct, and likely only one level of pointer was intended as source buffer for memcpy() here. Fixes: `44331fe2aa` ("IEEE802.15.4: 6LoWPAN basic support") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-05 20:25:24 -05:00
David S. Miller	855404efae	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: * Add full port randomization support. Some crazy researchers found a way to reconstruct the secure ephemeral ports that are allocated in random mode by sending off-path bursts of UDP packets to overrun the socket buffer of the DNS resolver to trigger retransmissions, then if the timing for the DNS resolution done by a client is larger than usual, then they conclude that the port that received the burst of UDP packets is the one that was opened. It seems a bit aggressive method to me but it seems to work for them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a new NAT mode to fully randomize ports using prandom. * Add a new classifier to x_tables based on the socket net_cls set via cgroups. These includes two patches to prepare the field as requested by Zefan Li. Also from Daniel Borkmann. * Use prandom instead of get_random_bytes in several locations of the netfilter code, from Florian Westphal. * Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack mark, also from Florian Westphal. * Fix compilation warning due to unused variable in IPVS, from Geert Uytterhoeven. * Add support for UID/GID via nfnetlink_queue, from Valentina Giusti. * Add IPComp extension to x_tables, from Fan Du. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-05 20:18:50 -05:00
Amitkumar Karwar	fa9be5f009	NFC: NCI: Cancel cmd_timer in nci_close_device() nci_close_device() sends nci reset command to the device. If there is no response for this command, nci request timeout occurs first and then cmd timeout happens. Because command timer has started after sending the command. We are immediately flushing command workqueue after nci timeout. Later we will try to schedule cmd_work in command timer which leads to a crash. Cancel cmd_timer before flushing the workqueue to fix the problem. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-05 23:20:15 +01:00
stephen hemminger	eec73f1c96	tipc: remove unused code Remove dead code; tipc_bearer_find_interface tipc_node_redundant_links This may break out of tree version of TIPC if there still is one. But that maybe a good thing :-) Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-04 20:18:50 -05:00
stephen hemminger	9805696399	tipc: make local function static Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-04 20:18:50 -05:00
stephen hemminger	6aee49c558	dccp: make local variable static Make DCCP module config variable static, only used in one file. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-04 20:18:50 -05:00
stephen hemminger	fd34d627ae	dccp: remove obsolete code This function is defined but not used. Remove it now, can be resurrected if ever needed. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-04 20:18:49 -05:00
Li RongQing	8f84985fec	net: unify the pcpu_tstats and br_cpu_netstats as one They are same, so unify them as one, pcpu_sw_netstats. Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats from if_tunnel and remove br_cpu_netstats from br_private.h Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-04 20:10:24 -05:00
Marcel Holtmann	f9f462faa0	Bluetooth: Add quirk for disabling Delete Stored Link Key command Some controller pretend they support the Delete Stored Link Key command, but in reality they really don't support it. < HCI Command: Delete Stored Link Key (0x03\|0x0012) plen 7 bdaddr 00:00:00:00:00:00 all 1 > HCI Event: Command Complete (0x0e) plen 4 Delete Stored Link Key (0x03\|0x0012) ncmd 1 status 0x11 deleted 0 Error: Unsupported Feature or Parameter Value Not correctly supporting this command causes the controller setup to fail and will make a device not work. However sending the command for controller that handle stored link keys is important. This quirk allows a driver to disable the command if it knows that this command handling is broken. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2014-01-04 20:10:40 +02:00
Thierry Escande	4f319e3251	NFC: digital: Use NFC_NFCID3_MAXSIZE from nfc.h This removes the declaration of NFCID3 size in digital_dep.c and now uses the one from nfc.h. This also removes a faulty and unneeded call to max(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-04 03:35:34 +01:00
Thierry Escande	67af1d7a0f	NFC: digital: Fix incorrect use of ERR_PTR and PTR_ERR macros It's bad to use these macros when not dealing with error code. this patch changes calls to these macros with correct casts. Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-04 03:35:34 +01:00
Samuel Ortiz	a434c24074	NFC: Only warn on SE discovery error SE discovery errors are currently overwriting the dev_up() return error. This is wrong for many reasons: - We don't want to report an error if we actually brought the device up but it failed to discover SEs. By doing so we pretend we don't have an NFC functional device even we do. The only thing we could not do was checking for SEs availability. This is the false negative case. - In some cases the actual device power up failed but the SE discovery succeeded. Userspace then believes the device is up while it's not. This is the false positive case. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-04 03:32:27 +01:00
Szymon Janc	11bfb1c4b9	NFC: llcp: Use default MIU if none was specified on connect If MIUX is not present in CONNECT or CC use default MIU value (128) instead of one announced durring link setup. This was affecting Bluetooth handover with Android 4.3+ NCI stack. Signed-off-by: Szymon Janc <szymon.janc@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-04 03:32:27 +01:00
Szymon Janc	43d53c29dd	NFC: llcp: Fix possible memory leak while sending I frames If sending was not completed due to low memory condition msg_data was not free before returning from function. Signed-off-by: Szymon Janc <szymon.janc@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-04 03:32:27 +01:00
Samuel Ortiz	249eb5bd74	NFC: Return driver failure upon unknown event reception If the device is polling, this will trigger a netlink event to notify userspace about the polling error. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-04 03:32:27 +01:00
Arron Wang	d31652a26b	NFC: Fix target mode p2p link establishment With commit `e29a9e2ae1`, we set the active_target pointer from nfc_dep_link_is_up() in order to support the case where the target detection and the DEP link setting are done atomically by the driver. That can only happen in initiator mode, so we need to check for that otherwise we fail to bring a p2p link in target mode. Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-01-04 03:31:32 +01:00
stephen hemminger	5e419e68a6	llc: make lock static The llc_sap_list_lock does not need to be global, only acquired in core. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-03 20:56:48 -05:00
stephen hemminger	8f09898bf0	socket: cleanups Namespace related cleaning * make cred_to_ucred static * remove unused sock_rmalloc function Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-03 20:55:58 -05:00
Tom Herbert	9a4aa9af44	ipv4: Use percpu Cache route in IP tunnels percpu route cache eliminates share of dst refcnt between CPUs. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-03 19:40:57 -05:00
Tom Herbert	7d442fab0a	ipv4: Cache dst in tunnels Avoid doing a route lookup on every packet being tunneled. In ip_tunnel.c cache the route returned from ip_route_output if the tunnel is "connected" so that all the rouitng parameters are taken from tunnel parms for a packet. Specifically, not NBMA tunnel and tos is from tunnel parms (not inner packet). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-03 19:38:45 -05:00
Neil Horman	f916ec9608	sctp: Add process name and pid to deprecation warnings Recently I updated the sctp socket option deprecation warnings to be both a bit more clear and ratelimited to prevent user processes from spamming the log file. Ben Hutchings suggested that I add the process name and pid to these warnings so that users can tell who is responsible for using the deprecated apis. This patch accomplishes that. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-03 19:36:46 -05:00
Pablo Neira Ayuso	c9c8e48597	netfilter: nf_tables: dump sets in all existing families This patch allows you to dump all sets available in all of the registered families. This allows you to use NFPROTO_UNSPEC to dump all existing sets, similarly to other existing table, chain and rule operations. This patch is based on original patch from Arturo Borrero González. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-04 00:23:11 +01:00
Daniel Borkmann	82a37132f3	netfilter: x_tables: lightweight process control group matching It would be useful e.g. in a server or desktop environment to have a facility in the notion of fine-grained "per application" or "per application group" firewall policies. Probably, users in the mobile, embedded area (e.g. Android based) with different security policy requirements for application groups could have great benefit from that as well. For example, with a little bit of configuration effort, an admin could whitelist well-known applications, and thus block otherwise unwanted "hard-to-track" applications like [1] from a user's machine. Blocking is just one example, but it is not limited to that, meaning we can have much different scenarios/policies that netfilter allows us than just blocking, e.g. fine grained settings where applications are allowed to connect/send traffic to, application traffic marking/conntracking, application-specific packet mangling, and so on. Implementation of PID-based matching would not be appropriate as they frequently change, and child tracking would make that even more complex and ugly. Cgroups would be a perfect candidate for accomplishing that as they associate a set of tasks with a set of parameters for one or more subsystems, in our case the netfilter subsystem, which, of course, can be combined with other cgroup subsystems into something more complex if needed. As mentioned, to overcome this constraint, such processes could be placed into one or multiple cgroups where different fine-grained rules can be defined depending on the application scenario, while e.g. everything else that is not part of that could be dropped (or vice versa), thus making life harder for unwanted processes to communicate to the outside world. So, we make use of cgroups here to track jobs and limit their resources in terms of iptables policies; in other words, limiting, tracking, etc what they are allowed to communicate. In our case we're working on outgoing traffic based on which local socket that originated from. Also, one doesn't even need to have an a-prio knowledge of the application internals regarding their particular use of ports or protocols. Matching is extremly lightweight as we just test for the sk_classid marker of sockets, originating from net_cls. net_cls and netfilter do not contradict each other; in fact, each construct can live as standalone or they can be used in combination with each other, which is perfectly fine, plus it serves Tejun's requirement to not introduce a new cgroups subsystem. Through this, we result in a very minimal and efficient module, and don't add anything except netfilter code. One possible, minimal usage example (many other iptables options can be applied obviously): 1) Configuring cgroups if not already done, e.g.: mkdir /sys/fs/cgroup/net_cls mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls mkdir /sys/fs/cgroup/net_cls/0 echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid (resp. a real flow handle id for tc) 2) Configuring netfilter (iptables-nftables), e.g.: iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP 3) Running applications, e.g.: ping 208.67.222.222 <pid:1799> echo 1799 > /sys/fs/cgroup/net_cls/0/tasks 64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms [...] ping 208.67.220.220 <pid:1804> ping: sendmsg: Operation not permitted [...] echo 1804 > /sys/fs/cgroup/net_cls/0/tasks 64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms [...] Of course, real-world deployments would make use of cgroups user space toolsuite, or own custom policy daemons dynamically moving applications from/to various cgroups. [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: cgroups@vger.kernel.org Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 23:41:44 +01:00
Daniel Borkmann	86f8515f97	net: netprio: rename config to be more consistent with cgroup configs While we're at it and introduced CGROUP_NET_CLASSID, lets also make NETPRIO_CGROUP more consistent with the rest of cgroups and rename it into CONFIG_CGROUP_NET_PRIO so that for networking, we now have CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG option consistent among networking cgroups, but also among cgroups CONFIG conventions in general as the vast majority has a prefix of CONFIG_CGROUP_<SUBSYS>. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Zefan Li <lizefan@huawei.com> Cc: cgroups@vger.kernel.org Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 23:41:42 +01:00
Daniel Borkmann	fe1217c4f3	net: net_cls: move cgroupfs classid handling into core Zefan Li requested [1] to perform the following cleanup/refactoring: - Split cgroupfs classid handling into net core to better express a possible more generic use. - Disable module support for cgroupfs bits as the majority of other cgroupfs subsystems do not have that, and seems to be not wished from cgroup side. Zefan probably might want to follow-up for netprio later on. - By this, code can be further reduced which previously took care of functionality built when compiled as module. cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so that we are consistent with {netclassid,netprio}_cgroup naming that is under net/core/ as suggested by Zefan. No change in functionality, but only code refactoring that is being done here. [1] http://patchwork.ozlabs.org/patch/304825/ Suggested-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Zefan Li <lizefan@huawei.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: cgroups@vger.kernel.org Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 23:41:41 +01:00
Eric Leblond	14abfa161d	netfilter: xt_CT: fix error value in xt_ct_tg_check() If setting event mask fails then we were returning 0 for success. This patch updates return code to -EINVAL in case of problem. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 23:41:39 +01:00
stephen hemminger	dcd93ed4cd	netfilter: nf_conntrack: remove dead code The following code is not used in current upstream code. Some of this seems to be old hooks, other might be used by some out of tree module (which I don't care about breaking), and the need_ipv4_conntrack was used by old NAT code but no longer called. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 23:41:37 +01:00
stephen hemminger	02eca9d2cc	netfilter: ipset: remove unused code Function never used in current upstream code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 23:41:35 +01:00
Daniel Borkmann	34ce324019	netfilter: nf_nat: add full port randomization support We currently use prandom_u32() for allocation of ports in tcp bind(0) and udp code. In case of plain SNAT we try to keep the ports as is or increment on collision. SNAT --random mode does use per-destination incrementing port allocation. As a recent paper pointed out in [1] that this mode of port allocation makes it possible to an attacker to find the randomly allocated ports through a timing side-channel in a socket overloading attack conducted through an off-path attacker. So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization in regard to the attack described in this paper. As we need to keep compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port selection algorithm with a simple prandom_u32() in order to mitigate this attack vector. Note that the lfsr113's internal state is periodically reseeded by the kernel through a local secure entropy source. More details can be found in [1], the basic idea is to send bursts of packets to a socket to overflow its receive queue and measure the latency to detect a possible retransmit when the port is found. Because of increasing ports to given destination and port, further allocations can be predicted. This information could then be used by an attacker for e.g. for cache-poisoning, NS pinning, and degradation of service attacks against DNS servers [1]: The best defense against the poisoning attacks is to properly deploy and validate DNSSEC; DNSSEC provides security not only against off-path attacker but even against MitM attacker. We hope that our results will help motivate administrators to adopt DNSSEC. However, full DNSSEC deployment make take significant time, and until that happens, we recommend short-term, non-cryptographic defenses. We recommend to support full port randomisation, according to practices recommended in [2], and to avoid per-destination sequential port allocation, which we show may be vulnerable to derandomisation attacks. Joint work between Hannes Frederic Sowa and Daniel Borkmann. [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf [2] http://arxiv.org/pdf/1205.5190v1.pdf Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 23:41:26 +01:00
Michal Nazarewicz	720e0dfa3a	netfilter: nf_tables: remove unused variable in nf_tables_dump_set() The nfmsg variable is not used (except in sizeof operator which does not care about its value) between the first and second time it is assigned the value. Furthermore, nlmsg_data has no side effects, so the assignment can be safely removed. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 22:51:14 +01:00
Daniel Borkmann	1466291790	netfilter: nf_tables: fix type in parsing in nf_tables_set_alloc_name() In nf_tables_set_alloc_name(), we are trying to find a new, unused name for our new set and interate through the list of present sets. As far as I can see, we're using format string %d to parse already present names in order to mark their presence in a bitmap, so that we can later on find the first 0 in that map to assign the new set name to. We should rather use a temporary variable of type int to store the result of sscanf() to, and for making sanity checks on. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-03 22:50:59 +01:00
Fan Du	8101328b79	{pktgen, xfrm} Show spi value properly when ipsec turned on If user run pktgen plus ipsec by using spi, show spi value properly when cat /proc/net/pktgen/ethX Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:12 +01:00
Fan Du	c454997e68	{pktgen, xfrm} Introduce xfrm_state_lookup_byspi for pktgen Introduce xfrm_state_lookup_byspi to find user specified by custom from "pgset spi xxx". Using this scheme, any flow regardless its saddr/daddr could be transform by SA specified with configurable spi. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:12 +01:00
Fan Du	cf93d47ed4	{pktgen, xfrm} Construct skb dst for tunnel mode transformation IPsec tunnel mode encapuslation needs to set outter ip header with right protocol/ttl/id value with regard to skb->dst->child. Looking up a rt in a standard way is absolutely wrong for every packet transmission. In a simple way, construct a dst by setting neccessary information to make tunnel mode encapuslation working. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:11 +01:00
Fan Du	de4aee7d69	{pktgen, xfrm} Using "pgset spi xxx" to spedifiy SA for a given flow User could set specific SPI value to arm pktgen flow with IPsec transformation, instead of looking up SA by sadr/daddr. The reaseon to do so is because current state lookup scheme is both slow and, most important of all, in fact pktgen doesn't need to match any SA state addresses information, all it needs is the SA transfromation shell to do the encapuslation. And this option also provide user an alternative to using pktgen test existing SA without creating new ones. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:11 +01:00
Fan Du	4ae770bf58	{pktgen, xfrm} Correct xfrm_state_lock usage in xfrm_stateonly_find Acquiring xfrm_state_lock in process context is expected to turn BH off, as this lock is also used in BH context, namely xfrm state timer handler. Otherwise it surprises LOCKDEP with below messages. [ 81.422781] pktgen: Packet Generator for packet performance testing. Version: 2.74 [ 81.725194] [ 81.725211] ========================================================= [ 81.725212] [ INFO: possible irq lock inversion dependency detected ] [ 81.725215] 3.13.0-rc2+ #92 Not tainted [ 81.725216] --------------------------------------------------------- [ 81.725218] kpktgend_0/2780 just changed the state of lock: [ 81.725220] (xfrm_state_lock){+.+...}, at: [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0 [ 81.725231] but this lock was taken by another, SOFTIRQ-safe lock in the past: [ 81.725232] (&(&x->lock)->rlock){+.-...} [ 81.725232] [ 81.725232] and interrupts could create inverse lock ordering between them. [ 81.725232] [ 81.725235] [ 81.725235] other info that might help us debug this: [ 81.725237] Possible interrupt unsafe locking scenario: [ 81.725237] [ 81.725238] CPU0 CPU1 [ 81.725240] ---- ---- [ 81.725241] lock(xfrm_state_lock); [ 81.725243] local_irq_disable(); [ 81.725244] lock(&(&x->lock)->rlock); [ 81.725246] lock(xfrm_state_lock); [ 81.725248] <Interrupt> [ 81.725249] lock(&(&x->lock)->rlock); [ 81.725251] [ 81.725251] * DEADLOCK * [ 81.725251] [ 81.725254] no locks held by kpktgend_0/2780. [ 81.725255] [ 81.725255] the shortest dependencies between 2nd lock and 1st lock: [ 81.725269] -> (&(&x->lock)->rlock){+.-...} ops: 8 { [ 81.725274] HARDIRQ-ON-W at: [ 81.725276] [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70 [ 81.725282] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725284] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725289] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290 [ 81.725292] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40 [ 81.725300] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0 [ 81.725303] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0 [ 81.725305] [<ffffffff8105a026>] irq_exit+0x96/0xc0 [ 81.725308] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60 [ 81.725313] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80 [ 81.725316] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30 [ 81.725329] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0 [ 81.725333] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0 [ 81.725338] IN-SOFTIRQ-W at: [ 81.725340] [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70 [ 81.725342] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725344] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725347] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290 [ 81.725349] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40 [ 81.725352] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0 [ 81.725355] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0 [ 81.725358] [<ffffffff8105a026>] irq_exit+0x96/0xc0 [ 81.725360] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60 [ 81.725363] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80 [ 81.725365] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30 [ 81.725368] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0 [ 81.725370] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0 [ 81.725373] INITIAL USE at: [ 81.725375] [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70 [ 81.725385] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725388] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725390] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290 [ 81.725394] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40 [ 81.725398] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0 [ 81.725401] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0 [ 81.725404] [<ffffffff8105a026>] irq_exit+0x96/0xc0 [ 81.725407] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60 [ 81.725409] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80 [ 81.725412] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30 [ 81.725415] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0 [ 81.725417] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0 [ 81.725420] } [ 81.725421] ... key at: [<ffffffff8295b9c8>] __key.46349+0x0/0x8 [ 81.725445] ... acquired at: [ 81.725446] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725449] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725452] [<ffffffff816dc057>] __xfrm_state_delete+0x37/0x140 [ 81.725454] [<ffffffff816dc18c>] xfrm_state_delete+0x2c/0x50 [ 81.725456] [<ffffffff816dc277>] xfrm_state_flush+0xc7/0x1b0 [ 81.725458] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key] [ 81.725465] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key] [ 81.725468] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key] [ 81.725471] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0 [ 81.725476] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130 [ 81.725479] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10 [ 81.725482] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b [ 81.725484] [ 81.725486] -> (xfrm_state_lock){+.+...} ops: 11 { [ 81.725490] HARDIRQ-ON-W at: [ 81.725493] [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70 [ 81.725504] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725507] [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70 [ 81.725510] [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0 [ 81.725513] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key] [ 81.725516] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key] [ 81.725519] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key] [ 81.725522] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0 [ 81.725525] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130 [ 81.725527] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10 [ 81.725530] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b [ 81.725533] SOFTIRQ-ON-W at: [ 81.725534] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70 [ 81.725537] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725539] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725541] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0 [ 81.725544] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen] [ 81.725547] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen] [ 81.725550] [<ffffffff81078f84>] kthread+0xe4/0x100 [ 81.725555] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0 [ 81.725565] INITIAL USE at: [ 81.725567] [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70 [ 81.725569] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725572] [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70 [ 81.725574] [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0 [ 81.725576] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key] [ 81.725580] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key] [ 81.725583] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key] [ 81.725586] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0 [ 81.725589] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130 [ 81.725594] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10 [ 81.725597] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b [ 81.725599] } [ 81.725600] ... key at: [<ffffffff81cadef8>] xfrm_state_lock+0x18/0x50 [ 81.725606] ... acquired at: [ 81.725607] [<ffffffff810995c0>] check_usage_backwards+0x110/0x150 [ 81.725609] [<ffffffff81099e96>] mark_lock+0x196/0x2f0 [ 81.725611] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70 [ 81.725614] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725616] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725627] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0 [ 81.725629] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen] [ 81.725632] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen] [ 81.725635] [<ffffffff81078f84>] kthread+0xe4/0x100 [ 81.725637] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0 [ 81.725640] [ 81.725641] [ 81.725641] stack backtrace: [ 81.725645] CPU: 0 PID: 2780 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #92 [ 81.725647] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006 [ 81.725649] ffffffff82537b80 ffff880018199988 ffffffff8176af37 0000000000000007 [ 81.725652] ffff8800181999f0 ffff8800181999d8 ffffffff81099358 ffffffff82537b80 [ 81.725655] ffffffff81a32def ffff8800181999f4 0000000000000000 ffff880002cbeaa8 [ 81.725659] Call Trace: [ 81.725664] [<ffffffff8176af37>] dump_stack+0x46/0x58 [ 81.725667] [<ffffffff81099358>] print_irq_inversion_bug.part.42+0x1e8/0x1f0 [ 81.725670] [<ffffffff810995c0>] check_usage_backwards+0x110/0x150 [ 81.725672] [<ffffffff81099e96>] mark_lock+0x196/0x2f0 [ 81.725675] [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150 [ 81.725685] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70 [ 81.725691] [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90 [ 81.725694] [<ffffffff81089b38>] ? sched_clock_cpu+0xa8/0x120 [ 81.725697] [<ffffffff8109a31a>] ? __lock_acquire+0x32a/0x1d70 [ 81.725699] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0 [ 81.725702] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 81.725704] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0 [ 81.725707] [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90 [ 81.725710] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 81.725712] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0 [ 81.725715] [<ffffffff810971ec>] ? lock_release_holdtime.part.26+0x1c/0x1a0 [ 81.725717] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0 [ 81.725721] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen] [ 81.725724] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen] [ 81.725727] [<ffffffffa008ba71>] ? pktgen_thread_worker+0xb11/0x1880 [pktgen] [ 81.725729] [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10 [ 81.725733] [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40 [ 81.725745] [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0 [ 81.725751] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60 [ 81.725753] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60 [ 81.725757] [<ffffffffa008af60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen] [ 81.725759] [<ffffffff81078f84>] kthread+0xe4/0x100 [ 81.725762] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170 [ 81.725765] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0 [ 81.725768] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170 Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:11 +01:00
Fan Du	6de9ace4ae	{pktgen, xfrm} Add statistics counting when transforming so /proc/net/xfrm_stat could give user clue about what's wrong in this process. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:11 +01:00
Fan Du	0af0a4136b	{pktgen, xfrm} Correct xfrm state lock usage when transforming xfrm_state lock protects its state, i.e., VALID/DEAD and statistics, not the transforming procedure, as both mode/type output functions are reentrant. Another issue is state lock can be used in BH context when state timer alarmed, after transformation in pktgen, update state statistics acquiring state lock should disabled BH context for a moment. Otherwise LOCKDEP critisize this: [ 62.354339] pktgen: Packet Generator for packet performance testing. Version: 2.74 [ 62.655444] [ 62.655448] ================================= [ 62.655451] [ INFO: inconsistent lock state ] [ 62.655455] 3.13.0-rc2+ #70 Not tainted [ 62.655457] --------------------------------- [ 62.655459] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 62.655463] kpktgend_0/2764 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 62.655466] (&(&x->lock)->rlock){+.?...}, at: [<ffffffffa00886f6>] pktgen_thread_worker+0x1796/0x1860 [pktgen] [ 62.655479] {IN-SOFTIRQ-W} state was registered at: [ 62.655484] [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70 [ 62.655492] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 62.655498] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 62.655505] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290 [ 62.655511] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40 [ 62.655519] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0 [ 62.655523] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0 [ 62.655526] [<ffffffff8105a026>] irq_exit+0x96/0xc0 [ 62.655530] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60 [ 62.655537] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80 [ 62.655541] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30 [ 62.655547] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0 [ 62.655552] [<ffffffff81761c3c>] rest_init+0xbc/0xd0 [ 62.655557] [<ffffffff81ea5e5e>] start_kernel+0x3c4/0x3d1 [ 62.655583] [<ffffffff81ea55a8>] x86_64_start_reservations+0x2a/0x2c [ 62.655588] [<ffffffff81ea569f>] x86_64_start_kernel+0xf5/0xfc [ 62.655592] irq event stamp: 77 [ 62.655594] hardirqs last enabled at (77): [<ffffffff810ab7f2>] vprintk_emit+0x1b2/0x520 [ 62.655597] hardirqs last disabled at (76): [<ffffffff810ab684>] vprintk_emit+0x44/0x520 [ 62.655601] softirqs last enabled at (22): [<ffffffff81059b57>] __do_softirq+0x177/0x2d0 [ 62.655605] softirqs last disabled at (15): [<ffffffff8105a026>] irq_exit+0x96/0xc0 [ 62.655609] [ 62.655609] other info that might help us debug this: [ 62.655613] Possible unsafe locking scenario: [ 62.655613] [ 62.655616] CPU0 [ 62.655617] ---- [ 62.655618] lock(&(&x->lock)->rlock); [ 62.655622] <Interrupt> [ 62.655623] lock(&(&x->lock)->rlock); [ 62.655626] [ 62.655626] * DEADLOCK * [ 62.655626] [ 62.655629] no locks held by kpktgend_0/2764. [ 62.655631] [ 62.655631] stack backtrace: [ 62.655636] CPU: 0 PID: 2764 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #70 [ 62.655638] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006 [ 62.655642] ffffffff8216b7b0 ffff88001be43ab8 ffffffff8176af37 0000000000000007 [ 62.655652] ffff88001c8d4fc0 ffff88001be43b18 ffffffff81766d78 0000000000000000 [ 62.655663] ffff880000000001 ffff880000000001 ffffffff8101025f ffff88001be43b18 [ 62.655671] Call Trace: [ 62.655680] [<ffffffff8176af37>] dump_stack+0x46/0x58 [ 62.655685] [<ffffffff81766d78>] print_usage_bug+0x1f1/0x202 [ 62.655691] [<ffffffff8101025f>] ? save_stack_trace+0x2f/0x50 [ 62.655696] [<ffffffff81099f8c>] mark_lock+0x28c/0x2f0 [ 62.655700] [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150 [ 62.655704] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70 [ 62.655712] [<ffffffff81115b09>] ? irq_work_queue+0x69/0xb0 [ 62.655717] [<ffffffff810ab7f2>] ? vprintk_emit+0x1b2/0x520 [ 62.655722] [<ffffffff8109cec5>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 62.655730] [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen] [ 62.655734] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130 [ 62.655741] [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen] [ 62.655745] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70 [ 62.655752] [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen] [ 62.655758] [<ffffffffa00886f6>] pktgen_thread_worker+0x1796/0x1860 [pktgen] [ 62.655766] [<ffffffffa0087a79>] ? pktgen_thread_worker+0xb19/0x1860 [pktgen] [ 62.655771] [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10 [ 62.655777] [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40 [ 62.655785] [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0 [ 62.655791] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60 [ 62.655795] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60 [ 62.655800] [<ffffffffa0086f60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen] [ 62.655806] [<ffffffff81078f84>] kthread+0xe4/0x100 [ 62.655813] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170 [ 62.655819] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0 [ 62.655824] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170 Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-03 07:29:10 +01:00
David S. Miller	aca5f58f9b	netpoll: Fix missing TXQ unlock and and OOPS. The VLAN tag handling code in netpoll_send_skb_on_dev() has two problems. 1) It exits without unlocking the TXQ. 2) It then tries to queue a NULL skb to npinfo->txq. Reported-by: Ahmed Tamrawi <atamrawi@iastate.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 19:50:52 -05:00
Li RongQing	469bdcefdc	ipv6: fix the use of pcpu_tstats in ip6_vti.c when read/write the 64bit data, the correct lock should be hold. and we can use the generic vti6_get_stats to return stats, and not define a new one in ip6_vti.c Fixes: `87b6d218f3` ("tunnel: implement 64 bits statistics") Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 19:37:21 -05:00
Li RongQing	abb6013cca	ipv6: fix the use of pcpu_tstats in ip6_tunnel when read/write the 64bit data, the correct lock should be hold. Fixes: `87b6d218f3` ("tunnel: implement 64 bits statistics") Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 19:37:21 -05:00
Yasushi Asano	fad8da3e08	ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity Fixed a problem with setting the lifetime of an IPv6 address. When setting preferred_lft to a value not zero or infinity, while valid_lft is infinity(0xffffffff) preferred lifetime is set to forever and does not update. Therefore preferred lifetime never becomes deprecated. valid lifetime and preferred lifetime should be set independently, even if valid lifetime is infinity, preferred lifetime must expire correctly (meaning it must eventually become deprecated) Signed-off-by: Yasushi Asano <yasushi.asano@jp.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 19:34:40 -05:00
Daniel Borkmann	4d231b76ee	net: llc: fix use after free in llc_ui_recvmsg While commit `30a584d944` fixes datagram interface in LLC, a use after free bug has been introduced for SOCK_STREAM sockets that do not make use of MSG_PEEK. The flow is as follow ... if (!(flags & MSG_PEEK)) { ... sk_eat_skb(sk, skb, false); ... } ... if (used + offset < skb->len) continue; ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache original length and work on skb_len to check partial reads. Fixes: `30a584d944` ("[LLX]: SOCK_DGRAM interface fixes") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 19:31:09 -05:00
Wei-Chun Chao	7a7ffbabf9	ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC VM to VM GSO traffic is broken if it goes through VXLAN or GRE tunnel and the physical NIC on the host supports hardware VXLAN/GRE GSO offload (e.g. bnx2x and next-gen mlx4). Two issues - (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header integrity check fails in udp4_ufo_fragment if inner protocol is TCP. Also gso_segs is calculated incorrectly using skb->len that includes tunnel header. Fix: robust check should only be applied to the inner packet. (VXLAN & GRE) Once GSO header integrity check passes, NULL segs is returned and the original skb is sent to hardware. However the tunnel header is already pulled. Fix: tunnel header needs to be restored so that hardware can perform GSO properly on the original packet. Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 19:06:47 -05:00
Cong Wang	c1ddf295f5	net: revert "sched classifier: make cgroup table local" This reverts commit `de6fb288b1`. Otherwise we got: net/sched/cls_cgroup.c:106:29: error: static declaration of ‘net_cls_subsys’ follows non-static declaration static struct cgroup_subsys net_cls_subsys = { ^ In file included from include/linux/cgroup.h:654:0, from net/sched/cls_cgroup.c:18: include/linux/cgroup_subsys.h:35:29: note: previous declaration of ‘net_cls_subsys’ was here SUBSYS(net_cls) ^ make[2]: *** [net/sched/cls_cgroup.o] Error 1 Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 19:02:12 -05:00
Vlad Yasevich	619a60ee04	sctp: Remove outqueue empty state The SCTP outqueue structure maintains a data chunks that are pending transmission, the list of chunks that are pending a retransmission and a length of data in flight. It also tries to keep the emtpy state so that it can performe shutdown sequence or notify user. The problem is that the empy state is inconsistently tracked. It is possible to completely drain the queue without sending anything when using PR-SCTP. In this case, the empty state will not be correctly state as report by Jamal Hadi Salim <jhs@mojatatu.com>. This can cause an association to be perminantly stuck in the SHUTDOWN_PENDING state. Additionally, SCTP is incredibly inefficient when setting the empty state. Even though all the data is availaible in the outqueue structure, we ignore it and walk a list of trasnports. In the end, we can completely remove the extra empty state and figure out if the queue is empty by looking at 3 things: length of pending data, length of in-flight data, and exisiting of retransmit data. All of these are already in the strucutre. Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 17:22:48 -05:00
stephen hemminger	de6fb288b1	sched classifier: make cgroup table local Doesn't need to be global. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 03:30:36 -05:00
stephen hemminger	9c75f4029c	sched action: make local function static No need to export functions only used in one file. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 03:30:36 -05:00
Weilong Chen	dd9b45598a	ipv4: switch and case should be at the same indent Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 03:30:36 -05:00
Weilong Chen	442c67f844	ipv4: spaces required around that '=' Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 03:30:36 -05:00
Li RongQing	0c3584d589	ipv6: remove prune parameter for fib6_clean_all since the prune parameter for fib6_clean_all always is 0, remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 03:30:35 -05:00
wangweidong	b055597697	tipc: make the code look more readable In commit `3b8401fe9d` ("tipc: kill unnecessary goto's") didn't make the code look most readable, so fix it. This patch is cosmetic and does not change the operation of TIPC in any way. Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 03:30:35 -05:00
Weilong Chen	2f3ea9a95c	xfrm: checkpatch erros with inline keyword position Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:51 +01:00
Weilong Chen	42054569f9	xfrm: fix checkpatch error Fix that "else should follow close brace '}'". Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:50 +01:00
Weilong Chen	02d0892f98	xfrm: checkpatch erros with space prohibited Fix checkpatch error "space prohibited xxx". Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:50 +01:00
Weilong Chen	3e94c2dcfd	xfrm: checkpatch errors with foo * bar This patch clean up some checkpatch errors like this: ERROR: "foo * bar" should be "foo bar" ERROR: "(foo)" should be "(foo *)" Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:49 +01:00
Weilong Chen	9b7a787d0d	xfrm: checkpatch errors with space This patch cleanup some space errors. Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-01-02 07:48:48 +01:00
Salam Noureddine	56022a8fdd	ipv4: arp: update neighbour address when a gratuitous arp is received and arp_accept is set Gratuitous arp packets are useful in switchover scenarios to update client arp tables as quickly as possible. Currently, the mac address of a neighbour is only updated after a locktime period has elapsed since the last update. In most use cases such delays are unacceptable for network admins. Moreover, the "updated" field of the neighbour stucture doesn't record the last time the address of a neighbour changed but records any change that happens to the neighbour. This is clearly a bug since locktime uses that field as meaning "addr_updated". With this observation, I was able to perpetuate a stale address by sending a stream of gratuitous arp packets spaced less than locktime apart. With this change the address is updated when a gratuitous arp is received and the arp_accept sysctl is set. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 00:08:38 -05:00
stephen hemminger	e82435341f	ipv6: namespace cleanups Running 'make namespacecheck' shows: net/ipv6/route.o ipv6_route_table_template rt6_bind_peer net/ipv6/icmp.o icmpv6_route_lookup ipv6_icmp_table_template This addresses some of those warnings by: * make icmpv6_route_lookup static * move inline's out of ip6_route.h since only used into route.c * move rt6_bind_peer into route.c Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-01 23:46:09 -05:00
stephen hemminger	1d143d9f0c	net: core functions cleanup The following functions are not used outside of net/core/dev.c and should be declared static. call_netdevice_notifiers_info __dev_remove_offload netdev_has_any_upper_dev __netdev_adjacent_dev_remove __netdev_adjacent_dev_link_lists __netdev_adjacent_dev_unlink_lists __netdev_adjacent_dev_unlink __netdev_adjacent_dev_link_neighbour __netdev_adjacent_dev_unlink_neighbour And the following are never used and should be deleted netdev_lower_dev_get_private_rcu __netdev_find_adj_rcu Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-01 23:46:09 -05:00
stephen hemminger	2173f8d953	netlink: cleanup tap related functions Cleanups in netlink_tap code * remove unused function netlink_clear_multicast_users * make local function static Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-01 23:43:36 -05:00
stephen hemminger	3678a9d863	netlink: cleanup rntl_af_register The function __rtnl_af_register is never called outside this code, and the return value is always 0. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-01 23:42:19 -05:00
Li RongQing	c3ac17cd6a	ipv6: fix the use of pcpu_tstats in sit when read/write the 64bit data, the correct lock should be hold. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-01 22:48:59 -05:00
John W. Linville	ad86c55bac	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2014-01-01 15:39:56 -05:00
Pablo Neira Ayuso	d497c63527	netfilter: add help information to new nf_tables Kconfig options Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-01 18:37:10 +01:00
Zhi Yong Wu	21eb218989	net, sch: fix the typo in register_qdisc() Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 16:44:10 -05:00
Zhi Yong Wu	855abcf066	net, rps: fix the comment of net_rps_action_and_irq_enable() Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 16:44:10 -05:00
David S. Miller	2205369a31	vlan: Fix header ops passthru when doing TX VLAN offload. When the vlan code detects that the real device can do TX VLAN offloads in hardware, it tries to arrange for the real device's header_ops to be invoked directly. But it does so illegally, by simply hooking the real device's header_ops up to the VLAN device. This doesn't work because we will end up invoking a set of header_ops routines which expect a device type which matches the real device, but will see a VLAN device instead. Fix this by providing a pass-thru set of header_ops which will arrange to pass the proper real device instead. To facilitate this add a dev_rebuild_header(). There are implementations which provide a ->cache and ->create but not a ->rebuild (f.e. PLIP). So we need a helper function just like dev_hard_header() to avoid crashes. Use this helper in the one existing place where the header_ops->rebuild was being invoked, the neighbour code. With lots of help from Florian Westphal. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 16:23:35 -05:00
wangweidong	0438816efd	sctp: move skb_dst_set() a bit downwards in sctp_packet_transmit() skb_dst_set will use dst, if dst is NULL although is not a problem, then goto the 'no_route' and free nskb, so do the skb_dst_set is pointless. so move the skb_dst_set after dst check. Remove the unnecessary initialization as well. v2: fix the subject line because it would confuse people, as pointed out by Daniel. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 14:31:44 -05:00
Yang Yingliang	6a031f67c8	sch_netem: support of 64bit rates Add a new attribute to support 64bit rates so that tc can use them to break the 32bit limit. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 14:31:44 -05:00
Yang Yingliang	8cfd88d6d7	sch_netem: more precise length of packets With TSO/GSO/GRO packets, skb->len doesn't represent a precise amount of bytes on wire. This patch replace skb->len with qdisc_pkt_len(skb) which is more precise. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 14:31:43 -05:00
Daniel Borkmann	604d13c97f	netlink: specify netlink packet direction for nlmon In order to facilitate development for netlink protocol dissector, fill the unused field skb->pkt_type of the cloned skb with a hint of the address space of the new owner (receiver) socket in the notion of "to kernel" resp. "to user". At the time we invoke __netlink_deliver_tap_skb(), we already have set the new skb owner via netlink_skb_set_owner_r(), so we can use that for netlink_is_kernel() probing. In normal PF_PACKET network traffic, this field denotes if the packet is destined for us (PACKET_HOST), if it's broadcast (PACKET_BROADCAST), etc. As we only have 3 bit reserved, we can use the value (= 6) of PACKET_FASTROUTE as it's _not used_ anywhere in the whole kernel and not supported anywhere, and packets of such type were never exposed to user space, so there are no overlapping users of such kind. Thus, as wished, that seems the only way to make both PACKET_* values non-overlapping and therefore device agnostic. By using those two flags for netlink skbs on nlmon devices, they can be made available and picked up via sll_pkttype (previously unused in netlink context) in struct sockaddr_ll. We now have these two directions: - PACKET_USER (= 6) -> to user space - PACKET_KERNEL (= 7) -> to kernel space Partial `ip a` example strace for sa_family=AF_NETLINK with detected nl msg direction: syscall: direction: sendto(3, ...) = 40 /* to kernel / recvmsg(3, ...) = 3404 / to user / recvmsg(3, ...) = 1120 / to user / recvmsg(3, ...) = 20 / to user / sendto(3, ...) = 40 / to kernel / recvmsg(3, ...) = 168 / to user / recvmsg(3, ...) = 144 / to user / recvmsg(3, ...) = 20 / to user */ Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 14:31:43 -05:00
Daniel Borkmann	73bfd370c8	netlink: only do not deliver to tap when both sides are kernel sks We should also deliver packets to nlmon devices when we are in netlink_unicast_kernel(), and only one of the {src,dst} sockets is user sk and the other one kernel sk. That's e.g. the case in netlink diag, netlink route, etc. Still, forbid to deliver messages from kernel to kernel sks. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 14:31:43 -05:00
Neil Horman	94f65193ab	SCTP: Reduce log spamming for sctp setsockopt During a recent discussion regarding some sctp socket options, it was noted that we have several points at which we issue log warnings that can be flooded at an unbounded rate by any user. Fix this by converting all the pr_warns in the sctp_setsockopt path to be pr_warn_ratelimited. Note there are several debug level messages as well. I'm leaving those alone, as, if you turn on pr_debug, you likely want lots of verbosity. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 13:58:40 -05:00
Yang Yingliang	c76f2a2c4c	sch_dsmark: use correct func name in print messages In dsmark_drop(), the function name printed by pr_debug is "dsmark_reset", correct it to "dsmark_drop" by using __func__ . BTW, replace the other function names with __func__ . Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 13:50:57 -05:00
Yang Yingliang	a071d27241	sch_htb: use /* comments Do not use C99 // comments and correct a spelling typo. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 13:50:57 -05:00
Yang Yingliang	c17988a90f	net_sched: replace pr_warning with pr_warn Prefer pr_warn(... to pr_warning(... Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 13:50:56 -05:00
Weilong Chen	d4dd8aeefd	packet: fix "foo * bar" and "(foo*)" problems Cleanup checkpatch errors.Specially,the second changed line is exactly 80 columns long. Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 13:38:41 -05:00
Li RongQing	46306b4934	ipv6: unneccessary to get address prefix in addrconf_get_prefix_route Since addrconf_get_prefix_route inputs the address prefix to fib6_locate, which does not uses the data which is out of the prefix_len length, so do not need to use ipv6_addr_prefix to get address prefix. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-31 13:37:46 -05:00
Johannes Berg	194ff52d42	cfg80211/mac80211: correct qos-map locking Since the RTNL can't always be held, use wdev/sdata locking for the qos-map dereference in mac80211. This requires cfg80211 to consistently lock it, which it was missing in one place. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2013-12-30 23:14:03 +01:00
Eric Leblond	bee11dc78f	netfilter: nft_reject: support for IPv6 and TCP reset This patch moves nft_reject_ipv4 to nft_reject and adds support for IPv6 protocol. This patch uses functions included in nf_reject.h to implement reject by TCP reset. The code has to be build as a module if NF_TABLES_IPV6 is also a module to avoid compilation error due to usage of IPv6 functions. This has been done in Kconfig by using the construct: depends on NF_TABLES_IPV6 \|\| !NF_TABLES_IPV6 This seems a bit weird in terms of syntax but works perfectly. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-12-30 18:15:38 +01:00
Eric Leblond	cc70d069e2	netfilter: REJECT: separate reusable code This patch prepares the addition of TCP reset support in the nft_reject module by moving reusable code into a header file. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-12-30 15:04:41 +01:00
Florian Westphal	f81152e350	net: rose: restore old recvmsg behavior recvmsg handler in net/rose/af_rose.c performs size-check ->msg_namelen. After commit `f3d3342602` (net: rework recvmsg handler msg_name and msg_namelen logic), we now always take the else branch due to namelen being initialized to 0. Digging in netdev-vger-cvs git repo shows that msg_namelen was initialized with a fixed-size since at least 1995, so the else branch was never taken. Compile tested only. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-29 22:33:17 -05:00
Ying Xue	84602761ca	tipc: fix deadlock during socket release A deadlock might occur if name table is withdrawn in socket release routine, and while packets are still being received from bearer. CPU0 CPU1 T0: recv_msg() release() T1: tipc_recv_msg() tipc_withdraw() T2: [grab node lock] [grab port lock] T3: tipc_link_wakeup_ports() tipc_nametbl_withdraw() T4: [grab port lock]* named_cluster_distribute() T5: wakeupdispatch() tipc_link_send() T6: [grab node lock]* The opposite order of holding port lock and node lock on above two different paths may result in a deadlock. If socket lock instead of port lock is used to protect port instance in tipc_withdraw(), the reverse order of holding port lock and node lock will be eliminated, as a result, the deadlock is killed as well. Reported-by: Lars Everbrand <lars.everbrand@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-29 22:24:07 -05:00
stephen hemminger	24245a1b05	lro: remove dead code Remove leftover code that is not used anywhere in current tree. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-29 16:34:25 -05:00
stephen hemminger	f7e56a76ac	tcp: make local functions static The following are only used in one file: tcp_connect_init tcp_set_rto Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-29 16:34:24 -05:00
Eric Leblond	5f291c2869	netfilter: select NFNETLINK when enabling NF_TABLES In Kconfig, nf_tables depends on NFNETLINK so building nf_tables as a module or inside kernel depends on the state of NFNETLINK inside the kernel config. If someone wants to build nf_tables inside the kernel, it is necessary to also build NFNETLINK inside the kernel. But NFNETLINK can not be set in the menu so it is necessary to toggle other nfnetlink subsystems such as logging and nfacct to see the nf_tables switch. This patch changes the dependency from 'depend' to 'select' inside Kconfig to allow to set the build of nftables as modules or inside kernel independently. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-12-29 12:08:29 +01:00
David S. Miller	8eb9bff0ed	Included changes: - reset netfilter-bridge state when removing the batman-adv header from an incoming packet. This prevents netfilter bridge from being fooled when the same packet enters a bridge twice (or more): the first time within the batman-adv header and the second time without. - adjust the packet layout to prevent any architecture from adding padding bytes. All the structs sent over the wire now have size multiple of 4bytes (unless pack(2) is used). - fix access to the inner vlan_eth header when reading the VID in the rx path. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJSvvT6AAoJEEKTMo6mOh1Vn4EP+gKxohskmxmnRd/MnOL1FyyR +T1+RtNm3Z2+v0OlR1zcHS5MJwt7KDrakS+w4tJTNiZTaPPjHQZu9wAn4wlFt9ix 9GO9nAtTbOcPj72WZsmq3Wslm2qKnF+kmu7hdOu7Grv3n3EEafQ+P8vzw7FACB+N dMiwkfGdHgSxlM7a1faJM9N2qiSM2vUesElYPyinhjCciHwBuFW3HwK3CxayVwVE Q20AMlgUd18ZGgy5oqrZz/1LCpaOsydtbcNHOXjF8gX0zwCTmog9GTAD6hT0mtTU 8lo2DHLFH9Hxm1bs8jDkiQEBLK2gXPGb6ic07IRE6g24my62RrAizE/yMmd1WrCP 6I8Qhs58tQ0RUVH8m6XG71TBw3nLk69QbKxHbYiDWlLQhrygx/9+ZXfuu3VUPvGS 9EmNuWw3Io7+KqaT0BJC6CJIkZLTL7oJvgyO2uFGEgkSjY5vxSF8GU0yisP9xilW bpfhQO64gsmxjY2tjmIVRckVTUOsMnJKVlquq6aHKfgsOQdnTTNOd+l7bD8izIJ8 GwZKZLuzYJqYop4TxpkbNK2ckCl0kcMCHHLInJ/pO1KU0sAs+V1+HHouX6VGr9HT IsbBvwMslbRPi8uOFAzueHTT/DzBXdP4sMsg5QomBQOqH8LlZuQ4ZBHtUbEQOVrp xRCy3Fj/wjWzZUFdYE3f =RDuB -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - reset netfilter-bridge state when removing the batman-adv header from an incoming packet. This prevents netfilter bridge from being fooled when the same packet enters a bridge twice (or more): the first time within the batman-adv header and the second time without. - adjust the packet layout to prevent any architecture from adding padding bytes. All the structs sent over the wire now have size multiple of 4bytes (unless pack(2) is used). - fix access to the inner vlan_eth header when reading the VID in the rx path. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-29 00:30:59 -05:00
David S. Miller	a72338a00e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net This patchset contains four nf_tables fixes, one IPVS fix due to missing updates in the interaction with the new sedadj conntrack extension that was added to support the netfilter synproxy code, and a couple of one-liners to fix netnamespace netfilter issues. More specifically, they are: * Fix ipv6_find_hdr() call without offset being explicitly initialized in nft_exthdr, as required by that function, from Daniel Borkmann. * Fix oops in nfnetlink_log when using netns and unloading the kernel module, from Gao feng. * Fix BUG_ON in nf_ct_timestamp extension after netns is destroyed, from Helmut Schaa. * Fix crash in IPVS due to missing sequence adjustment extension being allocated in the conntrack, from Jesper Dangaard Brouer. * Add bugtrap to spot a warning in case you deference sequence adjustment conntrack area when not available, this should help to catch similar invalid dereferences in the Netfilter tree, also from Jesper. * Fix incomplete dumping of sets in nf_tables when retrieving by family, from me. * Fix oops when updating the table state (dormant <-> active) and having user (not base ) chains, from me. * Fix wrong validation in set element data that results in returning -EINVAL when using the nf_tables dictionary feature with mappings, also from me. We don't usually have this amount of fixes by this time (as we're already in -rc5 of the development cycle), although half of them are related to nf_tables which is a relatively new thing, and I also believe that holidays have also delayed the flight of bugfixes to mainstream a bit. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-29 00:24:28 -05:00
Stephen Hemminger	ea074b3495	ipv4: ping make local stuff static Don't export ping_table or ping_v4_sendmsg. Both are only used inside ping code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-28 17:05:45 -05:00
Stephen Hemminger	068a6e1834	ipv4: remove unused function inetpeer_invalidate_family defined but never used Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-28 17:03:20 -05:00
Stephen Hemminger	7195cf7221	arp: make arp_invalidate static Don't export arp_invalidate, only used in arp.c Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-28 17:02:46 -05:00
Stephen Hemminger	c9cb6b6ec1	ipv4: make fib_detect_death static Make fib_detect_death function static only used in one file. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-28 17:01:46 -05:00
Pablo Neira Ayuso	2ee0d3c80f	netfilter: nf_tables: fix wrong datatype in nft_validate_data_load() This patch fixes dictionary mappings, eg. add rule ip filter input meta dnat set tcp dport map { 22 => 1.1.1.1, 23 => 2.2.2.2 } The kernel was returning -EINVAL in nft_validate_data_load() since the type of the set element data that is passed was the real userspace datatype instead of NFT_DATA_VALUE. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-12-28 22:32:28 +01:00
Antonio Quartulli	2b1e2cb359	batman-adv: fix vlan header access When batadv_get_vid() is invoked in interface_rx() the batman-adv header has already been removed, therefore the header_len argument has to be 0. Introduced by `c018ad3de6` ("batman-adv: add the VLAN ID attribute to the TT entry") Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2013-12-28 14:48:40 +01:00
Antonio Quartulli	55883fd104	batman-adv: clean nf state when removing protocol header If an interface enslaved into batman-adv is a bridge (or a virtual interface built on top of a bridge) the nf_bridge member of the skbs reaching the soft-interface is filled with the state about "netfilter bridge" operations. Then, if one of such skbs is locally delivered, the nf_bridge member should be cleaned up to avoid that the old state could mess up with other "netfilter bridge" operations when entering a second bridge. This is needed because batman-adv is an encapsulation protocol. However at the moment skb->nf_bridge is not released at all leading to bogus "netfilter bridge" behaviours. Fix this by cleaning the netfilter state of the skb before it gets delivered to the upper layer in interface_rx(). Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2013-12-28 14:47:44 +01:00
Pablo Neira Ayuso	994737513e	netfilter: nf_tables: remove nft_meta_target In `e035b77` ("netfilter: nf_tables: nft_meta module get/set ops"), we got the meta target merged into the existing meta expression. So let's get rid of this dead code now that we fully support that feature. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-12-28 14:06:25 +01:00
Arturo Borrero Gonzalez	e035b77ac7	netfilter: nf_tables: nft_meta module get/set ops This patch adds kernel support for the meta expression in get/set flavour. The set operation indicates that a given packet has to be set with a property, currently one of mark, priority, nftrace. The get op is what was currently working: evaluate the given packet property. In the nftrace case, the value is always 1. Such behaviour is copied from net/netfilter/xt_TRACE.c The NFTA_META_DREG and NFTA_META_SREG attributes are mutually exclusives. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-12-28 14:02:12 +01:00
Antonio Quartulli	ca66304644	batman-adv: fix alignment for batadv_tvlv_tt_change Make struct batadv_tvlv_tt_change a multiple 4 bytes long to avoid padding on any architecture. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2013-12-28 12:51:18 +01:00
Simon Wunderlich	2f7a318219	batman-adv: fix size of batadv_bla_claim_dst Since this is a mac address and always 48 bit, and we can assume that it is always aligned to 2-byte boundaries, add a pack(2) pragma. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2013-12-28 12:51:17 +01:00
Antonio Quartulli	27a417e6ba	batman-adv: fix size of batadv_icmp_header struct batadv_icmp_header currently has a size of 17, which will be padded to 20 on some architectures. Fix this by unrolling the header into the parent structures. Moreover keep the ICMP parsing functions as generic as they are now by using a stub icmp_header struct during packet parsing. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2013-12-28 12:51:16 +01:00
Simon Wunderlich	a40d9b075c	batman-adv: fix header alignment by unrolling batadv_header The size of the batadv_header of 3 is problematic on some architectures which automatically pad all structures to a 32 bit boundary. To not lose performance by packing this struct, better embed it into the various host structures. Reported-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2013-12-28 12:51:16 +01:00
Simon Wunderlich	46b76e0b8b	batman-adv: fix alignment for batadv_coded_packet The compiler may decide to pad the structure, and then it does not have the expected size of 46 byte. Fix this by moving it in the pragma pack(2) part of the code. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2013-12-28 12:51:15 +01:00
Pablo Neira Ayuso	d201297561	netfilter: nf_tables: fix oops when updating table with user chains This patch fixes a crash while trying to deactivate a table that contains user chains. You can reproduce it via: % nft add table table1 % nft add chain table1 chain1 % nft-table-upd ip table1 dormant [ 253.021026] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 [ 253.021114] IP: [<ffffffff8134cebd>] nf_register_hook+0x35/0x6f [ 253.021167] PGD 30fa5067 PUD 30fa2067 PMD 0 [ 253.021208] Oops: 0000 [#1] SMP [...] [ 253.023305] Call Trace: [ 253.023331] [<ffffffffa0885020>] nf_tables_newtable+0x11c/0x258 [nf_tables] [ 253.023385] [<ffffffffa0878592>] nfnetlink_rcv_msg+0x1f4/0x226 [nfnetlink] [ 253.023438] [<ffffffffa0878418>] ? nfnetlink_rcv_msg+0x7a/0x226 [nfnetlink] [ 253.023491] [<ffffffffa087839e>] ? nfnetlink_bind+0x45/0x45 [nfnetlink] [ 253.023542] [<ffffffff8134b47e>] netlink_rcv_skb+0x3c/0x88 [ 253.023586] [<ffffffffa0878973>] nfnetlink_rcv+0x3af/0x3e4 [nfnetlink] [ 253.023638] [<ffffffff813fb0d4>] ? _raw_read_unlock+0x22/0x34 [ 253.023683] [<ffffffff8134af17>] netlink_unicast+0xe2/0x161 [ 253.023727] [<ffffffff8134b29a>] netlink_sendmsg+0x304/0x332 [ 253.023773] [<ffffffff8130d250>] __sock_sendmsg_nosec+0x25/0x27 [ 253.023820] [<ffffffff8130fb93>] sock_sendmsg+0x5a/0x7b [ 253.023861] [<ffffffff8130d5d5>] ? copy_from_user+0x2a/0x2c [ 253.023905] [<ffffffff8131066f>] ? move_addr_to_kernel+0x35/0x60 [ 253.023952] [<ffffffff813107b3>] SYSC_sendto+0x119/0x15c [ 253.023995] [<ffffffff81401107>] ? sysret_check+0x1b/0x56 [ 253.024039] [<ffffffff8108dc30>] ? trace_hardirqs_on_caller+0x140/0x1db [ 253.024090] [<ffffffff8120164e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 253.024141] [<ffffffff81310caf>] SyS_sendto+0x9/0xb [ 253.026219] [<ffffffff814010e2>] system_call_fastpath+0x16/0x1b Reported-by: Alex Wei <alex.kern.mentor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-12-28 12:18:16 +01:00
Pablo Neira Ayuso	e38195bf32	netfilter: nf_tables: fix dumping with large number of sets If not table name is specified, the dumping of the existing sets may be incomplete with a sufficiently large number of sets and tables. This patch fixes missing reset of the cursors after finding the location of the last object that has been included in the previous multi-part message. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-12-28 12:14:42 +01:00
Li RongQing	6a9eadccff	ipv6: release dst properly in ipip6_tunnel_xmit if a dst is not attached to anywhere, it should be released before exit ipip6_tunnel_xmit, otherwise cause dst memory leakage. Fixes: `61c1db7fae` ("ipv6: sit: add GSO/TSO support") Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-27 13:14:40 -05:00
Weilong Chen	3193502e5b	ieee802154: space prohibited before that close parenthesis Fix checkpatch error with space. Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-27 13:06:16 -05:00
Weilong Chen	3cdba604d0	llc: "foo* bar" should be "foo *bar" Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-27 13:06:15 -05:00
Jamal Hadi Salim	1a29321ed0	net_sched: act: Dont increment refcnt on replace This is a bug fix. The existing code tries to kill many birds with one stone: Handling binding of actions to filters, new actions and replacing of action attributes. A simple test case to illustrate: XXXX moja@fe1:~$ sudo tc actions add action drop index 12 moja@fe1:~$ actions get action gact index 12 action order 1: gact action drop random type none pass val 0 index 12 ref 1 bind 0 moja@fe1:~$ sudo tc actions replace action ok index 12 moja@fe1:~$ actions get action gact index 12 action order 1: gact action drop random type none pass val 0 index 12 ref 2 bind 0 XXXX The above shows the refcounf being wrongly incremented on replace. There are more complex scenarios with binding of actions to filters that i am leaving out that didnt work as well... Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-27 12:50:00 -05:00
Sasha Levin	c2349758ac	rds: prevent dereference of a NULL device Binding might result in a NULL device, which is dereferenced causing this BUG: [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 000000000000097 4 [ 1317.261847] IP: [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110 [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0 [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 1317.264179] Dumping ftrace buffer: [ 1317.264774] (ftrace buffer empty) [ 1317.265220] Modules linked in: [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G W 3.13.0-rc4- next-20131218-sasha-00013-g2cebb9b-dirty #4159 [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000 [ 1317.268399] RIP: 0010:[<ffffffff84225f52>] [<ffffffff84225f52>] rds_ib_laddr_check+ 0x82/0x110 [ 1317.269670] RSP: 0000:ffff8803cd31bdf8 EFLAGS: 00010246 [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000 [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286 [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000 [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031 [ 1317.270230] FS: 00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:000000000000 0000 [ 1317.270230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0 [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 [ 1317.270230] Stack: [ 1317.270230] 0000000054086700 5408670000a25de0 5408670000000002 0000000000000000 [ 1317.270230] ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160 [ 1317.270230] ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280 [ 1317.270230] Call Trace: [ 1317.270230] [<ffffffff84223542>] ? rds_trans_get_preferred+0x42/0xa0 [ 1317.270230] [<ffffffff84223556>] rds_trans_get_preferred+0x56/0xa0 [ 1317.270230] [<ffffffff8421c9c3>] rds_bind+0x73/0xf0 [ 1317.270230] [<ffffffff83e4ce62>] SYSC_bind+0x92/0xf0 [ 1317.270230] [<ffffffff812493f8>] ? context_tracking_user_exit+0xb8/0x1d0 [ 1317.270230] [<ffffffff8119313d>] ? trace_hardirqs_on+0xd/0x10 [ 1317.270230] [<ffffffff8107a852>] ? syscall_trace_enter+0x32/0x290 [ 1317.270230] [<ffffffff83e4cece>] SyS_bind+0xe/0x10 [ 1317.270230] [<ffffffff843a6ad0>] tracesys+0xdd/0xe2 [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 7 4 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02 [ 1317.270230] RIP [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110 [ 1317.270230] RSP <ffff8803cd31bdf8> [ 1317.270230] CR2: 0000000000000974 Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-27 12:33:58 -05:00
Jesper Dangaard Brouer	b25adce160	ipvs: correct usage/allocation of seqadj ext in ipvs The IPVS FTP helper ip_vs_ftp could trigger an OOPS in nf_ct_seqadj_set, after commit `41d73ec053` (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT). This is because, the seqadj ext is now allocated dynamically, and the IPVS code didn't handle this situation. Fix this in the IPVS nfct code by invoking the alloc function nfct_seqadj_ext_add(). Fixes: `41d73ec053` (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT) Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-12-27 12:30:02 +09:00
Jesper Dangaard Brouer	db12cf2743	netfilter: WARN about wrong usage of sequence number adjustments Since commit `41d73ec053` (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT), the sequence number extension is dynamically allocated. Instead of dying, give a WARN splash, in case of wrong usage of the seqadj code, e.g. when forgetting to allocate via nfct_seqadj_ext_add(). Wrong usage have been seen in the IPVS code path. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-12-27 12:29:54 +09:00
Geert Uytterhoeven	9dcbe1b87c	ipvs: Remove unused variable ret from sync_thread_master() net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master': net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable] Commit `35a2af94c7` ("sched/wait: Make the __wait_event*() interface more friendly") changed how the interruption state is returned. However, sync_thread_master() ignores this state, now causing a compile warning. According to Julian Anastasov <ja@ssi.bg>, this behavior is OK: "Yes, your patch looks ok to me. In the past we used ssleep() but IPVS users were confused why IPVS threads increase the load average. So, we switched to _interruptible calls and later the socket polling was added." Document this, as requested by Peter Zijlstra, to avoid precious developers disappearing in this pitfall in the future. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-12-27 12:19:32 +09:00
Yang Yingliang	2e04ad424b	sch_tbf: add TBF_BURST/TBF_PBURST attribute When we set burst to 1514 with low rate in userspace, the kernel get a value of burst that less than 1514, which doesn't work. Because it may make some loss when transform burst to buffer in userspace. This makes burst lose some bytes, when the kernel transform the buffer back to burst. This patch adds two new attributes to support sending burst/mtu to kernel directly to avoid the loss. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-26 13:54:22 -05:00
wangweidong	4e2d52bfb2	sctp: fix checkpatch errors with //commen Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-26 13:47:48 -05:00
wangweidong	8d72651d86	sctp: fix checkpatch errors with open brace '{' and trailing statements fix checkpatch errors below: ERROR: that open brace { should be on the previous line ERROR: open brace '{' following function declarations go on the next line ERROR: trailing statements should be on next line Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-26 13:47:48 -05:00

... 2 3 4 5 6 ...

31190 Commits