linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Steffen Klassert	3e08f4a72f	ip_tunnel: Fix a memory corruption in ip_tunnel_xmit We might extend the used aera of a skb beyond the total headroom when we install the ipip header. Fix this by calling skb_cow_head() unconditionally. Bug was introduced with commit `c544193214` ("GRE: Refactor GRE tunneling code.") Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-01 12:42:16 -04:00
David S. Miller	e024bdc051	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter/IPVS fixes for your net tree, they are: * Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from Patrick McHardy. * Fix possible weight overflow in lblc and lblcr schedulers due to 32-bits arithmetics, from Simon Kirby. * Fix possible memory access race in the lblc and lblcr schedulers, introduced when it was converted to use RCU, two patches from Julian Anastasov. * Fix hard dependency on CPU 0 when reading per-cpu stats in the rate estimator, from Julian Anastasov. * Fix race that may lead to object use after release, when invoking ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-01 12:39:35 -04:00
Pablo Neira Ayuso	91cb498e6a	netfilter: cttimeout: allow to set/get default protocol timeouts Default timeouts are currently set via proc/sysctl interface, the typical pattern is a file name like: /proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE This results in one entry per default protocol state timeout. This patch simplifies this by allowing to set default protocol timeouts via cttimeout netlink interface. This should allow us to get rid of the existing proc/sysctl code in the midterm. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-01 13:17:39 +02:00
holger@eitzenberger.org	180cf72f56	netfilter: nf_ct_sip: consolidate NAT hook functions There are currently seven different NAT hooks used in both nf_conntrack_sip and nf_nat_sip, each of the hooks is exported in nf_conntrack_sip, then set from the nf_nat_sip NAT helper. And because each of them is exported there is quite some overhead introduced due of this. By introducing nf_nat_sip_hooks I am able to reduce both text/data somewhat. For nf_conntrack_sip e. g. I get text data bss dec old 15243 5256 32 20531 new 15010 5192 32 20234 Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-01 12:47:09 +02:00
Gao feng	afff14f608	netfilter: nfnetlink_log: use proper net to allocate skb Use proper net struct to allocate skb, otherwise netlink mmap will be of no effect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-01 12:46:56 +02:00
Gao feng	7433268783	netfilter: nfnetlink_queue: use proper net namespace to allocate skb Use proper net struct to allocate skb, otherwise netlink mmap will have no effect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-01 12:20:31 +02:00
Salam Noureddine	9260d3e101	ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put It is possible for the timer handlers to run after the call to ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the inet6_dev being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 22:28:58 -07:00
Salam Noureddine	e2401654dd	ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put It is possible for the timer handlers to run after the call to ip_mc_down so use in_dev_put instead of __in_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the in_device being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 22:28:56 -07:00
Hannes Frederic Sowa	3da812d860	ipv6: gre: correct calculation of max_headroom gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header, so initialize max_headroom to zero. Otherwise the if (encap_limit >= 0) { max_headroom += 8; mtu -= 8; } increments an uninitialized variable before max_headroom was reset. Found with coverity: 728539 Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 22:04:09 -07:00
Eric W. Biederman	0bbf87d852	net ipv4: Convert ipv4.ip_local_port_range to be per netns v3 - Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: Samya <samya@twitter.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 21:59:38 -07:00
stephen hemminger	56d7b53f47	ethernet: use likely() for common Ethernet encap Mark code path's likely/unlikely based on most common usage. * Very few devices use dsa tags. * Most traffic is Ethernet (not 802.2) * No sane person uses trailer type or Novell encapsulation Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 21:52:53 -07:00
stephen hemminger	12861b7bc2	ethernet: cleanup eth_type_trans Remove old legacy comment and weird if condition. The comment has outlived it's stay and is throwback to some early net code (before my time). Maybe Dave remembers what it meant. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 21:52:52 -07:00
Eric Dumazet	c9eeec26e3	tcp: TSQ can use a dynamic limit When TCP Small Queues was added, we used a sysctl to limit amount of packets queues on Qdisc/device queues for a given TCP flow. Problem is this limit is either too big for low rates, or too small for high rates. Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO auto sizing, it can better control number of packets in Qdisc/device queues. New limit is two packets or at least 1 to 2 ms worth of packets. Low rates flows benefit from this patch by having even smaller number of packets in queues, allowing for faster recovery, better RTT estimations. High rates flows benefit from this patch by allowing more than 2 packets in flight as we had reports this was a limiting factor to reach line rate. [ In particular if TX completion is delayed because of coalescing parameters ] Example for a single flow on 10Gbp link controlled by FQ/pacing 14 packets in flight instead of 2 $ tc -s -d qd qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0 requeues 6822476) rate 9346Mbit 771713pps backlog `953820b` 14p requeues 6822476 2047 flow, 2046 inactive, 1 throttled, delay 15673 ns 2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit Note that sk_pacing_rate is currently set to twice the actual rate, but this might be refined in the future when a flow is in congestion avoidance. Additional change : skb->destructor should be set to tcp_wfree(). A future patch (for linux 3.13+) might remove tcp_limit_output_bytes Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 20:41:57 -07:00
Li RongQing	fbadadd90c	ipv6: Not need to set fl6.flowi6_flags as zero setting fl6.flowi6_flags as zero after memset is redundant, Remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 19:14:11 -04:00
Eric Dumazet	8d34ce10c5	pkt_sched: fq: qdisc dismantle fixes fq_reset() should drops all packets in queue, including throttled flows. This patch moves code from fq_destroy() to fq_reset() to do the cleaning. fq_change() must stop calling fq_dequeue() if all remaining packets are from throttled flows. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:51:23 -04:00
stephen hemminger	6459082a3c	qdisc: basic classifier - remove unnecessary initialization err is set once, then first code resets it. err = tcf_exts_validate(...) Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Jamal Hadi Salim <hadi@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:47:43 -04:00
stephen hemminger	0c4e4020f0	qdisc: meta return ENOMEM on alloc failure Rather than returning earlier value (EINVAL), return ENOMEM if kzalloc fails. Found while reviewing to find another EINVAL condition. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:47:43 -04:00
Oliver Smith	7c3ad056ef	netfilter: ipset: Add hash:net,port,net module to kernel. This adds a new set that provides similar functionality to ip,port,net but permits arbitrary size subnets for both the first and last parameter. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:42:58 +02:00
Vitaly Lavrov	1785e8f473	netfiler: ipset: Add net namespace for ipset This patch adds netns support for ipset. Major changes were made in ip_set_core.c and ip_set.h. Global variables are moved to per net namespace. Added initialization code and the destruction of the network namespace ipset subsystem. In the prototypes of public functions ip_set_* added parameter "struct net". The remaining corrections related to the change prototypes of public functions ip_set_. The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347 Signed-off-by: Vitaly Lavrov <lve@guap.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:42:52 +02:00
Jozsef Kadlecsik	3fd986b3d9	netfilter: ipset: Use a common function at listing the extensions Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:42:36 +02:00
Jozsef Kadlecsik	8ec81f9a4d	netfilter: ipset: For set:list types, replaced elements must be zeroed out The new extensions require zero initialization for the new element to be added into a slot from where another element was pushed away. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:29 +02:00
Jozsef Kadlecsik	80571a9ea4	netfilter: ipset: Fix hash resizing with comments The destroy function must take into account that resizing doesn't create new extensions so those cannot be destroyed at resize. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:29 +02:00
Oliver Smith	fda75c6d9e	netfilter: ipset: Support comments in hash-type ipsets. This provides kernel support for creating ipsets with comment support. This does incur a penalty to flushing/destroying an ipset since all entries are walked in order to free the allocated strings, this penalty is of course less expensive than the operation of listing an ipset to userspace, so for general-purpose usage the overall impact is expected to be little to none. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:29 +02:00
Oliver Smith	81b10bb4bd	netfilter: ipset: Support comments in the list-type ipset. This provides kernel support for creating list ipsets with the comment annotation extension. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:29 +02:00
Oliver Smith	b90cb8ba19	netfilter: ipset: Support comments in bitmap-type ipsets. This provides kernel support for creating bitmap ipsets with comment support. As is the case for hashes, this incurs a penalty when flushing or destroying the entire ipset as the entries must first be walked in order to free the comment strings. This penalty is of course far less than the cost of listing an ipset to userspace. Any set created without support for comments will be flushed/destroyed as before. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:28 +02:00
Oliver Smith	68b63f08d2	netfilter: ipset: Support comments for ipset entries in the core. This adds the core support for having comments on ipset entries. The comments are stored as standard null-terminated strings in dynamically allocated memory after being passed to the kernel. As a result of this, code has been added to the generic destroy function to iterate all extensions and call that extension's destroy task if the set has that extension activated, and if such a task is defined. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:28 +02:00
Oliver Smith	ea53ac5b63	netfilter: ipset: Add hash:net,net module to kernel. This adds a new set that provides the ability to configure pairs of subnets. A small amount of additional handling code has been added to the generic hash header file - this code is conditionally activated by a preprocessor definition. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:28 +02:00
Jozsef Kadlecsik	d9628bbeca	netfilter: ipset: Kconfig: ipset needs NETFILTER_NETLINK Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:28 +02:00
Jozsef Kadlecsik	b91b396d5e	netfilter: ipset: list:set: make sure all elements are checked by the gc When an element timed out, the next one was skipped by the garbage collector, fixed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	40cd63bf33	netfilter: ipset: Support extensions which need a per data destroy function Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	03c8b234e6	netfilter: ipset: Generalize extensions support Get rid of the structure based extensions and introduce a blob for the extensions. Thus we can support more extension types easily. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	ca134ce864	netfilter: ipset: Move extension data to set structure Default timeout and extension offsets are moved to struct set, because all set types supports all extensions and it makes possible to generalize extension support. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	f925f70569	netfilter: ipset: Rename extension offset ids to extension ids Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik	a04d8b6bd9	netfilter: ipset: Prepare ipset to support multiple networks for hash types In order to support hash:net,net, hash:net,port,net etc. types, arrays are introduced for the book-keeping of existing cidr sizes and network numbers in a set. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:26 +02:00
Jozsef Kadlecsik	5e04c0c38c	netfilter: ipset: Introduce new operation to get both setname and family ip[6]tables set match and SET target need to know the family of the set in order to reject adding rules which refer to a set with a non-mathcing family. Currently such rules are silently accepted and then ignored instead of generating a clear error message to the user, which is not helpful. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:26 +02:00
Jozsef Kadlecsik	bd3129fc5e	netfilter: ipset: order matches and targets separatedly in xt_set.c Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:26 +02:00
Anders K. Pedersen	60b0fe3724	netfilter: ipset: Support package fragments for IPv4 protos without ports Enable ipset port set types to match IPv4 package fragments for protocols that doesn't have ports (or the port information isn't supported by ipset). For example this allows a hash:ip,port ipset containing the entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN tunnels to/from the host. Without this patch only the first package fragment (with fragment offset 0) was matched, while subsequent fragments wasn't. This is not possible for IPv6, where the protocol is in the fragmented part of the package unlike IPv4, where the protocol is in the IP header. IPPROTO_ICMPV6 is deliberately not included, because it isn't relevant for IPv4. Signed-off-by: Anders K. Pedersen <akp@surftown.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:26 +02:00
Jozsef Kadlecsik	20b2fab483	netfilter: ipset: Fix "may be used uninitialized" warnings Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Jozsef Kadlecsik	35b8dcf8c3	netfilter: ipset: Rename simple macro names to avoid namespace issues. Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Jozsef Kadlecsik	a0f28dc754	netfilter: ipset: Fix sparse warnings due to missing rcu annotations Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Jozsef Kadlecsik	b3aabd149c	netfilter: ipset: Sparse warning about shadowed variable fixed net/netfilter/ipset/ip_set_hash_ipportnet.c:275:20: warning: symbol 'cidr' shadows an earlier one Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Jozsef Kadlecsik	122ebbf24c	netfilter: ipset: Don't call ip_nest_end needlessly in the error path Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2013-09-30 21:33:25 +02:00
Eric Dumazet	b86783587b	net: flow_dissector: fix thoff for IPPROTO_AH In commit `8ed781668d` ("flow_keys: include thoff into flow_keys for later usage"), we missed that existing code was using nhoff as a temporary variable that could not always contain transport header offset. This is not a problem for TCP/UDP because port offset (@poff) is 0 for these protocols. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:32:05 -04:00
David S. Miller	7b77d161ce	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Conflicts: include/net/xfrm.h Simple conflict between Joe Perches "extern" removal for function declarations in header files and the changes in Steffen's tree. Steffen Klassert says: ==================== Two patches that are left from the last development cycle. Manual merging of include/net/xfrm.h is needed. The conflict can be solved as it is currently done in linux-next. 1) We announce the creation of temporary acquire state via an asyc event, so the deletion should be annunced too. From Nicolas Dichtel. 2) The VTI tunnels do not real tunning, they just provide a routable IPsec tunnel interface. So introduce and use xfrm_tunnel_notifier instead of xfrm_tunnel for xfrm tunnel mode callback. From Fan Du. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:24:57 -04:00
Nicolas Dichtel	991fb3f74c	dev: always advertise rx_flags changes via netlink When flags IFF_PROMISC and IFF_ALLMULTI are changed, netlink messages are not consistent. For example, if a multicast daemon is running (flag IFF_ALLMULTI set in dev->flags but not dev->gflags, ie not exported to userspace) and then a user sets it via netlink (flag IFF_ALLMULTI set in dev->flags and dev->gflags, ie exported to userspace), no netlink message is sent. Same for IFF_PROMISC and because dev->promiscuity is exported via IFLA_PROMISCUITY, we may send a netlink message after each change of this counter. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:08:13 -04:00
Nicolas Dichtel	a528c219df	dev: update __dev_notify_flags() to send rtnl msg This patch only prepares the next one, there is no functional change. Now, __dev_notify_flags() can also be used to notify flags changes via rtnetlink. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:08:12 -04:00
Paul Marks	c9d55d5bff	ipv6: Fix preferred_lft not updating in some cases Consider the scenario where an IPv6 router is advertising a fixed preferred_lft of 1800 seconds, while the valid_lft begins at 3600 seconds and counts down in realtime. A client should reset its preferred_lft to 1800 every time the RA is received, but a bug is causing Linux to ignore the update. The core problem is here: if (prefered_lft != ifp->prefered_lft) { Note that ifp->prefered_lft is an offset, so it doesn't decrease over time. Thus, the comparison is always (1800 != 1800), which fails to trigger an update. The most direct solution would be to compute a "stored_prefered_lft", and use that value in the comparison. But I think that trying to filter out unnecessary updates here is a premature optimization. In order for the filter to apply, both of these would need to hold: - The advertised valid_lft and preferred_lft are both declining in real time. - No clock skew exists between the router & client. So in this patch, I've set "update_lft = 1" unconditionally, which allows the surrounding code to be greatly simplified. Signed-off-by: Paul Marks <pmarks@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:06:19 -04:00
Pravin B Shelar	d4a71b155c	ip_tunnel: Do not use stale inner_iph pointer. While sending packet skb_cow_head() can change skb header which invalidates inner_iph pointer to skb header. Following patch avoid using it. Found by code inspection. This bug was introduced by commit `0e6fbc5b6c` (ip_tunnels: extend iptunnel_xmit()). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-30 15:05:07 -04:00
Patrick McHardy	f4a87e7bd2	netfilter: synproxy: fix BUG_ON triggered by corrupt TCP packets TCP packets hitting the SYN proxy through the SYNPROXY target are not validated by TCP conntrack. When th->doff is below 5, an underflow happens when calculating the options length, causing skb_header_pointer() to return NULL and triggering the BUG_ON(). Handle this case gracefully by checking for NULL instead of using BUG_ON(). Reported-by: Martin Topholm <mph@one.com> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-30 12:44:38 +02:00
Eric Dumazet	62748f32d5	net: introduce SO_MAX_PACING_RATE As mentioned in commit `afe4fd0624` ("pkt_sched: fq: Fair Queue packet scheduler"), this patch adds a new socket option. SO_MAX_PACING_RATE offers the application the ability to cap the rate computed by transport layer. Value is in bytes per second. u32 val = 1000000; setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val)); To be effectively paced, a flow must use FQ packet scheduler. Note that a packet scheduler takes into account the headers for its computations. The effective payload rate depends on MSS and retransmits if any. I chose to make this pacing rate a SOL_SOCKET option instead of a TCP one because this can be used by other protocols. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-28 15:35:41 -07:00
Francesco Fusco	aa66158145	ipv4: processing ancillary IP_TOS or IP_TTL If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out packets with the specified TTL or TOS overriding the socket values specified with the traditional setsockopt(). The struct inet_cork stores the values of TOS, TTL and priority that are passed through the struct ipcm_cookie. If there are user-specified TOS (tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are used to override the per-socket values. In case of TOS also the priority is changed accordingly. Two helper functions get_rttos and get_rtconn_flags are defined to take into account the presence of a user specified TOS value when computing RT_TOS and RT_CONN_FLAGS. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-28 15:21:52 -07:00
Francesco Fusco	f02db315b8	ipv4: IP_TOS and IP_TTL can be specified as ancillary data This patch enables the IP_TTL and IP_TOS values passed from userspace to be stored in the ipcm_cookie struct. Three fields are added to the struct: - the TTL, expressed as __u8. The allowed values are in the [1-255]. A value of 0 means that the TTL is not specified. - the TOS, expressed as __s16. The allowed values are in the range [0,255]. A value of -1 means that the TOS is not specified. - the priority, expressed as a char and computed when handling the ancillary data. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-28 15:21:51 -07:00
Eric Dumazet	9a3bab6b05	net: net_secret should not depend on TCP A host might need net_secret[] and never open a single socket. Problem added in commit `aebda156a5` ("net: defer net_secret[] initialization") Based on prior patch from Hannes Frederic Sowa. Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@strressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-28 15:19:40 -07:00
Eric W. Biederman	50624c934d	net: Delay default_device_exit_batch until no devices are unregistering v2 There is currently serialization network namespaces exiting and network devices exiting as the final part of netdev_run_todo does not happen under the rtnl_lock. This is compounded by the fact that the only list of devices unregistering in netdev_run_todo is local to the netdev_run_todo. This lack of serialization in extreme cases results in network devices unregistering in netdev_run_todo after the loopback device of their network namespace has been freed (making dst_ifdown unsafe), and after the their network namespace has exited (making the NETDEV_UNREGISTER, and NETDEV_UNREGISTER_FINAL callbacks unsafe). Add the missing serialization by a per network namespace count of how many network devices are unregistering and having a wait queue that is woken up whenever the count is decreased. The count and wait queue allow default_device_exit_batch to wait until all of the unregistration activity for a network namespace has finished before proceeding to unregister the loopback device and then allowing the network namespace to exit. Only a single global wait queue is used because there is a single global lock, and there is a single waiter, per network namespace wait queues would be a waste of resources. The per network namespace count of unregistering devices gives a progress guarantee because the number of network devices unregistering in an exiting network namespace must ultimately drop to zero (assuming network device unregistration completes). The basic logic remains the same as in v1. This patch is now half comment and half rtnl_lock_unregistering an expanded version of wait_event performs no extra work in the common case where no network devices are unregistering when we get to default_device_exit_batch. Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-28 15:09:15 -07:00
$Catalin$ux$ M. BOIE$ Catalin$ux$ M. BOIE	7df37ff33d	IPv6 NAT: Do not drop DNATed 6to4/6rd packets When a router is doing DNAT for 6to4/6rd packets the latest anti-spoofing commit `218774dc` ("ipv6: add anti-spoofing checks for 6to4 and 6rd") will drop them because the IPv6 address embedded does not match the IPv4 destination. This patch will allow them to pass by testing if we have an address that matches on 6to4/6rd interface. I have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR. Also, log the dropped packets (with rate limit). Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-28 15:56:15 -04:00
Hannes Frederic Sowa	0a67d3efa4	ipv6: compare sernum when walking fib for /proc/net/ipv6_route as safety net This patch provides an additional safety net against NULL pointer dereferences while walking the fib trie for the new /proc/net/ipv6_route walkers. I never needed it myself and am unsure if it is needed at all, but the same checks where introduced in `2bec5a369e` ("ipv6: fib: fix crash when changing large fib while dumping it") to fix NULL pointer bugs. This patch is separated from the first patch to make it easier to revert if we are sure we can drop this logic. Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-27 17:32:17 -04:00
Hannes Frederic Sowa	8d2ca1d7b5	ipv6: avoid high order memory allocations for /proc/net/ipv6_route Dumping routes on a system with lots rt6_infos in the fibs causes up to 11-order allocations in seq_file (which fail). While we could switch there to vmalloc we could just implement the streaming interface for /proc/net/ipv6_route. This patch switches /proc/net/ipv6_route from single_open_net to seq_open_net. loff_t *pos tracks dst entries. Also kill never used struct rt6_proc_arg and now unused function fib6_clean_all_ro. Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-27 17:32:16 -04:00
John W. Linville	0a878747e1	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem Also fixed-up a badly indented closing brace... Signed-off-by: John W. Linville <linville@tuxdriver.com>	2013-09-27 13:11:17 -04:00
Gustavo Padovan	1025c04cec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Conflicts: net/bluetooth/hci_core.c	2013-09-27 11:56:14 -03:00
Gao feng	7722e0d1c0	netfilter: xt_TCPMSS: lookup route from proper net namespace Otherwise the pmtu will be incorrect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-27 16:18:23 +02:00
Gao feng	de1389b116	netfilter: xt_TCPMSS: Get mtu only if clamp-mss-to-pmtu is specified This patch refactors the code to skip tcpmss_reverse_mtu if no clamp-mss-to-pmtu is specified. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-27 16:17:59 +02:00
holger@eitzenberger.org	b21613aeb6	netfilter: nf_ct_sip: extend RCU read lock in set_expected_rtp_rtcp() Currently set_expected_rtp_rtcp() in the SIP helper uses rcu_dereference() two times to access two different NAT hook functions. However, only the first one is protected by the RCU reader lock, but the 2nd isn't. Fix it by extending the RCU protected area. This is more a cosmetic thing since we rely on all netfilter hooks being rcu_read_lock()ed by nf_hook_slow() in many places anyways, as Patrick McHardy clarified. Signed-off-by: Holger Eitzenberger <holger.eitzenberger@sophos.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-09-27 16:17:47 +02:00
Veaceslav Falico	5831d66e80	net: create sysfs symlinks for neighbour devices Also, remove the same functionality from bonding - it will be already done for any device that links to its lower/upper neighbour. The links will be created for dev's kobject, and will look like lower_eth0 for lower device eth0 and upper_bridge0 for upper device bridge0. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:08 -04:00
Veaceslav Falico	842d67a7b3	net: expose the master link to sysfs, and remove it from bond Currently, we can have only one master upper neighbour, so it would be useful to create a symlink to it in the sysfs device directory, the way that bonding now does it, for every device. Lower devices from bridge/team/etc will automagically get it, so we could rely on it. Also, remove the same functionality from bonding. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:08 -04:00
Veaceslav Falico	47701a36a3	vlan: unlink the upper neighbour before unregistering On netdev unregister we're removing also all of its sysfs-associated stuff, including the sysfs symlinks that are controlled by netdev neighbour code. Also, it's a subtle race condition - cause we can still access it after unregistering. Move the unlinking right before the unregistering to fix both. CC: Patrick McHardy <kaber@trash.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:07 -04:00
Veaceslav Falico	5df27e6cb2	vlan: link the upper neighbour only after registering Otherwise users might access it without being fully registered, as per sysfs - it only inits in register_netdevice(), so is unusable till it is called. CC: Patrick McHardy <kaber@trash.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:07 -04:00
Veaceslav Falico	b6ccba4c68	net: add a possibility to get private from netdev_adjacent->list It will be useful to get first/last element. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:06 -04:00
Veaceslav Falico	31088a113c	net: add for_each iterators through neighbour lower link's private Add a possibility to iterate through netdev_adjacent's private, currently only for lower neighbours. Add both RCU and RTNL/other locking variants of iterators, and make the non-rcu variant to be safe from removal. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:04 -04:00
Veaceslav Falico	402dae9614	net: add netdev_adjacent->private and allow to use it Currently, even though we can access any linked device, we can't attach anything to it, which is vital to properly manage them. To fix this, add a new void *private to netdev_adjacent and functions setting/getting it (per link), so that we can save, per example, bonding's slave structures there, per slave device. netdev_master_upper_dev_link_private(dev, upper_dev, private) links dev to upper dev and populates the neighbour link only with private. netdev_lower_dev_get_private{,_rcu}() returns the private, if found. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:04 -04:00
Veaceslav Falico	5249dec738	net: add RCU variant to search for netdev_adjacent link Currently we have only the RTNL flavour, however we can traverse it while holding only RCU, so add the RCU search. Add an RCU variant that uses list_head * as an argument, so that it can be universally used afterwards. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:04 -04:00
Veaceslav Falico	2f268f129c	net: add adj_list to save only neighbours Currently, we distinguish neighbours (first-level linked devices) from non-neighbours by the neighbour bool in the netdev_adjacent. This could be quite time-consuming in case we would like to traverse only through neighbours - cause we'd have to traverse through all devices and check for this flag, and in a (quite common) scenario where we have lots of vlans on top of bridge, which is on top of a bond - the bonding would have to go through all those vlans to get its upper neighbour linked devices. This situation is really unpleasant, cause there are already a lot of cases when a device with slaves needs to go through them in hot path. To fix this, introduce a new upper/lower device lists structure - adj_list, which contains only the neighbours. It works always in pair with the all_adj_list structure (renamed from upper/lower_dev_list), i.e. both of them contain the same links, only that all_adj_list contains also non-neighbour device links. It's really a small change visible, currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't change the main linked logic at all. Also, add some comments a fix a name collision in netdev_for_each_upper_dev_rcu() and rework the naming by the following rules: netdev_(all_)(upper\|lower)_* If "all_" is present, then we work with the whole list of upper/lower devices, otherwise - only with direct neighbours. Uninline functions - to get better stack traces. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:04 -04:00
Veaceslav Falico	7863c054d1	net: use lists as arguments instead of bool upper Currently we make use of bool upper when we want to specify if we want to work with upper/lower list. It's, however, harder to read, debug and occupies a lot more code. Fix this by just passing the correct upper/lower_dev_list list_head pointer instead of bool upper, and work internally with it. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 16:02:03 -04:00
John W. Linville	7c6a4acc64	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	2013-09-26 13:47:05 -04:00
Hannes Frederic Sowa	4ed377e36e	net: neighbour: use source address of last enqueued packet for solicitation Currently we always use the first member of the arp_queue to determine the sender ip address of the arp packet (or in case of IPv6 - source address of the ndisc packet). This skb is fixed as long as the queue is not drained by a complete purge because of a timeout or by a successful response. If the first packet enqueued on the arp_queue is from a local application with a manually set source address and the to be discovered system does some kind of uRPF checks on the source address in the arp packet the resolving process hangs until a timeout and restarts. This hurts communication with the participating network node. This could be mitigated a bit if we use the latest enqueued skb's source address for the resolving process, which is not as static as the arp_queue's head. This change of the source address could result in better recovery of a failed solicitation. Cc: "David S. Miller" <davem@davemloft.net> Cc: Julian Anastasov <ja@ssi.bg> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-26 13:46:10 -04:00
Johan Hedberg	4375f1037d	Bluetooth: Add new mgmt_set_advertising command This patch adds a new mgmt command for enabling and disabling LE advertising. The command depends on the LE setting being enabled first and will return a "rejected" response otherwise. The patch also adds safeguards so that there will ever only be one set_le or set_advertising command pending per adapter. The response handling and new_settings event sending is done in an asynchronous request callback, meaning raw HCI access from user space to enable advertising (e.g. hciconfig leadv) will not trigger the new_settings event. This is intentional since trying to support mixed raw HCI and mgmt access would mean adding extra state tracking or new helper functions, essentially negating the benefit of using the asynchronous request framework. The HCI_LE_ENABLED and HCI_LE_PERIPHERAL flags however are updated correctly even with raw HCI access so this will not completely break subsequent access over mgmt. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	eeca6f8913	Bluetooth: Add new mgmt setting for LE advertising This patch adds a new mgmt setting for LE advertising and hooks up the necessary places in the mgmt code to operate on the HCI_LE_PERIPHERAL flag (which corresponds to this setting). This patch does not yet add any new command for enabling the setting - that is left for a subsequent patch. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	416a4ae56b	Bluetooth: Use async request for LE enable/disable This patch updates the code to use an asynchronous request for handling the enabling and disabling of LE support. This refactoring is necessary as a preparation for adding advertising support, since when LE is disabled we should also disable advertising, and the cleanest way to do this is to perform the two respective HCI commands in the same asynchronous request. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:11 -03:00
Johan Hedberg	bd99abdd5b	Bluetooth: Move mgmt response convenience functions to a better location The settings_rsp and cmd_status_rsp functions can be useful for all mgmt command handlers when asynchronous request callbacks are used. They will e.g. be used by subsequent patches to change set_le to use an async request as well as a new set_advertising command. Therefore, move them higher up in the mgmt.c file to avoid unnecessary forward declarations or mixing this trivial change with other patches. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	87b95ba64e	Bluetooth: Fix busy return for mgmt_set_powered in some cases We should return a "busy" error always when there is another mgmt_set_powered operation in progress. Previously when powering on while the auto off timer was still set the code could have let two or more pending power on commands to be queued. This patch fixes the issue by moving the check for duplicate commands to an earlier point in the set_powered handler. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	970871bc9c	Bluetooth: Clean up socket locking in l2cap_sock_recvmsg This patch cleans up the locking login in l2cap_sock_recvmsg by pairing up each lock_sock call with a release_sock call. The function already has a "done" label that handles releasing the socket and returning from the function so the fix is rather simple. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Johan Hedberg	0fba96f97b	Bluetooth: Add clarifying comment to bt_sock_wait_state() The bt_sock_wait_state requires the sk lock to be held (through lock_sock) so document it clearly in the code. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-25 14:30:10 -03:00
Hannes Frederic Sowa	2811ebac25	ipv6: udp packets following an UFO enqueued packet need also be handled by UFO In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-24 11:43:05 -04:00
Cong Wang	8ce4406103	ipv6: do not allow ipv6 module to be removed There was some bug report on ipv6 module removal path before. Also, as Stephen pointed out, after vxlan module gets ipv6 support, the ipv6 stub it used is not safe against this module removal either. So, let's just remove inet6_exit() so that ipv6 module will not be able to be unloaded. Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-24 11:31:58 -04:00
Eric Dumazet	b0983d3c9b	tcp: fix dynamic right sizing Dynamic Right Sizing (DRS) is supposed to open TCP receive window automatically, but suffers from two bugs, presented by order of importance. 1) tcp_rcv_space_adjust() fix : Using twice the last received amount is very pessimistic, because it doesn't allow fast recovery or proper slow start ramp up, if sender wants to increase cwin by 100% every RTT. copied = bytes received in previous RTT 2copied = bytes we expect to receive in next RTT 4copied = bytes we need to advertise in rwin at end of next RTT DRS is one RTT late, it needs a 4x factor. If sender is not using ABC, and increases cwin by 50% every rtt, then we needed 1.5*1.5 = 2.25 factor. This is probably why this bug was not really noticed. 2) There is no window adjustment after first RTT. DRS triggers only after the second RTT. DRS needs two RTT to initialize, so tcp_fixup_rcvbuf() should setup sk_rcvbuf to allow proper window grow for first two RTT. This patch increases TCP efficiency particularly for large RTT flows when autotuning is used at the receiver, and more particularly in presence of packet losses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-24 11:07:32 -04:00
Florian Westphal	086293542b	tcp: syncookies: reduce mss table to four values Halve mss table size to make blind cookie guessing more difficult. This is sad since the tables were already small, but there is little alternative except perhaps adding more precise mss information in the tcp timestamp. Timestamps are unfortunately not ubiquitous. Guessing all possible cookie values still has 8-in 2**32 chance. Reported-by: Jakob Lell <jakob@jakoblell.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-24 10:39:58 -04:00
Florian Westphal	8c27bd75f0	tcp: syncookies: reduce cookie lifetime to 128 seconds We currently accept cookies that were created less than 4 minutes ago (ie, cookies with counter delta 0-3). Combined with the 8 mss table values, this yields 32 possible values (out of 2**32) that will be valid. Reducing the lifetime to < 2 minutes halves the guessing chance while still providing a large enough period. While at it, get rid of jiffies value -- they overflow too quickly on 32 bit platforms. getnstimeofday is used to create a counter that increments every 64s. perf shows getnstimeofday cost is negible compared to sha_transform; normal tcp initial sequence number generation uses getnstimeofday, too. Reported-by: Jakob Lell <jakob@jakoblell.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-24 10:39:58 -04:00
Duan Jiong	8d65b1190d	net: raw: do not report ICMP redirects to user space Redirect isn't an error condition, it should leave the error handler without touching the socket. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-24 10:15:49 -04:00
Duan Jiong	1a462d1892	net: udp: do not report ICMP redirects to user space Redirect isn't an error condition, it should leave the error handler without touching the socket. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-24 10:15:49 -04:00
Noel Burton-Krahn	9fe34f5d92	mrp: add periodictimer to allow retries when packets get lost MRP doesn't implement the periodictimer in 802.1Q, so it never retries if packets get lost. I ran into this problem when MRP sent a MVRP JoinIn before the interface was fully up. The JoinIn was lost, MRP didn't retry, and MVRP registration failed. Tested against Juniper QFabric switches Signed-off-by: Noel Burton-Krahn <noel@burton-krahn.com> Acked-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-23 16:53:52 -04:00
josselin.costanzi@mobile-devices.fr	a224bd36bf	net/lapb: re-send packets on timeout Actually re-send packets when the T1 timer runs out. This fixes a bug where packets are waiting on the write queue until disconnection when no other traffic is outstanding. Signed-off-by: Josselin Costanzi <josselin.costanzi@mobile-devices.fr> Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-23 16:52:45 -04:00
Peter Senna Tschudin	941247f910	Bluetooth: Fix assignment of 0/1 to bool variables Convert 0 to false and 1 to true when assigning values to bool variables. Inspired by commit `3db1cd5c05`. The simplified semantic patch that find this problem is as follows (http://coccinelle.lip6.fr/): @@ bool b; @@ ( -b = 0 +b = false \| -b = 1 +b = true ) Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-22 16:59:14 -05:00
Linus Torvalds	c43a3855f4	NFS client bugfix for 3.12 - Fix a regression due to incorrect sharing of gss auth caches -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSPev8AAoJEGcL54qWCgDy+2UP/3ZyJbL8qIpbgdOdlCIFYHZ7 +8Z87XH31i+9hzKejSnmTURdRgWl7f0ehEOMG4soPF/4vpc1Ji03Xo0Iunq6AR3R 0JiKSPHJR+j1IiTdQR+HL127+ymUEECsWKubm0ZYOgphxhGy2e95sbU3C7w3wRry kIul7+oVdmp6bXUbKpGqRx3SiT5H0YunGO0dBD7SWHJP4cQIPVNKd1ErRF9EAMqc MhOTdy04hoYbL4AdZH95MGW8/l6t+djO8DRwI89Gfw1g2ybqZjbU72Ur+FotU09H dQmyFuoiyRazO+VEInfvngYdtZ3w3ZfBqxQdq7rEhbrDnH0tLi+e49VjUFeNavr+ c+xeC169hzggplARaeCtMQkvulV5ucI6pQJyVZiOqiIpiXJmwAlhZzkuYmqdfISx uZy43dgD64APoOeDcGmvqCnPfhl2gtGfEO36hGZVru4sZ5YaomgqA9gFUtkNnP3S YUQovg/g8zCZ4F35AHFvYaRmbBUbqjhzEIamX4gAotd71+zFevrX6R00oFWluhJD 8WUdp4GzfSlIE9Ns6Ef9jIEba8M/1l9Vwzaa3MpbdtaDRSBVNhuz7Tm+3K/KeE8e QL8zqyQbgVihPQyBpMl1ukxnTwrTvEkVDPQtsP+5J9ArasYjU5kTgjVAzmeTgOdR 3YxnAh7LMTKAoyItXta9 =44+y -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfix from Trond Myklebust: "Fix a regression due to incorrect sharing of gss auth caches" * tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: RPCSEC_GSS: fix crash on destroying gss auth	2013-09-21 15:59:41 -07:00
Gianluca Anzolin	29cd718beb	Bluetooth: don't release the port in rfcomm_dev_state_change() When the dlc is closed, rfcomm_dev_state_change() tries to release the port in the case it cannot get a reference to the tty. However this is racy and not even needed. Infact as Peter Hurley points out: 1. Only consider dlcs that are 'stolen' from a connected socket, ie. reused. Allocated dlcs cannot have been closed prior to port activate and so for these dlcs a tty reference will always be avail in rfcomm_dev_state_change() -- except for the conditions covered by #2b below. 2. If a tty was at some point previously created for this rfcomm, then either (a) the tty reference is still avail, so rfcomm_dev_state_change() will perform a hangup. So nothing to do, or, (b) the tty reference is no longer avail, and the tty_port will be destroyed by the last tty_port_put() in rfcomm_tty_cleanup. Again, no action required. 3. Prior to obtaining the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will not 'see' a rfcomm_dev so nothing to do here. 4. After releasing the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will 'see' an incomplete rfcomm_dev if a tty reference could not be obtained. Again, the best thing to do here is nothing. Any future attempted open() will block on rfcomm_dev_carrier_raised(). The unconnected device will exist until released by ioctl(RFCOMMRELEASEDEV). The patch removes the aforementioned code and uses the tty_port_tty_hangup() helper to hangup the tty. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2013-09-20 14:17:54 -05:00
Eric Dumazet	df62cdf348	net_sched: htb: support of 64bit rates HTB already can deal with 64bit rates, we only have to add two new attributes so that tc can use them to break the current 32bit ABI barrier. TCA_HTB_RATE64 : class rate (in bytes per second) TCA_HTB_CEIL64 : class ceil (in bytes per second) This allows us to setup HTB on 40Gbps links, as 32bit limit is actually ~34Gbps Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-20 14:41:03 -04:00
Eric Dumazet	3e1e3aae1f	net_sched: add u64 rate to psched_ratecfg_precompute() Add an extra u64 rate parameter to psched_ratecfg_precompute() so that some qdisc can opt-in for 64bit rates in the future, to overcome the ~34 Gbits limit. psched_ratecfg_getrate() reports a legacy structure to tc utility, so if actual rate is above the 32bit rate field, cap it to the 34Gbit limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-20 14:41:02 -04:00
Avinash Kumar	118a7b0ede	net: ethernet: eth.c: removed checkpatch warnings and errors removed these checkpatch.pl warnings: net/ethernet/eth.c:61: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> net/ethernet/eth.c:136: WARNING: Prefer netdev_dbg(netdev, ... then dev_dbg(dev, ... then pr_debug(... to printk(KERN_DEBUG ... net/ethernet/eth.c:181: ERROR: space prohibited before that close parenthesis ')' Signed-off-by: Avinash Kumar <avi.kp.137@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-20 14:41:02 -04:00
Linus Torvalds	b75ff5e84b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) If the local_df boolean is set on an SKB we have to allocate a unique ID even if IP_DF is set in the ipv4 headers, from Ansis Atteka. 2) Some fixups for the new chipset support that went into the sfc driver, from Ben Hutchings. 3) Because SCTP bypasses a good chunk of, and actually duplicates, the logic of the ipv6 output path, some IPSEC things don't get done properly. Integrate SCTP better into the ipv6 output path so that these problems are fixed and such issues don't get missed in the future either. From Daniel Borkmann. 4) Fix skge regressions added by the DMA mapping error return checking added in v3.10, from Mikulas Patocka. 5) Kill some more IRQF_DISABLED references, from Michael Opdenacker. 6) Fix races and deadlocks in the bridging code, from Hong Zhiguo. 7) Fix error handling in tun_set_iff(), in particular don't leak resources. From Jason Wang. 8) Prevent format-string injection into xen-netback driver, from Kees Cook. 9) Fix regression added to netpoll ARP packet handling, in particular check for the right ETH_P_ARP protocol code. From Sonic Zhang. 10) Try to deal with AMD IOMMU errors when using r8169 chips, from Francois Romieu. 11) Cure freezes due to recent changes in the rt2x00 wireless driver, from Stanislaw Gruszka. 12) Don't do SPI transfers (which can sleep) in interrupt context in cw1200 driver, from Solomon Peachy. 13) Fix LEDs handling bug in 5720 tg3 chips already handled for 5719. From Nithin Sujir. 14) Make xen_netbk_count_skb_slots() count the actual number of slots that will be used, taking into consideration packing and other issues that the transmit path will run into. From David Vrabel. 15) Use the correct maximum age when calculating the bridge message_age_timer, from Chris Healy. 16) Get rid of memory leaks in mcs7780 IRDA driver, from Alexey Khoroshilov. 17) Netfilter conntrack extensions were converted to RCU but are not always freed properly using kfree_rcu(). Fix from Michal Kubecek. 18) VF reset recovery not being done correctly in qlcnic driver, from Manish Chopra. 19) Fix inverted test in ATM nicstar driver, from Andy Shevchenko. 20) Missing workqueue destroy in cxgb4 error handling, from Wei Yang. 21) Internal switch not initialized properly in bgmac driver, from Rafał Miłecki. 22) Netlink messages report wrong local and remote addresses in IPv6 tunneling, from Ding Zhi. 23) ICMP redirects should not generate socket errors in DCCP and SCTP. We're still working out how this should be handled for RAW and UDP sockets. From Daniel Borkmann and Duan Jiong. 24) We've had several bugs wherein the network namespace's loopback device gets accessed after it is free'd, NULL it out so that we can catch these problems more readily. From Eric W Biederman. 25) Fix regression in TCP RTO calculations, from Neal Cardwell. 26) Fix too early free of xen-netback network device when VIFs still exist. From Paul Durrant. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) netconsole: fix a deadlock with rtnl and netconsole's mutex netpoll: fix NULL pointer dereference in netpoll_cleanup skge: fix broken driver ip: generate unique IP identificator if local fragmentation is allowed ip: use ip_hdr() in __ip_make_skb() to retrieve IP header xen-netback: Don't destroy the netdev until the vif is shut down net:dccp: do not report ICMP redirects to user space cnic: Fix crash in cnic_bnx2x_service_kcq() bnx2x, cnic, bnx2i, bnx2fc: Fix bnx2i and bnx2fc regressions. vxlan: Avoid creating fdb entry with NULL destination tcp: fix RTO calculated from cached RTT drivers: net: phy: cicada.c: clears warning Use #include <linux/io.h> instead of <asm/io.h> net loopback: Set loopback_dev to NULL when freed batman-adv: set the TAG flag for the vid passed to BLA netfilter: nfnetlink_queue: use network skb for sequence adjustment net: sctp: rfc4443: do not report ICMP redirects to user space net: usb: cdc_ether: use usb.h macros whenever possible net: usb: cdc_ether: fix checkpatch errors and warnings net: usb: cdc_ether: Use wwan interface for Telit modules ip6_tunnels: raddr and laddr are inverted in nl msg ...	2013-09-19 13:57:28 -05:00
Nikolay Aleksandrov	d0fe8c888b	netpoll: fix NULL pointer dereference in netpoll_cleanup I've been hitting a NULL ptr deref while using netconsole because the np->dev check and the pointer manipulation in netpoll_cleanup are done without rtnl and the following sequence happens when having a netconsole over a vlan and we remove the vlan while disabling the netconsole: CPU 1 CPU2 removes vlan and calls the notifier enters store_enabled(), calls netdev_cleanup which checks np->dev and then waits for rtnl executes the netconsole netdev release notifier making np->dev == NULL and releases rtnl continues to dereference a member of np->dev which at this point is == NULL Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-19 14:15:53 -04:00
Ansis Atteka	703133de33	ip: generate unique IP identificator if local fragmentation is allowed If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-19 14:11:15 -04:00
Ansis Atteka	749154aa56	ip: use ip_hdr() in __ip_make_skb() to retrieve IP header skb->data already points to IP header, but for the sake of consistency we can also use ip_hdr() to retrieve it. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-19 14:11:15 -04:00

1 2 3 4 5 ...

29590 Commits