linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-02 06:36:43 +07:00

Author	SHA1	Message	Date
Johannes Berg	963a1852fb	mac80211: don't validate unchanged AP bandwidth while tracking The MLME code in mac80211 must track whether or not the AP changed bandwidth, but if there's no change while tracking it shouldn't do anything, otherwise regulatory updates can make it impossible to connect to certain APs if the regulatory database doesn't match the information from the AP. See the precise scenario described in the code. This still leaves some possible problems with CSA or if the AP actually changed bandwidth, but those cases are less common and won't completely prevent using it. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=70881 Cc: stable@vger.kernel.org Reported-and-tested-by: Nate Carlson <kernel@natecarlson.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-24 10:16:40 +01:00
Amitkumar Karwar	44a589ca2d	NFC: NCI: Fix NULL pointer dereference The check should be for setup function pointer. This patch fixes NULL pointer dereference issue for NCI based NFC driver which doesn't define setup handler. Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2014-02-23 23:14:45 +01:00
Hannes Frederic Sowa	916e4cf46d	ipv6: reuse ip6_frag_id from ip6_ufo_append_data Currently we generate a new fragmentation id on UFO segmentation. It is pretty hairy to identify the correct net namespace and dst there. Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst available at all. This causes unreliable or very predictable ipv6 fragmentation id generation while segmentation. Luckily we already have pregenerated the ip6_frag_id in ip6_ufo_append_data and can use it here. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:28:21 -05:00
Daniel Borkmann	4c47af4d5e	net: sctp: rework multihoming retransmission path selection to rfc4960 Problem statement: 1) both paths (primary path1 and alternate path2) are up after the association has been established i.e., HB packets are normally exchanged, 2) path2 gets inactive after path_max_retrans * max_rto timed out (i.e. path2 is down completely), 3) now, if a transmission times out on the only surviving/active path1 (any ~1sec network service impact could cause this like a channel bonding failover), then the retransmitted packets are sent over the inactive path2; this happens with partial failover and without it. Besides not being optimal in the above scenario, a small failure or timeout in the only existing path has the potential to cause long delays in the retransmission (depending on RTO_MAX) until the still active path is reselected. Further, when the T3-timeout occurs, we have active_patch == retrans_path, and even though the timeout occurred on the initial transmission of data, not a retransmit, we end up updating retransmit path. RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under 6.4.1. "Failover from an Inactive Destination Address" the following: Some of the transport addresses of a multi-homed SCTP endpoint may become inactive due to either the occurrence of certain error conditions (see Section 8.2) or adjustments from the SCTP user. When there is outbound data to send and the primary path becomes inactive (e.g., due to failures), or where the SCTP user explicitly requests to send data to an inactive destination transport address, before reporting an error to its ULP, the SCTP endpoint should try to send the data to an alternate __active__ destination transport address if one exists. When retransmitting data that timed out, if the endpoint is multihomed, it should consider each source-destination address pair in its retransmission selection policy. When retransmitting timed-out data, the endpoint should attempt to pick the most divergent source-destination pair from the original source-destination pair to which the packet was transmitted. Note: Rules for picking the most divergent source-destination pair are an implementation decision and are not specified within this document. So, we should first reconsider to take the current active retransmission transport if we cannot find an alternative active one. If all of that fails, we can still round robin through unkown, partial failover, and inactive ones in the hope to find something still suitable. Commit `4141ddc02a` ("sctp: retran_path update bug fix") broke that behaviour by selecting the next inactive transport when no other active transport was found besides the current assoc's peer.retran_path. Before commit `4141ddc02a`, we would have traversed through the list until we reach our peer.retran_path again, and in case that is still in state SCTP_ACTIVE, we would take it and return. Only if that is not the case either, we take the next inactive transport. Besides all that, another issue is that transports in state SCTP_UNKNOWN could be preferred over transports in state SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after SCTP_UNKNOWN in the transport list yielding a weaker transport state to be used in retransmission. This patch mostly reverts `4141ddc02a`, but also rewrites this function to introduce more clarity and strictness into the code. A strict priority of transport states is enforced in this patch, hence selection is active > unkown > partial failover > inactive. Fixes: `4141ddc02a` ("sctp: retran_path update bug fix") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Acked-by: Vlad Yasevich <yasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:26:05 -05:00
Jiri Pirko	b194c1f1db	neigh: fix setting of default gc_* values This patch fixes bug introduced by: commit `1d4c8c2984` "neigh: restore old behaviour of default parms values" The thing is that in neigh_sysctl_register, extra1 and extra2 which were previously set for NEIGH_VAR_GC_* are overwritten. That leads to nonsense int limits for gc_* variables. So fix this by not touching extra* fields for gc_* variables. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:08:10 -05:00
Eric Dumazet	f5ddcbbb40	net-tcp: fastopen: fix high order allocations This patch fixes two bugs in fastopen : 1) The tcp_sendmsg(..., @size) argument was ignored. Code was relying on user not fooling the kernel with iovec mismatches 2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5 allocations, which are likely to fail when memory gets fragmented. Fixes: `783237e8da` ("net-tcp: Fast Open client - sending SYN-data") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Tested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:05:21 -05:00
Ying Xue	970122fdf4	tipc: make bearer set up in module insertion stage Accidentally a side effect is involved by commit 6e967adf7(tipc: relocate common functions from media to bearer). Now tipc stack handler of receiving packets from netdevices as well as netdevice notification handler are registered when bearer is enabled rather than tipc module initialization stage, but the two handlers are both unregistered in tipc module exit phase. If tipc module is inserted and then immediately removed, the following warning message will appear: "dev_remove_pack: ffffffffa0380940 not found" This is because in module insertion stage tipc stack packet handler is not registered at all, but in module exit phase dev_remove_pack() needs to remove it. Of course, dev_remove_pack() cannot find tipc protocol handler from the kernel protocol handler list so that the warning message is printed out. But if registering the two handlers is adjusted from enabling bearer phase into inserting module stage, the warning message will be eliminated. Due to this change, tipc_core_start_net() and tipc_core_stop_net() can be deleted as well. Reported-by: Wang Weidong <wangweidong1@huawei.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:00:15 -05:00
Ying Xue	9fe7ed4749	tipc: remove all enabled flags from all tipc components When tipc module is inserted, many tipc components are initialized one by one. During the initialization period, if one of them is failed, tipc_core_stop() will be called to stop all components whatever corresponding components are created or not. To avoid to release uncreated ones, relevant components have to add necessary enabled flags indicating whether they are created or not. But in the initialization stage, if one component is unsuccessfully created, we will just destroy successfully created components before the failed component instead of all components. All enabled flags defined in components, in turn, become redundant. Additionally it's also unnecessary to identify whether table.types is NULL in tipc_nametbl_stop() because name stable has been definitely created successfully when tipc_nametbl_stop() is called. Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:00:15 -05:00
Matija Glavinic Pecotic	7cce3b7568	net: sctp: Potentially-Failed state should not be reached from unconfirmed state In current implementation it is possible to reach PF state from unconfirmed. We can interpret sctp-failover-02 in a way that PF state is meant to be reached only from active state, in the end, this is when entering PF state makes sense. Here are few quotes from sctp-failover-02, but regardless of these, same understanding can be reached from whole section 5: Section 5.1, quickfailover guide: "The PF state is an intermediate state between Active and Failed states." "Each time the T3-rtx timer expires on an active or idle destination, the error counter of that destination address will be incremented. When the value in the error counter exceeds PFMR, the endpoint should mark the destination transport address as PF." There are several concrete reasons for such interpretation. For start, rfc4960 does not take into concern quickfailover algorithm. Therefore, quickfailover must comply to 4960. Point where this compliance can be argued is following behavior: When PF is entered, association overall error counter is incremented for each missed HB. This is contradictory to rfc4960, as address, while in unconfirmed state, is subjected to probing, and while it is probed, it should not increment association overall error counter. This has as a consequence that we might end up in situation in which we drop association due path failure on unconfirmed address, in case we have wrong configuration in a way: Association.Max.Retrans == Path.Max.Retrans. Another reason is that entering PF from unconfirmed will cause a loss of address confirmed event when address is once (if) confirmed. This is fine from failover guide point of view, but it is not consistent with behavior preceding failover implementation and recommendation from 4960: 5.4. Path Verification Whenever a path is confirmed, an indication MAY be given to the upper layer. Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-20 13:24:56 -05:00
Nicolas Dichtel	cf71d2bc0b	sit: fix panic with route cache in ip tunnels Bug introduced by commit `7d442fab0a` ("ipv4: Cache dst in tunnels"). Because sit code does not call ip_tunnel_init(), the dst_cache was not initialized. CC: Tom Herbert <therbert@google.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-20 13:13:50 -05:00
Steffen Klassert	ee5c23176f	xfrm: Clone states properly on migration We loose a lot of information of the original state if we clone it with xfrm_state_clone(). In particular, there is no crypto algorithm attached if the original state uses an aead algorithm. This patch add the missing information to the clone state. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-20 14:30:10 +01:00
Steffen Klassert	8c0cba22e1	xfrm: Take xfrm_state_lock in xfrm_migrate_state_find A comment on xfrm_migrate_state_find() says that xfrm_state_lock is held. This is apparently not the case, but we need it to traverse through the state lists. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-20 14:30:04 +01:00
Steffen Klassert	35ea790d78	xfrm: Fix NULL pointer dereference on sub policy usage xfrm_state_sort() takes the unsorted states from the src array and stores them into the dst array. We try to get the namespace from the dst array which is empty at this time, so take the namespace from the src array instead. Fixes: `283bc9f35b` ("xfrm: Namespacify xfrm state/policy locks") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-20 14:29:58 +01:00
Steffen Klassert	876fc03aaa	ip6_vti: Fix build when NET_IP_TUNNEL is not set. Since commit `469bdcefdc` ip6_vti uses ip_tunnel_get_stats64(), so we need to select NET_IP_TUNNEL to have this function available. Fixes: `469bdcefdc` ("ipv6: fix the use of pcpu_tstats in ip6_vti.c") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2014-02-20 14:29:49 +01:00
Johannes Berg	e3685e03b4	mac80211: fix station wakeup powersave race Consider the following (relatively unlikely) scenario: 1) station goes to sleep while frames are buffered in driver 2) driver blocks wakeup (until no more frames are buffered) 3) station wakes up again 4) driver unblocks wakeup In this case, the current mac80211 code will do the following: 1) WLAN_STA_PS_STA set 2) WLAN_STA_PS_DRIVER set 3) - nothing - 4) WLAN_STA_PS_DRIVER cleared As a result, no frames will be delivered to the client, even though it is awake, until it sends another frame to us that triggers ieee80211_sta_ps_deliver_wakeup() in sta_ps_end(). Since we now take the PS spinlock, we can fix this while at the same time removing the complexity with the pending skb queue function. This was broken since my commit `50a9432dae` ("mac80211: fix powersaving clients races") due to removing the clearing of WLAN_STA_PS_STA in the RX path. While at it, fix a cleanup path issue when a station is removed while the driver is still blocking its wakeup. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-20 11:54:09 +01:00
Johannes Berg	5108ca8280	mac80211: insert stations before adding to driver There's a race condition in mac80211 because we add stations to the internal lists after adding them to the driver, which means that (for example) the following can happen: 1. a station connects and is added 2. first, it is added to the driver 3. then, it is added to the mac80211 lists If the station goes to sleep between steps 2 and 3, and the firmware/hardware records it as being asleep, mac80211 will never instruct the driver to wake it up again as it never realized it went to sleep since the RX path discarded the frame as a "spurious class 3 frame", no station entry was present yet. Fix this by adding the station in software first, and only then adding it to the driver. That way, any state that the driver changes will be reflected properly in mac80211's station state. The problematic part is the roll-back if the driver fails to add the station, in that case a bit more is needed. To not make that overly complex prevent starting BA sessions in the meantime. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-20 10:34:33 +01:00
Emmanuel Grumbach	1d147bfa64	mac80211: fix AP powersave TX vs. wakeup race There is a race between the TX path and the STA wakeup: while a station is sleeping, mac80211 buffers frames until it wakes up, then the frames are transmitted. However, the RX and TX path are concurrent, so the packet indicating wakeup can be processed while a packet is being transmitted. This can lead to a situation where the buffered frames list is emptied on the one side, while a frame is being added on the other side, as the station is still seen as sleeping in the TX path. As a result, the newly added frame will not be send anytime soon. It might be sent much later (and out of order) when the station goes to sleep and wakes up the next time. Additionally, it can lead to the crash below. Fix all this by synchronising both paths with a new lock. Both path are not fastpath since they handle PS situations. In a later patch we'll remove the extra skb queue locks to reduce locking overhead. BUG: unable to handle kernel NULL pointer dereference at 000000b0 IP: [<ff6f1791>] ieee80211_report_used_skb+0x11/0x3e0 [mac80211] *pde = 00000000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC EIP: 0060:[<ff6f1791>] EFLAGS: 00210282 CPU: 1 EIP is at ieee80211_report_used_skb+0x11/0x3e0 [mac80211] EAX: e5900da0 EBX: 00000000 ECX: 00000001 EDX: 00000000 ESI: e41d00c0 EDI: e5900da0 EBP: ebe458e4 ESP: ebe458b0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 000000b0 CR3: 25a78000 CR4: 000407d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process iperf (pid: 3934, ti=ebe44000 task=e757c0b0 task.ti=ebe44000) iwlwifi 0000:02:00.0: I iwl_pcie_enqueue_hcmd Sending command LQ_CMD (#4e), seq: 0x0903, 92 bytes at 3[3]:9 Stack: e403b32c ebe458c4 00200002 00200286 e403b338 ebe458cc c10960bb e5900da0 ff76a6ec ebe458d8 00000000 e41d00c0 e5900da0 ebe458f0 ff6f1b75 e403b210 ebe4598c ff723dc1 00000000 ff76a6ec e597c978 e403b758 00000002 00000002 Call Trace: [<ff6f1b75>] ieee80211_free_txskb+0x15/0x20 [mac80211] [<ff723dc1>] invoke_tx_handlers+0x1661/0x1780 [mac80211] [<ff7248a5>] ieee80211_tx+0x75/0x100 [mac80211] [<ff7249bf>] ieee80211_xmit+0x8f/0xc0 [mac80211] [<ff72550e>] ieee80211_subif_start_xmit+0x4fe/0xe20 [mac80211] [<c149ef70>] dev_hard_start_xmit+0x450/0x950 [<c14b9aa9>] sch_direct_xmit+0xa9/0x250 [<c14b9c9b>] __qdisc_run+0x4b/0x150 [<c149f732>] dev_queue_xmit+0x2c2/0xca0 Cc: stable@vger.kernel.org Reported-by: Yaara Rozenblum <yaara.rozenblum@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com> [reword commit log, use a separate lock] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-20 10:32:29 +01:00
David S. Miller	ebe44f350e	ip_tunnel: Move ip_tunnel_get_stats64 into ip_tunnel_core.c net/built-in.o:(.rodata+0x1707c): undefined reference to `ip_tunnel_get_stats64' Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-20 02:14:23 -05:00
Linus Torvalds	e95003c3f9	NFS client bugfixes for Linux 3.14 Highlights include stable fixes for the following bugs: - General performance regression due to NFS_INO_INVALID_LABEL being set when the server doesn't support labeled NFS - Hang in the RPC code due to a socket out-of-buffer race - Infinite loop when trying to establish the NFSv4 lease - Use-after-free bug in the RPCSEC gss code. - nfs4_select_rw_stateid is returning with a non-zero error value on success Other bug fixes: - Potential memory scribble in the RPC bi-directional RPC code - Pipe version reference leak - Use the correct net namespace in the new NFSv4 migration code -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTBMBlAAoJEGcL54qWCgDyOakP+gKDh0VhKw8GziJFfY+6kHHI wej86M/coNnRPUv8n7s3N5TXMoV36qisYpxbIG/WQbOn6MfR3qjto6WoP7+vsrEq iNomtLivgEsWJePydTuIyAR/TK0du/zqP4zoPEDgdLDenucEVvkCzGIkqzg8Mddc duknEhIq918BkXIe3hRBWuxl+pRjwZur+TY0h/OR11oodqTYHxrE37f5PvREWOmB 08hhjOFBFYlyEnCjD3I1SFmcXQxkKzvACavvbhTyF6u/37oL/QC1/DZKL5mSdOJ6 novO8sv9gIpn/RhsEMOdaeYMYM5QTvkYIJQyLpKAYyaLZ42EMbRkczwNE7C0ZWyi F9MizDMNTie+DvsSHZPYwABTDOOQOuWPa9PO3Lyo8UtWxfmTOjYr2Rre1wyG0/+0 ywb3JiKQCtVDPnmSHhqxFfVp9XvS7D2/vz0udgKjrCLKyDC2OMMGLVa/acnrtZmz s94QpPiqhjnRqIuKo251HtbK3AVaLBNQxBCPszieYwPm9aZ04P7mjsGg1WuhhB2v +eSa9UkicGJwKWJWtIBr54qIAOlEXu2bXY+vio5UfbDb+5qfBHe0TmNrz5QNJ53A x0eUBocth9VpW1cv/Rf+o30tJyZGy6Jtv8kn2hfXkJXNL3N1Gn25rD3tpb+5TtdY b7gtHy13/oe8yi0tK91f =tll2 -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include stable fixes for the following bugs: - General performance regression due to NFS_INO_INVALID_LABEL being set when the server doesn't support labeled NFS - Hang in the RPC code due to a socket out-of-buffer race - Infinite loop when trying to establish the NFSv4 lease - Use-after-free bug in the RPCSEC gss code. - nfs4_select_rw_stateid is returning with a non-zero error value on success Other bug fixes: - Potential memory scribble in the RPC bi-directional RPC code - Pipe version reference leak - Use the correct net namespace in the new NFSv4 migration code" * tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS fix error return in nfs4_select_rw_stateid NFSv4: Use the correct net namespace in nfs4_update_server SUNRPC: Fix a pipe_version reference leak SUNRPC: Ensure that gss_auth isn't freed before its upcall messages SUNRPC: Fix potential memory scribble in xprt_free_bc_request() SUNRPC: Fix races in xs_nospace() SUNRPC: Don't create a gss auth cache unless rpc.gssd is running NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS	2014-02-19 12:13:02 -08:00
David S. Miller	2e99c07fbe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: * Fix nf_trace in nftables if XT_TRACE=n, from Florian Westphal. * Don't use the fast payload operation in nf_tables if the length is not power of 2 or it is not aligned, from Nikolay Aleksandrov. * Fix missing break statement the inet flavour of nft_reject, which results in evaluating IPv4 packets with the IPv6 evaluation routine, from Patrick McHardy. * Fix wrong kconfig symbol in nft_meta to match the routing realm, from Paul Bolle. * Allocate the NAT null binding when creating new conntracks via ctnetlink to avoid that several packets race at initializing the the conntrack NAT extension, original patch from Florian Westphal, revisited version from me. * Fix DNAT handling in the snmp NAT helper, the same handling was being done for SNAT and DNAT and 2.4 already contains that fix, from Francois-Xavier Le Bail. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-19 13:12:53 -05:00
Inbal Hacohen	50c11eb998	cfg80211: bugfix in regulatory user hint process After processing hint_user, we would want to schedule the timeout work only if we are actually waiting to CRDA. This happens when the status is not "IGNORE" nor "ALREADY_SET". Signed-off-by: Inbal Hacohen <Inbal.Hacohen@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-19 11:56:48 +01:00
Linus Torvalds	525b870974	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID update from Jiri Kosina: - fixes for several bugs in incorrect allocations of buffers by David Herrmann and Benjamin Tissoires. - support for a few new device IDs by Archana Patni, Benjamin Tissoires, Huei-Horng Yo, Reyad Attiyat and Yufeng Shen * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: hyperv: make sure input buffer is big enough HID: Bluetooth: hidp: make sure input buffers are big enough HID: hid-sensor-hub: quirk for STM Sensor hub HID: apple: add Apple wireless keyboard 2011 JIS model support HID: fix buffer allocations HID: multitouch: add FocalTech FTxxxx support HID: microsoft: Add ID's for Surface Type/Touch Cover 2 HID: usbhid: quirk for CY-TM75 75 inch Touch Overlay	2014-02-18 16:29:46 -08:00
Dan Carpenter	d7cf0c34af	af_packet: remove a stray tab in packet_set_ring() At first glance it looks like there is a missing curly brace but actually the code works the same either way. I have adjusted the indenting but left the code the same. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 18:02:25 -05:00
Daniel Borkmann	ffd5939381	net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of 'struct sctp_getaddrs_old' which includes a struct sockaddr pointer, sizeof(param) check will always fail in kernel as the structure in 64bit kernel space is 4bytes larger than for user binaries compiled in 32bit mode. Thus, applications making use of sctp_connectx() won't be able to run under such circumstances. Introduce a compat interface in the kernel to deal with such situations by using a 'struct compat_sctp_getaddrs_old' structure where user data is copied into it, and then sucessively transformed into a 'struct sctp_getaddrs_old' structure with the help of compat_ptr(). That fixes sctp_connectx() abi without any changes needed in user space, and lets the SCTP test suite pass when compiled in 32bit and run on 64bit kernels. Fixes: `f9c67811eb` ("sctp: Fix regression introduced by new sctp_connectx api") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 16:06:48 -05:00
David S. Miller	7ffb0d317d	Included changes: - fix soft-interface MTU computation - fix bogus pointer mangling when parsing the TT-TVLV container. This bug led to a wrong memory access. - fix memory leak by properly releasing the VLAN object after CRC check - properly check pskb_may_pull() return value - avoid potential race condition while adding new neighbour - fix potential memory leak by removing all the references to the orig_node object in case of initialization failure - fix the TT CRC computation by ensuring that every node uses the same byte order when hosts with different endianess are part of the same network - fix severe memory leak by freeing skb after a successful TVLV parsing - avoid potential double free when orig_node initialization fails - fix potential kernel paging error caused by the usage of the old value of skb->data after skb reallocation -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTAj4LAAoJEEKTMo6mOh1VXoQP/2WVjuIrB7rd4mpq5MSXjkWm qCRRmuU9MVSbwBPvKcNAT4sDb9KliodqMu7jtUNKJ118afTK5VIh1EmbFGIm2vA+ QowpfvSOFaDVrd6pB1bKlPlX5Xi9OF+hj82LalfMRWvsdvQUN00fkCMjyrxPivhR zq7ucyff1YTft/mSmD+X0gqNK1L99om2xNcWzPjl+CZ0LOBFe411/sWf8Ujldgl0 F6jTPXckNBToukmYO8wwmtG8PFrIWNBRUEfpY/P+VNp+Cg7GF9KOts4mdym9PviI //PkonRNylfeTvBlztmCdTQB9vHhlT3e/9KTd/lXBQ669Mz/eQ6H1MascDZ8e0Ib 1IeqL6cyOaEDIOh8Bgr2WcRTH/JCx0F0cy+PISJx0DEVYKLWZedm8ECIU9eXWMr6 hnTcBue51IoVbDE5SJ0apoDmQOZZF2euaYBPXtRrziZBzcHubt69rQKOqQ/A5atR m5kuA7E14NR7F/FOTdKsfLyAVqx9j5mw7NQYAhlbXex0Lp+qQQ9YMtHBv4pgzYA3 UYE9pnuMkr3EXOQ9wAt/ldq+hWBkXDFkg5nd3bzY8aKw5QLBPHZdrTgFtOmVO1RP Fa7fJSwt2ImCa50w59u4f22U870QK7AYK7xvHeLHbvIzthTDgA71OKRePcRu9EU3 yN6J5h/+A4X7fGgz0Z/X =6IO8 -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - fix soft-interface MTU computation - fix bogus pointer mangling when parsing the TT-TVLV container. This bug led to a wrong memory access. - fix memory leak by properly releasing the VLAN object after CRC check - properly check pskb_may_pull() return value - avoid potential race condition while adding new neighbour - fix potential memory leak by removing all the references to the orig_node object in case of initialization failure - fix the TT CRC computation by ensuring that every node uses the same byte order when hosts with different endianess are part of the same network - fix severe memory leak by freeing skb after a successful TVLV parsing - avoid potential double free when orig_node initialization fails - fix potential kernel paging error caused by the usage of the old value of skb->data after skb reallocation Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 15:40:50 -05:00
Pablo Neira Ayuso	0eba801b64	netfilter: ctnetlink: force null nat binding on insert Quoting Andrey Vagin: When a conntrack is created by kernel, it is initialized (sets IPS_{DST,SRC}_NAT_DONE_BIT bits in nf_nat_setup_info) and only then it is added in hashes (__nf_conntrack_hash_insert), so one conntract can't be initialized from a few threads concurrently. ctnetlink can add an uninitialized conntrack (w/o IPS_{DST,SRC}_NAT_DONE_BIT) in hashes, then a few threads can look up this conntrack and start initialize it concurrently. It's dangerous, because BUG can be triggered from nf_nat_setup_info. Fix this race by always setting up nat, even if no CTA_NAT_ attribute was requested before inserting the ct into the hash table. In absence of CTA_NAT_ attribute, a null binding is created. This alters current behaviour: Before this patch, the first packet matching the newly injected conntrack would be run through the nat table since nf_nat_initialized() returns false. IOW, this forces ctnetlink users to specify the desired nat transformation on ct creation time. Thanks for Florian Westphal, this patch is based on his original patch to address this problem, including this patch description. Reported-By: Andrey Vagin <avagin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>	2014-02-18 00:13:51 +01:00
Duan Jiong	a6254864c0	ipv4: fix counter in_slow_tot since commit 89aef8921bf("ipv4: Delete routing cache."), the counter in_slow_tot can't work correctly. The counter in_slow_tot increase by one when fib_lookup() return successfully in ip_route_input_slow(), but actually the dst struct maybe not be created and cached, so we can increase in_slow_tot after the dst struct is created. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 16:54:42 -05:00
David Herrmann	a4b1b5877b	HID: Bluetooth: hidp: make sure input buffers are big enough HID core expects the input buffers to be at least of size 4096 (HID_MAX_BUFFER_SIZE). Other sizes will result in buffer-overflows if an input-report is smaller than advertised. We could, like i2c, compute the biggest report-size instead of using HID_MAX_BUFFER_SIZE, but this will blow up if report-descriptors are changed after ->start() has been called. So lets be safe and just use the biggest buffer we have. Note that this adds an additional copy to the HIDP input path. If there is a way to make sure the skb-buf is big enough, we should use that instead. The best way would be to make hid-core honor the @size argument, though, that sounds easier than it is. So lets just fix the buffer-overflows for now and afterwards look for a faster way for all transport drivers. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-02-17 21:17:55 +01:00
Nicolas Dichtel	08b44656c0	gre: add link local route when local addr is any This bug was reported by Steinar H. Gunderson and was introduced by commit `f7cb888633` ("sit/gre6: don't try to add the same route two times"). root@morgental:~# ip tunnel add foo mode gre remote 1.2.3.4 ttl 64 root@morgental:~# ip link set foo up mtu 1468 root@morgental:~# ip -6 route show dev foo fe80::/64 proto kernel metric 256 but after the above commit, no such route shows up. There is no link local route because dev->dev_addr is 0 (because local ipv4 address is 0), hence no link local address is configured. In this scenario, the link local address is added manually: 'ip -6 addr add fe80::1 dev foo' and because prefix is /128, no link local route is added by the kernel. Even if the right things to do is to add the link local address with a /64 prefix, we need to restore the previous behavior to avoid breaking userpace. Reported-by: Steinar H. Gunderson <sesse@samfundet.no> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 14:08:26 -05:00
Antonio Quartulli	70b271a78b	batman-adv: fix potential kernel paging error for unicast transmissions batadv_send_skb_prepare_unicast(_4addr) might reallocate the skb's data. If it does then our ethhdr pointer is not valid anymore in batadv_send_skb_unicast(), resulting in a kernel paging error. Fixing this by refetching the ethhdr pointer after the potential reallocation. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-02-17 17:17:02 +01:00
Antonio Quartulli	a5a5cb8cab	batman-adv: avoid double free when orig_node initialization fails In the failure path of the orig_node initialization routine the orig_node->bat_iv.bcast_own field is free'd twice: first in batadv_iv_ogm_orig_get() and then later in batadv_orig_node_free_rcu(). Fix it by removing the kfree in batadv_iv_ogm_orig_get(). Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-02-17 17:17:02 +01:00
Antonio Quartulli	05c3c8a636	batman-adv: free skb on TVLV parsing success When the TVLV parsing routine succeed the skb is left untouched thus leading to a memory leak. Fix this by consuming the skb in case of success. Introduced by `ef26157747` ("batman-adv: tvlv - basic infrastructure") Reported-by: Russel Senior <russell@personaltelco.net> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Tested-by: Russell Senior <russell@personaltelco.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-02-17 17:17:02 +01:00
Antonio Quartulli	a30e22ca84	batman-adv: fix TT CRC computation by ensuring byte order When computing the CRC on a 2byte variable the order of the bytes obviously alters the final result. This means that computing the CRC over the same value on two archs having different endianess leads to different numbers. The global and local translation table CRC computation routine makes this mistake while processing the clients VIDs. The result is a continuous CRC mismatching between nodes having different endianess. Fix this by converting the VID to Network Order before processing it. This guarantees that every node uses the same byte order. Introduced by `7ea7b4a142` ("batman-adv: make the TT CRC logic VLAN specific") Reported-by: Russel Senior <russell@personaltelco.net> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Tested-by: Russell Senior <russell@personaltelco.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-02-17 17:17:02 +01:00
Simon Wunderlich	b2262df7fc	batman-adv: fix potential orig_node reference leak Since batadv_orig_node_new() sets the refcount to two, assuming that the calling function will use a reference for putting the orig_node into a hash or similar, both references must be freed if initialization of the orig_node fails. Otherwise that object may be leaked in that error case. Reported-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2014-02-17 17:17:01 +01:00
Antonio Quartulli	08bf0ed29c	batman-adv: avoid potential race condition when adding a new neighbour When adding a new neighbour it is important to atomically perform the following: - check if the neighbour already exists - append the neighbour to the proper list If the two operations are not performed in an atomic context it is possible that two concurrent insertions add the same neighbour twice. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-02-17 17:17:01 +01:00
Antonio Quartulli	f1791425cf	batman-adv: properly check pskb_may_pull return value pskb_may_pull() returns 1 on success and 0 in case of failure, therefore checking for the return value being negative does not make sense at all. This way if the function fails we will probably read beyond the current skb data buffer. Fix this by doing the proper check. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-02-17 17:17:01 +01:00
Antonio Quartulli	91c2b1a9f6	batman-adv: release vlan object after checking the CRC There is a refcounter unbalance in the CRC checking routine invoked on OGM reception. A vlan object is retrieved (thus its refcounter is increased by one) but it is never properly released. This leads to a memleak because the vlan object will never be free'd. Fix this by releasing the vlan object after having read the CRC. Reported-by: Russell Senior <russell@personaltelco.net> Reported-by: Daniel <daniel@makrotopia.org> Reported-by: cmsv <cmsv@wirelesspt.net> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-02-17 17:17:00 +01:00
Antonio Quartulli	e889241f45	batman-adv: fix TT-TVLV parsing on OGM reception When accessing a TT-TVLV container in the OGM RX path the variable pointing to the list of changes to apply is altered by mistake. This makes the TT component read data at the wrong position in the OGM packet buffer. Fix it by removing the bogus pointer alteration. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-02-17 17:17:00 +01:00
Antonio Quartulli	930cd6e46e	batman-adv: fix soft-interface MTU computation The current MTU computation always returns a value smaller than 1500bytes even if the real interfaces have an MTU large enough to compensate the batman-adv overhead. Fix the computation by properly returning the highest admitted value. Introduced by `a19d3d85e1` ("batman-adv: limit local translation table max size") Reported-by: Russell Senior <russell@personaltelco.net> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2014-02-17 17:17:00 +01:00
Nikolay Aleksandrov	f627ed91d8	netfilter: nf_tables: check if payload length is a power of 2 Add a check if payload's length is a power of 2 when selecting ops. The fast ops were meant for well aligned loads, also this fixes a small bug when using a length of 3 with some offsets which causes only 1 byte to be loaded because the fast ops are chosen. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-17 11:21:17 +01:00
Florian Westphal	478b360a47	netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n When using nftables with CONFIG_NETFILTER_XT_TARGET_TRACE=n, we get lots of "TRACE: filter:output:policy:1 IN=..." warnings as several places will leave skb->nf_trace uninitialised. Unlike iptables tracing functionality is not conditional in nftables, so always copy/zero nf_trace setting when nftables is enabled. Move this into __nf_copy() helper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-17 11:20:12 +01:00
Daniel Borkmann	0fd5d57ba3	packet: check for ndo_select_queue during queue selection Mathias reported that on an AMD Geode LX embedded board (ALiX) with ath9k driver PACKET_QDISC_BYPASS, introduced in commit `d346a3fae3` ("packet: introduce PACKET_QDISC_BYPASS socket option"), triggers a WARN_ON() coming from the driver itself via `066dae93bd` ("ath9k: rework tx queue selection and fix queue stopping/waking"). The reason why this happened is that ndo_select_queue() call is not invoked from direct xmit path i.e. for ieee80211 subsystem that sets queue and TID (similar to 802.1d tag) which is being put into the frame through 802.11e (WMM, QoS). If that is not set, pending frame counter for e.g. ath9k can get messed up. So the WARN_ON() in ath9k is absolutely legitimate. Generally, the hw queue selection in ieee80211 depends on the type of traffic, and priorities are set according to ieee80211_ac_numbers mapping; working in a similar way as DiffServ only on a lower layer, so that the AP can favour frames that have "real-time" requirements like voice or video data frames. Therefore, check for presence of ndo_select_queue() in netdev ops and, if available, invoke it with a fallback handler to __packet_pick_tx_queue(), so that driver such as bnx2x, ixgbe, or mlx4 can still select a hw queue for transmission in relation to the current CPU while e.g. ieee80211 subsystem can make their own choices. Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 00:36:34 -05:00
Daniel Borkmann	b9507bdaf4	netdevice: move netdev_cap_txqueue for shared usage to header In order to allow users to invoke netdev_cap_txqueue, it needs to be moved into netdevice.h header file. While at it, also add kernel doc header to document the API. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 00:36:34 -05:00
Daniel Borkmann	99932d4fc0	netdevice: add queue selection fallback handler for ndo_select_queue Add a new argument for ndo_select_queue() callback that passes a fallback handler. This gets invoked through netdev_pick_tx(); fallback handler is currently __netdev_pick_tx() as most drivers invoke this function within their customized implementation in case for skbs that don't need any special handling. This fallback handler can then be replaced on other call-sites with different queue selection methods (e.g. in packet sockets, pktgen etc). This also has the nice side-effect that __netdev_pick_tx() is then only invoked from netdev_pick_tx() and export of that function to modules can be undone. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 00:36:34 -05:00
Matija Glavinic Pecotic	ef2820a735	net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer Implementation of (a)rwnd calculation might lead to severe performance issues and associations completely stalling. These problems are described and solution is proposed which improves lksctp's robustness in congestion state. 1) Sudden drop of a_rwnd and incomplete window recovery afterwards Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data), but size of sk_buff, which is blamed against receiver buffer, is not accounted in rwnd. Theoretically, this should not be the problem as actual size of buffer is double the amount requested on the socket (SO_RECVBUF). Problem here is that this will have bad scaling for data which is less then sizeof sk_buff. E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion of traffic of this size (less then 100B). An example of sudden drop and incomplete window recovery is given below. Node B exhibits problematic behavior. Node A initiates association and B is configured to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp message in 4G (LTE) network). On B data is left in buffer by not reading socket in userspace. Lets examine when we will hit pressure state and declare rwnd to be 0 for scenario with above stated parameters (rwnd == 10000, chunk size == 43, each chunk is sent in separate sctp packet) Logic is implemented in sctp_assoc_rwnd_decrease: socket_buffer (see below) is maximum size which can be held in socket buffer (sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count) A simple expression is given for which it will be examined after how many packets for above stated parameters we enter pressure state: We start by condition which has to be met in order to enter pressure state: socket_buffer < currently_alloced; currently_alloced is represented as size of sctp packets received so far and not yet delivered to userspace. x is the number of chunks/packets (since there is no bundling, and each chunk is delivered in separate packet, we can observe each chunk also as sctp packet, and what is important here, having its own sk_buff): socket_buffer < xeach_sctp_packet; each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is twice the amount of initially requested size of socket buffer, which is in case of sctp, twice the a_rwnd requested: 2rwnd < x(payload+sizeof(struc sk_buff)); sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000 and each payload size is 43 20000 < x(43+190); x > 20000/233; x ~> 84; After ~84 messages, pressure state is entered and 0 rwnd is advertised while received 8443B ~= 3612B sctp data. This is why external observer notices sudden drop from 6474 to 0, as it will be now shown in example: IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017] IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839] IP A.34340 > B.12345: sctp (1) [COOKIE ECHO] IP B.12345 > A.34340: sctp (1) [COOKIE ACK] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0] <...> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18] --> Sudden drop IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] At this point, rwnd_press stores current rwnd value so it can be later restored in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This condition is not met since rwnd, after it hit 0, must first reach rwnd_press by adding amount which is read from userspace. Let us observe values in above example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the amount of actual sctp data currently waiting to be delivered to userspace is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed only for sctp data, which is ~3500. Condition is never met, and when userspace reads all data, rwnd stays on 3569. IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0] --> At this point userspace read everything, rwnd recovered only to 3569 IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0] Reproduction is straight forward, it is enough for sender to send packets of size less then sizeof(struct sk_buff) and receiver keeping them in its buffers. 2) Minute size window for associations sharing the same socket buffer In case multiple associations share the same socket, and same socket buffer (sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one of the associations can permanently drop rwnd of other association(s). Situation will be typically observed as one association suddenly having rwnd dropped to size of last packet received and never recovering beyond that point. Different scenarios will lead to it, but all have in common that one of the associations (let it be association from 1)) nearly depleted socket buffer, and the other association blames socket buffer just for the amount enough to start the pressure. This association will enter pressure state, set rwnd_press and announce 0 rwnd. When data is read by userspace, similar situation as in 1) will occur, rwnd will increase just for the size read by userspace but rwnd_press will be high enough so that association doesn't have enough credit to reach rwnd_press and restore to previous state. This case is special case of 1), being worse as there is, in the worst case, only one packet in buffer for which size rwnd will be increased. Consequence is association which has very low maximum rwnd ('minute size', in our case down to 43B - size of packet which caused pressure) and as such unusable. Scenario happened in the field and labs frequently after congestion state (link breaks, different probabilities of packet drop, packet reordering) and with scenario 1) preceding. Here is given a deterministic scenario for reproduction: >From node A establish two associations on the same socket, with rcvbuf_policy being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1 repeat scenario from 1), that is, bring it down to 0 and restore up. Observe scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered', bring it down close to 0, as in just one more packet would close it. This has as a consequence that association number 2 is able to receive (at least) one more packet which will bring it in pressure state. E.g. if association 2 had rwnd of 10000, packet received was 43, and we enter at this point into pressure, rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will increase for 43, but conditions to restore rwnd to original state, just as in 1), will never be satisfied. --> Association 1, between A.y and B.12345 IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569] IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613] IP A.55915 > B.12345: sctp (1) [COOKIE ECHO] IP B.12345 > A.55915: sctp (1) [COOKIE ACK] --> Association 2, between A.z and B.12346 IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173] IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240] IP A.55915 > B.12346: sctp (1) [COOKIE ECHO] IP B.12346 > A.55915: sctp (1) [COOKIE ACK] --> Deplete socket buffer by sending messages of size 43B over association 1 IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] <...> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0] --> Sudden drop on 1 IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Here userspace read, rwnd 'recovered' to 3698, now deplete again using association 1 so there is place in buffer for only one more packet IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0] --> Socket buffer is almost depleted, but there is space for one more packet, send them over association 2, size 43B IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18] IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Immediate drop IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Read everything from the socket, both association recover up to maximum rwnd they are capable of reaching, note that association 1 recovered up to 3698, and association 2 recovered only to 43 IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0] IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18] IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0] A careful reader might wonder why it is necessary to reproduce 1) prior reproduction of 2). It is simply easier to observe when to send packet over association 2 which will push association into the pressure state. Proposed solution: Both problems share the same root cause, and that is improper scaling of socket buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while calculating rwnd is not possible due to fact that there is no linear relationship between amount of data blamed in increase/decrease with IP packet in which payload arrived. Even in case such solution would be followed, complexity of the code would increase. Due to nature of current rwnd handling, slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is entered is rationale, but it gives false representation to the sender of current buffer space. Furthermore, it implements additional congestion control mechanism which is defined on implementation, and not on standard basis. Proposed solution simplifies whole algorithm having on mind definition from rfc: o Receiver Window (rwnd): This gives the sender an indication of the space available in the receiver's inbound buffer. Core of the proposed solution is given with these lines: sctp_assoc_rwnd_update: if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0) asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1; else asoc->rwnd = 0; We advertise to sender (half of) actual space we have. Half is in the braces depending whether you would like to observe size of socket buffer as SO_RECVBUF or twice the amount, i.e. size is the one visible from userspace, that is, from kernelspace. In this way sender is given with good approximation of our buffer space, regardless of the buffer policy - we always advertise what we have. Proposed solution fixes described problems and removes necessity for rwnd restoration algorithm. Finally, as proposed solution is simplification, some lines of code, along with some bytes in struct sctp_association are saved. Version 2 of the patch addressed comments from Vlad. Name of the function is set to be more descriptive, and two parts of code are changed, in one removing the superfluous call to sctp_assoc_rwnd_update since call would not result in update of rwnd, and the other being reordering of the code in a way that call to sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2 in a way that existing function is not reordered/copied in line, but it is correctly called. Thanks Vlad for suggesting. Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 00:16:56 -05:00
Duan Jiong	cd0f0b95fd	ipv4: distinguish EHOSTUNREACH from the ENETUNREACH since commit 251da413("ipv4: Cache ip_error() routes even when not forwarding."), the counter IPSTATS_MIB_INADDRERRORS can't work correctly, because the value of err was always set to ENETUNREACH. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-16 23:45:31 -05:00
Gerrit Renker	09db308053	dccp: re-enable debug macro dccp tfrc: revert This reverts `6aee49c558` ("dccp: make local variable static") since the variable tfrc_debug is referenced by the tfrc_pr_debug(fmt, ...) macro when TFRC debugging is enabled. If it is enabled, use of the macro produces a compilation error. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-16 23:45:00 -05:00
Trond Myklebust	e9776d0f4a	SUNRPC: Fix a pipe_version reference leak In gss_alloc_msg(), if the call to gss_encode_v1_msg() fails, we want to release the reference to the pipe_version that was obtained earlier in the function. Fixes: `9d3a2260f0` (SUNRPC: Fix buffer overflow checking in...) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-02-16 13:28:01 -05:00
Trond Myklebust	9eb2ddb48c	SUNRPC: Ensure that gss_auth isn't freed before its upcall messages Fix a race in which the RPC client is shutting down while the gss daemon is processing a downcall. If the RPC client manages to shut down before the gss daemon is done, then the struct gss_auth used in gss_release_msg() may have already been freed. Link: http://lkml.kernel.org/r/1392494917.71728.YahooMailNeo@web140002.mail.bf1.yahoo.com Reported-by: John <da_audiophile@yahoo.com> Reported-by: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-02-16 13:06:06 -05:00
Jarno Rajahalme	42ee19e293	openvswitch: Fix race. ovs_vport_cmd_dump() did rcu_read_lock() only after getting the datapath, which could have been deleted in between. Resolved by taking rcu_read_lock() before the get_dp() call. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-02-15 17:42:29 -08:00
Jarno Rajahalme	04382a3303	openvswitch: Read tcp flags only then the tranport header is present. Only the first IP fragment can have a TCP header, check for this. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-02-15 17:37:45 -08:00
Jiri Pirko	3c7eacfc8a	ovs: fix dp check in ovs_dp_reset_user_features This fixes crash when userspace does "ovs-dpctl add-dp dev" where dev is existing non-dp netdevice. Introduced by: commit `44da5ae5fb` "openvswitch: Drop user features if old user space attempted to create datapath" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-02-15 17:24:19 -08:00
FX Le Bail	2b7a79bae2	netfilter: nf_nat_snmp_basic: fix duplicates in if/else branches The solution was found by Patrick in 2.4 kernel sources. Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-14 11:37:36 +01:00
Paul Bolle	06efbd6d56	netfilter: nft_meta: fix typo "CONFIG_NET_CLS_ROUTE" There are two checks for CONFIG_NET_CLS_ROUTE, but the corresponding Kconfig symbol was dropped in v2.6.39. Since the code guards access to dst_entry.tclassid it seems CONFIG_IP_ROUTE_CLASSID should be used instead. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-14 11:37:34 +01:00
Patrick McHardy	ce898ecb5a	netfilter: nft_reject_inet: fix unintended fall-through in switch-statatement For IPv4 packets, we call both IPv4 and IPv6 reject. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-14 11:37:33 +01:00
FX Le Bail	357137a422	ipv4: ipconfig.c: add parentheses in an if statement Even if the 'time_before' macro expand with parentheses, the look is bad. Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-14 00:14:23 -05:00
Vijay Subramanian	219e288e89	net: sched: Cleanup PIE comments Fix incorrect comment reported by Norbert Kiesel. Edit another comment to add more details. Also add references to algorithm (IETF draft and paper) to top of file. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> CC: Mythili Prabhu <mysuryan@cisco.com> CC: Norbert Kiesel <nkiesel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 18:29:58 -05:00
Florian Westphal	fe6cc55f3a	net: ip, ipv6: handle gso skbs in forwarding path Marcelo Ricardo Leitner reported problems when the forwarding link path has a lower mtu than the incoming one if the inbound interface supports GRO. Given: Host <mtu1500> R1 <mtu1200> R2 Host sends tcp stream which is routed via R1 and R2. R1 performs GRO. In this case, the kernel will fail to send ICMP fragmentation needed messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu checks in forward path. Instead, Linux tries to send out packets exceeding the mtu. When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does not fragment the packets when forwarding, and again tries to send out packets exceeding R1-R2 link mtu. This alters the forwarding dstmtu checks to take the individual gso segment lengths into account. For ipv6, we send out pkt too big error for gso if the individual segments are too big. For ipv4, we either send icmp fragmentation needed, or, if the DF bit is not set, perform software segmentation and let the output path create fragments when the packet is leaving the machine. It is not 100% correct as the error message will contain the headers of the GRO skb instead of the original/segmented one, but it seems to work fine in my (limited) tests. Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid sofware segmentation. However it turns out that skb_segment() assumes skb nr_frags is related to mss size so we would BUG there. I don't want to mess with it considering Herbert and Eric disagree on what the correct behavior should be. Hannes Frederic Sowa notes that when we would shrink gso_size skb_segment would then also need to deal with the case where SKB_MAX_FRAGS would be exceeded. This uses sofware segmentation in the forward path when we hit ipv4 non-DF packets and the outgoing link mtu is too small. Its not perfect, but given the lack of bug reports wrt. GRO fwd being broken this is a rare case anyway. Also its not like this could not be improved later once the dust settles. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:17:02 -05:00
Florian Westphal	d206940319	net: core: introduce netif_skb_dev_features Will be used by upcoming ipv4 forward path change that needs to determine feature mask using skb->dst->dev instead of skb->dev. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:17:02 -05:00
wangweidong	efb842c45e	sctp: optimize the sctp_sysctl_net_register Here, when the net is init_net, we needn't to kmemdup the ctl_table again. So add a check for net. Also we can save some memory. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:08:29 -05:00
wangweidong	22a1f5140e	sctp: fix a missed .data initialization As commit 3c68198e75111a90("sctp: Make hmac algorithm selection for cookie generation dynamic"), we miss the .data initialization. If we don't use the net_namespace, the problem that parts of the sysctl configuration won't be isolation and won't occur. In sctp_sysctl_net_register(), we register the sysctl for each net, in the for(), we use the 'table[i].data' as check condition, so when the 'i' is the index of sctp_hmac_alg, the data is NULL, then break. So add the .data initialization. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:08:29 -05:00
Cong Wang	0e0eee2465	net: correct error path in rtnl_newlink() I saw the following BUG when ->newlink() fails in rtnl_newlink(): [ 40.240058] kernel BUG at net/core/dev.c:6438! this is due to free_netdev() is not supposed to be called before netdev is completely unregistered, therefore it is not correct to call free_netdev() here, at least for ops->newlink!=NULL case, many drivers call it in ->destructor so that rtnl_unlock() will take care of it, we probably don't need to do anything here. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:08:29 -05:00
Erik Hugne	64380a04de	tipc: fix message corruption bug for deferred packets If a packet received on a link is out-of-sequence, it will be placed on a deferred queue and later reinserted in the receive path once the preceding packets have been processed. The problem with this is that it will be subject to the buffer adjustment from link_recv_buf_validate twice. The second adjustment for 20 bytes header space will corrupt the packet. We solve this by tagging the deferred packets and bail out from receive buffer validation for packets that have already been subjected to this. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 16:35:05 -05:00
Felix Fietkau	1bf4bbb402	mac80211: send control port protocol frames to the VO queue Improves reliability of wifi connections with WPA, since authentication frames are prioritized over normal traffic and also typically exempt from aggregation. Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-12 11:26:43 +01:00
Linus Torvalds	16e5a2ed59	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking updates from David Miller: 1) Fix flexcan build on big endian, from Arnd Bergmann 2) Correctly attach cpsw to GPIO bitbang MDIO drive, from Stefan Roese 3) udp_add_offload has to use GFP_ATOMIC since it can be invoked from non-sleepable contexts. From Or Gerlitz 4) vxlan_gro_receive() does not iterate over all possible flows properly, fix also from Or Gerlitz 5) CAN core doesn't use a proper SKB destructor when it hooks up sockets to SKBs. Fix from Oliver Hartkopp 6) ip_tunnel_xmit() can use an uninitialized route pointer, fix from Eric Dumazet 7) Fix address family assignment in IPVS, from Michal Kubecek 8) Fix ath9k build on ARM, from Sujith Manoharan 9) Make sure fail_over_mac only applies for the correct bonding modes, from Ding Tianhong 10) The udp offload code doesn't use RCU correctly, from Shlomo Pongratz 11) Handle gigabit features properly in generic PHY code, from Florian Fainelli 12) Don't blindly invoke link operations in rtnl_link_get_slave_info_data_size, they are optional. Fix from Fernando Luis Vazquez Cao 13) Add USB IDs for Netgear Aircard 340U, from Bjørn Mork 14) Handle netlink packet padding properly in openvswitch, from Thomas Graf 15) Fix oops when deleting chains in nf_tables, from Patrick McHardy 16) Fix RX stalls in xen-netback driver, from Zoltan Kiss 17) Fix deadlock in mac80211 stack, from Emmanuel Grumbach 18) inet_nlmsg_size() forgets to consider ifa_cacheinfo, fix from Geert Uytterhoeven 19) tg3_change_mtu() can deadlock, fix from Nithin Sujir 20) Fix regression in setting SCTP local source addresses on accepted sockets, caused by some generic ipv6 socket changes. Fix from Matija Glavinic Pecotic 21) IPPROTO_* must be pure defines, otherwise module aliases don't get constructed properly. Fix from Jan Moskyto 22) IPV6 netconsole setup doesn't work properly unless an explicit source address is specified, fix from Sabrina Dubroca 23) Use __GFP_NORETRY for high order skb page allocations in sock_alloc_send_pskb and skb_page_frag_refill. From Eric Dumazet 24) Fix a regression added in netconsole over bridging, from Cong Wang 25) TCP uses an artificial offset of 1ms for SRTT, but this doesn't jive well with TCP pacing which needs the SRTT to be accurate. Fix from Eric Dumazet 26) Several cases of missing header file includes from Rashika Kheria 27) Add ZTE MF667 device ID to qmi_wwan driver, from Raymond Wanyoike 28) TCP Small Queues doesn't handle nonagle properly in some corner cases, fix from Eric Dumazet 29) Remove extraneous read_unlock in bond_enslave, whoops. From Ding Tianhong 30) Fix 9p trans_virtio handling of vmalloc buffers, from Richard Yao * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (136 commits) 6lowpan: fix lockdep splats alx: add missing stats_lock spinlock init 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers bonding: remove unwanted bond lock for enslave processing USB2NET : SR9800 : One chip USB2.0 USB2NET SR9800 Device Driver Support tcp: tsq: fix nonagle handling bridge: Prevent possible race condition in br_fdb_change_mac_address bridge: Properly check if local fdb entry can be deleted when deleting vlan bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address bridge: Fix the way to check if a local fdb entry can be deleted bridge: Change local fdb entries whenever mac address of bridge device changes bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr tcp: correct code comment stating 3 min timeout for FIN_WAIT2, we only do 1 min net: vxge: Remove unused device pointer net: qmi_wwan: add ZTE MF667 3c59x: Remove unused pointer in vortex_eisa_cleanup() net: fix 'ip rule' iif/oif device rename ...	2014-02-11 12:05:55 -08:00
Trond Myklebust	628356791b	SUNRPC: Fix potential memory scribble in xprt_free_bc_request() The call to xprt_free_allocation() will call list_del() on req->rq_bc_pa_list, which is not attached to a list. This patch moves the list_del() out of xprt_free_allocation() and into those callers that need it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-02-11 13:56:54 -05:00
Trond Myklebust	06ea0bfe6e	SUNRPC: Fix races in xs_nospace() When a send failure occurs due to the socket being out of buffer space, we call xs_nospace() in order to have the RPC task wait until the socket has drained enough to make it worth while trying again. The current patch fixes a race in which the socket is drained before we get round to setting up the machinery in xs_nospace(), and which is reported to cause hangs. Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown Fixes: `a9a6b52ee1` (SUNRPC: Don't start the retransmission timer...) Reported-by: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-02-11 09:34:30 -05:00
Eytan Lifshitz	c368ddaa9a	mac80211: fix memory leak In case ieee80211_prep_connection() fails to dereference sdata->vif.chanctx_conf, the function returns and doesn't free new_sta. fixed. Signed-off-by: Eytan Lifshitz <eytan.lifshitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-11 12:59:36 +01:00
Arik Nemtsov	32769814d5	mac80211: fix sched_scan restart on recovery In case we were not suspended, the reconfig function returns without configuring the scheduled scan. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-11 12:59:12 +01:00
Eric Dumazet	20e7c4e80d	6lowpan: fix lockdep splats When a device ndo_start_xmit() calls again dev_queue_xmit(), lockdep can complain because dev_queue_xmit() is re-entered and the spinlocks protecting tx queues share a common lockdep class. Same issue was fixed for bonding/l2tp/ppp in commits `0daa230302` ("[PATCH] bonding: lockdep annotation") `49ee49202b` ("bonding: set qdisc_tx_busylock to avoid LOCKDEP splat") `23d3b8bfb8` ("net: qdisc busylock needs lockdep annotations ") `303c07db48` ("ppp: set qdisc_tx_busylock to avoid LOCKDEP splat ") Reported-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 17:51:29 -08:00
Richard Yao	b6f52ae2f0	9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers The 9p-virtio transport does zero copy on things larger than 1024 bytes in size. It accomplishes this by returning the physical addresses of pages to the virtio-pci device. At present, the translation is usually a bit shift. That approach produces an invalid page address when we read/write to vmalloc buffers, such as those used for Linux kernel modules. Any attempt to load a Linux kernel module from 9p-virtio produces the following stack. [<ffffffff814878ce>] p9_virtio_zc_request+0x45e/0x510 [<ffffffff814814ed>] p9_client_zc_rpc.constprop.16+0xfd/0x4f0 [<ffffffff814839dd>] p9_client_read+0x15d/0x240 [<ffffffff811c8440>] v9fs_fid_readn+0x50/0xa0 [<ffffffff811c84a0>] v9fs_file_readn+0x10/0x20 [<ffffffff811c84e7>] v9fs_file_read+0x37/0x70 [<ffffffff8114e3fb>] vfs_read+0x9b/0x160 [<ffffffff81153571>] kernel_read+0x41/0x60 [<ffffffff810c83ab>] copy_module_from_fd.isra.34+0xfb/0x180 Subsequently, QEMU will die printing: qemu-system-x86_64: virtio: trying to map MMIO memory This patch enables 9p-virtio to correctly handle this case. This not only enables us to load Linux kernel modules off virtfs, but also enables ZFS file-based vdevs on virtfs to be used without killing QEMU. Special thanks to both Avi Kivity and Alexander Graf for their interpretation of QEMU backtraces. Without their guidence, tracking down this bug would have taken much longer. Also, special thanks to Linus Torvalds for his insightful explanation of why this should use is_vmalloc_addr() instead of is_vmalloc_or_module_addr(): https://lkml.org/lkml/2014/2/8/272 Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 17:48:54 -08:00
John Ogness	bf06200e73	tcp: tsq: fix nonagle handling Commit `46d3ceabd8` ("tcp: TCP Small Queues") introduced a possible regression for applications using TCP_NODELAY. If TCP session is throttled because of tsq, we should consult tp->nonagle when TX completion is done and allow us to send additional segment, especially if this segment is not a full MSS. Otherwise this segment is sent after an RTO. [edumazet] : Cooked the changelog, added another fix about testing sk_wmem_alloc twice because TX completion can happen right before setting TSQ_THROTTLED bit. This problem is particularly visible with recent auto corking, but might also be triggered with low tcp_limit_output_bytes values or NIC drivers delaying TX completion by hundred of usec, and very low rtt. Thomas Glanzmann for example reported an iscsi regression, caused by tcp auto corking making this bug quite visible. Fixes: `46d3ceabd8` ("tcp: TCP Small Queues") Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 15:23:39 -08:00
Toshiaki Makita	ac4c886883	bridge: Prevent possible race condition in br_fdb_change_mac_address br_fdb_change_mac_address() calls fdb_insert()/fdb_delete() without br->hash_lock. These hash list updates are racy with br_fdb_update()/br_fdb_cleanup(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	424bb9c97c	bridge: Properly check if local fdb entry can be deleted when deleting vlan Vlan codes unconditionally delete local fdb entries. We should consider the possibility that other ports have the same address and vlan. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab bridge vlan add dev eth0 vid 10 bridge vlan add dev eth1 vid 10 bridge vlan add dev br0 vid 10 self We will have fdb entry such that f->dst == eth0, f->vlan_id == 10 and f->addr == 12:34:56:78:90:ab at this time. Next, delete eth0 vlan 10. bridge vlan del dev eth0 vid 10 In this case, we still need the entry for br0, but it will be deleted. Note that br0 needs the entry even though its mac address is not set manually. To delete the entry with proper condition checking, fdb_delete_local() is suitable to use. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	a778e6d1a5	bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port br_fdb_delete_by_port() doesn't care about vlan and mac address of the bridge device. As the check is almost the same as mac address changing, slightly modify fdb_delete_local() and use it. Note that we can always set added_by_user to 0 in fdb_delete_local() because - br_fdb_delete_by_port() calls fdb_delete_local() for local entries regardless of its added_by_user. In this case, we have to check if another port has the same address and vlan, and if found, we have to create the entry (by changing dst). This is kernel-added entry, not user-added. - br_fdb_changeaddr() doesn't call fdb_delete_local() for user-added entry. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	960b589f86	bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address br_fdb_change_mac_address() doesn't check if the local entry has the same address as any of bridge ports. Although I'm not sure when it is beneficial, current implementation allow the bridge device to receive any mac address of its ports. To preserve this behavior, we have to check if the mac address of the entry being deleted is identical to that of any port. As this check is almost the same as that in br_fdb_changeaddr(), create a common function fdb_delete_local() and call it from br_fdb_changeadddr() and br_fdb_change_mac_address(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	2b292fb4a5	bridge: Fix the way to check if a local fdb entry can be deleted We should take into account the followings when deleting a local fdb entry. - nbp_vlan_find() can be used only when vid != 0 to check if an entry is deletable, because a fdb entry with vid 0 can exist at any time while nbp_vlan_find() always return false with vid 0. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address 12:34:56:78:90:ab brctl addif br0 eth0 brctl addif br0 eth1 ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the fdb entry 12:34:56:78:90:ab will be deleted even though the bridge port eth1 still has that address. - The port to which the bridge device is attached might needs a local entry if its mac address is set manually. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab brctl addif br0 eth0 ip link set br0 address 12:34:56:78:90:ab ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the fdb still must have the entry 12:34:56:78:90:ab, but it will be deleted. We can use br->dev->addr_assign_type to check if the address is manually set or not, but I propose another approach. Since we delete and insert local entries whenever changing mac address of the bridge device, we can change dst of the entry to NULL regardless of addr_assign_type when deleting an entry associated with a certain port, and if it is found to be unnecessary later, then delete it. That is, if changing mac address of a port, the entry might be changed to its dst being NULL first, but is eventually deleted when recalculating and changing bridge id. This approach is especially useful when we want to share the code with deleting vlan in which the bridge device might want such an entry regardless of addr_assign_type, and makes things easy because we don't have to consider if mac address of the bridge device will be changed or not at the time we delete a local entry of a port, which means fdb code will not be bothered even if the bridge id calculating logic is changed in the future. Also, this change reduces inconsistent state, where frames whose dst is the mac address of the bridge, can't reach the bridge because of premature fdb entry deletion. This change reduces the possibility that the bridge device replies unreachable mac address to arp requests, which could occur during the short window between calling del_nbp() and br_stp_recalculate_bridge_id() in br_del_if(). This will effective after br_fdb_delete_by_port() starts to use the same code by following patch. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a4b816d8ba	bridge: Change local fdb entries whenever mac address of bridge device changes Vlan code may need fdb change when changing mac address of bridge device even if it is caused by the mac address changing of a bridge port. Example configuration: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab bridge vlan add dev br0 vid 10 self bridge vlan add dev eth0 vid 10 We will have fdb entry such that f->dst == NULL, f->vlan_id == 10 and f->addr == 12:34:56:78:90:ab at this time. Next, change the mac address of eth0 to greater value. ip link set eth0 address ee:ff:12:34:56:78 Then, mac address of br0 will be recalculated and set to aa:bb:cc:dd:ee:ff. However, an entry aa:bb:cc:dd:ee:ff will not be created and we will be not able to communicate using br0 on vlan 10. Address this issue by deleting and adding local entries whenever changing the mac address of the bridge device. If there already exists an entry that has the same address, for example, in case that br_fdb_changeaddr() has already inserted it, br_fdb_change_mac_address() will simply fail to insert it and no duplicated entry will be made, as it was. This approach also needs br_add_if() to call br_fdb_insert() before br_stp_recalculate_bridge_id() so that we don't create an entry whose dst == NULL in this function to preserve previous behavior. Note that this is a slight change in behavior where the bridge device can receive the traffic to the new address before calling br_stp_recalculate_bridge_id() in br_add_if(). However, it is not a problem because we have already the address on the new port and such a way to insert new one before recalculating bridge id is taken in br_device_event() as well. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a3ebb7efe7	bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address We have been always failed to delete the old entry at br_fdb_change_mac_address() because br_set_mac_address() updates dev->dev_addr before calling br_fdb_change_mac_address() and br_fdb_change_mac_address() uses dev->dev_addr to find the old entry. That update of dev_addr is completely unnecessary because the same work is done in br_stp_change_bridge_id() which is called right away after calling br_fdb_change_mac_address(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	2836882fe0	bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr Since commit `bc9a25d21e` ("bridge: Add vlan support for local fdb entries"), br_fdb_changeaddr() has inserted a new local fdb entry only if it can find old one. But if we have two ports where they have the same address or user has deleted a local entry, there will be no entry for one of the ports. Example of problematic case: ip link set eth0 address aa:bb:cc:dd:ee:ff ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # eth1 will not have a local entry due to dup. ip link set eth1 address 12:34:56:78:90:ab Then, the new entry for the address 12:34:56:78:90:ab will not be created, and the bridge device will not be able to communicate. Insert new entries regardless of whether we can find old entries or not. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a5642ab474	bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr br_fdb_changeaddr() assumes that there is at most one local entry per port per vlan. It used to be true, but since commit `36fd2b63e3` ("bridge: allow creating/deleting fdb entries via netlink"), it has not been so. Therefore, the function might fail to search a correct previous address to be deleted and delete an arbitrary local entry if user has added local entries manually. Example of problematic case: ip link set eth0 address ee:ff:12:34:56:78 brctl addif br0 eth0 bridge fdb add 12:34:56:78:90:ab dev eth0 master ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the address 12:34:56:78:90:ab might be deleted instead of ee:ff:12:34:56:78, the original mac address of eth0. Address this issue by introducing a new flag, added_by_user, to struct net_bridge_fdb_entry. Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases like: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master brctl addif br0 eth1 brctl delif br0 eth0 In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff, but it also should have been added by "brctl addif br0 eth1" originally, so we don't delete it and treat it a new kernel-created entry. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Trond Myklebust	a699d65ec4	SUNRPC: Don't create a gss auth cache unless rpc.gssd is running An infinite loop is caused when nfs4_establish_lease() fails with -EACCES. This causes nfs4_handle_reclaim_lease_error() to sleep a bit and resets the NFS4CLNT_LEASE_EXPIRED bit. This in turn causes nfs4_state_manager() to try and reestablished the lease, again, again, again... The problem is a valid RPCSEC_GSS client is being created when rpc.gssd is not running. Link: http://lkml.kernel.org/r/1392066375-16502-1-git-send-email-steved@redhat.com Fixes: `0ea9de0ea6` (sunrpc: turn warn_gssd() log message into a dprintk()) Reported-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-02-10 16:49:17 -05:00
Jesper Juhl	b10bd54c05	tcp: correct code comment stating 3 min timeout for FIN_WAIT2, we only do 1 min As far as I can tell we have used a default of 60 seconds for FIN_WAIT2 timeout for ages (since 2.x times??). In any case, the timeout these days is 60 seconds, so the 3 min comment is wrong (and cost me a few minutes of my life when I was debugging a FIN_WAIT2 related problem in a userspace application and checked the kernel source for details). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 19:14:23 -08:00
Maciej Żenczykowski	946c032e5a	net: fix 'ip rule' iif/oif device rename ip rules with iif/oif references do not update: (detach/attach) across interface renames. Signed-off-by: Maciej Żenczykowski <maze@google.com> CC: Willem de Bruijn <willemb@google.com> CC: Eric Dumazet <edumazet@google.com> CC: Chris Davis <chrismd@google.com> CC: Carlo Contavalli <ccontavalli@google.com> Google-Bug-Id: 12936021 Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 19:02:52 -08:00
Christian Engelmayer	310f6fd0ca	6lowpan: Remove unused pointer in lowpan_header_create() Commit `8df8c56a` (6lowpan: Moving generic compression code into 6lowpan_iphc.c) left pointer 'hdr' unused - remove it. Detected by Coverity: CID 1164868. Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 18:38:10 -08:00
FX Le Bail	d94c1f92bb	ipv6: icmp6_send: fix Oops when pinging a not set up IPv6 peer on a sit tunnel The patch `446fab5933` ("ipv6: enable anycast addresses as source addresses in ICMPv6 error messages") causes an Oops when pinging a not set up IPv6 peer on a sit tunnel. The problem is that ipv6_anycast_destination() uses unconditionally skb_dst(skb), which is NULL in this case. The solution is to use instead the ipv6_chk_acast_addr_src() function. Here are the steps to reproduce it: modprobe sit ip link add sit1 type sit remote 10.16.0.121 local 10.16.0.249 ip l s sit1 up ip -6 a a dev sit1 2001🔢:123 remote 2001🔢:121 ping6 2001🔢:121 Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 18:12:40 -08:00
Rashika Kheria	e1d83ee673	net: Mark functions as static in net/sunrpc/svc_xprt.c Mark functions as static in net/sunrpc/svc_xprt.c because they are not used outside this file. This eliminates the following warning in net/sunrpc/svc_xprt.c: net/sunrpc/svc_xprt.c:574:5: warning: no previous prototype for ‘svc_alloc_arg’ [-Wmissing-prototypes] net/sunrpc/svc_xprt.c:615:18: warning: no previous prototype for ‘svc_get_next_xprt’ [-Wmissing-prototypes] net/sunrpc/svc_xprt.c:694:6: warning: no previous prototype for ‘svc_add_new_temp_xprt’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:50 -08:00
Rashika Kheria	bd76ed36ba	net: Include appropriate header file in netfilter/nft_lookup.c Include appropriate header file net/netfilter/nf_tables_core.h in net/netfilter/nft_lookup.c because it has prototype declaration of functions defined in net/netfilter/nft_lookup.c. This eliminates the following warning in net/netfilter/nft_lookup.c: net/netfilter/nft_lookup.c:133:12: warning: no previous prototype for ‘nft_lookup_module_init’ [-Wmissing-prototypes] net/netfilter/nft_lookup.c:138:6: warning: no previous prototype for ‘nft_lookup_module_exit’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:50 -08:00
Rashika Kheria	535d3ae9c8	net: Move prototype declaration to header file include/net/net_namespace.h from net/ipx/af_ipx.c Move prototype declaration of function to header file include/net/net_namespace.h from net/ipx/af_ipx.c because they are used by more than one file. This eliminates the following warning in net/ipx/sysctl_net_ipx.c: net/ipx/sysctl_net_ipx.c:33:6: warning: no previous prototype for ‘ipx_register_sysctl’ [-Wmissing-prototypes] net/ipx/sysctl_net_ipx.c:38:6: warning: no previous prototype for ‘ipx_unregister_sysctl’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:50 -08:00
Rashika Kheria	7780d8ae4a	net: Move prototype declaration to header file include/net/datalink.h from net/ipx/af_ipx.c Move prototype declarations of function to header file include/net/datalink.h from net/ipx/af_ipx.c because they are used by more than one file. This eliminates the following warning in net/ipx/pe2.c: net/ipx/pe2.c:20:24: warning: no previous prototype for ‘make_EII_client’ [-Wmissing-prototypes] net/ipx/pe2.c:32:6: warning: no previous prototype for ‘destroy_EII_client’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:50 -08:00
Rashika Kheria	578efbc19f	net: Move prototype declaration to header file include/net/ipx.h from net/ipx/af_ipx.c Move prototype declaration of functions to header file include/net/ipx.h from net/ipx/af_ipx.c because they are used by more than one file. This eliminates the following warning in net/ipx/ipx_route.c:33:19: warning: no previous prototype for ‘ipxrtr_lookup’ [-Wmissing-prototypes] net/ipx/ipx_route.c:52:5: warning: no previous prototype for ‘ipxrtr_add_route’ [-Wmissing-prototypes] net/ipx/ipx_route.c:94:6: warning: no previous prototype for ‘ipxrtr_del_routes’ [-Wmissing-prototypes] net/ipx/ipx_route.c:149:5: warning: no previous prototype for ‘ipxrtr_route_skb’ [-Wmissing-prototypes] net/ipx/ipx_route.c:171:5: warning: no previous prototype for ‘ipxrtr_route_packet’ [-Wmissing-prototypes] net/ipx/ipx_route.c:261:5: warning: no previous prototype for ‘ipxrtr_ioctl’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:50 -08:00
Rashika Kheria	493cc5e5ba	net: Move prototype declaration to include/net/ipx.h from net/ipx/ipx_route.c Move prototype definition of function to header file include/net/ipx.h from net/ipx/ipx_route.c because they are used by more than one file. This eliminates the following warning from net/ipx/af_ipx.c: net/ipx/af_ipx.c:193:23: warning: no previous prototype for ‘ipxitf_find_using_net’ [-Wmissing-prototypes] net/ipx/af_ipx.c:577:5: warning: no previous prototype for ‘ipxitf_send’ [-Wmissing-prototypes] net/ipx/af_ipx.c:1219:8: warning: no previous prototype for ‘ipx_cksum’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:49 -08:00
Rashika Kheria	ab3301bd96	net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c Move prototype declaration of functions to header file include/net/dn.h from net/decnet/af_decnet.c because they are used by more than one file. This eliminates the following warning in net/decnet/af_decnet.c: net/decnet/sysctl_net_decnet.c:354:6: warning: no previous prototype for ‘dn_register_sysctl’ [-Wmissing-prototypes] net/decnet/sysctl_net_decnet.c:359:6: warning: no previous prototype for ‘dn_unregister_sysctl’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:49 -08:00
Rashika Kheria	f56b8bf6e4	net: Move prototype declaration to appropriate header file from decnet/af_decnet.c Move prototype declaration of functions to header file include/net/dn_route.h from net/decnet/af_decnet.c because it is used by more than one file. This eliminates the following warning in net/decnet/dn_route.c: net/decnet/dn_route.c:629:5: warning: no previous prototype for ‘dn_route_rcv’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:49 -08:00
Rashika Kheria	0a59f3a9fd	net: Mark functions as static in core/dev.c Mark functions as static in core/dev.c because they are not used outside this file. This eliminates the following warning in core/dev.c: net/core/dev.c:2806:5: warning: no previous prototype for ‘__dev_queue_xmit’ [-Wmissing-prototypes] net/core/dev.c:4640:5: warning: no previous prototype for ‘netdev_adjacent_sysfs_add’ [-Wmissing-prototypes] net/core/dev.c:4650:6: warning: no previous prototype for ‘netdev_adjacent_sysfs_del’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:49 -08:00
Rashika Kheria	02fe72c9ed	net: Include appropriate header file in caif/cfsrvl.c Include appropriate header file net/caif/caif_dev.h in caif/cfsrvl.c because it has prototype declaration of functions defined in caif/cfsrvl.c. This eliminates the following warning in caif/cfsrvl.c: net/caif/cfsrvl.c:198:6: warning: no previous prototype for ‘caif_free_client’ [-Wmissing-prototypes] net/caif/cfsrvl.c:208:6: warning: no previous prototype for ‘caif_client_register_refcnt’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:49 -08:00
Rashika Kheria	8203274e15	net: Include appropriate header file in caif/caif_dev.c Include appropriate header file net/caif/caif_dev.h in caif/caif_dev.c because it has prototype declarations of function defined in caif/caif_dev.c. This eliminates the following file in caif/caif_dev.c: net/caif/caif_dev.c:303:6: warning: no previous prototype for ‘caif_enroll_dev’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:49 -08:00
Rashika Kheria	fd944bded5	net: Mark function as static in 9p/client.c Mark function as static in net/9p/client.c because it is not used outside this file. This eliminates the following warning in net/9p/client.c: net/9p/client.c:207:18: warning: no previous prototype for ‘p9_fcall_alloc’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:49 -08:00
David S. Miller	872c7e6fd2	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Please pull this batch of fixes intended for the 3.14 stream! For the mac80211 bits, Johannes says: "This is just a collection of small fixes, the commit logs explain the details. The only thing that isn't strictly a fix is the 5/10 MHz enabling, I had forgotten this and there's little point in waiting longer. The patch simply removes the force-disable code that I put in when there was a problem with the userspace API (that has long been fixed.)" For the iwlwifi bits, Emmanuel says: "I have an important fix that disables A band in case the driver thought it was enabled, and the firmware disagreed. We ended up making the firmware unhappy. I also fix the station table in AP mode and fix the scan while we have BT working. Johannes removes a static variable that could potentially lead to to issues on multi-device setups and disables scheduled scan to avoid issues with old versions of wpa_supplicant. A small fix from David on scan and a few new device IDs for 7265." On top of that... Oleksij Rempel adds a USB ID to the ar5523 driver and changes the default powersave setting for ath9k_htc to "off", due to observed stability issues (based on an equivalent ath9k patch). Stanislaw Gruszka similarly disables powersave for a couple of rt2x00 drivers. He also fixes a couple of scheduling while atomic issues in ath9k_htc. Sujith Manoharan rounds-out the powersave disables with one for ath9k. He also fixes a build prolem with ath9k on ARM and fixes an ath9k Tx power calculation. Finally, Andrea Merello fixes a couple of lingering DMA mapping problems in the rtl8180 driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 14:21:25 -08:00
David S. Miller	f41f031960	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/nftables/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes, mostly nftables fixes, most relevantly they are: * Fix a crash in the h323 conntrack NAT helper due to expectation list corruption, from Alexey Dobriyan. * A couple of RCU race fixes for conntrack, one manifests by hitting BUG_ON in nf_nat_setup_info() and the destroy path, patches from Andrey Vagin and me. * Dump direction attribute in nft_ct only if it is set, from Arturo Borrero. * Fix IPVS bug in its own connection tracking system that may lead to copying only 4 bytes of the IPv6 address when initializing the ip_vs_conn object, from Michal Kubecek. * Fix -EBUSY errors in nftables when deleting the rules, chain and tables in a row due mixture of asynchronous and synchronous object releasing, from me. * Three fixes for the nf_tables set infrastructure when using intervals and mappings, from me. * Four patches to fixing the nf_tables log, reject and ct expressions from the new inet table, from Patrick McHardy. * Fix memory overrun in the map that is used to dynamically allocate names from anonymous sets, also from Patrick. * Fix a potential oops if you dump a set with NFPROTO_UNSPEC and a table name, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 14:20:00 -08:00
Ilya Dryomov	0ec1d15ec6	libceph: do not dereference a NULL bio pointer Commit `f38a5181d9` ("ceph: Convert to immutable biovecs") introduced a NULL pointer dereference, which broke rbd in -rc1. Fix it. Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-02-07 11:37:07 -08:00
Ilya Dryomov	ff513ace9b	libceph: take map_sem for read in handle_reply() Handling redirect replies requires both map_sem and request_mutex. Taking map_sem unconditionally near the top of handle_reply() avoids possible race conditions that arise from releasing request_mutex to be able to acquire map_sem in redirect reply case. (Lock ordering is: map_sem, request_mutex, crush_mutex.) Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-02-07 10:45:53 -08:00
Ilya Dryomov	0bbfdfe8d2	libceph: factor out logic from ceph_osdc_start_request() Factor out logic from ceph_osdc_start_request() into a new helper, __ceph_osdc_start_request(). ceph_osdc_start_request() now amounts to taking locks and calling __ceph_osdc_start_request(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-02-07 10:45:42 -08:00
John W. Linville	0f96b860bc	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-02-07 13:44:14 -05:00
Patrick McHardy	6d8c00d58e	netfilter: nf_tables: unininline nft_trace_packet() It makes no sense to inline a rarely used function meant for debugging only that is called a total of five times in the main evaluation loop. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-07 17:50:27 +01:00
Pablo Neira Ayuso	62f9c8b40d	netfilter: nf_tables: fix loop checking with end interval elements Fix access to uninitialized data for end interval elements. The element data part is uninitialized in interval end elements. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-07 17:21:45 +01:00
Pablo Neira Ayuso	2fb91ddbf8	netfilter: nft_rbtree: fix data handling of end interval elements This patch fixes several things which related to the handling of end interval elements: * Chain use underflow with intervals and map: If you add a rule using intervals+map that introduces a loop, the error path of the rbtree set decrements the chain refcount for each side of the interval, leading to a chain use counter underflow. * Don't copy the data part of the end interval element since, this area is uninitialized and this confuses the loop detection code. * Don't allocate room for the data part of end interval elements since this is unused. So, after this patch the idea is that end interval elements don't have a data part. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net>	2014-02-07 14:22:06 +01:00
Pablo Neira Ayuso	bd7fc645da	netfilter: nf_tables: do not allow NFT_SET_ELEM_INTERVAL_END flag and data This combination is not allowed since end interval elements cannot contain data. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net>	2014-02-07 14:21:49 +01:00
Eric Dumazet	4a5ab4e224	tcp: remove 1ms offset in srtt computation TCP pacing depends on an accurate srtt estimation. Current srtt estimation is using jiffie resolution, and has an artificial offset of at least 1 ms, which can produce slowdowns when FQ/pacing is used, especially in DC world, where typical rtt is below 1 ms. We are planning a switch to usec resolution for linux-3.15, but in the meantime, this patch removes the 1 ms offset. All we need is to have tp->srtt minimal value of 1 to differentiate the case of srtt being initialized or not, not 8. The problematic behavior was observed on a 40Gbit testbed, where 32 concurrent netperf were reaching 12Gbps of aggregate speed, instead of line speed. This patch also has the effect of reporting more accurate srtt and send rates to iproute2 ss command as in : $ ss -i dst cca2 Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port tcp ESTAB 0 0 10.244.129.1:56984 10.244.129.2:12865 cubic wscale:6,6 rto:200 rtt:0.25/0.25 ato:40 mss:1448 cwnd:10 send 463.4Mbps rcv_rtt:1 rcv_space:29200 tcp ESTAB 0 390960 10.244.129.1:60247 10.244.129.2:50204 cubic wscale:6,6 rto:200 rtt:0.875/0.75 mss:1448 cwnd:73 ssthresh:51 send 966.4Mbps unacked:73 retrans:0/121 rcv_space:29200 Reported-by: Vytautas Valancius <valas@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-06 21:28:06 -08:00
Cong Wang	dbe173079a	bridge: fix netconsole setup over bridge Commit `93d8bf9fb8` ("bridge: cleanup netpoll code") introduced a check in br_netpoll_enable(), but this check is incorrect for br_netpoll_setup(). This patch moves the code after the check into __br_netpoll_enable() and calls it in br_netpoll_setup(). For br_add_if(), the check is still needed. Fixes: `93d8bf9fb8` ("bridge: cleanup netpoll code") Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Tested-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-06 21:28:06 -08:00
Eric Dumazet	ed98df3361	net: use __GFP_NORETRY for high order allocations sock_alloc_send_pskb() & sk_page_frag_refill() have a loop trying high order allocations to prepare skb with low number of fragments as this increases performance. Problem is that under memory pressure/fragmentation, this can trigger OOM while the intent was only to try the high order allocations, then fallback to order-0 allocations. We had various reports from unexpected regressions. According to David, setting __GFP_NORETRY should be fine, as the asynchronous compaction is still enabled, and this will prevent OOM from kicking as in : CFSClientEventm invoked oom-killer: gfp_mask=0x42d0, order=3, oom_adj=0, oom_score_adj=0, oom_score_badness=2 (enabled),memcg_scoring=disabled CFSClientEventm Call Trace: [<ffffffff8043766c>] dump_header+0xe1/0x23e [<ffffffff80437a02>] oom_kill_process+0x6a/0x323 [<ffffffff80438443>] out_of_memory+0x4b3/0x50d [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7 [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0 [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0 [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160 [<ffffffff80295fa0>] tcp_sendmsg+0x560/0xee0 [<ffffffff802a5037>] inet_sendmsg+0x67/0x100 [<ffffffff80283c9c>] __sock_sendmsg_nosec+0x6c/0x90 [<ffffffff80283e85>] sock_sendmsg+0xc5/0xf0 [<ffffffff802847b6>] __sys_sendmsg+0x136/0x430 [<ffffffff80284ec8>] sys_sendmsg+0x88/0x110 [<ffffffff80711472>] system_call_fastpath+0x16/0x1b Out of Memory: Kill process 2856 (bash) score 9999 or sacrifice child Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-06 21:28:06 -08:00
Sabrina Dubroca	00fe11b3c6	netpoll: fix netconsole IPv6 setup Currently, to make netconsole start over IPv6, the source address needs to be specified. Without a source address, netpoll_parse_options assumes we're setting up over IPv4 and the destination IPv6 address is rejected. Check if the IP version has been forced by a source address before checking for a version mismatch when parsing the destination address. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-06 21:28:06 -08:00
Matija Glavinic Pecotic	661dbf3412	net: sctp: fix initialization of local source address on accepted ipv6 sockets commit `efe4208f47`: 'ipv6: make lookups simpler and faster' broke initialization of local source address on accepted ipv6 sockets. Before the mentioned commit receive address was copied along with the contents of ipv6_pinfo in sctp_v6_create_accept_sk. Now when it is moved, it has to be copied separately. This also fixes lksctp's ipv6 regression in a sense that test_getname_v6, TC5 - 'getsockname on a connected server socket' now passes. Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-06 21:18:06 -08:00
Geert Uytterhoeven	63b5f152eb	ipv4: Fix runtime WARNING in rtmsg_ifa() On m68k/ARAnyM: WARNING: CPU: 0 PID: 407 at net/ipv4/devinet.c:1599 0x316a99() Modules linked in: CPU: 0 PID: 407 Comm: ifconfig Not tainted 3.13.0-atari-09263-g0c71d68014d1 #1378 Stack from 10c4fdf0: 10c4fdf0 002ffabb 000243e8 00000000 008ced6c 00024416 00316a99 0000063f 00316a99 00000009 00000000 002501b4 00316a99 0000063f c0a86117 00000080 c0a86117 00ad0c90 00250a5a 00000014 00ad0c90 00000000 00000000 00000001 00b02dd0 00356594 00000000 00356594 c0a86117 eff6c9e4 008ced6c 00000002 008ced60 0024f9b4 00250b52 00ad0c90 00000000 00000000 00252390 00ad0c90 eff6c9e4 0000004f 00000000 00000000 eff6c9e4 8000e25c eff6c9e4 80001020 Call Trace: [<000243e8>] warn_slowpath_common+0x52/0x6c [<00024416>] warn_slowpath_null+0x14/0x1a [<002501b4>] rtmsg_ifa+0xdc/0xf0 [<00250a5a>] __inet_insert_ifa+0xd6/0x1c2 [<0024f9b4>] inet_abc_len+0x0/0x42 [<00250b52>] inet_insert_ifa+0xc/0x12 [<00252390>] devinet_ioctl+0x2ae/0x5d6 Adding some debugging code reveals that net_fill_ifaddr() fails in put_cacheinfo(skb, ifa->ifa_cstamp, ifa->ifa_tstamp, preferred, valid)) nla_put complains: lib/nlattr.c:454: skb_tailroom(skb) = 12, nla_total_size(attrlen) = 20 Apparently commit `5c766d642b` ("ipv4: introduce address lifetime") forgot to take into account the addition of struct ifa_cacheinfo in inet_nlmsg_size(). Hence add it, like is already done for ipv6. Suggested-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-06 20:02:15 -08:00
Pablo Neira Ayuso	0165d9325d	netfilter: nf_tables: fix racy rule deletion We may lost race if we flush the rule-set (which happens asynchronously via call_rcu) and we try to remove the table (that userspace assumes to be empty). Fix this by recovering synchronous rule and chain deletion. This was introduced time ago before we had no batch support, and synchronous rule deletion performance was not good. Now that we have the batch support, we can just postpone the purge of old rule in a second step in the commit phase. All object deletions are synchronous after this patch. As a side effect, we save memory as we don't need rcu_head per rule anymore. Cc: Patrick McHardy <kaber@trash.net> Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 11:46:06 +01:00
Patrick McHardy	b8ecbee67c	netfilter: nf_tables: fix log/queue expressions for NFPROTO_INET The log and queue expressions both store the family during ->init() and use it to deliver packets. This is wrong when used in NFPROTO_INET since they should both deliver to the actual AF of the packet, not the dummy NFPROTO_INET. Use the family from the hook ops to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 11:41:38 +01:00
Johannes Berg	fab57a6cc2	mac80211: fix virtual monitor interface iteration During channel context assignment, the interface should be found by interface iteration, so we need to assign the pointer before the channel context. Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Tested-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:22 +01:00
Johannes Berg	338f977f4e	mac80211: fix fragmentation code, particularly for encryption The "new" fragmentation code (since my rewrite almost 5 years ago) erroneously sets skb->len rather than using skb_trim() to adjust the length of the first fragment after copying out all the others. This leaves the skb tail pointer pointing to after where the data originally ended, and thus causes the encryption MIC to be written at that point, rather than where it belongs: immediately after the data. The impact of this is that if software encryption is done, then a) encryption doesn't work for the first fragment, the connection becomes unusable as the first fragment will never be properly verified at the receiver, the MIC is practically guaranteed to be wrong b) we leak up to 8 bytes of plaintext (!) of the packet out into the air This is only mitigated by the fact that many devices are capable of doing encryption in hardware, in which case this can't happen as the tail pointer is irrelevant in that case. Additionally, fragmentation is not used very frequently and would normally have to be configured manually. Fix this by using skb_trim() properly. Cc: stable@vger.kernel.org Fixes: `2de8e0d999` ("mac80211: rewrite fragmentation") Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:21 +01:00
Sujith Manoharan	d4c80d9df6	mac80211: Fix IBSS disconnect Currently, when a station leaves an IBSS network, the corresponding BSS is not dropped from cfg80211 if there are other active stations in the network. But, the small window that is present when trying to determine a station's status based on IEEE80211_IBSS_MERGE_INTERVAL introduces a race. Instead of trying to keep the BSS, always remove it when leaving an IBSS network. There is not much benefit to retain the BSS entry since it will be added with a subsequent join operation. This fixes an issue where a dangling BSS entry causes ath9k to wait for a beacon indefinitely. Cc: <stable@vger.kernel.org> Reported-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:20 +01:00
Emmanuel Grumbach	0297ea17bf	mac80211: release the channel in error path in start_ap When the driver cannot start the AP or when the assignement of the beacon goes wrong, we need to unassign the vif. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:20 +01:00
Johannes Berg	f9d15d162b	cfg80211: send scan results from work queue Due to the previous commit, when a scan finishes, it is in theory possible to hit the following sequence: 1. interface starts being removed 2. scan is cancelled by driver and cfg80211 is notified 3. scan done work is scheduled 4. interface is removed completely, rdev->scan_req is freed, event sent to userspace but scan done work remains pending 5. new scan is requested on another virtual interface 6. scan done work runs, freeing the still-running scan To fix this situation, hang on to the scan done message and block new scans while that is the case, and only send the message from the work function, regardless of whether the scan_req is already freed from interface removal. This makes step 5 above impossible and changes step 6 to be 5. scan done work runs, sending the scan done message As this can't work for wext, so we send the message immediately, but this shouldn't be an issue since we still return -EBUSY. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:19 +01:00
Johannes Berg	a617302c53	cfg80211: fix scan done race When an interface/wdev is removed, any ongoing scan should be cancelled by the driver. This will make it call cfg80211, which only queues a work struct. If interface/wdev removal is quick enough, this can leave the scan request pending and processed only after the interface is gone, causing a use-after-free. Fix this by making sure the scan request is not pending after the interface is destroyed. We can't flush or cancel the work item due to locking concerns, but when it'll run it shouldn't find anything to do. This leaves a potential issue, if a new scan gets requested before the work runs, it prematurely stops the running scan, potentially causing another crash. I'll fix that in the next patch. This was particularly observed with P2P_DEVICE wdevs, likely because freeing them is quicker than freeing netdevs. Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Fixes: `4a58e7c384` ("cfg80211: don't "leak" uncompleted scans") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:19 +01:00
Emmanuel Grumbach	8ffcc704c9	mac80211: avoid deadlock revealed by lockdep sdata->u.ap.request_smps_work can’t be flushed synchronously under wdev_lock(wdev) since ieee80211_request_smps_ap_work itself locks the same lock. While at it, reset the driver_smps_mode when the ap is stopped to its default: OFF. This solves: ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0-ipeer+ #2 Tainted: G O ------------------------------------------------------- rmmod/2867 is trying to acquire lock: ((&sdata->u.ap.request_smps_work)){+.+...}, at: [<c105b8d0>] flush_work+0x0/0x90 but task is already holding lock: (&wdev->mtx){+.+.+.}, at: [<f9b32626>] cfg80211_stop_ap+0x26/0x230 [cfg80211] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&wdev->mtx){+.+.+.}: [<c10aefa9>] lock_acquire+0x79/0xe0 [<c1607a1a>] mutex_lock_nested+0x4a/0x360 [<fb06288b>] ieee80211_request_smps_ap_work+0x2b/0x50 [mac80211] [<c105cdd8>] process_one_work+0x198/0x450 [<c105d469>] worker_thread+0xf9/0x320 [<c10669ff>] kthread+0x9f/0xb0 [<c1613397>] ret_from_kernel_thread+0x1b/0x28 -> #0 ((&sdata->u.ap.request_smps_work)){+.+...}: [<c10ae9df>] __lock_acquire+0x183f/0x1910 [<c10aefa9>] lock_acquire+0x79/0xe0 [<c105b917>] flush_work+0x47/0x90 [<c105d867>] __cancel_work_timer+0x67/0xe0 [<c105d90f>] cancel_work_sync+0xf/0x20 [<fb0765cc>] ieee80211_stop_ap+0x8c/0x340 [mac80211] [<f9b3268c>] cfg80211_stop_ap+0x8c/0x230 [cfg80211] [<f9b0d8f9>] cfg80211_leave+0x79/0x100 [cfg80211] [<f9b0da72>] cfg80211_netdev_notifier_call+0xf2/0x4f0 [cfg80211] [<c160f2c9>] notifier_call_chain+0x59/0x130 [<c106c6de>] __raw_notifier_call_chain+0x1e/0x30 [<c106c70f>] raw_notifier_call_chain+0x1f/0x30 [<c14f8213>] call_netdevice_notifiers_info+0x33/0x70 [<c14f8263>] call_netdevice_notifiers+0x13/0x20 [<c14f82a4>] __dev_close_many+0x34/0xb0 [<c14f83fe>] dev_close_many+0x6e/0xc0 [<c14f9c77>] rollback_registered_many+0xa7/0x1f0 [<c14f9dd4>] unregister_netdevice_many+0x14/0x60 [<fb06f4d9>] ieee80211_remove_interfaces+0xe9/0x170 [mac80211] [<fb055116>] ieee80211_unregister_hw+0x56/0x110 [mac80211] [<fa3e9396>] iwl_op_mode_mvm_stop+0x26/0xe0 [iwlmvm] [<f9b9d8ca>] _iwl_op_mode_stop+0x3a/0x70 [iwlwifi] [<f9b9d96f>] iwl_opmode_deregister+0x6f/0x90 [iwlwifi] [<fa405179>] __exit_compat+0xd/0x19 [iwlmvm] [<c10b8bf9>] SyS_delete_module+0x179/0x2b0 [<c1613421>] sysenter_do_call+0x12/0x32 Fixes: `687da13223` ("mac80211: implement SMPS for AP") Cc: <stable@vger.kernel.org> [3.13] Reported-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:18 +01:00
Johannes Berg	5a6aa705ff	cfg80211: re-enable 5/10 MHz support Unfortunately I forgot this during the merge window, but the patch seems small enough to go in as a fix. The userspace API bug that was the reason for disabling it has long been fixed. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:18 +01:00
Pontus Fuchs	f12cb28930	nl80211: Reset split_start when netlink skb is exhausted When the netlink skb is exhausted split_start is left set. In the subsequent retry, with a larger buffer, the dump is continued from the failing point instead of from the beginning. This was causing my rt28xx based USB dongle to now show up when running "iw list" with an old iw version without split dump support. Cc: stable@vger.kernel.org Fixes: `3713b4e364` ("nl80211: allow splitting wiphy information in dumps") Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com> [avoid the entire workaround when state->split is set] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:17 +01:00
Eliad Peller	2f617435c3	mac80211: move roc cookie assignment earlier ieee80211_start_roc_work() might add a new roc to existing roc, and tell cfg80211 it has already started. However, this might happen before the roc cookie was set, resulting in REMAIN_ON_CHANNEL (started) event with null cookie. Consequently, it can make wpa_supplicant go out of sync. Fix it by setting the roc cookie earlier. Cc: stable@vger.kernel.org Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2014-02-06 09:55:17 +01:00
Patrick McHardy	05513e9e33	netfilter: nf_tables: add reject module for NFPROTO_INET Add a reject module for NFPROTO_INET. It does nothing but dispatch to the AF-specific modules based on the hook family. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 09:44:18 +01:00
Patrick McHardy	cc4723ca31	netfilter: nft_reject: split up reject module into IPv4 and IPv6 specifc parts Currently the nft_reject module depends on symbols from ipv6. This is wrong since no generic module should force IPv6 support to be loaded. Split up the module into AF-specific and a generic part. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 09:44:10 +01:00
Patrick McHardy	64d46806b6	netfilter: nf_tables: add AF specific expression support For the reject module, we need to add AF-specific implementations to get rid of incorrect module dependencies. Try to load an AF-specific module first and fall back to generic modules. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 00:05:36 +01:00
Patrick McHardy	51292c0735	netfilter: nft_ct: fix missing NFT_CT_L3PROTOCOL key in validity checks The key was missing in the list of valid keys, add it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 00:05:33 +01:00
Patrick McHardy	ec2c993568	netfilter: nf_tables: fix potential oops when dumping sets Commit `c9c8e48597` (netfilter: nf_tables: dump sets in all existing families) changed nft_ctx_init_from_setattr() to only look up the address family if it is not NFPROTO_UNSPEC. However if it is NFPROTO_UNSPEC and a table attribute is given, nftables_afinfo_lookup() will dereference the NULL afi pointer. Fix by checking for non-NULL afi and also move a check added by that commit to the proper position. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 00:04:15 +01:00
Patrick McHardy	53b70287dd	netfilter: nf_tables: fix overrun in nf_tables_set_alloc_name() The map that is used to allocate anonymous sets is indeed BITS_PER_BYTE * PAGE_SIZE long. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 17:46:07 +01:00
Pablo Neira Ayuso	e53376bef2	netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt With this patch, the conntrack refcount is initially set to zero and it is bumped once it is added to any of the list, so we fulfill Eric's golden rule which is that all released objects always have a refcount that equals zero. Andrey Vagin reports that nf_conntrack_free can't be called for a conntrack with non-zero ref-counter, because it can race with nf_conntrack_find_get(). A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero ref-counter says that this conntrack is used. So when we release a conntrack with non-zero counter, we break this assumption. CPU1 CPU2 ____nf_conntrack_find() nf_ct_put() destroy_conntrack() ... init_conntrack __nf_conntrack_alloc (set use = 1) atomic_inc_not_zero(&ct->use) (use = 2) if (!l4proto->new(ct, skb, dataoff, timeouts)) nf_conntrack_free(ct); (use = 2 !!!) ... __nf_conntrack_alloc (set use = 1) if (!nf_ct_key_equal(h, tuple, zone)) nf_ct_put(ct); (use = 0) destroy_conntrack() /* continue to work with CT */ After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get" another bug was triggered in destroy_conntrack(): <4>[67096.759334] ------------[ cut here ]------------ <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211! ... <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack] <4>[67096.760255] Call Trace: <4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30 <4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack] <4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack] <4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4] <4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910 <4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0 <4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140 <4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140 <4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60 <4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20 <4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40 <4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190 <4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200 <4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290 <4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210 <4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 I have reused the original title for the RFC patch that Andrey posted and most of the original patch description. Cc: Eric Dumazet <edumazet@google.com> Cc: Andrew Vagin <avagin@parallels.com> Cc: Florian Westphal <fw@strlen.de> Reported-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-02-05 17:46:06 +01:00
Alexey Dobriyan	829d9315c4	netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report() Similar bug fixed in SIP module in `3f509c6` ("netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation"). BUG: unable to handle kernel paging request at 00100104 IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack] ... Call Trace: [<c0244bd8>] ? del_timer+0x48/0x70 [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack] [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack] [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack] [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack] [<c024442d>] call_timer_fn+0x1d/0x80 [<c024461e>] run_timer_softirq+0x18e/0x1a0 [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack] [<c023e6f3>] __do_softirq+0xa3/0x170 [<c023e650>] ? __local_bh_enable+0x70/0x70 <IRQ> [<c023e587>] ? irq_exit+0x67/0xa0 [<c0202af6>] ? do_IRQ+0x46/0xb0 [<c027ad05>] ? clockevents_notify+0x35/0x110 [<c066ac6c>] ? common_interrupt+0x2c/0x40 [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0 [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100 [<c02085f8>] ? arch_cpu_idle+0x8/0x30 [<c027314b>] ? cpu_idle_loop+0x4b/0x140 [<c0273258>] ? cpu_startup_entry+0x18/0x20 [<c066056d>] ? rest_init+0x5d/0x70 [<c0813ac8>] ? start_kernel+0x2ec/0x2f2 [<c081364f>] ? repair_env_string+0x5b/0x5b [<c0813269>] ? i386_start_kernel+0x33/0x35 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 17:46:05 +01:00
Andrey Vagin	c6825c0976	netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get Lets look at destroy_conntrack: hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); ... nf_conntrack_free(ct) kmem_cache_free(net->ct.nf_conntrack_cachep, ct); net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU. The hash is protected by rcu, so readers look up conntracks without locks. A conntrack is removed from the hash, but in this moment a few readers still can use the conntrack. Then this conntrack is released and another thread creates conntrack with the same address and the equal tuple. After this a reader starts to validate the conntrack: * It's not dying, because a new conntrack was created * nf_ct_tuple_equal() returns true. But this conntrack is not initialized yet, so it can not be used by two threads concurrently. In this case BUG_ON may be triggered from nf_nat_setup_info(). Florian Westphal suggested to check the confirm bit too. I think it's right. task 1 task 2 task 3 nf_conntrack_find_get ____nf_conntrack_find destroy_conntrack hlist_nulls_del_rcu nf_conntrack_free kmem_cache_free __nf_conntrack_alloc kmem_cache_alloc memset(&ct->tuplehash[IP_CT_DIR_MAX], if (nf_ct_is_dying(ct)) if (!nf_ct_tuple_equal() I'm not sure, that I have ever seen this race condition in a real life. Currently we are investigating a bug, which is reproduced on a few nodes. In our case one conntrack is initialized from a few tasks concurrently, we don't have any other explanation for this. <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322! ... <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>] [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat] ... <4>[46267.085549] Call Trace: <4>[46267.085622] [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat] <4>[46267.085697] [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat] <4>[46267.085770] [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat] <4>[46267.085843] [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat] <4>[46267.085919] [<ffffffff814841b9>] nf_iterate+0x69/0xb0 <4>[46267.085991] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086063] [<ffffffff81484374>] nf_hook_slow+0x74/0x110 <4>[46267.086133] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086207] [<ffffffff814b5890>] ? dst_output+0x0/0x20 <4>[46267.086277] [<ffffffff81495204>] ip_output+0xa4/0xc0 <4>[46267.086346] [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910 <4>[46267.086419] [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0 <4>[46267.086491] [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50 <4>[46267.086562] [<ffffffff81444d67>] sock_sendmsg+0x117/0x140 <4>[46267.086638] [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20 <4>[46267.086712] [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40 <4>[46267.086785] [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80 <4>[46267.086858] [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20 <4>[46267.086936] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087006] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087081] [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0 <4>[46267.087151] [<ffffffff81445599>] sys_sendto+0x139/0x190 <4>[46267.087229] [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0 <4>[46267.087303] [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200 <4>[46267.087378] [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290 <4>[46267.087454] [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210 <4>[46267.087531] [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210 <4>[46267.087607] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74 <1>[46267.088023] RIP [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 13:16:18 +01:00
Patrick McHardy	3dd7279fb6	netfilter: nf_tables: fix oops when deleting a chain with references The following commands trigger an oops: # nft -i nft> add table filter nft> add chain filter input { type filter hook input priority 0; } nft> add chain filter test nft> add rule filter input jump test nft> delete chain filter test We need to check the chain use counter before allowing destruction since we might have references from sets or jump rules. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69341 Reported-by: Matthew Ife <deleriux1@gmail.com> Tested-by: Matthew Ife <deleriux1@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 13:16:17 +01:00
Arturo Borrero	2a53bfb3e0	netfilter: nft_ct: fix unconditional dump of 'dir' attr We want to make sure that the information that we get from the kernel can be reinjected without troubles. The kernel shouldn't return an attribute that is not required, or even prohibited. Dumping unconditionally NFTA_CT_DIRECTION could lead an application in userspace to interpret that the attribute was originally set, while it was not. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 13:16:17 +01:00
Andy Zhou	c14e0953ca	openvswitch: Suppress error messages on megaflow updates With subfacets, we'd expect megaflow updates message to carry the original micro flow. If not, EINVAL is returned and kernel logs an error message. Now that the user space subfacet layer is removed, it is expected that flow updates can arrive with a micro flow other than the original. Change the return code to EEXIST and remove the kernel error log message. Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-02-04 22:32:38 -08:00
Pravin B Shelar	e4c6d75954	openvswitch: Fix ovs_flow_free() ovs-lock assert. ovs_flow_free() is not called under ovs-lock during packet execute path (ovs_packet_cmd_execute()). Since packet execute does not touch flow->mask, there is no need to take that lock either. So move assert in case where flow->mask is checked. Found by code inspection. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-02-04 22:21:45 -08:00
Daniele Di Proietto	45fb9c35b2	openvswitch: Fix ovs_dp_cmd_msg_size() commit `43d4be9cb5` (openvswitch: Allow user space to announce ability to accept unaligned Netlink messages) introduced OVS_DP_ATTR_USER_FEATURES netlink attribute in datapath responses, but the attribute size was not taken into account in ovs_dp_cmd_msg_size(). Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-02-04 22:21:23 -08:00
Andy Zhou	e80857cce8	openvswitch: Fix kernel panic on ovs_flow_free Both mega flow mask's reference counter and per flow table mask list should only be accessed when holding ovs_mutex() lock. However this is not true with ovs_flow_table_flush(). The patch fixes this bug. Reported-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-02-04 22:21:17 -08:00
Thomas Graf	aea0bb4f8e	openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed While the zerocopy method is correctly omitted if user space does not support unaligned Netlink messages. The attribute is still not padded correctly as skb_zerocopy() will not ensure padding and the attribute size is no longer pre calculated though nla_reserve() which ensured padding previously. This patch applies appropriate padding if a linear data copy was performed in skb_zerocopy(). Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2014-02-04 22:21:11 -08:00
Fernando Luis Vazquez Cao	6049f2530c	rtnetlink: fix oops in rtnl_link_get_slave_info_data_size We should check whether rtnetlink link operations are defined before calling get_slave_size(). Without this, the following oops can occur when adding a tap device to OVS. [ 87.839553] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 87.839595] IP: [<ffffffff813d47c0>] if_nlmsg_size+0xf0/0x220 [...] [ 87.840651] Call Trace: [ 87.840664] [<ffffffff813d694b>] ? rtmsg_ifinfo+0x2b/0x100 [ 87.840688] [<ffffffff813c8340>] ? __netdev_adjacent_dev_insert+0x150/0x1a0 [ 87.840718] [<ffffffff813d6a50>] ? rtnetlink_event+0x30/0x40 [ 87.840742] [<ffffffff814b4144>] ? notifier_call_chain+0x44/0x70 [ 87.840768] [<ffffffff813c8946>] ? __netdev_upper_dev_link+0x3c6/0x3f0 [ 87.840798] [<ffffffffa0678d6c>] ? netdev_create+0xcc/0x160 [openvswitch] [ 87.840828] [<ffffffffa06781ea>] ? ovs_vport_add+0x4a/0xd0 [openvswitch] [ 87.840857] [<ffffffffa0670139>] ? new_vport+0x9/0x50 [openvswitch] [ 87.840884] [<ffffffffa067279e>] ? ovs_vport_cmd_new+0x11e/0x210 [openvswitch] [ 87.840915] [<ffffffff813f3efa>] ? genl_family_rcv_msg+0x19a/0x360 [ 87.840941] [<ffffffff813f40c0>] ? genl_family_rcv_msg+0x360/0x360 [ 87.840967] [<ffffffff813f4139>] ? genl_rcv_msg+0x79/0xc0 [ 87.840991] [<ffffffff813b6cf9>] ? __kmalloc_reserve.isra.25+0x29/0x80 [ 87.841018] [<ffffffff813f2389>] ? netlink_rcv_skb+0xa9/0xc0 [ 87.841042] [<ffffffff813f27cf>] ? genl_rcv+0x1f/0x30 [ 87.841064] [<ffffffff813f1988>] ? netlink_unicast+0xe8/0x1e0 [ 87.841088] [<ffffffff813f1d9a>] ? netlink_sendmsg+0x31a/0x750 [ 87.841113] [<ffffffff813aee96>] ? sock_sendmsg+0x86/0xc0 [ 87.841136] [<ffffffff813c960d>] ? __netdev_update_features+0x4d/0x200 [ 87.841163] [<ffffffff813ca94e>] ? ethtool_get_value+0x2e/0x50 [ 87.841188] [<ffffffff813af269>] ? ___sys_sendmsg+0x359/0x370 [ 87.841212] [<ffffffff813da686>] ? dev_ioctl+0x1a6/0x5c0 [ 87.841236] [<ffffffff8109c210>] ? autoremove_wake_function+0x30/0x30 [ 87.841264] [<ffffffff813ac59d>] ? sock_do_ioctl+0x3d/0x50 [ 87.841288] [<ffffffff813aca68>] ? sock_ioctl+0x1e8/0x2c0 [ 87.841312] [<ffffffff811934bf>] ? do_vfs_ioctl+0x2cf/0x4b0 [ 87.841335] [<ffffffff813afeb9>] ? __sys_sendmsg+0x39/0x70 [ 87.841362] [<ffffffff814b86f9>] ? system_call_fastpath+0x16/0x1b [ 87.841386] Code: c0 74 10 48 89 ef ff d0 83 c0 07 83 e0 fc 48 98 49 01 c7 48 89 ef e8 d0 d6 fe ff 48 85 c0 0f 84 df 00 00 00 48 8b 90 08 07 00 00 <48> 8b 8a a8 00 00 00 31 d2 48 85 c9 74 0c 48 89 ee 48 89 c7 ff [ 87.841529] RIP [<ffffffff813d47c0>] if_nlmsg_size+0xf0/0x220 [ 87.841555] RSP <ffff880221aa5950> [ 87.841569] CR2: 00000000000000a8 [ 87.851442] ---[ end trace e42ab217691b4fc2 ]--- Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-04 20:28:51 -08:00
Shlomo Pongratz	a664a4f7aa	net/ipv4: Use proper RCU APIs for writer-side in udp_offload.c RCU writer side should use rcu_dereference_protected() and not rcu_dereference(), fix that. This also removes the "suspicious RCU usage" warning seen when running with CONFIG_PROVE_RCU. Also, don't use rcu_assign_pointer/rcu_dereference for pointers which are invisible beyond the udp offload code. Fixes: `b582ef0` ('net: Add GRO support for UDP encapsulating protocols') Reported-by: Eric Dumazet <edumazet@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-04 20:01:55 -08:00
Michal Kubecek	2a971354e7	ipvs: fix AF assignment in ip_vs_conn_new() If a fwmark is passed to ip_vs_conn_new(), it is passed in vaddr, not daddr. Therefore we should set AF to AF_UNSPEC in vaddr assignment (like we do in ip_vs_ct_in_get()), otherwise we may copy only first 4 bytes of an IPv6 address into cp->daddr. Signed-off-by: Bogdano Arendartchuk <barendartchuk@suse.com> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-02-04 21:13:47 +09:00
Eric Dumazet	b045d37bd6	ip_tunnel: fix panic in ip_tunnel_xmit() Setting rt variable to NULL at the beginning of ip_tunnel_xmit() missed possible use of this variable as a scratch value. Also fixes a possible dst leak in tunnel_dst_check() : If we had to call tunnel_dst_reset(), we forgot to release the reference on dst. Merges tunnel_dst_get()/tunnel_dst_check() into a single tunnel_rtable_get() function for clarity. Many thanks to Tommi for his report and tests. Fixes: `7d442fab0a` ("ipv4: Cache dst in tunnels") Reported-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Tommi Rantala <tt.rantala@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-03 13:01:48 -08:00
Ilya Dryomov	c172ec5c8d	libceph: fix error handling in ceph_osdc_init() msgpool_op_reply message pool isn't destroyed if workqueue construction fails. Fix it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-02-03 19:20:59 +02:00
Linus Torvalds	8a1f006ad3	NFS client bugfixes for Linux 3.14 Highlights: - Fix several races in nfs_revalidate_mapping - NFSv4.1 slot leakage in the pNFS files driver - Stable fix for a slot leak in nfs40_sequence_done - Don't reject NFSv4 servers that support ACLs with only ALLOW aces -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJS7Bb+AAoJEGcL54qWCgDyDuQP/17nKR5e6MLhixcAbvlcH+pN 8CGolAM3HmRXDWUW/PkBH3UguG8Tzx1Ex26vIxipPeTSwZabf6194Twj6L97DEGZ 2SouD158BW1TkAbhEN/alKB/4ZCPos05iXjZkrL7MRff+8FD0UvWR2pBT1F2jQdY ZftG76Q72qhZHfH07ZMxM/v4Oy2Ge98RDD35gfuuqMSjHpmN9tiB55PeheW33LVY fu6I/JEwmlJpgy2qUcDv7v0V4mDpjC7XbcjjHpMHL8zp/C5Rx/rdgt9OQPlwmjdV FD8MWNXLc5TWxIouLDFPVUv3WZPjyu449QHS9Wc95fSqsHcdl4j4SwLAoSvUIdHt vDI5PtWhw3WAezbtiuCQnT0xcoIOn5bLjOVP13taDcV9vlZLcFlyOpZ5gVE4/Yju zm4sCW2+imDc74ERGa4fukF6QhzzAVmD8RlCJwuNzwCfXiZ36+xSanMYiPoUiwLL OVNgymrm0fe7GVFQKWN2D+Vr68OQEmARO+KfA3UzP5rQV+9CU8zSHjbcoRWZ59QG VahOS5WDLQSrMp8W37yAHH9IiAWveAAKJJTHlOniRqH90QYPgyW18fTo7YcpW313 AQGFgr/1n4t27MWRLu5rdoN5v8+kwNi0UV6oboNIPoP1v15NkEMvc7HKFj5M883R qEYfe5wqN/eRNj68NT/+ =B7f0 -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights: - Fix several races in nfs_revalidate_mapping - NFSv4.1 slot leakage in the pNFS files driver - Stable fix for a slot leak in nfs40_sequence_done - Don't reject NFSv4 servers that support ACLs with only ALLOW aces" * tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: initialize the ACL support bits to zero. NFSv4.1: Cleanup NFSv4.1: Clean up nfs41_sequence_done NFSv4: Fix a slot leak in nfs40_sequence_done NFSv4.1 free slot before resending I/O to MDS nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING NFS: Fix races in nfs_revalidate_mapping sunrpc: turn warn_gssd() log message into a dprintk() NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping nfs: handle servers that support only ALLOW ACE type.	2014-01-31 15:39:07 -08:00
PaX Team	2def2ef2ae	x86, x32: Correct invalid use of user timespec in the kernel The x32 case for the recvmsg() timout handling is broken: asmlinkage long compat_sys_recvmmsg(int fd, struct compat_mmsghdr __user mmsg, unsigned int vlen, unsigned int flags, struct compat_timespec __user timeout) { int datagrams; struct timespec ktspec; if (flags & MSG_CMSG_COMPAT) return -EINVAL; if (COMPAT_USE_64BIT_TIME) return __sys_recvmmsg(fd, (struct mmsghdr __user )mmsg, vlen, flags \| MSG_CMSG_COMPAT, (struct timespec ) timeout); ... The timeout pointer parameter is provided by userland (hence the __user annotation) but for x32 syscalls it's simply cast to a kernel pointer and is passed to __sys_recvmmsg which will eventually directly dereference it for both reading and writing. Other callers to __sys_recvmmsg properly copy from userland to the kernel first. The bug was introduced by commit `ee4fa23c4b` ("compat: Use COMPAT_USE_64BIT_TIME in net/compat.c") and should affect all kernels since 3.4 (and perhaps vendor kernels if they backported x32 support along with this code). Note that CONFIG_X86_X32_ABI gets enabled at build time and only if CONFIG_X86_X32 is enabled and ld can build x32 executables. Other uses of COMPAT_USE_64BIT_TIME seem fine. This addresses CVE-2014-0038. Signed-off-by: PaX Team <pageexec@freemail.hu> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-30 18:44:13 -08:00
David S. Miller	65b80cae7a	linux-can-fixes-for-3.14-20140129 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEUEABECAAYFAlLpas0ACgkQjTAFq1RaXHN8KwCeM2I+jaknMl9xJIauEG4FHhwj UvoAlRLtN1IlYbnhta5iHmFycdyN158= =M6Dq -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-3.14-20140129' of git://gitorious.org/linux-can/linux-can linux-can-fixes-for-3.14-20140129 Marc Kleine-Budde says: ==================== Arnd Bergmann provides a fix for the flexcan driver, enabling compilation on all combinations of big and little endian on ARM and PowerPc. A patch by Ira W. Snyder fixes uninitialized variable warnings in the janz-ican3 driver. Rostislav Lisovy contributes a patch to propagate the SO_PRIORITY of raw sockets to skbs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-30 16:48:17 -08:00

1 2 3 4 5 ...

31611 Commits