linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
David Ahern	74b20582ac	net: l3mdev: Add hook in ip and ipv6 Currently the VRF driver uses the rx_handler to switch the skb device to the VRF device. Switching the dev prior to the ip / ipv6 layer means the VRF driver has to duplicate IP/IPv6 processing which adds overhead and makes features such as retaining the ingress device index more complicated than necessary. This patch moves the hook to the L3 layer just after the first NF_HOOK for PRE_ROUTING. This location makes exposing the original ingress device trivial (next patch) and allows adding other NF_HOOKs to the VRF driver in the future. dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb with the switched device through the packet taps to maintain current behavior (tcpdump can be used on either the vrf device or the enslaved devices). Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-11 19:31:40 -04:00
Nicolas Dichtel	ca4aa976f0	ipv6: fix 4in6 tunnel receive path Protocol for 4in6 tunnel is IPPROTO_IPIP. This was wrongly changed by the last cleanup. CC: Tom Herbert <tom@herbertland.com> Fixes: `0d3c703a9d` ("ipv6: Cleanup IPv6 tunnel receive path") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-11 19:23:55 -04:00
Jiri Benc	e271c7b442	gre: do not keep the GRE header around in collect medata mode For ipgre interface in collect metadata mode, it doesn't make sense for the interface to be of ARPHRD_IPGRE type. The outer header of received packets is not needed, as all the information from it is present in metadata_dst. We already don't set ipgre_header_ops for collect metadata interfaces, which is the only consumer of mac_header pointing to the outer IP header. Just set the interface type to ARPHRD_NONE in collect metadata mode for ipgre (not gretap, that still correctly stays ARPHRD_ETHER) and reset mac_header. Fixes: `a64b04d86d` ("gre: do not assign header_ops in collect metadata mode") Fixes: `2e15ea390e` ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-11 15:16:32 -04:00
Joe Stringer	16ec3d4fbb	openvswitch: Fix cached ct with helper. When using conntrack helpers from OVS, a common configuration is to perform a lookup without specifying a helper, then go through a firewalling policy, only to decide to attach a helper afterwards. In this case, the initial lookup will cause a ct entry to be attached to the skb, then the later commit with helper should attach the helper and confirm the connection. However, the helper attachment has been missing. If the user has enabled automatic helper attachment, then this issue will be masked as it will be applied in init_conntrack(). It is also masked if the action is executed from ovs_packet_cmd_execute() as that will construct a fresh skb. This patch fixes the issue by making an explicit call to try to assign the helper if there is a discrepancy between the action's helper and the current skb->nfct. Fixes: `cae3a26275` ("openvswitch: Allow attaching helpers to ct action") Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-11 15:14:56 -04:00
Lawrence Brakmo	756ee1729b	tcp: replace cnt & rtt with struct in pkts_acked() Replace 2 arguments (cnt and rtt) in the congestion control modules' pkts_acked() function with a struct. This will allow adding more information without having to modify existing congestion control modules (tcp_nv in particular needs bytes in flight when packet was sent). As proposed by Neal Cardwell in his comments to the tcp_nv patch. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-11 14:43:19 -04:00
Jamal Hadi Salim	4e8c861550	net sched: ife action fix late binding The process below was broken and is fixed with this patch. //add an ife action and give it an instance id of 1 sudo tc actions add action ife encode \ type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1 //create a filter which binds to ife action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:11 action ife index 1 Message before fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 23:50:15 -04:00
Jamal Hadi Salim	5e1567aeb7	net sched: skbedit action fix late binding The process below was broken and is fixed with this patch. //add a skbedit action and give it an instance id of 1 sudo tc actions add action skbedit mark 10 index 1 //create a filter which binds to skbedit action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action skbedit index 1 Message before fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 23:50:15 -04:00
Jamal Hadi Salim	0e5538ab2b	net sched: simple action fix late binding The process below was broken and is fixed with this patch. //add a simple action and give it an instance id of 1 sudo tc actions add action simple sdata "foobar" index 1 //create a filter which binds to simple action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action simple index 1 Message before fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 23:50:15 -04:00
Jamal Hadi Salim	87dfbdc6c7	net sched: mirred action fix late binding The process below was broken and is fixed with this patch. //add an mirred action and give it an instance id of 1 sudo tc actions add action mirred egress mirror dev $MDEV index 1 //create a filter which binds to mirred action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action mirred index 1 Message before bug fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 23:50:15 -04:00
Jamal Hadi Salim	a57f19d30b	net sched: ipt action fix late binding This was broken and is fixed with this patch. //add an ipt action and give it an instance id of 1 sudo tc actions add action ipt -j mark --set-mark 2 index 1 //create a filter which binds to ipt action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32\ match ip dst 17.0.0.1/32 flowid 1:10 action ipt index 1 Message before bug fix was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 23:50:15 -04:00
Jamal Hadi Salim	5026c9b1ba	net sched: vlan action fix late binding Late vlan action binding was broken and is fixed with this patch. //add a vlan action to pop and give it an instance id of 1 sudo tc actions add action vlan pop index 1 //create filter which binds to vlan action id 1 sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \ match ip dst 17.0.0.1/32 flowid 1:1 action vlan index 1 current message(before bug fix) was: RTNETLINK answers: Invalid argument We have an error talking to the kernel Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 23:50:15 -04:00
David S. Miller	cf88585b1d	Included changes: - remove useless skb size check in batadv_interface_rx - basic netns support introduced by Andrew Lunn: - prevent virtual interface from changing netns by setting NETIF_F_NETNS_LOCAL - create virtual interface within the netns of the first hard-interface - introduce detection of complex bridge loops and report event to the user (via udev) when the Bridge Loop Avoidance mechanism can't prevent them - minor reference counting bugfixes for the hard_iface object that couldn't make it via the net tree - use kref_get() instead of kref_get_unless_zero() to make reference counting bug more visible - use batadv_compare_eth() all over the code when possible instead of plain memcmp() - minor code cleanup and style adjustments -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXMjUdAAoJEJ4aZjxxc6bKhDsP/i3rRwoppaDK3gxuDkXqrqOj AYhMJYulcizK2PPCzj24bZSttwcKDNWpjR67rBgXY02NpqSsW+qYFnCm+KokzgC0 3NGV+lC6oCwzowP3FJtW06BCNOybnbR84Fw6JCRL5P4mKoGnhn4jAOIYlYnk4K7z ByhEallLD84h5oo2Td8i06QzxzaO6lkYaLrAVBQnd+Cs27foxjMOO0gasyCvUkl7 Jsm2n0NSEN6SYv0smLhiPHx9oF66MBAxNRVS6dL/6Ko/vSHNy1AGQgx0SztlHzCQ K6JCpOZn5CExvKZUMXQXOrSkpdjIxVh+EGnGBAGyNswqBg49Y3K45FqNz4b+R0Pg ckPzc7UAo8z+ZHzPz4mzx/596sw9xG4jLS6aHkjwet7kLXj/ShP//HqGwfHTeeOY w4a2ers6ByCq2PwKYz34s/sDvC2+p++ZdyJF+qOjJpNjbH1B+mDId8M1f9H/9rvF //EZhCcdlpqLIxvQKDMADQRpcwTbAPglsBuHs/1Sk1zjRK1NDRPw+46pUUQOb8ND ydAs0JVUOgSKHZT+3b5l85poRYpp8gX8nVrvF2DNJBqmBTvSkAbZpNPiSy6fNWQV he3Q9RndqQq2OPffG1ijte7G1m9J8lCh7Nzl6Ee6vGsw4GzzkKIx9qHHCtXhYpOE kdg5ttgUK5sfgjsn3ANp =CB1D -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Antonio Quartulli says: ==================== Included changes: - remove useless skb size check in batadv_interface_rx - basic netns support introduced by Andrew Lunn: - prevent virtual interface from changing netns by setting NETIF_F_NETNS_LOCAL - create virtual interface within the netns of the first hard-interface - introduce detection of complex bridge loops and report event to the user (via udev) when the Bridge Loop Avoidance mechanism can't prevent them - minor reference counting bugfixes for the hard_iface object that couldn't make it via the net tree - use kref_get() instead of kref_get_unless_zero() to make reference counting bug more visible - use batadv_compare_eth() all over the code when possible instead of plain memcmp() - minor code cleanup and style adjustments ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 23:36:14 -04:00
Sowmini Varadhan	953abb3823	skbuff: remove unused variable `doff' There are two instances of an unused variable, `doff' added by commit `6fa01ccd88` ("skbuff: Add pskb_extract() helper function") in pskb_carve_inside_header() and pskb_carve_inside_nonlinear(). Remove these instances, they are not used. Reported by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 16:05:12 -04:00
Tom Herbert	1ddb6b71b9	ila: ipv6/ila: fix nlsize calculation for lwtunnel The handler 'ila_fill_encap_info' adds two attributes: ILA_ATTR_LOCATOR and ILA_ATTR_CSUM_MODE. nla_total_size_64bit() must be use for ILA_ATTR_LOCATOR. Also, do nla_put_u8 instead of nla_put_u64 for ILA_ATTR_CSUM_MODE. Fixes: `f13a82d87b` ("ipv6: use nla_put_u64_64bit()") Fixes: `90bfe662db` ("ila: add checksum neutral ILA translations") Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 16:00:25 -04:00
Eric Dumazet	10a81980fc	tcp: refresh skb timestamp at retransmit time In the very unlikely case __tcp_retransmit_skb() can not use the cloning done in tcp_transmit_skb(), we need to refresh skb_mstamp before doing the copy and transmit, otherwise TCP TS val will be an exact copy of original transmit. Fixes: `7faee5c0d5` ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 15:58:41 -04:00
Antonio Quartulli	676970e55b	batman-adv: use batadv_compare_eth when possible When comparing Ethernet address it is better to use the more generic batadv_compare_eth. The latter is also optimised for architectures having a fast unaligned access. Signed-off-by: Antonio Quartulli <a@unstable.cc> [sven@narfation.org: fix conflicts with current version] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2016-05-10 18:28:54 +08:00
Marek Lindner	9d1601ef43	batman-adv: replace ethertype variable with ETH_P_BATMAN for readability Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:54 +08:00
Sven Eckelmann	4b426b108a	batman-adv: Use bool as return type for boolean functions It is easier to understand that the returned value of a specific function doesn't have to be 0 when the functions was successful when the actual return type is bool. This is especially true when all surrounding functions with return type int use negative values to return the error code. Reported-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:54 +08:00
Sven Eckelmann	f0b94ebccd	batman-adv: Use kref_get for _batadv_update_route _batadv_update_route requires that the caller already has a valid reference for neigh_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Sven Eckelmann	17a8691502	batman-adv: Use kref_get for hard_iface subfunctions The callers of the functions using batadv_hard_iface objects already make sure that they hold a valid reference. The subfunctions don't have to check whether the reference counter is > 0 because this was checked by the callers. The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Sven Eckelmann	c3ba37a778	batman-adv: Use kref_get for batadv_gw_node_add batadv_gw_node_add requires that the caller already has a valid reference for orig_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Sven Eckelmann	a08d497d67	batman-adv: Use kref_get for batadv_gw_select batadv_gw_select requires that the caller already has a valid reference for new_gw_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Sven Eckelmann	0de32ceee1	batman-adv: Use kref_get for batadv_nc_get_nc_node batadv_nc_get_nc_node requires that the caller already has a valid reference for orig_neigh_node. It is therefore not possible that it has an reference counter of 0 and was still given to this function The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Sven Eckelmann	c9dad805e9	batman-adv: Use kref_get for batadv_tvlv_container_get batadv_tvlv_container_get requires that tvlv.container_list_lock is held by the caller. It is therefore not possible that an item in tvlv.container_list has an reference counter of 0 and is still in the list The kref_get function instead WARNs (with debug information) when the reference counter would still be 0. This makes a bug in batman-adv better visible because kref_get_unless_zero would have ignored this problem. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Sven Eckelmann	d7d6de9530	batman-adv: Increase hard_iface refcnt for ptype The hard_iface is referenced in the packet_type for batman-adv. Increase the refcounter of the hard_interface for it to have an explicit reference for it in case this functionality gets refactorted and the currently used implicit reference for it will be removed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Sven Eckelmann	4fe56e60ac	batman-adv: Check hard_iface refcnt when receiving skb The receive function may start processing an incoming packet while the hard_iface is shut down in a different context. All called functions called with the batadv_hard_iface object belonging to the incoming interface would have to check whether the reference counter is still > 0. This is rather error-prone because this check can be forgotten easily. Instead check the reference counter when receiving the object to make sure that all called functions have a valid reference. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Sven Eckelmann	273534468f	batman-adv: Check hard_iface refcnt before calling function The batadv_hardif_list list is checked in many situations and the items in this list are given to specialized functions to modify the routing behavior. At the moment each of these called functions has to check itself whether the received batadv_hard_iface has a refcount > 0 before it can increase the reference counter and use it in other objects. This can easily lead to problems because it is not easily visible where all callers of a function got the batadv_hard_iface object from and whether they already hold a valid reference. Checking the reference counter directly before calling a subfunction with a pointer from the batadv_hardif_list avoids this problem. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:28:29 +08:00
Simon Wunderlich	cd9c7bfbba	batman-adv: add detection for complex bridge loops There are network setups where the current bridge loop avoidance can't detect bridge loops. The minimal setup affected would consist of two LANs and two separate meshes, connected in a ring like that: A...(mesh1)...B \| \| (LAN1) (LAN2) \| \| C...(mesh2)...D Since both the meshes and backbones are separate, the bridge loop avoidance has not enough information to detect and avoid the loop in this case. Even if these scenarios can't be fixed easily, these kind of loops can be detected. This patch implements a periodic check (running every 60 seconds for now) which sends a broadcast frame with a random MAC address on each backbone VLAN. If a broadcast frame with the same MAC address is received shortly after on the mesh, we know that there must be a loop and report that incident as well as throw an uevent to let others handle that problem. Signed-off-by: Simon Wunderlich <simon.wunderlich@open-mesh.com> [sven@narfation.org: fix conflicts with current version] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:26:45 +08:00
Andrew Lunn	2cd45a0671	batman-adv: Create batman soft interfaces within correct netns. When creating a soft interface, create it in the same netns as the hard interface. Replace all references to init_net with the correct name space for the interface being manipulated. Suggested-by: Daniel Ehlers <danielehlers@mindeye.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:26:44 +08:00
Andrew Lunn	0d21cdaa9b	batman-adv: NETIF_F_NETNS_LOCAL feature to prevent netns moves The batX soft interface should not be moved between network name spaces. This is similar to bridges, bonds, tunnels, which are not allowed to move between network namespaces. Suggested-by: Daniel Ehlers <danielehlers@mindeye.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Antonio Quartulli <a@unstable.cc> Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:26:44 +08:00
Sven Eckelmann	7142fc1072	batman-adv: Remove hdr_size skb size check in batadv_interface_rx The callers of batadv_interface_rx have to make sure that enough data can be pulled from the skb when they read the batman-adv header. The only two functions using it are either calling pskb_may_pull with hdr_size directly (batadv_recv_bcast_packet) or indirectly via batadv_check_unicast_packet (batadv_recv_unicast_packet). Reported-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:26:43 +08:00
Sven Eckelmann	6535db56d5	batman-adv: Remove unused parameter recv_if of batadv_interface_rx Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>	2016-05-10 18:26:43 +08:00
Arnd Bergmann	c047c3b1af	netfilter: conntrack: remove uninitialized shadow variable A recent commit introduced an unconditional use of an uninitialized variable, as reported in this gcc warning: net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_confirm': net/netfilter/nf_conntrack_core.c:632:33: error: 'ctinfo' may be used uninitialized in this function [-Werror=maybe-uninitialized] bytes = atomic64_read(&counter[CTINFO2DIR(ctinfo)].bytes); ^ net/netfilter/nf_conntrack_core.c:628:26: note: 'ctinfo' was declared here enum ip_conntrack_info ctinfo; The problem is that a local variable shadows the function parameter. This removes the local variable, which looks like what Pablo originally intended. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `71d8c47fc6` ("netfilter: conntrack: introduce clash resolution on insertion race") Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 01:04:04 -04:00
David S. Miller	adc0a8bfdc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contain Netfilter simple fixes for your net tree, two one-liner and one two-liner: 1) Oneliner to fix missing spinlock definition that triggers 'BUG: spinlock bad magic on CPU#' when spinlock debugging is enabled, from Florian Westphal. 2) Fix missing workqueue cancelation on IDLETIMER removal, from Liping Zhang. 3) Fix insufficient validation of netlink of NFACCT_QUOTA in nfnetlink_acct, from Phil Turnbull. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 00:50:20 -04:00
Tom Herbert	b45bd1d787	ip6_gre: Use correct flags for reading TUNNEL_SEQ Fix two spots where o_flags in a tunnel are being compared to GRE_SEQ instead of TUNNEL_SEQ. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 00:39:27 -04:00
Tom Herbert	4b4a0c9143	ip6: Don't set transport header in IPv6 tunneling We only need to reset network header here. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 00:39:27 -04:00
Tom Herbert	d27bff9ca2	ip6_gre: Set inner protocol correctly in __gre6_xmit Need to use adjusted protocol value for setting inner protocol. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 00:39:27 -04:00
Tom Herbert	f41fe3c2ac	gre6: Fix flag translations GRE for IPv6 does not properly translate for GRE flags to tunnel flags and vice versa. This patch fixes that. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 00:39:27 -04:00
Tom Herbert	db2ec95d1b	ip6_gre: Fix MTU setting In ip6gre_tnl_link_config set t->tun_len and t->hlen correctly for the configuration. For hard_header_len and mtu calculation include IPv6 header and encapsulation overhead. In ip6gre_tunnel_init_common set t->tun_len and t->hlen correctly for the configuration. Revert to setting hard_header_len instead of needed_headroom. Tested: ./ip link add name tun8 type ip6gretap remote \ 2401:db00:20:911a:face:0:27:0 local \ 2401:db00:20:911a:face:0:25:0 ttl 225 Gives MTU of 1434. That is equal to 1500 - 40 - 14 - 4 - 8. ./ip link add name tun8 type ip6gretap remote \ 2401:db00:20:911a:face:0:27:0 local \ 2401:db00:20:911a:face:0:25:0 ttl 225 okey 123 Gives MTU of 1430. That is equal to 1500 - 40 - 14 - 4 - 8 - 4. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-10 00:39:27 -04:00
Kangjie Lu	79e4865032	net: fix a kernel infoleak in x25 module Stack object "dte_facilities" is allocated in x25_rx_call_request(), which is supposed to be initialized in x25_negotiate_facilities. However, 5 fields (8 bytes in total) are not initialized. This object is then copied to userland via copy_to_user, thus infoleak occurs. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-09 22:45:33 -04:00
David Ahern	1ff23beebd	net: l3mdev: Allow send on enslaved interface Allow udp and raw sockets to send by oif that is an enslaved interface versus the l3mdev/VRF device. For example, this allows BFD to use ifindex from IP_PKTINFO on a receive to send a response without the need to convert to the VRF index. It also allows ping and ping6 to work when specifying an enslaved interface (e.g., ping -I swp1 <ip>) which is a natural use case. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-09 22:33:52 -04:00
David Ahern	4a65896f94	net: l3mdev: Move get_saddr and rt6_dst Move l3mdev_rt6_dst_by_oif and l3mdev_get_saddr to l3mdev.c. Collapse l3mdev_get_rt6_dst into l3mdev_rt6_dst_by_oif since it is the only user and keep the l3mdev_get_rt6_dst name for consistency with other hooks. A follow-on patch adds more code to these functions making them long for inlined functions. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-09 22:33:52 -04:00
David S. Miller	e800072c18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-09 15:59:24 -04:00
David S. Miller	e8ed77dfa9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following large patchset contains Netfilter updates for your net-next tree. My initial intention was to send you this in two goes but when I looked back twice I already had this burden on top of me. Several updates for IPVS from Marco Angaroni: 1) Allow SIP connections originating from real-servers to be load balanced by the SIP persistence engine as is already implemented in the other direction. 2) Release connections immediately for One-packet-scheduling (OPS) in IPVS, instead of making it via timer and rcu callback. 3) Skip deleting conntracks for each one packet in OPS, and don't call nf_conntrack_alter_reply() since no reply is expected. 4) Enable drop on exhaustion for OPS + SIP persistence. Miscelaneous conntrack updates from Florian Westphal, including fix for hash resize: 5) Move conntrack generation counter out of conntrack pernet structure since this is only used by the init_ns to allow hash resizing. 6) Use get_random_once() from packet path to collect hash random seed instead of our compound. 7) Don't disable BH from ____nf_conntrack_find() for statistics, use NF_CT_STAT_INC_ATOMIC() instead. 8) Fix lookup race during conntrack hash resizing. 9) Introduce clash resolution on conntrack insertion for connectionless protocol. Then, Florian's netns rework to get rid of per-netns conntrack table, thus we use one single table for them all. There was consensus on this change during the NFWS 2015 and, on top of that, it has recently been pointed as a source of multiple problems from unpriviledged netns: 11) Use a single conntrack hashtable for all namespaces. Include netns in object comparisons and make it part of the hash calculation. Adapt early_drop() to consider netns. 12) Use single expectation and NAT hashtable for all namespaces. 13) Use a single slab cache for all namespaces for conntrack objects. 14) Skip full table scanning from nf_ct_iterate_cleanup() if the pernet conntrack counter tells us the table is empty (ie. equals zero). Fixes for nf_tables interval set element handling, support to set conntrack connlabels and allow set names up to 32 bytes. 15) Parse element flags from element deletion path and pass it up to the backend set implementation. 16) Allow adjacent intervals in the rbtree set type for dynamic interval updates. 17) Add support to set connlabel from nf_tables, from Florian Westphal. 18) Allow set names up to 32 bytes in nf_tables. Several x_tables fixes and updates: 19) Fix incorrect use of IS_ERR_VALUE() in x_tables, original patch from Andrzej Hajda. And finally, miscelaneous netfilter updates such as: 20) Disable automatic helper assignment by default. Note this proc knob was introduced by `a900689264` ("netfilter: nf_ct_helper: allow to disable automatic helper assignment") 4 years ago to start moving towards explicit conntrack helper configuration via iptables CT target. 21) Get rid of obsolete and inconsistent debugging instrumentation in x_tables. 22) Remove unnecessary check for null after ip6_route_output(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-09 15:02:58 -04:00
Florian Westphal	0c5366b3a8	netfilter: conntrack: use single slab cache An earlier patch changed lookup side to also net_eq() namespaces after obtaining a reference on the conntrack, so a single kmemcache can be used. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-09 16:45:50 +02:00
Florian Westphal	a76ae1c855	netfilter: conntrack: use a single nat bysource table for all namespaces We already include netns address in the hash, so we only need to use net_eq in find_appropriate_src and can then put all entries into same table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-09 16:45:49 +02:00
Florian Westphal	464c38556e	netfilter: conntrack: make netns address part of nat bysrc hash Will be needed soon when we place all in the same hash table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-09 16:45:49 +02:00
Eric Dumazet	8a3a4c6e7b	net: make sch_handle_ingress() drop monitor ready TC_ACT_STOLEN is used when ingress traffic is mirred/redirected to say ifb. Packet is not dropped, but consumed. Only TC_ACT_SHOT is a clear indication something went wrong. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-08 23:53:22 -04:00
Eric Dumazet	95b58430ab	fq_codel: add memory limitation per queue On small embedded routers, one wants to control maximal amount of memory used by fq_codel, instead of controlling number of packets or bytes, since GRO/TSO make these not practical. Assuming skb->truesize is accurate, we have to keep track of skb->truesize sum for skbs in queue. This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute. I chose a default value of 32 MBytes, which looks reasonable even for heavy duty usages. (Prior fq_codel users should not be hurt when they upgrade their kernels) Two fields are added to tc_fq_codel_qd_stats to report : - Current memory usage - Number of drops caused by memory limits # tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M .. # tc -s -d qd sh dev eth1 qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0 requeues 21705223) rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223 maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414 ecn_mark 0 memory_used 4190048 drop_overmemory 4994406 new_flows_len 1 old_flows_len 177 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Dave Täht <dave.taht@gmail.com> Cc: Sebastian Möller <moeller0@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-08 23:49:38 -04:00
Courtney Cavin	bdabad3e36	net: Add Qualcomm IPC router Add an implementation of Qualcomm's IPC router protocol, used to communicate with service providing remote processors. Signed-off-by: Courtney Cavin <courtney.cavin@sonymobile.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> [bjorn: Cope with 0 being a valid node id and implement RTM_NEWADDR] Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-08 23:46:14 -04:00
Jarno Rajahalme	229740c631	udp_offload: Set encapsulation before inner completes. UDP tunnel segmentation code relies on the inner offsets being set for an UDP tunnel GSO packet, but the inner _complete() functions will set the inner offsets only if 'encapsulation' is set before calling them. Currently, udp_gro_complete() sets 'encapsulation' only after the inner _complete() functions are done. This causes the inner offsets having invalid values after udp_gro_complete() returns, which in turn will make it impossible to properly segment the packet in case it needs to be forwarded, which would be visible to the user either as invalid packets being sent or as packet loss. This patch fixes this by setting skb's 'encapsulation' in udp_gro_complete() before calling into the inner complete functions, and by making each possible UDP tunnel gro_complete() callback set the inner_mac_header to the beginning of the tunnel payload. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 18:25:26 -04:00
Jarno Rajahalme	43b8448cd7	udp_tunnel: Remove redundant udp_tunnel_gro_complete(). The setting of the UDP tunnel GSO type is already performed by udp[46]_gro_complete(). Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 18:25:26 -04:00
Eric Dumazet	47dcc20a39	ipv4: tcp: ip_send_unicast_reply() is not BH safe I forgot that ip_send_unicast_reply() is not BH safe (yet). Disabling preemption before calling it was not a good move. Fixes: `c10d9310ed` ("tcp: do not assume TCP code is non preemptible") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andres Lagar-Cavilla <andreslc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 16:02:43 -04:00
Alexei Starovoitov	db58ba4592	bpf: wire in data and data_end for cls_act_bpf allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers. The bpf helpers that change skb->data need to update data_end pointer as well. The verifier checks that programs always reload data, data_end pointers after calls to such bpf helpers. We cannot add 'data_end' pointer to struct qdisc_skb_cb directly, since it's embedded as-is by infiniband ipoib, so wrapper struct is needed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 16:01:54 -04:00
David Ahern	b3b4663c97	net: vrf: Create FIB tables on link create Tables have to exist for VRFs to function. Ensure they exist when VRF device is created. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 15:51:47 -04:00
David Ahern	1d2f7b2d95	net: ipv6: tcp reset, icmp need to consider L3 domain Responses for packets to unused ports are getting lost with L3 domains. IPv4 has ip_send_unicast_reply for sending TCP responses which accounts for L3 domains; update the IPv6 counterpart tcp_v6_send_response. For icmp the L3 master check needs to be moved up in icmp6_send to properly respond to UDP packets to a port with no listener. Fixes: `ca254490c8` ("net: Add VRF support to IPv6 stack") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 15:49:07 -04:00
Linus Lüssing	856ce5d083	bridge: fix igmp / mld query parsing With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMP and MLD query parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header. If there is a querier somewhere else, then this either causes the multicast snooping to stay disabled even though it could be enabled. Or, if we have the querier enabled too, then this can create unnecessary IGMP / MLD query messages on the link. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: `9afd85c9e4` ("net: Export IGMP/MLD message validation code") Reported-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 12:55:13 -04:00
Pablo Neira Ayuso	bbe848ec1a	Merge tag 'ipvs2-for-v4.7' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Second Round of IPVS Updates for v4.7 please consider these enhancements to the IPVS. They allow its DoS mitigation strategy effective in conjunction with the SIP persistence engine. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-06 11:50:52 +02:00
Florian Westphal	0a93aaedc4	netfilter: conntrack: use a single expectation table for all namespaces We already include netns address in the hash and compare the netns pointers during lookup, so even if namespaces have overlapping addresses entries will be spread across the expectation table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-06 11:50:01 +02:00
Florian Westphal	a9a083c387	netfilter: conntrack: make netns address part of expect hash Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-06 11:50:01 +02:00
Florian Westphal	03d7dc5cdf	netfilter: conntrack: check netns when walking expect hash Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-06 11:50:01 +02:00
Marco Angaroni	698e2a8dca	ipvs: make drop_entry protection effective for SIP-pe DoS protection policy that deletes connections to avoid out of memory is currently not effective for SIP-pe plus OPS-mode for two reasons: 1) connection templates (holding SIP call-id) are always skipped in ip_vs_random_dropentry() 2) in_pkts counter (used by drop_entry algorithm) is not incremented for connection templates This patch addresses such problems with the following changes: a) connection templates associated (via their dest) to virtual-services configured in OPS mode are included in ip_vs_random_dropentry() monitoring. This applies to SIP-pe over UDP (which requires OPS mode), but is more general principle: when OPS is controlled by templates memory can be used only by templates themselves, since OPS conns are deleted after packet is forwarded. b) OPS connections, if controlled by a template, cause increment of in_pkts counter of their template. This is already happening but only in case director is in master-slave mode (see ip_vs_sync_conn()). Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-05-06 16:26:23 +09:00
Nikolay Aleksandrov	31ca0458a6	net: bridge: fix old ioctl unlocked net device walk get_bridge_ifindices() is used from the old "deviceless" bridge ioctl calls which aren't called with rtnl held. The comment above says that it is called with rtnl but that is not really the case. Here's a sample output from a test ASSERT_RTNL() which I put in get_bridge_ifindices and executed "brctl show": [ 957.422726] RTNL: assertion failed at net/bridge//br_ioctl.c (30) [ 957.422925] CPU: 0 PID: 1862 Comm: brctl Tainted: G W O 4.6.0-rc4+ #157 [ 957.423009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 957.423009] 0000000000000000 ffff880058adfdf0 ffffffff8138dec5 0000000000000400 [ 957.423009] ffffffff81ce8380 ffff880058adfe58 ffffffffa05ead32 0000000000000001 [ 957.423009] 00007ffec1a444b0 0000000000000400 ffff880053c19130 0000000000008940 [ 957.423009] Call Trace: [ 957.423009] [<ffffffff8138dec5>] dump_stack+0x85/0xc0 [ 957.423009] [<ffffffffa05ead32>] br_ioctl_deviceless_stub+0x212/0x2e0 [bridge] [ 957.423009] [<ffffffff81515beb>] sock_ioctl+0x22b/0x290 [ 957.423009] [<ffffffff8126ba75>] do_vfs_ioctl+0x95/0x700 [ 957.423009] [<ffffffff8126c159>] SyS_ioctl+0x79/0x90 [ 957.423009] [<ffffffff8163a4c0>] entry_SYSCALL_64_fastpath+0x23/0xc1 Since it only reads bridge ifindices, we can use rcu to safely walk the net device list. Also remove the wrong rtnl comment above. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-05 23:32:30 -04:00
Ian Campbell	dedc58e067	VSOCK: do not disconnect socket when peer has shutdown SEND only The peer may be expecting a reply having sent a request and then done a shutdown(SHUT_WR), so tearing down the whole socket at this point seems wrong and breaks for me with a client which does a SHUT_WR. Looking at other socket family's stream_recvmsg callbacks doing a shutdown here does not seem to be the norm and removing it does not seem to have had any adverse effects that I can see. I'm using Stefan's RFC virtio transport patches, I'm unsure of the impact on the vmci transport. Signed-off-by: Ian Campbell <ian.campbell@docker.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Cc: Andy King <acking@vmware.com> Cc: Dmitry Torokhov <dtor@vmware.com> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Adit Ranadive <aditr@vmware.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-05 23:31:29 -04:00
Phil Turnbull	eda3fc50da	netfilter: nfnetlink_acct: validate NFACCT_QUOTA parameter If a quota bit is set in NFACCT_FLAGS but the NFACCT_QUOTA parameter is missing then a NULL pointer dereference is triggered. CAP_NET_ADMIN is required to trigger the bug. Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:47:08 +02:00
Pablo Neira Ayuso	cb39ad8b8e	netfilter: nf_tables: allow set names up to 32 bytes Currently, we support set names of up to 16 bytes, get this aligned with the maximum length we can use in ipset to make it easier when considering migration to nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:51 +02:00
Pablo Neira Ayuso	d7cdf81657	netfilter: x_tables: get rid of old and inconsistent debugging The dprintf() and duprintf() functions are enabled at compile time, these days we have better runtime debugging through pr_debug() and static keys. On top of this, this debugging is so old that I don't expect anyone using this anymore, so let's get rid of this. IP_NF_ASSERT() is still left in place, although this needs that NETFILTER_DEBUG is enabled, I think these assertions provide useful context information when reading the code. Note that ARP_NF_ASSERT() has been removed as there is no user of this. Kill also DEBUG_ALLOW_ALL and a couple of pr_error() and pr_debug() spots that are inconsistently placed in the code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:51 +02:00
Pablo Neira Ayuso	3b78155b1b	openvswitch: __nf_ct_l{3,4}proto_find() always return a valid pointer If the protocol is not natively supported, this assigns generic protocol tracker so we can always assume a valid pointer after these calls. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Joe Stringer <joe@ovn.org>	2016-05-05 16:39:50 +02:00
Pablo Neira Ayuso	71d8c47fc6	netfilter: conntrack: introduce clash resolution on insertion race This patch introduces nf_ct_resolve_clash() to resolve race condition on conntrack insertions. This is particularly a problem for connection-less protocols such as UDP, with no initial handshake. Two or more packets may race to insert the entry resulting in packet drops. Another problematic scenario are packets enqueued to userspace via NFQUEUE after the raw table, that make it easier to trigger this race. To resolve this, the idea is to reset the conntrack entry to the one that won race. Packet and bytes counters are also merged. The 'insert_failed' stats still accounts for this situation, after this patch, the drop counter is bumped whenever we drop packets, so we can watch for unresolved clashes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:50 +02:00
Pablo Neira Ayuso	ba76738c03	netfilter: conntrack: introduce nf_ct_acct_update() Introduce a helper function to update conntrack counters. __nf_ct_kill_acct() was unnecessarily subtracting skb_network_offset() that is expected to be zero from the ipv4/ipv6 hooks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:49 +02:00
Pablo Neira Ayuso	4b4ceb9dbf	netfilter: conntrack: __nf_ct_l4proto_find() always returns valid pointer Remove unnecessary check for non-nul pointer in destroy_conntrack() given that __nf_ct_l4proto_find() returns the generic protocol tracker if the protocol is not supported. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:48 +02:00
Florian Westphal	3e86638e9a	netfilter: conntrack: consider ct netns in early_drop logic When iterating, skip conntrack entries living in a different netns. We could ignore netns and kill some other non-assured one, but it has two problems: - a netns can kill non-assured conntracks in other namespace - we would start to 'over-subscribe' the affected/overlimit netns. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:48 +02:00
Florian Westphal	56d52d4892	netfilter: conntrack: use a single hashtable for all namespaces We already include netns address in the hash and compare the netns pointers during lookup, so even if namespaces have overlapping addresses entries will be spread across the table. Assuming 64k bucket size, this change saves 0.5 mbyte per namespace on a 64bit system. NAT bysrc and expectation hash is still per namespace, those will changed too soon. Future patch will also make conntrack object slab cache global again. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:47 +02:00
Florian Westphal	1b8c8a9f64	netfilter: conntrack: make netns address part of hash Once we place all conntracks into a global hash table we want them to be spread across entire hash table, even if namespaces have overlapping ip addresses. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:46 +02:00
Florian Westphal	e0c7d47221	netfilter: conntrack: check netns when comparing conntrack objects Once we place all conntracks in the same hash table we must also compare the netns pointer to skip conntracks that belong to a different namespace. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:46 +02:00
Florian Westphal	245cfdcaba	netfilter: conntrack: small refactoring of conntrack seq_printf The iteration process is lockless, so we test if the conntrack object is eligible for printing (e.g. is AF_INET) after obtaining the reference count. Once we put all conntracks into same hash table we might see more entries that need to be skipped. So add a helper and first perform the test in a lockless fashion for fast skip. Once we obtain the reference count, just repeat the check. Note that this refactoring also includes a missing check for unconfirmed conntrack entries due to slab rcu object re-usage, so they need to be skipped since they are not part of the listing. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:45 +02:00
Florian Westphal	868043485e	netfilter: conntrack: use nf_ct_key_equal() in more places This prepares for upcoming change that places all conntracks into a single, global table. For this to work we will need to also compare net pointer during lookup. To avoid open-coding such check use the nf_ct_key_equal helper and then later extend it to also consider net_eq. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:44 +02:00
Florian Westphal	88b68bc523	netfilter: conntrack: don't attempt to iterate over empty table Once we place all conntracks into same table iteration becomes more costly because the table contains conntracks that we are not interested in (belonging to other netns). So don't bother scanning if the current namespace has no entries. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:44 +02:00
Florian Westphal	5e3c61f981	netfilter: conntrack: fix lookup race during hash resize When resizing the conntrack hash table at runtime via echo 42 > /sys/module/nf_conntrack/parameters/hashsize, we are racing with the conntrack lookup path -- reads can happen in parallel and nothing prevents readers from observing a the newly allocated hash but the old size (or vice versa). So access to hash[bucket] can trigger OOB read access in case the table got expanded and we saw the new size but the old hash pointer (or it got shrunk and we got new hash ptr but the size of the old and larger table): kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc2+ #107 [..] Call Trace: [<ffffffff822c3d6a>] ? nf_conntrack_tuple_taken+0x12a/0xe90 [<ffffffff822c3ac1>] ? nf_ct_invert_tuplepr+0x221/0x3a0 [<ffffffff8230e703>] get_unique_tuple+0xfb3/0x2760 Use generation counter to obtain the address/length of the same table. Also add a synchronize_net before freeing the old hash. AFAICS, without it we might access ct_hash[bucket] after ct_hash has been freed, provided that lockless reader got delayed by another event: CPU1 CPU2 seq_begin seq_retry <delay> resize occurs free oldhash for_each(oldhash[size]) Note that resize is only supported in init_netns, it took over 2 minutes of constant resizing+flooding to produce the warning, so this isn't a big problem in practice. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:43 +02:00
Florian Westphal	2cf1234807	netfilter: conntrack: keep BH enabled during lookup No need to disable BH here anymore: stats are switched to _ATOMIC variant (== this_cpu_inc()), which nowadays generates same code as the non _ATOMIC NF_STAT, at least on x86. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:43 +02:00
Florian Westphal	1ad8f48df6	netfilter: nftables: add connlabel set support Conntrack labels are currently sized depending on the iptables ruleset, i.e. if we're asked to test or set bits 1, 2, and 65 then we would allocate enough room to store at least bit 65. However, with nft, the input is just a register with arbitrary runtime content. We therefore ask for the upper ceiling we currently have, which is enough room to store 128 bits. Alternatively, we could alter nf_connlabel_replace to increase net->ct.label_words at run time, but since 128 bits is not that big we'd only save sizeof(long) so it doesn't seem worth it for now. This follows a similar approach that xtables 'connlabel' match uses, so when user inputs ct label set bar then we will set the bit used by the 'bar' label and leave the rest alone. This is done by passing the sreg content to nf_connlabels_replace as both value and mask argument. Labels (bits) already set thus cannot be re-set to zero, but this is not supported by xtables connlabel match either. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:27:59 +02:00
Eric Dumazet	777c6ae57e	tcp: two more missing bh disable percpu_counter only have protection against preemption. TCP stack uses them possibly from BH, so we need BH protection in contexts that could be run in process context Fixes: `c10d9310ed` ("tcp: do not assume TCP code is non preemptible") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 23:47:54 -04:00
Eric Dumazet	614bdd4d6e	tcp: must block bh in __inet_twsk_hashdance() __inet_twsk_hashdance() might be called from process context, better block BH before acquiring bind hash and established locks Fixes: `c10d9310ed` ("tcp: do not assume TCP code is non preemptible") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:55:11 -04:00
Eric Dumazet	46cc6e4976	tcp: fix lockdep splat in tcp_snd_una_update() tcp_snd_una_update() and tcp_rcv_nxt_update() call u64_stats_update_begin() either from process context or BH handler. This triggers a lockdep splat on 32bit & SMP builds. We could add u64_stats_update_begin_bh() variant but this would slow down 32bit builds with useless local_disable_bh() and local_enable_bh() pairs, since we own the socket lock at this point. I add sock_owned_by_me() helper to have proper lockdep support even on 64bit builds, and new u64_stats_update_begin_raw() and u64_stats_update_end_raw methods. Fixes: `c10d9310ed` ("tcp: do not assume TCP code is non preemptible") Reported-by: Fabio Estevam <festevam@gmail.com> Diagnosed-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:55:11 -04:00
David S. Miller	32b583a0cb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2016-05-04 1) The flowcache can hit an OOM condition if too many entries are in the gc_list. Fix this by counting the entries in the gc_list and refuse new allocations if the value is too high. 2) The inner headers are invalid after a xfrm transformation, so reset the skb encapsulation field to ensure nobody tries access the inner headers. Otherwise tunnel devices stacked on top of xfrm may build the outer headers based on wrong informations. 3) Add pmtu handling to vti, we need it to report pmtu informations for local generated packets. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:35:31 -04:00
David S. Miller	5332174a83	In this pull request you have: - two changes to the MAINTAINERS file where one marks our mailing list as moderated and the other adds a missing documentation file - kernel-doc fixes - code refactoring and various cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXKRJdAAoJEJ4aZjxxc6bKSVEP/1Ky6O7+oanpsjjwUiZDMj0W KPtoPQ/VsJxu51e0OYi78jHtGned7xV+FLyFx1k8BwLPThYtd8ysDVqMqFAmXsRh JPOT+7Y+lf8/FBUYdKyJcsaoqAeRPXnY+p0vE7woLaxk+GOiWpOIip73nisgu9gy NxfmgJ77WjEV2v6IiD4djfYmZqOOvCF6IGWkubtc0WZdg5ma/2u7vYEDBy3yjN/b og/5joT3GZC8K6X8BabxNSLDER+qs489a6rOUGoRK4NCU3LhELAywuAws30nPrB/ vFJ6BvEEzkaGcXJViSFelb9zsi4ngwvY9OPQnFCmOicDzJN7jqdV6yXcnSLurph1 sDR+1+k1f63czCJpG8Uhj+8SaQO7P8T9A5nL1UKwhdCOENCuj8Vtp5y4S2A3bOSe jEv1dy9FC3yaPvtkyUN+wOuDerPoJr5pFuVRz2RGyeFSMxs+RPBLYf/D0+x1om9A Vz63ecsygk7S7qGNXHUbQvX5Q5Kv5f4y4XjvmrH3rBq+T/WC6V5vbkTo8L4CapX9 KNffNGl1RWqHz/TVLSrQmlHc9zNM/Rg0am2MIxplGfP0rQSUNob/qjD50KPJSLF/ M8tmOBSCNAxzlfAwcn+VJLq+xt6Mr2mkhwZZPYGQPno8JJqCMq52k4w1AQvbv+eI sxgFGvTq1WACUDx03vyx =qoV3 -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Antonio Quartulli says: ==================== pull request: batman-adv 20160504 In this pull request you have: - two changes to the MAINTAINERS file where one marks our mailing list as moderated and the other adds a missing documentation file - kernel-doc fixes - code refactoring and various cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:21:08 -04:00
Kangjie Lu	5f8e44741f	net: fix infoleak in rtnetlink The stack object “map” has a total size of 32 bytes. Its last 4 bytes are padding generated by compiler. These padding bytes are not initialized and sent out via “nla_put”. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:19:42 -04:00
Kangjie Lu	b8670c09f3	net: fix infoleak in llc The stack object “info” has a total size of 12 bytes. Its last byte is padding which is not initialized and leaked via “put_cmsg”. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 16:18:48 -04:00
Florian Westphal	9b36627ace	net: remove dev->trans_start previous patches removed all direct accesses to dev->trans_start, so change the netif_trans_update helper to update trans_start of netdev queue 0 instead and then remove trans_start from struct net_device. AFAICS a lot of the netif_trans_update() invocations are now useless because they occur in ndo_start_xmit and driver doesn't set LLTX (i.e. stack already took care of the update). As I can't test any of them it seems better to just leave them alone. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:16:50 -04:00
Florian Westphal	860e9538a9	treewide: replace dev->trans_start update with helper Replace all trans_start updates with netif_trans_update helper. change was done via spatch: struct net_device *d; @@ - d->trans_start = jiffies + netif_trans_update(d) Compile tested only. Cc: user-mode-linux-devel@lists.sourceforge.net Cc: linux-xtensa@linux-xtensa.org Cc: linux1394-devel@lists.sourceforge.net Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: MPT-FusionLinux.pdl@broadcom.com Cc: linux-scsi@vger.kernel.org Cc: linux-can@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linux-omap@vger.kernel.org Cc: linux-hams@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: linux-bluetooth@vger.kernel.org Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:16:49 -04:00
Arnd Bergmann	8bf42e9e51	gre6: add Kconfig dependency for NET_IPGRE_DEMUX The ipv6 gre implementation was cleaned up to share more code with the ipv4 version, but it can be enabled even when NET_IPGRE_DEMUX is disabled, resulting in a link error: net/built-in.o: In function `gre_rcv': :(.text+0x17f5d0): undefined reference to `gre_parse_header' ERROR: "gre_parse_header" [net/ipv6/ip6_gre.ko] undefined! This adds a Kconfig dependency to prevent that now invalid configuration. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `308edfdf15` ("gre6: Cleanup GREv6 receive path, call common GRE functions") Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:12:36 -04:00
Jiri Benc	125372faa4	gre: receive also TEB packets for lwtunnels For ipgre interfaces in collect metadata mode, receive also traffic with encapsulated Ethernet headers. The lwtunnel users are supposed to sort this out correctly. This allows to have mixed Ethernet + L3-only traffic on the same lwtunnel interface. This is the same way as VXLAN-GPE behaves. To keep backwards compatibility and prevent any surprises, gretap interfaces have priority in receiving packets with Ethernet headers. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:11:32 -04:00
Jiri Benc	244a797bdc	gre: move iptunnel_pull_header down to ipgre_rcv This will allow to make the pull dependent on the tunnel type. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:11:31 -04:00
Jiri Benc	00b2034029	gre: remove superfluous pskb_may_pull The call to gre_parse_header is either followed by iptunnel_pull_header, or in the case of ICMP error path, the actual header is not accessed at all. In the first case, iptunnel_pull_header will call pskb_may_pull anyway and it's pointless to do it twice. The only difference is what call will fail with what error code but the net effect is still the same in all call sites. In the second case, pskb_may_pull is pointless, as skb->data is at the outer IP header and not at the GRE header. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:11:31 -04:00
Alexander Duyck	b1dc497b28	net: Fix netdev_fix_features so that TSO_MANGLEID is only available with TSO This change makes it so that we will strip the TSO_MANGLEID bit if TSO is not present. This way we will also handle ECN correctly of TSO is not present. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 13:32:27 -04:00
Alexander Duyck	36c983824b	gso: Only allow GSO_PARTIAL if we can checksum the inner protocol This patch addresses a possible issue that can occur if we get into any odd corner cases where we support TSO for a given protocol but not the checksum or scatter-gather offload. There are few drivers floating around that setup their tunnels this way and by enforcing the checksum piece we can avoid mangling any frames. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 13:32:27 -04:00
Alexander Duyck	d7fb5a8049	gso: Do not perform partial GSO if number of partial segments is 1 or less In the event that the number of partial segments is equal to 1 we don't really need to perform partial segmentation offload. As such we should skip multiplying the MSS and instead just clear the partial_segs value since it will not provide any gain to advertise the frame as being GSO when it is a single frame. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 13:32:26 -04:00
Jiri Benc	f132ae7c46	gre: change gre_parse_header to return the header length It's easier for gre_parse_header to return the header length instead of filing it into a parameter. That way, the callers that don't care about the header length can just check whether the returned value is lower than zero. In gre_err, the tunnel header must not be pulled. See commit `b7f8fe251e` ("gre: do not pull header in ICMP error processing") for details. This patch reduces the conflict between the mentioned commit and commit `95f5c64c3c` ("gre: Move utility functions to common headers"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 12:44:45 -04:00
Eric Dumazet	d4011239f4	tcp: guarantee forward progress in tcp_sendmsg() Under high rx pressure, it is possible tcp_sendmsg() never has a chance to allocate an skb and loop forever as sk_flush_backlog() would always return true. Fix this by calling sk_flush_backlog() only if one skb had been allocated and filled before last backlog check. Fixes: `d41a69f1d3` ("tcp: make tcp_sendmsg() aware of socket backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 12:44:36 -04:00
David S. Miller	cba6532100	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 00:52:29 -04:00

1 2 3 4 5 ...

41987 Commits