linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-19 03:08:34 +07:00

Author	SHA1	Message	Date
Eric Dumazet	eb9344781a	tcp: add a force_schedule argument to sk_stream_alloc_skb() In commit `8e4d980ac2` ("tcp: fix behavior for epoll edge trigger") we fixed a possible hang of TCP sockets under memory pressure, by allowing sk_stream_alloc_skb() to use sk_forced_mem_schedule() if no packet is in socket write queue. It turns out there are other cases where we want to force memory schedule : tcp_fragment() & tso_fragment() need to split a big TSO packet into two smaller ones. If we block here because of TCP memory pressure, we can effectively block TCP socket from sending new data. If no further ACK is coming, this hang would be definitive, and socket has no chance to effectively reduce its memory usage. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-21 16:56:40 -04:00
Johannes Berg	f8bdbb5847	mac80211: add missing drv_priv description for TXQ struct The kernel-doc description for the drv_priv member of struct ieee80211_txq was missing, leading to errors. Add a suitable description to fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-05-20 15:04:53 +02:00
Andy Zhou	06b2c61c92	ip: remove unused function prototype ip_do_nat() function was removed prior to kernel 3.4. Remove the unnecessary function prototype as well. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-19 16:54:36 -04:00
Daniel Borkmann	492135557d	tcp: add rfc3168, section 6.1.1.1. fallback This work as a follow-up of commit `f7b3bec6f5` ("net: allow setting ecn via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing ECN connections. In other words, this work adds a retry with a non-ECN setup SYN packet, as suggested from the RFC on the first timeout: [...] A host that receives no reply to an ECN-setup SYN within the normal SYN retransmission timeout interval MAY resend the SYN and any subsequent SYN retransmissions with CWR and ECE cleared. [...] Schematic client-side view when assuming the server is in tcp_ecn=2 mode, that is, Linux default since 2009 via commit `255cac91c3` ("tcp: extend ECN sysctl to allow server-side only ECN"): 1) Normal ECN-capable path: SYN ECE CWR -----> <----- SYN ACK ECE ACK -----> 2) Path with broken middlebox, when client has fallback: SYN ECE CWR ----X crappy middlebox drops packet (timeout, rtx) SYN -----> <----- SYN ACK ACK -----> In case we would not have the fallback implemented, the middlebox drop point would basically end up as: SYN ECE CWR ----X crappy middlebox drops packet (timeout, rtx) SYN ECE CWR ----X crappy middlebox drops packet (timeout, rtx) SYN ECE CWR ----X crappy middlebox drops packet (timeout, rtx) In any case, it's rather a smaller percentage of sites where there would occur such additional setup latency: it was found in end of 2014 that ~56% of IPv4 and 65% of IPv6 servers of Alexa 1 million list would negotiate ECN (aka tcp_ecn=2 default), 0.42% of these webservers will fail to connect when trying to negotiate with ECN (tcp_ecn=1) due to timeouts, which the fallback would mitigate with a slight latency trade-off. Recent related paper on this topic: Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth, Gorry Fairhurst, and Richard Scheffenegger: "Enabling Internet-Wide Deployment of Explicit Congestion Notification." Proc. PAM 2015, New York. http://ecn.ethz.ch/ecn-pam15.pdf Thus, when net.ipv4.tcp_ecn=1 is being set, the patch will perform RFC3168, section 6.1.1.1. fallback on timeout. For users explicitly not wanting this which can be in DC use case, we add a net.ipv4.tcp_ecn_fallback knob that allows for disabling the fallback. tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but rather we let tcp_ecn_rcv_synack() take that over on input path in case a SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent ECN being negotiated eventually in that case. Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch> Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch> Cc: Eric Dumazet <edumazet@google.com> Cc: Dave That <dave.taht@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-19 16:53:37 -04:00
Eric Dumazet	b5d721d761	inet: properly align icsk_ca_priv tcp_illinois and upcoming tcp_cdg require 64bit alignment of icsk_ca_priv x86 does not care, but other architectures might. Fixes: `05cbc0db03` ("ipv4: Create probe timer for tcp PMTU as per RFC4821") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Fan Du <fan.du@intel.com> Acked-by: Fan Du <fan.du@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-19 11:08:00 -04:00
Alexander Aring	0e66545701	nl802154: add support for dump phy capabilities This patch add support to nl802154 to dump all phy capabilities which is inside the wpan_phy_supported struct. Also we introduce a new method to dumping supported channels. The new method will offer a easier interface and has lesser netlink traffic. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:44 +02:00
Alexander Aring	65318680c9	ieee802154: add iftypes capability This patch adds capability flags for supported interface types. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:42 +02:00
Alexander Aring	edea8f7c75	cfg802154: introduce wpan phy flags This patch introduce a flag property for the wpan phy structure. The current flag settings in ieee802154_hw are accessable in mac802154 layer only which is okay for flags which indicates MAC handling which are done by phy. For real PHY layer settings like cca mode, transmit power, cca energy detection level. The difference between these flags are that the MAC handling flags are only handled in mac802154/HardMac layer e.g. on an interface up. The phy settings are direct netlink calls from nl802154 into the driver layer and the nl802154 need to have a chance to check if the driver supports this handling before sending to the next layer. We also check now on PHY flags while dumping and setting pib attributes. In comparing with MIB attributes the 802.15.4 gives us an default value which we assume when a transceiver implement less functionality. In case of MIB settings the nl802154 layer doesn't need to check on the ieee802154_hw flags then. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:42 +02:00
Alexander Aring	791021bf13	mac802154: check for really changes This patch adds check if the value is really changed inside pib/mib. If a transceiver do support only one value for e.g. max_be then this will also handle that the driver layer doesn't need to care about handling to set one value only. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:42 +02:00
Alexander Aring	fea3318d20	ieee802154: add several phy supported handling This patch adds support for phy supported handling for all other already existing handling 802.15.4 functionality. We assume now a fully 802.15.4 complaint transceiver at phy allocation. If a transceiver can support 802.15.4 default values only, then the values should be overwirtten by values the transceiver supports. If the transceiver doesn't set the according hardware flags, we assume the 802.15.4 defaults now which cannot be changed. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Suggested-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:42 +02:00
Alexander Aring	72f655e44d	ieee802154: introduce wpan_phy_supported This patch introduce the wpan_phy_supported struct for wpan_phy. There is currently no way to check if a transceiver can handle IEEE 802.15.4 complaint values. With this struct we can check before if the transceiver supports these values before sending to driver layer. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Suggested-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Acked-by: Varka Bhadram <varkabhadram@gmail.com> Cc: Alan Ott <alan@signal11.us> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:42 +02:00
Alexander Aring	32b23550ad	ieee802154: change cca ed level to mbm This patch change the handling of cca energy detection level from dbm to mbm. This prepares to handle floating point cca energy detection levels values. The old netlink 802.15.4 will convert the dbm value to mbm for handling backward compatibility. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:42 +02:00
Alexander Aring	e2eb173aaa	ieee802154: change transmit power to mbm This patch change the handling of transmit power level from dbm to mbm. This prepares to handle floating point transmit power levels values. The old netlink 802.15.4 will convert the dbm value to mbm for handling backward compatibility. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:41 +02:00
Alexander Aring	1a19cb680b	ieee802154: change transmit power to s32 This patch change the transmit power from s8 to s32. This prepares to store a mbm value instead dbm inside the transmit power variable. The old interface keep the a s8 dbm value, which should be backward compatibility when assign s8 to s32. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-05-19 11:44:41 +02:00
Andy Zhou	49d16b23cd	bridge_netfilter: No ICMP packet on IPv4 fragmentation error When bridge netfilter re-fragments an IP packet for output, all packets that can not be re-fragmented to their original input size should be silently discarded. However, current bridge netfilter output path generates an ICMP packet with 'size exceeded MTU' message for such packets, this is a bug. This patch refactors the ip_fragment() API to allow two separate use cases. The bridge netfilter user case will not send ICMP, the routing output will, as before. Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-19 00:15:39 -04:00
Andy Zhou	5cf4228082	ipv4: introduce frag_expire_skip_icmp() Improve readability of skip ICMP for de-fragmentation expiration logic. This change will also make the logic easier to maintain when the following patches in this series are applied. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-19 00:15:26 -04:00
WANG Cong	de133464c9	netns: make nsid_lock per net The spinlock is used to protect netns_ids which is per net, so there is no need to use a global spinlock. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-17 23:41:11 -04:00
Samudrala, Sridhar	45d4122ca7	switchdev: add support for fdb add/del/dump via switchdev_port_obj ops. - introduce port fdb obj and generic switchdev_port_fdb_add/del/dump() - use switchdev_port_fdb_add/del/dump in rocker/team/bonding ndo ops. - add support for fdb obj in switchdev_port_obj_add/del/dump() - switch rocker to implement fdb ops via switchdev_ops v3: updated to sync with named union changes. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-17 22:49:09 -04:00
Eric Dumazet	b8da51ebb1	tcp: introduce tcp_under_memory_pressure() Introduce an optimized version of sk_under_memory_pressure() for TCP. Our intent is to use it in fast paths. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-17 22:45:48 -04:00
Eric Dumazet	a6c5ea4ccf	tcp: rename sk_forced_wmem_schedule() to sk_forced_mem_schedule() We plan to use sk_forced_wmem_schedule() in input path as well, so make it non static and rename it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-17 22:45:48 -04:00
Eric Dumazet	1a24e04e4b	net: fix sk_mem_reclaim_partial() sk_mem_reclaim_partial() goal is to ensure each socket has one SK_MEM_QUANTUM forward allocation. This is needed both for performance and better handling of memory pressure situations in follow up patches. SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes as some arches have 64KB pages. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-17 22:45:48 -04:00
Eric Dumazet	d53a2aa3a1	net: fix sparse error in csum_replace4() make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/netfilter/nf_nat_l3proto_ipv4.o CHECK net/ipv4/netfilter/nf_nat_l3proto_ipv4.c include/net/checksum.h:125:64: warning: incorrect type in argument 2 (different base types) include/net/checksum.h:125:64: expected restricted __wsum [usertype] addend include/net/checksum.h:125:64: got restricted __be32 [usertype] from include/net/checksum.h:125:71: warning: incorrect type in argument 2 (different base types) include/net/checksum.h:125:71: expected restricted __wsum [usertype] addend include/net/checksum.h:125:71: got restricted __be32 [usertype] to include/net/checksum.h:125:64: warning: incorrect type in argument 2 (different base types) include/net/checksum.h:125:64: expected restricted __wsum [usertype] addend include/net/checksum.h:125:64: got restricted __be32 [usertype] from include/net/checksum.h:125:71: warning: incorrect type in argument 2 (different base types) include/net/checksum.h:125:71: expected restricted __wsum [usertype] addend include/net/checksum.h:125:71: got restricted __be32 [usertype] to Fixes: `4565af0d40` ("net: optimise csum_replace4()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-17 13:08:29 -04:00
Eric Dumazet	264ea103a7	tcp: syncookies: extend validity range Now we allow storing more request socks per listener, we might hit syncookie mode less often and hit following bug in our stack : When we send a burst of syncookies, then exit this mode, tcp_synq_no_recent_overflow() can return false if the ACK packets coming from clients are coming three seconds after the end of syncookie episode. This is a way too strong requirement and conflicts with rest of syncookie code which allows ACK to be aged up to 2 minutes. Perfectly valid ACK packets are dropped just because clients might be in a crowded wifi environment or on another planet. So let's fix this, and also change tcp_synq_overflow() to not dirty a cache line for every syncookie we send, as we are under attack. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-14 22:32:17 -04:00
John W. Linville	35d32e8fe4	geneve: move definition of geneve_hdr() to geneve.h This is a static inline with identical definitions in multiple places... Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:59:13 -04:00
Jiri Pirko	59346afe7a	flow_dissector: change port array into src, dst tuple Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:47 -04:00
Jiri Pirko	67a900cc04	flow_dissector: introduce support for Ethernet addresses Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:47 -04:00
Jiri Pirko	b924933cbb	flow_dissector: introduce support for ipv6 addressses So far, only hashes made out of ipv6 addresses could be dissected. This patch introduces support for dissection of full ipv6 addresses. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:47 -04:00
Jiri Pirko	c3f8eaeb6e	flow_dissector: add missing header includes Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:47 -04:00
Jiri Pirko	06635a35d1	flow_dissect: use programable dissector in skb_flow_dissect and friends Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:47 -04:00
Jiri Pirko	fbff949e3b	flow_dissector: introduce programable flow_dissector Introduce dissector infrastructure which allows user to specify which parts of skb he wants to dissect. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:47 -04:00
Jiri Pirko	9c684b5083	net: move __skb_get_hash function declaration to flow_dissector.h Since the definition of the function is in flow_dissector.c, it makes sense to have the declaration in flow_dissector.h Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:46 -04:00
Jiri Pirko	10b89ee43e	net: move *skb_get_poff declarations into correct header Since these functions are defined in flow_dissector.c, move header declarations from skbuff.h into flow_dissector.h Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:45 -04:00
Jiri Pirko	b0a31431b4	flow_dissector: remove unused function flow_get_hlen declaration commit `56193d1bce` ("net: Add function for parsing the header length out of linear ethernet frames") added this function declaration but it is defined nowhere. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:45 -04:00
Jiri Pirko	1bd758eb1c	net: change name of flow_dissector header to match the .c file name add couple of empty lines on the way. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:19:45 -04:00
Florian Westphal	e578d9c025	net: sched: use counter to break reclassify loops Seems all we want here is to avoid endless 'goto reclassify' loop. tc_classify_compat even resets this counter when something other than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't break hypothetical loops induced by something other than perpetual TC_ACT_RECLASSIFY return values. skb_act_clone is now identical to skb_clone, so just use that. Tested with following (bogus) filter: tc filter add dev eth0 parent ffff: \ protocol ip u32 match u32 0 0 police rate 10Kbit burst \ 64000 mtu 1500 action reclassify Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 15:08:14 -04:00
David S. Miller	b04096ff33	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Four minor merge conflicts: 1) qca_spi.c renamed the local variable used for the SPI device from spi_device to spi, meanwhile the spi_set_drvdata() call got moved further up in the probe function. 2) Two changes were both adding new members to codel params structure, and thus we had overlapping changes to the initializer function. 3) 'net' was making a fix to sk_release_kernel() which is completely removed in 'net-next'. 4) In net_namespace.c, the rtnl_net_fill() call for GET operations had the command value fixed, meanwhile 'net-next' adjusted the argument signature a bit. This also matches example merge resolutions posted by Stephen Rothwell over the past two days. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 14:31:43 -04:00
Scott Feldman	42275bd8fc	switchdev: don't use anonymous union on switchdev attr/obj structs Older gcc versions (e.g. gcc version 4.4.6) don't like anonymous unions which was causing build issues on the newly added switchdev attr/obj structs. Fix this by using named union on structs. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 14:20:59 -04:00
Scott Feldman	5eb764edee	switchdev: align comment with other comments in block Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-13 12:26:27 -04:00
Ying Xue	9449c3cd90	net: make skb_dst_pop routine static As xfrm_output_one() is the only caller of skb_dst_pop(), we should make skb_dst_pop() localized. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 23:19:49 -04:00
Scott Feldman	58c2cb16b1	switchdev: convert fib_ipv4_add/del over to switchdev_port_obj_add/del The IPv4 FIB ops convert nicely to the switchdev objs and we're left with only four switchdev ops: port get/set and port add/del. Other objs will follow, such as FDB. So go ahead and convert IPv4 FIB over to switchdev obj for consistency, anticipating more objs to come. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:55 -04:00
Scott Feldman	8793d0a664	switchdev: add new switchdev_port_bridge_getlink Like bridge_setlink, add switchdev wrapper to handle bridge_getlink and call into port driver to get port attrs. For now, only BR_LEARNING and BR_LEARNING_SYNC are returned. To add more, we'll probably want to break away from ndo_dflt_bridge_getlink() and build the netlink skb directly in the switchdev code. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:55 -04:00
Scott Feldman	87a5dae59e	switchdev: remove unused switchdev_port_bridge_dellink Now we can remove old wrappers for dellink. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:55 -04:00
Scott Feldman	5c34e02214	switchdev: add new switchdev_port_bridge_dellink Same change as setlink. Provide the wrapper op for SELF ndo_bridge_dellink and call into the switchdev driver to delete afspec VLANs. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:55 -04:00
Scott Feldman	e71f220b34	switchdev: remove old switchdev_port_bridge_setlink New attr-based bridge_setlink can recurse lower devs and recover on err, so remove old wrapper (including ndo_dflt_switchdev_port_bridge_setlink). Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:54 -04:00
Scott Feldman	6004c86718	switchdev: add bridge port flags attr rocker: use switchdev get/set attr for bridge port flags Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:54 -04:00
Scott Feldman	6fc3016da7	switchdev: add port vlan obj VLAN obj has flags (PVID and untagged) as well as start and end vid ranges. The switchdev driver can optimize programing the device using the ranges. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:53 -04:00
Scott Feldman	491d0f1533	switchdev: introduce switchdev add/del obj ops Like switchdev attr get/set, add new switchdev obj add/del. switchdev objs will be things like VLANs or FIB entries, so add/del fits better for objects than get/set used for attributes. Use same two-phase prepare-commit transaction model as in attr set. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:53 -04:00
Scott Feldman	3563606258	switchdev: convert STP update to switchdev attr set STP update is just a settable port attribute, so convert switchdev_port_stp_update to an attr set. For DSA, the prepare phase is skipped and STP updates are only done in the commit phase. This is because currently the DSA drivers don't need to allocate any memory for STP updates and the STP update will not fail to HW (unless something horrible goes wrong on the MDIO bus, in which case the prepare phase wouldn't have been able to predict anyway). Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:53 -04:00
Scott Feldman	f8e20a9f87	switchdev: convert parent_id_get to switchdev attr get Switch ID is just a gettable port attribute. Convert switchdev op switchdev_parent_id_get to a switchdev attr. Note: for sysfs and netlink interfaces, SWITCHDEV_ATTR_PORT_PARENT_ID is called with SWITCHDEV_F_NO_RECUSE to limit switch ID user-visiblity to only port netdevs. So when a port is stacked under bond/bridge, the user can only query switch id via the switch ports, but not via the upper devices Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:53 -04:00
Scott Feldman	3094333d90	switchdev: introduce get/set attrs ops Add two new swdev ops for get/set switch port attributes. Most swdev interactions on a port are gets or sets on port attributes, so rather than adding ops for each attribute, let's define clean get/set ops for all attributes, and then we can have clear, consistent rules on how attributes propagate on stacked devs. Add the basic algorithms for get/set attr ops. Use the same recusive algo to walk lower devs we've used for STP updates, for example. For get, compare attr value for each lower dev and only return success if attr values match across all lower devs. For sets, set the same attr value for all lower devs. We'll use a two-phase prepare-commit transaction model for sets. In the first phase, the driver(s) are asked if attr set is OK. If all OK, the commit attr set in second phase. A driver would NACK the prepare phase if it can't set the attr due to lack of resources or support, within it's control. RTNL lock must be held across both phases because we'll recurse all lower devs first in prepare phase, and then recurse all lower devs again in commit phase. If any lower dev fails the prepare phase, we need to abort the transaction for all lower devs. If lower dev recusion isn't desired, allow a flag SWITCHDEV_F_NO_RECURSE to indicate get/set only work on port (lowest) device. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:53 -04:00
Jiri Pirko	9d47c0a2d9	switchdev: s/swdev_/switchdev_/ Turned out that "switchdev" sticks. So just unify all related terms to use this prefix. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:53 -04:00
Jiri Pirko	ebb9a03a59	switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/ Turned out that "switchdev" sticks. So just unify all related terms to use this prefix. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 18:43:52 -04:00
Eric Dumazet	b396cca6fa	net: sched: deprecate enqueue_root() Only left enqueue_root() user is netem, and it looks not necessary : qdisc_skb_cb(skb)->pkt_len is preserved after one skb_clone() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 14:17:32 -04:00
Mahesh Bandewar	d22a5fc0c3	bonding: Implement user key part of port_key in an AD system. The port key has three components - user-key, speed-part, and duplex-part. The LSBit is for the duplex-part, next 5 bits are for the speed while the remaining 10 bits are the user defined key bits. Get these 10 bits from the user-space (through the SysFs interface) and use it to form the admin port-key. Allowed range for the user-key is 0 - 1023 (10 bits). If it is not provided then use zero for the user-key-bits (default). It can set using following example code - # modprobe bonding mode=4 # usr_port_key=$(( RANDOM & 0x3FF )) # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: * fixed up style issues reported by checkpatch * fixed up context from change in ad_actor_sys_prio patch] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:59:32 -04:00
Mahesh Bandewar	7451495755	bonding: Allow userspace to set actors' macaddr in an AD-system. In an AD system, the communication between actor and partner is the business between these two entities. In the current setup anyone on the same L2 can "guess" the LACPDU contents and then possibly send the spoofed LACPDUs and trick the partner causing connectivity issues for the AD system. This patch allows to use a random mac-address obscuring it's identity making it harder for someone in the L2 is do the same thing. This patch allows user-space to choose the mac-address for the AD-system. This mac-address can not be NULL or a Multicast. If the mac-address is set from user-space; kernel will honor it and will not overwrite it. In the absence (value from user space); the logic will default to using the masters' mac as the mac-address for the AD-system. It can be set using example code below - # modprobe bonding mode=4 # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \ $(( (RANDOM & 0xFE) \| 0x02 )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF ))) # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: fixed up style issues reported by checkpatch] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:59:32 -04:00
Mahesh Bandewar	6791e4661c	bonding: Allow userspace to set actors' system_priority in AD system This patch allows user to randomize the system-priority in an ad-system. The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user does not specify this value, the system defaults to 0xFFFF, which is what it was before this patch. Following example code could set the value - # modprobe bonding mode=4 # sys_prio=$(( 1 + RANDOM + RANDOM )) # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: * fixed up style issues reported by checkpatch * changed how the default value is set in bond_check_params(), this makes the default consistent between what gets set for a new bond and what the default is claimed to be in the bonding options.] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:59:31 -04:00
Eric W. Biederman	affb9792f1	net: kill sk_change_net and sk_release_kernel These functions are no longer needed and no longer used kill them. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:18 -04:00
Eric W. Biederman	26abe14379	net: Modify sk_alloc to not reference count the netns of kernel sockets. Now that sk_alloc knows when a kernel socket is being allocated modify it to not reference count the network namespace of kernel sockets. Keep track of if a socket needs reference counting by adding a flag to struct sock called sk_net_refcnt. Update all of the callers of sock_create_kern to stop using sk_change_net and sk_release_kernel as those hacks are no longer needed, to avoid reference counting a kernel socket. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:18 -04:00
Eric W. Biederman	11aa9c28b4	net: Pass kern from net_proto_family.create to sk_alloc In preparation for changing how struct net is refcounted on kernel sockets pass the knowledge that we are creating a kernel socket from sock_create_kern through to sk_alloc. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:17 -04:00
Eric Dumazet	80ba92fa1a	codel: add ce_threshold attribute For DCTCP or similar ECN based deployments on fabrics with shallow buffers, hosts are responsible for a good part of the buffering. This patch adds an optional ce_threshold to codel & fq_codel qdiscs, so that DCTCP can have feedback from queuing in the host. A DCTCP enabled egress port simply have a queue occupancy threshold above which ECT packets get CE mark. In codel language this translates to a sojourn time, so that one doesn't have to worry about bytes or bandwidth but delays. This makes the host an active participant in the health of the whole network. This also helps experimenting DCTCP in a setup without DCTCP compliant fabric. On following example, ce_threshold is set to 1ms, and we can see from 'ldelay xxx us' that TCP is not trying to go around the 5ms codel target. Queue has more capacity to absorb inelastic bursts (say from UDP traffic), as queues are maintained to an optimal level. lpaa23:~# ./tc -s -d qd sh dev eth1 qdisc mq 1: dev eth1 root Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961) backlog 3108242b 364p requeues 42961 qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503) rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503 count 0 lastcount 0 ldelay 1.0ms drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384 qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186) rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186 count 0 lastcount 0 ldelay 694us drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873 qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554) rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554 count 0 lastcount 0 ldelay 889us drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780 ... Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Glenn Judd <glenn.judd@morganstanley.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-10 19:50:20 -04:00
Nicolas Dichtel	59324cf35a	netlink: allow to listen "all" netns More accurately, listen all netns that have a nsid assigned into the netns where the netlink socket is opened. For this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this socket will receive netlink notifications from all netns that have a nsid assigned into the netns where the socket has been opened. The nsid is sent to userland via an anscillary data. With this patch, a daemon needs only one socket to listen many netns. This is useful when the number of netns is high. Because 0 is a valid value for a nsid, the field nsid_is_set indicates if the field nsid is valid or not. skb->cb is initialized to 0 on skb allocation, thus we are sure that we will never send a nsid 0 by error to the userland. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:31 -04:00
Nicolas Dichtel	7a0877d4b4	netns: rename peernet2id() to peernet2id_alloc() In a following commit, a new function will be introduced to only lookup for a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the existing function is renamed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:30 -04:00
David S. Miller	0e00a0f73f	Lots of updates for net-next for this cycle. As usual, we have a lot of small fixes and cleanups, the bigger items are: * proper mac80211 rate control locking, to fix some random crashes (this required changing other locking as well) * mac80211 "fast-xmit", a mechanism to reduce, in most cases, the amount of code we execute while going from ndo_start_xmit() to the driver * this also clears the way for properly supporting S/G and checksum and segmentation offloads -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJVSh65AAoJEDBSmw7B7bqrudwP/0iXyNQhF0mLTENrx+rdsDZS qQhB/8wejJaOJb89Re7M+bhwri7Q6S5BM/G24vhMc01dxmqNMcdKfEV3+nlmc5C+ KeEgTI9aZiCnUt4WAd54Zwbkc9o+1kBtaFuaWDvOdQHUf0WDwEIQxjnV4+SZujV9 xl1TV5yV35hRQgrDE8ZSbtOYRmhSVoi0MEgwqAjzdN2fEPyWVeqwYULDtpOopjL2 UHQgv0E2fYVRWennHyQQ88tWBQg+EsRaG1U1/rYHhNBmAJ+f9AOxKi7ErzxYfkbM 961B+3E++pM+zUeqw6+jaMKqT5jeCCM5ugCNSG4NrIvfxDIDgecAFV9Fs2islnI4 8xd3GqyA5iqaitAWIUsaYaQfaAcwSIlpSinfQW9EUm2wuCkPyZboFP+GRd2K7sQn FnRJSJ9PkGPdWwdDE3gunLHBHtbDS0z+R8VegIeS0qT8LamkqICiNQSyPlsTeluW ig2kwHsDdj3k11wyelhfp/RdtsOch/brKpLSjdzPXC1BzIWhQLwmsPh9qZ83vSB9 qbLsdnM/IPQXocWB6fOhmwaGsLeRalxs2yQFM0zdJCwpaU9dzKsJrxepAXVuq31p r0fygWTp8GVevHXzfS7fRya8xjsTRrSs6n2kH7ErOfiep13HQypAjbyLswNe4kW/ D6x8pVC3AhdGkl/9CW4m =oUlh -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Lots of updates for net-next for this cycle. As usual, we have a lot of small fixes and cleanups, the bigger items are: * proper mac80211 rate control locking, to fix some random crashes (this required changing other locking as well) * mac80211 "fast-xmit", a mechanism to reduce, in most cases, the amount of code we execute while going from ndo_start_xmit() to the driver * this also clears the way for properly supporting S/G and checksum and segmentation offloads ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:27:25 -04:00
Eric Dumazet	e520af48c7	tcp: add TCPWinProbe and TCPKeepAlive SNMP counters Diagnosing problems related to Window Probes has been hard because we lack a counter. TCPWinProbe counts the number of ACK packets a sender has to send at regular intervals to make sure a reverse ACK packet opening back a window had not been lost. TCPKeepAlive counts the number of ACK packets sent to keep TCP flows alive (SO_KEEPALIVE) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 16:42:32 -04:00
Eric Dumazet	21c8fe9915	tcp: adjust window probe timers to safer values With the advent of small rto timers in datacenter TCP, (ip route ... rto_min x), the following can happen : 1) Qdisc is full, transmit fails. TCP sets a timer based on icsk_rto to retry the transmit, without exponential backoff. With low icsk_rto, and lot of sockets, all cpus are servicing timer interrupts like crazy. Intent of the code was to retry with a timer between 200 (TCP_RTO_MIN) and 500ms (TCP_RESOURCE_PROBE_INTERVAL) 2) Receivers can send zero windows if they don't drain their receive queue. TCP sends zero window probes, based on icsk_rto current value, with exponential backoff. With /proc/sys/net/ipv4/tcp_retries2 being 15 (or even smaller in some cases), sender can abort in less than one or two minutes ! If receiver stops the sender, it obviously doesn't care of very tight rto. Probability of dropping the ACK reopening the window is not worth the risk. Lets change the base timer to be at least 200ms (TCP_RTO_MIN) for these events (but not normal RTO based retransmits) A followup patch adds a new SNMP counter, as it would have helped a lot diagnosing this issue. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 16:42:32 -04:00
Arik Nemtsov	06f207fc54	cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STA The GO_CONCURRENT regulatory definition can be extended to station interfaces requesting to IR as part of TDLS off-channel operations. Rename the GO_CONCURRENT flag to IR_CONCURRENT and allow the added use-case. Change internal users of GO_CONCURRENT to use the new definition. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-05-06 15:50:02 +02:00
Johannes Berg	a31cf1c69e	mac80211: extend get_key() to return PN for all ciphers For ciphers not supported by mac80211, the function currently doesn't return any PN data. Fix this by extending the driver's get_key_seq() a little more to allow moving arbitrary PN data. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-05-06 13:30:00 +02:00
Johannes Berg	9352c19f63	mac80211: extend get_tkip_seq to all keys Extend the function to read the TKIP IV32/IV16 to read the IV/PN for all ciphers in order to allow drivers with full hardware crypto to properly support this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-05-06 13:29:59 +02:00
Eric Dumazet	cd8ae85299	tcp: provide SYN headers for passive connections This patch allows a server application to get the TCP SYN headers for its passive connections. This is useful if the server is doing fingerprinting of clients based on SYN packet contents. Two socket options are added: TCP_SAVE_SYN and TCP_SAVED_SYN. The first is used on a socket to enable saving the SYN headers for child connections. This can be set before or after the listen() call. The latter is used to retrieve the SYN headers for passive connections, if the parent listener has enabled TCP_SAVE_SYN. TCP_SAVED_SYN is read once, it frees the saved SYN headers. The data returned in TCP_SAVED_SYN are network (IPv4/IPv6) and TCP headers. Original patch was written by Tom Herbert, I changed it to not hold a full skb (and associated dst and conntracking reference). We have used such patch for about 3 years at Google. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-05 16:02:34 -04:00
Johannes Berg	f5c4ae0799	mac80211: make LED trigger names const This is just a code cleanup, make the LED trigger names const as they're not expected to be modified by drivers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-05-05 14:21:55 +02:00
David S. Miller	b7ba7b469a	We have only a few fixes right now: * a fix for an issue with hash collision handling in the rhashtable conversion * a merge issue - rhashtable removed default shrinking just before mac80211 was converted, so enable it now * remove an invalid WARN that can trigger with legitimate userspace behaviour * add a struct member missing from kernel-doc that caused a lot of warnings -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJVR1FIAAoJEDBSmw7B7bqrbkQP/RPleNMLHNOLRoJva6vNVers hiyLwG+lpV9yHCUX1I7TsvyTHddjW3ANNC1LvlKyUiNzBBsr/Yg/Qe1RVGkRTK7E ZPTU7xutk2nSLjvbVLjTQtIAynKZM6W9vM3X16WtsZ9+25z4e8jN8BjhzP99Vs7X IkwL7Qe2q1O/ta+/ByVR4dFWtGyJDxRjXkFs4cQ/VdL/fJalKxIa9YJrmUwIBMOU UIuJ1lh7kkNFGky+4Y3dBA/2L+O/iS64dxe+ygLMJGf9xdGGHwyuM6oe6OKEKPi1 pF0NIWgHRGOUEJNyUKZTOwCyk1PeOr9aFFDHN4GJyE463KzZqVYdA4ajI6udwL2h I/AjsBnM0zQcoBD8HUeJK2ZgndPxOrMZGVjs0/M3K9lFocYBALoLkt5YPD1PXOZs 9EQ8mhej3/xbwyEaaAp+HJPOr/wLZ//juxsPP8HSR1BDm8cR40rd9rETEGWnmrxt 5vpmb2vNpKekw+pKf2vwuEH6BwH02nuz4YPudTswiEzObEQAuGx+rpopQVI9bdgE V/r3WY0cl+bDMrcSpoZ7v6FDoHHA/zPq9aAAV0JsMyYApY0nMdqwShBJq5cTSoK9 rK5I6oIbqChskpoWULUVbbjb1fLsNVI4C89KjxW8xziTEY1d2fEeeJ2tR0lYBX/5 /KOygSze3Qi0MjrdE9D2 =sf/A -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2015-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have only a few fixes right now: * a fix for an issue with hash collision handling in the rhashtable conversion * a merge issue - rhashtable removed default shrinking just before mac80211 was converted, so enable it now * remove an invalid WARN that can trigger with legitimate userspace behaviour * add a struct member missing from kernel-doc that caused a lot of warnings ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 16:00:55 -04:00
David S. Miller	73e84313ee	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-05-04 Here's the first bluetooth-next pull request for 4.2: - Various fixes for at86rf230 driver - ieee802154: trace events support for rdev->ops - HCI UART driver refactoring - New Realtek IDs added to btusb driver - Off-by-one fix for rtl8723b in btusb driver - Refactoring of btbcm driver for both UART & USB use Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 15:36:07 -04:00
Linus Lüssing	9afd85c9e4	net: Export IGMP/MLD message validation code With this patch, the IGMP and MLD message validation functions are moved from the bridge code to IPv4/IPv6 multicast files. Some small refactoring was done to enhance readibility and to iron out some differences in behaviour between the IGMP and MLD parsing code (e.g. the skb-cloning of MLD messages is now only done if necessary, just like the IGMP part always did). Finally, these IGMP and MLD message validation functions are exported so that not only the bridge can use it but batman-adv later, too. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 14:49:23 -04:00
Randy Dunlap	ff419b3f95	mac80211: fix 90 kernel-doc warnings Eliminate 90 of these warnings: Warning(..//include/net/mac80211.h:1682): No description found for parameter 'drv_priv[0]' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-05-04 12:56:13 +02:00
Tom Herbert	2f59e1ebaa	net: Add flow_keys digest Some users of flow keys (well just sch_choke now) need to pass flow_keys in skbuff cb, and use them for exact comparisons of flows so that skb->hash is not sufficient. In order to increase size of the flow_keys structure, we introduce another structure for the purpose of passing flow keys in skbuff cb. We limit this structure to sixteen bytes, and we will technically treat this as a digest of flow_keys struct hence its name flow_keys_digest. In the first incaranation we just copy the flow_keys structure up to 16 bytes-- this is the same information previously passed in the cb. In the future, we'll adapt this for larger flow_keys and could use something like SHA-1 over the whole flow_keys to improve the quality of the digest. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 00:09:09 -04:00
Eric Dumazet	a5d2809040	codel: fix maxpacket/mtu confusion Under presence of TSO/GSO/GRO packets, codel at low rates can be quite useless. In following example, not a single packet was ever dropped, while average delay in codel queue is ~100 ms ! qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0) backlog 13626b 3p requeues 0 count 0 lastcount 0 ldelay 96.9ms drop_next 0us maxpacket 9084 ecn_mark 0 drop_overlimit 0 This comes from a confusion of what should be the minimal backlog. It is pretty clear it is not 64KB or whatever max GSO packet ever reached the qdisc. codel intent was to use MTU of the device. After the fix, we finally drop some packets, and rtt/cwnd of my single TCP flow are meeting our expectations. qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0) backlog 6056b 3p requeues 0 count 1 lastcount 1 ldelay 36.3ms drop_next 0us maxpacket 10598 ecn_mark 0 drop_overlimit 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kathleen Nichols <nichols@pollere.com> Cc: Dave Taht <dave.taht@gmail.com> Cc: Van Jacobson <vanj@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-03 22:17:40 -04:00
Tom Herbert	82a584b7cd	ipv6: Flow label state ranges This patch divides the IPv6 flow label space into two ranges: 0-7ffff is reserved for flow label manager, 80000-fffff will be used for creating auto flow labels (per RFC6438). This only affects how labels are set on transmit, it does not affect receive. This range split can be disbaled by systcl. Background: IPv6 flow labels have been an unmitigated disappointment thus far in the lifetime of IPv6. Support in HW devices to use them for ECMP is lacking, and OSes don't turn them on by default. If we had these we could get much better hashing in IPv6 networks without resorting to DPI, possibly eliminating some of the motivations to to define new encaps in UDP just for getting ECMP. Unfortunately, the initial specfications of IPv6 did not clarify how they are to be used. There has always been a vague concept that these can be used for ECMP, flow hashing, etc. and we do now have a good standard how to this in RFC6438. The problem is that flow labels can be either stateful or stateless (as in RFC6438), and we are presented with the possibility that a stateless label may collide with a stateful one. Attempts to split the flow label space were rejected in IETF. When we added support in Linux for RFC6438, we could not turn on flow labels by default due to this conflict. This patch splits the flow label space and should give us a path to enabling auto flow labels by default for all IPv6 packets. This is an API change so we need to consider compatibility with existing deployment. The stateful range is chosen to be the lower values in hopes that most uses would have chosen small numbers. Once we resolve the stateless/stateful issue, we can proceed to look at enabling RFC6438 flow labels by default (starting with scaled testing). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-03 21:58:01 -04:00
Florian Westphal	4749c3ef85	net: sched: remove TC_MUNGED bits Not used. pedit sets TC_MUNGED when packet content was altered, but all the core does is unset MUNGED again and then set OK2MUNGE. And the latter isn't tested anywhere. So lets remove both TC_MUNGED and TC_OK2MUNGE. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-02 22:25:17 -04:00
Martin KaFai Lau	afc4eef80c	ipv6: Remove DST_METRICS_FORCE_OVERWRITE and _rt6i_peer _rt6i_peer is no longer needed after the last patch, 'ipv6: Stop rt6_info from using inet_peer's metrics'. DST_METRICS_FORCE_OVERWRITE is added by commit `e5fd387ad5` ("ipv6: do not overwrite inetpeer metrics prematurely"). Since inetpeer is no longer used for metrics, this bit is also not needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-01 20:57:06 -04:00
Martin KaFai Lau	4b32b5ad31	ipv6: Stop rt6_info from using inet_peer's metrics inet_peer is indexed by the dst address alone. However, the fib6 tree could have multiple routing entries (rt6_info) for the same dst. For example, 1. A /128 dst via multiple gateways. 2. A RTF_CACHE route cloned from a /128 route. In the above cases, all of them will share the same metrics and step on each other. This patch will steer away from inet_peer's metrics and use dst_cow_metrics_generic() for everything. Change Highlights: 1. Remove rt6_cow_metrics() which currently acquires metrics from inet_peer for DST_HOST route (i.e. /128 route). 2. Add rt6i_pmtu to take care of the pmtu update to avoid creating a full size metrics just to override the RTAX_MTU. 3. After (2), the RTF_CACHE route can also share the metrics with its dst.from route, by: dst_init_metrics(&cache_rt->dst, dst_metrics_ptr(cache_rt->dst.from), true); 4. Stop creating RTF_CACHE route by cloning another RTF_CACHE route. Instead, directly clone from rt->dst. [ Currently, cloning from another RTF_CACHE is only possible during rt6_do_redirect(). Also, the old clone is removed from the tree immediately after the new clone is added. ] In case of cloning from an older redirect RTF_CACHE, it should work as before. In case of cloning from an older pmtu RTF_CACHE, this patch will forget the pmtu and re-learn it (if there is any) from the redirected route. The _rt6i_peer and DST_METRICS_FORCE_OVERWRITE will be removed in the next cleanup patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-01 20:57:06 -04:00
Varka Bhadram	5b4a103904	cfg802154: pass name_assign_type to rdev_add_virtual_intf() This code is based on commit `6bab2e19c5` ("cfg80211: pass name_assign_type to rdev_add_virtual_intf()") This will expose in sysfs whether the ifname of a IEEE-802.15.4 device is set by userspace or generated by the kernel. We are using two types of name_assign_types o NET_NAME_ENUM: Default interface name provided by kernel o NET_NAME_USER: Interface name provided by user. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-04-30 18:48:10 +02:00
Varka Bhadram	42fb23e2f5	mac802154: add description to mac802154 APIs This patch adds the proper description to the mac802154 core APIs. Signed-off-by: Varka Bhadram <varkab@cdac.in> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-04-30 18:48:09 +02:00
Eric Dumazet	64f40ff5bb	tcp: prepare CC get_info() access from getsockopt() We would like that optional info provided by Congestion Control modules using netlink can also be read using getsockopt() This patch changes get_info() to put this information in a buffer, instead of skb, like tcp_get_info(), so that following patch can reuse this common infrastructure. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-29 17:10:38 -04:00
Eric Dumazet	0df48c26d8	tcp: add tcpi_bytes_acked to tcp_info This patch tracks total number of bytes acked for a TCP socket. This is the sum of all changes done to tp->snd_una, and allows for precise tracking of delivered data. RFC4898 named this : tcpEStatsAppHCThruOctetsAcked This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_acked was placed near tp->snd_una for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Eric Salo <salo@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Chris Rapier <rapier@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-29 17:10:37 -04:00
Matan Barak	73b5a6f2a7	net/bonding: Make DRV macros private The bonding modules currently defines four macros with general names that pollute the global namespace: DRV_VERSION DRV_RELDATE DRV_NAME DRV_DESCRIPTION Fixing that by defining a private bonding_priv.h header files which includes those defines. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-26 22:59:53 -04:00
Eric Dumazet	b357a364c5	inet: fix possible panic in reqsk_queue_unlink() [ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 [ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243 There is a race when reqsk_timer_handler() and tcp_check_req() call inet_csk_reqsk_queue_unlink() on the same req at the same time. Before commit `fa76ce7328` ("inet: get rid of central tcp/dccp listener timer"), listener spinlock was held and race could not happen. To solve this bug, we change reqsk_queue_unlink() to not assume req must be found, and we return a status, to conditionally release a refcount on the request sock. This also means tcp_check_req() in non fastopen case might or not consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have to properly handle this. (Same remark for dccp_check_req() and its callers) inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is called 4 times in tcp and 3 times in dccp. Fixes: `fa76ce7328` ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-24 11:39:15 -04:00
Emmanuel Grumbach	b497de63ad	mac80211: notify the driver on reordering buffer timeout When frames time out in the reordering buffer, it is a good indication that something went wrong and the driver may want to know about that to take action or trigger debug flows. It is pointless to notify the driver about each frame that is released. Notify each time the timer fires. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-04-24 12:25:01 +02:00
Emmanuel Grumbach	6382246e89	mac80211: notify the driver upon BAR Rx When we receive a BAR, this typically means that our peer doesn't hear our Block-Acks or that we can't hear its frames. Either way, it is a good indication that the link is in a bad condition. This is why it can serve as a probe to the driver. Use the event_callback callback for this. Since more events with the same data will be added in the feature, the structure that describes the data attached to the event is called in a generic name: ieee80211_ba_event. This also means that from now on, the event_callback can't sleep. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-04-24 12:25:00 +02:00
Johannes Berg	df1404650c	mac80211: remove support for IFF_PROMISC This support is essentially useless as typically networks are encrypted, frames will be filtered by hardware, and rate scaling will be done with the intended recipient in mind. For real monitoring of the network, the monitor mode support should be used instead. Removing it removes a lot of corner cases. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-04-24 11:14:13 +02:00
Johannes Berg	680a0daba7	mac80211: allow drivers to support S/G If drivers want to support S/G (really just gather DMA on TX) then we can now easily support this on the fast-xmit path since it just needs to write to the ethernet header (and already has a check for that being possible.) However, disallow this on the regular TX path (which has to handle fragmentation, software crypto, etc.) by calling skb_linearize(). Also allow the related HIGHDMA since that's not interesting to the code in mac80211 at all anyway. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-04-22 11:25:30 +02:00
Johannes Berg	17c18bf880	mac80211: add TX fastpath In order to speed up mac80211's TX path, add the "fast-xmit" cache that will cache the data frame 802.11 header and other data to be able to build the frame more quickly. This cache is rebuilt when external triggers imply changes, but a lot of the checks done per packet today are simplified away to the check for the cache. There's also a more detailed description in the code. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-04-22 10:02:25 +02:00
Jonathan Corbet	a839e463e8	mac80211: Fix mac80211.h docbook comments A couple of enums in mac80211.h became structures recently, but the comments didn't follow suit, leading to errors like: Error(.//include/net/mac80211.h:367): Cannot parse enum! Documentation/DocBook/Makefile:93: recipe for target 'Documentation/DocBook/80211.xml' failed make[1]: * [Documentation/DocBook/80211.xml] Error 1 Makefile:1361: recipe for target 'mandocs' failed make: * [mandocs] Error 2 Fix the comments comments accordingly. Added a couple of other small comment fixes while I was there to silence other recently-added docbook warnings. Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-04-20 13:05:30 +02:00
Linus Torvalds	388f997620	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix verifier memory corruption and other bugs in BPF layer, from Alexei Starovoitov. 2) Add a conservative fix for doing BPF properly in the BPF classifier of the packet scheduler on ingress. Also from Alexei. 3) The SKB scrubber should not clear out the packet MARK and security label, from Herbert Xu. 4) Fix oops on rmmod in stmmac driver, from Bryan O'Donoghue. 5) Pause handling is not correct in the stmmac driver because it doesn't take into consideration the RX and TX fifo sizes. From Vince Bridgers. 6) Failure path missing unlock in FOU driver, from Wang Cong. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) net: dsa: use DEVICE_ATTR_RW to declare temp1_max netns: remove BUG_ONs from net_generic() IB/ipoib: Fix ndo_get_iflink sfc: Fix memcpy() with const destination compiler warning. altera tse: Fix network-delays and -retransmissions after high throughput. net: remove unused 'dev' argument from netif_needs_gso() act_mirred: Fix bogus header when redirecting from VLAN inet_diag: fix access to tcp cc information tcp: tcp_get_info() should fetch socket fields once net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state() skbuff: Do not scrub skb mark within the same name space Revert "net: Reset secmark when scrubbing packet" bpf: fix two bugs in verification logic when accessing 'ctx' pointer bpf: fix bpf helpers to use skb->mac_header relative offsets stmmac: Configure Flow Control to work correctly based on rxfifo size stmmac: Enable unicast pause frame detect in GMAC Register 6 stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree stmmac: Add defines and documentation for enabling flow control stmmac: Add properties for transmit and receive fifo sizes stmmac: fix oops on rmmod after assigning ip addr ...	2015-04-17 16:31:08 -04:00
Denys Vlasenko	2591ffd308	netns: remove BUG_ONs from net_generic() This inline has ~500 callsites. On 04/14/2015 08:37 PM, David Miller wrote: > That BUG_ON() was added 7 years ago, and I don't remember it ever > triggering or helping us diagnose something, so just remove it and > keep the function inlined. On x86 allyesconfig build: text data bss dec hex filename 82447071 22255384 20627456 125329911 77861f7 vmlinux4 82441375 22255384 20627456 125324215 7784bb7 vmlinux5prime Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: David S. Miller <davem@davemloft.net> CC: Jan Engelhardt <jengelh@medozas.de> CC: Jiri Pirko <jpirko@redhat.com> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-17 15:21:48 -04:00
Eric Dumazet	521f1cf1db	inet_diag: fix access to tcp cc information Two different problems are fixed here : 1) inet_sk_diag_fill() might be called without socket lock held. icsk->icsk_ca_ops can change under us and module be unloaded. -> Access to freed memory. Fix this using rcu_read_lock() to prevent module unload. 2) Some TCP Congestion Control modules provide information but again this is not safe against icsk->icsk_ca_ops change and nla_put() errors were ignored. Some sockets could not get the additional info if skb was almost full. Fix this by returning a status from get_info() handlers and using rcu protection as well. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-17 13:28:31 -04:00
Linus Torvalds	fa927894bb	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs update from Al Viro: "Now that net-next went in... Here's the next big chunk - killing ->aio_read() and ->aio_write(). There'll be one more pile today (direct_IO changes and generic_write_checks() cleanups/fixes), but I'd prefer to keep that one separate" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) ->aio_read and ->aio_write removed pcm: another weird API abuse infinibad: weird APIs switched to ->write_iter() kill do_sync_read/do_sync_write fuse: use iov_iter_get_pages() for non-splice path fuse: switch to ->read_iter/->write_iter switch drivers/char/mem.c to ->read_iter/->write_iter make new_sync_{read,write}() static coredump: accept any write method switch /dev/loop to vfs_iter_write() serial2002: switch to __vfs_read/__vfs_write ashmem: use __vfs_read() export __vfs_read() autofs: switch to __vfs_write() new helper: __vfs_write() switch hugetlbfs to ->read_iter() coda: switch to ->read_iter/->write_iter ncpfs: switch to ->read_iter/->write_iter net/9p: remove (now-)unused helpers p9_client_attach(): set fid->uid correctly ...	2015-04-15 13:22:56 -07:00
David S. Miller	bae97d8410	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next A final pull request, I know it's very late but this time I think it's worth a bit of rush. The following patchset contains Netfilter/nf_tables updates for net-next, more specifically concatenation support and dynamic stateful expression instantiation. This also comes with a couple of small patches. One to fix the ebtables.h userspace header and another to get rid of an obsolete example file in tree that describes a nf_tables expression. This time, I decided to paste the original descriptions. This will result in a rather large commit description, but I think these bytes to keep. Patrick McHardy says: ==================== netfilter: nf_tables: concatenation support The following patches add support for concatenations, which allow multi dimensional exact matches in O(1). The basic idea is to split the data registers, currently consisting of 4 registers of 16 bytes each, into smaller units, 16 registers of 4 bytes each, and making sure each register store always leaves the full 32 bit in a well defined state, meaning smaller stores will zero the remaining bits. Based on that, we can load multiple adjacent registers with different values, thereby building a concatenated bigger value, and use that value for set lookups. Sets are changed to use variable sized extensions for their key and data values, removing the fixed limit of 16 bytes while saving memory if less space is needed. As a side effect, these patches will allow some nice optimizations in the future, like using jhash2 in nft_hash, removing the masking in nft_cmp_fast, optimized data comparison using 32 bit word size etc. These are not done so far however. The patches are split up as follows: * the first five patches add length validation to register loads and stores to make sure we stay within bounds and prepare the validation functions for the new addressing mode * the next patches prepare for changing to 32 bit addressing by introducing a struct nft_regs, which holds the verdict register as well as the data registers. The verdict members are moved to a new struct nft_verdict to allow to pull struct nft_data out of the stack. * the next patches contain preparatory conversions of expressions and sets to use 32 bit addressing * the next patch introduces so far unused register conversion helpers for parsing and dumping register numbers over netlink * following is the real conversion to 32 bit addressing, consisting of replacing struct nft_data in struct nft_regs by an array of u32s and actually translating and validating the new register numbers. * the final two patches add support for variable sized data items and variable sized keys / data in set elements The patches have been verified to work correctly with nft binaries using both old and new addressing. ==================== Patrick McHardy says: ==================== netfilter: nf_tables: dynamic stateful expression instantiation The following patches are the grand finale of my nf_tables set work, using all the building blocks put in place by the previous patches to support something like iptables hashlimit, but a lot more powerful. Sets are extended to allow attaching expressions to set elements. The dynset expression dynamically instantiates these expressions based on a template when creating new set elements and evaluates them for all new or updated set members. In combination with concatenations this effectively creates state tables for arbitrary combinations of keys, using the existing expression types to maintain that state. Regular set GC takes care of purging expired states. We currently support two different stateful expressions, counter and limit. Using limit as a template we can express the functionality of hashlimit, but completely unrestricted in the combination of keys. Using counter we can perform accounting for arbitrary flows. The following examples from patch 5/5 show some possibilities. Userspace syntax is still WIP, especially the listing of state tables will most likely be seperated from normal set listings and use a more structured format: 1. Limit the rate of new SSH connections per host, similar to iptables hashlimit: flow ip saddr timeout 60s \ limit 10/second \ accept 2. Account network traffic between each set of /24 networks: flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \ counter 3. Account traffic to each host per user: flow skuid . ip daddr \ counter 4. Account traffic for each combination of source address and TCP flags: flow ip saddr . tcp flags \ counter The resulting set content after a Xmas-scan look like this: { 192.168.122.1 . fin \| psh \| urg : counter packets 1001 bytes 40040, 192.168.122.1 . ack : counter packets 74 bytes 3848, 192.168.122.1 . psh \| ack : counter packets 35 bytes 3144 } In the future the "expressions attached to elements" will be extended to also support user created non-stateful expressions to allow to efficiently select beween a set of parameter sets, f.i. a set of log statements with different prefixes based on the interface, which currently require one rule each. This will most likely have to wait until the next kernel version though. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-14 18:51:19 -04:00
David S. Miller	6e8a9d9148	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Al Viro says: ==================== netdev-related stuff in vfs.git There are several commits sitting in vfs.git that probably ought to go in via net-next.git. First of all, there's merge with vfs.git#iocb - that's Christoph's aio rework, which has triggered conflicts with the ->sendmsg() and ->recvmsg() patches a while ago. It's not so much Christoph's stuff that ought to be in net-next, as (pretty simple) conflict resolution on merge. The next chunk is switch to {compat_,}import_iovec/import_single_range - new safer primitives for initializing iov_iter. The primitives themselves come from vfs/git#iov_iter (and they are used quite a lot in vfs part of queue), conversion of net/socket.c syscalls belongs in net-next, IMO. Next there's afs and rxrpc stuff from dhowells. And then there's sanitizing kernel_sendmsg et.al. + missing inlined helper for "how much data is left in msg->msg_iter" - this stuff is used in e.g. cifs stuff, but it belongs in net-next. That pile is pullable from git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-davem I'll post the individual patches in there in followups; could you take a look and tell if everything in there is OK with you? ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-13 18:18:05 -04:00
Eric Dumazet	789f558cfb	tcp/dccp: get rid of central timewait timer Using a timer wheel for timewait sockets was nice ~15 years ago when memory was expensive and machines had a single processor. This does not scale, code is ugly and source of huge latencies (Typically 30 ms have been seen, cpus spinning on death_lock spinlock.) We can afford to use an extra 64 bytes per timewait sock and spread timewait load to all cpus to have better behavior. Tested: On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1 on the target (lpaa24) Before patch : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 419594 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 437171 While test is running, we can observe 25 or even 33 ms latencies. lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2 lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2 After patch : About 90% increase of throughput : lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 810442 lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0 800992 And latencies are kept to minimal values during this load, even if network utilization is 90% higher : lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23 ... 1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-13 16:40:05 -04:00
Patrick McHardy	151d799a61	netfilter: nf_tables: mark stateful expressions Add a flag to mark stateful expressions. This is used for dynamic expression instanstiation to limit the usable expressions. Strictly speaking only the dynset expression can not be used in order to avoid recursion, but since dynamically instantiating non-stateful expressions will simply create an identical copy, which behaves no differently than the original, this limits to expressions where it actually makes sense to dynamically instantiate them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 20:12:31 +02:00
Patrick McHardy	f25ad2e907	netfilter: nf_tables: prepare for expressions associated to set elements Preparation to attach expressions to set elements: add a set extension type to hold an expression and dump the expression information with the set element. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 20:12:31 +02:00
Patrick McHardy	0b2d8a7b63	netfilter: nf_tables: add helper functions for expression handling Add helper functions for initializing, cloning, dumping and destroying a single expression that is not part of a rule. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 20:12:31 +02:00
Patrick McHardy	7d7402642e	netfilter: nf_tables: variable sized set element keys / data This patch changes sets to support variable sized set element keys / data up to 64 bytes each by using variable sized set extensions. This allows to use concatenations with bigger data items suchs as IPv6 addresses. As a side effect, small keys/data now don't require the full 16 bytes of struct nft_data anymore but just the space they need. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 17:17:31 +02:00
Patrick McHardy	d0a11fc3dc	netfilter: nf_tables: support variable sized data in nft_data_init() Add a size argument to nft_data_init() and pass in the available space. This will be used by the following patches to support variable sized set element data. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 17:17:30 +02:00
Patrick McHardy	49499c3e6e	netfilter: nf_tables: switch registers to 32 bit addressing Switch the nf_tables registers from 128 bit addressing to 32 bit addressing to support so called concatenations, where multiple values can be concatenated over multiple registers for O(1) exact matches of multiple dimensions using sets. The old register values are mapped to areas of 128 bits for compatibility. When dumping register numbers, values are expressed using the old values if they refer to the beginning of a 128 bit area for compatibility. To support concatenations, register loads of less than a full 32 bit value need to be padded. This mainly affects the payload and exthdr expressions, which both unconditionally zero the last word before copying the data. Userspace fully passes the testsuite using both old and new register addressing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 17:17:29 +02:00
Patrick McHardy	b1c96ed37c	netfilter: nf_tables: add register parsing/dumping helpers Add helper functions to parse and dump register values in netlink attributes. These helpers will later be changed to take care of translation between the old 128 bit and the new 32 bit register numbers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 17:17:28 +02:00
Patrick McHardy	8cd8937ac0	netfilter: nf_tables: convert sets to u32 data pointers Simple conversion to use u32 pointers to the beginning of the data area to keep follow up patches smaller. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 17:17:27 +02:00
Patrick McHardy	e562d860d7	netfilter: nf_tables: kill nft_data_cmp() Only needlessly complicates things due to requiring specific argument types. Use memcmp directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 17:17:26 +02:00
Patrick McHardy	1ca2e1702c	netfilter: nf_tables: use struct nft_verdict within struct nft_data Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 17:17:24 +02:00
Patrick McHardy	a55e22e92f	netfilter: nf_tables: get rid of NFT_REG_VERDICT usage Replace the array of registers passed to expressions by a struct nft_regs, containing the verdict as a seperate member, which aliases to the NFT_REG_VERDICT register. This is needed to seperate the verdict from the data registers completely, so their size can be changed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 17:17:07 +02:00
Patrick McHardy	d07db9884a	netfilter: nf_tables: introduce nft_validate_register_load() Change nft_validate_input_register() to not only validate the input register number, but also the length of the load, and rename it to nft_validate_register_load() to reflect that change. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:50 +02:00
Patrick McHardy	27e6d2017a	netfilter: nf_tables: kill nft_validate_output_register() All users of nft_validate_register_store() first invoke nft_validate_output_register(). There is in fact no use for using it on its own, so simplify the code by folding the functionality into nft_validate_register_store() and kill it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:50 +02:00
Patrick McHardy	1ec10212f9	netfilter: nf_tables: rename nft_validate_data_load() The existing name is ambiguous, data is loaded as well when we read from a register. Rename to nft_validate_register_store() for clarity and consistency with the upcoming patch to introduce its counterpart, nft_validate_register_load(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:49 +02:00
Patrick McHardy	45d9bcda21	netfilter: nf_tables: validate len in nft_validate_data_load() For values spanning multiple registers, we need to validate that enough space is available from the destination register onwards. Add a len argument to nft_validate_data_load() and consolidate the existing length validations in preparation of that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:49 +02:00
David S. Miller	4e78eb0dbf	There isn't much left, but we have * new mac80211 internal software queue to allow drivers to have shorter hardware queues and pull on-demand * use rhashtable for mac80211 station table * minstrel rate control debug improvements and some refactoring * fix noisy message about TX power reduction * fix continuous message printing and activity if CRDA doesn't respond * fix VHT-related capabilities with "iw connect" or "iwconfig ..." * fix Kconfig for cfg80211 wireless extensions compatibility -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJVJ7CPAAoJEDBSmw7B7bqr8+IQAKCAbUyd6PFRT5tcz9kW5GCW /ibb+n1e14yWKgNEe1gddUGKG/L3HGCBXNkCYzR2M8mlL7dLPqspaBcGHS4dx8F4 D0AuikqvtXIxfAXi0zmU2uo7rH7u2X34R2LtS8AlKByD+jmFvxMiPPvxNFgzJu/7 63UQm73p2pnu/KdXLW1OQEcpZtZJ9+N/uBiq9zbVdX3A8T84ME0oMyy+EAQqCZdK CcsTXHCnAgmmXWJlu1JRdopr1bd38mSGB70eXduFtPqDdmtQRnoaCQ9e+tJDA4j4 svEw0yDmsc4WG1EKLKKCRd3uFOZsng+lcXrHfpm5wlSPpCOItfQ9BzT3x1u6Y5JU Z1WMOMkkEce+95U7/RLoXwC/2RS3XelUXTde4cGIRMvO5drOrU58P0gdn3J+yKbv 6v+2GGKy/39tdXUOxIl3EZT/huIl+h1UNO8C2hyaEwdXK+X1zl31/u6kk1Ns18Wr YPEJixxHx0zR8jaZgDC7OlWLuqn4Ay+Ls9yCyIesdHzKpizJKqn83PntYnpJmxoA 9hlIyRDWnqH44KxzB85ni1C2Qudec3mcCWIWV7M+UoSC1Cgs/LxDzH7kRejR2ZIl vRhg5pqyr53L0h2lq5DO4Cj4UzbXb7YioKJRxjyKloNOlRrCZtK/VEsHbdsKEcIp d/wHj1AyFZeQfuhk8Qqr =mtuo -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== There isn't much left, but we have * new mac80211 internal software queue to allow drivers to have shorter hardware queues and pull on-demand * use rhashtable for mac80211 station table * minstrel rate control debug improvements and some refactoring * fix noisy message about TX power reduction * fix continuous message printing and activity if CRDA doesn't respond * fix VHT-related capabilities with "iw connect" or "iwconfig ..." * fix Kconfig for cfg80211 wireless extensions compatibility ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-12 20:43:46 -04:00
Al Viro	e1200fe68f	9p: switch p9_client_read() to passing struct iov_iter * ... and make it loop Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:28:27 -04:00
Al Viro	070b3656cf	9p: switch p9_client_write() to passing it struct iov_iter * ... and make it loop until it's done Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:28:25 -04:00
Al Viro	4f3b35c157	net/9p: switch the guts of p9_client_{read,write}() to iov_iter ... and have get_user_pages_fast() mapping fewer pages than requested to generate a short read/write. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:28:25 -04:00
Thomas Graf	78ebb0d00b	rtnetlink: Mark name argument of rtnl_create_link() const Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-10 12:42:40 -07:00
David S. Miller	c3d0dac693	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-04-09 We've had enough new patches during the past week (especially from Marcel) that it'd be good to still get these queued for 4.1. The majority of the changes are from Marcel with lots of cleanup & refactoring patches for the HCI UART driver. Marcel also split out some Broadcom & Intel vendor specific functionality into two new btintel & btbcm modules. In addition to the HCI driver changes there's the completion of our local OOB data interface for pairing, added support for requesting remote LE features when connecting, as well as a couple of minor fixes for mac802154. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-09 18:31:50 -04:00
David S. Miller	ca69d7102f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. They are: * nf_tables set timeout infrastructure from Patrick Mchardy. 1) Add support for set timeout support. 2) Add support for set element timeouts using the new set extension infrastructure. 4) Add garbage collection helper functions to get rid of stale elements. Elements are accumulated in a batch that are asynchronously released via RCU when the batch is full. 5) Add garbage collection synchronization helpers. This introduces a new element busy bit to address concurrent access from the netlink API and the garbage collector. 5) Add timeout support for the nft_hash set implementation. The garbage collector peridically checks for stale elements from the workqueue. * iptables/nftables cgroup fixes: 6) Ignore non full-socket objects from the input path, otherwise cgroup match may crash, from Daniel Borkmann. 7) Fix cgroup in nf_tables. 8) Save some cycles from xt_socket by skipping packet header parsing when skb->sk is already set because of early demux. Also from Daniel. * br_netfilter updates from Florian Westphal. 9) Save frag_max_size and restore it from the forward path too. 10) Use a per-cpu area to restore the original source MAC address when traffic is DNAT'ed. 11) Add helper functions to access physical devices. 12) Use these new physdev helper function from xt_physdev. 13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter state information. 14) Annotate original layer 2 protocol number in nf_bridge info, instead of using kludgy flags. 15) Also annotate the pkttype mangling when the packet travels back and forth from the IP to the bridge layer, instead of using a flag. * More nf_tables set enhancement from Patrick: 16) Fix possible usage of set variant that doesn't support timeouts. 17) Avoid spurious "set is full" errors from Netlink API when there are pending stale elements scheduled to be released. 18) Restrict loop checks to set maps. 19) Add support for dynamic set updates from the packet path. 20) Add support to store optional user data (eg. comments) per set element. BTW, I have also pulled net-next into nf-next to anticipate the conflict resolution between your okfn() signature changes and Florian's br_netfilter updates. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-09 14:46:04 -04:00
Varka Bhadram	23310f6f3a	mac802154: fix transmission power datatype Netlink attribute for the power is s8. But for the driver level operations we are collection power level value into integer. It has to be change to s8 from int. Signed-off-by: Varka Bhadram <varkab@cdac.in> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-04-09 19:56:15 +02:00
Varka Bhadram	94910d419a	mac802154: fix typo for device Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-04-09 09:19:33 +02:00
Marcel Holtmann	0fe29fd1cd	Bluetooth: Read LE remote features during connection establishment When establishing a Bluetooth LE connection, read the remote used features mask to determine which features are supported. This was not really needed with Bluetooth 4.0, but since Bluetooth 4.1 and also 4.2 have introduced new optional features, this becomes more important. This works the same as with BR/EDR where the connection enters the BT_CONFIG stage and hci_connect_cfm call is delayed until the remote features have been retrieved. Only after successfully receiving the remote features, the connection enters the BT_CONNECTED state. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-04-09 08:36:54 +03:00
Al Viro	da18428498	net: switch importing msghdr from userland to {compat_,}import_iovec() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-09 00:02:26 -04:00
Al Viro	237dae8890	Merge branch 'iocb' into for-davem trivial conflict in net/socket.c and non-trivial one in crypto - that one had evaded aio_complete() removal. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-09 00:01:38 -04:00
David S. Miller	82740b9ac2	NFC: 4.1 pull request This is the NFC pull request for 4.1. This is a shorter one than usual, as the Intel Field Peak NFC driver could not make it in time. We have: - A new driver for NXP NCI based chipsets, like e.g. the NPC100 or the PN7150. It currently only supports an i2c physical layer, but could easily be extended to work on top of e.g. SPI. This driver also includes support for user space triggered firmware updates. - A few minor st21nfc[ab] fixes, cleanups, and comments improvements. - A pn533 error return fix. - A few NFC related logs formatting cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVJV81AAoJEIqAPN1PVmxKM3sP/R6fKXMxtVxXsiPzBMk+SpBI onNXbCx27rp1lHTFznE16JAaaSfcQEwSQmYx7xa6KXqRYVlDfRfC+5R+rPGrCjfH kV/eLBNnYpmPw0ViVT7dsWK0b4me0k5pr9ki9mze3YxvuMbA5vdv0tvuRFz5IRF/ hl+WI5pntGuRtnIyBKIauAMylylUYVvCBGhmHnveiX0Dp8noJLBSV0wvdzujm51S +Uio/jHlUEV2lotrQBOfWNtEkwonXSwzZWSzimBCyEGLAwTx4lGXHmQftOtPd3zE sOT7Gw77ZCsulHoHiJyC0KpDSDS0NYVrtTI5BiuGgAivGi0YZw2XD3CYRIdy0m2Z lMoodYdiCgsxmrU6I6Rd/7DQBxD0Vhc+CGAyk41f7AwU4Fwq105kpSLupTtU+NzT ZpdvSXeirU8sxt+3WDOgv8esyYGZVVD8GuBbofMZQZ0vLq+FcDpYAW4w3LKvvi6X C+WN8f7SCI0kqpd4leyl6EG3SoQKFyPWobu0IlV520R9b76iBcyqooTIpvVa4Yk7 az6fKhi9gK/T6FHW78y6fnkczd47JKrC924m+g6P/hhTD5zQ956ferp0uFTjkPtF 8kNgRT7OIRi1JO783cnQx1uA61axC3GX1HFzsD9paVkTwRNtJ3eFsxtcYt7Nfv8D WGNWRQp5LmBsD/SqFBfk =MzOJ -----END PGP SIGNATURE----- Merge tag 'nfc-next-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC: 4.1 pull request This is the NFC pull request for 4.1. This is a shorter one than usual, as the Intel Field Peak NFC driver could not make it in time. We have: - A new driver for NXP NCI based chipsets, like e.g. the NPC100 or the PN7150. It currently only supports an i2c physical layer, but could easily be extended to work on top of e.g. SPI. This driver also includes support for user space triggered firmware updates. - A few minor st21nfc[ab] fixes, cleanups, and comments improvements. - A pn533 error return fix. - A few NFC related logs formatting cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-08 17:12:50 -04:00
Pablo Neira Ayuso	aadd51aa71	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Resolve conflicts between `5888b93` ("Merge branch 'nf-hook-compress'") and Florian Westphal br_netfilter works. Conflicts: net/bridge/br_netfilter.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-08 18:30:21 +02:00
Patrick McHardy	68e942e88a	netfilter: nf_tables: support optional userdata for set elements Add an userdata set extension and allow the user to attach arbitrary data to set elements. This is intended to hold TLV encoded data like comments or DNS annotations that have no meaning to the kernel. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-08 16:58:27 +02:00
Patrick McHardy	22fe54d5fe	netfilter: nf_tables: add support for dynamic set updates Add a new "dynset" expression for dynamic set updates. A new set op ->update() is added which, for non existant elements, invokes an initialization callback and inserts the new element. For both new or existing elements the extenstion pointer is returned to the caller to optionally perform timer updates or other actions. Element removal is not supported so far, however that seems to be a rather exotic need and can be added later on. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-08 16:58:27 +02:00
Patrick McHardy	11113e190b	netfilter: nf_tables: support different set binding types Currently a set binding is assumed to be related to a lookup and, in case of maps, a data load. In order to use bindings for set updates, the loop detection checks must be restricted to map operations only. Add a flags member to the binding struct to hold the set "action" flags such as NFT_SET_MAP, and perform loop detection based on these. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-08 16:58:27 +02:00
Patrick McHardy	3dd0673ac3	netfilter: nf_tables: prepare set element accounting for async updates Use atomic operations for the element count to avoid races with async updates. To properly handle the transactional semantics during netlink updates, deleted but not yet committed elements are accounted for seperately and are treated as being already removed. This means for the duration of a netlink transaction, the limit might be exceeded by the amount of elements deleted. Set implementations must be prepared to handle this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-08 16:58:27 +02:00
Sheng Yong	8bc0034cf6	net: remove extra newlines Signed-off-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-07 22:24:37 -04:00
Daniel Lee	2646c831c0	tcp: RFC7413 option support for Fast Open client Fast Open has been using an experimental option with a magic number (RFC6994). This patch makes the client by default use the RFC7413 option (34) to get and send Fast Open cookies. This patch makes the client solicit cookies from a given server first with the RFC7413 option. If that fails to elicit a cookie, then it tries the RFC6994 experimental option. If that also fails, it uses the RFC7413 option on all subsequent connect attempts. If the server returns a Fast Open cookie then the client caches the form of the option that successfully elicited a cookie, and uses that form on later connects when it presents that cookie. The idea is to gradually obsolete the use of experimental options as the servers and clients upgrade, while keeping the interoperability meanwhile. Signed-off-by: Daniel Lee <Longinus00@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-07 18:36:39 -04:00
Daniel Lee	7f9b838b71	tcp: RFC7413 option support for Fast Open server Fast Open has been using the experimental option with a magic number (RFC6994) to request and grant Fast Open cookies. This patch enables the server to support the official IANA option 34 in RFC7413 in addition. The change has passed all existing Fast Open tests with both old and new options at Google. Signed-off-by: Daniel Lee <Longinus00@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-07 18:36:39 -04:00
Johan Hedberg	d619ffcfbd	Bluetooth: Update SSP OOB data EIR definitions Since Bluetooth 4.1 there are two additional values for SSP OOB data, namely C-256 and R-256. This patch updates the EIR definitions to take into account both the 192 and 256 bit variants of C and R. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-04-07 23:11:37 +02:00
David Miller	79b16aadea	udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). That was we can make sure the output path of ipv4/ipv6 operate on the UDP socket rather than whatever random thing happens to be in skb->sk. Based upon a patch by Jiri Pirko. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>	2015-04-07 15:29:08 -04:00
David Miller	7026b1ddb6	netfilter: Pass socket pointer down through okfn(). On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-07 15:25:55 -04:00
Marcel Holtmann	2d7cc19eeb	Bluetooth: Remove hci_recv_stream_fragment function The hci_recv_stream_fragment function should have never been introduced in the first place. The Bluetooth core does not need to know anything about the HCI transport protocol. With all transport protocol specific detailed moved back into the drivers where they belong (mainly generic USB and UART drivers), this function can now be removed. This reduces the size of hci_dev structure and also removes an exported symbol from the Bluetooth core module. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-04-07 18:47:10 +02:00
Marcel Holtmann	5c7d2dd285	Bluetooth: Make data pointer of hci_recv_stream_fragment const The data pointer provided to hci_recv_stream_fragment function should have been marked const. The function has no business in modifying the original data. So fix this now. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-04-07 18:47:09 +02:00
David S. Miller	7abccdba25	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-04-04 Here's what's probably the last bluetooth-next pull request for 4.1: - Fixes for LE advertising data & advertising parameters - Fix for race condition with HCI_RESET flag - New BNEPGETSUPPFEAT ioctl, needed for certification - New HCI request callback type to get the resulting skb - Cleanups to use BIT() macro wherever possible - Consolidate Broadcom device entries in the btusb HCI driver - Check for valid flags in CMTP, HIDP & BNEP - Disallow local privacy & OOB data combo to prevent a potential race - Expose SMP & ECDH selftest results through debugfs - Expose current Device ID info through debugfs Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-07 11:47:52 -04:00
Johannes Berg	29464ccc78	cfg80211: move IE split utilities here from mac80211 As the next patch will require the IE splitting utility functions in cfg80211, move them there from mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-04-07 13:56:41 +02:00
David S. Miller	c85d6975ef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/mellanox/mlx4/cmd.c net/core/fib_rules.c net/ipv4/fib_frontend.c The fib_rules.c and fib_frontend.c conflicts were locking adjustments in 'net' overlapping addition and removal of code in 'net-next'. The mlx4 conflict was a bug fix in 'net' happening in the same place a constant was being replaced with a more suitable macro. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-06 22:34:15 -04:00
hannes@stressinduktion.org	f60e5990d9	ipv6: protect skb->sk accesses from recursive dereference inside the stack We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. tunnel encapsulation process. ipv6 does not conform with this in three places: 1) ip6_fragment: we do consult ipv6_npinfo for frag_size 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU Furthermore: In sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket ontop of an IPv6-backed vxlan device. Reuse xmit_recursion as we are currently only interested in protecting tunnel devices. Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-06 16:12:49 -04:00
Christophe Ricard	1f74f323e2	nfc: nci: Add comment to explain NCI_HCI_MAX_PIPES According to specification etsi 102 622 chapter 4.4 pipes identifier is 7 bits long giving a 127 possible pipes value. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-04-06 00:19:05 +02:00
Christophe Ricard	0fc4a1291a	nfc: Reduce nfc_evt_transaction params length to 0 According to etsi 102 622 chapter 11.2.2.4 EVT_TRANSACTION, the nfc_evt_transaction parameters can be 0 up to 255 byte long. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-04-06 00:18:29 +02:00
Christophe Ricard	0b040964a0	nfc: hci: Add comment to explain NFC_HCI_MAX_PIPES According to specification etsi 102 622 chapter 4.4 pipes identifier is 7 bits long giving a 127 possible pipes value. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-04-06 00:18:20 +02:00
David S. Miller	073bfd5686	netfilter: Pass nf_hook_state through nft_set_pktinfo*(). Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-04 12:54:27 -04:00
David S. Miller	8fe22382d1	netfilter: Pass nf_hook_state through nf_nat_ipv6_{in,out,fn,local_fn}(). Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-04 12:48:08 -04:00
David S. Miller	d7cf4081ed	netfilter: Pass nf_hook_state through nf_nat_ipv4_{in,out,fn,local_fn}(). Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-04 12:45:19 -04:00

1 2 3 4 5 ...

8596 Commits