linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-17 04:07:58 +07:00

Author	SHA1	Message	Date
Florian Westphal	e56894356f	netfilter: conntrack: remove l4proto destroy hook Only one user (gre), add a direct call and remove this facility. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	2a389de86e	netfilter: conntrack: remove l4proto init and get_net callbacks Those were needed we still had modular trackers. As we don't have those anymore, prefer direct calls and remove all the (un)register infrastructure associated with this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	70aed4647c	netfilter: conntrack: remove sysctl registration helpers After previous patch these are not used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	b884fa4617	netfilter: conntrack: unify sysctl handling Due to historical reasons, all l4 trackers register their own sysctls. This leads to copy&pasted boilerplate code, that does exactly same thing, just with different data structure. Place all of this in a single file. This allows to remove the various ctl_table pointers from the ct_netns structure and reduces overall code size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	303e0c5589	netfilter: conntrack: avoid unneeded nf_conntrack_l4proto lookups after removal of the packet and invert function pointers, several places do not need to lookup the l4proto structure anymore. Remove those lookups. The function nf_ct_invert_tuplepr becomes redundant, replace it with nf_ct_invert_tuple everywhere. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	edf0338dab	netfilter: conntrack: remove pernet l4 proto register interface No used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	44fb87f635	netfilter: conntrack: remove remaining l4proto indirect packet calls Now that all l4trackers are builtin, no need to use a mix of direct and indirect calls. This removes the last two users: gre and the generic l4 protocol tracker. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	b184356d0a	netfilter: conntrack: remove module owner field No need to get/put module owner reference, none of these can be removed anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	197c4300ae	netfilter: conntrack: remove invert_tuple callback Only used by icmp(v6). Prefer a direct call and remove this function from the l4proto struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	df5e162908	netfilter: conntrack: remove pkt_to_tuple callback GRE is now builtin, so we can handle it via direct call and remove the callback. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:34 +01:00
Florian Westphal	751fc301ec	netfilter: conntrack: remove net_id No users anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	22fc4c4c9f	netfilter: conntrack: gre: switch module to be built-in This makes the last of the modular l4 trackers 'bool'. After this, all infrastructure to handle dynamic l4 protocol registration becomes obsolete and can be removed in followup patches. Old: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko New: 313728 net/netfilter/nf_conntrack.ko Old: text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko New: 112095 21381 240 133716 20a54 nf_conntrack.ko The size increase is only temporary. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	202e651cd4	netfilter: conntrack: gre: convert rwlock to rcu We can use gre. Lock is only needed when a new expectation is added. In case a single spinlock proves to be problematic we can either add one per netns or use an array of locks combined with net_hash_mix() or similar to pick the 'correct' one. But given this is only needed for an expectation rather than per packet a single one should be ok. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	e2e48b4716	netfilter: conntrack: handle icmp pkt_to_tuple helper via direct calls rather than handling them via indirect call, use a direct one instead. This leaves GRE as the last user of this indirect call facility. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	a47c540481	netfilter: conntrack: handle builtin l4proto packet functions via direct calls The l4 protocol trackers are invoked via indirect call: l4proto->packet(). With one exception (gre), all l4trackers are builtin, so we can make .packet optional and use a direct call for most protocols. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Phil Sutter	75dd48e2e4	netfilter: nf_tables: Support RULE_ID reference in new rule To allow for a batch to contain rules in arbitrary ordering, introduce NFTA_RULE_POSITION_ID attribute which works just like NFTA_RULE_POSITION but contains the ID of another rule within the same batch. This helps iptables-nft-restore handling dumps with mixed insert/append commands correctly. Note that NFTA_RULE_POSITION takes precedence over NFTA_RULE_POSITION_ID, so if the former is present, the latter is ignored. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	8e2f311a68	netfilter: physdev: relax br_netfilter dependency Following command: iptables -D FORWARD -m physdev ... causes connectivity loss in some setups. Reason is that iptables userspace will probe kernel for the module revision of the physdev patch, and physdev has an artificial dependency on br_netfilter (xt_physdev use makes no sense unless a br_netfilter module is loaded). This causes the "phydev" module to be loaded, which in turn enables the "call-iptables" infrastructure. bridged packets might then get dropped by the iptables ruleset. The better fix would be to change the "call-iptables" defaults to 0 and enforce explicit setting to 1, but that breaks backwards compatibility. This does the next best thing: add a request_module call to checkentry. This was a stray '-D ... -m physdev' won't activate br_netfilter anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	827318feb6	netfilter: conntrack: remove helper hook again place them into the confirm one. Old: hook (300): ipv4/6_help() first call helper, then seqadj. hook (INT_MAX): confirm Now: hook (INT_MAX): confirm, first call helper, then seqadj, then confirm Not having the extra call is noticeable in bechmarks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	10870dd89e	netfilter: nf_tables: add direct calls for all builtin expressions With CONFIG_RETPOLINE its faster to add an if (ptr == &foo_func) check and and use direct calls for all the built-in expressions. ~15% improvement in pathological cases. checkpatch doesn't like the X macro due to the embedded return statement, but the macro has a very limited scope so I don't think its a problem. I would like to avoid bugs of the form If (e->ops->eval == (unsigned long)nft_foo_eval) nft_bar_eval(); and open-coded if ()/else if()/else cascade, thus the macro. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	4d44175aa5	netfilter: nf_tables: handle nft_object lookups via rhltable Instead of linear search, use rhlist interface to look up the objects. This fixes rulesets with thousands of named objects (quota, counters and the like). We only use a single table for this and consider the address of the table we're doing the lookup in as a part of the key. This reduces restore time of a sample ruleset with ~20k named counters from 37 seconds to 0.8 seconds. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:33 +01:00
Florian Westphal	d152159b89	netfilter: nf_tables: prepare nft_object for lookups via hashtable Add a 'key' structure for object, so we can look them up by name + table combination (the name can be the same in each table). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-01-18 15:02:32 +01:00
Eric Dumazet	6bcdc40ddd	tcp: move rx_opt & syn_data_acked init to tcp_disconnect() If we make sure all listeners have these fields cleared, then a clone will also inherit zero values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:05 -08:00
Eric Dumazet	792c4354a5	tcp: move tp->rack init to tcp_disconnect() If we make sure all listeners have proper tp->rack value, then a clone will also inherit proper initial value. Note that fresh sockets init tp->rack from tcp_init_sock() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:05 -08:00
Eric Dumazet	6cda8b7493	tcp: move app_limited init to tcp_disconnect() If we make sure all listeners have app_limited set to ~0U, then a clone will also inherit proper initial value. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:05 -08:00
Eric Dumazet	5c701549c9	tcp: move retrans_out, sacked_out, tlp_high_seq, last_oow_ack_time init to tcp_disconnect() If we make sure all listeners have these fields cleared, then a clone will also inherit zero values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:05 -08:00
Eric Dumazet	5d83676462	tcp: do not clear urg_data in tcp_create_openreq_child All listeners have this field cleared already, since tcp_disconnect() clears it and newly created sockets have also a zero value here. So a clone will inherit a zero value here. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:05 -08:00
Eric Dumazet	3a9a57f637	tcp: move snd_cwnd & snd_cwnd_cnt init to tcp_disconnect() Passive connections can inherit proper value by cloning, if we make sure all listeners have the proper values there. tcp_disconnect() was setting snd_cwnd to 2, which seems quite obsolete since IW10 adoption. Also remove an obsolete comment. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:05 -08:00
Eric Dumazet	b9e2e689aa	tcp: move mdev_us init to tcp_disconnect() If we make sure a listener always has its mdev_us field set to TCP_TIMEOUT_INIT, we do not need to rewrite this field after a new clone is created. tcp_disconnect() is very seldom used in real applications. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:05 -08:00
Eric Dumazet	a0070e463f	tcp: do not clear srtt_us in tcp_create_openreq_child All listeners have this field cleared already, since tcp_disconnect() clears it and newly created sockets have also a zero value here. So a clone will inherit a zero value here. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:05 -08:00
Eric Dumazet	eb2c80ca87	tcp: do not clear packets_out in tcp_create_openreq_child() New sockets have this field cleared, and tcp_disconnect() calls tcp_write_queue_purge() which among other things also clear tp->packets_out So a listener is guaranteed to have this field cleared. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:04 -08:00
Eric Dumazet	6a408147ea	tcp: move icsk_rto init to tcp_disconnect() If we make sure a listener always has its icsk_rto field set to TCP_TIMEOUT_INIT, we do not need to rewrite this field after a new clone is created. tcp_disconnect() is very seldom used in real applications. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:04 -08:00
Eric Dumazet	b84235e291	tcp: do not set snd_ssthresh in tcp_create_openreq_child() New sockets get the field set to TCP_INFINITE_SSTHRESH in tcp_init_sock() In case a socket had this field changed and transitions to TCP_LISTEN state, tcp_disconnect() also makes sure snd_ssthresh is set to TCP_INFINITE_SSTHRESH. So a listener has this field set to TCP_INFINITE_SSTHRESH already. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:19:04 -08:00
YueHaibing	d4fb30f6f1	tipc: remove unneeded semicolon in trace.c Remove unneeded semicolon Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 22:04:43 -08:00
Peter Oskolkov	22c2ad616b	net: add a route cache full diagnostic message In some testing scenarios, dst/route cache can fill up so quickly that even an explicit GC call occasionally fails to clean it up. This leads to sporadically failing calls to dst_alloc and "network unreachable" errors to the user, which is confusing. This patch adds a diagnostic message to make the cause of the failure easier to determine. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:37:25 -08:00
Petr Machata	6685987c29	switchdev: Add extack argument to call_switchdev_notifiers() A follow-up patch will enable vetoing of FDB entries. Make it possible to communicate details of why an FDB entry is not acceptable back to the user. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:47 -08:00
Petr Machata	87b0984ebf	net: Add extack argument to ndo_fdb_add() Drivers may not be able to support certain FDB entries, and an error code is insufficient to give clear hints as to the reasons of rejection. In order to make it possible to communicate the rejection reason, extend ndo_fdb_add() with an extack argument. Adapt the existing implementations of ndo_fdb_add() to take the parameter (and ignore it). Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:47 -08:00
Yuchung Cheng	c1d5674f83	tcp: less aggressive window probing on local congestion Previously when the sender fails to send (original) data packet or window probes due to congestion in the local host (e.g. throttling in qdisc), it'll retry within an RTO or two up to 500ms. In low-RTT networks such as data-centers, RTO is often far below the default minimum 200ms. Then local host congestion could trigger a retry storm pouring gas to the fire. Worse yet, the probe counter (icsk_probes_out) is not properly updated so the aggressive retry may exceed the system limit (15 rounds) until the packet finally slips through. On such rare events, it's wise to retry more conservatively (500ms) and update the stats properly to reflect these incidents and follow the system limit. Note that this is consistent with the behaviors when a keep-alive probe or RTO retry is dropped due to local congestion. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	590d2026d6	tcp: retry more conservatively on local congestion Previously when the sender fails to retransmit a data packet on timeout due to congestion in the local host (e.g. throttling in qdisc), it'll retry within an RTO up to 500ms. In low-RTT networks such as data-centers, RTO is often far below the default minimum 200ms (and the cap 500ms). Then local host congestion could trigger a retry storm pouring gas to the fire. Worse yet, the retry counter (icsk_retransmits) is not properly updated so the aggressive retry may exceed the system limit (15 rounds) until the packet finally slips through. On such rare events, it's wise to retry more conservatively (500ms) and update the stats properly to reflect these incidents and follow the system limit. Note that this is consistent with the behavior when a keep-alive probe is dropped due to local congestion. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	9721e709fa	tcp: simplify window probe aborting on USER_TIMEOUT Previously we use the next unsent skb's timestamp to determine when to abort a socket stalling on window probes. This no longer works as skb timestamp reflects the last instead of the first transmission. Instead we can estimate how long the socket has been stalling with the probe count and the exponential backoff behavior. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	01a523b071	tcp: create a helper to model exponential backoff Create a helper to model TCP exponential backoff for the next patch. This is pure refactor w no behavior change. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	c7d13c8faa	tcp: properly track retry time on passive Fast Open This patch addresses a corner issue on timeout behavior of a passive Fast Open socket. A passive Fast Open server may write and close the socket when it is re-trying SYN-ACK to complete the handshake. After the handshake is completely, the server does not properly stamp the recovery start time (tp->retrans_stamp is 0), and the socket may abort immediately on the very first FIN timeout, instead of retying until it passes the system or user specified limit. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	7ae189759c	tcp: always set retrans_stamp on recovery Previously TCP socket's retrans_stamp is not set if the retransmission has failed to send. As a result if a socket is experiencing local issues to retransmit packets, determining when to abort a socket is complicated w/o knowning the starting time of the recovery since retrans_stamp may remain zero. This complication causes sub-optimal behavior that TCP may use the latest, instead of the first, retransmission time to compute the elapsed time of a stalling connection due to local issues. Then TCP may disrecard TCP retries settings and keep retrying until it finally succeed: not a good idea when the local host is already strained. The simple fix is to always timestamp the start of a recovery. It's worth noting that retrans_stamp is also used to compare echo timestamp values to detect spurious recovery. This patch does not break that because retrans_stamp is still later than when the original packet was sent. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	7f12422c48	tcp: always timestamp on every skb transmission Previously TCP skbs are not always timestamped if the transmission failed due to memory or other local issues. This makes deciding when to abort a socket tricky and complicated because the first unacknowledged skb's timestamp may be 0 on TCP timeout. The straight-forward fix is to always timestamp skb on every transmission attempt. Also every skb retransmission needs to be flagged properly to avoid RTT under-estimation. This can happen upon receiving an ACK for the original packet and the a previous (spurious) retransmission has failed. It's worth noting that this reverts to the old time-stamping style before commit `8c72c65b42` ("tcp: update skb->skb_mstamp more carefully") which addresses a problem in computing the elapsed time of a stalled window-probing socket. The problem will be addressed differently in the next patches with a simpler approach. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	88f8598d0a	tcp: exit if nothing to retransmit on RTO timeout Previously TCP only warns if its RTO timer fires and the retransmission queue is empty, but it'll cause null pointer reference later on. It's better to avoid such catastrophic failure and simply exit with a warning. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
David Herrmann	49b4994c14	net/ipv6/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE The udp-tunnel setup allows binding sockets to a network device. Prefer the new SO_BINDTOIFINDEX to avoid temporarily resolving the device-name just to look it up in the ioctl again. Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:55:52 -08:00
David Herrmann	2eadee72db	net/ipv4/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE The udp-tunnel setup allows binding sockets to a network device. Prefer the new SO_BINDTOIFINDEX to avoid temporarily resolving the device-name just to look it up in the ioctl again. Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:55:52 -08:00
David Herrmann	f5dd3d0c96	net: introduce SO_BINDTOIFINDEX sockopt This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. User-space often refers to network-interfaces via their index, but has to temporarily resolve it to a name for a call into SO_BINDTODEVICE. This might pose problems when the network-device is renamed asynchronously by other parts of the system. When this happens, the SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong device. In most cases user-space only ever operates on devices which they either manage themselves, or otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries this guarantee is non-obvious and it would be nice if that race-condition would simply not exist. It would make it easier for those libraries to operate even in situations where the device-name might change under the hood. A real use-case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network-interfaces during the transition from initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network-namespaces). However, the network manager now has to make sure its management engine for dormant devices will not run in parallel to these renames. Particularly, when you offload operations like DHCP into separate processes, these might setup their sockets early, and thus have to resolve the device-name possibly running into this race-condition. By avoiding a call to resolve the device-name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this race. Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:55:51 -08:00
Vakul Garg	692d7b5d1f	tls: Fix recvmsg() to be able to peek across multiple records This fixes recvmsg() to be able to peek across multiple tls records. Without this patch, the tls's selftests test case 'recv_peek_large_buf_mult_recs' fails. Each tls receive context now maintains a 'rx_list' to retain incoming skb carrying tls records. If a tls record needs to be retained e.g. for peek case or for the case when the buffer passed to recvmsg() has a length smaller than decrypted record length, then it is added to 'rx_list'. Additionally, records are added in 'rx_list' if the crypto operation runs in async mode. The records are dequeued from 'rx_list' after the decrypted data is consumed by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK flag is used in recvmsg(), then records are not consumed or removed from the 'rx_list'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:20:40 -08:00
YueHaibing	01cb8a1a64	net/tls: Make function tls_sw_do_sendpage static Fixes the following sparse warning: net/tls/tls_sw.c:1023:5: warning: symbol 'tls_sw_do_sendpage' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:45:21 -08:00
YueHaibing	f3de19af0f	net/tls: remove unused function tls_sw_sendpage_locked There are no in-tree callers. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:44:58 -08:00

1 2 3 4 5 ...

54439 Commits