linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-20 16:22:07 +07:00


Neal Cardwell 0f8782ea14 tcp_bbr: add BBR congestion control

This commit implements a new TCP congestion control algorithm: BBR (Bottleneck Bandwidth and RTT). A detailed description of BBR will be published in ACM Queue, Vol. 14 No. 5, September-October 2016, as "BBR: Congestion-Based Congestion Control".

BBR has significantly increased throughput and reduced latency for connections on Google's internal backbone networks and google.com and YouTube Web servers.

BBR requires only changes on the sender side, not in the network or the receiver side. Thus it can be incrementally deployed on today's Internet, or in datacenters.

The Internet has predominantly used loss-based congestion control (largely Reno or CUBIC) since the 1980s, relying on packet loss as the signal to slow down. While this worked well for many years, loss-based congestion control is unfortunately outdated in today's networks. On today's Internet, loss-based congestion control causes the infamous bufferbloat problem, often causing seconds of needless queuing delay, since it fills the bloated buffers in many last-mile links. On today's high-speed long-haul links using commodity switches with shallow buffers, loss-based congestion control has abysmal throughput because it over-reacts to losses caused by transient traffic bursts.

In 1981 Kleinrock and Gale showed that the optimal operating point for a network maximizes delivered bandwidth while minimizing delay and loss, not only for single connections but for the network as a whole. Finding that optimal operating point has been elusive, since any single network measurement is ambiguous: network measurements are the result of both bandwidth and propagation delay, and those two cannot be measured simultaneously.

While it is impossible to disambiguate any single bandwidth or RTT measurement, a connection's behavior over time tells a clearer story. BBR uses a measurement strategy designed to resolve this ambiguity.
It combines these measurements with a robust servo loop using recent control systems advances to implement a distributed congestion control algorithm that reacts to actual congestion, not packet loss or transient queue delay, and is designed to converge with high probability to a point near the optimal operating point.

In a nutshell, BBR creates an explicit model of the network pipe by sequentially probing the bottleneck bandwidth and RTT. On the arrival of each ACK, BBR derives the current delivery rate of the last round trip, and feeds it through a windowed max-filter to estimate the bottleneck bandwidth. Conversely, it uses a windowed min-filter to estimate the round-trip propagation delay. The max-filtered bandwidth and min-filtered RTT estimates form BBR's model of the network pipe.

Using its model, BBR sets control parameters to govern sending behavior. The primary control is the pacing rate: BBR applies a gain multiplier to transmit faster or slower than the observed bottleneck bandwidth. The conventional congestion window (cwnd) is now the secondary control; the cwnd is set to a small multiple of the estimated BDP (bandwidth-delay product) in order to allow full utilization and bandwidth probing while bounding the potential amount of queue at the bottleneck.

When a BBR connection starts, it enters STARTUP mode and applies a high gain to perform an exponential search to quickly probe the bottleneck bandwidth (doubling its sending rate each round trip, like slow start). However, instead of continuing until it fills up the buffer (i.e. a loss), or until delay or ACK spacing reaches some threshold (like Hystart), it uses its model of the pipe to estimate when that pipe is full: it estimates the pipe is full when it notices the estimated bandwidth has stopped growing. At that point it exits STARTUP and enters DRAIN mode, where it reduces its pacing rate to drain the queue it estimates it has created. Then BBR enters steady state.
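The model-and-control loop described above (a windowed max-filter over delivery-rate samples, a windowed min-filter over RTT samples, and a pacing rate and cwnd derived from the two) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code in tcp_bbr.c: the names `win_filter`, `win_max`, `win_min`, `pacing_rate`, `cwnd_pkts` and `GAIN_UNIT` are hypothetical, the filter naively scans a small ring of recent samples rather than using any more efficient estimator the kernel may use, and gains are fixed-point values where `GAIN_UNIT` stands for 1.0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical naive windowed filter over the last WIN_SLOTS samples.
 * Illustration only; not the kernel's implementation.
 */
#define WIN_SLOTS 16
#define GAIN_UNIT 256u /* gains are fixed point: GAIN_UNIT == 1.0 */

struct win_sample {
	uint32_t t; /* when the sample was taken (round count, ms, ...) */
	uint64_t v; /* sample value: delivery rate or RTT */
};

struct win_filter {
	struct win_sample s[WIN_SLOTS];
	int n;    /* number of valid samples */
	int head; /* next slot to overwrite (oldest sample once full) */
};

static void win_add(struct win_filter *f, uint32_t t, uint64_t v)
{
	f->s[f->head] = (struct win_sample){ .t = t, .v = v };
	f->head = (f->head + 1) % WIN_SLOTS;
	if (f->n < WIN_SLOTS)
		f->n++;
}

/* Max over samples taken within the last `win` time units; 0 if none. */
static uint64_t win_max(const struct win_filter *f, uint32_t now, uint32_t win)
{
	uint64_t best = 0;

	for (int i = 0; i < f->n; i++)
		if (now - f->s[i].t < win && f->s[i].v > best)
			best = f->s[i].v;
	return best;
}

/* Min over samples taken within the last `win` time units; UINT64_MAX if none. */
static uint64_t win_min(const struct win_filter *f, uint32_t now, uint32_t win)
{
	uint64_t best = UINT64_MAX;

	for (int i = 0; i < f->n; i++)
		if (now - f->s[i].t < win && f->s[i].v < best)
			best = f->s[i].v;
	return best;
}

/* Primary control: pace at gain * estimated bottleneck bandwidth (bytes/s). */
static uint64_t pacing_rate(uint64_t btl_bw, uint32_t gain)
{
	return btl_bw * gain / GAIN_UNIT;
}

/* Secondary control: cwnd = cwnd_gain * BDP, with BDP = btl_bw * min_rtt. */
static uint32_t cwnd_pkts(uint64_t btl_bw, uint64_t min_rtt_us, uint32_t mss,
			  uint32_t cwnd_gain)
{
	assert(mss > 0);
	uint64_t bdp_bytes = btl_bw * min_rtt_us / 1000000u;
	uint64_t bdp_pkts = (bdp_bytes + mss - 1) / mss; /* round up */

	return (uint32_t)(bdp_pkts * cwnd_gain / GAIN_UNIT);
}
```

For example, with the 10 Mbps bottleneck (1,250,000 bytes/s) and 40 ms RTT from the bufferbloat test quoted later in this message, the BDP is 1,250,000 × 0.04 = 50,000 bytes, or about 34 segments of 1500 bytes (the segment size is an assumption). An assumed cwnd gain of 2 then gives a 68-packet window, far below that test's 1000-packet bottleneck buffer.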
In steady state, PROBE_BW mode cycles between first pacing faster to probe for more bandwidth, then pacing slower to drain any queue that was created if no more bandwidth was available, and then cruising at the estimated bandwidth to utilize the pipe without creating excess queue.

Occasionally, on an as-needed basis, it sends significantly slower to probe for RTT (PROBE_RTT mode).

BBR has been fully deployed on Google's wide-area backbone networks and we're experimenting with BBR on Google.com and YouTube on a global scale. Replacing CUBIC with BBR has resulted in significant improvements in network latency and application (RPC, browser, and video) metrics. For more details please refer to our upcoming ACM Queue publication.

Example performance results, to illustrate the difference between BBR and CUBIC:

Resilience to random loss (e.g. from shallow buffers):
  Consider a netperf TCP_STREAM test lasting 30 secs on an emulated path with a 10Gbps bottleneck, 100ms RTT, and 1% packet loss rate. CUBIC gets 3.27 Mbps, and BBR gets 9150 Mbps (2798x higher).

Low latency with the bloated buffers common in today's last-mile links:
  Consider a netperf TCP_STREAM test lasting 120 secs on an emulated path with a 10Mbps bottleneck, 40ms RTT, and 1000-packet bottleneck buffer. Both fully utilize the bottleneck bandwidth, but BBR achieves this with a median RTT 25x lower (43 ms instead of 1.09 secs).

Our long-term goal is to improve the congestion control algorithms used on the Internet. We are hopeful that BBR can help advance the efforts toward this goal, and motivate the community to do further research.

Test results, performance evaluations, feedback, and BBR-related discussions are very welcome in the public e-mail list for BBR:

  https://groups.google.com/forum/#!forum/bbr-dev

NOTE: BBR must be used with the fq qdisc ("man tc-fq") with pacing enabled, since pacing is integral to the BBR design and implementation.
BBR without pacing would not function properly, and may incur unnecessarily high packet loss rates.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:23:01 -04:00
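Two gain-selection rules from the commit message above, leaving STARTUP once the bandwidth estimate stops growing and cycling the PROBE_BW pacing gain, can be sketched in C. This is a hedged illustration only: the structure and function names (`full_pipe`, `full_pipe_update`, `probe_bw`, `probe_bw_next_gain`) are hypothetical, and the numbers are assumptions chosen for the sketch rather than values stated in this listing: a 25% growth threshold checked over 3 rounds, and an 8-phase cycle with gains 5/4, 3/4, then six phases at 1.0, each lasting about one minimum RTT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GAIN_UNIT 256u /* fixed point: GAIN_UNIT == 1.0 */

/* STARTUP: declare the pipe full once the max-filtered bandwidth has failed
 * to grow by at least 25% for 3 consecutive rounds (assumed thresholds). */
struct full_pipe {
	uint64_t full_bw; /* bandwidth at the last round that showed growth */
	int stalls;       /* consecutive rounds without enough growth */
	bool full;
};

/* Call once per round trip with the current bottleneck bandwidth estimate. */
static void full_pipe_update(struct full_pipe *fp, uint64_t bw)
{
	if (fp->full)
		return;
	if (bw * 4 >= fp->full_bw * 5) { /* bw >= 1.25 * full_bw: still growing */
		fp->full_bw = bw;
		fp->stalls = 0;
		return;
	}
	if (++fp->stalls >= 3)
		fp->full = true; /* time to leave STARTUP and DRAIN */
}

/* PROBE_BW: one probing phase, one draining phase, then cruise phases.
 * The phase count and gains are assumptions for illustration. */
#define CYCLE_LEN 8
static const uint32_t probe_bw_gain[CYCLE_LEN] = {
	GAIN_UNIT * 5 / 4, /* pace faster to probe for more bandwidth */
	GAIN_UNIT * 3 / 4, /* pace slower to drain any queue just created */
	GAIN_UNIT, GAIN_UNIT, GAIN_UNIT, /* cruise at the estimated bandwidth */
	GAIN_UNIT, GAIN_UNIT, GAIN_UNIT,
};

struct probe_bw {
	int idx;                 /* current phase */
	uint64_t phase_start_us; /* when the current phase began */
};

/* Move to the next phase once about one min_rtt has elapsed; return its gain. */
static uint32_t probe_bw_next_gain(struct probe_bw *p, uint64_t now_us,
				   uint64_t min_rtt_us)
{
	if (now_us - p->phase_start_us >= min_rtt_us) {
		p->idx = (p->idx + 1) % CYCLE_LEN;
		p->phase_start_us = now_us;
	}
	return probe_bw_gain[p->idx];
}
```

Because the probing and draining gains of 1.25 and 0.75 cancel each other, the mean gain over one full cycle in this sketch is exactly 1.0: (1.25 + 0.75 + 6 × 1.0) / 8 = 1.0, so on average the connection paces at its bandwidth estimate.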
netfilter	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-09-12 15:52:44 -07:00
af_inet.c	gso: Support partial splitting at the frag_list pointer	2016-09-19 20:59:34 -04:00
ah4.c	ah4: Fix error return in ah_input().	2015-08-25 13:38:50 -07:00
arp.c	net: rename NET_{ADD|INC}_STATS_BH()	2016-04-27 22:48:24 -04:00
cipso_ipv4.c	Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/selinux into next	2016-07-07 10:15:34 +10:00
datagram.c	net: Set sk_txhash from a random number	2015-07-29 22:44:04 -07:00
devinet.c	netconf: add a notif when settings are created	2016-09-01 15:18:08 -07:00
esp4.c	esp: Fix ESN generation under UDP encapsulation	2016-06-23 11:52:00 -04:00
fib_frontend.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-09-12 15:52:44 -07:00
fib_lookup.h	ipv4: consider TOS in fib_select_default	2015-07-24 22:46:11 -07:00
fib_rules.c	net: flow: Add l3mdev flow update	2016-09-10 23:12:51 -07:00
fib_semantics.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-09-12 15:52:44 -07:00
fib_trie.c	ipv4: fix value of ->nlmsg_flags reported in RTM_NEWROUTE events	2016-09-09 16:50:23 -07:00
fou.c	fou: make nla_policy const	2016-09-01 14:09:00 -07:00
gre_demux.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-06-30 05:03:36 -04:00
gre_offload.c	gso: Support partial splitting at the frag_list pointer	2016-09-19 20:59:34 -04:00
icmp.c	net: icmp: rename ICMPMSGIN_INC_STATS_BH()	2016-04-27 22:48:23 -04:00
igmp.c	net/multicast: should not send source list records when have filter mode change	2016-08-08 16:04:39 -07:00
inet_connection_sock.c	timers, net/ipv4/inet: Initialize connection request timers as pinned	2016-07-07 10:35:06 +02:00
inet_diag.c	net: inet: diag: expose the socket mark to privileged processes.	2016-09-08 16:13:09 -07:00
inet_fragment.c	net: disable fragment reassembly if high_thresh is zero	2016-06-05 22:56:42 -04:00
inet_hashtables.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-05-04 00:52:29 -04:00
inet_timewait_sock.c	timers, net/ipv4/inet: Initialize connection request timers as pinned	2016-07-07 10:35:06 +02:00
inetpeer.c	net: Add helper function to compare inetpeer addresses	2015-08-28 13:32:36 -07:00
ip_forward.c	net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags	2016-07-19 16:40:22 -07:00
ip_fragment.c	net: rename IP_INC_STATS_BH()	2016-04-27 22:48:23 -04:00
ip_gre.c	net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()	2016-09-10 20:53:55 -07:00
ip_input.c	net: original ingress device index in PKTINFO	2016-05-11 19:31:40 -04:00
ip_options.c	net: ipv4: Convert IP network timestamps to be y2038 safe	2016-03-01 17:18:44 -05:00
ip_output.c	net: l3mdev: remove redundant calls	2016-09-10 23:12:52 -07:00
ip_sockglue.c	ipv4: accept u8 in IP_TOS ancillary data	2016-09-08 17:45:57 -07:00
ip_tunnel_core.c	ip_tunnel: do not clear l4 hashes	2016-09-09 19:33:11 -07:00
ip_tunnel.c	ip_tunnel: add collect_md mode to IPIP tunnel	2016-09-17 10:13:07 -04:00
ip_vti.c	vti: flush x-netns xfrm cache when vti interface is removed	2016-08-09 12:57:49 -07:00
ipcomp.c	ipv4: coding style: comparison for equality with NULL	2015-04-03 12:11:15 -04:00
ipconfig.c	net: ipconfig: Fix NULL pointer dereference on RARP/BOOTP/DHCP timeout	2016-08-22 21:04:41 -07:00
ipip.c	ip_tunnel: add collect_md mode to IPIP tunnel	2016-09-17 10:13:07 -04:00
ipmr.c	net: ipmr/ip6mr: update lastuse on entry change	2016-07-26 15:18:31 -07:00
Kconfig	tcp_bbr: add BBR congestion control	2016-09-21 00:23:01 -04:00
Makefile	tcp_bbr: add BBR congestion control	2016-09-21 00:23:01 -04:00
netfilter.c	ipv4: Pass struct net into ip_route_me_harder	2015-09-29 20:21:32 +02:00
ping.c	sock: enable timestamping using control messages	2016-04-04 15:50:30 -04:00
proc.c	tcp: md5: add LINUX_MIB_TCPMD5FAILURE counter	2016-08-25 16:43:11 -07:00
protocol.c
raw.c	net: ipv4: Remove l3mdev_get_saddr	2016-09-10 23:12:53 -07:00
route.c	net: l3mdev: remove redundant calls	2016-09-10 23:12:52 -07:00
syncookies.c	net: rename NET_{ADD|INC}_STATS_BH()	2016-04-27 22:48:24 -04:00
sysctl_net_ipv4.c	ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n	2016-05-23 14:32:06 -07:00
tcp_bbr.c	tcp_bbr: add BBR congestion control	2016-09-21 00:23:01 -04:00
tcp_bic.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_cdg.c	tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict	2016-09-21 00:22:59 -04:00
tcp_cong.c	tcp: new CC hook to set sending rate with rate_sample in any CA state	2016-09-21 00:23:01 -04:00
tcp_cubic.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_dctcp.c	tcp: return sizeof tcp_dctcp_info in dctcp_get_info()	2016-06-14 23:46:30 -07:00
tcp_diag.c	net: diag: Fix refcnt leak in error path destroying socket	2016-08-23 23:11:36 -07:00
tcp_fastopen.c	tcp: fastopen: avoid negative sk_forward_alloc	2016-09-08 16:08:10 -07:00
tcp_highspeed.c	tcp: add tcp_in_slow_start helper	2015-07-09 14:22:52 -07:00
tcp_htcp.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_hybla.c	tcp: do not slow start when cwnd equals ssthresh	2015-07-09 14:22:52 -07:00
tcp_illinois.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_input.c	tcp: new CC hook to set sending rate with rate_sample in any CA state	2016-09-21 00:23:01 -04:00
tcp_ipv4.c	tcp: use an RB tree for ooo receive queue	2016-09-08 17:25:58 -07:00
tcp_lp.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_metrics.c	tcp: make nla_policy const	2016-09-01 14:09:01 -07:00
tcp_minisocks.c	tcp: track application-limited rate samples	2016-09-21 00:23:00 -04:00
tcp_nv.c	tcp: add NV congestion control	2016-06-10 23:07:49 -07:00
tcp_offload.c	gso: Support partial splitting at the frag_list pointer	2016-09-19 20:59:34 -04:00
tcp_output.c	tcp: export tcp_mss_to_mtu() for congestion control modules	2016-09-21 00:23:01 -04:00
tcp_probe.c	net: ipv4: tcp_probe: Replace timespec with timespec64	2016-03-01 17:18:44 -05:00
tcp_rate.c	tcp: export data delivery rate	2016-09-21 00:23:00 -04:00
tcp_recovery.c	tcp: do not assume TCP code is non preemptible	2016-05-02 17:02:25 -04:00
tcp_scalable.c	tcp: add tcp_in_slow_start helper	2015-07-09 14:22:52 -07:00
tcp_timer.c	tcp_timer.c: Add kernel-doc function descriptions	2016-07-15 23:18:14 -07:00
tcp_vegas.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_vegas.h	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_veno.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_westwood.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_yeah.c	tcp: cwnd does not increase in TCP YeAH	2016-09-08 17:16:12 -07:00
tcp.c	tcp: export data delivery rate	2016-09-21 00:23:00 -04:00
tunnel4.c	tunnels: correct conditional build of MPLS and IPv6	2016-07-11 13:27:06 -07:00
udp_diag.c	net: inet: diag: expose the socket mark to privileged processes.	2016-09-08 16:13:09 -07:00
udp_impl.h	net: Remove iocb argument from sendmsg and recvmsg	2015-03-02 13:06:31 -05:00
udp_offload.c	gso: Support partial splitting at the frag_list pointer	2016-09-19 20:59:34 -04:00
udp_tunnel.c	net: Remove deprecated tunnel specific UDP offload functions	2016-06-17 20:23:32 -07:00
udp.c	net: ipv4: Remove l3mdev_get_saddr	2016-09-10 23:12:53 -07:00
udplite.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-08-30 00:54:02 -04:00
xfrm4_input.c	netfilter: Pass net into okfn	2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c	ipv4: hash net ptr into fragmentation bucket selection	2015-03-25 14:07:04 -04:00
xfrm4_output.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-10-24 06:54:12 -07:00
xfrm4_policy.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-09-12 15:52:44 -07:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c