linux_dsm_epyc7002/include
Eric Dumazet a74f0fa082 tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT
TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12
as a step to enable bigger tcp sndbuf limits.

It works reasonably well, but the following happens :

Once the limit is reached, TCP stack generates
an [E]POLLOUT event for every incoming ACK packet.

This causes a high number of context switches.

This patch implements the strategy David Miller added
in sock_def_write_space() :

 - If TCP socket has a notsent_lowat constraint of X bytes,
   allow sendmsg() to fill up to X bytes, but send [E]POLLOUT
   only if number of notsent bytes is below X/2

This considerably reduces TCP_NOTSENT_LOWAT overhead,
while allowing to keep the pipe full.

Tested:
 100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM

A:/# cat /proc/sys/net/ipv4/tcp_wmem
4096	262144	64000000
A:/# super_netperf 100 -H B -l 1000 -- -K bbr &

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904 # This is about 54 MB of memory per flow :/

A:/# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 256220672  13532 694976    0    0    10     0   28   14  0  1 99  0  0
 2  0      0 256320016  13532 698480    0    0   512     0 715901 5927  0 10 90  0  0
 0  0      0 256197232  13532 700992    0    0   735    13 771161 5849  0 11 89  0  0
 1  0      0 256233824  13532 703320    0    0   512    23 719650 6635  0 11 89  0  0
 2  0      0 256226880  13532 705780    0    0   642     4 775650 6009  0 12 88  0  0

A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411 # 3.5 MB per flow

A:/# vmstat 5 5  # check that context switches have not inflated too much.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 260386512  13592 662148    0    0    10     0   17   14  0  1 99  0  0
 0  0      0 260519680  13592 604184    0    0   512    13 726843 12424  0 10 90  0  0
 1  1      0 260435424  13592 598360    0    0   512    25 764645 12925  0 10 90  0  0
 1  0      0 260855392  13592 578380    0    0   512     7 722943 13624  0 11 88  0  0
 1  0      0 260445008  13592 601176    0    0   614    34 772288 14317  0 10 90  0  0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-04 21:21:18 -08:00
..
acpi pci-v4.20-changes 2018-10-25 06:50:48 -07:00
asm-generic s390 updates for 4.20-rc2 2018-11-09 06:30:44 -06:00
clocksource
crypto KEYS: asym_tpm: extract key size & public key [ver #2] 2018-10-26 09:30:46 +01:00
drm drm, i915, amdgpu, bridge + core quirk 2018-11-02 10:58:20 -07:00
dt-bindings This time it looks like a quieter release cycle in the clk tree. I guess that's 2018-10-31 11:08:30 -07:00
keys KEYS: Move trusted.h to include/keys [ver #2] 2018-10-26 09:30:47 +01:00
kvm KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs 2018-08-12 12:06:34 +01:00
linux skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark' 2018-12-04 08:36:36 -08:00
math-emu
media media: Rename vb2_m2m_request_queue -> v4l2_m2m_request_queue 2018-11-06 05:24:22 -05:00
memory
misc
net tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT 2018-12-04 21:21:18 -08:00
pcmcia
ras
rdma First merge window pull request 2018-10-26 07:38:19 -07:00
scsi
soc soc/qman: add return value to interrupt coalesce changing APIs 2018-11-23 11:17:06 -08:00
sound ASoC: Updates for v5.0/v4.20 2018-10-22 23:26:37 +02:00
target
trace net: Add trace events for all receive exit points 2018-11-30 13:23:25 -08:00
uapi devlink: Add 'fw_load_policy' generic parameter 2018-12-03 13:55:43 -08:00
video
xen CONFIG_XEN_PV breaks xen_create_contiguous_region on ARM 2018-11-02 17:18:34 +01:00