linux_dsm_epyc7002/net/core
Eric Dumazet d41a69f1d3 tcp: make tcp_sendmsg() aware of socket backlog
Large sendmsg()/write() hold socket lock for the duration of the call,
unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
are parked into socket backlog for a long time.
Critical decisions like fast retransmit might be delayed.
Receivers have to maintain a big out of order queue with additional cpu
overhead, and also possible stalls in TX once windows are full.

Bidirectional flows are particularly hurt since the backlog can become
quite big if the copy from user space triggers IO (page faults)

Some applications learnt to use sendmsg() (or sendmmsg()) with small
chunks to avoid this issue.

Kernel should know better, right ?

Add a generic sk_flush_backlog() helper and use it right
before a new skb is allocated. Typically we put 64KB of payload
per skb (unless MSG_EOR is requested) and checking socket backlog
every 64KB gives good results.

As a matter of fact, tests with TSO/GSO disabled give very nice
results, as we manage to keep a small write queue and smaller
perceived rtt.

Note that sk_flush_backlog() maintains socket ownership,
so is not equivalent to a {release_sock(sk); lock_sock(sk);},
to ensure implicit atomicity rules that sendmsg() was
giving to (possibly buggy) applications.

In this simple implementation, I chose to not call tcp_release_cb(),
but we might consider this later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02 17:02:26 -04:00
..
datagram.c udp: enable MSG_PEEK at non-zero offset 2016-04-05 16:29:37 -04:00
dev_addr_lists.c net: fix spelling for synchronized 2014-11-18 15:26:32 -05:00
dev_ioctl.c dev_ioctl: use sizeof(x) instead of sizeof x 2014-11-18 15:27:32 -05:00
dev.c net: constify is_skb_forwardable's arguments 2016-04-29 16:13:36 -04:00
devlink.c devlink: implement shared buffer occupancy monitoring interface 2016-04-14 16:22:03 -04:00
drop_monitor.c net: Replace get_cpu_var through this_cpu_ptr 2014-08-26 13:45:47 -04:00
dst_cache.c net: dst_cache_per_cpu_dst_set() can be static 2016-03-18 17:45:08 -04:00
dst.c net: add dst_cache to ovs vxlan lwtunnel 2016-02-16 20:21:48 -05:00
ethtool.c net: ethtool: export conversion function between u32 and link mode 2016-04-18 14:45:08 -04:00
fib_rules.c libnl: nla_put_be64(): align on a 64-bit area 2016-04-23 20:13:24 -04:00
filter.c bpf: add event output helper for notifications/sampling/logging 2016-04-19 20:26:11 -04:00
flow_dissector.c net/flow_dissector: Make dissector_uses_key() and skb_flow_dissector_target() public 2016-03-10 16:24:02 -05:00
flow.c flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c 2015-09-01 17:00:24 -07:00
gen_estimator.c net: sched: Add description for cpu_bstats argument 2016-03-20 16:48:07 -04:00
gen_stats.c sched: align nlattr properly when needed 2016-04-26 12:00:49 -04:00
hwbm.c net: add a hardware buffer management helper API 2016-03-14 12:19:46 -04:00
link_watch.c dev: introduce dev_get_iflink() 2015-04-02 14:04:59 -04:00
lwtunnel.c lwtunnel: autoload of lwt modules 2016-02-21 22:00:28 -05:00
Makefile net: add a hardware buffer management helper API 2016-03-14 12:19:46 -04:00
neighbour.c neigh: align nlattr properly when needed 2016-04-26 12:00:49 -04:00
net_namespace.c netns: make nsid_lock per net 2015-05-17 23:41:11 -04:00
net-procfs.c net: remove NETDEV_TX_LOCKED support 2016-04-26 15:53:05 -04:00
net-sysfs.c net: core: use __ethtool_get_ksettings 2016-02-25 22:06:47 -05:00
net-sysfs.h net: netdev_kobject_init: annotate with __init 2014-01-05 20:27:54 -05:00
net-traces.c net: IPv6 fib lookup tracepoint 2015-11-22 11:54:10 -05:00
netclassid_cgroup.c core: remove unneded headers for net cgroup controllers. 2016-02-17 15:31:27 -05:00
netevent.c netevent: remove automatic variable in register_netevent_notifier() 2015-05-31 00:03:21 -07:00
netpoll.c Revert "netpoll: Fix extra refcount release in netpoll_cleanup()" 2016-04-05 19:34:44 -04:00
netprio_cgroup.c core: remove unneded headers for net cgroup controllers. 2016-02-17 15:31:27 -05:00
pktgen.c net: remove NETDEV_TX_LOCKED support 2016-04-26 15:53:05 -04:00
ptp_classifier.c ptp: Change ptp_class to a proper bitmask 2015-11-03 11:08:22 -05:00
request_sock.c tcp: restore fastopen operations 2015-10-05 03:19:06 -07:00
rtnetlink.c rtnl: align nlattr properly when needed 2016-04-26 12:00:49 -04:00
scm.c unix: correctly track in-flight fds in sending process user_struct 2016-02-08 10:30:42 -05:00
secure_seq.c net: remove a sparse error in secure_dccpv6_sequence_number() 2015-05-25 22:55:37 -04:00
skbuff.c skbuff: Add pskb_extract() helper function 2016-04-25 16:54:14 -04:00
sock_diag.c sock_diag: align nlattr properly when needed 2016-04-26 12:00:48 -04:00
sock_reuseport.c soreuseport: fix NULL ptr dereference SO_REUSEPORT after bind 2016-01-19 14:44:23 -05:00
sock.c tcp: make tcp_sendmsg() aware of socket backlog 2016-05-02 17:02:26 -04:00
stream.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
sysctl_net_core.c net:Add sysctl_max_skb_frags 2016-02-09 04:28:06 -05:00
timestamping.c net: skb_defer_rx_timestamp should check for phydev before setting up classify 2015-07-09 14:17:15 -07:00
tso.c net: tso: add support for IPv6 2015-10-26 22:24:22 -07:00
utils.c net: move net_get_random_once to lib 2015-10-08 05:26:35 -07:00