linux_dsm_epyc7002/net
Eric Dumazet 8b27dae5a2 tcp: add one skb cache for rx
Often times, recvmsg() system calls and BH handling for a particular
TCP socket are done on different cpus.

This means the incoming skb had to be allocated on a cpu,
but freed on another.

This incurs a high spinlock contention in slab layer for small rpc,
but also a high number of cache line ping pongs for larger packets.

A full size GRO packet might use 45 page fragments, meaning
that up to 45 put_page() can be involved.

More over performing the __kfree_skb() in the recvmsg() context
adds a latency for user applications, and increase probability
of trapping them in backlog processing, since the BH handler
might found the socket owned by the user.

This patch, combined with the prior one increases the rpc
performance by about 10 % on servers with large number of cores.

(tcp_rr workload with 10,000 flows and 112 threads reach 9 Mpps
 instead of 8 Mpps)

This also increases single bulk flow performance on 40Gbit+ links,
since in this case there are often two cpus working in tandem :

 - CPU handling the NIC rx interrupts, feeding the receive queue,
  and (after this patch) freeing the skbs that were consumed.

 - CPU in recvmsg() system call, essentially 100 % busy copying out
  data to user space.

Having at most one skb in a per-socket cache has very little risk
of memory exhaustion, and since it is protected by socket lock,
its management is essentially free.

Note that if rps/rfs is used, we do not enable this feature, because
there is high chance that the same cpu is handling both the recvmsg()
system call and the TCP rx path, but that another cpu did the skb
allocations in the device driver right before the RPS/RFS logic.

To properly handle this case, it seems we would need to record
on which cpu skb was allocated, and use a different channel
to give skbs back to this cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23 21:57:38 -04:00
..
6lowpan 6lowpan: fix debugfs_simple_attr.cocci warnings 2019-01-22 09:51:19 +01:00
9p 9p/net: put a lower bound on msize 2018-12-25 17:07:49 +09:00
802
8021q net: Remove switchdev.h inclusion from team/bond/vlan 2019-02-24 17:40:46 -08:00
appletalk appletalk: Fix potential NULL pointer dereference in unregister_snap_client 2019-03-15 11:25:48 -07:00
atm net: atm: Add another IS_ENABLED(CONFIG_COMPAT) in atm_dev_ioctl 2019-03-07 10:14:50 -08:00
ax25 ax25: fix possible use-after-free 2019-01-23 11:18:00 -08:00
batman-adv genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
bluetooth Bluetooth: Add quirk for reading BD_ADDR from fwnode property 2019-02-26 10:08:26 +01:00
bpf bpf: fix warning about using plain integer as NULL 2019-03-08 21:17:07 +01:00
bpfilter bpfilter: re-add header search paths to tools include to fix build error 2019-02-23 13:34:40 -08:00
bridge net: bridge: use eth_broadcast_addr() to assign broadcast address 2019-03-20 11:02:47 -07:00
caif net: caif: use skb helpers instead of open-coding them 2019-02-17 11:01:17 -08:00
can can: bcm: check timer values before ktime conversion 2019-01-22 11:33:46 +01:00
ceph libceph: use struct_size() for kmalloc() in crush_decode() 2019-03-05 18:55:17 +01:00
core net: convert rps_needed and rfs_needed to new static branch api 2019-03-23 21:57:38 -04:00
dcb net: dcb: Add priority-to-DSCP map getters 2018-07-27 13:17:50 -07:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-08 15:00:17 -08:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
dns_resolver dns: Allow the dns resolver to retrieve a server set 2018-10-04 09:40:52 -07:00
dsa net: dsa: Use prepare/commit phase in dsa_slave_vlan_rx_add_vid() 2019-03-03 20:45:52 -08:00
ethernet net/ethernet: Add parse_protocol header_ops support 2019-02-22 12:55:31 -08:00
hsr genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
ieee802154 genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
ife
ipv4 tcp: add one skb cache for rx 2019-03-23 21:57:38 -04:00
ipv6 tcp: add one skb cache for rx 2019-03-23 21:57:38 -04:00
iucv iucv: Remove SKB list assumptions. 2018-11-10 16:55:11 -08:00
kcm kcm: Remove unnecessary SLAB_PANIC for kmem_cache_create() in kcm_init 2019-02-23 13:46:24 -08:00
key af_key: unconditionally clone on broadcast 2019-02-12 10:36:42 +01:00
l2tp genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
l3mdev l3mdev: add function to retreive upper master 2018-12-03 14:15:26 -08:00
lapb
llc llc: do not use sk_eat_skb() 2018-10-22 19:59:20 -07:00
mac80211 net: remove 'fallback' argument from dev->ndo_select_queue() 2019-03-20 11:18:55 -07:00
mac802154 mac802154: Remove VLA usage of skcipher 2018-09-28 12:46:07 +08:00
mpls Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-02 12:54:35 -08:00
ncsi genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
netfilter genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
netlabel genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
netlink genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
netrom netrom: switch to sock timer API 2019-01-27 10:38:04 -08:00
nfc genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
nsh nsh: set mac len based on inner packet 2018-07-12 16:55:29 -07:00
openvswitch genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
packet net: remove 'fallback' argument from dev->ndo_select_queue() 2019-03-20 11:18:55 -07:00
phonet phonet: fix building with clang 2019-02-21 16:23:56 -08:00
psample
qrtr mm: replace all open encodings for NUMA_NO_NODE 2019-03-05 21:07:14 -08:00
rds 5.1 Merge Window Pull Request 2019-03-09 15:53:03 -08:00
rfkill rfkill: gpio: Remove unused include 2018-12-18 13:13:56 +01:00
rose net: rose: fix a possible stack overflow 2019-03-18 16:53:22 -07:00
rxrpc rxrpc: Fix client call queueing, waiting for channel 2019-03-08 18:24:53 -08:00
sched net: sched: add empty status flag for NOLOCK qdisc 2019-03-23 21:52:36 -04:00
sctp sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_STREAM_SCHEDULER sockopt 2019-03-18 18:31:09 -07:00
smc genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
strparser net: strparser: fix a missing check for create_singlethread_workqueue 2019-03-15 12:51:56 -07:00
sunrpc Miscellaneous NFS server fixes. Probably the most visible bug is one 2019-03-12 15:06:54 -07:00
switchdev switchdev: Remove unused transaction item queue 2019-03-01 21:35:19 -08:00
tipc genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
tls net/tls: Replace kfree_skb() with consume_skb() 2019-03-21 10:14:26 -07:00
unix io_uring-2019-03-06 2019-03-08 14:48:40 -08:00
vmw_vsock vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock 2019-03-08 15:15:44 -08:00
wimax genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
wireless genetlink: make policy common to family 2019-03-22 10:38:23 -04:00
x25 net/x25: reset state in x25_connect() 2019-03-11 15:40:14 -07:00
xdp xsk: fix umem memory leak on cleanup 2019-03-16 01:27:51 +01:00
xfrm net: dev: rename queue selection helpers. 2019-03-20 11:18:54 -07:00
compat.c Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-03-05 14:08:26 -08:00
Kconfig net: devlink: turn devlink into a built-in 2019-02-26 08:49:05 -08:00
Makefile net: split out functions related to registering inflight socket files 2019-02-28 08:24:23 -07:00
socket.c net: add documentation to socket.c 2019-03-15 15:29:47 -07:00
sysctl_net.c