linux_dsm_epyc7002/net/core
Eric Dumazet 024158d3b5 net: avoid 32 x truesize under-estimation for tiny skbs
[ Upstream commit 3226b158e67cfaa677fd180152bfb28989cb2fac ]

Both virtio_net and napi_get_frags() allocate skbs
with a very small skb->head.

While using page fragments instead of a kmalloc-backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
underestimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (an order-3 page on x86), or even 64 on PowerPC.

We have been tracking OOM issues on GKE hosts that hit tcp_mem limits
but consumed far more memory for TCP buffers than tcp_mem[2] allows.

Even if we forced napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768.

This patch makes sure that small skb heads are kmalloc-backed, so that
other objects in the slab page can be reused instead of being held for as long
as skbs are sitting in socket queues.

Note that in the future we might use the sk_buff napi cache
instead of going through the more expensive __alloc_skb().

Another idea would be to use separate page sizes depending
on the allocated length (so that a page never holds more than 4 frags).

I would like to thank Greg Thelen for his precious help on this matter;
analysing crash dumps is always a time-consuming task.

Fixes: fd11a83dd3 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 16:04:03 +01:00
bpf_sk_storage.c bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON 2020-09-25 13:58:01 -07:00
datagram.c net: zerocopy: combine pages in zerocopy_sg_from_iter() 2020-08-20 16:12:50 -07:00
datagram.h
dev_addr_lists.c net: core: add nested_level variable in net_device 2020-09-28 15:00:15 -07:00
dev_ioctl.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
dev.c xdp: Remove the xdp_attachment_flags_ok() callback 2020-12-09 16:27:42 +01:00
devlink.c devlink: Make sure devlink instance and port are in same net namespace 2020-11-25 17:26:34 -08:00
drop_monitor.c genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
dst_cache.c
dst.c net: Correct the comment of dst_dev_put() 2020-09-10 13:28:57 -07:00
failover.c
fib_notifier.c
fib_rules.c fib: fix fib_rule_ops indirect call wrappers when CONFIG_IPV6=m 2020-09-08 20:09:08 -07:00
filter.c net: Properly typecast int values to set sk_max_pacing_rate 2020-10-22 12:18:25 -07:00
flow_dissector.c net: flow_dissector: avoid indirect call to DSA .flow_dissect for generic case 2020-09-26 14:17:59 -07:00
flow_offload.c net: flow_offload: Fix memory leak for indirect flow block 2020-12-09 16:08:33 -08:00
gen_estimator.c
gen_stats.c
gro_cells.c gro_cells: reduce number of synchronize_net() calls 2020-11-25 11:28:12 -08:00
hwbm.c
link_watch.c
lwt_bpf.c lwt_bpf: Replace preempt_disable() with migrate_disable() 2020-12-07 11:53:40 -08:00
lwtunnel.c
Makefile
neighbour.c net: Exempt multicast addresses from five-second neighbor lifetime 2020-11-13 14:24:39 -08:00
net_namespace.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
net-procfs.c net-sysfs: add backlog len and CPU id to softnet data 2020-09-21 13:56:37 -07:00
net-sysfs.c net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc 2021-01-12 20:18:11 +01:00
net-sysfs.h
net-traces.c
netclassid_cgroup.c
netevent.c
netpoll.c net: Have netpoll bring-up DSA management interface 2020-11-18 11:04:11 -08:00
netprio_cgroup.c netprio_cgroup: Fix unlimited memory leak of v2 cgroups 2020-05-09 20:59:21 -07:00
page_pool.c
pktgen.c pktgen: Fix inconsistent of format with argument type in pktgen.c 2020-10-01 18:45:23 -07:00
ptp_classifier.c ptp: Add generic ptp v2 header parsing function 2020-08-19 16:07:49 -07:00
request_sock.c
rtnetlink.c rtnetlink: fix data overflow in rtnl_calcit() 2020-10-21 18:24:08 -07:00
scm.c fs: Add receive_fd() wrapper for __receive_fd() 2020-07-13 11:03:44 -07:00
secure_seq.c crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h 2020-05-08 15:32:17 +10:00
skbuff.c net: avoid 32 x truesize under-estimation for tiny skbs 2021-01-23 16:04:03 +01:00
skmsg.c bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list 2020-11-18 00:14:04 +01:00
sock_diag.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
sock_map.c net, sockmap: Don't call bpf_prog_put() on NULL pointer 2020-10-15 21:05:23 +02:00
sock_reuseport.c udp: Prevent reuseport_select_sock from reading uninitialized socks 2021-01-23 16:03:59 +01:00
sock.c net: Properly typecast int values to set sk_max_pacing_rate 2020-10-22 12:18:25 -07:00
stream.c
sysctl_net_core.c net: add option to not create fall-back tunnels in root-ns as well 2020-08-28 06:52:44 -07:00
timestamping.c
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c
xdp.c xdp: Remove the xdp_attachment_flags_ok() callback 2020-12-09 16:27:42 +01:00