linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-24 13:59:22 +07:00

Maciej Żenczykowski aefa927744 bpf: Do not change gso_size during bpf_skb_change_proto()

[ Upstream commit 364745fbe981a4370f50274475da4675661104df ]

This is technically a backwards incompatible change in behaviour, but I'm going to argue that it is very unlikely to break things, and likely to fix far more than it breaks.

In no particular order, various reasons follow:

(a) I've long had a bug assigned to myself to debug a super rare kernel crash on Android Pixel phones which can (per stacktrace) be traced back to BPF clat IPv6 to IPv4 protocol conversion causing some sort of ugly failure much later on during transmit deep in the GSO engine, AFAICT precisely because of this change to gso_size, though I've never been able to manually reproduce it. I believe it may be related to the particular network offload support of attached USB ethernet dongle being used for tethering off of an IPv6-only cellular connection. The reason might be we end up with more segments than max permitted, or with a GSO packet with only one segment... (either way we break some assumption and hit a BUG_ON)

(b) There is no check that the gso_size is > 20 when reducing it by 20, so we might end up with a negative (or underflowing) gso_size or a gso_size of 0. This can't possibly be good. Indeed this is probably somehow exploitable (or at least can result in a kernel crash) by delivering crafted packets and perhaps triggering an infinite loop or a divide by zero... As a reminder: gso_size (MSS) is related to MTU, but not directly derived from it: gso_size/MSS may be significantly smaller than one would get by deriving from local MTU. And on some NICs (which do loose MTU checking on receive) it may even potentially be larger, for example my work pc with 1500 MTU can receive 1520 byte frames [and sometimes does due to bugs in a vendor plat46 implementation]. Indeed even just going from 21 to 1 is potentially problematic because it increases the number of segments by a factor of 21 (think DoS, or some other crash due to too many segments).

(c) It's always safe to not increase the gso_size, because it doesn't result in the max packet size increasing. So the skb_increase_gso_size() call was always unnecessary for correctness (and outright undesirable, see later). As such the only part which is potentially dangerous (ie. could cause backwards compatibility issues) is the removal of the skb_decrease_gso_size() call.

(d) If the packets are ultimately destined to the local device, then there is absolutely no benefit to playing around with gso_size. It only matters if the packets will egress the device. ie. we're either forwarding, or transmitting from the device.

(e) This logic only triggers for packets which are GSO. It does not trigger for skbs which are not GSO. It will not convert a non-GSO MTU sized packet into a GSO packet (and you don't even know what the MTU is, so you can't even fix it). As such your transmit path must already be able to handle an MTU 20 bytes larger than your receive path (for IPv4 to IPv6 translation) - and indeed 28 bytes larger due to IPv4 fragments. Thus removing the skb_decrease_gso_size() call doesn't actually increase the size of the packets your transmit side must be able to handle. ie. to handle non-GSO max-MTU packets, the IPv4/IPv6 device/route MTUs must already be set correctly. Since for example with an IPv4 egress MTU of 1500, IPv4 to IPv6 translation will already build 1520 byte IPv6 frames, so you need a 1520 byte device MTU. This means if your IPv6 device's egress MTU is 1280, your IPv4 route must be 1260 (and actually 1252, because of the need to handle fragments). This is to handle normal non-GSO packets. Thus the reduction is simply not needed for GSO packets, because when they're correctly built, they will already be the right size.

(f) TSO/GSO should be able to exactly undo GRO: the number of packets (TCP segments) should not be modified, so that TCP's MSS counting works correctly (this matters for congestion control). If protocol conversion changes the gso_size, then the number of TCP segments may increase or decrease. Packet loss after protocol conversion can result in partial loss of MSS segments that the sender sent. How's the sending TCP stack going to react to receiving ACKs/SACKs in the middle of the segments it sent?

(g) skb_{decrease,increase}_gso_size() are already no-ops for GSO_BY_FRAGS case (besides triggering WARN_ON_ONCE). This means you already cannot guarantee that gso_size (and thus resulting packet MTU) is changed. ie. you must assume it won't be changed.

(h) changing gso_size is outright buggy for UDP GSO packets, where framing matters (I believe that's also the case for SCTP, but it's already excluded by [g]). So the only remaining case is TCP, which also doesn't want it (see [f]).

(i) see also the reasoning on the previous attempt at fixing this (commit fa7b83bf3b156c767f3e4a25bbf3817b08f3ff8e) which shows that the current behaviour causes TCP packet loss:

    In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the coalesced packet payload can be > MSS, but < MSS + 20. bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload length. After then tcp_gso_segment checks for the payload length if it is <= MSS. The condition is causing the packet to be dropped.

    tcp_gso_segment():
      [...]
      mss = skb_shinfo(skb)->gso_size;
      if (unlikely(skb->len <= mss)) goto out;
      [...]

Thus changing the gso_size is simply a very bad idea. Increasing is unnecessary and buggy, and decreasing can go negative.

Fixes: `6578171a7f` ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dongseok Yi <dseok.yi@samsung.com>
Cc: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/CANP3RGfjLikQ6dg=YpBU0OeHvyv7JOki7CyOUS9modaXAi-9vQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210617000953.2787453-2-zenczykowski@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

2021-07-14 16:56:27 +02:00
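
The arithmetic behind points (b) and (i) of the commit message can be reproduced in user space. The following is a minimal sketch, not kernel code: it assumes gso_size behaves like a 16-bit unsigned field, takes the 20 byte adjustment from the message (the IPv6/IPv4 header size difference), and uses hypothetical helper names (`decrease_unchecked`, `segment_count`, `dropped_after_increase`) that do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* 20 bytes: the adjustment the commit message talks about
 * (40 byte IPv6 header minus 20 byte IPv4 header). */
#define PROTO_HDR_DIFF 20u

/* Point (b): subtracting 20 with no "gso_size > 20" check.
 * Assumption: gso_size is modelled as a 16-bit unsigned value,
 * so a small input wraps around instead of going negative. */
static uint16_t decrease_unchecked(uint16_t gso_size)
{
    return (uint16_t)(gso_size - PROTO_HDR_DIFF);
}

/* Number of segments needed to carry payload_len bytes at a given
 * gso_size (ceiling division). The caller must not pass gso_size == 0;
 * that case is the divide-by-zero hazard mentioned in point (b). */
static uint32_t segment_count(uint32_t payload_len, uint16_t gso_size)
{
    return (payload_len + gso_size - 1u) / gso_size;
}

/* Point (i): after a 6-to-4 conversion raises the MSS by 20, the check
 * quoted from tcp_gso_segment(), "skb->len <= mss", bails out.
 * payload_len stands in for skb->len here (a simplification). */
static bool dropped_after_increase(uint32_t payload_len, uint16_t mss)
{
    uint16_t new_mss = (uint16_t)(mss + PROTO_HDR_DIFF);

    return payload_len <= new_mss;
}
```

Under these assumptions, `decrease_unchecked(10)` wraps to a very large value and `decrease_unchecked(20)` yields 0, while a payload that was larger than the original MSS by fewer than 20 bytes is reported as dropped once the MSS has been raised.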
..
bpf_sk_storage.c	bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON	2020-09-25 13:58:01 -07:00
datagram.c	udp: fix skb_copy_and_csum_datagram with odd segment sizes	2021-02-17 11:02:28 +01:00
datagram.h	net/core: Allow the compiler to verify declaration and definition consistency	2019-03-27 13:49:44 -07:00
dev_addr_lists.c	net: core: add nested_level variable in net_device	2020-09-28 15:00:15 -07:00
dev_ioctl.c	net: fix dev_ifsioc_locked() race condition	2021-03-07 12:34:07 +01:00
dev.c	net: sched: fix tx action reschedule issue with stopped queue	2021-06-03 09:00:47 +02:00
devlink.c	devlink: Correct VIRTUAL port to not have phys_port attributes	2021-06-10 13:39:16 +02:00
drop_monitor.c	drop_monitor: Perform cleanup upon probe registration failure	2021-03-30 14:31:57 +02:00
dst_cache.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
dst.c	net, bpf: Fix ip6ip6 crash with collect_md populated skbs	2021-03-30 14:32:05 +02:00
failover.c	failover: allow name change on IFF_UP slave interfaces	2019-04-10 22:12:26 -07:00
fib_notifier.c	net: fib_notifier: propagate extack down to the notifier block callback	2019-10-04 11:10:56 -07:00
fib_rules.c	fib: Return the correct errno code	2021-06-18 10:00:06 +02:00
filter.c	bpf: Do not change gso_size during bpf_skb_change_proto()	2021-07-14 16:56:27 +02:00
flow_dissector.c	flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()	2021-05-19 10:12:57 +02:00
flow_offload.c	net: flow_offload: Fix memory leak for indirect flow block	2020-12-09 16:08:33 -08:00
gen_estimator.c	net_sched: gen_estimator: support large ewma log	2021-01-27 11:55:23 +01:00
gen_stats.c	docs: networking: convert gen_stats.txt to ReST	2020-04-28 14:39:46 -07:00
gro_cells.c	gro_cells: reduce number of synchronize_net() calls	2020-11-25 11:28:12 -08:00
hwbm.c	net: hwbm: Make the hwbm_pool lock a mutex	2019-06-09 19:40:10 -07:00
link_watch.c	net: Add IF_OPER_TESTING	2020-04-20 12:43:24 -07:00
lwt_bpf.c	lwt_bpf: Replace preempt_disable() with migrate_disable()	2020-12-07 11:53:40 -08:00
lwtunnel.c	net: ipv6: add rpl sr tunnel	2020-03-29 22:30:57 -07:00
Makefile	ethtool: move to its own directory	2019-12-12 17:07:05 -08:00
neighbour.c	neighbour: allow NUD_NOARP entries to be forced GCed	2021-06-10 13:39:29 +02:00
net_namespace.c	net: make get_net_ns return error if NET_NS is disabled	2021-06-23 14:42:44 +02:00
net-procfs.c	net-sysfs: add backlog len and CPU id to softnet data	2020-09-21 13:56:37 -07:00
net-sysfs.c	net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc	2021-01-12 20:18:11 +01:00
net-sysfs.h	net-sysfs: add netdev_change_owner()	2020-02-26 20:07:25 -08:00
net-traces.c	page_pool: add tracepoints for page_pool with details need by XDP	2019-06-19 11:23:13 -04:00
netclassid_cgroup.c	cgroup, netclassid: remove double cond_resched	2020-04-21 15:44:30 -07:00
netevent.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
netpoll.c	net: Have netpoll bring-up DSA management interface	2020-11-18 11:04:11 -08:00
netprio_cgroup.c	netprio_cgroup: Fix unlimited memory leak of v2 cgroups	2020-05-09 20:59:21 -07:00
page_pool.c	mm: fix struct page layout on 32-bit systems	2021-05-19 10:13:17 +02:00
pktgen.c	pktgen: fix misuse of BUG_ON() in pktgen_thread_worker()	2021-03-07 12:34:09 +01:00
ptp_classifier.c	ptp: Add generic ptp v2 header parsing function	2020-08-19 16:07:49 -07:00
request_sock.c	tcp: add rcu protection around tp->fastopen_rsk	2019-10-13 10:13:08 -07:00
rtnetlink.c	rtnetlink: Fix regression in bridge VLAN configuration	2021-06-23 14:42:42 +02:00
scm.c	fs: Add receive_fd() wrapper for __receive_fd()	2020-07-13 11:03:44 -07:00
secure_seq.c	crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h	2020-05-08 15:32:17 +10:00
skbuff.c	net: fix up truesize of cloned skb in skb_prepare_for_shift()	2021-03-07 12:34:05 +01:00
skmsg.c	bpf, sockmap: Fix incorrect fwd_alloc accounting	2021-04-14 08:42:01 +02:00
sock_diag.c	bpf, net: Rework cookie generator as per-cpu one	2020-09-30 11:50:35 -07:00
sock_map.c	net, sockmap: Don't call bpf_prog_put() on NULL pointer	2020-10-15 21:05:23 +02:00
sock_reuseport.c	udp: Prevent reuseport_select_sock from reading uninitialized socks	2021-01-23 16:03:59 +01:00
sock.c	net: sock: fix in-kernel mark setting	2021-06-10 13:39:17 +02:00
stream.c	tcp: make sure EPOLLOUT wont be missed	2019-08-19 13:07:43 -07:00
sysctl_net_core.c	net: add option to not create fall-back tunnels in root-ns as well	2020-08-28 06:52:44 -07:00
timestamping.c	net: Introduce a new MII time stamping interface.	2019-12-25 19:51:33 -08:00
tso.c	net: tso: add UDP segmentation support	2020-06-18 20:46:23 -07:00
utils.c	net: Fix skb->csum update in inet_proto_csum_replace16().	2020-01-24 20:54:30 +01:00
xdp.c	xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model	2021-04-14 08:42:09 +02:00