linux_dsm_epyc7002/net/ipv4
Wei Wang 5018c59607 ipv4: fix race condition between route lookup and invalidation
Jesse and Ido reported the following race condition:
<CPU A, t0> - Received packet A is forwarded and cached dst entry is
taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()

<t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
from multiple ISPs"), route is added / deleted and rt_cache_flush() is
called

<CPU B, t2> - Received packet B tries to use the same cached dst entry
from t0, but rt_cache_valid() is no longer true and it is replaced in
rt_cache_route() by the newer one. This calls dst_dev_put() on the
original dst entry which assigns the blackhole netdev to 'dst->dev'

<CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
to 'dst->dev' being the blackhole netdev

There are 2 issues in the v4 routing code:
1. A per-netns counter is used to do the validation of the route. That
means whenever a route is changed in the netns, users of all routes in
the netns needs to redo lookup. v6 has an implementation of only
updating fn_sernum for routes that are affected.
2. When rt_cache_valid() returns false, rt_cache_route() is called to
throw away the current cache, and create a new one. This seems
unnecessary because as long as this route does not change, the route
cache does not need to be recreated.

To fully solve the above 2 issues, it probably needs quite some code
changes and requires careful testing, and does not suite for net branch.

So this patch only tries to add the deleted cached rt into the uncached
list, so user could still be able to use it to receive packets until
it's done.

Fixes: 95c47f9cf5 ("ipv4: call dst_dev_put() properly")
Signed-off-by: Wei Wang <weiwan@google.com>
Reported-by: Ido Schimmel <idosch@idosch.org>
Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-17 16:44:03 -07:00
..
bpfilter SPDX update for 5.2-rc2, round 1 2019-05-21 12:33:38 -07:00
netfilter netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
af_inet.c net: remove empty inet_exit_net 2019-08-19 18:22:54 -07:00
ah4.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
arp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cipso_ipv4.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
datagram.c udp: correct reuseport selection with connected sockets 2019-09-16 09:02:18 +02:00
devinet.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-07-08 19:48:57 -07:00
esp4_offload.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2019-07-05 15:01:15 -07:00
esp4.c xfrm: remove get_mtu indirection from xfrm_type 2019-07-01 06:16:40 +02:00
fib_frontend.c ipv4: Add lockdep condition to fix for_each_entry() 2019-08-09 11:01:08 -07:00
fib_lookup.h ipv4: Use accessors for fib_info nexthop data 2019-06-04 19:26:49 -07:00
fib_notifier.c
fib_rules.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
fib_semantics.c net: Properly update v4 routes with v6 nexthop 2019-09-05 12:35:58 +02:00
fib_trie.c net: route dump netlink NLM_F_MULTI flag missing 2019-08-24 16:49:48 -07:00
fou.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
gre_demux.c net: remove unused parameter from skb_checksum_try_convert 2019-07-05 15:30:36 -07:00
gre_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
icmp.c ipv4/icmp: fix rt dst dev null pointer dereference 2019-08-24 14:49:35 -07:00
igmp.c net: fix __ip_mc_inc_group usage 2019-08-20 12:48:06 -07:00
inet_connection_sock.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
inet_diag.c tcp: annotate sk->sk_wmem_queued lockless reads 2019-10-13 10:13:08 -07:00
inet_fragment.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
inet_hashtables.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
inet_timewait_sock.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
inetpeer.c net: ipv4: use a dedicated counter for icmp_v4 redirect packets 2019-02-08 21:50:15 -08:00
ip_forward.c ipv4: Revert removal of rt_uses_gateway 2019-09-20 18:23:33 -07:00
ip_fragment.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
ip_gre.c erspan: remove the incorrect mtu limit for erspan 2019-09-30 11:10:07 -07:00
ip_input.c netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
ip_options.c netfilter: nf_tables: add support for matching IPv4 options 2019-06-21 18:35:51 +02:00
ip_output.c tcp: honor SO_PRIORITY in TIME_WAIT state 2019-09-27 12:05:02 +02:00
ip_sockglue.c ip_sockglue: Fix missing-check bug in ip_ra_control() 2019-05-25 11:00:50 -07:00
ip_tunnel_core.c ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL 2019-06-18 20:48:45 -04:00
ip_tunnel.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 269 2019-06-05 17:30:29 +02:00
ip_vti.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ipcomp.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2019-07-05 15:01:15 -07:00
ipconfig.c ipconfig: add carrier_timeout kernel parameter 2019-02-01 15:24:13 -08:00
ipip.c ipip: validate header length in ipip_tunnel_xmit 2019-07-25 17:23:40 -07:00
ipmr_base.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-07 17:22:09 -07:00
ipmr.c netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
Kconfig net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
Makefile net: Initial nexthop code 2019-05-28 21:37:30 -07:00
metrics.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
netfilter.c netfilter: ipv4: remove useless export_symbol 2019-01-28 11:32:58 +01:00
netlink.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
nexthop.c nexthops: remove redundant assignment to variable err 2019-08-22 12:14:05 -07:00
ping.c ip: support SO_MARK cmsg 2019-09-13 21:44:19 +02:00
proc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-17 20:20:36 -07:00
protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
raw_diag.c net: don't warn in inet diag when IPV6 is disabled 2019-07-03 11:26:35 -07:00
raw.c netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
route.c ipv4: fix race condition between route lookup and invalidation 2019-10-17 16:44:03 -07:00
syncookies.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
sysctl_net_ipv4.c tcp: add new tcp_mtu_probe_floor sysctl 2019-08-09 13:03:30 -07:00
tcp_bbr.c tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth 2019-09-27 20:37:50 +02:00
tcp_bic.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_bpf.c net/tls: prevent skb_orphan() from leaking TLS plain text with offload 2019-08-08 22:39:35 -07:00
tcp_cdg.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_cong.c tcp: fix tcp_set_congestion_control() use from bpf hook 2019-07-18 20:33:48 -07:00
tcp_cubic.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_dctcp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_dctcp.h tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_diag.c tcp: annotate tp->write_seq lockless reads 2019-10-13 10:13:08 -07:00
tcp_fastopen.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
tcp_highspeed.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_htcp.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_hybla.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_illinois.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_input.c tcp: annotate sk->sk_sndbuf lockless reads 2019-10-13 10:13:08 -07:00
tcp_ipv4.c tcp: annotate tp->write_seq lockless reads 2019-10-13 10:13:08 -07:00
tcp_lp.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_metrics.c tcp: refactor setting the initial congestion window 2019-05-01 11:47:54 -04:00
tcp_minisocks.c tcp: annotate tp->snd_nxt lockless reads 2019-10-13 10:13:08 -07:00
tcp_nv.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_output.c tcp: annotate sk->sk_wmem_queued lockless reads 2019-10-13 10:13:08 -07:00
tcp_rate.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
tcp_recovery.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_scalable.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_timer.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
tcp_ulp.c bpf: sockmap/tls, close can race with map free 2019-07-22 16:04:17 +02:00
tcp_vegas.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_vegas.h
tcp_veno.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_westwood.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_yeah.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp.c tcp: fix a possible lockdep splat in tcp_done() 2019-10-15 20:13:33 -07:00
tunnel4.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
udp_diag.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
udp_impl.h udp: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
udp_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
udp_tunnel.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
udp.c udp: only do GSO if # of segs > 1 2019-10-03 11:47:10 -04:00
udplite.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xfrm4_input.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_output.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xfrm4_policy.c ipv4: Revert removal of rt_uses_gateway 2019-09-20 18:23:33 -07:00
xfrm4_protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xfrm4_state.c xfrm: remove eth_proto value from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
xfrm4_tunnel.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00