linux_dsm_epyc7002/net/ipv6
Jon Maxwell 45caeaa5ac dccp/tcp: fix routing redirect race
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320610 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13 21:55:47 -07:00
..
ila lwtunnel: remove device arg to lwtunnel_build_state 2017-01-30 15:14:22 -05:00
netfilter ipv6: orphan skbs in reassembly unit 2017-03-01 20:55:57 -08:00
addrconf_core.c
addrconf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-04 17:31:39 -08:00
addrlabel.c
af_inet6.c ipv6: reorder icmpv6_init() and ip6_mr_init() 2017-03-07 14:57:33 -08:00
ah6.c IPsec: do not ignore crypto err in ah6 input 2017-01-16 12:57:48 +01:00
anycast.c
calipso.c
datagram.c ipv6: Handle IPv4-mapped src to in6addr_any dst. 2017-02-14 12:13:51 -05:00
esp6_offload.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
esp6.c esp: Introduce a helper to setup the trailer 2017-01-17 10:23:08 +01:00
exthdrs_core.c
exthdrs_offload.c
exthdrs.c ipv6: sr: remove cleanup flag and fix HMAC computation 2017-02-03 11:05:23 -05:00
fib6_rules.c
fou6.c
icmp.c net: for rate-limited ICMP replies save one atomic operation 2017-01-09 15:49:12 -05:00
inet6_connection_sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-28 10:33:06 -05:00
inet6_hashtables.c inet: collapse ipv4/v6 rcv_saddr_equal functions into one 2017-01-18 13:04:28 -05:00
ip6_checksum.c
ip6_fib.c ipv6: make ECMP route replacement less greedy 2017-03-13 12:16:17 -07:00
ip6_flowlabel.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip6_gre.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-07 16:29:30 -05:00
ip6_icmp.c
ip6_input.c
ip6_offload.c net/tunnel: set inner protocol in network gro hooks 2017-03-09 13:19:52 -08:00
ip6_offload.h
ip6_output.c ipv6: avoid write to a possibly cloned skb 2017-03-13 12:53:35 -07:00
ip6_tunnel.c ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() 2017-02-01 12:27:33 -05:00
ip6_udp_tunnel.c
ip6_vti.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2017-03-07 15:00:37 -08:00
ip6mr.c ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt 2017-02-26 21:25:46 -05:00
ipcomp6.c
ipv6_sockglue.c net: Allow IP_MULTICAST_IF to set index to L3 slave 2016-12-30 15:24:47 -05:00
Kconfig Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
Makefile esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
mcast_snoop.c
mcast.c igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() 2017-02-09 16:43:45 -05:00
mip6.c ktime: Get rid of ktime_equal() 2016-12-25 17:21:23 +01:00
ndisc.c ipv6 addrconf: Implemented enhanced DAD (RFC7527) 2016-12-03 23:21:37 -05:00
netfilter.c
output_core.c ipv6: Set skb->protocol properly for local output 2016-12-02 12:34:22 -05:00
ping.c ipv6: remove unnecessary inet6_sk check 2016-12-29 12:05:49 -05:00
proc.c
protocol.c
raw.c net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP 2017-02-07 13:07:47 -05:00
reassembly.c
route.c net: ipv6: Remove redundant RTA_OIF in multipath routes 2017-03-09 13:04:48 -08:00
seg6_hmac.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-07 16:29:30 -05:00
seg6_iptunnel.c ipv6: sr: fix non static symbol warnings 2017-02-07 11:42:35 -05:00
seg6.c ipv6: seg6_genl_set_tunsrc() must check kmemdup() return value 2017-01-20 11:07:48 -05:00
sit.c sit: fix a double free on error path 2017-02-08 13:12:22 -05:00
syncookies.c syncookies: use SipHash in place of SHA1 2017-01-09 13:58:57 -05:00
sysctl_net_ipv6.c
tcp_ipv6.c dccp/tcp: fix routing redirect race 2017-03-13 21:55:47 -07:00
tcpv6_offload.c
tunnel6.c
udp_impl.h udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
udp_offload.c
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-16 19:34:01 -05:00
udplite.c udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
xfrm6_input.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm6_mode_beet.c
xfrm6_mode_ro.c
xfrm6_mode_transport.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm6_mode_tunnel.c
xfrm6_output.c
xfrm6_policy.c xfrm: policy: make policy backend const 2017-02-09 10:22:19 +01:00
xfrm6_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm6_state.c
xfrm6_tunnel.c