linux_dsm_epyc7002/net
Jon Maxwell 45caeaa5ac dccp/tcp: fix routing redirect race
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320610 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13 21:55:47 -07:00
..
6lowpan 6lowpan: use rb_entry() 2017-01-22 16:46:13 -05:00
9p Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 21:44:35 -08:00
802 Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
8021q net: remove ndo_neigh_{construct, destroy} from stacked devices 2017-02-06 11:25:57 -05:00
appletalk lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
atm net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
ax25 net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
batman-adv This contains just the average.h change in order to get it 2017-03-02 14:39:17 -08:00
bluetooth net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
bridge bridge: drop netfilter fake rtable unconditionally 2017-03-13 13:01:10 -07:00
caif sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
can can: bcm: fix hrtimer/tasklet termination in bcm op removal 2017-01-30 11:05:04 +01:00
ceph Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
core net: use net->count to check whether a netns is alive or not 2017-03-13 16:02:27 -07:00
dcb net: dcb: set error code on failures 2016-12-03 23:54:25 -05:00
dccp dccp/tcp: fix routing redirect race 2017-03-13 21:55:47 -07:00
decnet net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
dns_resolver Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
ethernet Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
hsr net/hsr: use eth_hw_addr_random() 2017-02-21 13:25:22 -05:00
ieee802154 lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
ife net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
ipv4 dccp/tcp: fix routing redirect race 2017-03-13 21:55:47 -07:00
ipv6 dccp/tcp: fix routing redirect race 2017-03-13 21:55:47 -07:00
ipx ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
irda net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
iucv net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
kcm sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
key netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-28 10:00:39 -08:00
l3mdev net: ipv6: Remove l3mdev_get_saddr6 2016-09-10 23:12:53 -07:00
lapb Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
llc net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-04 17:31:39 -08:00
mac802154 sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> 2017-03-02 08:42:38 +01:00
mpls mpls: Do not decrement alive counter for unregister events 2017-03-12 23:45:36 -07:00
ncsi net/ncsi: Improve HNCDSC AEN handler 2016-10-20 11:23:08 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-04 17:31:39 -08:00
netlabel netlabel: add CALIPSO to the list of built-in protocols 2017-01-06 22:20:45 -05:00
netlink net: adjust skb->truesize in pskb_expand_head() 2017-01-27 12:03:29 -05:00
netrom net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
nfc net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
openvswitch openvswitch: actions: fixed a brace coding style warning 2017-03-02 13:14:44 -08:00
packet net: don't call strlen() on the user buffer in packet_bind_spkt() 2017-03-01 20:55:57 -08:00
phonet net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
psample net: Introduce psample, a new genetlink channel for packet sampling 2017-01-24 13:44:28 -05:00
qrtr net: qrtr: Mark 'buf' as little endian 2017-01-10 20:45:04 -05:00
rds net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
rfkill rfkill: remove rfkill-regulator 2017-01-24 11:07:35 +01:00
rose net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
rxrpc rxrpc: Wake up the transmitter if Rx window size increases on the peer 2017-03-10 09:34:23 -08:00
sched act_connmark: avoid crashing on malformed nlattrs with null parms 2017-03-12 23:32:41 -07:00
sctp net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
smc net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
strparser strparser: destroy workqueue on module exit 2017-03-03 20:43:26 -08:00
sunrpc sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
switchdev Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-30 12:42:58 -04:00
tipc net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
unix net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
vmw_vsock net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
wimax genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
wireless Some more updates: 2017-02-10 14:31:51 -05:00
x25 net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2017-03-07 15:00:37 -08:00
compat.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
Kconfig bpf: make jited programs visible in traces 2017-02-17 13:40:05 -05:00
Makefile net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
socket.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
sysctl_net.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-10-06 09:52:23 -07:00