linux_dsm_epyc7002/net
Cong Wang a3b6f3a375 net_sched: fix RTNL deadlock again caused by request_module()
commit d349f997686887906b1183b5be96933c5452362a upstream.

tcf_action_init_1() loads tc action modules automatically with
request_module() after parsing the tc action names, and it drops RTNL
lock and re-holds it before and after request_module(). This causes a
lot of troubles, as discovered by syzbot, because we can be in the
middle of batch initializations when we create an array of tc actions.

One of the problem is deadlock:

CPU 0					CPU 1
rtnl_lock();
for (...) {
  tcf_action_init_1();
    -> rtnl_unlock();
    -> request_module();
				rtnl_lock();
				for (...) {
				  tcf_action_init_1();
				    -> tcf_idr_check_alloc();
				   // Insert one action into idr,
				   // but it is not committed until
				   // tcf_idr_insert_many(), then drop
				   // the RTNL lock in the _next_
				   // iteration
				   -> rtnl_unlock();
    -> rtnl_lock();
    -> a_o->init();
      -> tcf_idr_check_alloc();
      // Now waiting for the same index
      // to be committed
				    -> request_module();
				    -> rtnl_lock()
				    // Now waiting for RTNL lock
				}
				rtnl_unlock();
}
rtnl_unlock();

This is not easy to solve, we can move the request_module() before
this loop and pre-load all the modules we need for this netlink
message and then do the rest initializations. So the loop breaks down
to two now:

        for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                struct tc_action_ops *a_o;

                a_o = tc_action_load_ops(name, tb[i]...);
                ops[i - 1] = a_o;
        }

        for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                act = tcf_action_init_1(ops[i - 1]...);
        }

Although this looks serious, it only has been reported by syzbot, so it
seems hard to trigger this by humans. And given the size of this patch,
I'd suggest to make it to net-next and not to backport to stable.

This patch has been tested by syzbot and tested with tdc.py by me.

Fixes: 0fedc63fad ("net_sched: commit action insertions together")
Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com
Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:47 +01:00
..
6lowpan
9p
802
8021q net: vlan: avoid leaks on register_vlan_dev() failures 2021-01-17 14:16:55 +01:00
appletalk
atm
ax25
batman-adv
bluetooth Bluetooth: Put HCI device if inquiry procedure interrupts 2021-03-04 11:37:25 +01:00
bpf bpf: Reject too big ctx_size_in for raw_tp test run 2021-01-27 11:55:07 +01:00
bpfilter
bridge net: bridge: Fix a warning when del bridge sysfs 2021-02-23 15:53:23 +01:00
caif
can can: isotp: isotp_getname(): fix kernel information leak 2021-01-17 14:17:05 +01:00
ceph
core bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx 2021-03-04 11:37:33 +01:00
dcb net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands 2021-01-23 16:04:01 +01:00
dccp
decnet
dns_resolver
dsa net: dsa: call teardown method on probe failure 2021-02-17 11:02:28 +01:00
ethernet
ethtool
hsr
ieee802154
ife
ipv4 net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending 2021-03-04 11:38:46 +01:00
ipv6 net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending 2021-03-04 11:38:46 +01:00
iucv
kcm
key af_key: relax availability checks for skb size calculation 2021-02-13 13:55:02 +01:00
l2tp
l3mdev
lapb net: lapb: Copy the skb before sending a packet 2021-02-10 09:29:14 +01:00
llc
mac80211 mac80211: fix potential overflow when multiplying to u32 integers 2021-03-04 11:37:32 +01:00
mac802154
mpls
mptcp mptcp: skip to next candidate if subflow has unacked data 2021-02-23 15:53:23 +01:00
ncsi
netfilter netfilter: conntrack: skip identical origin tuple in same zone only 2021-02-17 11:02:26 +01:00
netlabel
netlink
netrom
nfc tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer 2021-03-04 11:37:36 +01:00
nsh
openvswitch net: openvswitch: fix TTL decrement exception action execution 2021-02-23 15:53:23 +01:00
packet net: fix proc_fs init handling in af_packet and tls 2021-02-23 15:53:23 +01:00
phonet
psample
qrtr net: qrtr: Fix memory leak in qrtr_tun_open 2021-03-04 11:38:47 +01:00
rds RDMA: Lift ibdev_to_node from rds to common code 2021-02-26 10:12:59 +01:00
rfkill
rose
rxrpc rxrpc: Fix clearance of Tx/Rx ring when releasing a call 2021-02-17 11:02:28 +01:00
sched net_sched: fix RTNL deadlock again caused by request_module() 2021-03-04 11:38:47 +01:00
sctp net: fix iteration for sctp transport seq_files 2021-02-17 11:02:29 +01:00
smc
strparser
sunrpc svcrdma: Hold private mutex while invoking rdma_accept() 2021-03-04 11:38:08 +01:00
switchdev net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP 2021-02-07 15:37:12 +01:00
tipc tipc: fix NULL deref in tipc_link_xmit() 2021-01-23 16:04:00 +01:00
tls net: fix proc_fs init handling in af_packet and tls 2021-02-23 15:53:23 +01:00
unix
vmw_vsock vsock: fix locking in vsock_shutdown() 2021-02-17 11:02:30 +01:00
wimax
wireless wext: fix NULL-ptr-dereference with cfg80211's lack of commit() 2021-02-03 23:28:38 +01:00
x25
xdp xsk: Clear pool even for inactive queues 2021-01-27 11:55:10 +01:00
xfrm xfrm: Fix wraparound in xfrm_policy_addr_delta() 2021-02-03 23:28:45 +01:00
compat.c
devres.c
Kconfig
Makefile
socket.c
sysctl_net.c