linux_dsm_epyc7002/net/netlink
Johannes Berg 1d6d43d480 netlink: disable IRQs for netlink_lock_table()
[ Upstream commit 1d482e666b8e74c7555dbdfbfb77205eeed3ff2d ]

Syzbot reports that in mac80211 we have a potential deadlock
between our "local->stop_queue_reasons_lock" (spinlock) and
netlink's nl_table_lock (rwlock). This is because there's at
least one situation in which we might try to send a netlink
message with this spinlock held while it is also possible to
take the spinlock from a hardirq context, resulting in the
following deadlock scenario reported by lockdep:

       CPU0                    CPU1
       ----                    ----
  lock(nl_table_lock);
                               local_irq_disable();
                               lock(&local->queue_stop_reason_lock);
                               lock(nl_table_lock);
  <Interrupt>
    lock(&local->queue_stop_reason_lock);

This seems valid, we can take the queue_stop_reason_lock in
any kind of context ("CPU0"), and call ieee80211_report_ack_skb()
with the spinlock held and IRQs disabled ("CPU1") in some
code path (ieee80211_do_stop() via ieee80211_free_txskb()).

Short of disallowing netlink use in scenarios like these
(which would be rather complex in mac80211's case due to
the deep callchain), it seems the only fix for this is to
disable IRQs while nl_table_lock is held to avoid hitting
this scenario, this disallows the "CPU0" portion of the
reported deadlock.

Note that the writer side (netlink_table_grab()) already
disables IRQs for this lock.

Unfortunately though, this seems like a huge hammer, and
maybe the whole netlink table locking should be reworked.

Reported-by: syzbot+69ff9dff50dcfe14ddd4@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-16 12:01:36 +02:00
..
af_netlink.c netlink: disable IRQs for netlink_lock_table() 2021-06-16 12:01:36 +02:00
af_netlink.h netlink: Add new socket option to enable strict checking on dumps 2018-10-08 10:39:04 -07:00
diag.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
genetlink.c genetlink: allow dumping command-specific policy 2020-10-03 14:18:29 -07:00
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Makefile netlink: add infrastructure to expose policies to userspace 2020-04-30 17:51:42 -07:00
policy.c netlink: export policy in extended ACK 2020-10-09 20:22:32 -07:00