linux_dsm_epyc7002/net
Ilya Dryomov 583d0fef75 libceph: clear msg->con in ceph_msg_release() only
The following bit in ceph_msg_revoke_incoming() is unsafe:

    struct ceph_connection *con = msg->con;
    if (!con)
            return;
    mutex_lock(&con->mutex);
    <more msg->con use>

There is nothing preventing con from getting destroyed right after
msg->con test.  One easy way to reproduce this is to disable message
signing only on the server side and try to map an image.  The system
will go into a

    libceph: read_partial_message ffff880073f0ab68 signature check failed
    libceph: osd0 192.168.255.155:6801 bad crc/signature
    libceph: read_partial_message ffff880073f0ab68 signature check failed
    libceph: osd0 192.168.255.155:6801 bad crc/signature

loop which has to be interrupted with Ctrl-C.  Hit Ctrl-C and you are
likely to end up with a random GP fault if the reset handler executes
"within" ceph_msg_revoke_incoming():

                     <yet another reply w/o a signature>
                                   ...
          <Ctrl-C>
    rbd_obj_request_end
      ceph_osdc_cancel_request
        __unregister_request
          ceph_osdc_put_request
            ceph_msg_revoke_incoming
                                   ...
                                osd_reset
                                  __kick_osd_requests
                                    __reset_osd
                                      remove_osd
                                        ceph_con_close
                                          reset_connection
                                            <clear con->in_msg->con>
                                            <put con ref>
                                              put_osd
                                                <free osd/con>
              <msg->con use> <-- !!!

If ceph_msg_revoke_incoming() executes "before" the reset handler,
osd/con will be leaked because ceph_msg_revoke_incoming() clears
con->in_msg but doesn't put con ref, while reset_connection() only puts
con ref if con->in_msg != NULL.

The current msg->con scheme was introduced by commits 38941f8031
("libceph: have messages point to their connection") and 92ce034b5a
("libceph: have messages take a connection reference"), which defined
when messages get associated with a connection and when that
association goes away.  Part of the problem is that this association is
supposed to go away in much too many places; closing this race entirely
requires either a rework of the existing or an addition of a new layer
of synchronization.

In lieu of that, we can make it *much* less likely to hit by
disassociating messages only on their destruction and resend through
a different connection.  This makes the code simpler and is probably
a good thing to do regardless - this patch adds a msg_con_set() helper
which is is called from only three places: ceph_con_send() and
ceph_con_in_msg_alloc() to set msg->con and ceph_msg_release() to clear
it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:37:46 +01:00
..
6lowpan 6lowpan: move module_init into core functionality 2015-08-11 22:05:36 +02:00
9p net/9p: Remove ib_get_dma_mr calls 2015-08-30 18:12:36 -04:00
802 net: Kill dev_rebuild_header 2015-03-02 16:43:41 -05:00
8021q net: 8021q: convert to using IFF_NO_QUEUE 2015-08-18 11:55:06 -07:00
appletalk net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
atm atm: deal with setting entry before mkip was called 2015-09-17 22:13:32 -07:00
ax25 NET: AX.25: Stop heartbeat timer on disconnect. 2015-07-15 15:59:58 -07:00
batman-adv batman-adv: turn batadv_neigh_node_get() into local function 2015-08-27 20:15:34 +02:00
bluetooth Bluetooth: Fix initializing conn_params in scan phase 2015-10-16 09:24:41 +02:00
bridge bridge: fix igmpv3 / mldv2 report parsing 2015-09-11 15:08:20 -07:00
caif net: caif: convert to using IFF_NO_QUEUE 2015-08-18 11:55:07 -07:00
can can: replace timestamp as unique skb attribute 2015-07-12 21:13:22 +02:00
ceph libceph: clear msg->con in ceph_msg_release() only 2015-11-02 23:37:46 +01:00
core openvswitch: Fix egress tunnel info. 2015-10-22 19:39:25 -07:00
dcb net/dcb: Add IEEE QCN attribute 2015-03-06 21:50:02 -05:00
dccp tcp/dccp: fix timewait races in timer handling 2015-09-21 16:32:29 -07:00
decnet net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
dns_resolver
dsa net: dsa: exit probe if no switch were found 2015-10-07 04:56:11 -07:00
ethernet flow_dissector: Add flags argument to skb_flow_dissector functions 2015-09-01 15:06:22 -07:00
hsr net: hsr: convert to using IFF_NO_QUEUE 2015-08-18 11:55:07 -07:00
ieee802154 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2015-08-29 13:15:03 -07:00
ipv4 fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key 2015-10-27 18:14:51 -07:00
ipv6 ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues 2015-10-29 07:01:50 -07:00
ipx net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
irda irda: precedence bug in irlmp_seq_hb_idx() 2015-10-21 07:48:26 -07:00
iucv net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
key af_key: fix two typos 2015-10-23 03:05:19 -07:00
l2tp l2tp: protect tunnel->del_work by ref_count 2015-09-28 22:39:10 -07:00
lapb
llc tcp: fix recv with flags MSG_WAITALL | MSG_PEEK 2015-07-27 01:06:53 -07:00
mac80211 mac80211: Fix hwflags debugfs file format 2015-10-13 10:30:56 +02:00
mac802154 ieee802154: add ack request default handling 2015-08-10 20:43:06 +02:00
mpls mpls: fix mpls_net_init memory leak 2015-08-31 12:45:09 -07:00
netfilter netfilter: ipset: Fix sleeping memory allocation in atomic context 2015-10-17 13:01:24 +02:00
netlabel netlink: implement nla_put_in_addr and nla_put_in6_addr 2015-03-31 13:58:35 -04:00
netlink netlink: fix locking around NETLINK_LIST_MEMBERSHIPS 2015-10-22 07:18:28 -07:00
netrom netfilter: Remove spurios included of netfilter.h 2015-06-18 21:14:32 +02:00
nfc nfc: netlink: Add capability to reply to vendor_cmd with data 2015-08-20 22:00:11 +02:00
openvswitch openvswitch: Fix skb leak using IPv6 defrag 2015-10-27 19:32:18 -07:00
packet Fix AF_PACKET ABI breakage in 4.2 2015-09-23 14:33:55 -07:00
phonet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
rds RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv 2015-10-27 19:46:34 -07:00
rfkill rfkill: Copy "all" global state to other types 2015-09-04 14:26:56 +02:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-24 02:58:51 -07:00
rxrpc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
sched sch_hhf: fix return value of hhf_drop() 2015-10-11 04:49:33 -07:00
sctp net: sctp: Don't use 64 kilobyte lookup table for four elements 2015-09-28 22:52:21 -07:00
sunrpc Changes for 4.3-rc5 2015-10-15 13:44:35 -07:00
switchdev switchdev: check if the vlan id is in the proper vlan range 2015-10-13 04:43:24 -07:00
tipc tipc: conditionally expand buffer headroom over udp tunnel 2015-10-21 19:13:48 -07:00
unix net/unix: fix logic about sk_peek_offset 2015-10-05 06:33:09 -07:00
vmw_vsock VSOCK: Fix lockdep issue. 2015-10-22 18:26:29 -07:00
wimax net:wimax: Fix doucble word "the the" in networking.xml 2015-08-09 22:43:52 -07:00
wireless cfg80211: regulatory: restore proper user alpha2 2015-09-04 14:29:25 +02:00
x25 net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
xfrm xfrm: Fix state threshold configuration from userspace 2015-09-29 11:45:55 +02:00
compat.c net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
Kconfig lwtunnel: infrastructure for handling light weight tunnels like mpls 2015-07-21 10:39:03 -07:00
Makefile mpls: Refactor how the mpls module is built 2015-03-04 00:26:06 -05:00
socket.c net: Add a struct net parameter to sock_create_kern 2015-05-11 10:50:17 -04:00
sysctl_net.c net: sysctl: fix a kmemleak warning 2015-10-23 06:22:08 -07:00