linux_dsm_epyc7002/drivers/infiniband/hw/bnxt_re
Selvin Xavier de5c95d0f5 RDMA/bnxt_re: Fix system crash during RDMA resource initialization
bnxt_re_ib_reg acquires and releases the rtnl lock whenever it accesses
the L2 driver.

The following sequence can trigger a crash:

bnxt_re acquires the rtnl_lock ->
	registers the RoCE driver callback with the L2 driver ->
		releases the rtnl_lock
bnxt_re acquires the rtnl_lock ->
	requests MSI-X vectors ->
		releases the rtnl_lock

The issue occurs when bnxt_re proceeds with the remaining part of
initialization and the L2 driver invokes bnxt_ulp_irq_stop as part of
bnxt_open_nic.

The crash is in bnxt_qplib_nq_stop_irq because the NQ structures are
not initialized yet:

<snip>
[ 3551.726647] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 3551.726656] IP: [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
[ 3551.726674] PGD 0
[ 3551.726679] Oops: 0002 1 SMP
...
[ 3551.726822] Hardware name: Dell Inc. PowerEdge R720/08RW36, BIOS 2.4.3 07/09/2014
[ 3551.726826] task: ffff97e30eec5ee0 ti: ffff97e3173bc000 task.ti: ffff97e3173bc000
[ 3551.726829] RIP: 0010:[<ffffffffc0840ee9>] [<ffffffffc0840ee9>]
bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
...
[ 3551.726872] Call Trace:
[ 3551.726886] [<ffffffffc082cb9e>] bnxt_re_stop_irq+0x4e/0x70 [bnxt_re]
[ 3551.726899] [<ffffffffc07d6a53>] bnxt_ulp_irq_stop+0x43/0x70 [bnxt_en]
[ 3551.726908] [<ffffffffc07c82f4>] bnxt_reserve_rings+0x174/0x1e0 [bnxt_en]
[ 3551.726917] [<ffffffffc07cafd8>] __bnxt_open_nic+0x368/0x9a0 [bnxt_en]
[ 3551.726925] [<ffffffffc07cb62b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
[ 3551.726934] [<ffffffffc07cc62f>] bnxt_setup_mq_tc+0x11f/0x260 [bnxt_en]
[ 3551.726943] [<ffffffffc07d5f58>] bnxt_dcbnl_ieee_setets+0xb8/0x1f0 [bnxt_en]
[ 3551.726954] [<ffffffff890f983a>] dcbnl_ieee_set+0x9a/0x250
[ 3551.726966] [<ffffffff88fd6d21>] ? __alloc_skb+0xa1/0x2d0
[ 3551.726972] [<ffffffff890f72fa>] dcb_doit+0x13a/0x210
[ 3551.726981] [<ffffffff89003ff7>] rtnetlink_rcv_msg+0xa7/0x260
[ 3551.726989] [<ffffffff88ffdb00>] ? rtnl_unicast+0x20/0x30
[ 3551.726996] [<ffffffff88bf9dc8>] ? __kmalloc_node_track_caller+0x58/0x290
[ 3551.727002] [<ffffffff890f7326>] ? dcb_doit+0x166/0x210
[ 3551.727007] [<ffffffff88fd6d0d>] ? __alloc_skb+0x8d/0x2d0
[ 3551.727012] [<ffffffff89003f50>] ? rtnl_newlink+0x880/0x880
...
[ 3551.727104] [<ffffffff8911f7d5>] system_call_fastpath+0x1c/0x21
...
[ 3551.727164] RIP [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
[ 3551.727175] RSP <ffff97e3173bf788>
[ 3551.727177] CR2: 0000000000000000

Avoid this inconsistent state and the resulting system crash by holding
the rtnl lock for the entire duration of device initialization.
Refactor the code to remove the rtnl lock from the individual functions
and acquire and release it in the caller instead.
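
The locking change described above can be illustrated with a small userspace model. This is only a sketch and not the driver code: every `toy_*` name, the `irq_stop_result` enum, and the single-threaded flag standing in for the rtnl lock are hypothetical, introduced here to show why releasing the lock between initialization steps opens a window for the L2 irq-stop callback, and why holding it in the caller closes that window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rtnl lock: a flag, no real threading. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held); /* a blocking acquire here would deadlock the model */
    l->held = true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical device state; nq stays NULL until the last init step. */
struct toy_dev {
    struct toy_lock lock;
    bool ulp_registered;
    bool msix_requested;
    int *nq;
    int nq_storage;
};

enum irq_stop_result {
    IRQ_STOP_BLOCKED,        /* lock was held, callback could not run */
    IRQ_STOP_NOT_REGISTERED, /* no callback registered yet */
    IRQ_STOP_OK,             /* NQ state was ready */
    IRQ_STOP_WOULD_CRASH     /* callback would touch uninitialized NQ state */
};

/* Models the L2 path that takes the lock and invokes the irq-stop callback. */
static enum irq_stop_result toy_l2_irq_stop(struct toy_dev *d)
{
    enum irq_stop_result r;

    if (!toy_trylock(&d->lock))
        return IRQ_STOP_BLOCKED;
    if (!d->ulp_registered)
        r = IRQ_STOP_NOT_REGISTERED;
    else if (d->nq == NULL)
        r = IRQ_STOP_WOULD_CRASH;
    else
        r = IRQ_STOP_OK;
    toy_unlock(&d->lock);
    return r;
}

static void toy_register_ulp(struct toy_dev *d) { d->ulp_registered = true; }
static void toy_request_msix(struct toy_dev *d) { d->msix_requested = true; }
static void toy_init_nq(struct toy_dev *d) { d->nq = &d->nq_storage; }

/* Old pattern: each step takes and drops the lock on its own. Returns what
 * an irq-stop request sees if it arrives before the NQ is set up. */
static enum irq_stop_result toy_init_per_step_locking(struct toy_dev *d)
{
    enum irq_stop_result seen;

    toy_lock_acquire(&d->lock);
    toy_register_ulp(d);
    toy_unlock(&d->lock);

    toy_lock_acquire(&d->lock);
    toy_request_msix(d);
    toy_unlock(&d->lock);

    seen = toy_l2_irq_stop(d); /* lock is free: the callback runs too early */
    toy_init_nq(d);
    return seen;
}

/* New pattern: the caller holds the lock across the whole initialization. */
static enum irq_stop_result toy_init_caller_locked(struct toy_dev *d)
{
    enum irq_stop_result seen;

    toy_lock_acquire(&d->lock);
    toy_register_ulp(d);
    toy_request_msix(d);
    seen = toy_l2_irq_stop(d); /* lock is held: the callback has to wait */
    toy_init_nq(d);
    toy_unlock(&d->lock);
    return seen;
}
```

In this model, the per-step variant reports that the callback would have touched uninitialized NQ state, while the caller-locked variant keeps the callback out until initialization has finished. A later `toy_l2_irq_stop()` call then finds the NQ ready.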

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Fixes: 6e04b10356 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-24 09:24:16 -06:00
bnxt_re.h infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks 2018-03-14 18:24:13 -04:00
hw_counters.c RDMA/bnxt_re: expose detailed stats retrieved from HW 2018-01-18 14:49:18 -05:00
hw_counters.h RDMA/bnxt_re: expose detailed stats retrieved from HW 2018-01-18 14:49:18 -05:00
ib_verbs.c bnxt_re: Fix couple of memory leaks that could lead to IOMMU call traces 2018-09-05 16:08:41 -06:00
ib_verbs.h RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const 2018-07-30 20:09:34 -06:00
Kconfig bnxt_re: add MAY_USE_DEVLINK dependency 2017-07-29 14:17:48 -07:00
main.c RDMA/bnxt_re: Fix system crash during RDMA resource initialization 2018-09-24 09:24:16 -06:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
qplib_fp.c bnxt_re: Fix couple of memory leaks that could lead to IOMMU call traces 2018-09-05 16:08:41 -06:00
qplib_fp.h RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes 2018-05-25 11:03:47 -06:00
qplib_rcfw.c RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes 2018-05-25 11:03:47 -06:00
qplib_rcfw.h RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes 2018-05-25 11:03:47 -06:00
qplib_res.c RDMA/bnxt_re: Use common error handling code in bnxt_qplib_alloc_dpi_tbl() 2018-02-01 15:24:31 -07:00
qplib_res.h bnxt_re: Make room for mapping beyond 32 entries 2017-10-18 10:24:13 -04:00
qplib_sp.c RDMA/bnxt_re: Fix a couple off by one bugs 2018-07-04 12:06:26 -06:00
qplib_sp.h RDMA/bnxt_re: expose detailed stats retrieved from HW 2018-01-18 14:49:18 -05:00
roce_hsi.h RDMA/bnxt_re: Fix incorrect DB offset calculation 2018-02-28 12:10:32 -07:00