From 8a4f300b978edbbaa73ef9eca660e45eb9f13873 Mon Sep 17 00:00:00 2001
From: Kamal Heib <kamalheib1@gmail.com>
Date: Wed, 5 Feb 2020 13:05:30 +0200
Subject: [PATCH 01/15] RDMA/hfi1: Fix memory leak in
 _dev_comp_vect_mappings_create

Make sure to free the allocated cpumask_var_t's to avoid the following
reported memory leak by kmemleak:

$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8897f812d6a8 (size 8):
  comm "kworker/1:1", pid 347, jiffies 4294751400 (age 101.703s)
  hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00                          ........
  backtrace:
    [<00000000bff49664>] alloc_cpumask_var_node+0x4c/0xb0
    [<0000000075d3ca81>] hfi1_comp_vectors_set_up+0x20f/0x800 [hfi1]
    [<0000000098d420df>] hfi1_init_dd+0x3311/0x4960 [hfi1]
    [<0000000071be7e52>] init_one+0x25e/0xf10 [hfi1]
    [<000000005483d4c2>] local_pci_probe+0xd4/0x180
    [<000000007c3cbc6e>] work_for_cpu_fn+0x51/0xa0
    [<000000001d626905>] process_one_work+0x8f0/0x17b0
    [<000000007e569e7e>] worker_thread+0x536/0xb50
    [<00000000fd39a4a5>] kthread+0x30c/0x3d0
    [<0000000056f2edb3>] ret_from_fork+0x3a/0x50

Fixes: 5d18ee67d4c1 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support")
Link: https://lore.kernel.org/r/20200205110530.12129-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/hfi1/affinity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index c142b23bb401..1aeea5d65c01 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -479,6 +479,8 @@ static int _dev_comp_vect_mappings_create(struct hfi1_devdata *dd,
 			  rvt_get_ibdev_name(&(dd)->verbs_dev.rdi), i, cpu);
 	}
 
+	free_cpumask_var(available_cpus);
+	free_cpumask_var(non_intr_cpus);
 	return 0;
 
 fail:

From a70ed0f2e6262e723ae8d70accb984ba309eacc2 Mon Sep 17 00:00:00 2001
From: Kaike Wan <kaike.wan@intel.com>
Date: Mon, 10 Feb 2020 08:10:26 -0500
Subject: [PATCH 02/15] IB/hfi1: Acquire lock to release TID entries when user
 file is closed

Each user context is allocated a certain number of RcvArray (TID)
entries and these entries are managed through TID groups. These groups
are put into one of three lists in each user context: tid_group_list,
tid_used_list, and tid_full_list, depending on the number of used TID
entries within each group. When TID packets are expected, one or more
TID groups will be allocated. After the packets are received, the TID
groups will be freed. Since multiple user threads may access the TID
groups simultaneously, a mutex exp_mutex is used to synchronize the
access. However, when the user file is closed, it tries to release
all TID groups without acquiring the mutex first, which risks a race
condition with another thread that may be releasing its TID groups,
leading to data corruption.

This patch addresses the issue by acquiring the mutex first before
releasing the TID groups when the file is closed.

Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs")
Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/hfi1/user_exp_rcv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index f05742ac0949..2443423585b3 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -142,10 +142,12 @@ void hfi1_user_exp_rcv_free(struct hfi1_filedata *fd)
 {
 	struct hfi1_ctxtdata *uctxt = fd->uctxt;
 
+	mutex_lock(&uctxt->exp_mutex);
 	if (!EXP_TID_SET_EMPTY(uctxt->tid_full_list))
 		unlock_exp_tids(uctxt, &uctxt->tid_full_list, fd);
 	if (!EXP_TID_SET_EMPTY(uctxt->tid_used_list))
 		unlock_exp_tids(uctxt, &uctxt->tid_used_list, fd);
+	mutex_unlock(&uctxt->exp_mutex);
 
 	kfree(fd->invalid_tids);
 	fd->invalid_tids = NULL;

From be8638344c70bf492963ace206a9896606b6922d Mon Sep 17 00:00:00 2001
From: Mike Marciniszyn <mike.marciniszyn@intel.com>
Date: Mon, 10 Feb 2020 08:10:33 -0500
Subject: [PATCH 03/15] IB/hfi1: Close window for pq and request coliding

Cleaning up a pq can result in the following warning and panic:

  WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
  list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100)
  Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
   nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
  CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.38.3.el7.x86_64 #1
  Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
  Call Trace:
   [<ffffffff90365ac0>] dump_stack+0x19/0x1b
   [<ffffffff8fc98b78>] __warn+0xd8/0x100
   [<ffffffff8fc98bff>] warn_slowpath_fmt+0x5f/0x80
   [<ffffffff8ff970c3>] __list_del_entry+0x63/0xd0
   [<ffffffff8ff9713d>] list_del+0xd/0x30
   [<ffffffff8fddda70>] kmem_cache_destroy+0x50/0x110
   [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
   [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
   [<ffffffff8fe4519c>] __fput+0xec/0x260
   [<ffffffff8fe453fe>] ____fput+0xe/0x10
   [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
   [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
   [<ffffffff90379134>] int_signal+0x12/0x17
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
  IP: [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
  PGD 2cdab19067 PUD 2f7bfdb067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
   nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
  CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.38.3.el7.x86_64 #1
  Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
  task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000
  RIP: 0010:[<ffffffff8fe1f93e>]  [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
  RSP: 0018:ffff88b5393abd60  EFLAGS: 00010287
  RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003
  RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800
  RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19
  R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000
  R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000
  FS:  00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   [<ffffffff8fe20d44>] __kmem_cache_shutdown+0x14/0x80
   [<ffffffff8fddda78>] kmem_cache_destroy+0x58/0x110
   [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
   [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
   [<ffffffff8fe4519c>] __fput+0xec/0x260
   [<ffffffff8fe453fe>] ____fput+0xe/0x10
   [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
   [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
   [<ffffffff90379134>] int_signal+0x12/0x17
  Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39
  RIP  [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
   RSP <ffff88b5393abd60>
  CR2: 0000000000000010

The panic is the result of slab entries being freed during the destruction
of the pq slab.

The code attempts to quiesce the pq, but looking for n_req == 0 doesn't
account for new requests.

Fix the issue by using SRCU to get a pq pointer and adjust the pq free
logic to NULL the fd pq pointer prior to the quiesce.

Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized")
Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/hfi1/file_ops.c     | 52 ++++++++++++++---------
 drivers/infiniband/hw/hfi1/hfi.h          |  5 ++-
 drivers/infiniband/hw/hfi1/user_exp_rcv.c |  3 --
 drivers/infiniband/hw/hfi1/user_sdma.c    | 17 +++++---
 4 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index bef6946861b2..259115886d35 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -200,23 +200,24 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
 
 	fd = kzalloc(sizeof(*fd), GFP_KERNEL);
 
-	if (fd) {
-		fd->rec_cpu_num = -1; /* no cpu affinity by default */
-		fd->mm = current->mm;
-		mmgrab(fd->mm);
-		fd->dd = dd;
-		kobject_get(&fd->dd->kobj);
-		fp->private_data = fd;
-	} else {
-		fp->private_data = NULL;
-
-		if (atomic_dec_and_test(&dd->user_refcount))
-			complete(&dd->user_comp);
-
-		return -ENOMEM;
-	}
-
+	if (!fd || init_srcu_struct(&fd->pq_srcu))
+		goto nomem;
+	spin_lock_init(&fd->pq_rcu_lock);
+	spin_lock_init(&fd->tid_lock);
+	spin_lock_init(&fd->invalid_lock);
+	fd->rec_cpu_num = -1; /* no cpu affinity by default */
+	fd->mm = current->mm;
+	mmgrab(fd->mm);
+	fd->dd = dd;
+	kobject_get(&fd->dd->kobj);
+	fp->private_data = fd;
 	return 0;
+nomem:
+	kfree(fd);
+	fp->private_data = NULL;
+	if (atomic_dec_and_test(&dd->user_refcount))
+		complete(&dd->user_comp);
+	return -ENOMEM;
 }
 
 static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
@@ -301,21 +302,30 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
 static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
 {
 	struct hfi1_filedata *fd = kiocb->ki_filp->private_data;
-	struct hfi1_user_sdma_pkt_q *pq = fd->pq;
+	struct hfi1_user_sdma_pkt_q *pq;
 	struct hfi1_user_sdma_comp_q *cq = fd->cq;
 	int done = 0, reqs = 0;
 	unsigned long dim = from->nr_segs;
+	int idx;
 
-	if (!cq || !pq)
+	idx = srcu_read_lock(&fd->pq_srcu);
+	pq = srcu_dereference(fd->pq, &fd->pq_srcu);
+	if (!cq || !pq) {
+		srcu_read_unlock(&fd->pq_srcu, idx);
 		return -EIO;
+	}
 
-	if (!iter_is_iovec(from) || !dim)
+	if (!iter_is_iovec(from) || !dim) {
+		srcu_read_unlock(&fd->pq_srcu, idx);
 		return -EINVAL;
+	}
 
 	trace_hfi1_sdma_request(fd->dd, fd->uctxt->ctxt, fd->subctxt, dim);
 
-	if (atomic_read(&pq->n_reqs) == pq->n_max_reqs)
+	if (atomic_read(&pq->n_reqs) == pq->n_max_reqs) {
+		srcu_read_unlock(&fd->pq_srcu, idx);
 		return -ENOSPC;
+	}
 
 	while (dim) {
 		int ret;
@@ -333,6 +343,7 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
 		reqs++;
 	}
 
+	srcu_read_unlock(&fd->pq_srcu, idx);
 	return reqs;
 }
 
@@ -707,6 +718,7 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
 	if (atomic_dec_and_test(&dd->user_refcount))
 		complete(&dd->user_comp);
 
+	cleanup_srcu_struct(&fdata->pq_srcu);
 	kfree(fdata);
 	return 0;
 }
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index 6365e8ffed9d..cae12f416ca0 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1444,10 +1444,13 @@ struct mmu_rb_handler;
 
 /* Private data for file operations */
 struct hfi1_filedata {
+	struct srcu_struct pq_srcu;
 	struct hfi1_devdata *dd;
 	struct hfi1_ctxtdata *uctxt;
 	struct hfi1_user_sdma_comp_q *cq;
-	struct hfi1_user_sdma_pkt_q *pq;
+	/* update side lock for SRCU */
+	spinlock_t pq_rcu_lock;
+	struct hfi1_user_sdma_pkt_q __rcu *pq;
 	u16 subctxt;
 	/* for cpu affinity; -1 if none */
 	int rec_cpu_num;
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index 2443423585b3..4da03f823474 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -87,9 +87,6 @@ int hfi1_user_exp_rcv_init(struct hfi1_filedata *fd,
 {
 	int ret = 0;
 
-	spin_lock_init(&fd->tid_lock);
-	spin_lock_init(&fd->invalid_lock);
-
 	fd->entry_to_rb = kcalloc(uctxt->expected_count,
 				  sizeof(struct rb_node *),
 				  GFP_KERNEL);
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index fd754a16475a..c2f0d9ba93de 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -179,7 +179,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
 	pq = kzalloc(sizeof(*pq), GFP_KERNEL);
 	if (!pq)
 		return -ENOMEM;
-
 	pq->dd = dd;
 	pq->ctxt = uctxt->ctxt;
 	pq->subctxt = fd->subctxt;
@@ -236,7 +235,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
 		goto pq_mmu_fail;
 	}
 
-	fd->pq = pq;
+	rcu_assign_pointer(fd->pq, pq);
 	fd->cq = cq;
 
 	return 0;
@@ -264,8 +263,14 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
 
 	trace_hfi1_sdma_user_free_queues(uctxt->dd, uctxt->ctxt, fd->subctxt);
 
-	pq = fd->pq;
+	spin_lock(&fd->pq_rcu_lock);
+	pq = srcu_dereference_check(fd->pq, &fd->pq_srcu,
+				    lockdep_is_held(&fd->pq_rcu_lock));
 	if (pq) {
+		rcu_assign_pointer(fd->pq, NULL);
+		spin_unlock(&fd->pq_rcu_lock);
+		synchronize_srcu(&fd->pq_srcu);
+		/* at this point there can be no more new requests */
 		if (pq->handler)
 			hfi1_mmu_rb_unregister(pq->handler);
 		iowait_sdma_drain(&pq->busy);
@@ -277,7 +282,8 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
 		kfree(pq->req_in_use);
 		kmem_cache_destroy(pq->txreq_cache);
 		kfree(pq);
-		fd->pq = NULL;
+	} else {
+		spin_unlock(&fd->pq_rcu_lock);
 	}
 	if (fd->cq) {
 		vfree(fd->cq->comps);
@@ -321,7 +327,8 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
 {
 	int ret = 0, i;
 	struct hfi1_ctxtdata *uctxt = fd->uctxt;
-	struct hfi1_user_sdma_pkt_q *pq = fd->pq;
+	struct hfi1_user_sdma_pkt_q *pq =
+		srcu_dereference(fd->pq, &fd->pq_srcu);
 	struct hfi1_user_sdma_comp_q *cq = fd->cq;
 	struct hfi1_devdata *dd = pq->dd;
 	unsigned long idx = 0;

From f92e48718889b3d49cee41853402aa88cac84a6b Mon Sep 17 00:00:00 2001
From: Kaike Wan <kaike.wan@intel.com>
Date: Mon, 10 Feb 2020 08:10:40 -0500
Subject: [PATCH 04/15] IB/rdmavt: Reset all QPs when the device is shut down

When the hfi1 device is shut down during a system reboot, it is possible
that some QPs might have not not freed by ULPs. More requests could be
post sent and a lingering timer could be triggered to schedule more packet
sends, leading to a crash:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
  IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0
  PGD 0
  Oops: 0000 1 SMP
  Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en
  sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod
  CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
  Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018
  task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000
  RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0
  RSP: 0018:ffff88105df43d48 EFLAGS: 00010046
  RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000
  RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f
  RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000
  R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0
  R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000
  FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
  ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000
  ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98
  ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde
  Call Trace:
  IRQ

  [ffffffff810a6b85] queue_work_on+0x45/0x50
  [ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
  [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
  [ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1]
  [ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt]
  [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
  [ffffffff81097316] call_timer_fn+0x36/0x110
  [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
  [ffffffff8109982d] run_timer_softirq+0x22d/0x310
  [ffffffff81090b3f] __do_softirq+0xef/0x280
  [ffffffff816b6a5c] call_softirq+0x1c/0x30
  [ffffffff8102d3c5] do_softirq+0x65/0xa0
  [ffffffff81090ec5] irq_exit+0x105/0x110
  [ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50
  [ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80
  EOI

  [ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0
  [ffffffff81527b48] cpuidle_idle_call+0xd8/0x210
  [ffffffff81034fee] arch_cpu_idle+0xe/0x30
  [ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0
  [ffffffff81051af6] start_secondary+0x1b6/0x230
  Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00
  RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0
  RSP ffff88105df43d48
  CR2: 0000000000000102

The solution is to reset the QPs before the device resources are freed.
This reset will change the QP state to prevent post sends and delete
timers to prevent callbacks.

Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table")
Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/sw/rdmavt/qp.c | 84 +++++++++++++++++++------------
 1 file changed, 51 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 3cdf75d0c7a4..7858d499db03 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -61,6 +61,8 @@
 #define RVT_RWQ_COUNT_THRESHOLD 16
 
 static void rvt_rc_timeout(struct timer_list *t);
+static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
+			 enum ib_qp_type type);
 
 /*
  * Convert the AETH RNR timeout code into the number of microseconds.
@@ -452,40 +454,41 @@ int rvt_driver_qp_init(struct rvt_dev_info *rdi)
 }
 
 /**
- * free_all_qps - check for QPs still in use
+ * rvt_free_qp_cb - callback function to reset a qp
+ * @qp: the qp to reset
+ * @v: a 64-bit value
+ *
+ * This function resets the qp and removes it from the
+ * qp hash table.
+ */
+static void rvt_free_qp_cb(struct rvt_qp *qp, u64 v)
+{
+	unsigned int *qp_inuse = (unsigned int *)v;
+	struct rvt_dev_info *rdi = ib_to_rvt(qp->ibqp.device);
+
+	/* Reset the qp and remove it from the qp hash list */
+	rvt_reset_qp(rdi, qp, qp->ibqp.qp_type);
+
+	/* Increment the qp_inuse count */
+	(*qp_inuse)++;
+}
+
+/**
+ * rvt_free_all_qps - check for QPs still in use
  * @rdi: rvt device info structure
  *
  * There should not be any QPs still in use.
  * Free memory for table.
+ * Return the number of QPs still in use.
  */
 static unsigned rvt_free_all_qps(struct rvt_dev_info *rdi)
 {
-	unsigned long flags;
-	struct rvt_qp *qp;
-	unsigned n, qp_inuse = 0;
-	spinlock_t *ql; /* work around too long line below */
-
-	if (rdi->driver_f.free_all_qps)
-		qp_inuse = rdi->driver_f.free_all_qps(rdi);
+	unsigned int qp_inuse = 0;
 
 	qp_inuse += rvt_mcast_tree_empty(rdi);
 
-	if (!rdi->qp_dev)
-		return qp_inuse;
+	rvt_qp_iter(rdi, (u64)&qp_inuse, rvt_free_qp_cb);
 
-	ql = &rdi->qp_dev->qpt_lock;
-	spin_lock_irqsave(ql, flags);
-	for (n = 0; n < rdi->qp_dev->qp_table_size; n++) {
-		qp = rcu_dereference_protected(rdi->qp_dev->qp_table[n],
-					       lockdep_is_held(ql));
-		RCU_INIT_POINTER(rdi->qp_dev->qp_table[n], NULL);
-
-		for (; qp; qp = rcu_dereference_protected(qp->next,
-							  lockdep_is_held(ql)))
-			qp_inuse++;
-	}
-	spin_unlock_irqrestore(ql, flags);
-	synchronize_rcu();
 	return qp_inuse;
 }
 
@@ -902,14 +905,14 @@ static void rvt_init_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
 }
 
 /**
- * rvt_reset_qp - initialize the QP state to the reset state
+ * _rvt_reset_qp - initialize the QP state to the reset state
  * @qp: the QP to reset
  * @type: the QP type
  *
  * r_lock, s_hlock, and s_lock are required to be held by the caller
  */
-static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
-			 enum ib_qp_type type)
+static void _rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
+			  enum ib_qp_type type)
 	__must_hold(&qp->s_lock)
 	__must_hold(&qp->s_hlock)
 	__must_hold(&qp->r_lock)
@@ -955,6 +958,27 @@ static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
 	lockdep_assert_held(&qp->s_lock);
 }
 
+/**
+ * rvt_reset_qp - initialize the QP state to the reset state
+ * @rdi: the device info
+ * @qp: the QP to reset
+ * @type: the QP type
+ *
+ * This is the wrapper function to acquire the r_lock, s_hlock, and s_lock
+ * before calling _rvt_reset_qp().
+ */
+static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
+			 enum ib_qp_type type)
+{
+	spin_lock_irq(&qp->r_lock);
+	spin_lock(&qp->s_hlock);
+	spin_lock(&qp->s_lock);
+	_rvt_reset_qp(rdi, qp, type);
+	spin_unlock(&qp->s_lock);
+	spin_unlock(&qp->s_hlock);
+	spin_unlock_irq(&qp->r_lock);
+}
+
 /** rvt_free_qpn - Free a qpn from the bit map
  * @qpt: QP table
  * @qpn: queue pair number to free
@@ -1546,7 +1570,7 @@ int rvt_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	switch (new_state) {
 	case IB_QPS_RESET:
 		if (qp->state != IB_QPS_RESET)
-			rvt_reset_qp(rdi, qp, ibqp->qp_type);
+			_rvt_reset_qp(rdi, qp, ibqp->qp_type);
 		break;
 
 	case IB_QPS_RTR:
@@ -1695,13 +1719,7 @@ int rvt_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 	struct rvt_qp *qp = ibqp_to_rvtqp(ibqp);
 	struct rvt_dev_info *rdi = ib_to_rvt(ibqp->device);
 
-	spin_lock_irq(&qp->r_lock);
-	spin_lock(&qp->s_hlock);
-	spin_lock(&qp->s_lock);
 	rvt_reset_qp(rdi, qp, ibqp->qp_type);
-	spin_unlock(&qp->s_lock);
-	spin_unlock(&qp->s_hlock);
-	spin_unlock_irq(&qp->r_lock);
 
 	wait_event(qp->wait, !atomic_read(&qp->refcount));
 	/* qpn is now available for use again */

From a72f4ac1d778f7bde93dfee69bfc23377ec3d74f Mon Sep 17 00:00:00 2001
From: Avihai Horon <avihaih@mellanox.com>
Date: Sun, 26 Jan 2020 19:15:00 +0200
Subject: [PATCH 05/15] RDMA/core: Fix invalid memory access in
 spec_filter_size

Add a check that the size specified in the flow spec header doesn't cause
an overflow when calculating the filter size, and thus prevent access to
invalid memory.  The following crash from syzkaller revealed it.

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:memchr_inv+0xd3/0x330
  Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85
  ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 <80> 3c 01
  00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01
  RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04
  RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820
  RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009
  R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004
  R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040
  FS:  00007f9ee0e51700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   spec_filter_size.part.16+0x34/0x50
   ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770
   ib_uverbs_ex_create_flow+0x9ea/0x1b40
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x465b49
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
  f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
  RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004
  RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc
  R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff
  Modules linked in:
  Dumping ftrace buffer:
     (ftrace buffer empty)

Fixes: 94e03f11ad1f ("IB/uverbs: Add support for flow tag")
Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index c8693f5231dd..025933752e1d 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2745,12 +2745,6 @@ static int kern_spec_to_ib_spec_action(struct uverbs_attr_bundle *attrs,
 	return 0;
 }
 
-static size_t kern_spec_filter_sz(const struct ib_uverbs_flow_spec_hdr *spec)
-{
-	/* Returns user space filter size, includes padding */
-	return (spec->size - sizeof(struct ib_uverbs_flow_spec_hdr)) / 2;
-}
-
 static ssize_t spec_filter_size(const void *kern_spec_filter, u16 kern_filter_size,
 				u16 ib_real_filter_sz)
 {
@@ -2894,11 +2888,16 @@ int ib_uverbs_kern_spec_to_ib_spec_filter(enum ib_flow_spec_type type,
 static int kern_spec_to_ib_spec_filter(struct ib_uverbs_flow_spec *kern_spec,
 				       union ib_flow_spec *ib_spec)
 {
-	ssize_t kern_filter_sz;
+	size_t kern_filter_sz;
 	void *kern_spec_mask;
 	void *kern_spec_val;
 
-	kern_filter_sz = kern_spec_filter_sz(&kern_spec->hdr);
+	if (check_sub_overflow((size_t)kern_spec->hdr.size,
+			       sizeof(struct ib_uverbs_flow_spec_hdr),
+			       &kern_filter_sz))
+		return -EINVAL;
+
+	kern_filter_sz /= 2;
 
 	kern_spec_val = (void *)kern_spec +
 		sizeof(struct ib_uverbs_flow_spec_hdr);

From 10189e8e6fe8dcde13435f9354800429c4474fb1 Mon Sep 17 00:00:00 2001
From: Mark Zhang <markz@mellanox.com>
Date: Sun, 26 Jan 2020 19:17:08 +0200
Subject: [PATCH 06/15] IB/mlx5: Return failure when rts2rts_qp_counters_set_id
 is not supported

When binding a QP with a counter and the QP state is not RESET, return
failure if the rts2rts_qp_counters_set_id is not supported by the
device.

This is to prevent cases like manual bind for Connect-IB devices from
returning success when the feature is not supported.

Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter")
Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index a4f8e7030787..957f3a52589b 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3441,9 +3441,6 @@ static int __mlx5_ib_qp_set_counter(struct ib_qp *qp,
 	struct mlx5_ib_qp_base *base;
 	u32 set_id;
 
-	if (!MLX5_CAP_GEN(dev->mdev, rts2rts_qp_counters_set_id))
-		return 0;
-
 	if (counter)
 		set_id = counter->id;
 	else
@@ -6576,6 +6573,7 @@ void mlx5_ib_drain_rq(struct ib_qp *qp)
  */
 int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter *counter)
 {
+	struct mlx5_ib_dev *dev = to_mdev(qp->device);
 	struct mlx5_ib_qp *mqp = to_mqp(qp);
 	int err = 0;
 
@@ -6585,6 +6583,11 @@ int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter *counter)
 		goto out;
 	}
 
+	if (!MLX5_CAP_GEN(dev->mdev, rts2rts_qp_counters_set_id)) {
+		err = -EOPNOTSUPP;
+		goto out;
+	}
+
 	if (mqp->state == IB_QPS_RTS) {
 		err = __mlx5_ib_qp_set_counter(qp, counter);
 		if (!err)

From d219face9059f38ad187bde133451a2a308fdb7c Mon Sep 17 00:00:00 2001
From: Krishnamraju Eraparaju <krishna2@chelsio.com>
Date: Tue, 4 Feb 2020 14:42:30 +0530
Subject: [PATCH 07/15] RDMA/iw_cxgb4: initiate CLOSE when entering TERM

As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE
when entering into TERM state.

In c4iw_modify_qp(), disconnect operation should only be performed when
the modify_qp call is invoked from ib_core. And all other internal
modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should
call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks
like below can occur:

 Call Trace:
  schedule+0x2f/0xa0
  schedule_preempt_disabled+0xa/0x10
  __mutex_lock.isra.5+0x2d0/0x4a0
  c4iw_ep_disconnect+0x39/0x430    => tries to reacquire ep lock again
  c4iw_modify_qp+0x468/0x10d0
  rx_data+0x218/0x570              => acquires ep lock
  process_work+0x5f/0x70
  process_one_work+0x1a7/0x3b0
  worker_thread+0x30/0x390
  kthread+0x112/0x130
  ret_from_fork+0x35/0x40

Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state")
Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/cxgb4/cm.c | 4 ++++
 drivers/infiniband/hw/cxgb4/qp.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index ee1182f9b627..d69dece3b1d5 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -3036,6 +3036,10 @@ static int terminate(struct c4iw_dev *dev, struct sk_buff *skb)
 				       C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
 		}
 
+		/* As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3,
+		 * when entering the TERM state the RNIC MUST initiate a CLOSE.
+		 */
+		c4iw_ep_disconnect(ep, 1, GFP_KERNEL);
 		c4iw_put_ep(&ep->com);
 	} else
 		pr_warn("TERM received tid %u no ep/qp\n", tid);
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index bbcac539777a..89ac2f9ae6dd 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1948,10 +1948,10 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
 			qhp->attr.layer_etype = attrs->layer_etype;
 			qhp->attr.ecode = attrs->ecode;
 			ep = qhp->ep;
-			c4iw_get_ep(&ep->com);
-			disconnect = 1;
 			if (!internal) {
+				c4iw_get_ep(&ep->com);
 				terminate = 1;
+				disconnect = 1;
 			} else {
 				terminate = qhp->attr.send_term;
 				ret = rdma_fini(rhp, qhp, ep);

From 663218a3e715fd9339d143a3e10088316b180f4f Mon Sep 17 00:00:00 2001
From: Krishnamraju Eraparaju <krishna2@chelsio.com>
Date: Fri, 7 Feb 2020 19:44:29 +0530
Subject: [PATCH 08/15] RDMA/siw: Remove unwanted WARN_ON in
 siw_cm_llp_data_ready()

Warnings like below can fill up the dmesg while disconnecting RDMA
connections.
Hence, remove the unwanted WARN_ON.

  WARNING: CPU: 6 PID: 0 at drivers/infiniband/sw/siw/siw_cm.c:1229 siw_cm_llp_data_ready+0xc1/0xd0 [siw]
  RIP: 0010:siw_cm_llp_data_ready+0xc1/0xd0 [siw]
  Call Trace:
   <IRQ>
   tcp_data_queue+0x226/0xb40
   tcp_rcv_established+0x220/0x620
   tcp_v4_do_rcv+0x12a/0x1e0
   tcp_v4_rcv+0xb05/0xc00
   ip_local_deliver_finish+0x69/0x210
   ip_local_deliver+0x6b/0xe0
   ip_rcv+0x273/0x362
   __netif_receive_skb_core+0xb35/0xc30
   netif_receive_skb_internal+0x3d/0xb0
   napi_gro_frags+0x13b/0x200
   t4_ethrx_handler+0x433/0x7d0 [cxgb4]
   process_responses+0x318/0x580 [cxgb4]
   napi_rx_handler+0x14/0x100 [cxgb4]
   net_rx_action+0x149/0x3b0
   __do_softirq+0xe3/0x30a
   irq_exit+0x100/0x110
   do_IRQ+0x7f/0xe0
   common_interrupt+0xf/0xf
   </IRQ>

Link: https://lore.kernel.org/r/20200207141429.27927-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/sw/siw/siw_cm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index 0c3f0588346e..c5651a96b196 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -1225,10 +1225,9 @@ static void siw_cm_llp_data_ready(struct sock *sk)
 	read_lock(&sk->sk_callback_lock);
 
 	cep = sk_to_cep(sk);
-	if (!cep) {
-		WARN_ON(1);
+	if (!cep)
 		goto out;
-	}
+
 	siw_dbg_cep(cep, "state: %d\n", cep->state);
 
 	switch (cep->state) {

From a0767da7774d91a668f9c223cec3e76172cd833b Mon Sep 17 00:00:00 2001
From: Michael Guralnik <michaelgur@mellanox.com>
Date: Wed, 12 Feb 2020 09:26:31 +0200
Subject: [PATCH 09/15] RDMA/core: Add missing list deletion on freeing event
 queue

When the uobject file scheme was revised to allow device disassociation
from the file it became possible for read() to still happen the driver
destroys the uobject.

The old clode code was not tolerant to concurrent read, and when it was
moved to the driver destroy it creates a bug.

Ensure the event_list is empty after driver destroy by adding the missing
list_del(). Otherwise read() can trigger a use after free and double
kfree.

Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects")
Link: https://lore.kernel.org/r/20200212072635.682689-6-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/core/uverbs_std_types.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/uverbs_std_types.c b/drivers/infiniband/core/uverbs_std_types.c
index 994d8744b246..3abfc63225cb 100644
--- a/drivers/infiniband/core/uverbs_std_types.c
+++ b/drivers/infiniband/core/uverbs_std_types.c
@@ -220,6 +220,7 @@ void ib_uverbs_free_event_queue(struct ib_uverbs_event_queue *event_queue)
 	list_for_each_entry_safe(entry, tmp, &event_queue->event_list, list) {
 		if (entry->counter)
 			list_del(&entry->obj_list);
+		list_del(&entry->list);
 		kfree(entry);
 	}
 	spin_unlock_irq(&event_queue->lock);

From a8af8694a5e8ddaaef4bd7b6426c12b7759c846c Mon Sep 17 00:00:00 2001
From: Yishai Hadas <yishaih@mellanox.com>
Date: Wed, 12 Feb 2020 09:26:32 +0200
Subject: [PATCH 10/15] RDMA/mlx5: Fix async events cleanup flows

As in the prior patch, the devx code is not fully cleaning up its
event_lists before finishing driver_destroy allowing a later read to
trigger user after free conditions.

Re-arrange things so that the event_list is always empty after destroy and
ensure it remains empty until the file is closed.

Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects")
Link: https://lore.kernel.org/r/20200212072635.682689-7-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c | 51 +++++++++++++++++--------------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index d7efc9f6daf0..46e1ab771f10 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -2319,14 +2319,12 @@ static int deliver_event(struct devx_event_subscription *event_sub,
 
 	if (ev_file->omit_data) {
 		spin_lock_irqsave(&ev_file->lock, flags);
-		if (!list_empty(&event_sub->event_list)) {
+		if (!list_empty(&event_sub->event_list) ||
+		    ev_file->is_destroyed) {
 			spin_unlock_irqrestore(&ev_file->lock, flags);
 			return 0;
 		}
 
-		/* is_destroyed is ignored here because we don't have any memory
-		 * allocation to clean up for the omit_data case
-		 */
 		list_add_tail(&event_sub->event_list, &ev_file->event_list);
 		spin_unlock_irqrestore(&ev_file->lock, flags);
 		wake_up_interruptible(&ev_file->poll_wait);
@@ -2473,11 +2471,11 @@ static ssize_t devx_async_cmd_event_read(struct file *filp, char __user *buf,
 			return -ERESTARTSYS;
 		}
 
-		if (list_empty(&ev_queue->event_list) &&
-		    ev_queue->is_destroyed)
-			return -EIO;
-
 		spin_lock_irq(&ev_queue->lock);
+		if (ev_queue->is_destroyed) {
+			spin_unlock_irq(&ev_queue->lock);
+			return -EIO;
+		}
 	}
 
 	event = list_entry(ev_queue->event_list.next,
@@ -2551,10 +2549,6 @@ static ssize_t devx_async_event_read(struct file *filp, char __user *buf,
 		return -EOVERFLOW;
 	}
 
-	if (ev_file->is_destroyed) {
-		spin_unlock_irq(&ev_file->lock);
-		return -EIO;
-	}
 
 	while (list_empty(&ev_file->event_list)) {
 		spin_unlock_irq(&ev_file->lock);
@@ -2667,8 +2661,10 @@ static int devx_async_cmd_event_destroy_uobj(struct ib_uobject *uobj,
 
 	spin_lock_irq(&comp_ev_file->ev_queue.lock);
 	list_for_each_entry_safe(entry, tmp,
-				 &comp_ev_file->ev_queue.event_list, list)
+				 &comp_ev_file->ev_queue.event_list, list) {
+		list_del(&entry->list);
 		kvfree(entry);
+	}
 	spin_unlock_irq(&comp_ev_file->ev_queue.lock);
 	return 0;
 };
@@ -2680,11 +2676,29 @@ static int devx_async_event_destroy_uobj(struct ib_uobject *uobj,
 		container_of(uobj, struct devx_async_event_file,
 			     uobj);
 	struct devx_event_subscription *event_sub, *event_sub_tmp;
-	struct devx_async_event_data *entry, *tmp;
 	struct mlx5_ib_dev *dev = ev_file->dev;
 
 	spin_lock_irq(&ev_file->lock);
 	ev_file->is_destroyed = 1;
+
+	/* free the pending events allocation */
+	if (ev_file->omit_data) {
+		struct devx_event_subscription *event_sub, *tmp;
+
+		list_for_each_entry_safe(event_sub, tmp, &ev_file->event_list,
+					 event_list)
+			list_del_init(&event_sub->event_list);
+
+	} else {
+		struct devx_async_event_data *entry, *tmp;
+
+		list_for_each_entry_safe(entry, tmp, &ev_file->event_list,
+					 list) {
+			list_del(&entry->list);
+			kfree(entry);
+		}
+	}
+
 	spin_unlock_irq(&ev_file->lock);
 	wake_up_interruptible(&ev_file->poll_wait);
 
@@ -2699,15 +2713,6 @@ static int devx_async_event_destroy_uobj(struct ib_uobject *uobj,
 	}
 	mutex_unlock(&dev->devx_event_table.event_xa_lock);
 
-	/* free the pending events allocation */
-	if (!ev_file->omit_data) {
-		spin_lock_irq(&ev_file->lock);
-		list_for_each_entry_safe(entry, tmp,
-					 &ev_file->event_list, list)
-			kfree(entry); /* read can't come any more */
-		spin_unlock_irq(&ev_file->lock);
-	}
-
 	put_device(&dev->ib_dev.dev);
 	return 0;
 };

From 9ea04d0df6e6541c6736b43bff45f1e54875a1db Mon Sep 17 00:00:00 2001
From: Yonatan Cohen <yonatanc@mellanox.com>
Date: Wed, 12 Feb 2020 09:26:34 +0200
Subject: [PATCH 11/15] IB/umad: Fix kernel crash while unloading ib_umad

When disassociating a device from umad we must ensure that the sysfs
access is prevented before blocking the fops, otherwise assumptions in
syfs don't hold:

	    CPU0            	        CPU1
	 ib_umad_kill_port()        ibdev_show()
	    port->ib_dev = NULL
                                      dev_name(port->ib_dev)

The prior patch made an error in moving the device_destroy(), it should
have been split into device_del() (above) and put_device() (below). At
this point we already have the split, so move the device_del() back to its
original place.

  kernel stack
  PF: error_code(0x0000) - not-present page
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
  RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
  RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
  RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
  RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
  R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
  R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
  FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
  Call Trace:
   dev_attr_show+0x15/0x50
   sysfs_kf_seq_show+0xb8/0x1a0
   seq_read+0x12d/0x350
   vfs_read+0x89/0x140
   ksys_read+0x55/0xd0
   do_syscall_64+0x55/0x1b0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9:

Fixes: cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed")
Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/core/user_mad.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index d1407fa378e8..1235ffb2389b 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1312,6 +1312,9 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
 	struct ib_umad_file *file;
 	int id;
 
+	cdev_device_del(&port->sm_cdev, &port->sm_dev);
+	cdev_device_del(&port->cdev, &port->dev);
+
 	mutex_lock(&port->file_mutex);
 
 	/* Mark ib_dev NULL and block ioctl or other file ops to progress
@@ -1331,8 +1334,6 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
 
 	mutex_unlock(&port->file_mutex);
 
-	cdev_device_del(&port->sm_cdev, &port->sm_dev);
-	cdev_device_del(&port->cdev, &port->dev);
 	ida_free(&umad_ida, port->dev_num);
 
 	/* balances device_initialize() */

From 9b6d3bbc1335404b331f4f11dc896066bdf1c752 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro@mellanox.com>
Date: Wed, 12 Feb 2020 09:26:35 +0200
Subject: [PATCH 12/15] RDMA/mlx5: Prevent overflow in mmap offset calculations

The cmd and index variables declared as u16 and the result is supposed to
be stored in u64. The C arithmetic rules doesn't promote "(index >> 8) <<
16" to be u64 and leaves the end result to be u16.

Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index e874d688d040..987bfdcd12a5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2283,8 +2283,8 @@ static int mlx5_ib_mmap_offset(struct mlx5_ib_dev *dev,
 
 static u64 mlx5_entry_to_mmap_offset(struct mlx5_user_mmap_entry *entry)
 {
-	u16 cmd = entry->rdma_entry.start_pgoff >> 16;
-	u16 index = entry->rdma_entry.start_pgoff & 0xFFFF;
+	u64 cmd = (entry->rdma_entry.start_pgoff >> 16) & 0xFFFF;
+	u64 index = entry->rdma_entry.start_pgoff & 0xFFFF;
 
 	return (((index >> 8) << 16) | (cmd << MLX5_IB_MMAP_CMD_SHIFT) |
 		(index & 0xFF)) << PAGE_SHIFT;

From 8ac0e6641c7ca14833a2a8c6f13d8e0a435e535c Mon Sep 17 00:00:00 2001
From: Zhu Yanjun <yanjunz@mellanox.com>
Date: Wed, 12 Feb 2020 09:26:33 +0200
Subject: [PATCH 13/15] RDMA/rxe: Fix soft lockup problem due to using tasklets
 in softirq

When run stress tests with RXE, the following Call Traces often occur

  watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
  ...
  Call Trace:
  <IRQ>
  create_object+0x3f/0x3b0
  kmem_cache_alloc_node_trace+0x129/0x2d0
  __kmalloc_reserve.isra.52+0x2e/0x80
  __alloc_skb+0x83/0x270
  rxe_init_packet+0x99/0x150 [rdma_rxe]
  rxe_requester+0x34e/0x11a0 [rdma_rxe]
  rxe_do_task+0x85/0xf0 [rdma_rxe]
  tasklet_action_common.isra.21+0xeb/0x100
  __do_softirq+0xd0/0x298
  irq_exit+0xc5/0xd0
  smp_apic_timer_interrupt+0x68/0x120
  apic_timer_interrupt+0xf/0x20
  </IRQ>
  ...

The root cause is that tasklet is actually a softirq. In a tasklet
handler, another softirq handler is triggered. Usually these softirq
handlers run on the same cpu core. So this will cause "soft lockup Bug".

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 116cafc9afcf..4bc88708b355 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -329,7 +329,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 					qp->comp.psn = pkt->psn;
 					if (qp->req.wait_psn) {
 						qp->req.wait_psn = 0;
-						rxe_run_task(&qp->req.task, 1);
+						rxe_run_task(&qp->req.task, 0);
 					}
 				}
 				return COMPST_ERROR_RETRY;
@@ -463,7 +463,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	 */
 	if (qp->req.wait_fence) {
 		qp->req.wait_fence = 0;
-		rxe_run_task(&qp->req.task, 1);
+		rxe_run_task(&qp->req.task, 0);
 	}
 }
 
@@ -479,7 +479,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 		if (qp->req.need_rd_atomic) {
 			qp->comp.timeout_retry = 0;
 			qp->req.need_rd_atomic = 0;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task, 0);
 		}
 	}
 
@@ -725,7 +725,7 @@ int rxe_completer(void *arg)
 							RXE_CNT_COMP_RETRY);
 					qp->req.need_retry = 1;
 					qp->comp.started_retry = 1;
-					rxe_run_task(&qp->req.task, 1);
+					rxe_run_task(&qp->req.task, 0);
 				}
 
 				if (pkt) {

From 1dd017882e01d2fcd9c5dbbf1eb376211111c393 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro@mellanox.com>
Date: Wed, 12 Feb 2020 10:06:51 +0200
Subject: [PATCH 14/15] RDMA/core: Fix protection fault in get_pkey_idx_qp_list

We don't need to set pkey as valid in case that user set only one of pkey
index or port number, otherwise it will be resulted in NULL pointer
dereference while accessing to uninitialized pkey list.  The following
crash from Syzkaller revealed it.

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
  Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
  8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04
  02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
  RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
  RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
  RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
  R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
  R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
  FS:  00007f20777de700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   port_pkey_list_insert+0xd7/0x7c0
   ib_security_modify_qp+0x6fa/0xfc0
   _ib_modify_qp+0x8c4/0xbf0
   modify_qp+0x10da/0x16d0
   ib_uverbs_modify_qp+0x9a/0x100
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Message-Id: <20200212080651.GB679970@unreal>
---
 drivers/infiniband/core/security.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 6eb6d2717ca5..2b4d80393bd0 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -339,22 +339,16 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 	if (!new_pps)
 		return NULL;
 
-	if (qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) {
-		if (!qp_pps) {
-			new_pps->main.port_num = qp_attr->port_num;
-			new_pps->main.pkey_index = qp_attr->pkey_index;
-		} else {
-			new_pps->main.port_num = (qp_attr_mask & IB_QP_PORT) ?
-						  qp_attr->port_num :
-						  qp_pps->main.port_num;
-
-			new_pps->main.pkey_index =
-					(qp_attr_mask & IB_QP_PKEY_INDEX) ?
-					 qp_attr->pkey_index :
-					 qp_pps->main.pkey_index;
-		}
+	if (qp_attr_mask & IB_QP_PORT)
+		new_pps->main.port_num =
+			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
+	if (qp_attr_mask & IB_QP_PKEY_INDEX)
+		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
+						      qp_attr->pkey_index;
+	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
-	} else if (qp_pps) {
+
+	if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
 		new_pps->main.port_num = qp_pps->main.port_num;
 		new_pps->main.pkey_index = qp_pps->main.pkey_index;
 		if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)

From 685eff513183d6d64a5f413531e683d23b8b198b Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@mellanox.com>
Date: Thu, 6 Feb 2020 10:27:54 -0400
Subject: [PATCH 15/15] IB/mlx5: Use div64_u64 for num_var_hw_entries
 calculation

On i386:

ERROR: "__udivdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!
ERROR: "__divdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!

Fixes: f164be8c0366 ("IB/mlx5: Extend caps stage to handle VAR capabilities")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reported-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 987bfdcd12a5..e4bcfa81b70a 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6545,7 +6545,7 @@ static int mlx5_ib_init_var_table(struct mlx5_ib_dev *dev)
 					doorbell_bar_offset);
 	bar_size = (1ULL << log_doorbell_bar_size) * 4096;
 	var_table->stride_size = 1ULL << log_doorbell_stride;
-	var_table->num_var_hw_entries = bar_size / var_table->stride_size;
+	var_table->num_var_hw_entries = div64_u64(bar_size, var_table->stride_size);
 	mutex_init(&var_table->bitmap_lock);
 	var_table->bitmap = bitmap_zalloc(var_table->num_var_hw_entries,
 					  GFP_KERNEL);