Commit Graph

84 Commits

Author SHA1 Message Date
Linus Torvalds
7c225c69f8 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2 updates

 - almost all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
  memory hotplug: fix comments when adding section
  mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
  mm: simplify nodemask printing
  mm,oom_reaper: remove pointless kthread_run() error check
  mm/page_ext.c: check if page_ext is not prepared
  writeback: remove unused function parameter
  mm: do not rely on preempt_count in print_vma_addr
  mm, sparse: do not swamp log with huge vmemmap allocation failures
  mm/hmm: remove redundant variable align_end
  mm/list_lru.c: mark expected switch fall-through
  mm/shmem.c: mark expected switch fall-through
  mm/page_alloc.c: broken deferred calculation
  mm: don't warn about allocations which stall for too long
  fs: fuse: account fuse_inode slab memory as reclaimable
  mm, page_alloc: fix potential false positive in __zone_watermark_ok
  mm: mlock: remove lru_add_drain_all()
  mm, sysctl: make NUMA stats configurable
  shmem: convert shmem_init_inodecache() to void
  Unify migrate_pages and move_pages access checks
  mm, pagevec: rename pagevec drained field
  ...
2017-11-15 19:42:40 -08:00
Johannes Thumshirn
3c07347841 drivers/infiniband/sw/rdmavt/qp.c: use kmalloc_array_node()
Now that we have a NUMA-aware version of kmalloc_array() we can use it
instead of kmalloc_node() without an overflow check in the size
calculation.

Link: http://lkml.kernel.org/r/20170927082038.3782-5-jthumshirn@suse.de
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Linus Torvalds
ad0835a930 Updates for 4.15 kernel merge window
- Add iWARP support to qedr driver
 - Lots of misc fixes across subsystem
 - Multiple update series to hns roce driver
 - Multiple update series to hfi1 driver
 - Updates to vnic driver
 - Add kref to wait struct in cxgb4 driver
 - Updates to i40iw driver
 - Mellanox shared pull request
 - timer_setup changes
 - massive cleanup series from Bart Van Assche
 - Two series of SRP/SRPT changes from Bart Van Assche
 - Core updates from Mellanox
 - i40iw updates
 - IPoIB updates
 - mlx5 updates
 - mlx4 updates
 - hns updates
 - bnxt_re fixes
 - PCI write padding support
 - Sparse/Smatch/warning cleanups/fixes
 - CQ moderation support
 - SRQ support in vmw_pvrdma
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaDF9JAAoJELgmozMOVy/dDXUP/i92g+G4OJ+4hHMh4KCjQMHT
 eMr/w9l1C033HrtsU1afPhqHOsKSxwCuJSiTgN4uXIm67/2kPK5Vlx+ir7mbOLwB
 3ukVK6Q/aFdigWCUhIaJSlDpjbd2sEj7JwKtM3rucvMWJlBJ4mAbcVQVfU96CCsv
 V9mO7dpR3QtYWDId9DukfnAfPUPFa3SMZnD7tdl6mKNRg/MjWGYLAL4nJoBfex5f
 b4o+MTrbuFWXYsfDru1m9BpHgyul20ldfcnbe8C/sVOQmOgkX7ngD5Sdi1FLeRJP
 GF/DnAqInC9N7cAxZHx4kH9x6mLMmEdfnwQ9VTVqGUHBsj3H4hQTVIAFfHUhWUbG
 TP5ZHgZG2CewZ0rf092cWlDZwp6n0BalnbQJr+QN4MzPmYbofs3AccSKUwrle+e+
 E6yYf4XxJdt7wRr4F1QKygtUEXSnNkNYUDQ4ZFbpJS/D4Sq80R1ZV/WZ7PJxm1D/
 EIKoi7NU9cbPMIlbCzn8kzgfjS7Pe4p0WW/Xxc/IYmACzpwNPkZuFGSND79ksIpF
 jhHqwZsOWFuXISjvcR4loc8wW6a5w5vjOiX0lLVz0NSdXSzVqav/2at7ZLDx/PT+
 Lh9YVL51akA3hiD+3X6iOhfOUu6kskjT9HijE5T8rJnf0V+C6AtIRpwrQ7ONmjJm
 3JMrjjLxtCIvpUyzCvDW
 =A1oL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a fairly plain pull request. Lots of driver updates across the
  stack, a huge number of static analysis cleanups including a close to
  50 patch series from Bart Van Assche, and a number of new features
  inside the stack such as general CQ moderation support.

  Nothing really stands out, but there might be a few conflicts as you
  take things in. In particular, the cleanups touched some of the same
  lines as the new timer_setup changes.

  Everything in this pull request has been through 0day and at least two
  days of linux-next (since Stephen doesn't necessarily flag new
  errors/warnings until day2). A few more items (about 30 patches) from
  Intel and Mellanox showed up on the list on Tuesday. I've excluded
  those from this pull request, and I'm sure some of them qualify as
  fixes suitable to send any time, but I still have to review them
  fully. If they contain mostly fixes and little or no new development,
  then I will probably send them through by the end of the week just to
  get them out of the way.

  There was a break in my acceptance of patches which coincides with the
  computer problems I had, and then when I got things mostly back under
  control I had a backlog of patches to process, which I did mostly last
  Friday and Monday. So there is a larger number of patches processed in
  that timeframe than I was striving for.

  Summary:
   - Add iWARP support to qedr driver
   - Lots of misc fixes across subsystem
   - Multiple update series to hns roce driver
   - Multiple update series to hfi1 driver
   - Updates to vnic driver
   - Add kref to wait struct in cxgb4 driver
   - Updates to i40iw driver
   - Mellanox shared pull request
   - timer_setup changes
   - massive cleanup series from Bart Van Assche
   - Two series of SRP/SRPT changes from Bart Van Assche
   - Core updates from Mellanox
   - i40iw updates
   - IPoIB updates
   - mlx5 updates
   - mlx4 updates
   - hns updates
   - bnxt_re fixes
   - PCI write padding support
   - Sparse/Smatch/warning cleanups/fixes
   - CQ moderation support
   - SRQ support in vmw_pvrdma"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
  RDMA/core: Rename kernel modify_cq to better describe its usage
  IB/mlx5: Add CQ moderation capability to query_device
  IB/mlx4: Add CQ moderation capability to query_device
  IB/uverbs: Add CQ moderation capability to query_device
  IB/mlx5: Exposing modify CQ callback to uverbs layer
  IB/mlx4: Exposing modify CQ callback to uverbs layer
  IB/uverbs: Allow CQ moderation with modify CQ
  iw_cxgb4: atomically flush the qp
  iw_cxgb4: only call the cq comp_handler when the cq is armed
  iw_cxgb4: Fix possible circular dependency locking warning
  RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
  IB/core: Only maintain real QPs in the security lists
  IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
  RDMA/core: Make function rdma_copy_addr return void
  RDMA/vmw_pvrdma: Add shared receive queue support
  RDMA/core: avoid uninitialized variable warning in create_udata
  RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
  RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
  RDMA/bnxt_re: Set QP state in case of response completion errors
  RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
  ...
2017-11-15 14:54:53 -08:00
Mark Rutland
6aa7de0591 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:08 +02:00
Doug Ledford
894b82c427 Merge branch 'timer_setup' into for-next
Conflicts:
	drivers/infiniband/hw/cxgb4/cm.c
	drivers/infiniband/hw/qib/qib_driver.c
	drivers/infiniband/hw/qib/qib_mad.c

There were minor fixups needed in these files.  Just minor context diffs
due to patches from independent sources touching the same basic area.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:12:09 -04:00
Kees Cook
a2930e5c44 IB/rdmavt: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

setup_timer() was already being called before the open-coded init_timer()
and .data assignment. These are removed as well.

Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 11:48:19 -04:00
Doug Ledford
e527ff92b6 Merge branch 'hfi1' into k.o/for-next
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:15:14 -04:00
Alex Estrin
f9586abfa3 IB/rdmavt: Don't wait for resources in QP reset
Per the IBTA spec, QP destroy shall fail if the QP is attached
to multicast groups, although the spec is silent on modify_qp
to reset state. It implies that ULP must deregister QP from
all mcast groups for destroy to succeed.
The faulty patch "IB/ipoib: Update broadcast object if PKey value
was changed in index 0" exposed two issues in rdmavt:
1. Rvt QP reset waits for qp references to go to zero.
This will hang if QP is attached to multicast groups.
2. The mcast group detach will fail for a QP in reset state
therefore preventing ULP from correcting the issue.
This patch moves the reference count wait to the the destroy QP
path and allows a QP mcast detach to work in the reset state.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:13:00 -04:00
Bart Van Assche
2caaa2335b RDMA/rdmavt: Suppress gcc 7 fall-through complaints
Avoid that gcc 7 reports the following warning when building with W=1:

warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:07 -04:00
Mike Marciniszyn
0208da90de IB/rdmavt: Handle dereg of inuse MRs properly
A destroy of an MR prior to destroying the QP can cause the following
diagnostic if the QP is referencing the MR being de-registered:

hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff8808562108
              00 pd ffff880859b20b00

The solution is to when the a non-zero refcount is encountered when
the MR is destroyed the QPs needs to be iterated looking for QPs in
the same PD as the MR.  If rvt_qp_mr_clean() detects any such QP
references the rkey/lkey, the QP needs to be put into an error state
via a call to rvt_qp_error() which will trigger the clean up of any
stuck references.

This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5.

[This is reproduced with the 0.4.9 version of qperf and the rc_bw test]

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:31 -04:00
Mike Marciniszyn
4734b4f417 IB/rdmavt: Add QP iterator API for QPs
There are currently 3 spots in the qib and hfi1 driver that have
knowledge of the internal QP hash list that should only be in
scope to rdmavt QP code.

Add an iterator API for processing all QPs to hide the
nature of the RCU hashlist.

The API consists of:
- rvt_qp_iter_init()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter_next()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter()
  * For iterating all QPs

The first two are used for things like seq_file prints.

The last is for code that just needs to iterate all QPs
in the system.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:28 -04:00
Mike Marciniszyn
3aaee8ab47 IB/rdmavt: Use rvt_put_swqe() in rvt_clear_mr_ref()
hfi1 and qib were converted in previous patches, do the same for rdmavt.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:16 -04:00
Don Hiatt
13c1922288 IB/rdmavt, hfi1, qib: Modify check_ah() to account for extended LIDs
rvt_check_ah() delegates lid verification to underlying
driver. Underlying driver uses different conditions to
check for dlid depending on whether the device supports
extended LIDs

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Sebastian Sanchez
16570d3da0 IB/hfi1: Remove pmtu from the QP structure
The pmtu field doens't have be stored in the QP structure
as it can easily be calculated when needed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Mike Marciniszyn
3ffea7d8cd IB/{rdmavt, hfi1, qib}: Fix panic with post receive and SGE compression
The server side of qperf panics as follows:

[242446.336860] IP: report_bug+0x64/0x10
[242446.341031] PGD 1c0c067
[242446.341032] P4D 1c0c067
[242446.343951] PUD 1c0d063
[242446.346870] PMD 8587ea067
[242446.349788] PTE 800000083e14016
[242446.352901]
[242446.358352] Oops: 0003 [#1] SM
[242446.437919] CPU: 1 PID: 7442 Comm: irq/92-hfi1_0 k Not tainted 4.12.0-mam-asm #1
[242446.446365] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/201
[242446.458397] task: ffff8808392d2b80 task.stack: ffffc9000664000
[242446.465097] RIP: 0010:report_bug+0x64/0x10
[242446.469859] RSP: 0018:ffffc900066439c0 EFLAGS: 0001000
[242446.475784] RAX: ffffffffa06647e4 RBX: ffffffffa06461e1 RCX: 000000000000000
[242446.483840] RDX: 0000000000000907 RSI: ffffffffa0675040 RDI: ffffffffffff740
[242446.491897] RBP: ffffc900066439e0 R08: 0000000000000001 R09: 000000000000025
[242446.499953] R10: ffffffff81a253df R11: 0000000000000133 R12: ffffc90006643b3
[242446.508010] R13: ffffffffa065bbf0 R14: 00000000000001e5 R15: 000000000000000
[242446.516067] FS:  0000000000000000(0000) GS:ffff88085f640000(0000) knlGS:000000000000000
[242446.525191] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003
[242446.531698] CR2: ffffffffa06647ee CR3: 0000000001c09000 CR4: 00000000001406e
[242446.539756] Call Trace
[242446.542582]  fixup_bug+0x2c/0x5
[242446.546277]  do_trap+0x12b/0x18
[242446.549972]  do_error_trap+0x89/0x11
[242446.554171]  ? hfi1_copy_sge+0x271/0x2b0 [hfi1
[242446.559324]  ? ttwu_do_wakeup+0x1e/0x14
[242446.563795]  ? ttwu_do_activate+0x77/0x8
[242446.568363]  do_invalid_op+0x20/0x3
[242446.572448]  invalid_op+0x1e/0x3
[242446.576247] RIP: 0010:hfi1_copy_sge+0x271/0x2b0 [hfi1
[242446.582075] RSP: 0018:ffffc90006643be8 EFLAGS: 0001004
[242446.587999] RAX: 0000000000000000 RBX: ffff88083e0fa240 RCX: 000000000000000
[242446.596058] RDX: 0000000000000000 RSI: ffff880842508000 RDI: ffff88083e0fa24
[242446.604116] RBP: ffffc90006643c28 R08: 0000000000000000 R09: 000000000000000
[242446.612172] R10: ffffc90009473640 R11: 0000000000000133 R12: 000000000000000
[242446.620228] R13: 0000000000000000 R14: 0000000000002000 R15: ffff88084250800
[242446.628293]  ? hfi1_copy_sge+0x1a1/0x2b0 [hfi1
[242446.633449]  hfi1_rc_rcv+0x3da/0x1270 [hfi1
[242446.638312]  ? sc_buffer_alloc+0x113/0x150 [hfi1
[242446.643662]  hfi1_ib_rcv+0x1c9/0x2e0 [hfi1
[242446.648428]  process_receive_ib+0x19a/0x270 [hfi1
[242446.653866]  ? process_rcv_qp_work+0xd2/0x160 [hfi1
[242446.659505]  handle_receive_interrupt_nodma_rtail+0x184/0x2e0 [hfi1
[242446.666693]  ? irq_finalize_oneshot+0x100/0x10
[242446.671846]  receive_context_thread+0x1b/0x140 [hfi1
[242446.677576]  irq_thread_fn+0x1e/0x4
[242446.681659]  irq_thread+0x13c/0x1b
[242446.685646]  ? irq_forced_thread_fn+0x60/0x6
[242446.690604]  kthread+0x112/0x15
[242446.694298]  ? irq_thread_check_affinity+0xe0/0xe
[242446.699738]  ? kthread_park+0x60/0x6
[242446.703919]  ? do_syscall_64+0x67/0x15
[242446.708292]  ret_from_fork+0x25/0x3
[242446.712374] Code: 63 78 04 44 0f b7 70 08 41 89 d0 4c 8d 2c 38 41 83 e0 01 f6 c2 02 74 17 66 45 85 c0 74 11 f6 c2 04 b9 01 00 00 00 75 bb 83 ca 04 <66> 89 50 0a 66 45 85 c0 74 52 0f b6 48 0b 41 0f b7 f6 4d 89 e0
[242446.733527] RIP: report_bug+0x64/0x100 RSP: ffffc900066439c
[242446.739935] CR2: ffffffffa06647e
[242446.743763] ---[ end trace 0e90a20d0aa494f7 ]--

The root cause is that the qib/hfi1 post receive call to rvt_lkey_ok()
doesn't interpret the new return value from rvt_lkey_ok() properly
leading to an mr reference count underrun.

Additionally, remove an unused argument in rvt_sge_adjacent()
aw well as an unneeded incr local in rvt_post_one_wr().

Fixes: Commit 14fe13fcd3 ("IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Doug Ledford
03da084ed8 Merge branch 'hfi1' into k.o/for-4.14 2017-07-24 08:33:43 -04:00
Kaike Wan
a25ce4270b IB/rdmavt: Setting of QP timeout can overflow jiffies computation
Current computation of qp->timeout_jiffies in rvt_modify_qp() will cause
overflow due to the fact that the input to the function usecs_to_jiffies
is only 32-bit ( unsigned int). Overflow will occur when attr->timeout is
equal to or greater than 30. The consequence is unnecessarily excessive
retry and thus degradation of the system performance.

This patch fixes the problem by limiting the input to 5-bit and calling
usecs_to_jiffies() before multiplying the scaling factor.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Leon Romanovsky
0f4d027c3b IB/{rdmavt, qib, hfi1}: Remove gfp flags argument
The caller to the driver marks GFP_NOIO allocations with help
of memalloc_noio-* calls now. This makes redundant to pass down
to the driver gfp flags, which can be GFP_KERNEL only.

The patch removes the gfp flags argument and updates all driver paths.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:23 -04:00
Dennis Dalessandro
6c31e5283c IB/hfi1: Use QPN mask to avoid overflow
Ensure we can't come up with an array size that is bigger than the array
by applying the QPN mask before the divide in the free_qpn function.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Dennis Dalessandro
b2f8a04e77 IB/rdmavt: Remove duplicated functions
The free_qpn() function from the hfi1/qib driver which was the basis for
rdmavt_free_qpn() function was accidentally left in the code. Remove it.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Mike Marciniszyn
14fe13fcd3 IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()
SGEs that are contiguous needlessly consume driver dependent TX resources.

The lkey validation logic is enhanced to compress the SGE that ends
up in the send wqe when consecutive addresses are detected.

The lkey validation API used to return 1 (success) or 0 (fail).

The return value is now an -errno, 0 (compressed), or 1 (uncompressed).  A
additional argument is added to pass the last SQE for the compression.

Loopback callers always pass a NULL to last_sge since the optimization is
of little benefit in that situation.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:56:33 -04:00
Dasaratharaman Chandramouli
d8966fcd4c IB/core: Use rdma_ah_attr accessor functions
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Mike Marciniszyn
44dcfa4b18 IB/rdmavt: Avoid reseting wqe send_flags in unreserve
The wqe should be read only and in fact the superfluous reset of the
RVT_SEND_RESERVE_USED flag causes an issue where reserved operations
elicit a bad completion to the ULP.

The maintenance of the flag is now entirely within rvt_post_one_wr()
where a reserved operation will set the flag and a non-reserved operation
will insure the operation that is about to be posted has the flag reset.

Fixes: Commit 856cc4c237 ("IB/hfi1: Add the capability for reserved operations")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Sebastian Sanchez
5f14e4e667 IB/rdmavt, IB/hfi1: Fix timer migration regressions
RC timeout counter isn't getting incremented.
Increment counter and add the trace for it.

Fixes: 87c23b4ab018 ("IB/rdmavt: Adding timer logic to rdmavt")
Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Mike Marciniszyn
2a1b7c8b8e IB/rdmavt: Add additional fields to post send trace
This fix is to get additional debugging information.

The following fields are added:
- wqe
- qpt
- num_sge
- ssn
- pid
- send_flags

These additional fields provide for more focused filtering
and triggering.

The patch also moves the trace to just before the wqe is
posted to get the most accurate information and future proofs
the code to trace all possible reserved opcodes.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Mike Marciniszyn
43a474aadb IB/rdmavt, IB/hfi1, IB/qib: Make wc opcode translation driver dependent
The work to create a completion helper moved the translation of send
wqe operations to completion opcodes to rdmvat.

This precludes having driver dependent operations.  Make the translation
driver dependent by doing the translation in the driver prior to the
rvt_qp_swqe_complete() call using restored translation tables.

Fixes: Commit f2dc9cdce8 ("IB/rdmavt: Add a send completion helper")
Fixes: Commit 0771da5a6e ("IB/hfi1,IB/qib: Use new send completion helper")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05 14:45:09 -04:00
Don Hiatt
832666c163 IB/hfi1, qib, rdmavt: Move AETH defines to rdma/ib_hdrs.h
Rename RVT AETH defines and export in rdma/ib_hdrs.h

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:44 -05:00
Don Hiatt
881fccb864 IB/hfi1: Add rvt_rnr_tbl_to_usec function
Return usec from an index into ib_rvt_rnr_table.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:44 -05:00
Venkata Sandeep Dhanalakota
11a10d4bc7 IB/rdmavt: Adding timer logic to rdmavt
To move common code across target to rdmavt for code reuse.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:39 -05:00
Brian Welty
beb5a04267 IB/hfi1, qib, rdmavt: Move two IB event functions into rdmavt
Add rvt_rc_error() and rvt_comm_est() as shared functions in
rdmavt, moved from hfi1/qib logic.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:38 -05:00
Jim Foraker
22dccc5454 IB/rdmavt: Only put mmap_info ref if it exists
rvt_create_qp() creates qp->ip only when a qp creation request comes from
userspace (udata is not NULL).  If we exceed the number of available
queue pairs however, the error path always attempts to put a kref to this
structure.  If the requestor is inside the kernel, this leads to a crash.

We fix this by checking that qp->ip is not NULL before caling kref_put().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:16:11 -05:00
Mike Marciniszyn
f2dc9cdce8 IB/rdmavt: Add a send completion helper
This is for use by client drivers to drive
send completions into a CQ.

A new exported table allows for the mapping
of ib_wr_opcode into a ib_wc_opcode.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Parav Pandit
61347fa608 IB/rdmavt: Trivial function comment corrected.
Corrected function name in comment from qib_ to rvt_.

Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-03 10:55:27 -04:00
Mike Marciniszyn
68e78b3d78 IB/rdmavt, IB/hfi1: Add lockdep asserts for lock debug
This patch adds lockdep asserts in key code paths for
insuring lock correctness.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02 08:42:10 -04:00
Mike Marciniszyn
222f7a9aac IB/rdmavt: Add qp init function
Add an rvt_qp_init() to initialize specific
common fields as the qp is created or reset.

The routine is shared by the rvt_reset_qp() and
the rvt_create_qp().

The intent is that lock dep assertions will only
appear in the rvt_reset_qp().

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02 08:42:10 -04:00
Mike Marciniszyn
30a345cc01 IB/rdmavt: Move reset calldown to reset path
The reset calldown is misplaced.

It should only be called in the code that actually
transitions the QP to reset.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02 08:42:09 -04:00
Mike Marciniszyn
eefa1d8961 IB/rdmavt: Correct sparse annotation
The __must_hold() is sufficent to correct the sparse
context imbalance inside a function.

Per Documentation/sparse.txt:
__must_hold - The specified lock is held on function entry and exit.

Fixes: Commit c0a67f6ba3 ("IB/rdmavt: Annotate rvt_reset_qp()")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02 08:42:08 -04:00
Mike Marciniszyn
4d6f85c3fa IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines
This improves readability and hides the reference count
mechanism from the client drivers.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16 14:35:27 -04:00
Mike Marciniszyn
56c8ca510d IB/rdmvat: Fix double vfree() in rvt_create_qp() error path
The unwind logic for creating a user QP has a double vfree
of the non-shared receive queue when handling a "too many qps"
failure.

The code unwinds the mmmap info by decrementing a reference
count which will call rvt_release_mmap_info() which in turn
does the vfree() of the r_rq.wq.  The unwind code then does
the same free.

Fix by guarding the vfree() with the same test that is done
in close and only do the vfree() if qp->ip is NULL.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22 15:00:42 -04:00
Ira Weiny
fe508272c9 IB/rdmavt: Eliminate redundant opcode test in mr ref clear
The use of the specific opcode test is redundant since
all ack entry users correctly manipulate the mr pointer
to selectively trigger the reference clearing.

The overly specific test hinders the use of implementation
specific operations.

The change needs to get rid of the union to insure that
an atomic value is not seen as an MR pointer.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 22:46:21 -04:00
Jianxin Xiong
d9b13c2030 IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled
Hanging has been observed while writing a file over NFSoRDMA. Dmesg on
the server contains messages like these:

[  931.992501] svcrdma: Error -22 posting RDMA_READ
[  952.076879] svcrdma: Error -22 posting RDMA_READ
[  982.154127] svcrdma: Error -22 posting RDMA_READ
[ 1012.235884] svcrdma: Error -22 posting RDMA_READ
[ 1042.319194] svcrdma: Error -22 posting RDMA_READ

Here is why:

With the base memory management extension enabled, FRMR is used instead
of FMR. The xprtrdma server issues each RDMA read request as the following
bundle:

(1)IB_WR_REG_MR, signaled;
(2)IB_WR_RDMA_READ, signaled;
(3)IB_WR_LOCAL_INV, signaled & fencing.

These requests are signaled. In order to generate completion, the fast
register work request is processed by the hfi1 send engine after being
posted to the work queue, and the corresponding lkey is not valid until
the request is processed. However, the rdmavt driver validates lkey when
the RDMA read request is posted and thus it fails immediately with error
-EINVAL (-22).

This patch changes the work flow of local operations (fast register and
local invalidate) so that fast register work requests are always
processed immediately to ensure that the corresponding lkey is valid
when subsequent work requests are posted. Local invalidate requests are
processed immediately if fencing is not required and no previous local
invalidate request is pending.

To allow completion generation for signaled local operations that have
been processed before posting to the work queue, an internal send flag
RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag
and only generates completion for such requests.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Mike Marciniszyn
856cc4c237 IB/hfi1: Add the capability for reserved operations
This fix allows for support of in-kernel reserved operations
without impacting the ULP user.

The low level driver can register a non-zero value which
will be transparently added to the send queue size and hidden
from the ULP in every respect.

ULP post sends will never see a full queue due to a reserved
post send and reserved operations will never exceed that
registered value.

The s_avail will continue to track the ULP swqe availability
and the difference between the reserved value and the reserved
in use will track reserved availabity.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Jianxin Xiong
d9f8723924 IB/rdmavt: Handle local operations in post send
Some work requests are local operations, such as IB_WR_REG_MR and
IB_WR_LOCAL_INV. They differ from non-local operations in that:

(1) Local operations can be processed immediately without being posted
to the send queue if neither fencing nor completion generation is needed.
However, to ensure correct ordering, once a local operation is posted to
the work queue due to fencing or completion requiement, all subsequent
local operations must also be posted to the work queue until all the
local operations on the work queue have completed.

(2) Local operations don't send packets over the wire and thus don't
need (and shouldn't update) the packet sequence numbers.

Define a new a flag bit for the post send table to identify local
operations.

Add a new field to the QP structure to track the number of local
operations on the send queue to determine if direct processing of new
local operations should be enabled/disabled.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 16:00:58 -04:00
Mike Marciniszyn
2821c509fc IB/rdmavt: Use new driver specific post send table
Change rvt_post_one_wr to use the new table mechanism for
post send.

Validate that each low level driver specifies the table.

Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:45 -04:00
Mike Marciniszyn
afcf8f7647 IB/rdmavt: Add data structures and routines for table driven post send
Add flexibility for driver dependent operations in post send
because different drivers will have differing post send
operation support.

This includes data structure definitions to support a table
driven scheme along with the necessary validation routine
using the new table.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 15:47:44 -04:00
Mike Marciniszyn
c755f4afa6 IB/rdmavt: Correct qp_priv_alloc() return value test
The current drivers return errors from this calldown
wrapped in an ERR_PTR().

The rdmavt code incorrectly tests for NULL.

The code is fixed to use IS_ERR() and change ret according
to the driver return value.

Cc: Stable <stable@vger.kernel.org> # 4.6+
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:16:15 -04:00
Ashutosh Dixit
8ae84f7c56 IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qp
Since rvt_reset_qp already zero's out qp->s_ack_queue head and tail
pointers, there is no need to zero out qp->s_ack_queue itself.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:16:15 -04:00
Brian Welty
501edc4244 IB/rdmavt: Correct warning during QPN allocation
Correct calculation of the low order bits which should be unset
based on use of qos_shift parameter when assigning QPN.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-17 20:11:26 -04:00
Bart Van Assche
c0a67f6ba3 IB/rdmavt: Annotate rvt_reset_qp()
This patch avoids that sparse reports the following warning:

rdmavt/qp.c:507:17: warning: context imbalance in 'rvt_reset_qp' - unexpected unlock

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06 19:36:21 -04:00
Mike Marciniszyn
8b103e9cde IB/rdamvt: Fix rdmavt s_ack_queue sizing
rdmavt allows the driver to specify the size of the ack queue, but
only uses it for the modify QP limit testing for setting the atomic
limit value.

The driver dependent size is now used to size the s_ack_queue ring
dynamicially.

Since the driver knows its size, the driver will use its define
for any ring size dependent code.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26 12:21:10 -04:00