Commit Graph

826292 Commits

Author SHA1 Message Date
Mike Marciniszyn
4c4b1996b5 IB/hfi1: Fix WQ_MEM_RECLAIM warning
The work_item cancels that occur when a QP is destroyed can elicit the
following trace:

 workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1]
 WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100
 Call Trace:
  __flush_work.isra.29+0x8c/0x1a0
  ? __switch_to_asm+0x40/0x70
  __cancel_work_timer+0x103/0x190
  ? schedule+0x32/0x80
  iowait_cancel_work+0x15/0x30 [hfi1]
  rvt_reset_qp+0x1f8/0x3e0 [rdmavt]
  rvt_destroy_qp+0x65/0x1f0 [rdmavt]
  ? _cond_resched+0x15/0x30
  ib_destroy_qp+0xe9/0x230 [ib_core]
  ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib]
  process_one_work+0x171/0x370
  worker_thread+0x49/0x3f0
  kthread+0xf8/0x130
  ? max_active_store+0x80/0x80
  ? kthread_bind+0x10/0x10
  ret_from_fork+0x35/0x40

Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM.

The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become
entangled with memory reclaim, so this flag is appropriate.

Fixes: 0a226edd20 ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:57:45 -03:00
Leon Romanovsky
10bf13c334 RDMA/mlx5: Remove MAYEXEC flag
MAYEXEC flag was mistakenly added in the commit cited in the fixes line.

Fixes: 4eb6ab13b9 ("RDMA: Remove rdma_user_mmap_page")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:56:55 -03:00
Ariel Levkovich
33cde96fb5 IB/mlx5: Device resource control for privileged DEVX user
For DEVX users who have SYS_RAWIO capability, we set the internal device
resources capability when creating the UCTX.  This will allow the device
to restrict the allocation of internal device resources such as SW ICM
memory to privileged DEVX users only.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:51:51 -03:00
Ariel Levkovich
25c13324d0 IB/mlx5: Add steering SW ICM device memory type
This patch adds support for allocating, deallocating and registering a new
device memory type, STEERING_SW_ICM.  This memory can be allocated and
used by a privileged user for direct rule insertion and management of the
device's steering tables.

The type is provided by the user via the dedicated attribute in the
alloc_dm ioctl command.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:51:51 -03:00
Ariel Levkovich
4056b12efd IB/mlx5: Warn on allocated MEMIC buffers during cleanup
Adding a warning on allocated MEMIC buffers that weren't freed prior to
driver tear down.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:51:51 -03:00
Ariel Levkovich
3b113a1ec3 IB/mlx5: Support device memory type attribute
This patch intoruduces a new mlx5_ib driver attribute to the DM allocation
method - the DM type.

In order to allow addition of new types in downstream patches this patch
also refactors the allocation, deallocation and registration handlers to
consider the requested type and perform the necessary actions according to
it.

Since not all future device memory types will be such that are mapped to
user memory, the mandatory page index output attribute is modified to be
optional.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:51:50 -03:00
Leon Romanovsky
3a4ef2e2b5 RDMA/rdmavt: Catch use-after-free access of AH structures
Prior to commit d345691471 ("RDMA: Handle AH allocations by IB/core"),
AH destroy path is rdmavt returned -EBUSY warning to application and
caused to potential leakage of kernel memory of AH structure.

After that commit, the AH structure is always freed but such early return
in driver code can potentially cause to use-after-free error.

Add warning to catch such situation to help driver developers to fix AH
release path.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:06:54 -03:00
Parav Pandit
943bd984b1 RDMA/core: Allow detaching gid attribute netdevice for RoCE
When there is active traffic through a GID, a QP/AH holds reference to
this GID entry. RoCE GID entry holds reference to its attached
netdevice. Due to this when netdevice is deleted by admin user, its
refcount is not dropped.

Therefore, while deleting RoCE GID, wait for all GID attribute's netdev
users to finish accessing netdev in rcu context.  Once all users done
accessing it, release the netdev refcount.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 11:10:03 -03:00
Parav Pandit
5102eca903 net/smc: Use rdma_read_gid_l2_fields to L2 fields
Use core provided API to fill the source MAC address and use
rdma_read_gid_attr_ndev_rcu() to get stable netdev.

This is preparation patch to allow gid attribute to become NULL when
associated net device is removed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 11:10:03 -03:00
Parav Pandit
dab2175800 RDMA/rxe: Use rdma_read_gid_attr_ndev_rcu to access netdev
Use rdma_read_gid_attr_ndev_rcu() to access netdevice attached to GID
entry under rcu lock.

This ensures that while working on the netdevice of the GID, it doesn't
get freed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 11:10:03 -03:00
Parav Pandit
adb4a57a7a RDMA/cma: Use rdma_read_gid_attr_ndev_rcu to access netdev
To access the netdevice of the GID attribute, use an existing API
rdma_read_gid_attr_ndev_rcu().

This further reduces dependency on open access to netdevice of GID
attribute.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 11:10:03 -03:00
Parav Pandit
a70c07397f RDMA: Introduce and use GID attr helper to read RoCE L2 fields
Instead of RoCE drivers figuring out vlan, smac fields while working on
QP/AH, provide a helper routine to read the L2 fields such as vlan_id and
source mac address.

This moves logic from mlx5 driver to core for wider usage for RoCE ports.

This is a preparation patch to allow detaching netdev in subsequent patch.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 11:10:02 -03:00
Parav Pandit
8f97486024 IB/cm: Reduce dependency on gid attribute ndev check
GID type to path record type conversion can be done directly based on port
type and gid attribute type.  There is no need to find out using indirect
way by its GID attribute's ndev field.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 11:10:02 -03:00
Parav Pandit
3bf3e2b881 RDMA/rxe: Consider skb reserve space based on netdev of GID
Always consider the skb reserve space based on netdevice of the GID
attribute, regardless of vlan or non vlan netdevice.

Fixes: 43c9fc509f ("rdma_rxe: make rxe work over 802.1q VLAN devices")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 11:10:02 -03:00
Kamal Heib
dd05cb828d RDMA: Get rid of iw_cm_verbs
Integrate iw_cm_verbs data members into ib_device_ops and ib_device
structs, this is done to achieve the following:

1) Avoid memory related bugs durring error unwind
2) Make the code more cleaner
3) Reduce code duplication

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 10:56:56 -03:00
Parav Pandit
eb15c78b05 RDMA/core: Do not invoke init_port on compat devices
The driver interface cannot manipulate the sysfs of the compat device,
only of the full device so we must avoid calling the driver sysfs APIs on
compat devices.

This prevents an oops:

 Call Trace:
 dump_stack+0x5a/0x73
 kobject_init+0x74/0x80
 kobject_init_and_add+0x35/0xb0
 hfi1_create_port_files+0x6e/0x3c0 [hfi1]
 ib_setup_port_attrs+0x43b/0x560 [ib_core]
 add_one_compat_dev+0x16a/0x230 [ib_core]
 rdma_dev_init_net+0x110/0x160 [ib_core]
 ops_init+0x38/0xf0
 setup_net+0xcf/0x1e0
 copy_net_ns+0xb7/0x130
 create_new_namespaces+0x11a/0x1b0
 unshare_nsproxy_namespaces+0x55/0xa0
 ksys_unshare+0x1a7/0x340
 __x64_sys_unshare+0xe/0x20
 do_syscall_64+0x5b/0x180
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 5417783eab ("RDMA/core: Support core port attributes in non init_net")
Reported-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 10:41:23 -03:00
Artemy Kovalyov
1a418f7764 IB/core: Set qp->real_qp before it may be accessed
real_qp should be initialized before ib_destroy_qp() is called.
ib_destroy_qp() may be called in the error flow if ib_create_qp_security()
failed.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 10:17:45 -03:00
Jack Morgenstein
8f4426aa19 IB/mlx5: Add missing XRC options to QP optional params mask
The QP transition optional parameters for the various transition for XRC
QPs are identical to those for RC QPs.

Many of the XRC QP transition optional parameter bits are missing from the
QP optional mask table.  These omissions caused failures when doing XRC QP
state transitions.

For example, when trying to change the response timer of an XRC receive QP
via the RTS2RTS transition, the new timer value was ignored because
MLX5_QP_OPTPAR_RNR_TIMEOUT bit was missing from the optional params mask
for XRC qps for the RTS2RTS transition.

Fix this by adding the missing XRC optional parameters for all QP
transitions to the opt_mask table.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Fixes: a4774e9095 ("IB/mlx5: Fix opt param mask according to firmware spec")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 10:15:13 -03:00
Shamir Rabinovitch
4f33dd41b2 RDMA/uverbs: Initialize uverbs_attr_bundle ucontext in ib_uverbs_get_context
ib_uverbs_get_context does not have a uobject so it does not call the
rdma_lookup_get_uobject which is used to set up the uverbs_attr_bundle
ucontext. For ib_uverbs_get_context we need to set up this manually before
we send the uverbs_attr_bundle down to the driver layer.

This completes the change that was done in commit 70f06b26f0 ("IB:
ucontext should be set properly for all cmd & ioctl paths")

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 10:09:25 -03:00
Gal Pressman
f89adedaf3 RDMA/uverbs: Initialize udata struct on destroy flows
Cited commit introduced the udata parameter to different destroy flows
but the uapi method definition does not have udata (i.e has_udata flag
is not set). As a result, an uninitialized udata struct is being passed
down to the driver callbacks.

Fix that by clearing the driver udata even in cases where has_udata flag
is not set.

Fixes: c4367a2635 ("IB: Pass uverbs_attr_bundle down ib_x destroy path")
Cc: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Co-developed-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-02 17:07:02 -03:00
Shiraz Saleem
7872168a83 RDMA/umem: Handle page combining avoidance correctly in ib_umem_add_sg_table()
The flag update_cur_sg tracks whether contiguous pages from a new set of
page_list pages can be merged into the SGE passed into
ib_umem_add_sg_table(). If this flag is true, but the total segment length
exceeds the max_seg_size supported by HW, we avoid combining to this SGE
and move to a new SGE (x) and merge 'len' pages to it. However, if i <
npages, the next iteration can incorrectly merge 'len' contiguous pages
into x instead of into a new SGE since update_cur_sg is still true.

Reset update_cur_sg to false always after the check to merge pages into
the first SGE passed in to ib_umem_add_sg_table().  Also, prevent a new
SGE's segment length from ever exceeding HW max_seg_sz.

There is a crash on hfi1 as result of this where-in max_seg_sz is
defaulting to 64K. Due to above bug, unfolding SGE's in __ib_umem_release
points to a bad page ptr.

 TEST comp-wfr.perfnative.STL-22166-WDT _ perftest native 2-Write_4097QP_4MB STARTING at 1555387093
 BUG: Bad page state in process ib_write_bw  pfn:7ebca0
 page:ffffcd675faf2800 count:0 mapcount:1 mapping:0000000000000000 index:0x1
 flags: 0x17ffffc0000000()
 raw: 0017ffffc0000000 dead000000000100 dead000000000200 0000000000000000
 raw: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
 page dumped because: nonzero mapcount
 CPU: 18 PID: 15853 Comm: ib_write_bw Tainted: G    B             5.1.0-rc4 #1
 Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015
 Call Trace:
  dump_stack+0x5a/0x73
  bad_page+0xf5/0x10f
  free_pcppages_bulk+0x62c/0x680
  free_unref_page+0x54/0x70
  __ib_umem_release+0x148/0x1a0 [ib_uverbs]
  ib_umem_release+0x22/0x80 [ib_uverbs]
  rvt_dereg_mr+0x67/0xb0 [rdmavt]
  ib_dereg_mr_user+0x37/0x60 [ib_core]
  destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs]
  uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs]
  uobj_destroy+0x4d/0x60 [ib_uverbs]
  __uobj_get_destroy+0x33/0x50 [ib_uverbs]
  __uobj_perform_destroy+0xa/0x30 [ib_uverbs]
  ib_uverbs_dereg_mr+0x66/0x90 [ib_uverbs]
  ib_uverbs_write+0x3e1/0x500 [ib_uverbs]
  vfs_write+0xad/0x1b0
  ksys_write+0x5a/0xd0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d10bcf947a ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-02 16:50:12 -03:00
Gal Pressman
923abb9d79 RDMA/core: Introduce RDMA subsystem ibdev_* print functions
Similarly to dev/netdev/etc printk helpers, add standard printk helpers
for the RDMA subsystem.

Example output:
efa 0000:00:06.0 efa_0: Hello World!
efa_0: Hello World! (no parent device set)
(NULL ib_device): Hello World! (ibdev is NULL)

Cc: Jason Baron <jbaron@akamai.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-05-01 12:29:28 -04:00
Matthew Wilcox
b9b0f34531 uverbs: Convert idr to XArray
The word 'idr' is scattered throughout the API, so I haven't changed it,
but the 'idr' variable is now an XArray.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25 12:27:11 -03:00
Matthew Wilcox
a7b36d5fa8 ib/bnxt: Remove mention of idr_alloc from comment
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25 12:02:18 -03:00
Jason Gunthorpe
1d045aa76f Merge branch 'mlx5_tir_icm' into rdma.git for-next
Ariel Levkovich says:

====================
The series exposes the ICM address of the receive transport
interface (TIR) of Raw Packet and RSS QPs to the user since they are
required to properly create and insert steering rules that direct flows to
these QPs.
====================

For dependencies this branch is based on mlx5-next from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

* branch 'mlx5_tir_icm':
  IB/mlx5: Expose TIR ICM address to user space
  net/mlx5: Introduce new TIR creation core API
  net/mlx5: Expose TIR ICM address in command outbox

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25 10:33:00 -03:00
Ariel Levkovich
1f1d6abbf0 IB/mlx5: Expose TIR ICM address to user space
This patch exposes the TIR ICM address of raw packet and RSS
QPs to user space.

In order to pass the new field, the patch extends the mlx5
specific QP creation response structure and fills it with
the icm address returned by the FW command, if available.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25 10:03:19 -03:00
Ariel Levkovich
96780e4f46 net/mlx5: Introduce new TIR creation core API
Introducing new TIR creation core API which allows caller
to receive back from the call the full command outbox.

This comes as a preparation for the next patch that will
retrieve the TIR ICM address from the command outbox.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-24 12:33:37 -07:00
Ariel Levkovich
3e07047021 net/mlx5: Expose TIR ICM address in command outbox
Adding the TIR ICM address to the create_tir command outbox
through which the device reports the ICM address of the newly
created TIR.

The TIR address can be used for direct attachment to a steering
rule in SW managed steering mode.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-24 12:33:37 -07:00
Ariel Levkovich
9fba2b9b4f net/mlx5: Expose SW ICM related device memory capabilities
Add SW ICM related fields to the device memory capabilities
structure and sw ownership capability in flow table properties.

The currently supported SW ICM types are steering and header modify
and the changes exposes the device memory capabilities for each
of these two types.

SW ICM memory can be allocated by SW and then be accessed by RDMA
operations for direct management of the HW packet handling tables.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-24 12:33:36 -07:00
Jason Gunthorpe
449a224c10 Merge branch 'rdma_mmap' into rdma.git for-next
Jason Gunthorpe says:

====================
Upon review it turns out there are some long standing problems in BAR
mapping area:
 * BAR pages intended for read-only can be switched to writable via mprotect.
 * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page.
 * Disassociate causes SIGBUS when touching the pages.
 * CPU pages are being mapped through to the process via remap_pfn_range
   instead of the more appropriate vm_insert_page, causing weird behaviors
   during disassociation.

This series adds the missing VM_* flag manipulation, adds faulting a zero
page for disassociation and revises the CPU page mappings to use
vm_insert_page.
====================

For dependencies this branch is based on for-rc from
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

* branch 'rdma_mmap':
  RDMA: Remove rdma_user_mmap_page
  RDMA/mlx5: Use get_zeroed_page() for clock_info
  RDMA/ucontext: Fix regression with disassociate
  RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
  RDMA/mlx5: Do not allow the user to write to the clock page

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:20:34 -03:00
Jason Gunthorpe
4eb6ab13b9 RDMA: Remove rdma_user_mmap_page
Upon further research drivers that want this should simply call the core
function vm_insert_page(). The VMA holds a reference on the page and it
will be automatically freed when the last reference drops. No need for
disassociate to sequence the cleanup.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:18:36 -03:00
Jason Gunthorpe
ddcdc368b1 RDMA/mlx5: Use get_zeroed_page() for clock_info
get_zeroed_page() returns a virtual address for the page which is better
than allocating a struct page and doing a permanent kmap on it.

Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 13:40:50 -03:00
Jason Gunthorpe
67f269b37f RDMA/ucontext: Fix regression with disassociate
When this code was consolidated the intention was that the VMA would
become backed by anonymous zero pages after the zap_vma_pte - however this
very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED
bits to transform the VMA into an anonymous VMA. Since the vm_ops was
removed this broke.

Now userspace gets a SIGBUS if it touches the vma after disassociation.

Instead of converting the VMA to anonymous provide a fault handler that
puts a zero'd page into the VMA when user-space touches it after
disassociation.

Cc: stable@vger.kernel.org
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 13:32:25 -03:00
Jason Gunthorpe
d5e560d3f7 RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
Since mlx5 supports device disassociate it must use this API for all
BAR page mmaps, otherwise the pages can remain mapped after the device
is unplugged causing a system crash.

Cc: stable@vger.kernel.org
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-04-24 13:06:40 -03:00
Jason Gunthorpe
c660133c33 RDMA/mlx5: Do not allow the user to write to the clock page
The intent of this VMA was to be read-only from user space, but the
VM_MAYWRITE masking was missed, so mprotect could make it writable.

Cc: stable@vger.kernel.org
Fixes: 5c99eaecb1 ("IB/mlx5: Mmap the HCA's clock info to user-space")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-04-24 13:06:24 -03:00
John Fleck
3c176c9d72 IB/hfi1: Remove reference to RHF.VCRCErr
The bit VCRCErr in the receive header flag is actually a
reserved field. Remove bit operations on this field.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:48:11 -03:00
Mike Marciniszyn
a9c62e0078 IB/hfi1: Add selected Rcv counters
These counters are required for error analysis and debug.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:48:10 -03:00
Mike Marciniszyn
d40f69c9b9 IB/{rdmavt, qib, hfi1}: Use new routine to release reference counts
The reference count adjustments on reference count completion
are open coded throughout.

Add a routine to do all reference count adjustments and use.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Mike Marciniszyn
52cdbcc2b1 IB/rdmavt: Use more efficient allowed_ops
QP creation already records the allowed_ops.

Take advantage of that single field to replace multiple qp_type
specific tests.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Mike Marciniszyn
715ab1a862 IB/rdmavt: Fix ab/ba include issues
The currently include file ordering for rdmavt headers has an
ab/ba include issue the precludes using inlines from rdma_vt.h
in rdmavt_qp.h.

At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h.

Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h
and move qp related inlines to rdmavt_qp.h.

Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared
by rdmavt_cq.h and rdmavt_qp.h.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Mike Marciniszyn
62644c1d2b IB/hfi1: Make opfn.h self sufficient
The opfn.h include file build-ablility depends on the including file
having the correct includes.

Fix by making opfn.h self sufficient.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Kaike Wan
ea752bc5e5 IB/{rdmavt, hfi1): Miscellaneous comment fixes
This patch fixes miscellaneous comment errors.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:48 -03:00
Josh Collier
07c5ba9124 IB/hfi1: Add debugfs to control expansion ROM write protect
Some kernels now enable CONFIG_IO_STRICT_DEVMEM which prevents multiple
handles to PCI resource0. In order to continue to support expansion ROM
updates while the driver is loaded, the driver must now provide an
interface to control the expansion ROM write protection.

This patch adds an exprom_wp debugfs interface that allows the hfi1_eprom
user tool to disable the expansion ROM write protection by opening the
file and writing a '1'.  The write protection is released when writing a
'0' or automatically re-enabled when the file handle is closed.  The
current implementation will only allow one handle to be opened at a time
across all hfi1 devices.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:26:41 -03:00
Leon Romanovsky
5742582222 RDMA/hns: Remove asynchronic QP destroy
Verbs destroy callbacks are synchronous operations and can't be delayed.
The expectation is that after driver returned from destroy function, the
memory can be freed and user won't be able to access it again.

Ditch workqueue implementation used in HNS driver.

Fixes: d838c481e0 ("IB/hns: Fix the bug when destroy qp")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: oulijun <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 10:55:31 -03:00
Parav Pandit
5d7ed2f27b RDMA/cma: Consider scope_id while binding to ipv6 ll address
When two netdev have same link local addresses (such as vlan and non
vlan), two rdma cm listen id should be able to bind to following different
addresses.

listener-1: addr=lla, scope_id=A, port=X
listener-2: addr=lla, scope_id=B, port=X

However while comparing the addresses only addr and port are considered,
due to which 2nd listener fails to listen.

In below example of two listeners, 2nd listener is failing with address in
use error.

$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1 -p 4545&

$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1.200 -p 4545
rdma_bind_addr: Address already in use

To overcome this, consider the scope_ids as well which forms the accurate
IPv6 link local address.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 10:39:16 -03:00
Parav Pandit
823b23da71 IB/core: Allow vlan link local address based RoCE GIDs
IPv6 link local address for a VLAN netdevice has nothing to do with its
resemblance with the default GID, because VLAN link local GID is in
different layer 2 domain.

Now that RoCE MAD packet processing and route resolution consider the
right GID index, there is no need for an unnecessary check which prevents
the addition of vlan based IPv6 link local GIDs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 10:08:15 -03:00
Saeed Mahameed
c3bdd5e651 Linux 5.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyOup0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHKoIAIKVuBSyD+m65TaM
 pjoAFa56weEc67Mmai2A84EOm0MVy9C6L7EOcOgVsJiLxDCYyWQ7xYwV2kceKJpW
 H5xauhb3+TxpxYeaeKdPPPHmBdejRwOPYvGAfnDMCqCCWQTad52sQUPCLI+yhF1t
 wgnuMi+SwNBWP9aYCXdFPK4fVhh27AcEAOEsRVCh4tIBH/wkf4GwrDr3IX1MFeMX
 jE/R43la4hu1swcWBsjkErWUasVPCgJSSQTfKDo9PQTVnoh0PHFp4fkOInVKLymQ
 7AGo+Knc+1he+sFsB2IbZwea0xqtJtjtr1oC+at8gNx66qVG+o7UZNi5LR1uPW4Z
 4+dwGBk=
 =pyXR
 -----END PGP SIGNATURE-----

Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next

Linux 5.1-rc1

We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-22 15:25:39 -07:00
Mark Bloch
5fb58c9e2f RDMA/mlx5: Don't create IB representors when in multiport RoCE mode
Switchdev mode and mutiport RoCE mode aren't compatible at this point.
Don't create IB reps when a user switches to switchdev mode and the driver
operates in that mode.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch
d3b5cc1cd9 RDMA/mlx5: Initialize roce port info before multiport master init
When working in mutliport RoCE mode it is possible to attach a slave
before the master. In that case the slave is waiting for a master to be
attached.  When the master is attached it goes over the list of waiting
slaves, finds a slave that is compatible and tries to bind it to itself.

The call stack is:
mlx5_ib_init_multiport_master() -> mlx5_ib_bind_slave_port()

In the bind function we will create a netdev notifier, but this is done
before we initialize the RoCE structure (this is done at a later stage by
the master in the ROCE stage).

Once events are delivered to that notifier we will use
mlx5_ib_get_native_port_mdev() to get the actual port and as the native
port is zero we will access an invalid index in the port structure.

Move the RoCE structure initialization to an earlier stage.

Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch
7f575103b0 RDMA/mlx5: Allow DEVX and raw creation flow on reps
Remove the limitations that were in place and provide support for DEVX and
raw flow creation on reps.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00