Commit Graph

2239 Commits

Author SHA1 Message Date
Gal Pressman
923abb9d79 RDMA/core: Introduce RDMA subsystem ibdev_* print functions
Similarly to dev/netdev/etc printk helpers, add standard printk helpers
for the RDMA subsystem.

Example output:
efa 0000:00:06.0 efa_0: Hello World!
efa_0: Hello World! (no parent device set)
(NULL ib_device): Hello World! (ibdev is NULL)

Cc: Jason Baron <jbaron@akamai.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-05-01 12:29:28 -04:00
Matthew Wilcox
b9b0f34531 uverbs: Convert idr to XArray
The word 'idr' is scattered throughout the API, so I haven't changed it,
but the 'idr' variable is now an XArray.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25 12:27:11 -03:00
Jason Gunthorpe
449a224c10 Merge branch 'rdma_mmap' into rdma.git for-next
Jason Gunthorpe says:

====================
Upon review it turns out there are some long standing problems in BAR
mapping area:
 * BAR pages intended for read-only can be switched to writable via mprotect.
 * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page.
 * Disassociate causes SIGBUS when touching the pages.
 * CPU pages are being mapped through to the process via remap_pfn_range
   instead of the more appropriate vm_insert_page, causing weird behaviors
   during disassociation.

This series adds the missing VM_* flag manipulation, adds faulting a zero
page for disassociation and revises the CPU page mappings to use
vm_insert_page.
====================

For dependencies this branch is based on for-rc from
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

* branch 'rdma_mmap':
  RDMA: Remove rdma_user_mmap_page
  RDMA/mlx5: Use get_zeroed_page() for clock_info
  RDMA/ucontext: Fix regression with disassociate
  RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
  RDMA/mlx5: Do not allow the user to write to the clock page

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:20:34 -03:00
Jason Gunthorpe
4eb6ab13b9 RDMA: Remove rdma_user_mmap_page
Upon further research drivers that want this should simply call the core
function vm_insert_page(). The VMA holds a reference on the page and it
will be automatically freed when the last reference drops. No need for
disassociate to sequence the cleanup.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:18:36 -03:00
Jason Gunthorpe
67f269b37f RDMA/ucontext: Fix regression with disassociate
When this code was consolidated the intention was that the VMA would
become backed by anonymous zero pages after the zap_vma_pte - however this
very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED
bits to transform the VMA into an anonymous VMA. Since the vm_ops was
removed this broke.

Now userspace gets a SIGBUS if it touches the vma after disassociation.

Instead of converting the VMA to anonymous provide a fault handler that
puts a zero'd page into the VMA when user-space touches it after
disassociation.

Cc: stable@vger.kernel.org
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 13:32:25 -03:00
Parav Pandit
5d7ed2f27b RDMA/cma: Consider scope_id while binding to ipv6 ll address
When two netdev have same link local addresses (such as vlan and non
vlan), two rdma cm listen id should be able to bind to following different
addresses.

listener-1: addr=lla, scope_id=A, port=X
listener-2: addr=lla, scope_id=B, port=X

However while comparing the addresses only addr and port are considered,
due to which 2nd listener fails to listen.

In below example of two listeners, 2nd listener is failing with address in
use error.

$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1 -p 4545&

$ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1.200 -p 4545
rdma_bind_addr: Address already in use

To overcome this, consider the scope_ids as well which forms the accurate
IPv6 link local address.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 10:39:16 -03:00
Parav Pandit
823b23da71 IB/core: Allow vlan link local address based RoCE GIDs
IPv6 link local address for a VLAN netdevice has nothing to do with its
resemblance with the default GID, because VLAN link local GID is in
different layer 2 domain.

Now that RoCE MAD packet processing and route resolution consider the
right GID index, there is no need for an unnecessary check which prevents
the addition of vlan based IPv6 link local GIDs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 10:08:15 -03:00
Parav Pandit
2e5b8a0116 RDMA/core: Add a netlink command to change net namespace of rdma device
Provide an option to change the net namespace of a rdma device through a
netlink command. When multiple rdma devices exists in a system, and when
containers are used, this will limit rdma device visibility to a specified
net namespace.

An example command to change net namespace of mlx5_1 device to the
previously created net namespace 'foo' is:

$ ip netns add foo
$ rdma dev set mlx5_1 netns foo

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 14:44:58 -03:00
Parav Pandit
decbc7a6b0 RDMA/core: Introduce a helper function to change net namespace of rdma device
Introduce a helper function that changes rdma device's net namespace which
performs mini disable/enable sequence to have device visible only in
assigned net namespace.

Device unregistration, device rename and device change net namespace
may be invoked concurrently.

(a) device unregistration needs to wait if a device change (rename or net
    namespace change) operation is in progress.
(b) device net namespace change should not proceed if the unregistration
    has started.
(c) while one cpu is changing device net namespace, other cpu should not
    be able to rename or change net namespace.

To address above concurrency,
(a) Use unreg_mutex to synchronize between ib_unregister_device() and net
    namespace change operation
(b) In cases where unregister_device() has started unregistration before
    change_netns got chance to acquire unreg_mutex, validate the refcount
    - if it dropped to zero, abort the net namespace change operation.

Finally use the helper function to change net namespace of ib device to
move the device back to init_net when such net is deleted.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 14:44:58 -03:00
Parav Pandit
3042492bd1 RDMA/core: Avoid freeing netdevs in disable_device()
So we can use the disable_device() helper while changing the net namespace
of the rdma device in a subsequent patch, move free_netdevs() out of it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 14:44:58 -03:00
Shiraz Saleem
d0b5c01bb4 RDMA/umem: Use correct value for SG entries in sg_copy_to_buffer()
With page combining, the assumption that number of SG entries in umem SGL
equal to number of system pages in umem no longer holds.

umem->sg_nents tracks the SG entries in umem SGL. Use it in
sg_pcopy_to_buffer() as opposed to ib_umem_num_pages(umem).

Fixes: d10bcf947a ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Reported-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Leon Romanovsky
68e326dea1 RDMA: Handle SRQ allocations by IB/core
Convert SRQ allocation from drivers to be in the IB/core

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Leon Romanovsky
d345691471 RDMA: Handle AH allocations by IB/core
Simplify drivers by ensuring lifetime of ib_ah object. The changes
in .create_ah() go hand in hand with relevant update in .destroy_ah().

We will use this opportunity and convert .destroy_ah() to don't fail, as
it was suggested a long time ago, because there is nothing to do in case
of failure during destroy.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Jason Gunthorpe
feec576a6a IB: When attrs.udata/ufile is available use that instead of uobject
The ucontext and ufile should not be accessed via the uobject, all these
cases have an attrs so use that instead.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Leon Romanovsky
9e886b39a7 RDMA/nldev: Return device protocol
Add new RDMA_NLDEV_ATTR_DEV_PROTOCOL attribute to give ability for UDEV
rules create IB device stable names based on link type protocol.  The
assumption that devices like mlx4 with duality in their link type under
one IB device struct won't be allowed in the future.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Leon Romanovsky
c87e65cfb9 RDMA/cm: Move debug counters to be under relevant IB device
The sysfs layout is created by CM incorrectly presented RDMA devices with
InfiniBand link layer. Layout of such devices represents device tree of
connections. By moving CM statistics to be under relevant port of IB
device, we will fix the following issues:

 * Symlink name - It used device name instead of specific identifier.
 * Target location - It was supposed to point to PCI-ID/infiniband_cm/
   instead of PCI-ID/infiniband/
 * Target name - It created extra device file under already existing
   device folder, e.g. mlx5_0/mlx5_0
 * Crash during boot with RDMA persistent naming patches.

 sysfs: cannot create duplicate filename '/class/infiniband_cm/mlx5_0'
 CPU: 29 PID: 433 Comm: modprobe Not tainted 5.0.0-rc5+ #178
 Call Trace:
  dump_stack+0xcc/0x180
  sysfs_warn_dup.cold.3+0x17/0x2d
  sysfs_do_create_link_sd.isra.2+0xd0/0xf0
  device_add+0x7cb/0x1450
  device_create_groups_vargs+0x1ae/0x220
  device_create+0x93/0xc0
  cm_add_one+0x38f/0xf60 [ib_cm]
  add_client_context+0x167/0x210 [ib_core]
  enable_device_and_get+0x230/0x3f0 [ib_core]
  ib_register_device+0x823/0xbf0 [ib_core]
  __mlx5_ib_add+0x45/0x150 [mlx5_ib]
  mlx5_ib_add+0x1b3/0x5e0 [mlx5_ib]
  mlx5_add_device+0x130/0x3a0 [mlx5_core]
  mlx5_register_interface+0x1a9/0x270 [mlx5_core]
  do_one_initcall+0x14f/0x5de
  do_init_module+0x247/0x7c0
  load_module+0x4c2f/0x60d0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

After this change:
[leonro@server ~]$ ls -al /sys/class/infiniband/ibp0s12f0/ports/1/
drwxr-xr-x  2 root root    0 Mar 11 11:17 cm_rx_duplicates
drwxr-xr-x  2 root root    0 Mar 11 11:17 cm_rx_msgs
drwxr-xr-x  2 root root    0 Mar 11 11:17 cm_tx_msgs
drwxr-xr-x  2 root root    0 Mar 11 11:17 cm_tx_retries

Fixes: 110cf374a8 ("infiniband: make cm_device use a struct device and not a kobject.")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:24 -03:00
Shiraz Saleem
d10bcf947a RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs
Combine contiguous regions of PAGE_SIZE pages into single scatter list
entry while building the scatter table for a umem. This minimizes the
number of the entries in the scatter list and reduces the DMA mapping
overhead, particularly with the IOMMU.

Set default max_seg_size in core for IB devices to 2G and do not combine
if we exceed this limit.

Also, purge npages in struct ib_umem as we now DMA map the umem SGL with
sg_nents and npage computation is not needed. Drivers should now be using
ib_umem_num_pages(), so fix the last stragglers.

Move npages tracking to ib_umem_odp as ODP drivers still need it.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Gal Pressman <galpress@amazon.com>
Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:24 -03:00
Leon Romanovsky
c7252a6532 RDMA/cm: Remove useless zeroing of static global variable
Static global variables are initialized to zero by C standard,
there is no need to zero them again.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-04 08:29:04 -03:00
Leon Romanovsky
061ccb52d2 RDMA/cma: Set proper port number as index
Conversion from IDR to XArray missed the fact that idr_alloc() returned
index as a return value, this index was saved in port variable and used as
query index later on. This caused to the following error.

 BUG: KASAN: use-after-free in cma_check_port+0x86a/0xa20 [rdma_cm]
 Read of size 8 at addr ffff888069fde998 by task ucmatose/387
 CPU: 3 PID: 387 Comm: ucmatose Not tainted 5.1.0-rc2+ #253
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
 Call Trace:
  dump_stack+0x7c/0xc0
  print_address_description+0x6c/0x23c
  ? cma_check_port+0x86a/0xa20 [rdma_cm]
  kasan_report.cold.3+0x1c/0x35
  ? cma_check_port+0x86a/0xa20 [rdma_cm]
  ? cma_check_port+0x86a/0xa20 [rdma_cm]
  cma_check_port+0x86a/0xa20 [rdma_cm]
  rdma_bind_addr+0x11bc/0x1b00 [rdma_cm]
  ? find_held_lock+0x33/0x1c0
  ? cma_ndev_work_handler+0x180/0x180 [rdma_cm]
  ? wait_for_completion+0x3d0/0x3d0
  ucma_bind+0x120/0x160 [rdma_ucm]
  ? ucma_resolve_addr+0x1a0/0x1a0 [rdma_ucm]
  ucma_write+0x1f8/0x2b0 [rdma_ucm]
  ? ucma_open+0x260/0x260 [rdma_ucm]
  vfs_write+0x157/0x460
  ksys_write+0xb8/0x170
  ? __ia32_sys_read+0xb0/0xb0
  ? trace_hardirqs_off_caller+0x5b/0x160
  ? do_syscall_64+0x18/0x3c0
  do_syscall_64+0x95/0x3c0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

  Allocated by task 381:
   __kasan_kmalloc.constprop.5+0xc1/0xd0
   cma_alloc_port+0x4d/0x160 [rdma_cm]
   rdma_bind_addr+0x14e7/0x1b00 [rdma_cm]
   ucma_bind+0x120/0x160 [rdma_ucm]
   ucma_write+0x1f8/0x2b0 [rdma_ucm]
   vfs_write+0x157/0x460
   ksys_write+0xb8/0x170
   do_syscall_64+0x95/0x3c0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

  Freed by task 381:
   __kasan_slab_free+0x12e/0x180
   kfree+0xed/0x290
   rdma_destroy_id+0x6b6/0x9e0 [rdma_cm]
   ucma_close+0x110/0x300 [rdma_ucm]
   __fput+0x25a/0x740
   task_work_run+0x10e/0x190
   do_exit+0x85e/0x29e0
   do_group_exit+0xf0/0x2e0
   get_signal+0x2e0/0x17e0
   do_signal+0x94/0x1570
   exit_to_usermode_loop+0xfa/0x130
   do_syscall_64+0x327/0x3c0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: <syzbot+2e3e485d5697ea610460@syzkaller.appspotmail.com>
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Fixes: 638267537a ("cma: Convert portspace IDRs to XArray")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:20:32 -03:00
Shamir Rabinovitch
ff23dfa134 IB: Pass only ib_udata in function prototypes
Now when ib_udata is passed to all the driver's object create/destroy APIs
the ib_udata will carry the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the functions prototypes.

Make ib_udata the only argument psssed.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 15:00:47 -03:00
Shamir Rabinovitch
bdeacabd1a IB: Remove 'uobject->context' dependency in object destroy APIs
Now that we have the udata passed to all the ib_xxx object destroy APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:59:35 -03:00
Shamir Rabinovitch
c4367a2635 IB: Pass uverbs_attr_bundle down ib_x destroy path
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers destroy path from the dependency in 'uobject->context' as we
already did for the create path.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:57:35 -03:00
Shamir Rabinovitch
a6a3797df2 IB: Pass uverbs_attr_bundle down uobject destroy path
Pass uverbs_attr_bundle down the uobject destroy path. The next patch will
use this to eliminate the dependecy of the drivers in ib_x->uobject
pointers.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:55:36 -03:00
Shamir Rabinovitch
70f06b26f0 IB: ucontext should be set properly for all cmd & ioctl paths
the Attempt to use the below commit to initialize the ucontext for the
uobject destroy path has shown that the below commit is incomplete.

Parts were reverted and the ucontext set up in the uverbs_attr_bundle was
moved to rdma_lookup_get_uobject which is called from the uobj_get_XXX
macros and rdma_alloc_begin_uobject which is called when uobject is
created.

Fixes: 3d9dfd0603 ("IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows")
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:55:03 -03:00
Parav Pandit
2b34c55802 RDMA/core: Add command to set ib_core device net namspace sharing mode
Add netlink command that enables/disables sharing rdma device among
multiple net namespaces.

Using rdma tool,
$rdma sys set netns shared (default mode)

When rdma subsystem netns mode is set to shared mode, rdma devices
will be accessible in all net namespaces.

Using rdma tool,
$rdma sys set netns exclusive

When rdma subsystem netns mode is set to exclusive mode, devices
will be accessible in only one net namespace at any given
point of time.

If there are any net namespaces other than default init_net exists,
while executing this command, it will fail and mode cannot be changed.

To change this mode, netlink command is used instead of sysctl, because
netlink command allows to auto load a module.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Parav Pandit
cb7e0e1305 RDMA/core: Add interface to read device namespace sharing mode
Add an interface via netlink command to query whether rdma devices are
shared among multiple net namespaces or not. When using RDMAtool, it can
be queried as,

$rdma system show netns
netns shared

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Parav Pandit
37eeab55ae RDMA/core: Extend ib_device_get_by_index for net namespace
Extend ib_device_get_by_index() API to check device access for
net namespace for serving netlink commands.

Also enforce net ns check on dumpit commands which iterate over all
registered rdma devices and which don't call ib_device_get_by_index().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Parav Pandit
41c6140189 RDMA: Check net namespace access for uverbs, umad, cma and nldev
Introduce an API rdma_dev_access_netns() to check whether a rdma device
can be accessed from the specified net namespace or not.
Use rdma_dev_access_netns() while opening character uverbs, umad network
device and also check while rdma cm_id binds to rdma device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Parav Pandit
a56bc45b27 RDMA/core: Add module param to disable device sharing among net ns
Add module parameter to change a sharing mode of ib_core early in the
boot process. This parameter helps to those systems where modern up
to date rdma tool (iproute2) package may not be available during
kernel upgrade cycle.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Parav Pandit
5417783eab RDMA/core: Support core port attributes in non init_net
Now that sysfs compatibility layer for non init_net exists, add core port
attributes such as pkey and gid table to non init_net ns.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Parav Pandit
4e0f7b9070 RDMA/core: Implement compat device/sysfs tree in net namespace
Implement compatibility layer sysfs entries of ib_core so that non
init_net net namespaces can also discover rdma devices.

Each non init_net net namespace has ib_core_device created in it.
Such ib_core_device sysfs tree resembles rdma devices found in
init_net namespace.

This allows discovering rdma devices in multiple non init_net net
namespaces via sysfs entries and helpful to rdma-core userspace.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Parav Pandit
62dfa7955e RDMA/core: Restrict sysfs entries view to init_net
This is a preparation patch to provide isolation of rdma device in a
network namespace.

As first step, make rdma device visible only in init net namespace.
Subsequent patch will enable rdma device visibility back in multiple net
namespaces using compat ib_core_device device/sysfs tree.

Given that the IB subsystem depends on net stack, it needs to be
initialized after netdev and since it support devices, it needs to be
initialized before the device subsystem; therefore, change initcall
sequence to fs_initcall, so that when ib_core is compiled in the kernel
image, the right init sequence is followed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Parav Pandit
cebe556bd7 RDMA/core: Introduce ib_core_device to hold device
In order to support sysfs entries in multiple net namespaces for a rdma
device, introduce a ib_core_device whose scope is limited to hold core
device and per port sysfs related entries.

This is preparation patch so that multiple ib_core_devices in each net
namespace can be created in subsequent patch who all can share ib_device.

(a) Move sysfs specific fields to ib_core_device.
(b) Make sysfs and device life cycle related routines to work on
    ib_core_device.
(c) Introduce and use rdma_init_coredev() helper to initialize
    coredev fields.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 14:52:02 -03:00
Bart Van Assche
0080aed4e4 RDMA/uverbs: Allow the compiler to verify declaration and definition consistency
This patch avoids that sparse reports the following warnings:

drivers/infiniband/core/uverbs_std_types_flow_action.c:442:30: warning: symbol 'uverbs_def_obj_flow_action' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_dm.c:112:30: warning: symbol 'uverbs_def_obj_dm' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_counters.c:153:30: warning: symbol 'uverbs_def_obj_counters' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_mr.c:213:30: warning: symbol 'uverbs_def_obj_mr' was not declared. Should it be static?

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: 0bd01f3d09 ("RDMA/uverbs: Require all objects to have a driver destroy function")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 10:22:48 -03:00
Bart Van Assche
2dcdebff5e RDMA/uverbs: Annotate uverbs_request_next_ptr() return value as a __user pointer
This patch avoids that sparse complains about a mismatch between the
returned value and the function return type.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: c3bea3d2dc ("RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 10:22:48 -03:00
Bart Van Assche
259e66bcdf RDMA/uverbs: Add a __user annotation to a pointer
This patch avoids that sparse and smatch report the following:

  warning: cast removes address space of expression

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: 3a6532c9af ("RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 10:22:48 -03:00
Ira Weiny
2ccfbb70c2 IB/MAD: Add SMP details to MAD tracing
Decode more information from the packet and include it in the trace.

Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:52:01 -03:00
Ira Weiny
056533192a IB/UMAD: Add umad trace points
Trace MADs going to/from user space.

Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:52:01 -03:00
Ira Weiny
0e65bae205 IB/MAD: Add agent trace points
Trace agent details when agents are [un]registered.  In addition, report
agent details on send/recv.

Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:52:00 -03:00
Ira Weiny
821bf1de45 IB/MAD: Add recv path trace point
Trace received MAD details.

Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:52:00 -03:00
Ira Weiny
4d60cad5db IB/MAD: Add send path trace points
Use the standard Linux trace mechanism to trace MADs being sent.  4 trace
points are added, when the MAD is posted to the qp, when the MAD is
completed, if a MAD is resent, and when the MAD completes in error.

Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:52:00 -03:00
Ira Weiny
4ae2744410 IB/core: Ensure an invalidate_range callback on ODP MR
No device supports ODP MR without an invalidate_range callback.

Warn on any any device which attempts to support ODP without supplying
this callback.

Then we can remove the checks for the callback within the code.

This stems from the discussion

https://www.spinics.net/lists/linux-rdma/msg76460.html

...which concluded this code was no longer necessary.

Acked-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 16:39:40 -03:00
Matthew Wilcox
638267537a cma: Convert portspace IDRs to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 12:00:15 -03:00
Matthew Wilcox
81cc440883 ucm: Convert ctx_id_table to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 11:50:29 -03:00
Matthew Wilcox
8e5a9d61e2 ib core: Convert query_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 11:47:05 -03:00
Matthew Wilcox
ae78ff3a0f RDMA/cm: Convert local_id_table to XArray
Also introduce cm_local_id() to reduce the amount of boilerplate when
converting a local ID to an XArray index.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 11:44:22 -03:00
Matthew Wilcox
949a237046 IB/mad: Convert ib_mad_clients to XArray
Pull the allocation function out into its own function to reduce the
length of ib_register_mad_agent() a little and keep all the allocation
logic together.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 11:36:20 -03:00
Erez Alfasi
19b1a294b0 RDMA: Use __packed annotation instead of __attribute__ ((packed))
"__attribute__" set of macros has been standardized, have became more
potentially portable and consistent code back in v2.6.21 by commit
82ddcb040 ("[PATCH] extend the set of "__attribute__" shortcut macros").
Moreover, nowadays checkpatch.pl warns about using __attribute__((packed))
instead of __packed.

This patch converts all the "__attribute__ ((packed))" annotations to
"__packed" within the RDMA subsystem.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25 21:14:12 -03:00
Linus Torvalds
ea295481b6 XArray updates for 5.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAlyHF2oUHHdpbGx5QGlu
 ZnJhZGVhZC5vcmcACgkQDpNsjXcpgj5j9AgAlpeptRfnPO0+VXj+EbxaOOI8tOG+
 w+vBasWoQB+lZ9ctf1qUQVSeLn0ErxTM7BaIP7plfDrEWiIbRWkV18B+heS5d1Yz
 aTV1d/8tG6/eo61K2VqXHbUhymgMtbXDsg1rwWTF8+Q4xIcMqfYAR0f9ptU1Oejc
 pNAn16dYgKi6+4eluY7gXxruBosQ6yNml6iEje9A3uR8nhzTI/P3Yf2GGIZnQLsL
 +UIx4Ps38dJ3VCYBPfbnszZfYPpILUH9/Bdx+mAMUtZwvpM3JYqc8XsiFfqDO7n1
 3003yUytnRkb1UK3QIvkbPt0G8UOI4s9fxRPsA8lLSww/f2y1r5kC4Mxbg==
 =HSP/
 -----END PGP SIGNATURE-----

Merge tag 'xarray-5.1-rc1' of git://git.infradead.org/users/willy/linux-dax

Pull XArray updates from Matthew Wilcox:
 "This pull request changes the xa_alloc() API. I'm only aware of one
  subsystem that has started trying to use it, and we agree on the fixup
  as part of the merge.

  The xa_insert() error code also changed to match xa_alloc() (EEXIST to
  EBUSY), and I added xa_alloc_cyclic(). Beyond that, the usual
  bugfixes, optimisations and tweaking.

  I now have a git tree with all users of the radix tree and IDR
  converted over to the XArray that I'll be feeding to maintainers over
  the next few weeks"

* tag 'xarray-5.1-rc1' of git://git.infradead.org/users/willy/linux-dax:
  XArray: Fix xa_reserve for 2-byte aligned entries
  XArray: Fix xa_erase of 2-byte aligned entries
  XArray: Use xa_cmpxchg to implement xa_reserve
  XArray: Fix xa_release in allocating arrays
  XArray: Mark xa_insert and xa_reserve as must_check
  XArray: Add cyclic allocation
  XArray: Redesign xa_alloc API
  XArray: Add support for 1s-based allocation
  XArray: Change xa_insert to return -EBUSY
  XArray: Update xa_erase family descriptions
  XArray tests: RCU lock prohibits GFP_KERNEL
2019-03-11 20:06:18 -07:00
John Hubbard
0c507d8f84 RDMA/umem: Revert broken 'off by one' fix
The previous attempted bug fix overlooked the fact that
ib_umem_odp_map_dma_single_page() was doing a put_page() upon hitting an
error. So there was not really a bug there.

Therefore, this reverts the off-by-one change, but keeps the change to use
release_pages() in the error path.

Fixes: 75a3e6a3c1 ("RDMA/umem: minor bug fix in error handling path")
Suggested-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-06 14:42:37 -04:00