The sysfs layout created by CM incorrectly presented RDMA devices with the
InfiniBand link layer. The layout of such devices represents the device tree
of connections. By moving the CM statistics under the relevant port of the IB
device, we fix the following issues:
* Symlink name - It used device name instead of specific identifier.
* Target location - It was supposed to point to PCI-ID/infiniband_cm/
instead of PCI-ID/infiniband/
* Target name - It created extra device file under already existing
device folder, e.g. mlx5_0/mlx5_0
* Crash during boot with RDMA persistent naming patches.
sysfs: cannot create duplicate filename '/class/infiniband_cm/mlx5_0'
CPU: 29 PID: 433 Comm: modprobe Not tainted 5.0.0-rc5+ #178
Call Trace:
dump_stack+0xcc/0x180
sysfs_warn_dup.cold.3+0x17/0x2d
sysfs_do_create_link_sd.isra.2+0xd0/0xf0
device_add+0x7cb/0x1450
device_create_groups_vargs+0x1ae/0x220
device_create+0x93/0xc0
cm_add_one+0x38f/0xf60 [ib_cm]
add_client_context+0x167/0x210 [ib_core]
enable_device_and_get+0x230/0x3f0 [ib_core]
ib_register_device+0x823/0xbf0 [ib_core]
__mlx5_ib_add+0x45/0x150 [mlx5_ib]
mlx5_ib_add+0x1b3/0x5e0 [mlx5_ib]
mlx5_add_device+0x130/0x3a0 [mlx5_core]
mlx5_register_interface+0x1a9/0x270 [mlx5_core]
do_one_initcall+0x14f/0x5de
do_init_module+0x247/0x7c0
load_module+0x4c2f/0x60d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
After this change:
[leonro@server ~]$ ls -al /sys/class/infiniband/ibp0s12f0/ports/1/
drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_rx_duplicates
drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_rx_msgs
drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_tx_msgs
drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_tx_retries
Fixes: 110cf374a8 ("infiniband: make cm_device use a struct device and not a kobject.")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Combine contiguous regions of PAGE_SIZE pages into single scatter list
entry while building the scatter table for a umem. This minimizes the
number of the entries in the scatter list and reduces the DMA mapping
overhead, particularly with the IOMMU.
Set default max_seg_size in core for IB devices to 2G and do not combine
if we exceed this limit.
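
A minimal user-space sketch of the combining step, assuming pages are
identified by page frame numbers. toy_seg, toy_build_segs() and TOY_PAGE_SIZE
are hypothetical names for illustration only; this is not the umem code itself.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* One merged run of physically contiguous pages (hypothetical type). */
struct toy_seg {
	uint64_t first_pfn;	/* page frame number of the first page */
	uint64_t len;		/* length of the run in bytes */
};

/*
 * Merge runs of consecutive page frame numbers into segments. A new
 * segment is started when the next page is not contiguous with the
 * previous one, or when adding it would exceed max_seg_size.
 * Returns the number of segments written to out[], which must have
 * room for npages entries.
 */
static size_t toy_build_segs(const uint64_t *pfns, size_t npages,
			     uint64_t max_seg_size, struct toy_seg *out)
{
	size_t nsegs = 0;

	for (size_t i = 0; i < npages; i++) {
		if (nsegs > 0) {
			struct toy_seg *last = &out[nsegs - 1];
			uint64_t next_pfn =
				last->first_pfn + last->len / TOY_PAGE_SIZE;

			if (pfns[i] == next_pfn &&
			    last->len + TOY_PAGE_SIZE <= max_seg_size) {
				last->len += TOY_PAGE_SIZE;
				continue;
			}
		}
		out[nsegs].first_pfn = pfns[i];
		out[nsegs].len = TOY_PAGE_SIZE;
		nsegs++;
	}
	return nsegs;
}
```

With the page frames {10, 11, 12, 20, 21, 30} and a 2G limit this yields three
segments instead of six entries; lowering the limit to two pages splits the
first run and yields four.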
Also, purge npages in struct ib_umem as we now DMA map the umem SGL with
sg_nents and npage computation is not needed. Drivers should now be using
ib_umem_num_pages(), so fix the last stragglers.
Move npages tracking to ib_umem_odp as ODP drivers still need it.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Gal Pressman <galpress@amazon.com>
Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Static global variables are initialized to zero by the C standard, so
there is no need to zero them again.
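
As a small illustration of that rule (the toy_* names are hypothetical and
not taken from the patch), objects with static storage duration start out
zeroed without an explicit initializer:

```c
#include <stddef.h>

static int toy_counter;		/* implicitly 0 */
static void *toy_ptr;		/* implicitly a null pointer */
static int toy_table[4];	/* every element implicitly 0 */

/* Returns 1 when all of the statics above hold their zero defaults. */
static int toy_statics_are_zero(void)
{
	if (toy_counter != 0 || toy_ptr != NULL)
		return 0;
	for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		if (toy_table[i] != 0)
			return 0;
	}
	return 1;
}
```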
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The conversion from IDR to XArray missed the fact that idr_alloc() returned
the index as its return value; this index was saved in the port variable and
used as the query index later on. This caused the following error:
BUG: KASAN: use-after-free in cma_check_port+0x86a/0xa20 [rdma_cm]
Read of size 8 at addr ffff888069fde998 by task ucmatose/387
CPU: 3 PID: 387 Comm: ucmatose Not tainted 5.1.0-rc2+ #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x7c/0xc0
print_address_description+0x6c/0x23c
? cma_check_port+0x86a/0xa20 [rdma_cm]
kasan_report.cold.3+0x1c/0x35
? cma_check_port+0x86a/0xa20 [rdma_cm]
? cma_check_port+0x86a/0xa20 [rdma_cm]
cma_check_port+0x86a/0xa20 [rdma_cm]
rdma_bind_addr+0x11bc/0x1b00 [rdma_cm]
? find_held_lock+0x33/0x1c0
? cma_ndev_work_handler+0x180/0x180 [rdma_cm]
? wait_for_completion+0x3d0/0x3d0
ucma_bind+0x120/0x160 [rdma_ucm]
? ucma_resolve_addr+0x1a0/0x1a0 [rdma_ucm]
ucma_write+0x1f8/0x2b0 [rdma_ucm]
? ucma_open+0x260/0x260 [rdma_ucm]
vfs_write+0x157/0x460
ksys_write+0xb8/0x170
? __ia32_sys_read+0xb0/0xb0
? trace_hardirqs_off_caller+0x5b/0x160
? do_syscall_64+0x18/0x3c0
do_syscall_64+0x95/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Allocated by task 381:
__kasan_kmalloc.constprop.5+0xc1/0xd0
cma_alloc_port+0x4d/0x160 [rdma_cm]
rdma_bind_addr+0x14e7/0x1b00 [rdma_cm]
ucma_bind+0x120/0x160 [rdma_ucm]
ucma_write+0x1f8/0x2b0 [rdma_ucm]
vfs_write+0x157/0x460
ksys_write+0xb8/0x170
do_syscall_64+0x95/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 381:
__kasan_slab_free+0x12e/0x180
kfree+0xed/0x290
rdma_destroy_id+0x6b6/0x9e0 [rdma_cm]
ucma_close+0x110/0x300 [rdma_ucm]
__fput+0x25a/0x740
task_work_run+0x10e/0x190
do_exit+0x85e/0x29e0
do_group_exit+0xf0/0x2e0
get_signal+0x2e0/0x17e0
do_signal+0x94/0x1570
exit_to_usermode_loop+0xfa/0x130
do_syscall_64+0x327/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
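
The return-value difference behind this bug can be modeled in user space as
follows. Both allocators are toy stand-ins (hypothetical names, simplified
signatures) that only mimic the two conventions: the IDR-style call returns
the index, while the XArray-style call returns 0 on success and stores the
index through a pointer.

```c
/* Toy IDR-style allocator: the return value IS the allocated index. */
static int toy_idr_alloc(int *next_free)
{
	return (*next_free)++;
}

/* Toy XArray-style allocator: returns 0, index is stored through *id. */
static int toy_xa_alloc(int *next_free, unsigned int *id)
{
	*id = (unsigned int)(*next_free)++;
	return 0;
}

/* Pre-conversion pattern: the returned value is used as the port index. */
static int toy_pick_port_idr(int *next_free)
{
	return toy_idr_alloc(next_free);
}

/* Buggy conversion: keeps treating the status code as the index. */
static int toy_pick_port_buggy(int *next_free)
{
	unsigned int id;
	int ret = toy_xa_alloc(next_free, &id);

	return ret;	/* always 0 on success, not the allocated index */
}

/* Fixed conversion: check the status, then use the stored index. */
static int toy_pick_port_fixed(int *next_free)
{
	unsigned int id;
	int ret = toy_xa_alloc(next_free, &id);

	if (ret)
		return ret;
	return (int)id;
}
```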
Reported-by: <syzbot+2e3e485d5697ea610460@syzkaller.appspotmail.com>
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Fixes: 638267537a ("cma: Convert portspace IDRs to XArray")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that ib_udata is passed to all the driver's object create/destroy APIs,
the ib_udata carries the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the function prototypes.
Make ib_udata the only argument passed.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have the udata passed to all the ib_xxx object destroy APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers on
ib_xxx->uobject->context.
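
A simplified user-space sketch of how a driver context can be recovered from
a udata pointer that is embedded in a bundle. All toy_* types and their
layouts are assumptions made for this illustration; only the idea that the
ib_udata lives inside the uverbs_attr_bundle, which also carries the
ucontext, comes from the text above.

```c
#include <stddef.h>

/* Same pointer arithmetic idea as the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_ucontext { int id; };	/* stands in for ib_ucontext */
struct toy_udata { size_t inlen; };	/* stands in for ib_udata */

/* Stands in for uverbs_attr_bundle: embeds the udata, points at ucontext. */
struct toy_attr_bundle {
	struct toy_udata driver_udata;
	struct toy_ucontext *context;
};

/* A driver's private context that embeds the core ucontext. */
struct toy_drv_ucontext {
	int drv_private;
	struct toy_ucontext ibucontext;
};

/* udata -> enclosing bundle -> ucontext -> enclosing driver context. */
static struct toy_drv_ucontext *
toy_udata_to_drv_context(struct toy_udata *udata)
{
	struct toy_attr_bundle *bundle;

	if (!udata)
		return NULL;
	bundle = toy_container_of(udata, struct toy_attr_bundle, driver_udata);
	return toy_container_of(bundle->context, struct toy_drv_ucontext,
				ibucontext);
}
```

With a helper of this shape a driver callback needs only the udata argument
and never has to look at uobject->context.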
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The uverbs_attr_bundle with the ucontext is sent down to the drivers' ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers' destroy path from the dependency on 'uobject->context', as we
already did for the create path.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pass uverbs_attr_bundle down the uobject destroy path. The next patch will
use this to eliminate the dependency of the drivers on ib_x->uobject
pointers.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The attempt to use the commit below to initialize the ucontext for the
uobject destroy path has shown that the commit is incomplete.
Parts were reverted, and the ucontext setup in the uverbs_attr_bundle was
moved to rdma_lookup_get_uobject, which is called from the uobj_get_XXX
macros, and to rdma_alloc_begin_uobject, which is called when a uobject is
created.
Fixes: 3d9dfd0603 ("IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows")
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add a netlink command that enables/disables sharing of rdma devices among
multiple net namespaces.
Using rdma tool,
$rdma sys set netns shared (default mode)
When rdma subsystem netns mode is set to shared mode, rdma devices
will be accessible in all net namespaces.
Using rdma tool,
$rdma sys set netns exclusive
When rdma subsystem netns mode is set to exclusive mode, devices
will be accessible in only one net namespace at any given
point of time.
If any net namespace other than the default init_net exists while this
command is executed, the command fails and the mode cannot be changed.
To change this mode, a netlink command is used instead of sysctl, because
a netlink command allows auto-loading a module.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add an interface via a netlink command to query whether rdma devices are
shared among multiple net namespaces or not. When using the rdma tool, it
can be queried as:
$rdma system show netns
netns shared
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Extend ib_device_get_by_index() API to check device access for
net namespace for serving netlink commands.
Also enforce net ns check on dumpit commands which iterate over all
registered rdma devices and which don't call ib_device_get_by_index().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce an API rdma_dev_access_netns() to check whether a rdma device
can be accessed from the specified net namespace or not.
Use rdma_dev_access_netns() while opening character uverbs, umad network
device and also check while rdma cm_id binds to rdma device.
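
A toy model of such an access check, assuming a global shared/exclusive mode
flag and a per-device owning namespace. The toy_* names and the exact rule
are assumptions for illustration and should not be read as the in-kernel
implementation.

```c
#include <stdbool.h>

struct toy_net { int id; };			/* stands in for a net namespace */
struct toy_rdma_dev { const struct toy_net *net; };	/* owning namespace */

/* true: devices are shared by all namespaces; false: exclusive mode. */
static bool toy_shared_netns = true;

/*
 * In shared mode every namespace may access the device; in exclusive
 * mode only the namespace that owns the device may.
 */
static bool toy_dev_access_netns(const struct toy_rdma_dev *dev,
				 const struct toy_net *net)
{
	return toy_shared_netns || dev->net == net;
}
```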
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add a module parameter to change the sharing mode of ib_core early in the
boot process. This parameter helps systems where a modern, up-to-date
rdma tool (iproute2) package may not be available during the kernel
upgrade cycle.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that sysfs compatibility layer for non init_net exists, add core port
attributes such as pkey and gid table to non init_net ns.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Implement compatibility layer sysfs entries of ib_core so that non
init_net net namespaces can also discover rdma devices.
Each non init_net net namespace has ib_core_device created in it.
Such ib_core_device sysfs tree resembles rdma devices found in
init_net namespace.
This allows discovering rdma devices in multiple non init_net net
namespaces via sysfs entries and is helpful to rdma-core userspace.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is a preparation patch to provide isolation of rdma device in a
network namespace.
As a first step, make the rdma device visible only in the init net namespace.
A subsequent patch will re-enable rdma device visibility in multiple net
namespaces using the compat ib_core_device device/sysfs tree.
Given that the IB subsystem depends on the net stack, it needs to be
initialized after netdev, and since it supports devices, it needs to be
initialized before the device subsystem; therefore, change the initcall
sequence to fs_initcall, so that when ib_core is compiled into the kernel
image, the right init sequence is followed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In order to support sysfs entries in multiple net namespaces for an rdma
device, introduce an ib_core_device whose scope is limited to holding the
core device and per-port sysfs-related entries.
This is a preparation patch so that a subsequent patch can create
ib_core_devices in each net namespace, all of which share the ib_device.
(a) Move sysfs specific fields to ib_core_device.
(b) Make sysfs and device life cycle related routines to work on
ib_core_device.
(c) Introduce and use rdma_init_coredev() helper to initialize
coredev fields.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that sparse reports the following warnings:
drivers/infiniband/core/uverbs_std_types_flow_action.c:442:30: warning: symbol 'uverbs_def_obj_flow_action' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_dm.c:112:30: warning: symbol 'uverbs_def_obj_dm' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_counters.c:153:30: warning: symbol 'uverbs_def_obj_counters' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_mr.c:213:30: warning: symbol 'uverbs_def_obj_mr' was not declared. Should it be static?
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: 0bd01f3d09 ("RDMA/uverbs: Require all objects to have a driver destroy function")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that sparse complains about a mismatch between the
returned value and the function return type.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: c3bea3d2dc ("RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that sparse and smatch report the following:
warning: cast removes address space of expression
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: 3a6532c9af ("RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Decode more information from the packet and include it in the trace.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Trace MADs going to/from user space.
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Trace agent details when agents are [un]registered. In addition, report
agent details on send/recv.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Trace received MAD details.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use the standard Linux trace mechanism to trace MADs being sent. Four trace
points are added: when the MAD is posted to the qp, when the MAD is
completed, when a MAD is resent, and when the MAD completes in error.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
No device supports ODP MR without an invalidate_range callback.
Warn on any device which attempts to support ODP without supplying
this callback.
Then we can remove the checks for the callback within the code.
This stems from the discussion
https://www.spinics.net/lists/linux-rdma/msg76460.html
...which concluded this code was no longer necessary.
Acked-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Also introduce cm_local_id() to reduce the amount of boilerplate when
converting a local ID to an XArray index.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull the allocation function out into its own function to reduce the
length of ib_register_mad_agent() a little and keep all the allocation
logic together.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The "__attribute__" set of shortcut macros was standardized back in v2.6.21
by commit 82ddcb040 ("[PATCH] extend the set of "__attribute__" shortcut
macros"), making code potentially more portable and consistent.
Moreover, nowadays checkpatch.pl warns about using __attribute__((packed))
instead of __packed.
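
A small user-space sketch showing that the two spellings produce the same
layout under GCC or Clang. The local __packed definition and the toy_hdr_*
structs are assumptions for this example; in kernel code the macro is
provided by the compiler headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Provide the shortcut only if the environment has not defined it. */
#ifndef __packed
#define __packed __attribute__((__packed__))
#endif

/* Long form, as flagged by checkpatch.pl. */
struct toy_hdr_old {
	uint8_t type;
	uint32_t len;
} __attribute__ ((packed));

/* Shortcut form with identical meaning. */
struct toy_hdr_new {
	uint8_t type;
	uint32_t len;
} __packed;

/* Returns 1 when both structs have the same size and member offset. */
static int toy_layouts_match(void)
{
	return sizeof(struct toy_hdr_old) == sizeof(struct toy_hdr_new) &&
	       offsetof(struct toy_hdr_old, len) ==
	       offsetof(struct toy_hdr_new, len);
}
```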
This patch converts all the "__attribute__ ((packed))" annotations to
"__packed" within the RDMA subsystem.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Merge tag 'xarray-5.1-rc1' of git://git.infradead.org/users/willy/linux-dax
Pull XArray updates from Matthew Wilcox:
"This pull request changes the xa_alloc() API. I'm only aware of one
subsystem that has started trying to use it, and we agree on the fixup
as part of the merge.
The xa_insert() error code also changed to match xa_alloc() (EEXIST to
EBUSY), and I added xa_alloc_cyclic(). Beyond that, the usual
bugfixes, optimisations and tweaking.
I now have a git tree with all users of the radix tree and IDR
converted over to the XArray that I'll be feeding to maintainers over
the next few weeks"
* tag 'xarray-5.1-rc1' of git://git.infradead.org/users/willy/linux-dax:
XArray: Fix xa_reserve for 2-byte aligned entries
XArray: Fix xa_erase of 2-byte aligned entries
XArray: Use xa_cmpxchg to implement xa_reserve
XArray: Fix xa_release in allocating arrays
XArray: Mark xa_insert and xa_reserve as must_check
XArray: Add cyclic allocation
XArray: Redesign xa_alloc API
XArray: Add support for 1s-based allocation
XArray: Change xa_insert to return -EBUSY
XArray: Update xa_erase family descriptions
XArray tests: RCU lock prohibits GFP_KERNEL
The previous attempted bug fix overlooked the fact that
ib_umem_odp_map_dma_single_page() was doing a put_page() upon hitting an
error. So there was not really a bug there.
Therefore, this reverts the off-by-one change, but keeps the change to use
release_pages() in the error path.
Fixes: 75a3e6a3c1 ("RDMA/umem: minor bug fix in error handling path")
Suggested-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
1. Bug fix: fix an off by one error in the code that cleans up if it fails
to dma-map a page, after having done a get_user_pages_remote() on a
range of pages.
2. Refinement: for that same cleanup code, release_pages() is better than
put_page() in a loop.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to call kfree(pd) because ib_dealloc_pd() internally
frees PD.
Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Following the PD conversion patch, do the same for ucontext allocations.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The first parameter of WARN_ONCE() is a condition, then following
parameters are the message. In this case, we left out the condition so it
will just print the ops->type string.
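
The argument order can be illustrated with a user-space imitation of a
warn-once macro. TOY_WARN_ONCE, toy_warn_count, toy_register_link_ops() and
the message text are hypothetical and only mimic the "condition first, then
format and arguments" calling convention.

```c
#include <stdio.h>

static int toy_warn_count;	/* number of warnings actually printed */

/* Print the message the first time cond is true at this call site. */
#define TOY_WARN_ONCE(cond, ...)				\
	do {							\
		static int toy_warned;				\
		if ((cond) && !toy_warned) {			\
			toy_warned = 1;				\
			toy_warn_count++;			\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

static void toy_register_link_ops(const char *type, int already_registered)
{
	/*
	 * Correct: the condition comes first, then the format string and
	 * its arguments. Leaving the condition out would make the format
	 * string the (always true) condition and 'type' the format, so
	 * only the type string would be printed.
	 */
	TOY_WARN_ONCE(already_registered, "duplicate link type: %s\n", type);
}
```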
Fixes: 3856ec4b93 ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It is possible that during a page fault handling, the process that owns
the MR is terminating. The indication for it is failure to get the
task_struct or take reference on the mm_struct. In this case just abort
the page-fault handler with error but without a warning to the kernel log.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Before calling the provider's alloc_mw function, verify that the
given memory type is either IB_MW_TYPE_1 or IB_MW_TYPE_2.
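
A sketch of such a pre-check, assuming the two valid types are represented
by the values 1 and 2 and that an invalid type is rejected with -EINVAL. The
toy_* names, the enum values and the chosen error code are assumptions for
this example.

```c
#include <errno.h>

enum toy_mw_type {
	TOY_MW_TYPE_1 = 1,
	TOY_MW_TYPE_2 = 2,
};

/* Returns 0 for a supported memory window type, -EINVAL otherwise. */
static int toy_check_mw_type(int mw_type)
{
	if (mw_type != TOY_MW_TYPE_1 && mw_type != TOY_MW_TYPE_2)
		return -EINVAL;
	return 0;
}
```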
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The strlen() check at the beginning of iw_cm_map() ensures that the devname
and ifname strings are shorter than the destinations to which they are
copied. Change the strncpy() calls to strcpy(), because we are protected
from overflow. Zero the entire string buffer to avoid copying
uninitialized kernel stack memory to userspace.
This fixes the compilation warning below:
In file included from ./include/linux/dma-mapping.h:6,
from drivers/infiniband/core/iwcm.c:38:
In function 'strncpy',
inlined from 'iw_cm_map' at drivers/infiniband/core/iwcm.c:519:2:
./include/linux/string.h:253:9: warning: '__builtin_strncpy' specified
bound 32 equals destination size [-Wstringop-truncation]
return __builtin_strncpy(p, q, size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
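
The pattern described above (length check first, then zero the buffer, then
a plain strcpy()) can be sketched in user space as follows. toy_msg,
toy_fill_names() and the 32-byte field size are assumptions for this
example, not the iwcm structures.

```c
#include <string.h>

#define TOY_NAME_LEN 32	/* assumed destination size */

struct toy_msg {
	char devname[TOY_NAME_LEN];
	char ifname[TOY_NAME_LEN];
};

/* Returns 0 on success, -1 if either name does not fit with its NUL. */
static int toy_fill_names(struct toy_msg *msg, const char *devname,
			  const char *ifname)
{
	/* Reject anything that would not leave room for the terminator. */
	if (strlen(devname) >= sizeof(msg->devname) ||
	    strlen(ifname) >= sizeof(msg->ifname))
		return -1;

	/* Zero everything so no stale bytes follow the copied strings. */
	memset(msg, 0, sizeof(*msg));

	/* Safe: the lengths were verified above. */
	strcpy(msg->devname, devname);
	strcpy(msg->ifname, ifname);
	return 0;
}
```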
Fixes: d53ec8af56 ("RDMA/iwcm: Don't copy past the end of dev_name() string")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
old_pd is used only if the IB_MR_REREG_PD flag is set.
For readability, move its initialization to where it is used.
While there, rewrite the whole 'if-else' block so that on error we jump
directly to the label and no 'else' is needed.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support for new LINK messages to allow adding and deleting rdma
interfaces. This will be used initially for soft rdma drivers which
instantiate device instances dynamically by the admin specifying a netdev
device to use. The rdma_rxe module will be the first user of these
messages.
The design is modeled after RTNL_NEWLINK/DELLINK: rdma drivers register
with the rdma core if they provide link add/delete functions. Each driver
registers with a unique "type" string, that is used to dispatch messages
coming from user space. A new RDMA_NLDEV_ATTR is defined for the "type"
string. User mode will pass 3 attributes in a NEWLINK message:
RDMA_NLDEV_ATTR_DEV_NAME for the desired rdma device name to be created,
RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link being added, and
RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface to use for this
link. The DELLINK message will contain the RDMA_NLDEV_ATTR_DEV_INDEX of
the device to delete.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since rxe allows unregistration from other threads, the rxe pointer can
become invalid at any moment after ib_register_driver returns. This could
cause a user-triggered use after free.
Add another driver callback to be called right after the device becomes
registered to complete any device setup required post-registration. This
callback has enough core locking to prevent the device from becoming
unregistered.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rxe has an open coded version of this that is not as safe as the core
version. This lets us eliminate the internal device list entirely from
rxe.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These APIs are intended to support drivers that exist outside the usual
driver core probe()/remove() callbacks. Normally the driver core will
prevent remove() from running concurrently with probe(); once this safety
is lost, drivers need more support to get the locking and lifetimes right.
ib_unregister_driver() is intended to be used during module_exit of a
driver using these APIs. It unregisters all the associated ib_devices.
ib_unregister_device_and_put() is to be used by a driver-specific removal
function (ie removal by name, removal from a netdev notifier, removal from
netlink)
ib_unregister_queued() is to be used from netdev notifier chains where
RTNL is held.
The locking is tricky here since once things become async it is possible
to race unregister with registration. This is largely solved by relying on
the registration refcount, unregistration will only ever work on something
that has a positive registration refcount - and then an unregistration
mutex serializes all competing unregistrations of the same device.
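
The "only act on a positive refcount" rule can be sketched with C11 atomics.
toy_device, toy_try_get() and toy_put() are hypothetical names, and the
unregistration mutex mentioned above is deliberately left out of this
sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_device {
	atomic_int refcount;	/* registration refcount, 0 = unregistered */
};

/*
 * Take a reference only if the count is still positive, so that a
 * device whose registration has already dropped to zero can never be
 * picked up again by a racing unregister path.
 */
static bool toy_try_get(struct toy_device *dev)
{
	int old = atomic_load(&dev->refcount);

	while (old > 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&dev->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken. */
static void toy_put(struct toy_device *dev)
{
	atomic_fetch_sub(&dev->refcount, 1);
}
```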
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Several drivers need to find the ib_device from a given netdev. rxe needs
this at speed in an unsleepable context, so choose to implement the
translation using a RCU safe hash table.
The hash table can have a many to one mapping. This is intended to support
some future case where multiple IB drivers (ie iWarp and RoCE) connect to
the same netdevs. driver_ids will need to be different to support this.
In the process this makes the struct ib_device and ib_port_data RCU safe
by deferring their kfrees.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The associated netdev should not actually be very dynamic, so for most
drivers there is no reason for a callback like this. Provide an API to
inform the core code about the net dev affiliation and use a core
maintained data structure instead.
This allows the core code to be more aware of the ndev relationship which
will allow some new APIs based around this.
This also uses locking that makes some kind of sense; many drivers had a
confusing RCU lock, or were missing locking, which isn't right.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>