Commit Graph

10692 Commits

Author SHA1 Message Date
Michal Kalderon
97f6125092 RDMA/qedr: Add doorbell overflow recovery support
Use the doorbell recovery mechanism to register rdma related doorbells
that will be restored in case there is a doorbell overflow attention.

Link: https://lore.kernel.org/r/20191030094417.16866-8-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
4c6bb02d59 RDMA/qedr: Use the common mmap API
Remove all functions related to mmap from qedr and use the common API.

Link: https://lore.kernel.org/r/20191030094417.16866-7-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
11f1a75567 RDMA/siw: Use the common mmap_xa helpers
Remove the functions related to managing the mmap_xa database.  This code
is now common in ib_core.

Link: https://lore.kernel.org/r/20191030094417.16866-6-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Tested-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
e84d3c184e RDMA/efa: Use the common mmap_xa helpers
Remove the functions related to managing the mmap_xa database.  This code
was replaced with common code in ib_core.

Link: https://lore.kernel.org/r/20191030094417.16866-5-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
c043ff2cfb RDMA: Connect between the mmap entry and the umap_priv structure
The rdma_user_mmap_io interface created a common interface for drivers to
correctly map hw resources and zap them once the ucontext is destroyed
enabling the drivers to safely free the hw resources.

However, this meant the drivers need to delay freeing the resource to the
ucontext destroy phase to ensure they were no longer mapped.  The new
mechanism for a common way of handling user/driver address mapping enabled
notifying the driver if all umap_priv mappings were removed, and enabled
freeing the hw resources when they are done with and not delay it until
ucontext destroy.

Since not all drivers use the mechanism, NULL can be sent to the
rdma_user_mmap_io interface to continue working as before.  Drivers that
use the mmap_xa interface can pass the entry being mapped to the
rdma_user_mmap_io function to be linked together.

Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Michal Kalderon
3411f9f01b RDMA/core: Create mmap database and cookie helper functions
Create some common API's for adding entries to a xa_mmap. Searching for
an entry and freeing one.

The general approach is copied from the EFA driver and improved to be more
general and do more to help the drivers. Integration with the core allows
a reference counted scheme with a free function so that the driver can
know when its mmaps are all gone.

This significant new functionality will be helpful for drivers to have the
correct lifetime model for mmap objects.

Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:00 -04:00
Michal Kalderon
b86deba977 RDMA/core: Move core content from ib_uverbs to ib_core
Move functionality that is called by the driver, which is
related to umap, to a new file that will be linked in ib_core.
This is a first step in later enabling ib_uverbs to be optional.
vm_ops is now initialized in ib_uverbs_mmap instead of
priv_init to avoid having to move all the rdma_umap functions
as well.

Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-05 09:59:26 -04:00
Greg Kroah-Hartman
09b0965ee8 IB: mlx5: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-04 08:38:58 +01:00
Parav Pandit
e53a9d26cf IB/mlx5: Introduce and use mlx5_core_is_vf()
Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().

This enables to clearly identify PF, VF and non virtual functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:40:27 -07:00
Michael Guralnik
11f552e217 IB/mlx5: Test write combining support
Linux can run in all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell if the system supports WC, or not. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.

Add a test in mlx5_ib initialization process to test whether
write-combining is supported on the machine.  The test will run as part of
the enable_driver callback to ensure that the test runs after the device
is setup and can create and modify the QP needed, but runs before the
device is exposed to the users.

The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame
is different from the WQE in memory, requesting CQE only on the BlueFlame
WQE. By checking whether we received a completion on one of these WQEs we
can know if BlueFlame succeeded and this write-combining must be
supported.

Change reporting of BlueFlame support to be dependent on write-combining
support instead of the FW's guess as to what the machine can do.

Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-31 15:52:51 -03:00
Leon Romanovsky
546d30099e RDMA/mlx5: Return proper error value
Returned value from mlx5_mr_cache_alloc() is checked to be error or real
pointer. Return proper error code instead of NULL which is not checked
later.

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-31 15:31:49 -03:00
Arnd Bergmann
d5b60e26e8 RDMA/hns: Fix build error again
This is not the first attempt to fix building random configurations,
unfortunately the attempt in commit a07fc0bb48 ("RDMA/hns: Fix build
error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
CONFIG_INFINIBAND_HNS_HIP08=y:

drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'

Revert commits a07fc0bb48 ("RDMA/hns: Fix build error") and
a3e2d4c7e7 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
the previous state, then fix the issues described there differently, by
adding more specific dependencies: INFINIBAND_HNS can now only be built-in
if at least one of HNS or HNS3 are built-in, and the individual back-ends
are only available if that code is reachable from the main driver.

Fixes: a07fc0bb48 ("RDMA/hns: Fix build error")
Fixes: a3e2d4c7e7 ("RDMA/hns: remove obsolete Kconfig comment")
Fixes: dd74282df5 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdbeb ("RDMA/hns: Split hw v1 driver from hns roce driver")
Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-29 16:16:54 -03:00
Jason Gunthorpe
bb3dba3300 Merge branch 'odp_rework' into rdma.git for-next
Jason Gunthorpe says:

====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.

This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.

The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.

This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================

Merge conflicts with the odp statistics patch resolved.

* branch 'odp_rework':
  RDMA/odp: Remove broken debugging call to invalidate_range
  RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
  RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
  RDMA/mlx5: Rework implicit ODP destroy
  RDMA/mlx5: Avoid double lookups on the pagefault path
  RDMA/mlx5: Reduce locking in implicit_mr_get_data()
  RDMA/mlx5: Use an xarray for the children of an implicit ODP
  RDMA/mlx5: Split implicit handling from pagefault_mr
  RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
  RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
  RDMA/mlx5: Rework implicit_mr_get_data
  RDMA/mlx5: Delete struct mlx5_priv->mkey_table
  RDMA/mlx5: Use a dedicated mkey xarray for ODP
  RDMA/mlx5: Split sig_err MR data into its own xarray
  RDMA/mlx5: Use SRCU properly in ODP prefetch

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:47:52 -03:00
Jason Gunthorpe
46870b2391 RDMA/odp: Remove broken debugging call to invalidate_range
invalidate_range() also obtains the umem_mutex which is being held at this
point, so if this path were was ever called it would deadlock. Thus
conclude the debugging never triggers and rework it into a simple WARN_ON
and leave things as they are.

While here add a note to explain how we could possibly get inconsistent
page pointers.

Link: https://lore.kernel.org/r/20191009160934.3143-16-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
09689703d2 RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
For creation, as soon as the umem_odp is created the notifier can be
called, however the underlying MR may not have been setup yet. This would
cause problems if mlx5_ib_invalidate_range() runs. There is some
confusing/ulocked/racy code that might by trying to solve this, but
without locks it isn't going to work right.

Instead trivially solve the problem by short-circuiting the invalidation
if there are not yet any DMA mapped pages. By definition there is nothing
to invalidate in this case.

The create code will have the umem fully setup before anything is DMA
mapped, and npages is fully locked by the umem_mutex.

For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
the pages before destroying the MR. This drives npages to zero and
prevents similar racing with invalidate while the MR is undergoing
destruction.

Arguably it would be better if the umem was created after the MR and
destroyed before, but that would require a big rework of the MR code.

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
d561987f34 RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
These mkeys are entirely internal and are never used by the HW for
page fault. They should also never be used by userspace for prefetch.
Simplify & optimize things by not including them in the xarray.

Since the prefetch path can now never see a child mkey there is no need
for the second synchronize_srcu() during imr destroy.

Link: https://lore.kernel.org/r/20191009160934.3143-14-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
5256edcb98 RDMA/mlx5: Rework implicit ODP destroy
Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.

This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.

For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.

Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
b70d785d23 RDMA/mlx5: Avoid double lookups on the pagefault path
Now that the locking is simplified combine pagefault_implicit_mr() with
implicit_mr_get_data() so that we sweep over the idx range only once,
and do the single xlt update at the end, after the child umems are
setup.

This avoids double iteration/xa_loads plus the sketchy failure path if the
xa_load() fails.

Link: https://lore.kernel.org/r/20191009160934.3143-12-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
3389baa831 RDMA/mlx5: Reduce locking in implicit_mr_get_data()
Now that the child MRs are stored in an xarray we can rely on the SRCU
lock to protect the xa_load and use xa_cmpxchg on the slow allocation path
to resolve races with concurrent page fault.

This reduces the scope of the critical section of umem_mutex for implicit
MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if
the child MR is already in the xarray. This makes it consistent with the
normal ODP MR critical section for umem_lock, and the locking approach
used for destroying an unusued implicit child MR.

The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr()
since it is no longer called with any locks.

Link: https://lore.kernel.org/r/20191009160934.3143-11-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
423f52d650 RDMA/mlx5: Use an xarray for the children of an implicit ODP
Currently the child leaves are stored in the shared interval tree and
every lookup for a child must be done under the interval tree rwsem.

This is further complicated by dropping the rwsem during iteration (ie the
odp_lookup(), odp_next() pattern), which requires a very tricky an
difficult to understand locking scheme with SRCU.

Instead reserve the interval tree for the exclusive use of the mmu
notifier related code in umem_odp.c and give each implicit MR a xarray
containing all the child MRs.

Since the size of each child is 1GB of VA, a 1 level xarray will index 64G
of VA, and a 2 level will index 2TB, making xarray a much better
data structure choice than an interval tree.

The locking properties of xarray will be used in the next patches to
rework the implicit ODP locking scheme into something simpler.

At this point, the xarray is locked by the implicit MR's umem_mutex, and
read can also be locked by the odp_srcu.

Link: https://lore.kernel.org/r/20191009160934.3143-10-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
54375e7382 RDMA/mlx5: Split implicit handling from pagefault_mr
The single routine has a very confusing scheme to advance to the next
child MR when working on an implicit parent. This scheme can only be used
when working with an implicit parent and must not be triggered when
working on a normal MR.

Re-arrange things by directly putting all the single-MR stuff into one
function and calling it in a loop for the implicit case. Simplify some of
the error handling in the new pagefault_real_mr() to remove unneeded gotos.

Link: https://lore.kernel.org/r/20191009160934.3143-9-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
9162420dde RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
Instead of rewriting all the IOVA's to 0 as things progress down the tree
make the IOVA of the children equal to placement in the tree. This makes
things easier to understand by keeping mmkey.iova == HW configuration.

Link: https://lore.kernel.org/r/20191009160934.3143-8-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
c2edcd6935 RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
This makes the routines easier to understand, particularly with respect
the locking requirements of the entire sequence. The implicit_mr_alloc()
had a lot of ifs specializing it to each of the callers, and only a very
small amount of code was actually shared.

Following patches will cause the flow in the two functions to diverge
further.

Link: https://lore.kernel.org/r/20191009160934.3143-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
3d5f3c54e7 RDMA/mlx5: Rework implicit_mr_get_data
This function is intended to loop across each MTT chunk in the implicit
parent that intersects the range [io_virt, io_virt+bnct).  But it is has a
confusing construction, so:

- Consistently use imr and odp_imr to refer to the implicit parent
  to avoid confusion with the normal mr and odp of the child
- Directly compute the inclusive start/end indexes by shifting. This is
  clearer to understand the intent and avoids any errors from unaligned
  values of addr
- Iterate directly over the range of MTT indexes, do not make a loop
  out of goto
- Follow 'success oriented flow', with goto error unwind
- Directly calculate the range of idx's that need update_xlt
- Ensure that any leaf MR added to the interval tree always results in an
  update to the XLT

Link: https://lore.kernel.org/r/20191009160934.3143-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
74bddb3682 RDMA/mlx5: Delete struct mlx5_priv->mkey_table
No users are left, delete it.

Link: https://lore.kernel.org/r/20191009160934.3143-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
806b101b2b RDMA/mlx5: Use a dedicated mkey xarray for ODP
There is a per device xarray storing mkeys that is used to store every
mkey in the system. However, this xarray is now only read by ODP for
certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).

Create an xarray only for use by ODP, that only contains ODP related
MKeys. This xarray is protected by SRCU and all erases are protected by a
synchronize.

This improves performance:

 - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
   can be deleted. The xarray will also consume fewer nodes.

 - normal MR's are never mixed with ODP MRs in a SRCU data structure so
   performance sucking synchronize_srcu() on every MR destruction is not
   needed.

 - No smp_load_acquire(live) and xa_load() double barrier on read

Due to the SRCU locking scheme care must be taken with the placement of
the xa_store(). Once it completes the MR is immediately visible to other
threads and only through a xa_erase() & synchronize_srcu() cycle could it
be destroyed.

Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
50211ec944 RDMA/mlx5: Split sig_err MR data into its own xarray
The locking model for signature is completely different than ODP, do not
share the same xarray that relies on SRCU locking to support ODP.

Simply store the active mlx5_core_sig_ctx's in an xarray when signature
MRs are created and rely on trivial xarray locking to serialize
everything.

The overhead of storing only a handful of SIG related MRs is going to be
much less than an xarray full of every mkey.

Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
fb985e278a RDMA/mlx5: Use SRCU properly in ODP prefetch
When working with SRCU protected xarrays the xarray itself should be the
SRCU 'update' point. Instead prefetch is using live as the SRCU update
point and this prevents switching the locking design to use the xarray
instead.

To solve this the prefetch must only read from the xarray once, and hold
on to the actual MR pointer for the duration of the async
operation. Incrementing num_pending_prefetch delays destruction of the MR,
so it is suitable.

Prefetch calls directly to the pagefault_mr using the MR pointer and only
does a single xarray lookup.

All the testing if a MR is prefetchable or not is now done only in the
prefetch code and removed from the pagefault critical path.

Link: https://lore.kernel.org/r/20191009160934.3143-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
036313316d Linux 5.4-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl210Z8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGv+kIAKRpO7EuDokQL4qp
 hxEEaCMJA1T055EMlNU6FVAq/ZbmapzreUyNYiRMpPWKGTWNMkhIcZQfysYeGZz5
 y/KRxAiVxlcB+3v3yRmoZd/XoQmhgvJmqD4zhaGI2Utonow4f/SGSEFFZqqs9WND
 4HJROjZHgQ4JBxg9Z+QMo0FxbV/DCZpEOUq51N9WJywyyDRb18zotE83stpU+pE2
 fjqT7mk0NLrnYXuDRAbFC1Aau9ed4H6LlwLmxaqxq/Pt5Rz7wIKwKL9HIT4Dm/0a
 qpani6phhHWL7MwUpa2wkEonFCD03rJFl3DUVJo64Ijh4up5D/jyXQ+GKV2P4WKJ
 275Rb5Q=
 =WiZZ
 -----END PGP SIGNATURE-----

Merge tag 'v5.4-rc5' into rdma.git for-next

Linux 5.4-rc5

For dependencies in the next patches

Conflict resolved by keeping the delete of the unlock.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:36:29 -03:00
Bryan Tan
a52dc3a100 RDMA/vmw_pvrdma: Use resource ids from physical device if available
This change allows the RDMA stack to use physical resource numbers if they
are passed up from the device. This is accomplished by separating the
concept of the QP number from the QP handle. Previously, the two were the
same, as the QP number was exposed to the guest and also used to reference
a virtual QP in the device backend.

With physical resource numbers exposed, the QP number given to the guest
is the number assigned from the physical HCA's QP, while the QP handle is
still the internal handle used to reference a virtual QP. Regardless of
whether the device is exposing physical ids, the driver will still try to
pick up the QP handle from the backend if possible. The MR keys exposed to
the guest will also be the MR keys created by the physical HCA, instead of
virtual MR keys. The distinction between handle and keys is already
present for MRs so there is no need to do anything special here.

A new version of the create QP response has been added to the device API
to pass up the QP number and handle. The driver will also report these to
userspace in the udata response if userspace supports it or not create the
queuepair if not. I also had to do a refactor of the destroy qp code to
reuse it if we fail to copy to userspace.

Link: https://lore.kernel.org/r/20191028181444.19448-1-aditr@vmware.com
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:09:23 -03:00
Lijun Ou
b681a05299 RDMA/hns: Prevent memory leaks of eq->buf_list
eq->buf_list->buf and eq->buf_list should also be freed when eqe_hop_num
is set to 0, or there will be memory leaks.

Fixes: a5073d6054 ("RDMA/hns: Add eq support of hip08")
Link: https://lore.kernel.org/r/1572072995-11277-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 15:06:38 -03:00
Bart Van Assche
c9121262d5 RDMA/core: Set DMA parameters correctly
The dma_set_max_seg_size() call in setup_dma_device() does not have any
effect since device->dev.dma_parms is NULL. Fix this by initializing
device->dev.dma_parms first.

Link: https://lore.kernel.org/r/20191025225830.257535-5-bvanassche@acm.org
Fixes: d10bcf947a ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:52:04 -03:00
Bart Van Assche
a401fb819c RDMA/siw: Increase DMA max_segment_size parameter
Increase the DMA max_segment_size parameter from 64 KB to 2 GB.

Link: https://lore.kernel.org/r/20191025225830.257535-4-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:52:04 -03:00
Bart Van Assche
97458fd510 RDMA/rxe: Increase DMA max_segment_size parameter
Increase the DMA max_segment_size parameter from 64 KB to 2 GB.

Link: https://lore.kernel.org/r/20191025225830.257535-3-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:52:03 -03:00
Bernard Metzler
0edefddbae RDMA/siw: Fix post_recv QP state locking
Do not release qp state lock if not previously acquired.

Fixes: cf049bb31f ("RDMA/siw: Fix SQ/RQ drain logic")
Link: https://lore.kernel.org/r/20191025142903.20625-1-bmt@zurich.ibm.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:34:33 -03:00
Potnuri Bharat Teja
d4934f4569 RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case
_put_ep_safe() and _put_pass_ep_safe() free the skb before it is freed by
process_work(). fix double free by freeing the skb only in process_work().

Fixes: 1dad0ebeea ("iw_cxgb4: Avoid touch after free error in ARP failure handlers")
Link: https://lore.kernel.org/r/1572006880-5800-1-git-send-email-bharat@chelsio.com
Signed-off-by: Dakshaja Uppalapati <dakshaja@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:13:33 -03:00
Potnuri Bharat Teja
5212c3fda2 RDMA/iw_cxgb4: Report correct port speed/width
Query speed/width from corresponding netdev.

Link: https://lore.kernel.org/r/1572001022-4533-1-git-send-email-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:11:46 -03:00
Jason Gunthorpe
1524b12a6e RDMA/mlx5: Use irq xarray locking for mkey_table
The mkey_table xarray is touched by the reg_mr_callback() function which
is called from a hard irq. Thus all other uses of xa_lock must use the
_irq variants.

  WARNING: inconsistent lock state
  5.4.0-rc1 #12 Not tainted
  --------------------------------
  inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
  python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes:
  ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30
  {IN-HARDIRQ-W} state was registered at:
    lock_acquire+0xe1/0x200
    _raw_spin_lock_irqsave+0x35/0x50
    reg_mr_callback+0x2dd/0x450 [mlx5_ib]
    mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core]
    mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core]
   [..]

   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&xa->xa_lock)->rlock#3);
    <Interrupt>
      lock(&(&xa->xa_lock)->rlock#3);

   *** DEADLOCK ***

  2 locks held by python3/343:
   #0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs]
   #1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs]

  stack backtrace:
  CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x86/0xca
   print_usage_bug.cold.50+0x2e5/0x355
   mark_lock+0x871/0xb50
   ? match_held_lock+0x20/0x250
   ? check_usage_forwards+0x240/0x240
   __lock_acquire+0x7de/0x23a0
   ? __kasan_check_read+0x11/0x20
   ? mark_lock+0xae/0xb50
   ? mark_held_locks+0xb0/0xb0
   ? find_held_lock+0xca/0xf0
   lock_acquire+0xe1/0x200
   ? xa_erase+0x12/0x30
   _raw_spin_lock+0x2a/0x40
   ? xa_erase+0x12/0x30
   xa_erase+0x12/0x30
   mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib]
   uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs]
   uverbs_free_mw+0x1a/0x20 [ib_uverbs]
   destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs]
   [..]

Fixes: 0417791536 ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases")
Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:08:48 -03:00
Michal Kalderon
24e412c1e0 RDMA/qedr: Fix memory leak in user qp and mr
User QPs pbl's weren't freed properly.
MR pbls weren't freed properly.

Fixes: e0290cce6a ("qedr: Add support for memory registeration verbs")
Link: https://lore.kernel.org/r/20191027200451.28187-5-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:01:36 -03:00
Michal Kalderon
82af6d19d8 RDMA/qedr: Fix synchronization methods and memory leaks in qedr
Re-design of the iWARP CM related objects reference counting and
synchronization methods, to ensure operations are synchronized correctly
and that memory allocated for "ep" is properly released. Also makes sure
QP memory is not released before ep is finished accessing it.

Where as the QP object is created/destroyed by external operations, the ep
is created/destroyed by internal operations and represents the tcp
connection associated with the QP.

QP destruction flow:
- needs to wait for ep establishment to complete (either successfully or
  with error)
- needs to wait for ep disconnect to be fully posted to avoid a race
  condition of disconnect being called after reset.
- both the operations above don't always happen, so we use atomic flags to
  indicate whether the qp destruction flow needs to wait for these
  completions or not, if the destroy is called before these operations
  began, the flows will check the flags and not execute them ( connect /
  disconnect).

We use completion structure for waiting for the completions mentioned
above.

The QP refcnt was modified to kref object.  The EP has a kref added to it
to handle additional worker thread accessing it.

Memory Leaks - https://www.spinics.net/lists/linux-rdma/msg83762.html

Concurrency not managed correctly -
https://www.spinics.net/lists/linux-rdma/msg67949.html

Fixes: de0089e692 ("RDMA/qedr: Add iWARP connection management qp related callbacks")
Link: https://lore.kernel.org/r/20191027200451.28187-4-michal.kalderon@marvell.com
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Reported-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:01:35 -03:00
Michal Kalderon
5fdff18b4d RDMA/qedr: Fix qpids xarray api used
The qpids xarray isn't accessed from irq context and therefore there
is no need to use the xa_XXX_irq version of the apis.
Remove the _irq.

Fixes: b6014f9e5f ("qedr: Convert qpidr to XArray")
Link: https://lore.kernel.org/r/20191027200451.28187-3-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:01:35 -03:00
Michal Kalderon
73ab512f72 RDMA/qedr: Fix srqs xarray initialization
There was a missing initialization for the srqs xarray.
SRQs xarray can also be called from irq context when searching
for an element and uses the xa_XXX_irq apis, therefore should
be initialized with IRQ flags.

Fixes: 9fd15987ed ("qedr: Convert srqidr to XArray")
Link: https://lore.kernel.org/r/20191027200451.28187-2-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:01:27 -03:00
Colin Ian King
994195e153 RDMA/hns: Fix memory leak on 'context' on error return path
Currently, the error return path when the call to function
dev->dfx->query_cqc_info fails will leak object 'context'. Fix this by
making the error return path via 'err' return return codes rather than
-EMSGSIZE, set ret appropriately for all error return paths and for the
memory leak now return via 'err' rather than just returning without
freeing context.

Link: https://lore.kernel.org/r/20191024131034.19989-1-colin.king@canonical.com
Addresses-Coverity: ("Resource leak")
Fixes: e1c9a0dc29 ("RDMA/hns: Dump detailed driver-specific CQ")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:41:23 -03:00
Yangyang Li
887803db86 RDMA/hns: Bugfix for qpc/cqc timer configuration
qpc/cqc timer entry size needs one page, but currently they are fixedly
configured to 4096, which is not appropriate in 64K page scenarios. So
they should be modified to PAGE_SIZE.

Fixes: 0e40dc2f70 ("RDMA/hns: Add timer allocation support for hip08")
Link: https://lore.kernel.org/r/1571908917-16220-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:33:06 -03:00
Lijun Ou
5c7e76fb7c RDMA/hns: Fix to support 64K page for srq
SRQ's page size configuration of BA and buffer should depend on current
PAGE_SHIFT, or it can't work in scenario of 64K page.

Fixes: c7bcb13442 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Link: https://lore.kernel.org/r/1571908917-16220-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:33:06 -03:00
Bart Van Assche
79d81ef42c RDMA/srpt: Fix TPG creation
Unlike the iSCSI target driver, for the SRP target driver it is sufficient
if a single TPG can be associated with each RDMA port name. However, users
started associating multiple TPGs with RDMA port names. Support this by
converting the single TPG in struct srpt_port_id into a list. This patch
fixes the following list corruption issue:

 list_add corruption. prev->next should be next (ffffffffc0a080c0), but was ffffa08a994ce6f0. (prev=ffffa08a994ce6f0).
 WARNING: CPU: 2 PID: 2597 at lib/list_debug.c:28 __list_add_valid+0x6a/0x70
 CPU: 2 PID: 2597 Comm: targetcli Not tainted 5.4.0-rc1.3bfa3c9602a7 #1
 RIP: 0010:__list_add_valid+0x6a/0x70
 Call Trace:
  core_tpg_register+0x116/0x200 [target_core_mod]
  srpt_make_tpg+0x3f/0x60 [ib_srpt]
  target_fabric_make_tpg+0x41/0x290 [target_core_mod]
  configfs_mkdir+0x158/0x3e0
  vfs_mkdir+0x108/0x1a0
  do_mkdirat+0x77/0xe0
  do_syscall_64+0x55/0x1d0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: https://lore.kernel.org/r/20191023204106.23326-1-bvanassche@acm.org
Reported-by: Honggang LI <honli@redhat.com>
Fixes: a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:30:00 -03:00
Leon Romanovsky
f9e66db143 RDMA/hns: Delete BITS_PER_BYTE redefinition
HNS redefined available in bits.h define and didn't use it, we can safely
delete it.

Link: https://lore.kernel.org/r/20191023054239.31648-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 13:28:24 -03:00
Jason Gunthorpe
515f60004e RDMA/hns: Prevent undefined behavior in hns_roce_set_user_sq_size()
The "ucmd->log_sq_bb_count" variable is a user controlled variable in the
0-255 range.  If we shift more than then number of bits in an int then
it's undefined behavior (it shift wraps), and potentially the int could
become negative.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20190608092514.GC28890@mwanda
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
2019-10-28 11:20:34 -03:00
Leon Romanovsky
24f5214923 RDMA/cm: Update copyright together with SPDX tag
Add Mellanox to lust of copyright holders and replace copyright
boilerplate with relevant SPDX tag.

Link: https://lore.kernel.org/r/20191020071559.9743-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 10:15:11 -03:00
Leon Romanovsky
a916051191 RDMA/cm: Use specific keyword to check define
There is a specific define keyword to check if define exists or not,
let's use it instead of open-coded variant.

Link: https://lore.kernel.org/r/20191020071559.9743-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 10:15:10 -03:00
Leon Romanovsky
8d625101a7 RDMA/cm: Delete unused cm_is_active_peer function
Function cm_is_active_peer is not used, delete it.

Link: https://lore.kernel.org/r/20191020071559.9743-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 10:15:10 -03:00
Parav Pandit
549af00833 IB/core: Avoid deadlock during netlink message handling
When rdmacm module is not loaded, and when netlink message is received to
get char device info, it results into a deadlock due to recursive locking
of rdma_nl_mutex with the below call sequence.

[..]
  rdma_nl_rcv()
  mutex_lock()
   [..]
   rdma_nl_rcv_msg()
      ib_get_client_nl_info()
         request_module()
           iw_cm_init()
             rdma_nl_register()
               mutex_lock(); <- Deadlock, acquiring mutex again

Due to above call sequence, following call trace and deadlock is observed.

  kernel: __mutex_lock+0x35e/0x860
  kernel: ? __mutex_lock+0x129/0x860
  kernel: ? rdma_nl_register+0x1a/0x90 [ib_core]
  kernel: rdma_nl_register+0x1a/0x90 [ib_core]
  kernel: ? 0xffffffffc029b000
  kernel: iw_cm_init+0x34/0x1000 [iw_cm]
  kernel: do_one_initcall+0x67/0x2d4
  kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0
  kernel: do_init_module+0x5a/0x223
  kernel: load_module+0x1998/0x1e10
  kernel: ? __symbol_put+0x60/0x60
  kernel: __do_sys_finit_module+0x94/0xe0
  kernel: do_syscall_64+0x5a/0x270
  kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

  process stack trace:
  [<0>] __request_module+0x1c9/0x460
  [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core]
  [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core]
  [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core]
  [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core]
  [<0>] netlink_unicast+0x179/0x220
  [<0>] netlink_sendmsg+0x2f6/0x3f0
  [<0>] sock_sendmsg+0x30/0x40
  [<0>] ___sys_sendmsg+0x27a/0x290
  [<0>] __sys_sendmsg+0x58/0xa0
  [<0>] do_syscall_64+0x5a/0x270
  [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

To overcome this deadlock and to allow multiple netlink messages to
progress in parallel, following scheme is implemented.

1. Split the lock protecting the cb_table into a per-index lock, and make
   it a rwlock. This lock is used to ensure no callbacks are running after
   unregistration returns. Since a module will not be registered once it
   is already running callbacks, this avoids the deadlock.

2. Use smp_store_release() to update the cb_table during registration so
   that no lock is required. This avoids lockdep problems with thinking
   all the rwsems are the same lock class.

Fixes: 0e2d00eb6f ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload")
Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24 20:49:37 -03:00
Mark Zhang
a15542bb72 RDMA/nldev: Skip counter if port doesn't match
The counter resource should return -EAGAIN if it was requested for a
different port, this is similar to how QP works if the users provides a
port filter.

Otherwise port filtering in netlink will return broken counter nests.

Fixes: c4ffee7c9b ("RDMA/netlink: Implement counter dumpit calback")
Link: https://lore.kernel.org/r/20191020062800.8065-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24 15:31:23 -03:00
Leon Romanovsky
dc2f7edcc0 RDMA/rxe: Remove useless rxe_init_device_param assignments
IB devices are allocated with kzalloc and don't need explicit zero
assignments for their parameters. It can be removed safely.

Link: https://lore.kernel.org/r/20191020055724.7410-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24 15:20:18 -03:00
Leon Romanovsky
ac71ffcfb4 RDMA/core: Check that process is still alive before sending it to the users
The PID information can disappear asynchronously because the task can be
killed and moved to zombie state. In this case, PID will be zero in
similar way to the kernel tasks. Recognize such situation where we are
asking to return orphaned object and simply skip filling PID attribute.

As part of this change, document the same scenario in counter.c code.

Link: https://lore.kernel.org/r/20191010071105.25538-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23 16:02:12 -03:00
Leon Romanovsky
cf7e93c12f RDMA/restrack: Remove PID namespace support
IB resources are bounded to IB device and file descriptors, both entities
are unaware to PID namespaces and to task lifetime.

The difference in model caused to unpredictable behavior for the following
scenario:
 1. Create FD and context
 2. Share it with ephemeral child
 3. Create any object and exit that child

The end result of this flow, that those newly created objects will be
tracked by restrack, but won't be visible for users because task_struct
associated with them already exited.

The right thing is to rely on net namespace only for any filtering
purposes and drop PID namespace.

Link: https://lore.kernel.org/r/20191010071105.25538-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23 15:58:31 -03:00
Arnd Bergmann
1832f2d8ff compat_ioctl: move more drivers to compat_ptr_ioctl
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.

One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.

I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.

Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23 17:23:44 +02:00
Parav Pandit
c4c8aff5a9 IB/core: Do not notify GID change event of an unregistered device
When IB device is undergoing unregistration, the GID cache is always
cleaned up after all clients are unregistered with the below flow.

__ib_unregister_device()
  disable_device()
  ib_cache_cleanup_one()
    gid_table_cleanup_one()
      cleanup_gid_table_port()

There is no use in generating a GID change event at this stage, where
there is no active client of the device and device is nearly unregistered.

Link: https://lore.kernel.org/r/20191020065427.8772-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:58:22 -03:00
Michael Guralnik
3f89b01f4b IB/mlx5: Align usage of QP1 create flags with rest of mlx5 defines
There is little value in keeping separate function for one flag, provide
it directly like any other mlx5 define.

Link: https://lore.kernel.org/r/20191020064400.8344-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:39:49 -03:00
Ran Rozenstein
68abaa765e IB/mlx5: Remove dead code
mlx5_ib_dc_atomic_is_supported function is not used anywhere. Remove the
dead code.

Fixes: a60109dc9a ("IB/mlx5: Add support for extended atomic operations")
Link: https://lore.kernel.org/r/20191020064454.8551-1-leon@kernel.org
Signed-off-by: Ran Rozenstein <ranro@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:37:48 -03:00
Chuhong Yuan
a29e1012c1 RDMA/uverbs: Add a check for uverbs_attr_get to uverbs_copy_to_struct_or_zero
All current callers for uverbs_copy_to_struct_or_zero() already check that
the attribute exists, but it make sense to verify the result like the
other functions do.

Link: https://lore.kernel.org/r/20191018081533.8544-1-hslester96@gmail.com
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:10:03 -03:00
Parav Pandit
d3bd939670 IB/cma: Honor traffic class from lower netdevice for RoCE
When a macvlan netdevice is used for RoCE, consider the tos->prio->tc
mapping as SL using its lower netdevice.

1. If the lower netdevice is a VLAN netdevice, consider the VLAN netdevice
   and it's parent netdevice for mapping
2. If the lower netdevice is not a VLAN netdevice, consider tc mapping
   directly from the lower netdevice

Link: https://lore.kernel.org/r/20191015072058.17347-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:56:22 -03:00
Erez Alfasi
4061ff7aa3 RDMA/nldev: Provide MR statistics
Add RDMA nldev netlink interface for dumping MR statistics information.

Output example:

$ ./ibv_rc_pingpong -o -P -s 500000000
  local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::

$ rdma stat show mr
dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0

Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:33:31 -03:00
Erez Alfasi
e1b95ae0b0 RDMA/mlx5: Return ODP type per MR
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show
mr' dump.

For example:

$ rdma -dd resource show mr
dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000
pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit

For non-ODP MRs, we won't print "drv_odp ..." at all.

Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Erez Alfasi
fb91069088 RDMA/nldev: Allow different fill function per resource
So far res_get_common_{dumpit, doit} was using the default resource fill
function which was defined as part of the nldev_fill_res_entry
fill_entries.

Add a fill function pointer as an argument allows us to use different fill
function in case we want to dump different values then 'rdma resource'
flow do, but still use the same existing general resources dumping flow.

Link: https://lore.kernel.org/r/20191016062308.11886-3-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Erez Alfasi
a3de94e3d6 IB/mlx5: Introduce ODP diagnostic counters
Introduce ODP diagnostic counters and count the following
per MR within IB/mlx5 driver:
 1) Page faults:
	Total number of faulted pages.
 2) Page invalidations:
	Total number of pages invalidated by the OS during all
	invalidation events. The translations can be no longer
	valid due to either non-present pages or mapping changes.

Link: https://lore.kernel.org/r/20191016062308.11886-2-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Dan Carpenter
a9018adfde RDMA/uverbs: Prevent potential underflow
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the
UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function.  We check that:

        if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) {

But we don't check if "attr.comp_vector" is negative.  It could
potentially lead to an array underflow.  My concern would be where
cq->vector is used in the create_cq() function from the cxgb4 driver.

And really "attr.comp_vector" is appears as a u32 to user space so that's
the right type to use.

Fixes: 9ee79fce36 ("IB/core: Add completion queue (cq) object actions")
Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:05:36 -03:00
rd.dunlab@gmail.com
7c21072dde infiniband: fix sw/rdmavt/ kernel-doc notation
Add kernel-doc for missing function parameters.
Remove excess kernel-doc descriptions.
Fix expected kernel-doc formatting (use ':' instead of '-' after @funcarg).

../drivers/infiniband/sw/rdmavt/ah.c:138: warning: Excess function parameter 'udata' description in 'rvt_destroy_ah'
../drivers/infiniband/sw/rdmavt/vt.c:698: warning: Function parameter or member 'pkey_table' not described in 'rvt_init_port'
../drivers/infiniband/sw/rdmavt/cq.c:561: warning: Excess function parameter 'rdi' description in 'rvt_driver_cq_init'
../drivers/infiniband/sw/rdmavt/cq.c:575: warning: Excess function parameter 'rdi' description in 'rvt_cq_exit'
../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'qp' not described in 'rvt_add_rnr_timer'
../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'aeth' not described in 'rvt_add_rnr_timer'
../drivers/infiniband/sw/rdmavt/qp.c:2591: warning: Function parameter or member 'qp' not described in 'rvt_stop_rc_timers'
../drivers/infiniband/sw/rdmavt/qp.c:2624: warning: Function parameter or member 'qp' not described in 'rvt_del_timers_sync'
../drivers/infiniband/sw/rdmavt/qp.c:2697: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter_init'
../drivers/infiniband/sw/rdmavt/qp.c:2728: warning: Function parameter or member 'iter' not described in 'rvt_qp_iter_next'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'rdi' not described in 'rvt_qp_iter'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'v' not described in 'rvt_qp_iter'
../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter'

Link: https://lore.kernel.org/r/20191010035240.251184229@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
d6537c1a9c infiniband: fix core/ kernel-doc notation
Correct function parameter names (typos or renames).
Add kernel-doc notation for missing function parameters.

../drivers/infiniband/core/sa_query.c:1263: warning: Function parameter or member 'gid_attr' not described in 'ib_init_ah_attr_from_path'
../drivers/infiniband/core/sa_query.c:1263: warning: Excess function parameter 'sgid_attr' description in 'ib_init_ah_attr_from_path'

../drivers/infiniband/core/device.c:145: warning: Function parameter or member 'dev' not described in 'rdma_dev_access_netns'
../drivers/infiniband/core/device.c:145: warning: Excess function parameter 'device' description in 'rdma_dev_access_netns'
../drivers/infiniband/core/device.c:1333: warning: Function parameter or member 'name' not described in 'ib_register_device'
../drivers/infiniband/core/device.c:1461: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device'
../drivers/infiniband/core/device.c:1461: warning: Excess function parameter 'device' description in 'ib_unregister_device'
../drivers/infiniband/core/device.c:1483: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device_and_put'
../drivers/infiniband/core/device.c:1550: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device_queued'

Link: https://lore.kernel.org/r/20191010035240.191542461@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
b24da1a0d4 infiniband: fix ulp/iser/iser_initiator.c kernel-doc warnings
Add kernel-doc notation for missing function parameters:

../drivers/infiniband/ulp/iser/iser_initiator.c:365: warning: Function parameter or member 'conn' not described in 'iser_send_command'
../drivers/infiniband/ulp/iser/iser_initiator.c:365: warning: Function parameter or member 'task' not described in 'iser_send_command'
../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'conn' not described in 'iser_send_data_out'
../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'task' not described in 'iser_send_data_out'
../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'hdr' not described in 'iser_send_data_out'

Link: https://lore.kernel.org/r/20191010035240.132033937@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
134a42a66b infiniband: fix ulp/iser/iser_verbs.c kernel-doc notation
Various kernel-doc fixes:

- fix typos
- don't use /** for internal structs or functions
- fix Return: kernel-doc formatting
- add kernel-doc notation for missing function parameters

../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'ib_conn' not described in 'iser_alloc_fmr_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'cmds_max' not described in 'iser_alloc_fmr_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'size' not described in 'iser_alloc_fmr_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:221: warning: Function parameter or member 'ib_conn' not described in 'iser_free_fmr_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'ib_conn' not described in 'iser_alloc_fastreg_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'cmds_max' not described in 'iser_alloc_fastreg_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'size' not described in 'iser_alloc_fastreg_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:338: warning: Function parameter or member 'ib_conn' not described in 'iser_free_fastreg_pool'
../drivers/infiniband/ulp/iser/iser_verbs.c:568: warning: Function parameter or member 'iser_conn' not described in 'iser_conn_release'
../drivers/infiniband/ulp/iser/iser_verbs.c:603: warning: Function parameter or member 'iser_conn' not described in 'iser_conn_terminate'
../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'signal' not described in 'iser_post_send'
../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'ib_conn' not described in 'iser_post_send'
../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'tx_desc' not described in 'iser_post_send'

Link: https://lore.kernel.org/r/20191010035240.070520193@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
094c88f3c5 infiniband: fix core/verbs.c kernel-doc notation
Add missing function parameter descriptions:

../drivers/infiniband/core/verbs.c:257: warning: Function parameter or member 'flags' not described in '__ib_alloc_pd'
../drivers/infiniband/core/verbs.c:257: warning: Function parameter or member 'caller' not described in '__ib_alloc_pd'

Link: https://lore.kernel.org/r/20191010035240.011497492@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
96f4b0b68d infiniband: fix ulp/srpt/ib_srpt.h kernel-doc notation
Fix kernel-doc warnings (typos or renames) in ib_srpt.h:

../drivers/infiniband/ulp/srpt/ib_srpt.h:419: warning: Function parameter or member 'port_guid_id' not described in 'srpt_port'
../drivers/infiniband/ulp/srpt/ib_srpt.h:419: warning: Function parameter or member 'port_gid_id' not described in 'srpt_port'

Link: https://lore.kernel.org/r/20191010035239.950150496@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com
dfa4344da3 infiniband: fix ulp/opa_vnic/opa_vnic_internal.h kernel-doc notation
Remove kernel-doc notation on 4 structs since they are internal and
none of the struct fields/members are described.
This removes 45 kernel-doc warnings.

Link: https://lore.kernel.org/r/20191010035239.818405496@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com
28f2a6aeed infiniband: fix ulp/iser/iscsi_iser.h kernel-doc warnings
Fix kernel-doc warnings and typos/spellos.

../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'dma_addr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'cqe' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'reg_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'send_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'inv_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:277: warning: Function parameter or member 'cqe' not described in 'iser_rx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:296: warning: Function parameter or member 'rsp' not described in 'iser_login_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:339: warning: Function parameter or member 'reg_mem' not described in 'iser_reg_ops'
../drivers/infiniband/ulp/iser/iscsi_iser.h:399: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:413: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool'
../drivers/infiniband/ulp/iser/iscsi_iser.h:439: warning: Function parameter or member 'reg_cqe' not described in 'ib_conn'
../drivers/infiniband/ulp/iser/iscsi_iser.h:491: warning: Function parameter or member 'snd_w_inv' not described in 'iser_conn'

This leaves 2 "member not described" warnings that I don't know how to fix:

../drivers/infiniband/ulp/iser/iscsi_iser.h:401: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:415: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool'

Link: https://lore.kernel.org/r/20191010035239.756365352@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com
526f2c5063 infiniband: fix core/ipwm_util.h kernel-doc warnings
Fix kernel-doc warnings and expected formatting.

../drivers/infiniband/core/iwpm_util.h:219: warning: Function parameter or member 'a_sockaddr' not described in 'iwpm_compare_sockaddr'
../drivers/infiniband/core/iwpm_util.h:219: warning: Function parameter or member 'b_sockaddr' not described in 'iwpm_compare_sockaddr'
../drivers/infiniband/core/iwpm_util.h:280: warning: Function parameter or member 'iwpm_pid' not described in 'iwpm_send_hello'

Link: https://lore.kernel.org/r/20191010035239.695604406@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com
df130f878e infiniband: fix ulp/iser/iscsi_iser.[hc] kernel-doc notation
Fix struct name in kernel-doc notation to match the struct name below it.
Fix one typo (spello).
Fix formatting as expected for kernel-doc notation.
Fix parameter name to match the function's parameter name to eliminate a
kernel-doc warning.

../drivers/infiniband/ulp/iser/iscsi_iser.c:815: warning: Function parameter or member 'non_blocking' not described in 'iscsi_iser_ep_connect'

Link: https://lore.kernel.org/r/20191010035239.623888112@gmail.com
Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:45:31 -03:00
Jason Gunthorpe
a2aca4d7f0 Merge branch 'mlx5-rd-sgl' into rdma.git for-next
From Yamin Friedman:

====================
This series from Yamin implements long standing "TODO" existed in rw.c. It
allows the driver to specify a cut-over point where it is faster to build
a lkey MR rather than do a large SGL for RDMA READ operations.

mlx5 HW gets a notable performane boost by switching to MRs.
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'mlx5-rd-sgl': (3 commits)
  RDMA/mlx5: Add capability for max sge to get optimized performance
  RDMA/rw: Support threshold for registration vs scattering to local pages
  net/mlx5: Expose optimal performance scatter entries capability
2019-10-22 14:27:25 -03:00
Yamin Friedman
366090564b RDMA/mlx5: Add capability for max sge to get optimized performance
Allows the IB device to provide a value of maximum scatter gather entries
per RDMA READ.

In certain cases it may be preferable for a device to perform UMR memory
registration rather than have many scatter entries in a single RDMA READ.
This provides a significant performance increase in devices capable of
using different memory registration schemes based on the number of scatter
gather entries. This general capability allows each device vendor to fine
tune when it is better to use memory registration.

Link: https://lore.kernel.org/r/20191007135933.12483-4-leon@kernel.org
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:27:00 -03:00
Yamin Friedman
00bd1439f4 RDMA/rw: Support threshold for registration vs scattering to local pages
If there are more scatter entries than the recommended limit provided by
the ib device, UMR registration is used. This will provide optimal
performance when performing large RDMA READs over devices that advertise
the threshold capability.

With ConnectX-5 running NVMeoF RDMA with FIO single QP 128KB writes:
Without use of cap: 70Gb/sec
With use of cap: 84Gb/sec

Link: https://lore.kernel.org/r/20191007135933.12483-3-leon@kernel.org
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:26:52 -03:00
Bernard Metzler
cf049bb31f RDMA/siw: Fix SQ/RQ drain logic
Storage ULPs (e.g. iSER & NVMeOF) use ib_drain_qp() to drain
QP/CQ. Current SIW's own drain routines do not properly wait until all
SQ/RQ elements are completed and reaped from the CQ. This may cause touch
after free issues.  New logic relies on generic
__ib_drain_sq()/__ib_drain_rq() posting a final work request, which SIW
immediately flushes to CQ.

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/20191004125356.20673-1-bmt@zurich.ibm.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 13:43:10 -03:00
Donald Dutile
5a0d523781 ib/srp: Add missing new line after displaying fast_io_fail_tmo param
Long-time missing new-line in sysfs output.
Simply add it.

Signed-off-by: Donald Dutile <ddutile@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20191009164937.21989-1-ddutile@redhat.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 16:55:16 -04:00
Yangyang Li
d302c6e3a6 RDMA/hns: Release qp resources when failed to destroy qp
Even if no response from hardware, we should make sure that qp related
resources are released to avoid memory leaks.

Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1570584110-3659-1-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:45:22 -04:00
Yixing Liu
3dcad1f842 RDMA/hns: Fix a spelling mistake in a macro
HNS_ROCE_ALOGN_UP should be HNS_ROCE_ALIGN_UP, this patch fix it.

Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-6-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:38 -04:00
Lang Cheng
cfd82da4e7 RDMA/hns: Modify return value of restrack functions
The restrack function return EINVAL instead of EMSGSIZE when the driver
operation fails.

Fixes: 4b42d05d0b ("RDMA/hns: Remove unnecessary kzalloc")
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-5-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:38 -04:00
Weihang Li
32883228b9 RDMA/hns: Modify variable/field name from vlan to vlan_id
The name of vlan and vlan_tag is not clear enough, it's actually means
vlan id.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-4-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:38 -04:00
Weihang Li
e8a07de57e RDMA/hns: Fix wrong parameters when initial mtt of srq->idx_que
The parameters npages used to initial mtt of srq->idx_que shouldn't be
same with srq's. And page_shift should be calculated from idx_buf_pg_sz.
This patch fixes above issues and use field named npage and page_shift
in hns_roce_buf instead of two temporary variables to let us use them
anywhere.

Fixes: 18df508c79 ("RDMA/hns: Remove if-else judgment statements for creating srq")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:37 -04:00
Weihang Li
9f7d706400 RDMA/hns: remove a redundant le16_to_cpu
Type of ah->av.vlan is u16, there will be a problem using le16_to_cpu
on it.

Fixes: 82e620d9c3 ("RDMA/hns: Modify the data structure of hns_roce_av")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Link: https://lore.kernel.org/r/1567566885-23088-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-21 15:29:37 -04:00
Parav Pandit
777a8b32bc IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fields
Current code tries to derive VLAN ID and compares it with GID
attribute for matching entry. This raw search fails on macvlan
netdevice as its not a VLAN device, but its an upper device of a VLAN
netdevice.

Due to this limitation, incoming QP1 packets fail to match in the
GID table. Such packets are dropped.

Hence, to support it, use the existing rdma_read_gid_l2_fields()
that takes care of diffferent device types.

Fixes: dbf727de74 ("IB/core: Use GID table in AH creation and dmac resolution")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002121750.17313-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:55:33 -04:00
Kamal Heib
b806c94ee4 RDMA/qedr: Fix reported firmware version
Remove spaces from the reported firmware version string.
Actual value:
$ cat /sys/class/infiniband/qedr0/fw_ver
8. 37. 7. 0

Expected value:
$ cat /sys/class/infiniband/qedr0/fw_ver
8.37.7.0

Fixes: ec72fce401 ("qedr: Add support for RoCE HW init")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Link: https://lore.kernel.org/r/20191007210730.7173-1-kamalheib1@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:50:14 -04:00
Krishnamraju Eraparaju
e17fa5c95e RDMA/siw: free siw_base_qp in kref release routine
As siw_free_qp() is the last routine to access 'siw_base_qp' structure,
freeing this structure early in siw_destroy_qp() could cause
touch-after-free issue.
Hence, moved kfree(siw_base_qp) from siw_destroy_qp() to siw_free_qp().

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Link: https://lore.kernel.org/r/20191007104229.29412-1-krishna2@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:49:01 -04:00
Krishnamraju Eraparaju
54102dd410 RDMA/iwcm: move iw_rem_ref() calls out of spinlock
kref release routines usually perform memory release operations,
hence, they should not be called with spinlocks held.
one such case is: SIW kref release routine siw_free_qp(), which
can sleep via vfree() while freeing queue memory.

Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks.

Fixes: 922a8e9fb2 ("RDMA: iWARP Connection Manager.")
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:40:01 -04:00
Potnuri Bharat Teja
612e0486ad iw_cxgb4: fix ECN check on the passive accept
pass_accept_req() is using the same skb for handling accept request and
sending accept reply to HW. Here req and rpl structures are pointing to
same skb->data which is over written by INIT_TP_WR() and leads to
accessing corrupt req fields in accept_cr() while checking for ECN flags.
Reordered code in accept_cr() to fetch correct req fields.

Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18 14:35:19 -04:00
Mike Marciniszyn
22bb136534 IB/hfi1: Use a common pad buffer for 9B and 16B packets
There is no reason for a different pad buffer for the two
packet types.

Expand the current buffer allocation to allow for both
packet types.

Fixes: f8195f3b14 ("IB/hfi1: Eliminate allocation while atomic")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 16:32:25 -04:00
Kaike Wan
9ed5bd7d22 IB/hfi1: Avoid excessive retry for TID RDMA READ request
A TID RDMA READ request could be retried under one of the following
conditions:
- The RC retry timer expires;
- A later TID RDMA READ RESP packet is received before the next
  expected one.
For the latter, under normal conditions, the PSN in IB space is used
for comparison. More specifically, the IB PSN in the incoming TID RDMA
READ RESP packet is compared with the last IB PSN of a given TID RDMA
READ request to determine if the request should be retried. This is
similar to the retry logic for noraml RDMA READ request.

However, if a TID RDMA READ RESP packet is lost due to congestion,
header suppresion will be disabled and each incoming packet will raise
an interrupt until the hardware flow is reloaded. Under this condition,
each packet KDETH PSN will be checked by software against r_next_psn
and a retry will be requested if the packet KDETH PSN is later than
r_next_psn. Since each TID RDMA READ segment could have up to 64
packets and each TID RDMA READ request could have many segments, we
could make far more retries under such conditions, and thus leading to
RETRY_EXC_ERR status.

This patch fixes the issue by removing the retry when the incoming
packet KDETH PSN is later than r_next_psn. Instead, it resorts to
RC timer and normal IB PSN comparison for any request retry.

Fixes: 9905bf06e8 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 16:31:17 -04:00
Rafi Wiener
c8973df2da RDMA/mlx5: Clear old rate limit when closing QP
Before QP is closed it changes to ERROR state, when this happens
the QP was left with old rate limit that was already removed from
the table.

Fixes: 7d29f349a4 ("IB/mlx5: Properly adjust rate limit on QP state transitions")
Signed-off-by: Rafi Wiener <rafiw@mellanox.com>
Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 16:07:25 -04:00
Parav Pandit
03232cc43c IB/mlx5: Introduce and use mkey context setting helper routine
Introduce and use set_mkc_access_pd_addr_fields() which sets mkey
context's access rights, PD, address fields.  Thereby avoid the code
duplication.

Link: https://lore.kernel.org/r/20191006155443.31068-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 16:45:07 -03:00
Max Gurtovoy
3466c060ef RDMA/iser: Use iser_err instead of pr_err for logging
Make sure all the debug prints in ib_iser module use the common driver
logger.

Link: https://lore.kernel.org/r/1570366580-24097-1-git-send-email-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 16:35:52 -03:00
Devesh Sharma
39c48c5146 RDMA/bnxt_re: Enable SRIOV VF support on Broadcom's 57500 adapter series
Broadcom's 575xx adapter series has support for SRIOV VFs.  Making changes
to enable SRIOV VF support. There are two major area where changes are
done:

 - Added new DB location for control-path and data-path DB ring

 - New devices do not need to issue the sriov-config slow-path command
   thus, skipping to call that firmware command.

For now enabling support for 64 RoCE VFs.

Link: https://lore.kernel.org/r/1570081715-14301-1-git-send-email-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 16:22:24 -03:00
Honggang Li
b2e872f451 RDMA/srp: Calculate max_it_iu_size if remote max_it_iu length available
The default maximum immediate size is too big for old srp clients, which
do not support immediate data.

According to the SRP and SRP-2 specifications, the IOControllerProfile
attributes for SRP target ports contains the maximum initiator to target
iu length.

The maximum initiator to target iu length can be obtained by sending MAD
packets to query subnet manager port and SRP target ports. We should
calculate the max_it_iu_size instead of the default value, when remote
maximum initiator to target iu length available.

Link: https://lore.kernel.org/r/20190927174352.7800-2-honli@redhat.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 15:17:32 -03:00
Honggang Li
547ed331bb RDMA/srp: Add parse function for maximum initiator to target IU size
According to SRP specifications 'srp-r16a' and 'srp2r06',
IOControllerProfile attributes for SRP target port include the maximum
initiator to target IU size.

SRP connection daemons, such as srp_daemon, can get the value from the
subnet manager. The SRP connection daemon can pass this value to kernel.

This patch adds a parse function for it.

Upstream commit [1] enables the kernel parameter, 'use_imm_data', by
default. [1] also use (8 * 1024) as the default value for kernel parameter
'max_imm_data'. With those default values, the maximum initiator to target
IU size will be 8260.

In case the SRPT modules, which include the in-tree 'ib_srpt.ko' module,
do not support SRP-2 'immediate data' feature, the default maximum
initiator to target IU size is significantly smaller than 8260. For
'ib_srpt.ko' module, which built from source before [2], the default
maximum initiator to target IU is 2116.

[1] introduces a regression issue for old srp targets with default kernel
parameters, as the connection will be rejected because of a too large
maximum initiator to target IU size.

[1] commit 882981f4a4 ("RDMA/srp: Add support for immediate data")
[2] commit 5dabcd0456 ("RDMA/srpt: Add support for immediate data")

Link: https://lore.kernel.org/r/20190927174352.7800-1-honli@redhat.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 15:17:32 -03:00
Jason Gunthorpe
0417791536 RDMA/mlx5: Add missing synchronize_srcu() for MW cases
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.

Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.

This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.

Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()

Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
aa603815c7 RDMA/mlx5: Put live in the correct place for ODP MRs
live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:

    CPU0                                     CPU1
mlx5_ib_alloc_implicit_mr()
 implicit_mr_alloc()
  live = 1
 imr->umem = umem
                                    num_pending_prefetch_inc()
                                      if (live)
				        atomic_inc(num_pending_prefetch)
 atomic_set(num_pending_prefetch,0) // Overwrites other thread's store

Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.

Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.

Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().

The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
aa116b810a RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:

    CPU0                                         CPU1
  dereg_mr()
                                          mlx5_ib_advise_mr_prefetch()
            				   srcu_read_lock()
                                            num_pending_prefetch_inc()
					      if (!live)
   live = 0
   atomic_read() == 0
     // skip flush_workqueue()
                                              atomic_inc()
 					      queue_work();
            				   srcu_read_unlock()
   WARN_ON(atomic_read())  // Fails

Swap the order so that the synchronize_srcu() prevents this.

Fixes: a6bc3875f1 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
9dc775e7f5 RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages()
This fixes a race of the form:
    CPU0                               CPU1
mlx5_ib_invalidate_range()     mlx5_ib_invalidate_range()
				 // This one actually makes npages == 0
				 ib_umem_odp_unmap_dma_pages()
				 if (npages == 0 && !dying)
  // This one does nothing
  ib_umem_odp_unmap_dma_pages()
  if (npages == 0 && !dying)
     dying = 1;
                                    dying = 1;
				    schedule_work(&umem_odp->work);
     // Double schedule of the same work
     schedule_work(&umem_odp->work);  // BOOM

npages and dying must be read and written under the umem_mutex lock.

Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call
mlx5_ib_update_xlt, and both need to be done in the same locking region,
hoist the lock out of unmap.

This avoids an expensive double critical section in
mlx5_ib_invalidate_range().

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Jason Gunthorpe
f28b1932ea RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR
mlx5_ib_update_xlt() must be protected against parallel free of the MR it
is accessing, also it must be called single threaded while updating the
HW. Otherwise we can have races of the form:

    CPU0                               CPU1
  mlx5_ib_update_xlt()
   mlx5_odp_populate_klm()
     odp_lookup() == NULL
     pklm = ZAP
                                      implicit_mr_get_data()
 				        implicit_mr_alloc()
 					  <update interval tree>
					mlx5_ib_update_xlt
					  mlx5_odp_populate_klm()
					    odp_lookup() != NULL
					    pklm = VALID
					   mlx5_ib_post_send_wait()

    mlx5_ib_post_send_wait() // Replaces VALID with ZAP

This can be solved by putting both the SRCU and the umem_mutex lock around
every call to mlx5_ib_update_xlt(). This ensures that the content of the
interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr)
will not change while it is running, and thus the posted WRs to update the
KLM will always reflect the correct information.

The race above will resolve by either having CPU1 wait till CPU0 completes
the ZAP or CPU0 will run after the add and instead store VALID.

The pagefault path adding children already holds the umem_mutex and SRCU,
so the only missed lock is during MR destruction.

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Jason Gunthorpe
880505cfef RDMA/mlx5: Do not allow rereg of a ODP MR
This code is completely broken, the umem of a ODP MR simply cannot be
discarded without a lot more locking, nor can an ODP mkey be blithely
destroyed via destroy_mkey().

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Mohamad Heib
1cbe866cbc IB/core: Fix wrong iterating on ports
rdma_for_each_port is already incrementing the iterator's value it
receives therefore, after the first iteration the iterator is increased by
2 which eventually causing wrong queries and possible traces.

Fix the above by removing the old redundant incrementation that was used
before rdma_for_each_port() macro.

Cc: <stable@vger.kernel.org>
Fixes: ea1075edcb ("RDMA: Add and use rdma_for_each_port")
Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Reviewed-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:50:27 -03:00
Parav Pandit
909624d8db IB/cm: Use container_of() instead of typecast
Use container_of() macro to get to timewait info structure instead of
typecasting.

Link: https://lore.kernel.org/r/20191002122517.17721-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:38:16 -03:00
Erez Alfasi
6f26b2ac69 IB/mlx5: Remove unnecessary else statement
'else' is not generally useful after a break or return. Remove this
unnecessary statement.

Link: https://lore.kernel.org/r/20191002122517.17721-4-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:38:16 -03:00
Erez Alfasi
2d67c07988 IB/mlx5: Remove unnecessary return statement
There is no reason to call return at the end of function which returns
void. Remove this unnecessary statement.

Link: https://lore.kernel.org/r/20191002122517.17721-3-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:38:16 -03:00
Leon Romanovsky
4b2a67362e RDMA/mlx5: Group boolean parameters to take less space
Clean the code to store all boolean parameters inside one variable.

Link: https://lore.kernel.org/r/20191002122517.17721-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:38:16 -03:00
Bart Van Assche
9b64f7d0bb RDMA/srpt: Postpone HCA removal until after configfs directory removal
A shortcoming of the SCSI target core is that it does not have an API
for removing tpg or wwn objects. Wait until these directories have been
removed before allowing HCA removal to finish.

See also Bart Van Assche, "Re: Why using configfs as the only interface is
wrong for a storage target", 2011-02-07
(https://www.spinics.net/lists/linux-scsi/msg50248.html).

This patch fixes the following kernel crash:

==================================================================
BUG: KASAN: use-after-free in __configfs_open_file.isra.4+0x1a8/0x400
Read of size 8 at addr ffff88811880b690 by task restart-lio-srp/1215

CPU: 1 PID: 1215 Comm: restart-lio-srp Not tainted 5.3.0-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xca
 print_address_description+0x74/0x32d
 __kasan_report.cold.6+0x1b/0x36
 kasan_report+0x12/0x17
 __asan_load8+0x54/0x90
 __configfs_open_file.isra.4+0x1a8/0x400
 configfs_open_file+0x13/0x20
 do_dentry_open+0x2b1/0x770
 vfs_open+0x58/0x60
 path_openat+0x5fa/0x14b0
 do_filp_open+0x115/0x180
 do_sys_open+0x1d4/0x2a0
 __x64_sys_openat+0x59/0x70
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f2f2bd3fcce
Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 19 d7 0d 00 8b 00 85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a6 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffd155f7850 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000564609ba88e0 RCX: 00007f2f2bd3fcce
RDX: 0000000000000241 RSI: 0000564609ba8cf0 RDI: 00000000ffffff9c
RBP: 00007ffd155f7950 R08: 0000000000000000 R09: 0000000000000020
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 0000000000000001 R15: 0000564609ba8cf0

Allocated by task 995:
 save_stack+0x21/0x90
 __kasan_kmalloc.constprop.9+0xc7/0xd0
 kasan_kmalloc+0x9/0x10
 __kmalloc+0x153/0x370
 srpt_add_one+0x4f/0x561 [ib_srpt]
 add_client_context+0x251/0x290 [ib_core]
 ib_register_client+0x1da/0x220 [ib_core]
 iblock_get_alignment_offset_lbas+0x6b/0x100 [target_core_iblock]
 do_one_initcall+0xcd/0x43a
 do_init_module+0x103/0x380
 load_module+0x3b77/0x3eb0
 __do_sys_finit_module+0x12d/0x1b0
 __x64_sys_finit_module+0x43/0x50
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1221:
 save_stack+0x21/0x90
 __kasan_slab_free+0x139/0x190
 kasan_slab_free+0xe/0x10
 slab_free_freelist_hook+0x67/0x1e0
 kfree+0xcb/0x2a0
 srpt_remove_one+0x596/0x670 [ib_srpt]
 remove_client_context+0x9a/0xe0 [ib_core]
 disable_device+0x106/0x1b0 [ib_core]
 __ib_unregister_device+0x5f/0xf0 [ib_core]
 ib_unregister_driver+0x11a/0x170 [ib_core]
 0xffffffffa087f666
 __x64_sys_delete_module+0x1f8/0x2c0
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88811880b300
 which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 912 bytes inside of
 4096-byte region [ffff88811880b300, ffff88811880c300)
The buggy address belongs to the page:
page:ffffea0004620200 refcount:1 mapcount:0 mapping:ffff88811ac0de00 index:0x0 compound_mapcount: 0
flags: 0x2fff000000010200(slab|head)
raw: 2fff000000010200 dead000000000100 dead000000000122 ffff88811ac0de00
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88811880b580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88811880b600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88811880b680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
 ffff88811880b700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88811880b780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Link: https://lore.kernel.org/r/20190930231707.48259-16-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:07 -03:00
Bart Van Assche
3236fd61ee RDMA/srpt: Make the code for handling port identities more systematic
Introduce a new data structure for the information about an RDMA port
name. This patch does not change any functionality.

Link: https://lore.kernel.org/r/20190930231707.48259-15-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:07 -03:00
Bart Van Assche
be408e65f5 RDMA/srpt: Rework the code that waits until an RDMA port is no longer in use
The current implementation does not wait until srpt_release_channel()
has finished and hence can trigger a use-after-free. Rework
srpt_release_sport() such that it waits until srpt_release_channel()
has finished. This patch fixes the following KASAN complaint:

==================================================================
BUG: KASAN: use-after-free in srpt_free_ioctx.part.23+0x42/0x100 [ib_srpt]
Read of size 8 at addr ffff888115c71100 by task kworker/4:3/807

CPU: 4 PID: 807 Comm: kworker/4:3 Not tainted 5.3.0-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events srpt_release_channel_work [ib_srpt]
Call Trace:
 dump_stack+0x86/0xca
 print_address_description+0x74/0x32d
 __kasan_report.cold.6+0x1b/0x36
 kasan_report+0x12/0x17
 __asan_load8+0x54/0x90
 srpt_free_ioctx.part.23+0x42/0x100 [ib_srpt]
 srpt_free_ioctx_ring.part.24+0x50/0x80 [ib_srpt]
 srpt_release_channel_work+0x2ad/0x390 [ib_srpt]
 process_one_work+0x51a/0xa60
 worker_thread+0x67/0x5b0
 kthread+0x1dc/0x200
 ret_from_fork+0x24/0x30

Allocated by task 984:
 save_stack+0x21/0x90
 __kasan_kmalloc.constprop.9+0xc7/0xd0
 kasan_kmalloc+0x9/0x10
 __kmalloc+0x153/0x370
 srpt_add_one+0x4f/0x570 [ib_srpt]
 add_client_context+0x251/0x290 [ib_core]
 ib_register_client+0x1da/0x220 [ib_core]
 iblock_get_alignment_offset_lbas+0x6b/0x100 [target_core_iblock]
 do_one_initcall+0xcd/0x43a
 do_init_module+0x103/0x380
 load_module+0x3b77/0x3eb0
 __do_sys_finit_module+0x12d/0x1b0
 __x64_sys_finit_module+0x43/0x50
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1128:
 save_stack+0x21/0x90
 __kasan_slab_free+0x139/0x190
 kasan_slab_free+0xe/0x10
 slab_free_freelist_hook+0x67/0x1e0
 kfree+0xcb/0x2a0
 srpt_remove_one+0x569/0x5b0 [ib_srpt]
 remove_client_context+0x9a/0xe0 [ib_core]
 disable_device+0x106/0x1b0 [ib_core]
 __ib_unregister_device+0x5f/0xf0 [ib_core]
 ib_unregister_device_and_put+0x48/0x60 [ib_core]
 nldev_dellink+0x120/0x180 [ib_core]
 rdma_nl_rcv+0x287/0x480 [ib_core]
 netlink_unicast+0x2cc/0x370
 netlink_sendmsg+0x3b1/0x630
 __sys_sendto+0x1db/0x290
 __x64_sys_sendto+0x80/0xa0
 do_syscall_64+0x6b/0x2d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888115c71100
 which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 0 bytes inside of
 4096-byte region [ffff888115c71100, ffff888115c72100)
The buggy address belongs to the page:
page:ffffea0004571c00 refcount:1 mapcount:0 mapping:ffff88811ac0de00 index:0xffff888115c70000 compound_mapcount: 0
flags: 0x2fff000000010200(slab|head)
raw: 2fff000000010200 ffffea00045ac408 ffffea0004593208 ffff88811ac0de00
raw: ffff888115c70000 0000000000070002 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888115c71000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888115c71080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888115c71100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff888115c71180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888115c71200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Link: https://lore.kernel.org/r/20190930231707.48259-14-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:07 -03:00
Bart Van Assche
6eaed91c67 RDMA/srpt: Rework the approach for closing an RDMA channel
Instead of relying on a waitqueue, report when the identity of an RDMA
channel can be reused through a completion.

Link: https://lore.kernel.org/r/20190930231707.48259-13-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:07 -03:00
Bart Van Assche
b5948cfdde RDMA/srpt: Improve a debug message
The ib_srpt driver uses two different identifiers while registering a
session with the LIO core. Report both identifiers if the modified
pr_debug() statement is enabled.

Link: https://lore.kernel.org/r/20190930231707.48259-12-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:07 -03:00
Bart Van Assche
cbca2442a0 RDMA/srpt: Fix handling of iWARP logins
The path_rec pointer is NULL set for IB and RoCE logins but not for iWARP
logins. Hence check the path_rec pointer before dereferencing it.

Link: https://lore.kernel.org/r/20190930231707.48259-11-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:06 -03:00
Bart Van Assche
09f8a1486d RDMA/srpt: Fix handling of SR-IOV and iWARP ports
Management datagrams (MADs) are not supported by SR-IOV VFs nor by iWARP
ports. Support SR-IOV VFs and iWARP ports by only logging an error message
if MAD handler registration fails.

Link: https://lore.kernel.org/r/20190930231707.48259-10-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:06 -03:00
Bart Van Assche
fdbcf5c026 RDMA/srp: Make route resolving error messages more informative
The IPv6 scope ID is essential when setting up an iWARP connection
between IPv6 link-local addresses. Report the scope ID in error messages.

Link: https://lore.kernel.org/r/20190930231707.48259-9-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:06 -03:00
Bart Van Assche
bf58347061 RDMA/srp: Honor the max_send_sge device attribute
Instead of assuming that max_send_sge >= 3, restrict the number of scatter
gather elements to what is supported by the RDMA adapter.

Link: https://lore.kernel.org/r/20190930231707.48259-8-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:06 -03:00
Bart Van Assche
14673778d0 RDMA/srp: Remove two casts
This patch does not change any functionality.

Link: https://lore.kernel.org/r/20190930231707.48259-7-bvanassche@acm.org
Cc: Honggang LI <honli@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:05 -03:00
Bart Van Assche
934f05b05d RDMA/siw: Make node GUIDs valid EUI-64 identifiers
>From the IBTA: "GUID (Global Unique Identifier): A globally unique EUI-64
compliant identifier." Make sure that siw GUIDs are valid EUI-64 identifiers.

Link: https://lore.kernel.org/r/20190930231707.48259-6-bvanassche@acm.org
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:35:05 -03:00
Leon Romanovsky
594e6c5d41 RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path
Properly unwind QP counter rebinding in case of failure.

Trying to rebind the counter after unbiding it is not going to work
reliably, move the unbind to the end so it doesn't have to be unwound.

Fixes: b389327df9 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink")
Link: https://lore.kernel.org/r/20191002115627.16740-1-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:29:55 -03:00
Greg KH
3840c5b788 RDMA/cxgb4: Do not dma memory off of the stack
Nicolas pointed out that the cxgb4 driver is doing dma off of the stack,
which is generally considered a very bad thing.  On some architectures it
could be a security problem, but odds are none of them actually run this
driver, so it's just a "normal" bug.

Resolve this by allocating the memory for a message off of the heap
instead of the stack.  kmalloc() always will give us a proper memory
location that DMA will work correctly from.

Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:13:27 -03:00
Potnuri Bharat Teja
30e0f6cf5a RDMA/iw_cxgb3: Remove the iw_cxgb3 module from kernel
Remove iw_cxgb3 module from kernel as the corresponding HW Chelsio T3 has
reached EOL.

Link: https://lore.kernel.org/r/20190930074252.20133-1-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:08:59 -03:00
Jack Morgenstein
94635c36f3 RDMA/cm: Fix memory leak in cm_add/remove_one
In the process of moving the debug counters sysfs entries, the commit
mentioned below eliminated the cm_infiniband sysfs directory.

This sysfs directory was tied to the cm_port object allocated in procedure
cm_add_one().

Before the commit below, this cm_port object was freed via a call to
kobject_put(port->kobj) in procedure cm_remove_port_fs().

Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated.
This, however, meant that kfree was never called for the cm_port buffers.

Fix this by adding explicit kfree(port) calls to functions cm_add_one()
and cm_remove_one().

Note: the kfree call in the first chunk below (in the cm_add_one error
flow) fixes an old, undetected memory leak.

Fixes: c87e65cfb9 ("RDMA/cm: Move debug counters to be under relevant IB device")
Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 14:58:31 -03:00
Christophe JAILLET
ab59ca3eb4 RDMA/core: Fix an error handling path in 'res_get_common_doit()'
According to surrounding error paths, it is likely that 'goto err_get;' is
expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be
missing.

Fixes: c5dfe0ea6f ("RDMA/nldev: Add resource tracker doit callback")
Link: https://lore.kernel.org/r/20190818091044.8845-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 14:55:18 -03:00
Shiraz, Saleem
ee4e4040ab RDMA/i40iw: Associate ibdev to netdev before IB device registration
i40iw IB device registration fails with ENODEV.

ib_register_device
 setup_device/setup_port_data
  i40iw_port_immutable
   ib_query_port
     iw_query_port
      ib_device_get_netdev(ENODEV)

ib_device_get_netdev() does not have a netdev associated
with the ibdev and thus fails.
Use ib_device_set_netdev() to associate netdev to ibdev
in i40iw before IB device registration.

Fixes: 4929116bdf ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 14:29:14 -03:00
Kamal Heib
f3fceba5da RDMA/rxe: Verify modify_device mask
Verify that the passed mask to rxe_modify_device() is supported.

Link: https://lore.kernel.org/r/20190923104158.5331-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 13:06:10 -03:00
Kamal Heib
39ce85f3b1 RDMA/bnxt_re: Remove unsupported modify_device callback
There is no need to return always zero for function which is not
supported.

Link: https://lore.kernel.org/r/20190923104158.5331-3-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 13:06:10 -03:00
Kamal Heib
d0f3ef36bf RDMA/core: Fix return code when modify_device isn't supported
The proper return code is "-EOPNOTSUPP" when modify_device callback is not
supported.

Link: https://lore.kernel.org/r/20190923104158.5331-2-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 13:06:10 -03:00
Bart Van Assche
050dbddf24 RDMA/siw: Fix port number endianness in a debug message
sin_port and sin6_port are big endian member variables. Convert these port
numbers into CPU endianness before printing.

Link: https://lore.kernel.org/r/20190930231707.48259-5-bvanassche@acm.org
Fixes: 6c52fdc244 ("rdma/siw: connection management")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 12:18:18 -03:00
Bart Van Assche
23c1c13cdd RDMA/siw: Simplify several debug messages
Do not print the remote address if it is not used. Use %pISp instead of
%pI4 %d.

Link: https://lore.kernel.org/r/20190930231707.48259-4-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 12:18:17 -03:00
Bart Van Assche
b66f31efbd RDMA/iwcm: Fix a lock inversion issue
This patch fixes the lock inversion complaint:

============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]

but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&id_priv->handler_mutex);
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u16:6/171:
 #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
 dump_stack+0x8a/0xd6
 __lock_acquire.cold+0xe1/0x24d
 lock_acquire+0x106/0x240
 __mutex_lock+0x12e/0xcb0
 mutex_lock_nested+0x1f/0x30
 rdma_destroy_id+0x78/0x4a0 [rdma_cm]
 iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
 cm_work_handler+0xe62/0x1100 [iw_cm]
 process_one_work+0x56d/0xac0
 worker_thread+0x7a/0x5d0
 kthread+0x1bc/0x210
 ret_from_fork+0x24/0x30

This is not a bug as there are actually two lock classes here.

Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd921 ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 12:11:50 -03:00
Potnuri Bharat Teja
91724c1e5a RDMA/iw_cxgb4: fix SRQ access from dump_qp()
dump_qp() is wrongly trying to dump SRQ structures as QP when SRQ is used
by the application. This patch matches the QPID before dumping them.  Also
removes unwanted SRQ id addition to QP id xarray.

Fixes: 2f43129127 ("cxgb4: Convert qpidr to XArray")
Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com
Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 11:48:10 -03:00
Navid Emamdoost
34b3be18a0 RDMA/hfi1: Prevent memory leak in sdma_init
In sdma_init if rhashtable_init fails the allocated memory for
tmp_sdma_rht should be released.

Fixes: 5a52a7acf7 ("IB/hfi1: NULL pointer dereference when freeing rhashtable")
Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 11:34:55 -03:00
Michal Kalderon
390d3fdcae RDMA/core: Fix use after free and refcnt leak on ndev in_device in iwarp_query_port
If an iWARP driver is probed and removed while there are no ips set for
the device, it will lead to a reference count leak on the inet device of
the netdevice.

In addition, the netdevice was accessed after already calling netdev_put,
which could lead to using the netdev after already freed.

Fixes: 4929116bdf ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925123332.10746-1-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 11:31:27 -03:00
Max Gurtovoy
6eeff06db9 IB/iser: remove redundant macro definitions
Use the general linux definition for 4K and retrieve the rest from it.

Link: https://lore.kernel.org/r/1569359148-12312-1-git-send-email-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 11:27:13 -03:00
Max Gurtovoy
7718cf03c3 IB/iser: bound protection_sg size by data_sg size
In case we don't set the sg_prot_tablesize, the scsi layer assign the
default size (65535 entries). We should limit this size since we should
take into consideration the underlaying device capability. This cap is
considered when calculating the sg_tablesize. Otherwise, for example,
we can get that /sys/block/sdb/queue/max_segments is 128 and
/sys/block/sdb/queue/max_integrity_segments is 65535.

Link: https://lore.kernel.org/r/1569359027-10987-1-git-send-email-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 11:24:24 -03:00
Max Gurtovoy
70bcc63f84 IB/iser: add unlikely checks in the fast path
ib_post_send, ib_post_recv and ib_dma_map_sg  operations should succeed
unless something unusual happened to the ib device.

Link: https://lore.kernel.org/r/1569274369-29217-1-git-send-email-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 11:23:22 -03:00
Krishnamraju Eraparaju
df791c54d6 RDMA/siw: Fix serialization issue in write_space()
In siw_qp_llp_write_space(), 'sock' members should be accessed with
sk_callback_lock held, otherwise, it could race with
siw_sk_restore_upcalls(). And this could cause "NULL deref" panic.  Below
panic is due to the NULL cep returned from sk_to_cep(sk):

  Call Trace:
   <IRQ>    siw_qp_llp_write_space+0x11/0x40 [siw]
   tcp_check_space+0x4c/0xf0
   tcp_rcv_established+0x52b/0x630
   tcp_v4_do_rcv+0xf4/0x1e0
   tcp_v4_rcv+0x9b8/0xab0
   ip_protocol_deliver_rcu+0x2c/0x1c0
   ip_local_deliver_finish+0x44/0x50
   ip_local_deliver+0x6b/0xf0
   ? ip_protocol_deliver_rcu+0x1c0/0x1c0
   ip_rcv+0x52/0xd0
   ? ip_rcv_finish_core.isra.14+0x390/0x390
   __netif_receive_skb_one_core+0x83/0xa0
   netif_receive_skb_internal+0x73/0xb0
   napi_gro_frags+0x1ff/0x2b0
   t4_ethrx_handler+0x4a7/0x740 [cxgb4]
   process_responses+0x2c9/0x590 [cxgb4]
   ? t4_sge_intr_msix+0x1d/0x30 [cxgb4]
   ? handle_irq_event_percpu+0x51/0x70
   ? handle_irq_event+0x41/0x60
   ? handle_edge_irq+0x97/0x1a0
   napi_rx_handler+0x14/0xe0 [cxgb4]
   net_rx_action+0x2af/0x410
   __do_softirq+0xda/0x2a8
   do_softirq_own_stack+0x2a/0x40
   </IRQ>
   do_softirq+0x50/0x60
   __local_bh_enable_ip+0x50/0x60
   ip_finish_output2+0x18f/0x520
   ip_output+0x6e/0xf0
   ? __ip_finish_output+0x1f0/0x1f0
   __ip_queue_xmit+0x14f/0x3d0
   ? __slab_alloc+0x4b/0x58
   __tcp_transmit_skb+0x57d/0xa60
   tcp_write_xmit+0x23b/0xfd0
   __tcp_push_pending_frames+0x2e/0xf0
   tcp_sendmsg_locked+0x939/0xd50
   tcp_sendmsg+0x27/0x40
   sock_sendmsg+0x57/0x80
   siw_tx_hdt+0x894/0xb20 [siw]
   ? find_busiest_group+0x3e/0x5b0
   ? common_interrupt+0xa/0xf
   ? common_interrupt+0xa/0xf
   ? common_interrupt+0xa/0xf
   siw_qp_sq_process+0xf1/0xe60 [siw]
   ? __wake_up_common_lock+0x87/0xc0
   siw_sq_resume+0x33/0xe0 [siw]
   siw_run_sq+0xac/0x190 [siw]
   ? remove_wait_queue+0x60/0x60
   kthread+0xf8/0x130
   ? siw_sq_resume+0xe0/0xe0 [siw]
   ? kthread_bind+0x10/0x10
   ret_from_fork+0x35/0x40

Fixes: f29dd55b02 ("rdma/siw: queue pair methods")
Link: https://lore.kernel.org/r/20190923101112.32685-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 10:55:27 -03:00
Adit Ranadive
18545e8b68 RDMA/vmw_pvrdma: Free SRQ only once
An extra kfree cleanup was missed since these are now deallocated by core.

Link: https://lore.kernel.org/r/1568848066-12449-1-git-send-email-aditr@vmware.com
Cc: <stable@vger.kernel.org>
Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 10:47:58 -03:00
Mark Zhang
663912a637 RDMA/counter: Prevent QP counter manual binding in auto mode
If auto mode is configured, manual counter allocation and QP bind is not
allowed.

Fixes: 1bd8e0a9d0 ("RDMA/counter: Allow manual mode configuration support")
Link: https://lore.kernel.org/r/20190916071154.20383-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 10:30:16 -03:00
Linus Torvalds
02dc96ef6c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Sanity check URB networking device parameters to avoid divide by
    zero, from Oliver Neukum.

 2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
    don't work properly. Longer term this needs a better fix tho. From
    Vijay Khemka.

 3) Small fixes to selftests (use ping when ping6 is not present, etc.)
    from David Ahern.

 4) Bring back rt_uses_gateway member of struct rtable, it's semantics
    were not well understood and trying to remove it broke things. From
    David Ahern.

 5) Move usbnet snaity checking, ignore endpoints with invalid
    wMaxPacketSize. From Bjørn Mork.

 6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.

 7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
    Alex Vesker, and Yevgeny Kliteynik

 8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.

 9) Fix crash when removing sch_cbs entry while offloading is enabled,
    from Vinicius Costa Gomes.

10) Signedness bug fixes, generally in looking at the result given by
    of_get_phy_mode() and friends. From Dan Crapenter.

11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.

12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.

13) Fix quantization code in tcp_bbr, from Kevin Yang.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
  net: tap: clean up an indentation issue
  nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
  tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
  sk_buff: drop all skb extensions on free and skb scrubbing
  tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
  mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
  Documentation: Clarify trap's description
  mlxsw: spectrum: Clear VLAN filters during port initialization
  net: ena: clean up indentation issue
  NFC: st95hf: clean up indentation issue
  net: phy: micrel: add Asym Pause workaround for KSZ9021
  net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
  ptp: correctly disable flags on old ioctls
  lib: dimlib: fix help text typos
  net: dsa: microchip: Always set regmap stride to 1
  nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
  nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
  net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
  vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
  net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
  ...
2019-09-28 17:47:33 -07:00
Denis Efremov
7b0b692594 IB/hfi1: remove unlikely() from IS_ERR*() condition
"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses
unlikely() internally.

Link: http://lkml.kernel.org/r/20190829165025.15750-8-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26 10:10:30 -07:00
Linus Torvalds
9c9fa97a8e Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few hot fixes

 - ocfs2 updates

 - almost all of -mm (slab-generic, slab, slub, kmemleak, kasan,
   cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug,
   sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy,
   oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap,
   zsmalloc)

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
  mm/zsmalloc.c: fix a -Wunused-function warning
  zswap: do not map same object twice
  zswap: use movable memory if zpool support allocate movable memory
  zpool: add malloc_support_movable to zpool_driver
  shmem: fix obsolete comment in shmem_getpage_gfp()
  mm/madvise: reduce code duplication in error handling paths
  mm: mmap: increase sockets maximum memory size pgoff for 32bits
  mm/mmap.c: refine find_vma_prev() with rb_last()
  riscv: make mmap allocation top-down by default
  mips: use generic mmap top-down layout and brk randomization
  mips: replace arch specific way to determine 32bit task with generic version
  mips: adjust brk randomization offset to fit generic version
  mips: use STACK_TOP when computing mmap base address
  mips: properly account for stack randomization and stack guard gap
  arm: use generic mmap top-down layout and brk randomization
  arm: use STACK_TOP when computing mmap base address
  arm: properly account for stack randomization and stack guard gap
  arm64, mm: make randomization selected by generic topdown mmap layout
  arm64, mm: move generic mmap layout functions to mm
  arm64: consider stack randomization for mmap base only when necessary
  ...
2019-09-24 16:10:23 -07:00
akpm@linux-foundation.org
2d15eb31b5 mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
[11~From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()",
v3.

There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:

* The block/bio related changes (Jerome mostly wrote those, but I've had
  to move stuff around extensively, and add a little code)

* mm/ changes

* other subsystem patches

* an RFC that shows the current state of the tracking patch set.  That
  can only be applied after all call sites are converted, but it's good to
  get an early look at it.

This is part a tree-wide conversion, as described in fc1d8e7cca ("mm:
introduce put_user_page*(), placeholder versions").

This patch (of 3):

Provide more capable variation of put_user_pages_dirty_lock(), and delete
put_user_pages_dirty().  This is based on the following:

1.  Lots of call sites become simpler if a bool is passed into
   put_user_page*(), instead of making the call site choose which
   put_user_page*() variant to call.

2.  Christoph Hellwig's observation that set_page_dirty_lock() is
   usually correct, and set_page_dirty() is usually a bug, or at least
   questionable, within a put_user_page*() calling chain.

This leads to the following API choices:

    * put_user_pages_dirty_lock(page, npages, make_dirty)

    * There is no put_user_pages_dirty(). You have to
      hand code that, in the rare case that it's
      required.

[jhubbard@nvidia.com: remove unused variable in siw_free_plist()]
  Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com
Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:08 -07:00
Linus Torvalds
299d14d4c3 pci-v5.4-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl2JNVAUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyTOA/9EZeyS7J+ZcOwihWz5vNijf0kfpKp
 /jZ9VF9nHjsL9Pw3/Fzha605Ssrtwcqge8g/sze9f0g/pxZk99lLHokE6dEOurEA
 GyKpNNMdiBol4YZMCsSoYji0MpwW0uMCuASPMiEwv2LxZ72A2Tu1RbgYLU+n4m1T
 fQldDTxsUMXc/OH/8SL8QDEh6o8qyDRhmSXFAOv8RGqN8N3iUwVwhQobKpwpmEvx
 ddzqWMS8f91qkhIKO7fgc9P4NI/7yI7kkF+wcdwtfiMO8Qkr4IdcdF7qwNVAtpKA
 A+sMRi59i2XxDTqRFx+wXXMa+rt+Pf1pucv77SO74xXWwpuXSxLVDYjULP1YQugK
 FTBo4SNmico/ts+n5cgm+CGMq2P2E29VYeqkI1Un6eDDvQnQlBgQdpdcBoadJ0rW
 y31OInjhRJC1ZK5bATKfCMbmB+VQxFsbyeUA7PBlrALyAmXZfw30iNxX9iHBhWqc
 myPNVEJJGp0cWTxGxMAU9MhelzeQxDAd+Eb44J5gv51bx0w9yqmZHECSDrOVdtYi
 HpOyI7E3Cb8m23BOHvCdB/v8igaYMZl08LUUJqu1S9mFclYyYVuOOIB04Yc2Qrx1
 3PHtT8TC47FbWuzKwo12RflzoAiNShJGw+tNKo6T1jC+r5jdbKWWtTnsoRqbSfaG
 rG5RJpB7EuQSP1Y=
 =/xB3
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it
     (Krzysztof Wilczynski)

   - Fix incorrect PCIe device types and remove dev->has_secondary_link
     to simplify code that deals with upstream/downstream ports (Mika
     Westerberg)

   - After suspend, restore Resizable BAR size bits correctly for 1MB
     BARs (Sumit Saxena)

   - Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra)

  Virtualization:

   - Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna
     Labs (Ali Saidi)

   - Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg)

   - Remove group write permissions from sysfs sriov_numvfs,
     sriov_drivers_autoprobe (Kelsey Skunberg)

  Hotplug:

   - Simplify pciehp indicator control (Denis Efremov)

  Peer-to-peer DMA:

   - Allow P2P DMA between root ports for whitelisted bridges (Logan
     Gunthorpe)

   - Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe)

   - DMA map P2P DMA requests that traverse host bridge (Logan
     Gunthorpe)

  Amazon Annapurna Labs host bridge driver:

   - Add DT binding and controller driver (Jonathan Chocron)

  Hyper-V host bridge driver:

   - Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui)

   - Fix PCI domain number collisions (Haiyang Zhang)

   - Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang)

   - Fix build errors on non-SYSFS config (Randy Dunlap)

  i.MX6 host bridge driver:

   - Limit DBI register length (Stefan Agner)

  Intel VMD host bridge driver:

   - Fix config addressing issues (Jon Derrick)

  Layerscape host bridge driver:

   - Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao)

   - Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately
     (Xiaowei Bao)

  Mediatek host bridge driver:

   - Add MT7629 controller support (Jianjun Wang)

  Mobiveil host bridge driver:

   - Fix CPU base address setup (Hou Zhiqiang)

   - Make "num-lanes" property optional (Hou Zhiqiang)

  Tegra host bridge driver:

   - Fix OF node reference leak (Nishka Dasgupta)

   - Disable MSI for root ports to work around design problem (Vidya
     Sagar)

   - Add Tegra194 DT binding and controller support (Vidya Sagar)

   - Add support for sideband pins and slot regulators (Vidya Sagar)

   - Add PIPE2UPHY support (Vidya Sagar)

  Misc:

   - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg)

   - Unexport pci_bus_get(), etc (Kelsey Skunberg)

   - Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in
     the PCI core (Kelsey Skunberg)

   - Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg)

   - Mark expected switch fall-through (Gustavo A. R. Silva)

   - Propagate errors for optional regulators and PHYs (Thierry Reding)

   - Fix kernel command line resource_alignment parameter issues (Logan
     Gunthorpe)"

* tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits)
  PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
  arm64: tegra: Add PCIe slot supply information in p2972-0000 platform
  arm64: tegra: Add configuration for PCIe C5 sideband signals
  PCI: tegra: Add support to enable slot regulators
  PCI: tegra: Add support to configure sideband pins
  PCI: vmd: Fix shadow offsets to reflect spec changes
  PCI: vmd: Fix config addressing when using bus offsets
  PCI: dwc: Add validation that PCIe core is set to correct mode
  PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver
  dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding
  PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port
  PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port
  PCI: Add ACS quirk for Amazon Annapurna Labs root ports
  PCI: Add Amazon's Annapurna Labs vendor ID
  MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer
  PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers
  dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries
  dt-bindings: PCI: tegra: Add sideband pins configuration entries
  PCI: tegra: Add Tegra194 PCIe support
  PCI: Get rid of dev->has_secondary_link flag
  ...
2019-09-23 19:16:01 -07:00
Linus Torvalds
018c6837f3 RDMA subsystem updates for 5.4
This cycle mainly saw lots of bug fixes and clean up code across the core
 code and several drivers, few new functional changes were made.
 
 - Many cleanup and bug fixes for hns
 
 - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
   bnxt_re, efa
 
 - Share the query_port code between all the iWarp drivers
 
 - General rework and cleanup of the ODP MR umem code to fit better with
   the mmu notifier get/put scheme
 
 - Support rdma netlink in non init_net name spaces
 
 - mlx5 support for XRC devx and DC ODP
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl2A1ugACgkQOG33FX4g
 mxp+EQ//Ug8CyyDs40SGZztItoGghuQ4TVA2pXtuZ9LkFVJRWhYPJGOadHIYXqGO
 KQJJpZPQ02HTZUPWBZNKmD5bwHfErm4cS73/mVmUpximnUqu/UsLRJp8SIGmBg1D
 W1Lz1BJX24MdV8aUnChYvdL5Hbl52q+BaE99Z0haOvW7I3YnKJC34mR8m/A5MiRf
 rsNIZNPHdq2U6pKLgCbOyXib8yBcJQqBb8x4WNAoB1h4MOx+ir5QLfr3ozrZs1an
 xXgqyiOBmtkUgCMIpXC4juPN/6gw3Y5nkk2VIWY+MAY1a7jZPbI+6LJZZ1Uk8R44
 Lf2KSzabFMMYGKJYE1Znxk+JWV8iE+m+n6wWEfRM9l0b4gXXbrtKgaouFbsLcsQA
 CvBEQuiFSO9Kq01JPaAN1XDmhqyTarG6lHjXnW7ifNlLXnPbR1RJlprycExOhp0c
 axum5K2fRNW2+uZJt+zynMjk2kyjT1lnlsr1Rbgc4Pyionaiydg7zwpiac7y/bdS
 F7/WqdmPiff78KMi187EF5pjFqMWhthvBtTbWDuuxaxc2nrXSdiCUN+96j1amQUH
 yU/7AZzlnKeKEQQCR4xddsSs2eTrXiLLFRLk9GCK2eh4cUN212eHTrPLKkQf1cN+
 ydYbR2pxw3B38LCCNBy+bL+u7e/Tyehs4ynATMpBuEgc5iocTwE=
 =zHXW
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull RDMA subsystem updates from Jason Gunthorpe:
 "This cycle mainly saw lots of bug fixes and clean up code across the
  core code and several drivers, few new functional changes were made.

   - Many cleanup and bug fixes for hns

   - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
     bnxt_re, efa

   - Share the query_port code between all the iWarp drivers

   - General rework and cleanup of the ODP MR umem code to fit better
     with the mmu notifier get/put scheme

   - Support rdma netlink in non init_net name spaces

   - mlx5 support for XRC devx and DC ODP"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
  RDMA: Fix double-free in srq creation error flow
  RDMA/efa: Fix incorrect error print
  IB/mlx5: Free mpi in mp_slave mode
  IB/mlx5: Use the original address for the page during free_pages
  RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
  RDMA/hns: Package operations of rq inline buffer into separate functions
  RDMA/hns: Optimize cmd init and mode selection for hip08
  IB/hfi1: Define variables as unsigned long to fix KASAN warning
  IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
  IB/hfi1: Add traces for TID RDMA READ
  RDMA/siw: Relax from kmap_atomic() use in TX path
  IB/iser: Support up to 16MB data transfer in a single command
  RDMA/siw: Fix page address mapping in TX path
  RDMA: Fix goto target to release the allocated memory
  RDMA/usnic: Avoid overly large buffers on stack
  RDMA/odp: Add missing cast for 32 bit
  RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
  Documentation/infiniband: update name of some functions
  RDMA/cma: Fix false error message
  RDMA/hns: Fix wrong assignment of qp_access_flags
  ...
2019-09-21 10:26:24 -07:00
Linus Torvalds
84da111de0 hmm related patches for 5.4
This is more cleanup and consolidation of the hmm APIs and the very
 strongly related mmu_notifier interfaces. Many places across the tree
 using these interfaces are touched in the process. Beyond that a cleanup
 to the page walker API and a few memremap related changes round out the
 series:
 
 - General improvement of hmm_range_fault() and related APIs, more
   documentation, bug fixes from testing, API simplification &
   consolidation, and unused API removal
 
 - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and
   make them internal kconfig selects
 
 - Hoist a lot of code related to mmu notifier attachment out of drivers by
   using a refcount get/put attachment idiom and remove the convoluted
   mmu_notifier_unregister_no_release() and related APIs.
 
 - General API improvement for the migrate_vma API and revision of its only
   user in nouveau
 
 - Annotate mmu_notifiers with lockdep and sleeping region debugging
 
 Two series unrelated to HMM or mmu_notifiers came along due to
 dependencies:
 
 - Allow pagemap's memremap_pages family of APIs to work without providing
   a struct device
 
 - Make walk_page_range() and related use a constant structure for function
   pointers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl1/nnkACgkQOG33FX4g
 mxqaRg//c6FqowV1pQlLutvAOAgMdpzfZ9eaaDKngy9RVQxz+k/MmJrdRH/p/mMA
 Pq93A1XfwtraGKErHegFXGEDk4XhOustVAVFwvjyXO41dTUdoFVUkti6ftbrl/rS
 6CT+X90jlvrwdRY7QBeuo7lxx7z8Qkqbk1O1kc1IOracjKfNJS+y6LTamy6weM3g
 tIMHI65PkxpRzN36DV9uCN5dMwFzJ73DWHp1b0acnDIigkl6u5zp6orAJVWRjyQX
 nmEd3/IOvdxaubAoAvboNS5CyVb4yS9xshWWMbH6AulKJv3Glca1Aa7QuSpBoN8v
 wy4c9+umzqRgzgUJUe1xwN9P49oBNhJpgBSu8MUlgBA4IOc3rDl/Tw0b5KCFVfkH
 yHkp8n6MP8VsRrzXTC6Kx0vdjIkAO8SUeylVJczAcVSyHIo6/JUJCVDeFLSTVymh
 EGWJ7zX2iRhUbssJ6/izQTTQyCH3YIyZ5QtqByWuX2U7ZrfkqS3/EnBW1Q+j+gPF
 Z2yW8iT6k0iENw6s8psE9czexuywa/Lttz94IyNlOQ8rJTiQqB9wLaAvg9hvUk7a
 kuspL+JGIZkrL3ouCeO/VA6xnaP+Q7nR8geWBRb8zKGHmtWrb5Gwmt6t+vTnCC2l
 olIDebrnnxwfBQhEJ5219W+M1pBpjiTpqK/UdBd92A4+sOOhOD0=
 =FRGg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull hmm updates from Jason Gunthorpe:
 "This is more cleanup and consolidation of the hmm APIs and the very
  strongly related mmu_notifier interfaces. Many places across the tree
  using these interfaces are touched in the process. Beyond that a
  cleanup to the page walker API and a few memremap related changes
  round out the series:

   - General improvement of hmm_range_fault() and related APIs, more
     documentation, bug fixes from testing, API simplification &
     consolidation, and unused API removal

   - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
     and make them internal kconfig selects

   - Hoist a lot of code related to mmu notifier attachment out of
     drivers by using a refcount get/put attachment idiom and remove the
     convoluted mmu_notifier_unregister_no_release() and related APIs.

   - General API improvement for the migrate_vma API and revision of its
     only user in nouveau

   - Annotate mmu_notifiers with lockdep and sleeping region debugging

  Two series unrelated to HMM or mmu_notifiers came along due to
  dependencies:

   - Allow pagemap's memremap_pages family of APIs to work without
     providing a struct device

   - Make walk_page_range() and related use a constant structure for
     function pointers"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
  libnvdimm: Enable unit test infrastructure compile checks
  mm, notifier: Catch sleeping/blocking for !blockable
  kernel.h: Add non_block_start/end()
  drm/radeon: guard against calling an unpaired radeon_mn_unregister()
  csky: add missing brackets in a macro for tlb.h
  pagewalk: use lockdep_assert_held for locking validation
  pagewalk: separate function pointers from iterator data
  mm: split out a new pagewalk.h header from mm.h
  mm/mmu_notifiers: annotate with might_sleep()
  mm/mmu_notifiers: prime lockdep
  mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
  mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
  mm/hmm: hmm_range_fault() infinite loop
  mm/hmm: hmm_range_fault() NULL pointer bug
  mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
  mm/mmu_notifiers: remove unregister_no_release
  RDMA/odp: remove ib_ucontext from ib_umem
  RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
  RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
  RDMA/mlx5: Use ib_umem_start instead of umem.address
  ...
2019-09-21 10:07:42 -07:00
David Ahern
77d5bc7e6a ipv4: Revert removal of rt_uses_gateway
Julian noted that rt_uses_gateway has a more subtle use than 'is gateway
set':
    https://lore.kernel.org/netdev/alpine.LFD.2.21.1909151104060.2546@ja.home.ssi.bg/

Revert that part of the commit referenced in the Fixes tag.

Currently, there are no u8 holes in 'struct rtable'. There is a 4-byte hole
in the second cacheline which contains the gateway declaration. So move
rt_gw_family down to the gateway declarations since they are always used
together, and then re-use that u8 for rt_uses_gateway. End result is that
rtable size is unchanged.

Fixes: 1550c17193 ("ipv4: Prepare rtable for IPv6 gateway")
Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-09-20 18:23:33 -07:00
Linus Torvalds
53e5e7a7a7 Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs namei updates from Al Viro:
 "Pathwalk-related stuff"

[ Audit-related cleanups, misc simplifications, and easier to follow
  nd->root refcounts     - Linus ]

* 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  devpts_pty_kill(): don't bother with d_delete()
  infiniband: don't bother with d_delete()
  hypfs: don't bother with d_delete()
  fs/namei.c: keep track of nd->root refcount status
  fs/namei.c: new helper - legitimize_root()
  kill the last users of user_{path,lpath,path_dir}()
  namei.h: get the comments on LOOKUP_... in sync with reality
  kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.h
  audit_inode(): switch to passing AUDIT_INODE_...
  filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there
  filename_lookup(): audit_inode() argument is always 0
2019-09-18 13:03:01 -07:00
Linus Torvalds
81160dda9a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski.

 2) Use bio_vec in the networking instead of custom skb_frag_t, from
    Matthew Wilcox.

 3) Make use of xmit_more in r8169 driver, from Heiner Kallweit.

 4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen.

 5) Support all variants of 5750X bnxt_en chips, from Michael Chan.

 6) More RTNL avoidance work in the core and mlx5 driver, from Vlad
    Buslov.

 7) Add TCP syn cookies bpf helper, from Petar Penkov.

 8) Add 'nettest' to selftests and use it, from David Ahern.

 9) Add extack support to drop_monitor, add packet alert mode and
    support for HW drops, from Ido Schimmel.

10) Add VLAN offload to stmmac, from Jose Abreu.

11) Lots of devm_platform_ioremap_resource() conversions, from
    YueHaibing.

12) Add IONIC driver, from Shannon Nelson.

13) Several kTLS cleanups, from Jakub Kicinski.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits)
  mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
  mlxsw: spectrum: Register CPU port with devlink
  mlxsw: spectrum_buffers: Prevent changing CPU port's configuration
  net: ena: fix incorrect update of intr_delay_resolution
  net: ena: fix retrieval of nonadaptive interrupt moderation intervals
  net: ena: fix update of interrupt moderation register
  net: ena: remove all old adaptive rx interrupt moderation code from ena_com
  net: ena: remove ena_restore_ethtool_params() and relevant fields
  net: ena: remove old adaptive interrupt moderation code from ena_netdev
  net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*()
  net: ena: enable the interrupt_moderation in driver_supported_features
  net: ena: reimplement set/get_coalesce()
  net: ena: switch to dim algorithm for rx adaptive interrupt moderation
  net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
  net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
  ethtool: implement Energy Detect Powerdown support via phy-tunable
  xen-netfront: do not assume sk_buff_head list is empty in error handling
  s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb”
  net: ena: don't wake up tx queue when down
  drop_monitor: Better sanitize notified packets
  ...
2019-09-18 12:34:53 -07:00
Linus Torvalds
4feaab05dc LED updates for 5.4-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQUwxxKyE5l/npt8ARiEGxRG/Sl2wUCXYAIeQAKCRBiEGxRG/Sl
 2/SzAQDEnoNxzV/R5kWFd+2kmFeY3cll0d99KMrWJ8om+kje6QD/cXxZHzFm+T1L
 UPF66k76oOODV7cyndjXnTnRXbeCRAM=
 =Szby
 -----END PGP SIGNATURE-----

Merge tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED updates from Jacek Anaszewski:
 "In this cycle we've finally managed to contribute the patch set
  sorting out LED naming issues. Besides that there are many changes
  scattered among various LED class drivers and triggers.

  LED naming related improvements:

   - add new 'function' and 'color' fwnode properties and deprecate
     'label' property which has been frequently abused for conveying
     vendor specific names that have been available in sysfs anyway

   - introduce a set of standard LED_FUNCTION* definitions

   - introduce a set of standard LED_COLOR_ID* definitions

   - add a new {devm_}led_classdev_register_ext() API with the
     capability of automatic LED name composition basing on the
     properties available in the passed fwnode; the function is
     backwards compatible in a sense that it uses 'label' data, if
     present in the fwnode, for creating LED name

   - add tools/leds/get_led_device_info.sh script for retrieving LED
     vendor, product and bus names, if applicable; it also performs
     basic validation of an LED name

   - update following drivers and their DT bindings to use the new LED
     registration API:

        - leds-an30259a, leds-gpio, leds-as3645a, leds-aat1290, leds-cr0014114,
          leds-lm3601x, leds-lm3692x, leds-lp8860, leds-lt3593, leds-sc27xx-blt

  Other LED class improvements:

   - replace {devm_}led_classdev_register() macros with inlines

   - allow to call led_classdev_unregister() unconditionally

   - switch to use fwnode instead of be stuck with OF one

  LED triggers improvements:

   - led-triggers:
        - fix dereferencing of null pointer
        - fix a memory leak bug

   - ledtrig-gpio:
        - GPIO 0 is valid

  Drop superseeded apu2/3 support from leds-apu since for apu2+ a newer,
  more complete driver exists, based on a generic driver for the AMD
  SOCs gpio-controller, supporting LEDs as well other devices:

   - drop profile field from priv data

   - drop iosize field from priv data

   - drop enum_apu_led_platform_types

   - drop superseeded apu2/3 led support

   - add pr_fmt prefix for better log output

   - fix error message on probing failure

  Other misc fixes and improvements to existing LED class drivers:

   - leds-ns2, leds-max77650:
        - add of_node_put() before return

   - leds-pwm, leds-is31fl32xx:
        - use struct_size() helper

   - leds-lm3697, leds-lm36274, leds-lm3532:
        - switch to use fwnode_property_count_uXX()

   - leds-lm3532:
        - fix brightness control for i2c mode
        - change the define for the fs current register
        - fixes for the driver for stability
        - add full scale current configuration
        - dt: Add property for full scale current.
        - avoid potentially unpaired regulator calls
        - move static keyword to the front of declarations
        - fix optional led-max-microamp prop error handling

   - leds-max77650:
        - add of_node_put() before return
        - add MODULE_ALIAS()
        - Switch to fwnode property API

   - leds-as3645a:
        - fix misuse of strlcpy

   - leds-netxbig:
        - add of_node_put() in netxbig_leds_get_of_pdata()
        - remove legacy board-file support

   - leds-is31fl319x:
        - simplify getting the adapter of a client

   - leds-ti-lmu-common:
        - fix coccinelle issue
        - move static keyword to the front of declaration

   - leds-syscon:
        - use resource managed variant of device register

   - leds-ktd2692:
        - fix a typo in the name of a constant

   - leds-lp5562:
        - allow firmware files up to the maximum length

   - leds-an30259a:
        - fix typo

   - leds-pca953x:
        - include the right header"

* tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: (72 commits)
  leds: lm3532: Fix optional led-max-microamp prop error handling
  led: triggers: Fix dereferencing of null pointer
  leds: ti-lmu-common: Move static keyword to the front of declaration
  leds: lm3532: Move static keyword to the front of declarations
  leds: trigger: gpio: GPIO 0 is valid
  leds: pwm: Use struct_size() helper
  leds: is31fl32xx: Use struct_size() helper
  leds: ti-lmu-common: Fix coccinelle issue in TI LMU
  leds: lm3532: Avoid potentially unpaired regulator calls
  leds: syscon: Use resource managed variant of device register
  leds: Replace {devm_}led_classdev_register() macros with inlines
  leds: Allow to call led_classdev_unregister() unconditionally
  leds: lm3532: Add full scale current configuration
  dt: lm3532: Add property for full scale current.
  leds: lm3532: Fixes for the driver for stability
  leds: lm3532: Change the define for the fs current register
  leds: lm3532: Fix brightness control for i2c mode
  leds: Switch to use fwnode instead of be stuck with OF one
  leds: max77650: Switch to fwnode property API
  led: triggers: Fix a memory leak bug
  ...
2019-09-17 18:40:42 -07:00
Jack Morgenstein
3eca7fc2d8 RDMA: Fix double-free in srq creation error flow
The cited commit introduced a double-free of the srq buffer in the error
flow of procedure __uverbs_create_xsrq().

The problem is that ib_destroy_srq_user() called in the error flow also
frees the srq buffer.

Thus, if uverbs_response() fails in __uverbs_create_srq(), the srq buffer
will be freed twice.

Cc: <stable@vger.kernel.org>
Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20190916071154.20383-5-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 14:37:38 -03:00
Gal Pressman
a3f4b8e318 RDMA/efa: Fix incorrect error print
The error print should indicate that it failed to get the queue
attributes, not network attributes.

Link: https://lore.kernel.org/r/20190910134301.4194-2-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 14:25:43 -03:00
Danit Goldberg
5d44adebbb IB/mlx5: Free mpi in mp_slave mode
ib_add_slave_port() allocates a multiport struct but never frees it.
Don't leak memory, free the allocated mpi struct during driver unload.

Cc: <stable@vger.kernel.org>
Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Link: https://lore.kernel.org/r/20190916064818.19823-3-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 14:08:18 -03:00
Danit Goldberg
130c2c576e IB/mlx5: Use the original address for the page during free_pages
The removal of 'buffer' in the patch below caused free_page() to use a
value that had been offset since the wqe pointer is adjusted while the
routine runs.

The current implementation of free_pages() rounds down to a pfn,
discarding the adjustment, but this is not the right way to use the
API. Preserve the initial value and use it for free_page().

Fixes: 0f51427bd0 ("RDMA/mlx5: Cleanup WQE page fault handler")
Link: https://lore.kernel.org/r/20190916064818.19823-2-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 13:39:56 -03:00
Colin Ian King
d97a3e92f3 RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
There is a spelling mistake in a literal string, fix it.

Fixes: 89f81008ba ("RDMA/bnxt_re: expose detailed stats retrieved from HW")
Link: https://lore.kernel.org/r/20190911092856.11146-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 10:58:57 -03:00
Lijun Ou
395b59a116 RDMA/hns: Package operations of rq inline buffer into separate functions
Here packages the codes of allocating and freeing rq inline buffer in
hns_roce_create_qp_common function in order to reduce the complexity.

Link: https://lore.kernel.org/r/1567068102-56919-3-git-send-email-liweihang@hisilicon.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 10:52:20 -03:00
Yixian Liu
3d50503b3b RDMA/hns: Optimize cmd init and mode selection for hip08
There are two modes for mailbox command (cmd) queue, i.e., event mode and
poll mode. For each mode, we use corresponding semaphores to protect the
cmd queue resource competition, so called event_sem and poll_sem. During
cmd init, both semaphores are initialized and poll mode is selected.
Thus, there is no need to up poll_sema again in cmd_use_polling.

Furthermore, there is no need to down the sema of the other side while
switching mode. This patch aims to decouple the switch between event mode
and poll mode of cmd.

Link: https://lore.kernel.org/r/1567068102-56919-2-git-send-email-liweihang@hisilicon.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 10:52:20 -03:00
Ira Weiny
f8659d68e2 IB/hfi1: Define variables as unsigned long to fix KASAN warning
Define the working variables to be unsigned long to be compatible with
for_each_set_bit and change types as needed.

While we are at it remove unused variables from a couple of functions.

This was found because of the following KASAN warning:
 ==================================================================
   BUG: KASAN: stack-out-of-bounds in find_first_bit+0x19/0x70
   Read of size 8 at addr ffff888362d778d0 by task kworker/u308:2/1889

   CPU: 21 PID: 1889 Comm: kworker/u308:2 Tainted: G W         5.3.0-rc2-mm1+ #2
   Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014
   Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
   Call Trace:
    dump_stack+0x9a/0xf0
    ? find_first_bit+0x19/0x70
    print_address_description+0x6c/0x332
    ? find_first_bit+0x19/0x70
    ? find_first_bit+0x19/0x70
    __kasan_report.cold.6+0x1a/0x3b
    ? find_first_bit+0x19/0x70
    kasan_report+0xe/0x12
    find_first_bit+0x19/0x70
    pma_get_opa_portstatus+0x5cc/0xa80 [hfi1]
    ? ret_from_fork+0x3a/0x50
    ? pma_get_opa_port_ectrs+0x200/0x200 [hfi1]
    ? stack_trace_consume_entry+0x80/0x80
    hfi1_process_mad+0x39b/0x26c0 [hfi1]
    ? __lock_acquire+0x65e/0x21b0
    ? clear_linkup_counters+0xb0/0xb0 [hfi1]
    ? check_chain_key+0x1d7/0x2e0
    ? lock_downgrade+0x3a0/0x3a0
    ? match_held_lock+0x2e/0x250
    ib_mad_recv_done+0x698/0x15e0 [ib_core]
    ? clear_linkup_counters+0xb0/0xb0 [hfi1]
    ? ib_mad_send_done+0xc80/0xc80 [ib_core]
    ? mark_held_locks+0x79/0xa0
    ? _raw_spin_unlock_irqrestore+0x44/0x60
    ? rvt_poll_cq+0x1e1/0x340 [rdmavt]
    __ib_process_cq+0x97/0x100 [ib_core]
    ib_cq_poll_work+0x31/0xb0 [ib_core]
    process_one_work+0x4ee/0xa00
    ? pwq_dec_nr_in_flight+0x110/0x110
    ? do_raw_spin_lock+0x113/0x1d0
    worker_thread+0x57/0x5a0
    ? process_one_work+0xa00/0xa00
    kthread+0x1bb/0x1e0
    ? kthread_create_on_node+0xc0/0xc0
    ret_from_fork+0x3a/0x50

   The buggy address belongs to the page:
   page:ffffea000d8b5dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
   flags: 0x17ffffc0000000()
   raw: 0017ffffc0000000 0000000000000000 ffffea000d8b5dc8 0000000000000000
   raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
   page dumped because: kasan: bad access detected

   addr ffff888362d778d0 is located in stack of task kworker/u308:2/1889 at offset 32 in frame:
    pma_get_opa_portstatus+0x0/0xa80 [hfi1]

   this frame has 1 object:
    [32, 36) 'vl_select_mask'

   Memory state around the buggy address:
    ffff888362d77780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff888362d77800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   >ffff888362d77880: 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2 00 00
                                                    ^
    ffff888362d77900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff888362d77980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2

 ==================================================================

Cc: <stable@vger.kernel.org>
Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20190911113053.126040.47327.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:59:55 -03:00
Kaike Wan
7199435414 IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
This patch adds a counter for credit waits to assist field debugging.

Link: https://lore.kernel.org/r/20190911113047.126040.10857.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:59:55 -03:00
Kaike Wan
c05fc15634 IB/hfi1: Add traces for TID RDMA READ
This patch adds traces to debug packet loss and retry for TID RDMA READ
protocol.

Link: https://lore.kernel.org/r/20190911113041.126040.64541.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:59:55 -03:00
Bernard Metzler
4db8fd4973 RDMA/siw: Relax from kmap_atomic() use in TX path
Since the transmit path is never executed in an atomic context, we do not
need kmap_atomic() and can always use less demanding kmap().

Link: https://lore.kernel.org/r/20190909132945.30462-1-bmt@zurich.ibm.com
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:59:55 -03:00
Jason Gunthorpe
75c66515e4 Merge tag 'v5.3-rc8' into rdma.git for-next
To resolve dependencies in following patches

mlx5_ib.h conflict resolved by keeing both hunks

Linux 5.3-rc8

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:59:51 -03:00
Sergey Gorenko
1ba7c8f800 IB/iser: Support up to 16MB data transfer in a single command
Maximum supported IO size is 8MB for the iSER driver. The current value is
limited by the ISCSI_ISER_MAX_SG_TABLESIZE macro. But the driver is able
to handle 16MB IOs without any significant changes. Increasing this limit
can be useful for the storage arrays which are fine tuned for IOs larger
than 8 MB.

This commit allows to configure maximum IO size up to 16MB using the
max_sectors module parameter.

Link: https://lore.kernel.org/r/20190912103534.18210-1-sergeygo@mellanox.com
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:55:55 -03:00
Bernard Metzler
0404bd629f RDMA/siw: Fix page address mapping in TX path
Use the correct kmap()/kunmap() flow to determine page address used for
CRC computation. Using page_address() is wrong, since page might be in
highmem.

Fixes: b9be6f18cf ("rdma/siw: transmit path")
Link: https://lore.kernel.org/r/20190909132427.30264-1-bmt@zurich.ibm.com
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:55:55 -03:00
Navid Emamdoost
4a9d46a9fe RDMA: Fix goto target to release the allocated memory
In bnxt_re_create_srq(), when ib_copy_to_udata() fails allocated memory
should be released by goto fail.

Fixes: 37cb11acf1 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters")
Link: https://lore.kernel.org/r/20190910222120.16517-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:55:55 -03:00
Arnd Bergmann
2ac5a6d3a9 RDMA/usnic: Avoid overly large buffers on stack
It's never a good idea to put a 1000-byte buffer on the kernel stack. The
compiler warns about this instance when usnic_ib_log_vf() gets inlined
into usnic_ib_pci_probe():

drivers/infiniband/hw/usnic/usnic_ib_main.c:543:12: error: stack frame size of 1044 bytes in function 'usnic_ib_pci_probe' [-Werror,-Wframe-larger-than=]

As this is only called for debugging purposes in the setup path, it's
trivial to convert to a dynamic allocation.

Link: https://lore.kernel.org/r/20190906155730.2750200-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:55:55 -03:00
Jason Gunthorpe
b97b218b30 RDMA/odp: Add missing cast for 32 bit
length is a size_t which is unsigned int on 32 bit:

../drivers/infiniband/core/umem_odp.c: In function 'ib_init_umem_odp':
../include/linux/overflow.h:59:15: warning: comparison of distinct pointer types lacks a cast
   59 |  (void) (&__a == &__b);   \
      |               ^~
../drivers/infiniband/core/umem_odp.c:220:7: note: in expansion of macro 'check_add_overflow'

Fixes: 204e3e5630 ("RDMA/odp: Check for overflow when computing the umem_odp end")
Link: https://lore.kernel.org/r/20190908080726.30017-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:55:55 -03:00
YueHaibing
3b961b4f83 RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.  This is
detected by coccinelle.

Link: https://lore.kernel.org/r/20190906141727.26552-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:55:55 -03:00
Håkon Bugge
a6e4d254c1 RDMA/cma: Fix false error message
In addr_handler(), assuming status == 0 and the device already has been
acquired (id_priv->cma_dev != NULL), we get the following incorrect
"error" message:

RDMA CM: ADDR_ERROR: failed to resolve IP. status 0

Fixes: 498683c6a7 ("IB/cma: Add debug messages to error flows")
Link: https://lore.kernel.org/r/20190902092731.1055757-1-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:55:55 -03:00
David S. Miller
94810bd365 mlx5-updates-2019-09-01 (Software steering support)
Abstract:
 --------
 Mellanox ConnetX devices supports packet matching, packet modification and
 redirection. These functionalities are also referred to as flow-steering.
 To configure a steering rule, the rule is written to the device owned
 memory, this memory is accessed and cached by the device when processing
 a packet.
 Steering rules are constructed from multiple steering entries (STE).
 
 Rules are configured using the Firmware command interface. The Firmware
 processes the given driver command and translates them to STEs, then
 writes them to the device memory in the current steering tables.
 This process is slow due to the architecture of the command interface and
 the processing complexity of each rule.
 
 The highlight of this patchset is to cut the middle man (The firmware) and
 do steering rules programming into device directly from the driver, with
 no firmware intervention whatsoever.
 
 Motivation:
 -----------
 Software (driver managed) steering allows for high rule insertion rates
 compared to the FW steering described above, this is achieved by using
 internal RDMA writes to the device owned memory instead of the slow
 command interface to program steering rules.
 
 Software (driver managed) steering, doesn't depend on new FW
 for new steering functionality, new implementations can be done in the
 driver skipping the FW layer.
 
 Performance:
 ------------
 The insertion rate on a single core using the new approach allows
 programming ~300K rules per sec. (Done via direct raw test to the new mlx5
 sw steering layer, without any kernel layer involved).
 
 Test: TC L2 rules
 33K/s with Software steering (this patchset).
 5K/s  with FW and current driver.
 This will improve OVS based solution performance.
 
 Architecture and implementation details:
 ----------------------------------------
 Software steering will be dynamically selected via devlink device
 parameter. Example:
 $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
           pci/0000:06:00.0:
           name flow_steering_mode type driver-specific
           values:
              cmode runtime value smfs
 
 mlx5 software steering module a.k.a (DR - Direct Rule) is implemented
 and contained in mlx5/core/steering directory and controlled by
 MLX5_SW_STEERING kconfig flag.
 
 mlx5 core steering layer (fs_core) already provides a shim layer for
 implementing different steering mechanisms, software steering will
 leverage that as seen at the end of this series.
 
 When Software Steering for a specific steering domain
 (NIC/RDMA/Vport/ESwitch, etc ..) is supported, it will cause rules
 targeting this domain to be created using  SW steering instead of FW.
 
 The implementation includes:
 Domain - The steering domain is the object that all other object resides
     in. It holds the memory allocator, send engine, locks and other shared
     data needed by lower objects such as table, matcher, rule, action.
     Each domain can contain multiple tables. Domain is equivalent to
     namespaces e.g (NIC/RDMA/Vport/ESwitch, etc ..) as implemented
     currently in mlx5_core fs_core (flow steering core).
 
 Table - Table objects are used for holding multiple matchers, each table
     has a level used to prevent processing loops. Packets are being
     directed to this table once it is set as the root table, this is done
     by fs_core using a FW command. A packet is being processed inside the
     table matcher by matcher until a successful hit, otherwise the packet
     will perform the default action.
 
 Matcher - Matchers objects are used to specify the fields mask for
     matching when processing a packet. A matcher belongs to a table, each
     matcher can hold multiple rules, each rule with different matching
     values corresponding to the matcher mask. Each matcher has a priority
     used for rule processing order inside the table.
 
 Action - Action objects are created to specify different steering actions
     such as count, reformat (encapsulate, decapsulate, ...), modify
     header, forward to table and many other actions. When creating a rule
     a sequence of actions can be provided to be executed on a successful
     match.
 
 Rule - Rule objects are used to specify a specific match on packets as
     well as the actions that should be executed. A rule belongs to a
     matcher.
 
 STE - This layer is used to hold the specific STE format for the device
     and to convert the requested rule to STEs. Each rule is constructed of
     an STE chain, Multiple rules construct a steering graph. Each node in
     the graph is a hash table containing multiple STEs. The index of each
     STE in the hash table is being calculated using a CRC32 hash function.
 
 Memory pool - Used for managing and caching device owned memory for rule
     insertion. The memory is being allocated using DM (device memory) API.
 
 Communication with device - layer for standard RDMA operation using  RC QP
     to configure the device steering.
 
 Command utility - This module holds all of the FW commands that are
     required for SW steering to function.
 
 Patch planning and files:
 -------------------------
 1) First patch, adds the support to Add flow steering actions to fs_cmd
 shim layer.
 
 2) Next 12 patch will add a file per each Software steering
 functionality/module as described above. (See patches with title: DR, *)
 
 3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
 build with the new files
 
 4) Next two patches will add the support for software steering in mlx5
 steering shim layer
 net/mlx5: Add API to set the namespace steering mode
 net/mlx5: Add direct rule fs_cmd implementation
 
 5) Last two patches will add the new devlink parameter to select mlx5
 steering mode, will be valid only for switchdev mode for now.
 Two modes are supported:
     1. DMFS - Device managed flow steering
     2. SMFS - Software/Driver managed flow steering.
 
     In the DMFS mode, the HW steering entities are created through the
     FW. In the SMFS mode this entities are created though the driver
     directly.
 
     The driver will use the devlink steering mode only if the steering
     domain supports it, for now SMFS will manages only the switchdev
     eswitch steering domain.
 
     User command examples:
     - Set SMFS flow steering mode::
 
         $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
 
     - Read device flow steering mode::
 
         $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
           pci/0000:06:00.0:
           name flow_steering_mode type driver-specific
           values:
              cmode runtime value smfs
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl1uxPAACgkQSD+KveBX
 +j5AkggAymoYqG2G+s8cLa4vQFySaD1Td3VzzWglp7PlpDBE3UcSoMAMg/gIDU1D
 8F04PeCsJ6snt1ICk56vPNyAEHWfWeBUd56+QK5lEJBuwozyFvBh6HP81Bnr6T/n
 n6uTx45ljAFQPTHJjEOLBPSzEXecLu07+mvpzSoW0F3ehfGbELhL1IkVobr/RELx
 z4xZW9uM2vm5ylheWvjf4V1S/SvokgJazW9+4fh//rl8tfXgun5IfPoS0hqKie1/
 h5sjcMSYkYR4gLVqrhKmBYHmHVl/h0TYROckW8iC/+XX7ailSo9uPG7lPa6cm+GE
 7Bajlbz4oD/K5RWoByo+q+dmyjeVhQ==
 =M9bS
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-09-01-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-09-01  (Software steering support)

Abstract:
--------
Mellanox ConnetX devices supports packet matching, packet modification and
redirection. These functionalities are also referred to as flow-steering.
To configure a steering rule, the rule is written to the device owned
memory, this memory is accessed and cached by the device when processing
a packet.
Steering rules are constructed from multiple steering entries (STE).

Rules are configured using the Firmware command interface. The Firmware
processes the given driver command and translates them to STEs, then
writes them to the device memory in the current steering tables.
This process is slow due to the architecture of the command interface and
the processing complexity of each rule.

The highlight of this patchset is to cut the middle man (The firmware) and
do steering rules programming into device directly from the driver, with
no firmware intervention whatsoever.

Motivation:
-----------
Software (driver managed) steering allows for high rule insertion rates
compared to the FW steering described above, this is achieved by using
internal RDMA writes to the device owned memory instead of the slow
command interface to program steering rules.

Software (driver managed) steering, doesn't depend on new FW
for new steering functionality, new implementations can be done in the
driver skipping the FW layer.

Performance:
------------
The insertion rate on a single core using the new approach allows
programming ~300K rules per sec. (Done via direct raw test to the new mlx5
sw steering layer, without any kernel layer involved).

Test: TC L2 rules
33K/s with Software steering (this patchset).
5K/s  with FW and current driver.
This will improve OVS based solution performance.

Architecture and implementation details:
----------------------------------------
Software steering will be dynamically selected via devlink device
parameter. Example:
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
          pci/0000:06:00.0:
          name flow_steering_mode type driver-specific
          values:
             cmode runtime value smfs

mlx5 software steering module a.k.a (DR - Direct Rule) is implemented
and contained in mlx5/core/steering directory and controlled by
MLX5_SW_STEERING kconfig flag.

mlx5 core steering layer (fs_core) already provides a shim layer for
implementing different steering mechanisms, software steering will
leverage that as seen at the end of this series.

When Software Steering for a specific steering domain
(NIC/RDMA/Vport/ESwitch, etc ..) is supported, it will cause rules
targeting this domain to be created using  SW steering instead of FW.

The implementation includes:
Domain - The steering domain is the object that all other object resides
    in. It holds the memory allocator, send engine, locks and other shared
    data needed by lower objects such as table, matcher, rule, action.
    Each domain can contain multiple tables. Domain is equivalent to
    namespaces e.g (NIC/RDMA/Vport/ESwitch, etc ..) as implemented
    currently in mlx5_core fs_core (flow steering core).

Table - Table objects are used for holding multiple matchers, each table
    has a level used to prevent processing loops. Packets are being
    directed to this table once it is set as the root table, this is done
    by fs_core using a FW command. A packet is being processed inside the
    table matcher by matcher until a successful hit, otherwise the packet
    will perform the default action.

Matcher - Matchers objects are used to specify the fields mask for
    matching when processing a packet. A matcher belongs to a table, each
    matcher can hold multiple rules, each rule with different matching
    values corresponding to the matcher mask. Each matcher has a priority
    used for rule processing order inside the table.

Action - Action objects are created to specify different steering actions
    such as count, reformat (encapsulate, decapsulate, ...), modify
    header, forward to table and many other actions. When creating a rule
    a sequence of actions can be provided to be executed on a successful
    match.

Rule - Rule objects are used to specify a specific match on packets as
    well as the actions that should be executed. A rule belongs to a
    matcher.

STE - This layer is used to hold the specific STE format for the device
    and to convert the requested rule to STEs. Each rule is constructed of
    an STE chain, Multiple rules construct a steering graph. Each node in
    the graph is a hash table containing multiple STEs. The index of each
    STE in the hash table is being calculated using a CRC32 hash function.

Memory pool - Used for managing and caching device owned memory for rule
    insertion. The memory is being allocated using DM (device memory) API.

Communication with device - layer for standard RDMA operation using  RC QP
    to configure the device steering.

Command utility - This module holds all of the FW commands that are
    required for SW steering to function.

Patch planning and files:
-------------------------
1) First patch, adds the support to Add flow steering actions to fs_cmd
shim layer.

2) Next 12 patch will add a file per each Software steering
functionality/module as described above. (See patches with title: DR, *)

3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
build with the new files

4) Next two patches will add the support for software steering in mlx5
steering shim layer
net/mlx5: Add API to set the namespace steering mode
net/mlx5: Add direct rule fs_cmd implementation

5) Last two patches will add the new devlink parameter to select mlx5
steering mode, will be valid only for switchdev mode for now.
Two modes are supported:
    1. DMFS - Device managed flow steering
    2. SMFS - Software/Driver managed flow steering.

    In the DMFS mode, the HW steering entities are created through the
    FW. In the SMFS mode this entities are created though the driver
    directly.

    The driver will use the devlink steering mode only if the steering
    domain supports it, for now SMFS will manages only the switchdev
    eswitch steering domain.

    User command examples:
    - Set SMFS flow steering mode::

        $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime

    - Read device flow steering mode::

        $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
          pci/0000:06:00.0:
          name flow_steering_mode type driver-specific
          values:
             cmode runtime value smfs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-03 21:46:13 -07:00
Maor Gottlieb
2b688ea5ef net/mlx5: Add flow steering actions to fs_cmd shim layer
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-03 12:54:19 -07:00
Al Viro
6effcab4da infiniband: don't bother with d_delete()
Dentries are never retained there; d_delete() + dput() is no
different from d_drop() + dput().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-03 09:30:55 -04:00
David S. Miller
765b7590c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
r8152 conflicts are the NAPI fixes in 'net' overlapping with
some tasklet stuff in net-next

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-02 11:20:17 -07:00
Saeed Mahameed
a06ebb8d95 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Merge mlx5-next patches needed for upcoming mlx5 software steering.

1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-02 00:16:05 -07:00
Ariel Levkovich
c9b9dcb430 net/mlx5: Move device memory management to mlx5_core
Move the device memory allocation and deallocation commands
SW ICM memory to mlx5_core to expose this API for all
mlx5_core users.

This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.

In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-01 23:44:41 -07:00
Saeed Mahameed
537f321097 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5 HW spec and bits updates:
1) Aya exposes IP-in-IP capability in mlx5_core.
2) Maxim exposes lag tx port affinity capabilities.
3) Moshe adds VNIC_ENV internal rq counter bits.
4) ODP capabilities for DC transport

Misc updates:
5) Saeed, two compiler warnings cleanups
6) Add XRQ legacy commands opcodes
7) Use refcount_t for refcount
8) fix a -Wstringop-truncation warning
2019-08-28 11:48:56 -07:00
Lijun Ou
98c09b8c97 RDMA/hns: Fix wrong assignment of qp_access_flags
We used wrong shifts when set qp_attr->qp_access_flag,
this patch exchange V2_QP_RRE_S and V2_QP_RWE_S to fix it.

Fixes: 2a3d923f87 ("RDMA/hns: Replace magic numbers with #defines")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-10-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-28 11:57:26 -04:00
Wenpeng Liang
afca2a2b83 RDMA/hns: Delete the not-used lines
Delete the assignment of srq->ibsrq.ext.xrc.srq_num, beacause this
value is not used.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-9-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-28 11:57:26 -04:00
Wenpeng Liang
18df508c79 RDMA/hns: Remove if-else judgment statements for creating srq
Because if the value of srqwqe_buf_pg_sz is zero, npages and
ib_umem_page_count are equivalent, page_shif and PAGE_SHIFT
are equivalent in hns_roce_create_srq. Here remove it.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-8-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-28 11:57:26 -04:00
Lang Cheng
e075da5e7c RDMA/hns: Add reset process for function-clear
If the hardware is resetting, the driver should not perform
the mailbox operation.Function-clear needs to add relevant judgment.

Signed-off-by: Lang Cheng <chenglang@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-7-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-28 11:57:26 -04:00
Lang Cheng
bfe860351e RDMA/hns: Fix cast from or to restricted __le32 for driver
Sparse is whining about the u32 and __le32 mixed usage in the driver.
The roce_set_field() is used to __le32 data of hardware only.
If a variable is not delivered to the hardware, the __le32 type and
related operations are not required.

Signed-off-by: Lang Cheng <chenglang@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-6-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-28 11:57:26 -04:00
Lijun Ou
90c559b186 RDMA/hns: Remove the some magic number
Here uses the meaningful macro instead of the magic number
for readability.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Lang Chen <chenglang@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-5-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-28 11:57:26 -04:00
Lang Cheng
82e620d9c3 RDMA/hns: Modify the data structure of hns_roce_av
we change type of some members to u32/u8 from __le32 as well as
split sl_tclass_flowlabel into three variables in hns_roce_av.

Signed-off-by: Lang Cheng <chenglang@huawei.com>
Link: https://lore.kernel.org/r/1566393276-42555-4-git-send-email-oulijun@huawei.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-28 11:57:26 -04:00
Bernard Metzler
531a64e4c3 RDMA/siw: Fix IPv6 addr_list locking
Walking the address list of an inet6_dev requires
appropriate locking. Since the called function
siw_listen_address() may sleep, we have to use
rtnl_lock() instead of read_lock_bh().

Also introduces sanity checks if we got a device
from in_dev_get() or in6_dev_get().

Reported-by: Bart Van Assche <bvanassche@acm.org>
Fixes: 6c52fdc244 ("rdma/siw: connection management")
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190828130355.22830-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-28 10:29:19 -04:00
Jason Gunthorpe
a0d8994b30 Merge branch 'mlx5-odp-dc' into rdma.git for-next
Michael Guralnik says:

====================
The series adds support for on-demand paging for DC transport.

As DC is a mlx-only transport, the capabilities are exposed to the user
using DEVX objects and later on through mlx5dv_query_device.
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'mlx5-odp-dc':
  IB/mlx5: Add page fault handler for DC initiator WQE
  IB/mlx5: Remove check of FW capabilities in ODP page fault handling
  net/mlx5: Set ODP capabilities for DC transport to max
2019-08-28 11:25:37 -03:00
Michael Guralnik
75e46fc02c IB/mlx5: Add page fault handler for DC initiator WQE
Parsing DC initiator WQEs upon page fault requires skipping an address
vector segment, as in UD WQEs.

Link: https://lore.kernel.org/r/20190819120815.21225-4-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-28 11:18:40 -03:00
Michael Guralnik
29af94987b IB/mlx5: Remove check of FW capabilities in ODP page fault handling
As page fault handling is initiated by FW, there is no need to check that
the ODP supports the operation and transport.

Link: https://lore.kernel.org/r/20190819120815.21225-3-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-28 11:18:39 -03:00
David S. Miller
68aaf44595 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor conflict in r8169, bug fix had two versions in net
and net-next, take the net-next hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 14:23:31 -07:00
Markus Elfring
fd1a52f38c RDMA/iwpm: Delete unnecessary checks before the macro call "dev_kfree_skb"
The dev_kfree_skb() function performs also input parameter validation.
Thus the test around the shown calls is not needed.

This issue was detected by using the Coccinelle software.

Link: https://lore.kernel.org/r/16df4c50-1f61-d7c4-3fc8-3073666d281d@web.de
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-27 13:09:23 -03:00
Gal Pressman
1bc5ba836e RDMA/efa: Use existing FIELD_SIZEOF macro
Use FIELD_SIZEOF macro instead of hard coding it in field_avail macro.

Link: https://lore.kernel.org/r/20190826115350.21718-3-galpress@amazon.com
Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-27 13:01:15 -03:00
Gal Pressman
958b6813f0 RDMA/efa: Remove umem check on dereg MR flow
EFA driver is not a kverbs provider, the check for MR umem is redundant.

Link: https://lore.kernel.org/r/20190826115350.21718-2-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-27 13:01:14 -03:00
Mark Zhang
d8abe88450 RDMA/mlx5: RDMA_RX flow type support for user applications
Currently user applications can only steer TCP/IP(NIC RX/RX) traffic.
This patch adds RDMA_RX as a new flow type to allow the user to insert
steering rules to control RDMA traffic.
Two destinations are supported(but not set at the same time): devx
flow table object and QP.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190819113626.20284-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-26 10:30:00 -04:00
Bernard Metzler
c536277e0d RDMA/siw: Fix 64/32bit pointer inconsistency
Fixes improper casting between addresses and unsigned types.
Changes siw_pbl_get_buffer() function to return appropriate
dma_addr_t, and not u64.

Also fixes debug prints. Now any potentially kernel private
pointers are printed formatted as '%pK', to allow keeping that
information secret.

Fixes: d941bfe500be ("RDMA/siw: Change CQ flags from 64->32 bits")
Fixes: b0fff7317b ("rdma/siw: completion queue methods")
Fixes: 8b6a361b8c ("rdma/siw: receive path")
Fixes: b9be6f18cf ("rdma/siw: transmit path")
Fixes: f29dd55b02 ("rdma/siw: queue pair methods")
Fixes: 2251334dca ("rdma/siw: application buffer management")
Fixes: 303ae1cdfd ("rdma/siw: application interface")
Fixes: 6c52fdc244 ("rdma/siw: connection management")
Fixes: a531975279 ("rdma/siw: main include file")

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190822173738.26817-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-23 12:08:27 -04:00
Bernard Metzler
fab4f97e1f RDMA/siw: Fix SGL mapping issues
All user level and most in-kernel applications submit WQEs
where the SG list entries are all of a single type.
iSER in particular, however, will send us WQEs with mixed SG
types: sge[0] = kernel buffer, sge[1] = PBL region.
Check and set is_kva on each SG entry individually instead of
assuming the first SGE type carries through to the last.
This fixes iSER over siw.

Fixes: b9be6f18cf ("rdma/siw: transmit path")
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190822150741.21871-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-22 11:21:06 -04:00
Selvin Xavier
d37b1e5340 RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message
Driver copies FW commands to the HW queue as  units of 16 bytes. Some
of the command structures are not exact multiple of 16. So while copying
the data from those structures, the stack out of bounds messages are
reported by KASAN. The following error is reported.

[ 1337.530155] ==================================================================
[ 1337.530277] BUG: KASAN: stack-out-of-bounds in bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530413] Read of size 16 at addr ffff888725477a48 by task rmmod/2785

[ 1337.530540] CPU: 5 PID: 2785 Comm: rmmod Tainted: G           OE     5.2.0-rc6+ #75
[ 1337.530541] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
[ 1337.530542] Call Trace:
[ 1337.530548]  dump_stack+0x5b/0x90
[ 1337.530556]  ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530560]  print_address_description+0x65/0x22e
[ 1337.530568]  ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530575]  ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530577]  __kasan_report.cold.3+0x37/0x77
[ 1337.530581]  ? _raw_write_trylock+0x10/0xe0
[ 1337.530588]  ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530590]  kasan_report+0xe/0x20
[ 1337.530592]  memcpy+0x1f/0x50
[ 1337.530600]  bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re]
[ 1337.530608]  ? bnxt_qplib_creq_irq+0xa0/0xa0 [bnxt_re]
[ 1337.530611]  ? xas_create+0x3aa/0x5f0
[ 1337.530613]  ? xas_start+0x77/0x110
[ 1337.530615]  ? xas_clear_mark+0x34/0xd0
[ 1337.530623]  bnxt_qplib_free_mrw+0x104/0x1a0 [bnxt_re]
[ 1337.530631]  ? bnxt_qplib_destroy_ah+0x110/0x110 [bnxt_re]
[ 1337.530633]  ? bit_wait_io_timeout+0xc0/0xc0
[ 1337.530641]  bnxt_re_dealloc_mw+0x2c/0x60 [bnxt_re]
[ 1337.530648]  bnxt_re_destroy_fence_mr+0x77/0x1d0 [bnxt_re]
[ 1337.530655]  bnxt_re_dealloc_pd+0x25/0x60 [bnxt_re]
[ 1337.530677]  ib_dealloc_pd_user+0xbe/0xe0 [ib_core]
[ 1337.530683]  srpt_remove_one+0x5de/0x690 [ib_srpt]
[ 1337.530689]  ? __srpt_close_all_ch+0xc0/0xc0 [ib_srpt]
[ 1337.530692]  ? xa_load+0x87/0xe0
...
[ 1337.530840]  do_syscall_64+0x6d/0x1f0
[ 1337.530843]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1337.530845] RIP: 0033:0x7ff5b389035b
[ 1337.530848] Code: 73 01 c3 48 8b 0d 2d 0b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 0a 2c 00 f7 d8 64 89 01 48
[ 1337.530849] RSP: 002b:00007fff83425c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1337.530852] RAX: ffffffffffffffda RBX: 00005596443e6750 RCX: 00007ff5b389035b
[ 1337.530853] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005596443e67b8
[ 1337.530854] RBP: 0000000000000000 R08: 00007fff83424ba1 R09: 0000000000000000
[ 1337.530856] R10: 00007ff5b3902960 R11: 0000000000000206 R12: 00007fff83425e50
[ 1337.530857] R13: 00007fff8342673c R14: 00005596443e6260 R15: 00005596443e6750

[ 1337.530885] The buggy address belongs to the page:
[ 1337.530962] page:ffffea001c951dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 1337.530964] flags: 0x57ffffc0000000()
[ 1337.530967] raw: 0057ffffc0000000 0000000000000000 ffffffff1c950101 0000000000000000
[ 1337.530970] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1337.530970] page dumped because: kasan: bad access detected

[ 1337.530996] Memory state around the buggy address:
[ 1337.531072]  ffff888725477900: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f2 f2 f2
[ 1337.531180]  ffff888725477980: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
[ 1337.531288] >ffff888725477a00: 00 f2 f2 f2 f2 f2 f2 00 00 00 f2 00 00 00 00 00
[ 1337.531393]                                                  ^
[ 1337.531478]  ffff888725477a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1337.531585]  ffff888725477b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1337.531691] ==================================================================

Fix this by passing the exact size of each FW command to
bnxt_qplib_rcfw_send_message as req->cmd_size. Before sending
the command to HW, modify the req->cmd_size to number of 16 byte units.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1566468170-489-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-22 10:52:15 -04:00