Any mkey that is not enabled and assigned to userspace should have the PD
set to a kernel owned PD.
When cache entries are created for the first time the PDN is set to 0,
which is probably a kernel PD, but be explicit.
When a MR is registered using the hybrid reg_create with UMR xlt & enable
the disabled mkey is pointing at the user PD, keep it pointing at the
kernel until a UMR enables it and sets the user PD.
Fixes: 9ec4483a3f ("IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache")
Link: https://lore.kernel.org/r/20200914112653.345244-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
alloc_mr_from_cache() no longer returns EAGAIN, this is just dead code
now.
Fixes: aad719dcf3 ("RDMA/mlx5: Allow MRs to be created in the cache synchronously")
Link: https://lore.kernel.org/r/20200914112653.345244-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
hip06 does not support IB_WR_LOCAL_INV, so the ps_opcode should be set to
an invalid value instead of being left uninitialized.
Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Fixes: a2f3d4479f ("RDMA/hns: Avoid unncessary initialization")
Link: https://lore.kernel.org/r/1600350615-115217-1-git-send-email-oulijun@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When xa_insert() fails, the acquired resource in qedr_create_qp should
also be freed. However, current implementation does not handle the error.
Fix this by adding a new goto label that calls qedr_free_qp_resources.
Fixes: 1212767e23 ("qedr: Add wrapping generic structure for qpidr and adjust idr routines.")
Link: https://lore.kernel.org/r/20200911125159.4577-1-keitasuzuki.park@sslab.ics.keio.ac.jp
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Receive side delayed ack mode is needed only for certain area networks/
connections. Therefore disable it by default.
Link: https://lore.kernel.org/r/20200909153707.2795-1-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Leon Romanovsky says:
====================
This series from Alex extends software steering interface to support
devices with extra capability "sw_owner_2" which will replace existing
"sw_owner".
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
* branch 'mlx5_sw_owner_v2:
RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
RDMA/mlx5: Add sw_owner_v2 bit capability
Leon Romanovsky says:
====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.
* branch 'mlx5_active_speed':
RDMA: Fix link active_speed size
RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
net/mlx5: Refactor query port speed functions
According to the IB spec active_speed size should be u16 and not u8 as
before. Changing it to allow further extensions in offered speeds.
Link: https://lore.kernel.org/r/20200917090223.1018224-4-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Expose the ICM address to access TIR and QP, this will allow sw_owned_v2
devices to steer traffic to TIRs and QPs same as done with sw_owner
capability.
Link: https://lore.kernel.org/r/20200903073857.1129166-4-leon@kernel.org
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
sw_owner_v2 will replace sw_owner for future devices, this means that if
sw_owner_v2 is set sw_owner should be ignored and DM allocation is
required for sw_owner_v2 devices to function.
Link: https://lore.kernel.org/r/20200903073857.1129166-3-leon@kernel.org
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Move allocation and destruction of memory windows under ib_core
responsibility and clean drivers to ensure that no updates to MW
ib_core structures are done in driver layer.
Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Combine two same enums to avoid duplication.
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
The functions mlx5_query_port_link_width_oper and
mlx5_query_port_ib_proto_oper are always called together, so combine them
to a new function called mlx5_query_port_oper to avoid duplication.
And while the mlx5i_get_port_settings is the same as
mlx5_query_port_oper therefore let's remove it.
According to the IB spec link_width_oper and ib_proto_oper should be u16
and not as written u8, so perform casting as a preparation to cross-RDMA
patch which will fix that type for all drivers in the RDMA subsystem.
Fixes: ada68c31ba ("net/mlx5: Introduce a new header file for physical port functions")
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
hw_context stores a pci_device pointer. Use the correct type instead of
void * and avoid typecasts.
Link: https://lore.kernel.org/r/20200914123528.270382-1-parav@mellanox.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Implement the XRC specific verbs. The additional QP type introduced new
logic to the rest of the verbs that now require distinguishing whether a
QP has an "RQ" or an "SQ" or both.
Link: https://lore.kernel.org/r/20200722102339.30104-1-ybason@marvell.com
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Yuval Basson <ybason@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
A number of driver bug fixes and a few recent regressions:
- Several bug fixes for bnxt_re. Crashing, incorrect data reported,
and corruption on new HW
- Memory leak and crash in rxe
- Fix sysfs corruption in rxe if the netdev name is too long
- Fix a crash on error unwind in the new cq_pool code
- Fix kobject panics in rtrs by working device lifetime properly
- Fix a data corruption bug in iser target related to misaligned buffers
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl9atAYACgkQOG33FX4g
mxqSdw//Qi29dnxzVGpsaO4/krd/VmI6NT6eNpgK7Nqx80DaCYer0JhtwZOUxHqK
KbHIV9XB/f6BSI67c9ydYj4PNX6FpFnoUWQLvqZwip5VM7R6ifIVjm0ap1jCAUSS
axDLFZOySIOYNhcZ5I+MtY/kxykKBjMteuMXdpBe4FwZ+XSmsC5KkfRH/+FUhjVG
peL6aRVDv9TByH8w+iZE1wSmVrOphOE1C/jN5TyotQTmKe7IHoXJtkalosYHXFWw
KZaaz52e4IYKVFl4HIcl6+FfPExhxsyfDtRHluvn+vzY/wFy1RZw6F0BZt7mioy5
J8R6w82xEe/SNugTGuvIzqXOymmy9H4CrG9pHy4NRMMzC28LGI7qHJgVhr/jZy8+
GPxR26cywDhPsd4XA2K3mvs7DVSoBUPYlIUnHdYjfBZl/ghColg9+XGyNv6pdrke
Q7Kog5blcpOAahBX+ElBLvIZXk5oEk5W+3H/M0OeuVMQ/DrMtALrCnwpp4wDKVvO
9QuYfGgQ+25xbV9kwzckLGo5eedN3cRD/v4hcqvQUZo+9zLYZ/HZRMjpOdrscQ+I
QL4FgpcLpOASKZ+bYjjpFxK3rNVTDT9CYJw4/hxEaOhxRhtAO1Q9mJdvJTK6dj09
oR9LPyefQkyKCAt+heWHKKkEYDiwT8U1SlR8STotg24VHIj6Rb4=
=2DDd
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A number of driver bug fixes and a few recent regressions:
- Several bug fixes for bnxt_re. Crashing, incorrect data reported,
and corruption on new HW
- Memory leak and crash in rxe
- Fix sysfs corruption in rxe if the netdev name is too long
- Fix a crash on error unwind in the new cq_pool code
- Fix kobject panics in rtrs by working device lifetime properly
- Fix a data corruption bug in iser target related to misaligned
buffers"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/isert: Fix unaligned immediate-data handling
RDMA/rtrs-srv: Set .release function for rtrs srv device during device init
RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx'
RDMA/core: Fix reported speed and width
RDMA/core: Fix unsafe linked list traversal after failing to allocate CQ
RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds
RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address
RDMA/bnxt_re: Restrict the max_gids to 256
RDMA/bnxt_re: Static NQ depth allocation
RDMA/bnxt_re: Fix the qp table indexing
RDMA/bnxt_re: Do not report transparent vlan from QP1
RDMA/mlx4: Read pkey table length instead of hardcoded value
RDMA/rxe: Fix panic when calling kmem_cache_create()
RDMA/rxe: Fix memleak in rxe_mem_init_user
RDMA/rxe: Fix the parent sysfs read when the interface has 15 chars
RDMA/rtrs-srv: Replace device_register with device_initialize and device_add
commit 59e8970b37 ("RDMA/qedr: Return max inline data in QP query
result") changed query_qp max_inline size to return the max roce inline
size. When iwarp was introduced, this should have been modified to return
the max inline size based on protocol. This size is cached in the device
attributes
Fixes: 69ad0e7fe8 ("RDMA/qedr: Add support for iWARP in user space")
Link: https://lore.kernel.org/r/20200902165741.8355-8-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently iWARP does not support mtu-change. Notify user when MTU changes
that reload of qedr is required for mtu to change. Display the correct
active mtu.
Fixes: f5b1b1775b ("RDMA/qedr: Add iWARP support in existing verbs")
Link: https://lore.kernel.org/r/20200902165741.8355-7-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In iWARP, accept could be called after a QP is already destroyed. In this
case an error should be returned and not success.
Fixes: 82af6d19d8 ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr")
Link: https://lore.kernel.org/r/20200902165741.8355-5-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Change the doorbell setting so that the maximum value between the last and
current value is set. This is to avoid doorbells being lost.
Fixes: a7efd7773e ("qedr: Add support for PD,PKEY and CQ verbs")
Link: https://lore.kernel.org/r/20200902165741.8355-3-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This is always the same value as IOVA masked by the page size, just use
that clearer calculation directly.
It is unclear of ocrdma hardware can actually support a true fbo, if so it
could use a different algorithm to compute the best page size.
Link: https://lore.kernel.org/r/17-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
zbva is always false, so fbo is never read.
A 'zero-based-virtual-address' is simply IOVA == 0, and the driver already
supports this.
Link: https://lore.kernel.org/r/16-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For the calls linked to mlx4_ib_umem_calc_optimal_mtt_size() use
ib_umem_num_dma_blocks() inside the function, it is just some weird static
default.
All other places are just using it with PAGE_SIZE, switch to
ib_umem_num_dma_blocks().
As this is the last call site, remove ib_umem_num_count().
Link: https://lore.kernel.org/r/15-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This driver always uses a DMA array made up of PAGE_SIZE elements, so just
use ib_umem_num_dma_blocks().
Since rdma_for_each_dma_block() always iterates exactly
ib_umem_num_dma_blocks() there is no need for the early exit check in
build_user_pbes(), delete it.
Link: https://lore.kernel.org/r/13-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
mtr_umem_page_count() does the same thing, replace it with the core code.
Also, ib_umem_find_best_pgsz() should always be called to check that the
umem meets the page_size requirement. If there is a limited set of
page_sizes that work it the pgsz_bitmap should be set to that set. 0 is a
failure and the umem cannot be used.
Lightly tidy the control flow to implement this flow properly.
Link: https://lore.kernel.org/r/12-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
ib_umem_page_count() returns the number of 4k entries required for a DMA
map, but bnxt_re already computes a variable page size. The correct API to
determine the size of the page table array is ib_umem_num_dma_blocks().
Fix the overallocation of the page array in fill_umem_pbl_tbl() when
working with larger page sizes by using the right function. Lightly
re-organize this function to make it clearer.
Replace the other calls to ib_umem_num_pages().
Fixes: d85582517e ("RDMA/bnxt_re: Use core helpers to get aligned DMA address")
Link: https://lore.kernel.org/r/11-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The length of the list populated by qedr_populate_pbls() should be
calculated using ib_umem_num_dma_blocks() with the same size/shift passed
to qedr_populate_pbls().
Link: https://lore.kernel.org/r/10-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
not correct. 'start' should be 'virt'. Change it to use the core code for
page_num and the canonical calculation of page_shift.
Fixes: eb52c0333f ("RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size")
Link: https://lore.kernel.org/r/8-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
If ib_umem_find_best_pgsz() returns > PAGE_SIZE then the equation here is
not correct. 'start' should be 'virt'. Change it to use the core code for
page_num and the canonical calculation of page_shift.
Fixes: 40ddb3f020 ("RDMA/efa: Use API to get contiguous memory blocks aligned to device supported page size")
Link: https://lore.kernel.org/r/7-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Tested-by: Gal Pressman <galpress@amazon.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
ib_umem_num_pages() should only be used by things working with the SGL in
CPU pages directly.
Drivers building DMA lists should use the new ib_num_dma_blocks() which
returns the number of blocks rdma_umem_for_each_block() will return.
To make this general for DMA drivers requires a different implementation.
Computing DMA block count based on umem->address only works if the
requested page size is < PAGE_SIZE and/or the IOVA == umem->address.
Instead the number of DMA pages should be computed in the IOVA address
space, not umem->address. Thus the IOVA has to be stored inside the umem
so it can be used for these calculations.
For now set it to umem->address by default and fix it up if
ib_umem_find_best_pgsz() was called. This allows drivers to be converted
to ib_umem_num_dma_blocks() safely.
Link: https://lore.kernel.org/r/6-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Generally drivers should be using this core helper to split up the umem
into DMA pages.
These drivers are all probably wrong in some way to pass PAGE_SIZE in as
the HW page size. Either the driver doesn't support other page sizes and
it should use 4096, or the driver does support other page sizes and should
use ib_umem_find_best_pgsz() to select the best HW pages size of the HW
supported set.
The only case it could be correct is if the HW has a global setting for
PAGE_SIZE set at driver initialization time.
Link: https://lore.kernel.org/r/5-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This helper does the same as rdma_for_each_block(), except it works on a
umem. This simplifies most of the call sites.
Link: https://lore.kernel.org/r/4-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Change counters to return failure like any other verbs destroy, however
this flow shouldn't return error at all.
Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Like any other verbs objects, CQ shouldn't fail during destroy, but
mlx5_ib didn't follow this contract with mixed IB verbs objects with
DEVX. Such mix causes to the situation where FW and kernel are fully
interdependent on the reference counting of each side.
Kernel verbs and drivers that don't have DEVX flows shouldn't fail.
Fixes: e39afe3d6d ("RDMA: Convert CQ allocations to be under core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In similar way to other IB objects, restore the ability to return error on
SRQ destroy. Strictly speaking, this change is not necessary, and provided
here to ensure a symmetrical interface like other destroy functions.
Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The HW release can fail and leave the system in limbo state, where SRQ is
removed from the table, but can't be destroyed later. In every reentry,
the initial xa_erase_irq() check will fail.
Rewrite the erase logic to keep index, but don't store the entry
itself. By doing it, we can safely reinsert entry back in the case of
destroy failure.
Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Like any other IB verbs objects, AH are refcounted by ib_core. The release
of those objects are controlled by ib_core with promise that AH destroy
can't fail.
Being SW object for now, this change makes dealloc_ah() to behave like any
other destroy IB flows.
Fixes: d345691471 ("RDMA: Handle AH allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The IB verbs objects are counted by the kernel and ib_core ensures that
deallocate PD will success so it will be called once all other objects
that depends on PD will be released. This is achieved by managing various
reference counters on such objects.
The mlx5 driver didn't follow this standard flow when allowed DEVX objects
that are not managed by ib_core to be interleaved with the ones under
ib_core responsibility.
In such interleaved scenarios deallocate command can fail and ib_core will
leave uobject in internal DB and attempt to clean it later to free
resources anyway.
This change partially restores returned value from dealloc_pd() for all
drivers, but keeping in mind that non-DEVX devices and kernel verbs paths
shouldn't fail.
Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
drivers/infiniband/hw/bnxt_re/main.c:1012:25:
warning: variable ‘qplib_ctx’ set but not used [-Wunused-but-set-variable]
Fixes: f86b31c6a2 ("RDMA/bnxt_re: Static NQ depth allocation")
Link: https://lore.kernel.org/r/20200905121624.32776-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In preparation for unconditionally passing the struct tasklet_struct
pointer to all tasklet callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.
Link: https://lore.kernel.org/r/20200903060637.424458-5-allen.lkml@gmail.com
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>