linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 03:47:19 +07:00

Author	SHA1	Message	Date
David S. Miller	70f3522614	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Three conflicts, one of which, for marvell10g.c is non-trivial and requires some follow-up from Heiner or someone else. The issue is that Heiner converted the marvell10g driver over to use the generic c45 code as much as possible. However, in 'net' a bug fix appeared which makes sure that a new local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0 is cleared. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-24 12:06:19 -08:00
Leon Romanovsky	a2a074ef39	RDMA: Handle ucontext allocations by IB/core Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-22 14:11:37 -07:00
Devesh Sharma	c50866e285	bnxt_re: fix the regression due to changes in alloc_pbl While adding the use of for_each_sg_dma_page iterator for Brodcom's rdma driver, there was a regression added in the __alloc_pbl path. The change left bnxt_re in DOA state in for-next branch. Fixing the regression to avoid the host crash when a user space object is created. Restricting the unconditional access to hwq.pg_arr when hwq is initialized for user space objects. Fixes: `161ebe2498` ("RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL") Reported-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-22 11:17:22 -07:00
Håkon Bugge	2612d723aa	IB/mlx4: Increase the timeout for CM cache Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted form the cache. But life is not perfect, remote peers may die or be rebooted. Hence, it's a timeout to wipe out a cache entry, when the PF driver assumes the remote peer has gone. During workloads where a high number of QPs are destroyed concurrently, excessive amount of CM DREQ retries has been observed The problem can be demonstrated in a bare-metal environment, where two nodes have instantiated 8 VFs each. This using dual ported HCAs, so we have 16 vPorts per physical server. 64 processes are associated with each vPort and creates and destroys one QP for each of the remote 64 processes. That is, 1024 QPs per vPort, all in all 16K QPs. The QPs are created/destroyed using the CM. When tearing down these 16K QPs, excessive CM DREQ retries (and duplicates) are observed. With some cat/paste/awk wizardry on the infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the nodes: cm_rx_duplicates: dreq 2102 cm_rx_msgs: drep 1989 dreq 6195 rep 3968 req 4224 rtu 4224 cm_tx_msgs: drep 4093 dreq 27568 rep 4224 req 3968 rtu 3968 cm_tx_retries: dreq 23469 Note that the active/passive side is equally distributed between the two nodes. Enabling pr_debug in cm.c gives tons of: [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL! By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the tear-down phase of the application is reduced from approximately 90 to 50 seconds. Retries/duplicates are also significantly reduced: cm_rx_duplicates: dreq 2460 [] cm_tx_retries: dreq 3010 req 47 Increasing the timeout further didn't help, as these duplicates and retries stems from a too short CMA timeout, which was 20 (~4 seconds) on the systems. By increasing the CMA timeout to 22 (~17 seconds), the numbers fell down to about 10 for both of them. Adjustment of the CMA timeout is not part of this commit. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-22 09:35:03 -07:00
Moni Shoua	81dd4c4be3	IB/mlx5: Validate correct PD before prefetch MR When prefetching odp mr it is required to verify that pd of the mr is identical to the pd for which the advise_mr request arrived with. This check was missing from synchronous flow and is added now. Fixes: `813e90b1ae` ("IB/mlx5: Add advise_mr() support") Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-21 16:32:45 -07:00
Moni Shoua	a6bc3875f1	IB/mlx5: Protect against prefetch of invalid MR When deferring a prefetch request we need to protect against MR or PD being destroyed while the request is still enqueued. The first step is to validate that PD owns the lkey that describes the MR and that the MR that the lkey refers to is owned by that PD. The second step is to dequeue all requests when MR is destroyed. Since PD can't be destroyed while it owns MRs it is guaranteed that when a worker wakes up the request it refers to is still valid. Now, it is possible to refrain from taking a reference on the device since it is assured to be present as pd. While that, replace the dedicated ordered workqueue with the system unbound workqueue to reuse an existing resource and improve performance. This will also fix a bug of queueing to the wrong workqueue. Fixes: `813e90b1ae` ("IB/mlx5: Add advise_mr() support") Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-21 16:32:45 -07:00
Gustavo A. R. Silva	7264235ee7	IB/hfi1: Add missing break in switch statement Fix the following warning by adding a missing break: drivers/infiniband/hw/hfi1/tid_rdma.c: In function ‘hfi1_tid_rdma_wqe_interlock’: drivers/infiniband/hw/hfi1/tid_rdma.c:3251:3: warning: this statement may fall through [-Wimplicit-fallthrough=] switch (prev->wr.opcode) { ^~~~~~ drivers/infiniband/hw/hfi1/tid_rdma.c:3259:2: note: here case IB_WR_RDMA_READ: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Fixes: `c6c231175c` ("IB/hfi1: Add interlock between TID RDMA WRITE and other requests") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kaike Wan <Kaike.wan@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-21 14:08:12 -07:00
Jason Gunthorpe	815f748037	Merge branch 'mlx5-next' into rdma.git for-next From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux To resolve conflicts with net-next and pick up the first patch. * branch 'mlx5-next': net/mlx5: Factor out HCA capabilities functions IB/mlx5: Add support for 50Gbps per lane link modes net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register net/mlx5: Add new fields to Port Type and Speed register net/mlx5: Refactor queries to speed fields in Port Type and Speed register net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode net/mlx5: Relocate vport macros to the vport header file net/mlx5: E-Switch, Normalize the name of uplink vport number net/mlx5: Provide an alternative VF upper bound for ECPF net/mlx5: Add host params change event net/mlx5: Add query host params command net/mlx5: Update enable HCA dependency net/mlx5: Introduce Mellanox SmartNIC and modify page management logic IB/mlx5: Use unified register/load function for uplink and VF vports net/mlx5: Use consistent vport num argument type net/mlx5: Use void pointer as the type in address_of macro net/mlx5: Align ODP capability function with netdev coding style mlx5: use RCU lock in mlx5_eq_cq_get() Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-21 12:40:18 -07:00
Davidlohr Bueso	ec95e0fa21	drivers/IB,qib: Fix pinned/locked limit check in qib_get_user_pages() The current check does not take into account the previous value of pinned_vm; thus it is quite bogus as is. Fix this by checking the new value after the (optimistic) atomic inc. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-20 14:42:41 -07:00
Wei Yongjun	3b8f8b95d9	iw_cxgb4: Make function read_tcb() static Fixes the following sparse warning: drivers/infiniband/hw/cxgb4/cm.c:658:6: warning: symbol 'read_tcb' was not declared. Should it be static? Fixes: `11a27e2121` ("iw_cxgb4: complete the cached SRQ buffers") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-19 20:52:19 -07:00
Yangyang Li	6ac16e4039	RDMA/hns: Bugfix for set hem of SCC The method of set hem for scc context is different from other contexts. It should notify the hardware with the detailed idx in bt0 for scc, while for other contexts, it only need to notify the bt step and the hardware will calculate the idx. Here fixes the following error when unloading the hip08 driver: [ 123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 123.579023] {1}[Hardware Error]: event severity: recoverable [ 123.584670] {1}[Hardware Error]: Error 0, type: recoverable [ 123.590317] {1}[Hardware Error]: section_type: PCIe error [ 123.595877] {1}[Hardware Error]: version: 4.0 [ 123.600395] {1}[Hardware Error]: command: 0x0006, status: 0x0010 [ 123.606562] {1}[Hardware Error]: device_id: 0000:7d:00.0 [ 123.612034] {1}[Hardware Error]: slot: 0 [ 123.616120] {1}[Hardware Error]: secondary_bus: 0x00 [ 123.621245] {1}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa222 [ 123.627847] {1}[Hardware Error]: class_code: 000002 [ 123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000 [ 123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID [ 123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000 [ 123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!! [ 123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified [ 123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error [ 123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error [ 123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5 [ 123.683147] hns3 0000:7d:00.0: AER: Device recovery successful [ 123.688978] hns3 0000:7d:00.0: PF Reset requested [ 123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF [ 123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5! Fixes: `6a157f7d1b` ("RDMA/hns: Add SCC context allocation support for hip08") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Reviewed-by: Yixian Liu <liuyixian@huawei.com> Reviewed-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-19 20:52:19 -07:00
Lijun Ou	3e394f9413	RDMA/hns: Modify qp&cq&pd specification according to UM Accroding to hip08's limitation, qp&cq specification is 1M, mtpt specification 1M in kernel space. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-19 20:52:19 -07:00
Parvi Kaustubhi	5bb3c1e9d4	IB/usnic: Fix deadlock There is a dead lock in usnic ib_register and netdev_notify path. usnic_ib_discover_pf() \| mutex_lock(&usnic_ib_ibdev_list_lock); \| usnic_ib_device_add(); \| ib_register_device() \| usnic_ib_query_port() \| mutex_lock(&us_ibdev->usdev_lock); \| ib_get_eth_speed() \| rtnl_lock() order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock rtnl_lock() \| usnic_ib_netdevice_event() \| mutex_lock(&usnic_ib_ibdev_list_lock); order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock Solution is to use the core's lock-free ib_device_get_by_netdev() scheme to lookup ib_dev while handling netdev & inet events. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Reviewed-by: Tanmay Inamdar <tinamdar@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-19 20:52:19 -07:00
Leon Romanovsky	cfe876d8e6	RDMA/cxgb4: Remove kref accounting for sync operation Ucontext allocation and release aren't async events and don't need kref accounting. The common layer of RDMA subsystem ensures that dealloc ucontext will be called after all other objects are released. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-15 21:39:15 -07:00
Leon Romanovsky	be56b07b4f	RDMA/nes: Remove useless usecnt variable and redundant memset The internal design of RDMA/core ensures that there dealloc ucontext will be called only if alloc_ucontext succeeded, hence there is no need to manage internal variable to mark validity of ucontext. As part of this change, remove redundant memeset too. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-15 21:39:15 -07:00
Bodong Wang	f8e8fa0262	net/mlx5: E-Switch, Centralize repersentor reg/unreg to eswitch driver Eswitch has two users: IB and ETH. They both register repersentors when mlx5 interface is added, and unregister the repersentors when mlx5 interface is removed. Ideally, each driver should only deal with the entities which are unique to itself. However, current IB and ETH drivers have to perform the following eswitch operations: 1. When registering, specify how many vports to register. This number is the same for both drivers which is the total available vport numbers. 2. When unregistering, specify the number of registered vports to do unregister. Also, unload the repersentors which are already loaded. It's unnecessary for eswitch driver to hands out the control of above operations to individual driver users, as they're not unique to each driver. Instead, such operations should be centralized to eswitch driver. This consolidates eswitch control flow, and simplified IB and ETH driver. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-02-15 17:25:58 -08:00
Saeed Mahameed	259fae5a2c	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Merge mlx5-next shared branched into net-next, From Bodong Wang: 1) Introduction of ECPF (Embedded CPU Physical Function), and low level bits for mlx5 SmartNic capabilities support. 2) Vport enumeration refactoring that affect mlx5_ib and mlx5_core From Aya Levin, 3) Add support for 50Gbps per lane link modes in the Port Type and Speed register (PTYS) 4) Refactor low level query functions for PTYS register 5) Add support for 50Gbps per lane link modes to mlx5_ib Note: due to a change in API in mlx5/core and a later patch from net-next, a fixup was squashed with this merge commit that replaces FDB_UPLINK_VPORT with MLX5_VPORT_UPLINK which exists only in upstream net-next. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-02-15 16:45:31 -08:00
Kaike Wan	e50838c27f	IB/hfi1: Fix a build warning for TID RDMA READ The following build warning was produced for the TID RDMA READ patch ("IB/hfi1: Enable TID RDMA READ protocol"): drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe': drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall through [-Wimplicit-fallthrough=] hfi1_setup_tid_rdma_wqe(qp, wqe); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/qp.c:329:2: note: here case IB_QPT_UC: ^~~~ This patch will fix the issue by adding the "fall through" comment. Fixes: `f1ab4efa6d` ("IB/hfi1: Enable TID RDMA READ protocol") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-15 15:55:50 -07:00
Shamir Rabinovitch	8994445054	IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs Now when we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-15 15:38:38 -07:00
Raju Rangoju	f09ef134a7	iw_cxgb4: cq/qp mask depends on bar2 pages in a host page Adjust the cq/qp mask based on the number of bar2 pages in a host page. For user-mode rdma, the granularity of the BAR2 memory mapped to a user rdma process during queue allocation must be based on the host page size. The lld attributes udb_density and ucq_density are used to figure out how many sge contexts are in a bar2 page. So the rdev->qpmask and rdev->cqmask in iw_cxgb4 need to now be adjusted based on how many sge bar2 pages are in a host page. Otherwise the device fails to work on non 4k page size systems. Fixes: `2391b0030e` ("cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size") Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-15 09:39:39 -07:00
Lijun Ou	dad1f9802e	RDMA/hns: Configure capacity of hns device This patch adds new device capability for IB_DEVICE_MEM_MGT_EXTENSIONS to indicate device support for the following features: 1. Fast register memory region. 2. send with remote invalidate by frmr 3. local invalidate memory regsion As well as adds the max depth of frmr page list len. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-14 13:20:19 -07:00
Yixian Liu	e95c716c7f	RDMA/hns: Delete useful prints for aeq subtype event Current all messages printed for aeq subtype event are wrong. Thus, delete them and only the value of subtype event is printed. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-14 13:20:19 -07:00
Yixian Liu	f7f27a5f03	RDMA/hns: Set allocated memory to zero for wrid The memory allocated for wrid should be initialized to zero. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-14 13:20:19 -07:00
Yixian Liu	ab22bf0521	RDMA/hns: Fix the state of rereg mr The state of mr after reregister operation should be set to valid state. Otherwise, it will keep the same as the state before reregistered. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-14 13:20:18 -07:00
chenglang	704e0e613a	RDMA/hns: Limit minimum ROCE CQ depth to 64 This patch modifies the minimum CQ depth specification of hip08 and is consistent with the processing of hip06. Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-14 13:20:18 -07:00
Aya Levin	08e8676f16	IB/mlx5: Add support for 50Gbps per lane link modes Driver now supports new link modes: 50Gbps per lane support for 50G/100G/200G. This patch reads the correct field (legacy vs. extended) based on a FW indication bit, and adds a translation function (link modes to IB width and speed) to the new link modes. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-02-14 12:14:42 -08:00
Aya Levin	a08b4ed137	net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register This patch exposes new link modes (including 50Gbps per lane), and ext_* fields which describes the new link modes in Port Type and Speed register (PTYS). Access functions, translation functions (speed <-> HW bits) and link max speed function were modified. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-02-14 12:14:42 -08:00
Aya Levin	bc4e12ffef	net/mlx5: Refactor queries to speed fields in Port Type and Speed register This patch fascicles queries to speed related fields in Port Type and Speed register (PTYS) into a single API. I addition, this patch refactors functions which serves only Ethernet driver: remove the protocol type as an input parameter, move code from 'core' directory into 'en' directory and add 'eth' prefix to the function's name. The patch also encapsulates functions that are not used outside the Ethernet driver removes redundant include files. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-02-14 12:14:42 -08:00
Bodong Wang	b05af6aacd	net/mlx5: E-Switch, Normalize the name of uplink vport number Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to comply with the same naming convention along with the introduction of other vports. Use MLX5_VPORT as the prefix for such vports and relocate the uplink vport definition to public header file for the benefits of both net and IB drivers. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-02-14 12:14:42 -08:00
Bodong Wang	f0666f1f22	IB/mlx5: Use unified register/load function for uplink and VF vports IB driver maintains different registration and load function calls for uplink and VF vports. This is not necessary as they only differ with each other on their profiles. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-02-14 12:14:41 -08:00
Shiraz, Saleem	52a572e9f7	RDMA/nes: Use for_each_sg_dma_page iterator for umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-13 09:00:43 -07:00
Colin Ian King	e8ac9389f0	RDMA: Fix allocation failure on pointer pd The null check on an allocation failure on pd is currently checking if pd is non-null rather than null. Fix this by adding the missing ! operator. Fixes: `21a428a019` ("RDMA: Handle PD allocations by IB/core") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-13 09:00:43 -07:00
Doug Ledford	d892273bb5	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into for-next I had merged the hfi1-tid code into my local copy of for-next, but was waiting on 0day testing before pushing it (I pushed it to my wip branch). Having waited several days for 0day testing to show up, I'm finally just going to push it out. In the meantime, though, Jason pushed other stuff to for-next, so I needed to merge up the branches before pushing. Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-13 09:35:39 -05:00
Colin Ian King	a87145957e	RDMA/bnxt_re: fix or'ing of data into an uninitialized struct member The struct member comp_mask has not been initialized however a bit pattern is being bitwise or'd into the member and hence other bit fields in comp_mask may contain any garbage from the stack. Fix this by making the bitwise or into an assignment. Fixes: `95b86d1c91` ("RDMA/bnxt_re: Update kernel user abi to pass chip context") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:34:54 -07:00
Mark Bloch	fc9e4477f9	RDMA/mlx5: Fix memory leak in case we fail to add an IB device Make sure the IB device is freed on failure. Fixes: `b5ca15ad7e` ("IB/mlx5: Add proper representors support") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:32:29 -07:00
Yishai Hadas	0da4d48d99	IB/mlx5: Fix bad flow upon DEVX mkey creation Fix bad flow upon DEVX mkey creation to prevent deleting the indirect mkey from the radix tree in case there was a previous failure to insert it. Fixes: `534fd7aac5` ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:30:40 -07:00
Shiraz, Saleem	be8c456abf	RDMA/ocrdma: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:24:55 -07:00
Shiraz, Saleem	95ad233ffb	RDMA/qedr: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:24:55 -07:00
Shiraz, Saleem	f3e6d31179	RDMA/vmw_pvrdma: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:24:55 -07:00
Shiraz, Saleem	b44e47eb06	RDMA/cxgb3: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:24:55 -07:00
Shiraz, Saleem	48b586ac36	RDMA/cxgb4: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:24:55 -07:00
Shiraz, Saleem	3856ec5527	RDMA/hns: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:02:33 -07:00
Shiraz, Saleem	43fae91276	RDMA/i40iw: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:02:33 -07:00
Shiraz, Saleem	8d249af3e6	RDMA/mthca: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:02:33 -07:00
Shiraz, Saleem	161ebe2498	RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-11 15:02:33 -07:00
Doug Ledford	82771f2033	Merge branch 'wip/dl-for-next' into for-next Due to concurrent work by myself and Jason, a normal fast forward merge was not possible. This brings in a number of hfi1 changes, mainly the hfi1 TID RDMA support (roughly 10,000 LOC change), which was reviewed and integrated over a period of days. Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-09 12:54:04 -05:00
Raju Rangoju	f368ff188a	iw_cxgb4: fix srqidx leak during connection abort When an application aborts the connection by moving QP from RTS to ERROR, then iw_cxgb4's modify_rc_qp() RTS->ERROR logic sets the *srqidxp to 0 via t4_set_wq_in_error(&qhp->wq, 0), and aborts the connection by calling c4iw_ep_disconnect(). c4iw_ep_disconnect() does the following: 1. sends up a close_complete_upcall(ep, -ECONNRESET) to libcxgb4. 2. sends abort request CPL to hw. But, since the close_complete_upcall() is sent before sending the ABORT_REQ to hw, libcxgb4 would fail to release the srqidx if the connection holds one. Because, the srqidx is passed up to libcxgb4 only after corresponding ABORT_RPL is processed by kernel in abort_rpl(). This patch handle the corner-case by moving the call to close_complete_upcall() from c4iw_ep_disconnect() to abort_rpl(). So that libcxgb4 is notified about the -ECONNRESET only after abort_rpl(), and libcxgb4 can relinquish the srqidx properly. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-08 17:02:05 -07:00
Raju Rangoju	11a27e2121	iw_cxgb4: complete the cached SRQ buffers If TP fetches an SRQ buffer but ends up not using it before the connection is aborted, then it passes the index of that SRQ buffer to the host in ABORT_REQ_RSS or ABORT_RPL CPL message. But, if the srqidx field is zero in the received ABORT_RPL or ABORT_REQ_RSS CPL, then we need to read the tcb.rq_start field to see if it really did have an RQE cached. This works around a case where HW does not include the srqidx in the ABORT_RPL/ABORT_REQ_RSS CPL. The final value of rq_start is the one present in TCB with the TF_RX_PDU_OUT bit cleared. So, we need to read the TCB, examine the TF_RX_PDU_OUT (bit 49 of t_flags) in order to determine if there's a rx PDU feedback event pending. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-08 17:02:05 -07:00
Leon Romanovsky	21a428a019	RDMA: Handle PD allocations by IB/core The PD allocations in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in had with relevant update in .dealloc_pd(). We will use this opportunity and convert .dealloc_pd() to don't fail, as it was suggested a long time ago, failures are not happening as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-08 16:51:04 -07:00
Parvi Kaustubhi	0c23660649	IB/usnic: Fix locking when unregistering Move the call to usnic_ib_device_remove after usnic_ib_ibdev_list_lock has been released. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-08 16:21:59 -07:00
Steve Wise	c8a7eb554a	iw_cxgb4: use tos when finding ipv6 routes When IPv6 support was added, the correct tos was not passed to cxgb_find_route6(). This potentially results in the wrong route entry. Fixes: `830662f6f0` ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-08 16:18:06 -07:00
Steve Wise	cb3ba0bde8	iw_cxgb4: use tos when importing the endpoint import_ep() is passed the correct tos, but doesn't use it correctly. Fixes: `ac8e4c69a0` ("cxgb4/iw_cxgb4: TOS support") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-08 16:18:06 -07:00
Steve Wise	7235ea227e	iw_cxgb4: use listening ep tos when accepting new connections If the parent listening endpoint has a service type set, then use that when setting up the connection. This allows server-side applications to mandate the tos for passive side connections via rdma_set_service_type() on the listening endpoints. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-08 16:18:06 -07:00
Devesh Sharma	95b86d1c91	RDMA/bnxt_re: Update kernel user abi to pass chip context User space verbs provider library would need chip context. Changing the ABI to add chip version details in structure. Furthermore, changing the kernel driver ucontext allocation code to initialize the abi structure with appropriate values. As suggested by community, appended the new fields at the bottom of the ABI structure and retaining to older fields as those were in the older versions. Keeping the ABI version at 1 and adding a new field in the ucontext response structure to hold the component mask. The user space library should check pre-defined flags to figure out if a certain feature is supported on not. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 13:24:49 -07:00
Devesh Sharma	37f91cff2d	RDMA/bnxt_re: Add extended psn structure for 57500 adapters The new 57500 series of adapter has bigger psn search structure. The size of new structure is 16B. Changing the control path memory allocation and fast path code to accommodate the new psn structure while maintaining the backward compatibility. There are few additional changes listed below: - For 57500 chip max-sge are limited to 6 for now. - For 57500 chip max-receive-sge should be set to 6 for now. - Add driver/hardware interface structure for new chip. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 13:24:48 -07:00
Devesh Sharma	374c5285ab	RDMA/bnxt_re: Enable GSI QP support for 57500 series In the new 57500 series of adapters the GSI qp is a UD type QP unlike the previous generation where it was a Raw Eth QP. Changing the control and data path to support the same. Listing all the significant diffs: - AH creation resolve network type unconditionally - Add check at relevant places to distinguish from Raw Eth processing flow. - bnxt_re_process_res_ud_wc report completion with GRH flag when qp is GSI. - Change length, cfa_meta and smac to match new driver/hardware interface. - Add new driver/hardware interface. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 13:24:48 -07:00
Devesh Sharma	e0387e1dd4	RDMA/bnxt_re: Skip backing store allocation for 57500 series The backing store to keep HW context data structures is allocated and initialized by L2 driver. For 57500 chip RoCE driver do not require to allocate and initialize additional memory. Changing to skip duplicate allocation and initialization for 57500 adapters. Driver continues as before for older chips. This patch also takes care of stats context memory alignment to 128 boundary, a requirement for 57500 series of chip. Older chips do not care of alignment, thus the change is unconditional. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 13:24:48 -07:00
Devesh Sharma	b353ce556d	RDMA/bnxt_re: Add 64bit doorbells for 57500 series The new chip series has 64 bit doorbell for notification queues. Thus, both control and data path event queues need new routines to write 64 bit doorbell. Adding the same. There is new doorbell interface between the chip and driver. Changing the chip specific data structure definitions. Additional significant changes are listed below - bnxt_re_net_ring_free/alloc takes a new argument - bnxt_qplib_enable_nq and enable_rcfw uses new doorbell offset for new chip. - DB mapping for NQ and CREQ now maps 8 bytes. - DBR_DBR_* macros renames to DBC_DBC_* - store nq_db_offset in a 32bit data type. - got rid of __iowrite64_copy, used writeq instead. - changed the DB header initialization to simpler scheme. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 13:24:48 -07:00
Devesh Sharma	ae8637e131	RDMA/bnxt_re: Add chip context to identify 57500 series Adding setup and destroy routines for chip-context. The chip context would be used frequently in control and data path to take execution flow depending on the chip type. chip context structure pointer is added to the relevant data structures. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 13:24:48 -07:00
Gal Pressman	af8b38ed0b	IB/mlx5: Simplify WQE count power of two check Use is_power_of_2() instead of hard coding it in the driver. While at it, fix the meaningless error print. Signed-off-by: Gal Pressman <galpress@amazon.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 13:14:55 -07:00
Davidlohr Bueso	8ea1f989aa	drivers/IB,usnic: reduce scope of mmap_sem usnic_uiom_get_pages() uses gup_longterm() so we cannot really get rid of mmap_sem altogether in the driver, but we can get rid of some complexity that mmap_sem brings with only pinned_vm. We can get rid of the wq altogether as we no longer need to defer work to unpin pages as the counter is now atomic. We also share the lock. Acked-by: Parvi Kaustubhi <pkaustub@cisco.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 12:54:02 -07:00
Davidlohr Bueso	0e15c25336	drivers/IB,hfi1: do not se mmap_sem This driver already uses gup_fast() and thus we can just drop the mmap_sem protection around the pinned_vm counter. Note that the window between when hfi1_can_pin_pages() is called and the actual counter is incremented remains the same as mmap_sem was _only_ used for when ->pinned_vm was touched. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.det> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 12:54:02 -07:00
Davidlohr Bueso	3a2a1e9056	drivers/IB,qib: optimize mmap_sem usage The driver uses mmap_sem for both pinned_vm accounting and get_user_pages(). Because rdma drivers might want to use gup_longterm() in the future we still need some sort of mmap_sem serialization (as opposed to removing it entirely by using gup_fast()). Now that pinned_vm is atomic the writer lock can therefore be converted to reader. This also fixes a bug that __qib_get_user_pages was not taking into account the current value of pinned_vm. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 12:54:02 -07:00
Davidlohr Bueso	70f8a3ca68	mm: make mm->pinned_vm an atomic64 counter Taking a sleeping lock to _only_ increment a variable is quite the overkill, and pretty much all users do this. Furthermore, some drivers (ie: infiniband and scif) that need pinned semantics can go to quite some trouble to actually delay via workqueue (un)accounting for pinned pages when not possible to acquire it. By making the counter atomic we no longer need to hold the mmap_sem and can simply some code around it for pinned_vm users. The counter is 64-bit such that we need not worry about overflows such as rdma user input controlled from userspace. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-07 12:54:02 -07:00
Kaike Wan	34025fb0c4	IB/hfi1: Prioritize the sending of ACK packets ACK packets are generally associated with request completion and resource release and therefore should be sent first. This patch optimizes the send engine by using the following policies: (1) QPs with RVT_S_ACK_PENDING bit set in qp->s_flags or qpriv->s_flags should have their priority incremented; (2) QPs with ACK or TID-ACK packet queued should have their priority incremented; (3) When a QP is queued to the wait list due to resource constraints, it will be queued to the head if it has ACK packet to send; (4) When selecting qps to run from the wait list, the one with the highest priority and starve_cnt will be selected; each priority will be equivalent to a fixed number of starve_cnt (16). Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	a05c9bdcfd	IB/hfi1: Add static trace for TID RDMA WRITE protocol This patch makes the following changes to the static trace: 1. Adds the decoding of TID RDMA WRITE packets in IB header trace; 2. Adds trace events for various stages of the TID RDMA WRITE protocol. These events provide a fine-grained control for monitoring and debugging the hfi1 driver in the filed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	ad00889e7c	IB/hfi1: Enable TID RDMA WRITE protocol This patch enables TID RDMA WRITE protocol by converting a qualified RDMA WRITE request into a TID RDMA WRITE request internally: (1) The TID RDMA cability must be enabled; (2) The request must start on a 4K page boundary; (3) The request length must be a multiple of 4K and must be larger or equal to 256K. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	c6c231175c	IB/hfi1: Add interlock between TID RDMA WRITE and other requests This locking mechanism is designed to provent vavious memory corruption scenarios from occurring when requests are pipelined, especially when RDMA WRITE requests are interleaved with TID RDMA READ requests: 1. READ-AFTER-READ; 2. READ-AFTER-WRITE; 3. WRITE-AFTER-READ; 4. WRITE-AFTER-WRITE. When memory corruption is likely, a request will be held back until previous requests have been completed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	3c6cb20a0d	IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs This patch integrates TID RDMA WRITE protocol into normal RDMA verbs framework. The TID RDMA WRITE protocol is an end-to-end protocol between the hfi1 drivers on two OPA nodes that converts a qualified RDMA WRITE request into a TID RDMA WRITE request to avoid data copying on the responder side. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	572f0c3301	IB/hfi1: Add the dual leg code The "Second Leg" of the TID RDMA WRITE protocol deals with the transfer of data and ack packets, which are in the KDETH PSN space, as opposed to the IB PSN space. Therefore, the Second Leg could be considered as a separate state machine. As such, it is handled by a different work queue item which is scheduled along with the normal IB state machine work item. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	24c5bfeaf1	IB/hfi1: Add the TID second leg ACK packet builder This patch adds the TID packet builder for the responder side, which contains the state machine to build TID RDMA ACK packet for either TID RDMA WRITE DATA or TID RDMA RESYNC packets. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	70dcb2e3dc	IB/hfi1: Add the TID second leg send packet builder To improve performance, the TID RDMA WRITE protocol is designed to own a second leg to send data and ack packets in the KDETH PSN space. This patch adds the packet builder for the requester side, which contains the state machine to build TID RDMA WRITE DATA and TID RDMA RESYNC packet. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	6e38fca6b1	IB/hfi1: Resend the TID RDMA WRITE DATA packets This patch adds the logic to resend TID RDMA WRITE DATA packets. The tracking indices will be reset properly so that the correct TID entries will be used. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	7cf0ad679d	IB/hfi1: Add a function to receive TID RDMA RESYNC packet This patch adds a function to receive TID RDMA RESYNC packet on the responder side. The QP's hardware flow will be updated and all allocated software flows will be updated accordingly in order to drop all stale packets. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	6e391c6a4a	IB/hfi1: Add a function to build TID RDMA RESYNC packet This patch adds a function to build TID RDMA RESYNC packet, which is sent by the requester to notify the responder that no TID RDMA ACK packet has been received for a given KDETH PSN. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:44 -05:00
Kaike Wan	829eaee5d0	IB/hfi1: Add TID RDMA retry timer This patch adds the TID RDMA retry timer to make sure that TID RDMA WRITE DATA packets for a segment are received successfully by the responder. This timer is generally armed when the last TID RDMA WRITE DATA packet for a segment is sent out and stopped when all TID RDMA DATA packets are acknowledged. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	9e93e967f7	IB/hfi1: Add a function to receive TID RDMA ACK packet This patch adds a function to receive TID RDMA ACK packet, which could be an acknowledge to either a TID RDMA WRITE DATA packet or an TID RDMA RESYNC packet. For an ACK to TID RDMA WRITE DATA packet, the request segments are completed appropriately. For an ACK to a TID RDMA RESYNC packet, any pending segment flow information is updated accordingly. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	0f75e325aa	IB/hfi1: Add a function to build TID RDMA ACK packet This patch adds a function to build TID RDMA ACJ packet, which is also in the KDETH PSN space for packet ordering. This packet is used to acknowledge the receiving of all the TID RDMA WRITE DATA packets before the given KDETH PSN. Similar to RC ACK packets, TID RDMA ACK packets could also be coalesced. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	d72fe7d500	IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet This patch adds a function to receive TID RDMA WRITE DATA packet, which is in the KDETH PSN space in packet ordering. Due to the use of header suppression, software is generally only notified when the last data packet for a segment is received. This patch also adds code to handle KDETH EFLAGS errors for ingress TID RDMA WRITE DATA packets. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	539e1908e4	IB/hfi1: Add a function to build TID RDMA WRITE DATA packet This patch adds a function to build TID RDMA WRITE DATA packet. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	72a0ea99ec	IB/hfi1: Add a function to receive TID RDMA WRITE response This patch adds a function to receive TID RDMA WRITE response. The TID entries will be stored for encoding TID RDMA WRITE DATA packet later. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	3c759e003a	IB/hfi1: Add TID resource timer This patch adds the TID resource timer, which is used by the responder to free any TID resources that are allocated for TID RDMA WRITE request and not returned by the requester after a reasonable time. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	38d46d3676	IB/hfi1: Add a function to build TID RDMA WRITE response This patch adds the function to build TID RDMA WRITE response. The main role of the TID RDMA WRITE RESP packet is to send TID entries to the requester so that they can be used to encode TID RDMA WRITE DATA packet. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	07b923701e	IB/hfi1: Add functions to receive TID RDMA WRITE request This patch adds the functions to receive TID RDMA WRITE request. The request will be stored in the QP's s_ack_queue. This patch also adds code to handle duplicate TID RDMA WRITE request and a function to allocate TID resources for data receiving on the responder side. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	4f9264d156	IB/hfi1: Add an s_acked_ack_queue pointer The s_ack_queue is managed by two pointers into the ring: r_head_ack_queue and s_tail_ack_queue. r_head_ack_queue is the index of where the next received request is going to be placed and s_tail_ack_queue is the entry of the request currently being processed. This works perfectly fine for normal Verbs as the requests are processed one at a time and the s_tail_ack_queue is not moved until the request that it points to is fully completed. In this fashion, s_tail_ack_queue constantly chases r_head_ack_queue and the two pointers can easily be used to determine "queue full" and "queue empty" conditions. The detection of these two conditions are imported in determining when an old entry can safely be overwritten with a new received request and the resources associated with the old request be safely released. When pipelined TID RDMA WRITE is introduced into this mix, things look very different. r_head_ack_queue is still the point at which a newly received request will be inserted, s_tail_ack_queue is still the currently processed request. However, with pipelined TID RDMA WRITE requests, s_tail_ack_queue moves to the next request once all TID RDMA WRITE responses for that request have been sent. The rest of the protocol for a particular request is managed by other pointers specific to TID RDMA - r_tid_tail and r_tid_ack - which point to the entries for which the next TID RDMA DATA packets are going to arrive and the request for which the next TID RDMA ACK packets are to be generated, respectively. What this means is that entries in the ring, which are "behind" s_tail_ack_queue (entries which s_tail_ack_queue has gone past) are no longer considered complete. This is where the problem is - a newly received request could potentially overwrite a still active TID RDMA WRITE request. The reason why the TID RDMA pointers trail s_tail_ack_queue is that the normal Verbs send engine uses s_tail_ack_queue as the pointer for the next response. Since TID RDMA WRITE responses are processed by the normal Verbs send engine, s_tail_ack_queue had to be moved to the next entry once all TID RDMA WRITE response packets were sent to get the desired pipelining between requests. Doing otherwise would mean that the normal Verbs send engine would not be able to send the TID RDMA WRITE responses for the next TID RDMA request until the current one is fully completed. This patch introduces the s_acked_ack_queue index to point to the next request to complete on the responder side. For requests other than TID RDMA WRITE, s_acked_ack_queue should always be kept in sync with s_tail_ack_queue. For TID RDMA WRITE request, it may fall behind s_tail_ack_queue. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	f5a4a95f4d	IB/hfi1: Allow for extra entries in QP's s_ack_queue The TID RDMA WRITE protocol differs from normal IB RDMA WRITE in that TID RDMA WRITE requests do require responses, not just ACKs. Therefore, TID RDMA WRITE requests need to be treated as RDMA READ requests from the point of view of the QPs' s_ack_queue. In other words, the QPs' need to allow for TID RDMA WRITE requests to be stored in their s_ack_queue. However, because the user does not know anything about the TID RDMA capability and/or protocols, these extra entries in the queue cannot be advertized to the user. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	c098bbb00c	IB/hfi1: Build TID RDMA WRITE request This patch adds the functions to build TID RDMA WRITE request. The work request opcode, packet opcode, and packet formats for TID RDMA WRITE protocol are also defined in this patch. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 18:07:43 -05:00
Kaike Wan	3ce5daa2c1	IB/hfi1: Add static trace for TID RDMA READ protocol This patch makes the following changes to the static trace: 1. Adds the decoding of TID RDMA READ packets in IB header trace; 2. Tracks qpriv->s_flags and iow_flags in qpsleepwakeup trace; 3. Adds a new event to track RC ACK receiving; 4. Adds trace events for various stages of the TID RDMA READ protocol. These events provide a fine-grained control for monitoring and debugging the hfi1 driver in the filed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:56 -05:00
Kaike Wan	f1ab4efa6d	IB/hfi1: Enable TID RDMA READ protocol This patch enables TID RDMA READ protocol by converting a qualified RDMA READ request into a TID RDMA READ request internally: (1) The TID RDMA capability must be enabled; (2) The request must start on a 4K page boundary and all receiving buffers must start on 4K page boundaries; (3) The request length must be a multiple of 4K and must be larger or equal to 256K. Each receiving buffer length must be a multiple of 4K. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	a0b34f75ec	IB/hfi1: Add interlock between a TID RDMA request and other requests This locking mechanism is designed to provent vavious memory corruption scenarios from occurring when requests are pipelined, especially when RDMA READ/WRITE requests are interleaved with TID RDMA READ/WRITE requests: 1. READ-AFTER-READ; 2. READ-AFTER-WRITE; 3. WRITE-AFTER-READ; When memory corruption is likely, a request will be held back until previous requests have been completed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	24b11923da	IB/hfi1: Integrate TID RDMA READ protocol into RC protocol This patch integrates the TID RDMA READ protocol into the IB RC protocol. This protocol is an end-to-end protocol between the hfi1 drivers on two OPA nodes that converts a qualified RDMA READ request into a TID RDMA READ request to avoid data copying on the requester side. The following codes are added in this patch: - Send the TID RDMA READ request; - Complete the TID RDMA READ send request; - Send the TID RDMA READ response; - Complete the TID RDMA READ request; Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	b126078e89	IB/hfi1: Add functions for restarting TID RDMA READ request This patch adds functions to retry TID RDMA READ request. Since TID RDMA READ request could be retried from any segment boundary, it requires a number of tracking fields in various structures and those fields should be reset properly. The qp->s_num_rd_atomic field is reset before retry and therefore should be incremented for each new or retried RDMA READ or atomic request. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	22d136d756	IB/hfi1: Add TID RDMA handlers This commit adds the TID RDMA READ pointers to the receiving opcode handlers. It also adds TID RDMA READ header sizes to header size table. A function to print the RHF EFLAGS errors is created so that it can be shared by both IB and TID RDMA receiving functions. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	9905bf06e8	IB/hfi1: Add functions to receive TID RDMA READ response This patch adds the functions to receive TID RDMA READ response. The TID resource information in the KDETH packet header will direct the hardware to deliver the packet payload to the user buffer automatically and the software will handle the packet header for the last packet of a segment as all other packet headers are suppressed by default. The TID entries will be freed when all packets for a segment have been received. This patch also adds the functions to handle KDETH eflag errors, including flow sequence and generation errors, when a TID RDMA READ response packet is received . The flow sequence error can be recovered by software checking of the flow sequence and will disappear when the hardware flow is programmed with a new generation number. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	1db21b5050	IB/hfi1: Add a function to build TID RDMA READ response This patch adds the function to build TID RDMA READ response packet. The previously received TID resource information will be used to build the KDETH packet, which will direct the delivery of packet payload by hardware. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	d0d564a1ca	IB/hfi1: Add functions to receive TID RDMA READ request This patch adds the functions to receive TID RDMA READ request. The TID resource information will be stored and tracked. Duplicate request will also be handled properly. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	6b6cf9357f	IB/hfi1: Set PbcInsertHcrc for TID RDMA packets All TID RDMA packets are in KDETH packet format and therefore the PbcInsertHcrc must be set properly before sending the packet to hardware. Otherwise, the packets will be dropped by the receiver. By default, HCRC is not inserted for 9B packets without KDETH, and this patch adds that back for TID RDMA packets. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	742a3826cf	IB/hfi1: Add functions to build TID RDMA READ request This patch adds the helper functions to build the TID RDMA READ request on the requester side. The key is to allocate TID resources (TID flow and TID entries) and send the resource information to the responder side along with the read request. Since the TID resources are limited, each TID RDMA READ request has to be split into segments with a default segment size of 256K. A software flow is allocated to track the data transaction for each segment. The work request opcode, packet opcode, and packet formats for TID RDMA READ protocol are also defined in this patch. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	84f4a40d46	IB/hfi1: Add static trace for flow and TID management functions This patch adds the static trace for the flow and TID management functions to help debugging in the filed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	2f16a696a0	IB/hfi1: Add the counter n_tidwait This patch adds the counter n_tidwait to count the number of times the TID resource allocator has to wait for TID resources. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	838b6fd2d9	IB/hfi1: TID RDMA RcvArray programming and TID allocation TID entries are used by hfi1 hardware to receive data payload from incoming packets directly into a user buffer and thus avoid data copying by software. This patch implements the functions for TID allocation, freeing, and programming TID RcvArray entries in hardware for kernel clients. TID entries are managed via lists of TID groups similar to PSM. Furthermore, to track TID resource allocation for each request, software flows are also allocated and freed as needed. Since software flows consume large amount of memory for tracking TID allocation and freeing, it is generally desirable to allocate them dynamically in the send queue and only for TID RDMA requests, but pre-allocate them for receive queue because the send queue could have thousands of entries while the receive queue has only a limited number of entries. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:55 -05:00
Kaike Wan	37356e7832	IB/hfi1: TID RDMA flow allocation The hfi1 hardware flow is a hardware flow-control mechanism for a KDETH data packet that is received on a hfi1 port. It validates the packet by checking both the generation and sequence. Each QP that uses the TID RDMA mechanism will allocate a hardware flow from its receiving context for any incoming KDETH data packets. This patch implements: (1) a function to allocate hardware flow (2) a function to free hardware flow (3) a function to initialize hardware flow generation for a receiving context (4) a wait mechanism if the hardware flow is not available (4) a function to remove the qp from the wait queue for hardware flow when the qp is reset or destroyed. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:53:54 -05:00
Kaike Wan	385156c5f2	IB/hfi: Move RC functions into a header file This patch moves some RC helper functions into a header file so that they can be called from both RC and TID RDMA functions. In addition, a common function for rewinding a request is created in rdmavt so that it can be shared between qib and hfi1 driver. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-02-05 17:51:09 -05:00
Bart Van Assche	bf3b4f066d	IB/mlx5: Do not use hw_access_flags for be and CPU data Avoid that sparse reports the following for the mlx5 driver: drivers/infiniband/hw/mlx5/qp.c:2671:34: warning: invalid assignment: \|= drivers/infiniband/hw/mlx5/qp.c:2671:34: left side has type restricted __be32 drivers/infiniband/hw/mlx5/qp.c:2671:34: right side has type int drivers/infiniband/hw/mlx5/qp.c:2679:34: warning: invalid assignment: \|= drivers/infiniband/hw/mlx5/qp.c:2679:34: left side has type restricted __be32 drivers/infiniband/hw/mlx5/qp.c:2679:34: right side has type int drivers/infiniband/hw/mlx5/qp.c:2680:34: warning: invalid assignment: \|= drivers/infiniband/hw/mlx5/qp.c:2680:34: left side has type restricted __be32 drivers/infiniband/hw/mlx5/qp.c:2680:34: right side has type int drivers/infiniband/hw/mlx5/qp.c:2684:34: warning: invalid assignment: \|= drivers/infiniband/hw/mlx5/qp.c:2684:34: left side has type restricted __be32 drivers/infiniband/hw/mlx5/qp.c:2684:34: right side has type int drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32 drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: incorrect type in argument 1 (different base types) drivers/infiniband/hw/mlx5/qp.c:2686:28: expected unsigned int [usertype] val drivers/infiniband/hw/mlx5/qp.c:2686:28: got restricted __be32 [usertype] drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32 drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32 drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32 drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32 This patch does not change any functionality. Fixes: `a60109dc9a` ("IB/mlx5: Add support for extended atomic operations") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-05 15:22:45 -07:00
Steve Wise	95b8e384d8	iw_cxgb*: kzalloc the iwcm verbs struct So future additions to that struct get initialized by default. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 16:26:02 -07:00
Wei Hu (Xavier)	d3743fa94c	RDMA/hns: Fix the chip hanging caused by sending doorbell during reset On hi08 chip, There is a possibility of chip hanging when sending doorbell during reset. We can fix it by prohibiting doorbell during reset. Fixes: `2d40788825` ("RDMA/hns: Add support for processing send wr and receive wr") Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 16:13:50 -07:00
Wei Hu (Xavier)	6a04aed6af	RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset On hi08 chip, There is a possibility of chip hanging and some errors when sending mailbox & doorbell during reset. We can fix it by prohibiting mailbox and doorbell during reset and reset occurred to ensure that hardware can work normally. Fixes: `a04ff739f2` ("RDMA/hns: Add command queue support for hip08 RoCE driver") Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 16:13:50 -07:00
Wei Hu (Xavier)	d061effc36	RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occurs In the reset process, the hns3 NIC driver notifies the RoCE driver to perform reset related processing by calling the .reset_notify() interface registered by the RoCE driver in hip08 SoC. In the current version, if a reset occurs simultaneously during the execution of rmmod or insmod ko, there may be Oops error as below: Internal error: Oops: 86000007 [#1] PREEMPT SMP Modules linked in: hns_roce(O) hns3(O) hclge(O) hnae3(O) [last unloaded: hns_roce_hw_v2] CPU: 0 PID: 14 Comm: kworker/0:1 Tainted: G O 4.19.0-ge00d540 #1 Hardware name: Huawei Technologies Co., Ltd. Workqueue: events hclge_reset_service_task [hclge] pstate: 60c00009 (nZCv daif +PAN +UAO) pc : 0xffff00000100b0b8 lr : 0xffff00000100aea0 sp : ffff000009afbab0 x29: ffff000009afbab0 x28: 0000000000000800 x27: 0000000000007ff0 x26: ffff80002f90c004 x25: 00000000000007ff x24: ffff000008f97000 x23: ffff80003efee0a8 x22: 0000000000001000 x21: ffff80002f917ff0 x20: ffff8000286ea070 x19: 0000000000000800 x18: 0000000000000400 x17: 00000000c4d3225d x16: 00000000000021b8 x15: 0000000000000400 x14: 0000000000000400 x13: 0000000000000000 x12: ffff80003fac6e30 x11: 0000800036303000 x10: 0000000000000001 x9 : 0000000000000000 x8 : ffff80003016d000 x7 : 0000000000000000 x6 : 000000000000003f x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004 x2 : 00000000000007ff x1 : 0000000000000000 x0 : 0000000000000000 Process kworker/0:1 (pid: 14, stack limit = 0x00000000af8f0ad9) Call trace: 0xffff00000100b0b8 0xffff00000100b3a0 hns_roce_init+0x624/0xc88 [hns_roce] 0xffff000001002df8 0xffff000001006960 hclge_notify_roce_client+0x74/0xe0 [hclge] hclge_reset_service_task+0xa58/0xbc0 [hclge] process_one_work+0x1e4/0x458 worker_thread+0x40/0x450 kthread+0x12c/0x130 ret_from_fork+0x10/0x18 Code: bad PC value In the reset process, we will release the resources firstly, and after the hardware reset is completed, we will reapply resources and reconfigure the hardware. We can solve this problem by modifying both the NIC and the RoCE driver. We can modify the concurrent processing in the NIC driver to avoid calling the .reset_notify and .uninit_instance ops at the same time. And we need to modify the RoCE driver to record the reset stage and the driver's init/uninit state, and check the state in the .reset_notify, .init_instance. and uninit_instance functions to avoid NULL pointer operation. Fixes: `cb7a94c9c8` ("RDMA/hns: Add reset process for RoCE in hip08") Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 16:13:50 -07:00
YueHaibing	c3c668e742	RDMA/hns: Make some function static Fixes the following sparse warnings: drivers/infiniband/hw/hns/hns_roce_hw_v2.c:5822:5: warning: symbol 'hns_roce_v2_query_srq' was not declared. Should it be static? drivers/infiniband/hw/hns/hns_roce_srq.c:158:6: warning: symbol 'hns_roce_srq_free' was not declared. Should it be static? drivers/infiniband/hw/hns/hns_roce_srq.c:81:5: warning: symbol 'hns_roce_srq_alloc' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 15:33:58 -07:00
Jason Gunthorpe	6a8a2aa62d	Linux 5.0-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlxXYaEeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkSQH/2yrfnviNPFYpZOR QQdc71Bfhkd8m85SmWIsSebkxmi3hKFVj15sGbWXd6+0/VxjEEGvQCZpvVwJceke LwDxtkKGg/74wAqJvlSAWxFNZ+Had4jDeoSoeQChddsBVXBBCxQx2v6ECg3o2x7W k8Z8t4+3RijDf8fYXY9ETyO2zW8R/wgT+dnl+DPgUH7u4dxh7FzAUfc4bgZIDg+i FzBQfbTJuz4BU7uRZ9IJiwhWKv0Iyi2DR3BY8Z1pqEpRaUMJMrCs2WGytHbTgt9e 0EtO1airbVneU4eumU/ZaF9cyEbah9HousEPnP7J09WG4s/Odxc4zE+uK1QqS2im 5Xv88is= =dVd1 -----END PGP SIGNATURE----- Merge tag 'v5.0-rc5' into rdma.git for-next Linux 5.0-rc5 Needed to merge the include/uapi changes so we have an up to date single-tree for these files. Patches already posted are also expected to need this for dependencies.	2019-02-04 14:53:42 -07:00
Moni Shoua	6141f8fa5b	IB/mlx5: Advertise XRC ODP support Query all per transport caps for XRC and set the appropriate bits in the per transport field of the advertised struct. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 14:34:07 -07:00
Moni Shoua	2e68daceac	IB/mlx5: Advertise SRQ ODP support for supported transports ODP support in SRQ is per transport capability. Based on device capabilities set this flag in device structure for future queries. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 14:34:07 -07:00
Moni Shoua	08100fad5c	IB/mlx5: Add ODP SRQ support Add changes to the WQE page-fault handler to 1. Identify that the event is for a SRQ WQE 2. Pass SRQ object instead of a QP to the function that reads the WQE 3. Parse the SRQ WQE with respect to its structure The rest is handled as for regular RQ WQE. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 14:34:07 -07:00
Moni Shoua	fbeb4075c6	IB/mlx5: Let read user wqe also from SRQ buffer Reading a WQE from SRQ is almost identical to reading from regular RQ. The differences are the size of the queue, the size of a WQE and buffer location. Make necessary changes to mlx5_ib_read_user_wqe() to let it read a WQE from a SRQ or RQ by caller choice. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 14:34:07 -07:00
Moni Shoua	29917f4750	IB/mlx5: Add XRC initiator ODP support Skip XRC segment in the beginning of a send WQE and fetch ODP XRC capabilities when QP type is IB_QPT_XRC_INI. The rest of the handling is the same as in RC QP. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 14:34:07 -07:00
Moni Shoua	6ff7414a17	IB/mlx5: Clean mlx5_ib_mr_responder_pfault_handler() signature In the function mlx5_ib_mr_responder_pfault_handler() 1. The parameter wqe is used as read-only so there is no need to pass it by reference. 2. Remove the unused argument pfault from list of arguments. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 14:34:06 -07:00
Moni Shoua	586f4e95c7	IB/mlx5: Remove useless check in ODP handler When handling an ODP event for a receive WQE in SRQ the target QP is unknown. Therefore, it is wrong to ask if QP has a SRQ in the page-fault handler. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 14:34:06 -07:00
Moni Shoua	10f56242e3	IB/mlx5: Fix the locking of SRQ objects in ODP events QP and SRQ objects are stored in different containers so the action to get and lock a common resource during ODP event needs to address that. While here get rid of 'refcount' and 'free' fields in mlx5_core_srq struct and use the fields with same semantics in common structure. Fixes: `032080ab43` ("IB/mlx5: Lock QP during page fault handling") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-02-04 14:34:06 -07:00
YueHaibing	da91ddfdc7	RDMA/hns: Remove set but not used variable 'rst' Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'hns_roce_v2_qp_flow_control_init': drivers/infiniband/hw/hns/hns_roce_hw_v2.c:4384:33: warning: variable 'rst' set but not used [-Wunused-but-set-variable] It never used since introduction. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-31 15:41:07 -07:00
Kaike Wan	a131d16460	IB/hfi1: Add static trace for OPFN This patch adds the static trace to the OPFN code and moves tid related static trace code into a new header file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-01-31 11:37:40 -05:00
Kaike Wan	48a615dc00	IB/hfi1: Integrate OPFN into RC transactions OPFN parameter negotiation allows a pair of connected RC QPs to exchange a set of parameters in succession. This negotiation does not commence till the first ULP request. Because OPFN operations are operations private to the driver, they do not generate user completions or put the QP into error when they run out of retries. This patch integrates the OPFN protocol into the transactions of an RC QP. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-01-31 11:37:34 -05:00
Kaike Wan	ddf922c31f	IB/hfi1, IB/rdmavt: Allow for extending of QP's s_ack_queue The OPFN protocol uses the COMPARE_SWAP request to exchange data between the requester and the responder and therefore needs to be stored in the QP's s_ack_queue when the request is received on the responder side. However, because the user does not know anything about the OPFN protocol, this extra entry in the queue cannot be advertised to the user. This patch adds an extra entry in a QP's s_ack_queue. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-01-31 11:36:05 -05:00
Kaike Wan	f01b4d5a43	IB/hfi1: OPFN interface OPFN allows a pair of connected RC QPs to exchange a set of parameters in succession. The parameter exchange itself is done using the IB compare and swap request with a special virtual address. The request is triggered using a reserved IB work request opcode. This patch implements the OPFN interface to initialize, start, process, and reset the OPFN request. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-01-31 11:36:05 -05:00
Kaike Wan	d22a207d74	IB/hfi1: Add OPFN helper functions for TID RDMA feature This patch adds the OPFN helper functions to initialize, encode, decode, and reset OPFN parameters for the TID RDMA feature. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-01-31 11:36:05 -05:00
Mitko Haralanov	44e43d91ad	IB/hfi1: OPFN support discovery OPFN (Omni Path Feature Negotiation) support discovery allows a RC QP to announce that it supports OPFN and also discover if OPFN is supported by the peer QP. OPFN parameter negotiation is skipped unless OPFN support is first discovered. OPFN support is announced by claiming what was the reserved bit in dword 1 of OmniPath modified base transport header in requests and responses. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-01-31 11:36:04 -05:00
Leon Romanovsky	02da375097	RDMA/core: Use the ops infrastructure to keep all callbacks in one place As preparation to hide rdma_restrack_root, refactor the code to use the ops structure instead of a special callback which is hidden in rdma_restrack_root. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-30 21:34:21 -07:00
Parav Pandit	cf34e1fe52	IB/mlx5: Consider vlan of lower netdev for macvlan GID entries When a given netdev of the GID entry is macvlan netdevice, and if the lower netdevice is vlan device, GID entry for macvlan based IP address needs to inherit the vlan of the lower netdevice. Therefore, attempt to find out if the lower device exist and if so, if it is vlan device and setup the vlan tag correctly. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-30 20:43:46 -07:00
Gal Pressman	cfc30ad3d0	IB/usnic: Remove stub functions Lack of mandatory verbs no longer fail device registration, the device will be marked as a non-kverbs provider. Signed-off-by: Gal Pressman <galpress@amazon.com> Tested-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-30 20:32:25 -07:00
Leon Romanovsky	459cc69fa4	RDMA: Provide safe ib_alloc_device() function All callers to ib_alloc_device() provide a larger size than struct ib_device and rely on the fact that struct ib_device is embedded in their driver specific structure as the first member. Provide a safer variant of ib_alloc_device() that checks and enforces this approach to make sure the drivers are using it right. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-30 15:52:30 -07:00
Kamal Heib	e5c1bb47cc	IB/mlx5: Remove set but not used variable Remove 'del_mkey' variable that is set but not used. Fixes: `534fd7aac5` ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-30 15:17:17 -07:00
Kamal Heib	f3ffed0ce4	IB/mlx5: Make mlx5_ib_stage_odp_cleanup() static The function mlx5_ib_stage_odp_cleanup() is only used in main.c Fixes: `d5d284b829` ("{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-30 15:17:17 -07:00
Michael J. Ruhl	db421a5499	IB/{hfi1, qib, rvt} Cleanup open coded sge usage Several locations for manipulating sges use an open coded sequence that is covered by helper functions. Use the appropriate helper functions. Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-01-30 14:22:32 -05:00
Michael J. Ruhl	87fc34b575	IB/{hfi1,qib}: Cleanup open coded sge sizing Sge sizing is done in several places using an open coded method. This can cause maintenance issues. The open coded method is encapsulated in a helper routine. The helper was introduced with commit: `1198fcea8a` ("IB/hfi1, rdmavt: Move SGE state helper routines into rdmavt") Update all call sites that have the open coded path with the helper routine. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-01-30 14:22:32 -05:00
Adit Ranadive	8aa04ad3b3	RDMA/vmw_pvrdma: Support upto 64-bit PFNs Update the driver to use the new device capability to report 64-bit UAR PFNs. Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Vishnu Dasa <vdasa@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-29 21:40:28 -07:00
Jason Gunthorpe	55c293c38e	Merge branch 'devx-async' into k.o/for-next Yishai Hadas says: Enable DEVX asynchronous query commands This series enables querying a DEVX object in an asynchronous mode. The userspace application won't block when calling the firmware and it will be able to get the response back once that it will be ready. To enable the above functionality: - DEVX asynchronous command completion FD object was introduced. - The applicable file operations were implemented to enable using it by the user application. - Query asynchronous method was added to the DEVX object, it will call the firmware asynchronously and manages the response on the given input FD. - Hot unplug support was added for the FD to work properly upon unbind/disassociate. - mlx5 core fence for asynchronous commands was implemented and used to prevent racing upon unbind/disassociate. This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'devx-async': IB/mlx5: Implement DEVX hot unplug for async command FD IB/mlx5: Implement the file ops of DEVX async command FD IB/mlx5: Introduce async DEVX obj query API IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-29 13:49:31 -07:00
Yishai Hadas	eaebaf77e7	IB/mlx5: Implement DEVX hot unplug for async command FD Implement DEVX hot unplug for the async command FD. This is done by managing a list of the inflight commands and wait until all launched work is completed as part of devx_hot_unplug_async_cmd_event_file. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-29 13:33:00 -07:00
Yishai Hadas	4accbb3fd2	IB/mlx5: Implement the file ops of DEVX async command FD Implement the file ops of the DEVX async command FD, this enables using the FD for reading the events and manage other options on the FD. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-29 13:33:00 -07:00
Yishai Hadas	a124edba26	IB/mlx5: Introduce async DEVX obj query API Introduce async DEVX obj query API to get the command response back to user space once it's ready without blocking when calling the firmware. The event's data includes a header with some meta data then the firmware output command data. The header includes: - The input 'wr_id' to let application recognizing the response. The input FD attribute is used to have the event data ready on. Downstream patches from this series will implement the file ops to let application read it. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-29 13:33:00 -07:00
Yishai Hadas	6bf8f22aea	IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation. This object is from type class FD and will be used to read DEVX async commands completion. The core layer should allow the driver to set object from type FD in a safe mode, this option was added with a matching comment in place. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-29 13:32:43 -07:00
Masahiro Yamada	b360ce3b2b	infiniband: prefix header search paths with $(srctree)/ Currently, the Kbuild core manipulates header search paths in a crazy way [1]. To fix this mess, I want all Makefiles to add explicit $(srctree)/ to the search paths in the srctree. Some Makefiles are already written in that way, but not all. The goal of this work is to make the notation consistent, and finally get rid of the gross hacks. Having whitespaces after -I does not matter since commit `48f6e3cf5b` ("kbuild: do not drop -I without parameter"). [1]: https://patchwork.kernel.org/patch/9632347/ Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-25 15:28:50 -07:00
Mark Bloch	c1b03c25f5	RDMA/mlx5: Fix flow creation on representors The intention of the flow_is_supported was to disable the entire tree and methods that allow raw flow creation, but the grammar syntax has this disable the entire UVERBS_FLOW object. Since the method requires a MLX5_IB_OBJECT_FLOW_MATCHER there is no need to do anything, as it is automatically disabled when matchers are disabled. This restores the ability to create flow steering rules on representors via regular verbs. Fixes: `a1462351b5` ("RDMA/mlx5: Fail early if user tries to create flows on IB representors") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-25 11:58:06 -07:00
Lijun Ou	9d9d4ff788	RDMA/hns: Update the kernel header file of hns The hns_roce_ib_create_srq_resp is used to interact with the user for data, this was open coded to use a u32 directly, instead use a properly sized structure. Fixes: `c7bcb13442` ("RDMA/hns: Add SRQ support for hip08 kernel mode") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-25 09:55:48 -07:00
Ira Weiny	ff0244bb59	RDMA/qib: Use GUP longterm for PSM page pining Similar to the core change commit `5f1d43de54` ("IB/core: disable memory registration of filesystem-dax vmas") PSM should be prevented from using filesystem DAX pages. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Yangyang Li	0e40dc2f70	RDMA/hns: Add timer allocation support for hip08 This patch adds qpc timer and cqc timer allocation support for hardware timeout retransmission in kernel space driver. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Yangyang Li	aa84fa1874	RDMA/hns: Add SCC context clr support for hip08 This patch adds SCC context clear support for DCQCN in kernel space driver. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Yangyang Li	6a157f7d1b	RDMA/hns: Add SCC context allocation support for hip08 This patch adds SCC context allocation and initialization support for DCQCN in kernel space driver. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Moni Shoua	da6a496a34	IB/mlx5: Ranges in implicit ODP MR inherit its write access A sub-range in ODP implicit MR should take its write permission from the MR and not be set always to allow. Fixes: `d07d1d70ce` ("IB/umem: Update on demand page (ODP) support") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Jason Gunthorpe	8ba0ddd094	RDMA/iw_cxgb4: Drop __GFP_NOFAIL There is no reason for this __GFP_NOFAIL, none of the other routines in this file use it, and there is an error unwind here. NOFAIL should be reserved for special cases, not used by network drivers. Fixes: `6a0b6174d3` ("rdma/cxgb4: Add support for kernel mode SRQ's") Reported-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Bart Van Assche	0a353c2e94	IB/mlx5: Declare local functions 'static' This patch avoids that sparse complains about missing function declarations. Fixes: `c9990ab39b` ("RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mm") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman	316bcda81d	infiniband: usnic: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman	2537672966	infiniband: ocrdma: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman	73eb8f03f0	infiniband: mlx5: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman	0d0336cf54	infiniband: qib: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman	e775118025	infiniband: hfi1: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman	5c43276499	infiniband: hfi1: drop crazy DEBUGFS_SEQ_FILE_CREATE() macro The macro was just making things harder to follow, and audit, so remove it and call debugfs_create_file() directly. Also, the macro did not need to warn about the call failing as no one should ever care about any debugfs functions failing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:29 -07:00
Greg Kroah-Hartman	8283d78725	infiniband: cxgb4: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-24 09:22:29 -07:00
Jason Gunthorpe	e355477ed9	net/mlx5: Make mlx5_cmd_exec_cb() a safe API APIs that have deferred callbacks should have some kind of cleanup function that callers can use to fence the callbacks. Otherwise things like module unloading can lead to dangling function pointers, or worse. The IB MR code is the only place that calls this function and had a really poor attempt at creating this fence. Provide a good version in the core code as future patches will add more places that need this fence. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-01-24 14:25:26 +02:00
Maor Gottlieb	6113cc4401	IB/mlx5: Don't override existing ip_protocol Two flow specifications can set the ip protocol field in the flow table entry: 1) IB_FLOW_SPEC_TCP/UDP/GRE - set the ip protocol accordingly. 2) IB_FLOW_SPEC_IPV4/6 - has ip_protocol field for users who want to receive specific L4 packets. We need to avoid overriding of the ip_protocol with zeros, in case that the user first put the L4 specification and only then the L3. Fixes: `ca0d475385` ('IB/mlx5: Add support in TOS and protocol to flow steering') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 20:10:06 -07:00
Yishai Hadas	414556af5f	IB/mlx5: Add support for ODP for DEVX indirection mkey Add support for ODP for DEVX indirection mkey, it includes: - Recognizing its type as part of the radix tree lookup. - Use similar flow as done for the MW MKEY type. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 20:06:49 -07:00
Yishai Hadas	534fd7aac5	IB/mlx5: Manage indirection mkey upon DEVX flow for ODP Manage indirection mkey upon DEVX flow to support ODP. To support a page fault event on the indirection mkey it needs to be part of the device mkey radix tree. Both the creation and the deletion flows for a DEVX object which is indirection mkey were adapted to handle that. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 20:06:49 -07:00
Yishai Hadas	fa31f14380	IB/mlx5: DEVX handling for indirection MKEY Once an indirection MKEY is created umem valid bit shouldn't be set as this MKEY doesn't really hold a umem. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 20:06:49 -07:00
Xiaofei Tan	2b9acb9a97	RDMA/hns: Add the process of AEQ overflow for hip08 AEQ overflow will be reported by hardware when too many asynchronous events occurred but not be handled in time. Normally, AEQ overflow error is not easy to occur. Once happened, we have to do physical function reset to recover. PF reset is implemented in two steps. Firstly, set reset level with ae_dev->ops->set_default_reset_request. Secondly, run reset with ae_dev->ops->reset_event. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 16:47:54 -07:00
Jason Gunthorpe	951d01b96f	IB/mlx5: Fix how advise_mr() launches async work Work must hold a kref on the ib_device otherwise the dev pointer can become free before the work runs. This can happen because the work is being pushed onto the system work queue which is not flushed during driver unregister. Remove the bogus use of 'reg_state': - While in uverbs the reg_state is guaranteed to always be REGISTERED - Testing reg_state with no locking is bogus. Use ib_device_try_get() to get back into a region that prevents unregistration. For now continue with a flow that is similar to the existing code. Fixes: `813e90b1ae` ("IB/mlx5: Add advise_mr() support") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com>	2019-01-21 14:39:29 -07:00
Michael J. Ruhl	7709b0dc26	IB/hfi1: Remove overly conservative VM_EXEC flag check Applications that use the stack for execution purposes cause userspace PSM jobs to fail during mmap(). Both Fortran (non-standard format parsing) and C (callback functions located in the stack) applications can be written such that stack execution is required. The linker notes this via the gnu_stack ELF flag. This causes READ_IMPLIES_EXEC to be set which forces all PROT_READ mmaps to have PROT_EXEC for the process. Checking for VM_EXEC bit and failing the request with EPERM is overly conservative and will break any PSM application using executable stacks. Cc: <stable@vger.kernel.org> #v4.14+ Fixes: `1222026764` ("IB/hfi: Protect against writable mmap") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 14:20:08 -07:00
Brian Welty	904bba211a	IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM The work completion length for a receiving a UD send with immediate is short by 4 bytes causing application using this opcode to fail. The UD receive logic incorrectly subtracts 4 bytes for immediate value. These bytes are already included in header length and are used to calculate header/payload split, so the result is these 4 bytes are subtracted twice, once when the header length subtracted from the overall length and once again in the UD opcode specific path. Remove the extra subtraction when handling the opcode. Fixes: `7724105686` ("IB/hfi1: add driver files") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 14:20:08 -07:00
Jack Morgenstein	f45f8edbe1	IB/mlx4: Fix using wrong function to destroy sqp AHs under SRIOV The commit cited below replaced rdma_create_ah with mlx4_ib_create_slave_ah when creating AHs for the paravirtualized special QPs. However, this change also required replacing rdma_destroy_ah with mlx4_ib_destroy_ah in the affected flows. The commit missed 3 places where rdma_destroy_ah should have been replaced with mlx4_ib_destroy_ah. As a result, the pd usecount was decremented when the ah was destroyed -- although the usecount was NOT incremented when the ah was created. This caused the pd usecount to become negative, and resulted in the WARN_ON stack trace below when the mlx4_ib.ko module was unloaded: WARNING: CPU: 3 PID: 25303 at drivers/infiniband/core/verbs.c:329 ib_dealloc_pd+0x6d/0x80 [ib_core] Modules linked in: rdma_ucm rdma_cm iw_cm ib_cm ib_umad mlx4_ib(-) ib_uverbs ib_core mlx4_en mlx4_core nfsv3 nfs fscache configfs xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc dm_mirror dm_region_hash dm_log dm_mod dax rndis_wlan rndis_host coretemp kvm_intel cdc_ether kvm usbnet iTCO_wdt iTCO_vendor_support cfg80211 irqbypass lpc_ich ipmi_si i2c_i801 mii pcspkr i2c_core mfd_core ipmi_devintf i7core_edac ipmi_msghandler ioatdma pcc_cpufreq dca acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas mptscsih crc32c_intel ata_piix bnx2 mptbase ipv6 crc_ccitt autofs4 [last unloaded: mlx4_core] CPU: 3 PID: 25303 Comm: modprobe Tainted: G W I 5.0.0-rc1-net-mlx4+ #1 Hardware name: IBM -[7148ZV6]-/Node 1, System Card, BIOS -[MLE170CUS-1.70]- 09/23/2011 RIP: 0010:ib_dealloc_pd+0x6d/0x80 [ib_core] Code: 00 00 85 c0 75 02 5b c3 80 3d aa 87 03 00 00 75 f5 48 c7 c7 88 d7 8f a0 31 c0 c6 05 98 87 03 00 01 e8 07 4c 79 e0 0f 0b 5b c3 <0f> 0b eb be 0f 0b eb ab 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 RSP: 0018:ffffc90005347e30 EFLAGS: 00010282 RAX: 00000000ffffffea RBX: ffff8888589e9540 RCX: 0000000000000006 RDX: 0000000000000006 RSI: ffff88885d57ad40 RDI: 0000000000000000 RBP: ffff88885b029c00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000004 R12: ffff8887f06c0000 R13: ffff8887f06c13e8 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fd6743c6740(0000) GS:ffff88887fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000ed1038 CR3: 00000007e3156000 CR4: 00000000000006e0 Call Trace: mlx4_ib_close_sriov+0x125/0x180 [mlx4_ib] mlx4_ib_remove+0x57/0x1f0 [mlx4_ib] mlx4_remove_device+0x92/0xa0 [mlx4_core] mlx4_unregister_interface+0x39/0x90 [mlx4_core] mlx4_ib_cleanup+0xc/0xd7 [mlx4_ib] __x64_sys_delete_module+0x17d/0x290 ? trace_hardirqs_off_thunk+0x1a/0x1c ? do_syscall_64+0x12/0x180 do_syscall_64+0x4a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: `5e62d5ff1b` ("IB/mlx4: Create slave AH's directly") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 14:20:08 -07:00
Mark Bloch	8af526e035	RDMA/mlx5: Fix check for supported user flags when creating a QP When the flags verification was added two flags were missed from the check: * MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC * MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC This causes user applications that were using these flags to break. Fixes: `2e43bb31b8` ("IB/mlx5: Verify that driver supports user flags") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-21 14:20:08 -07:00
YueHaibing	790b57f686	IB/hw: Remove unneeded semicolons Remove unneeded semicolons. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-18 14:04:46 -07:00
YueHaibing	8ea175f005	RDMA/qedr: remove set but not used variable 'ib_ctx' Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/hw/qedr/verbs.c: In function 'qedr_create_srq': drivers/infiniband/hw/qedr/verbs.c:1436:22: warning: variable 'ib_ctx' set but not used [-Wunused-but-set-variable] drivers/infiniband/hw/qedr/verbs.c: In function 'qedr_create_user_qp': drivers/infiniband/hw/qedr/verbs.c:1701:22: warning: variable 'ib_ctx' set but not used [-Wunused-but-set-variable] Fixes: `b0ea0fa543` ("IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-18 13:41:23 -07:00
Lijun Ou	de77503a59	RDMA/hns: RDMA/hns: Assign rq head pointer when enable rq record db When flush cqe, it needs to get the pointer of rq and sq from db address space of user and update it into qp context by modified qp. if rq does not exist, it will not get the value from db address space of user. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-18 13:04:27 -07:00
Linus Torvalds	d7393226d1	First 5.0 rc pull request Not much so far, but I'm feeling like the 2nd PR -rc will be larger than this. We have the usual batch of bugs and two fixes to code merged this cycle. - Restore valgrind support for the ioctl verbs interface merged this window, and fix a missed error code on an error path from that conversion - A user reported crash on obsolete mthca hardware - pvrdma was using the wrong command opcode toward the hypervisor - NULL pointer crash regression when dumping rdma-cm over netlink - Be conservative about exposing the global rkey -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlxBTeMACgkQOG33FX4g mxrOIQ//YdZdU9J825DM4ppH/MWRoPgayI+cca5sW2EG/nkgsvFJoiVDDK5/ka1g ge5Q21ZLMSPCBR0Iu/e/JOq6fJI4fsbcJGZURbyKgRZqyCBCf6qJbhiZKifpQMVb w7RP8kRFRdaiQzkAYfZSv9TP93JLvTDLg6zZ74r4vc8YphIzkI410v568hs6FiVu MIcb53pBWUswpCAnBVB+54sw+phJyjd02kmY4xTlWmiEzwHBb0JQ+Kps72/G0IWy 0vOlDI1UjwqoDfThzyT7mcXqnSbXxg/e8EecMpyFzlorQyxgZ5TsJgQ8ubSYxuiQ 7+dZ4rsdoZD++3MGtpmqDMQzKSPb989WzJT8WLp5oSw4ryAXeJJ+tys/APLtvPkf EgKgVyEqfxMDXn02/ENwDPpZyKLZkhcHFLgvfYmxtlDvtai/rvTLmzV1mptEaxlF +2pwSQM4/E/8qrLglN9kdFSfjBMb7Bvd2NYQqZ9vah2omb7gPsaTEEpVw6l/E0NX oOxFKPEzb0nP9KmJmwO8KLCvcrruuRL8kpmhc6sQMQJ6z0h4hmZrHF5EZZH92g0p maHyrx66vqw/Yl+TLvAb/T6FV1ax5c1TauiNErAjnag2wgVWW42Q7lQzSFLFI8su GU8oRlbIclDQ/1bszsf0IShq0r9G17+2n6yyTX39rj62YioiDlI= =ymZq -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes frfom Jason Gunthorpe: "Not much so far. We have the usual batch of bugs and two fixes to code merged this cycle: - Restore valgrind support for the ioctl verbs interface merged this window, and fix a missed error code on an error path from that conversion - A user reported crash on obsolete mthca hardware - pvrdma was using the wrong command opcode toward the hypervisor - NULL pointer crash regression when dumping rdma-cm over netlink - Be conservative about exposing the global rkey" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT RDMA/mthca: Clear QP objects during their allocation RDMA/vmw_pvrdma: Return the correct opcode when creating WR RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type RDMA/nldev: Don't expose unsafe global rkey to regular user RDMA/uverbs: Fix post send success return value in case of error	2019-01-18 17:17:20 +12:00
Gustavo A. R. Silva	8e8aa14542	RDMA/mlx5: Replace kzalloc with kcalloc Replace kzalloc() function with its 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a, b, gfp) This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-15 15:53:08 -07:00
Raju Rangoju	3352976c89	RDMA/iw_cxgb4: Fix the unchecked ep dereference The patch `944661dd97`: "RDMA/iw_cxgb4: atomically lookup ep and get a reference" from May 6, 2016, leads to the following Smatch complaint: drivers/infiniband/hw/cxgb4/cm.c:2953 terminate() error: we previously assumed 'ep' could be null (see line 2945) Fixes: `944661dd97` ("RDMA/iw_cxgb4: atomically lookup ep and get a reference") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-15 15:51:46 -07:00
Leon Romanovsky	73f5a82bb3	RDMA/mad: Reduce MAD scope to mlx5_ib only Management Datagram Interface (MAD) is applicable only when physical port is Infiniband. It makes MAD command logic to be completely unrelated to eth/core parts of mlx5. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-15 10:02:29 +02:00
Dan Carpenter	97099cc652	RDMA/bnxt_re: fix a size calculation This is from static analysis not from testing. Depending on the value of rcfw->cmdq_depth, then this might not cause an issue at runtime. The BITS_TO_LONGS() macro tells us how many longs it take to hold a bitmap. In other words, it divides by the number if bits per long and rounds up. Then we want to take that number and multiple by sizeof(long) to get the number of bytes to allocate. The code here does the multiplication first so the rounding up is done in the wrong place. So imagine we want to allocate 1 bit, then "(1 * 8) / 64 = 1" when we round up. But it should be "(1 / 64) * 8 = 8". In other words, because of the rounding difference we might allocate up to "sizeof(long) - 1" bytes fewer than intended. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-By: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-14 14:05:54 -07:00
Parav Pandit	5474723115	RDMA: Introduce and use rdma_device_to_ibdev() Introduce and use rdma_device_to_ibdev() API for those drivers which are registering one sysfs group and also use in ib_core. In subsequent patch, device->provider_ibdev one-to-one mapping is no longer holds true during accessing sysfs entries. Therefore, introduce an API rdma_device_to_ibdev() that provides such information. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-14 13:12:03 -07:00
Parav Pandit	ea4baf7f11	RDMA: Rename port_callback to init_port Most provider routines are callback routines which ib core invokes. _callback suffix doesn't convey information about when such callback is invoked. Therefore, rename port_callback to init_port. Additionally, store the init_port function pointer in ib_device_ops, so that it can be accessed in subsequent patches when binding rdma device to net namespace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-14 13:05:14 -07:00
Leon Romanovsky	081de9495c	RDMA: Clear CTX objects during their allocation As part of an audit process to update drivers to use rdma_restrack_add() ensure that CTX objects is cleared before access. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:08:52 -07:00
Leon Romanovsky	0975890ebe	RDMA: Clear CQ objects during their allocation As part of an audit process to update drivers to use rdma_restrack_add() ensure that CQ objects is cleared before access. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:08:52 -07:00
Leon Romanovsky	8cbfaac3d0	RDMA: Clear PD objects during their allocation As part of an audit process to update drivers to use rdma_restrack_add() ensure that PD objects is cleared before access. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:08:52 -07:00
Gal Pressman	dbe30dae48	RDMA/qedr: Fix out of bounds index check in query pkey The pkey table size is QEDR_ROCE_PKEY_TABLE_LEN, index should be tested for >= QEDR_ROCE_PKEY_TABLE_LEN instead of > QEDR_ROCE_PKEY_TABLE_LEN. Fixes: `a7efd7773e` ("qedr: Add support for PD,PKEY and CQ verbs") Signed-off-by: Gal Pressman <galpress@amazon.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:07:45 -07:00
Gal Pressman	b188940796	RDMA/ocrdma: Fix out of bounds index check in query pkey The pkey table size is one element, index should be tested for > 0 instead of > 1. Fixes: `fe2caefcdf` ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter") Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:07:45 -07:00
Gal Pressman	4959d5da57	IB/usnic: Fix out of bounds index check in query pkey The pkey table size is one element, index should be tested for > 0 instead of > 1. Fixes: `e3cf00d0a8` ("IB/usnic: Add Cisco VIC low-level hardware driver") Signed-off-by: Gal Pressman <galpress@amazon.com> Acked-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:07:45 -07:00
Jason Gunthorpe	b0ea0fa543	IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>	2019-01-10 17:07:45 -07:00
Shamir Rabinovitch	6fa8f1afd3	IB/{core,uverbs}: Move ib_umem_xxx functions from ib_core to ib_uverbs The next patch will add dependency from ib_umem_get in to ib_uverbs so move the required ib_umem_xxx functionality to it's correct module - ib_uverbs - and avoid circular dependecy from the form of ib_core -> ib_uverbs -> ib_core in depmod. Since this now requires all drivers to be build modular if uverbs is modular, hoist the test a couple drivers had into the main kconfig and apply it to all drivers uniformly. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:06:44 -07:00
Leon Romanovsky	9d9f59b420	RDMA/mthca: Clear QP objects during their allocation As part of audit process to update drivers to use rdma_restrack_add() ensure that QP objects is cleared before access. Such change fixes the crash observed with uninitialized non zero sgid attr accessed by ib_destroy_qp(). CPU: 3 PID: 74 Comm: kworker/u16:1 Not tainted 4.19.10-300.fc29.x86_64 Workqueue: ipoib_wq ipoib_cm_tx_reap [ib_ipoib] RIP: 0010:rdma_put_gid_attr+0x9/0x30 [ib_core] RSP: 0018:ffffb7ad819dbde8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8d1bdf5a2e00 RCX: 0000000000002699 RDX: 206c656e72656af8 RSI: ffff8d1bf7ae6160 RDI: 206c656e72656b20 RBP: 0000000000000000 R08: 0000000000026160 R09: ffffffffc06b45bf R10: ffffe849887da000 R11: 0000000000000002 R12: ffff8d1be30cb400 R13: ffff8d1bdf681800 R14: ffff8d1be2272400 R15: ffff8d1be30ca000 FS: 0000000000000000(0000) GS:ffff8d1bf7ac0000(0000) knlGS:0000000000000000 Trace: ib_destroy_qp+0xc9/0x240 [ib_core] ipoib_cm_tx_reap+0x1f9/0x4e0 [ib_ipoib] process_one_work+0x1a1/0x3a0 worker_thread+0x30/0x380 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x22/0x40 Reported-by: Alexander Murashkin <AlexanderMurashkin@msn.com> Tested-by: Alexander Murashkin <AlexanderMurashkin@msn.com> Fixes: `1a1f460ff1` ("RDMA: Hold the sgid_attr inside the struct ib_ah/qp") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:01:22 -07:00
Adit Ranadive	6325e01b6c	RDMA/vmw_pvrdma: Return the correct opcode when creating WR Since the IB_WR_REG_MR opcode value changed, let's set the PVRDMA device opcodes explicitly. Reported-by: Ruishuang Wang <ruishuangw@vmware.com> Fixes: `9a59739bd0` ("IB/rxe: Revise the ib_wr_opcode enum") Cc: stable@vger.kernel.org Reviewed-by: Bryan Tan <bryantan@vmware.com> Reviewed-by: Ruishuang Wang <ruishuangw@vmware.com> Reviewed-by: Vishnu Dasa <vdasa@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-10 17:00:28 -07:00
Leon Romanovsky	13859d5df4	RDMA/mlx5: Embed into the code flow the ODP config option Convert various places to more readable code, which embeds CONFIG_INFINIBAND_ON_DEMAND_PAGING into the code flow. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-08 16:41:38 -07:00
Leon Romanovsky	8b4d5bc5cf	RDMA/mlx5: Introduce and reuse helper to identify ODP MR Consolidate various checks if MR is ODP backed to one simple helper and update call sites to use it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-08 16:41:38 -07:00
Leon Romanovsky	e502b8b011	RDMA/core: Don't depend device ODP capabilities on kconfig option Device capability bits are exposing what specific device supports from HW perspective. Those bits are not dependent on kernel configurations and RDMA/core should ensure that proper interfaces to users will be disabled if CONFIG_INFINIBAND_ON_DEMAND_PAGING is not set. Fixes: `f4056bfd8c` ("IB/core: Add on demand paging caps to ib_uverbs_ex_query_device") Fixes: `8cdd312cfe` ("IB/mlx5: Implement the ODP capability query verb") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-08 16:41:38 -07:00
Leon Romanovsky	96f87ee181	RDMA: Clean structures from CONFIG_INFINIBAND_ON_DEMAND_PAGING CONFIG_INFINIBAND_ON_DEMAND_PAGING is used in general structures to micro-optimize the memory footprint. Remove it, so it will allow us to simplify various ODP device flows. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-08 16:41:38 -07:00
Luis Chamberlain	750afb08ca	cross-tree: phase out dma_zalloc_coherent() We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superflous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-01-08 07:58:37 -05:00
Lijun Ou	91fb4d83b8	RDMA/hns: Modify the pbl ba page size for hip08 Modify the pbl ba page size to 16K for in order to support 4G MR size. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-07 12:14:42 -07:00
Lijun Ou	44754b95dd	RDMA/hns: Add constraint on the setting of local ACK timeout According to IB protocol, local ACK timeout shall be a 5 bit value. Currently, hip08 could not support the possible max value 31. Fail the request in this case. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-07 12:13:34 -07:00
Lijun Ou	4d103905eb	RDMA/hns: Bugfix for the scene without receiver queue In some application scenario, the user could not have receive queue when run rdma write or read operation. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-07 12:12:50 -07:00
Lijun Ou	9c6ccc035c	RDMA/hns: Fix the bug with updating rq head pointer when flush cqe When flush cqe with srq, the driver disable to update the rq head pointer into the hardware. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-07 12:12:36 -07:00
Potnuri Bharat Teja	e6b7b7d8a9	iw_cxgb4: Check for send WR also while posting write with completion WR Inorder to optimize the NVMEoF read IOPs, iw_cxgb4 posts a FW Write with Completion WQE that combines an RDMA Write WR and the subsequent RDMA Send with Invalidate WR. This patch is an extension to it, where it posts a Write with completion for RDMA WRITE WR + RDMA SEND WR combination as well. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-07 12:02:45 -07:00
Gustavo A. R. Silva	02fc184841	IB/usnic: Use struct_size() in kmalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void ) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-07 11:43:03 -07:00
Linus Torvalds	3954e1d031	4.21 merge window 2nd pull request Over the break a few defects were found, so this is a -rc style pull request of various small things that have been posted. - An attempt to shorten RCU grace period driven delays showed crashes during heavier testing, and has been entirely reverted - A missed merge/rebase error between the advise_mr and ib_device_ops series - Some small static analysis driven fixes from Julia and Aditya - Missed ability to create a XRC_INI in the devx verbs interop series -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlwu4TIACgkQOG33FX4g mxqPgw//XU7X2/AbXALQOvZgI6y/qs6BSzucGEkTEEMyJ5KvjS537yJqN7ltfe9d BiJLIpCUJ2NKqyUnbah7nHT06Mm7wZM+FkIxtf2N3te/MfYd5HwIvUdIwwmX+VEc k1DcRD1EfowZCSgBAVQAqqJu6oBW//Wi48BQ7HNGvyXVJJ/F+uKIM/Am6oGUTV/5 69yo0ZfqP/+bRfbNvg7cHqWafCL8ed70pIqpoL67hRfHcxUW/TQVV6njw8FNB/MH DNL6pN3oncUweyOPDV/Z6Cx+De5BFF498Rbvosugk8OO62wQ780DTvTeA5AlEtxV TEjTtd7QqDhWRELzv4WtU9ojrOnp3bzEu36Ok7ANEGAW40WdAL//eWQiaJF423Az zcD3w/t9ZE2mIX9h7YcVnMpmDvGpyQorG4mFYPfZgXLVxgrY2phLwiZsOk3B6PY8 cszL4mJFnk6DKB9/31nWgPpWl+V1/E48JODwU9Fz1d3ov+XvNC4SBp0hM6cfG25c insZevsAfMQ+k43Rw+iE62Sz9JTfJZpVekyMmIG5fqCZlzG4UXhB6On5r6TGvWc0 cnbZ+ELmsZY54DyAloOAKvBUuVY/t8QYaFo3y69v0B5ZiVnY1I00r74FyGEo21Cv /uxKbUmQxW4T9rdgZtWtfsSKcuiGrRDLTcLJ5j19c6bqJyF3fao= =REsM -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "Over the break a few defects were found, so this is a -rc style pull request of various small things that have been posted. - An attempt to shorten RCU grace period driven delays showed crashes during heavier testing, and has been entirely reverted - A missed merge/rebase error between the advise_mr and ib_device_ops series - Some small static analysis driven fixes from Julia and Aditya - Missed ability to create a XRC_INI in the devx verbs interop series" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: infiniband/qedr: Potential null ptr dereference of qp infiniband: bnxt_re: qplib: Check the return value of send_message IB/ipoib: drop useless LIST_HEAD IB/core: Add advise_mr to the list of known ops Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads" IB/mlx5: Allow XRC INI usage via verbs in DEVX context	2019-01-05 18:20:51 -08:00
Linus Torvalds	96d4f267e4	Remove 'type' argument from access_ok() function Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-01-03 18:57:57 -08:00
Aditya Pakki	9c6260de50	infiniband/qedr: Potential null ptr dereference of qp idr_find() may fail and return a NULL pointer. The fix checks the return value of the function and returns an error in case of NULL. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-02 16:08:38 -07:00
Aditya Pakki	94edd87a1c	infiniband: bnxt_re: qplib: Check the return value of send_message In bnxt_qplib_map_tc2cos(), bnxt_qplib_rcfw_send_message() can return an error value but it is lost. Propagate this error to the callers. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Acked-By: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-02 16:04:24 -07:00
Leon Romanovsky	ccffa54548	Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads" Longer term testing shows this patch didn't play well with MR cache and caused to call traces during remove_mkeys(). This reverts commit `bb7e22a8ab`. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-02 09:40:34 -07:00
Yishai Hadas	7422edce73	IB/mlx5: Allow XRC INI usage via verbs in DEVX context From device point of view both XRC target and initiator are XRC transport type. Fix to use the expected UID as was handled for the XRC target case to allow its usage via verbs in DEVX context. Fixes: `5aa3771ded` ("IB/mlx5: Allow XRC usage via verbs in DEVX context") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-01-02 09:40:34 -07:00
Linus Torvalds	f346b0becb	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: - large KASAN update to use arm's "software tag-based mode" - a few misc things - sh updates - ocfs2 updates - just about all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits) kernel/fork.c: mark 'stack_vm_area' with __maybe_unused memcg, oom: notify on oom killer invocation from the charge path mm, swap: fix swapoff with KSM pages include/linux/gfp.h: fix typo mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization memory_hotplug: add missing newlines to debugging output mm: remove __hugepage_set_anon_rmap() include/linux/vmstat.h: remove unused page state adjustment macro mm/page_alloc.c: allow error injection mm: migrate: drop unused argument of migrate_page_move_mapping() blkdev: avoid migration stalls for blkdev pages mm: migrate: provide buffer_migrate_page_norefs() mm: migrate: move migrate_page_lock_buffers() mm: migrate: lock buffers before migrate_page_move_mapping() mm: migration: factor out code to compute expected number of page references mm, page_alloc: enable pcpu_drain with zone capability kmemleak: add config to select auto scan mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init ...	2018-12-28 16:55:46 -08:00
Linus Torvalds	5d24ae67a9	4.21 merge window pull request This has been a fairly typical cycle, with the usual sorts of driver updates. Several series continue to come through which improve and modernize various parts of the core code, and we finally are starting to get the uAPI command interface cleaned up. - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5, qib, rxe, usnic - Rework the entire syscall flow for uverbs to be able to run over ioctl(). Finally getting past the historic bad choice to use write() for command execution - More functional coverage with the mlx5 'devx' user API - Start of the HFI1 series for 'TID RDMA' - SRQ support in the hns driver - Support for new IBTA defined 2x lane widths - A big series to consolidate all the driver function pointers into a big struct and have drivers provide a 'static const' version of the struct instead of open coding initialization - New 'advise_mr' uAPI to control device caching/loading of page tables - Support for inline data in SRPT - Modernize how umad uses the driver core and creates cdev's and sysfs files - First steps toward removing 'uobject' from the view of the drivers -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlwhV2oACgkQOG33FX4g mxpF8A/9EkRCg6wCDC59maA53b5PjuNmD//9hXbycQPQSlxntI2PyYtxrzBqc0+2 yIaFFMehL41XNN6y1zfkl7ndl62McCH2TpiidU8RyTxVw/e3KsDD5sU6++atfHRo M82RNfedDtxPG8TcCPKVLof6JHADApGSR1r4dCYfAnu7KFMyvlLmeYyx4r/2E6yC iQPmtKVOdbGkuWGeX+brGEA0vg7FUOAvaysnxddjyh9hyem4h0SUR3Af/Ik0N5ME PYzC+hMKbkPVBLoCWyg7QwUaqK37uWwguMQLtI2byF7FgbiK/lBQt6TsidR4Fw3p EalL7uqxgCTtLYh918vxLFjdYt6laka9j7xKCX8M8d06sy/Lo8iV4hWjiTESfMFG usqs7D6p09gA/y1KISji81j6BI7C92CPVK2drKIEnfyLgY5dBNFcv9m2H12lUCH2 NGbfCNVaTQVX6bFWPpy2Bt2y/Litsfxw5RviehD7jlG0lQjsXGDkZzsDxrMSSlNU S79iiTJyK4kUZkXzrSSlN58pLBlbupJwm5MDjKmM+irsrsCHjGIULvc902qtnC3/ 8ImiTtW6XvqLbgWXyy2Th8/ZgRY234p1ybhog+DFaGKUch0XqB7VXTV2OZm0GjcN Fp4PUeBt+/gBgYqjpuffqQc1rI4uwXYSoz7wq9RBiOpw5zBFT1E= =T0p1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "This has been a fairly typical cycle, with the usual sorts of driver updates. Several series continue to come through which improve and modernize various parts of the core code, and we finally are starting to get the uAPI command interface cleaned up. - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5, qib, rxe, usnic - Rework the entire syscall flow for uverbs to be able to run over ioctl(). Finally getting past the historic bad choice to use write() for command execution - More functional coverage with the mlx5 'devx' user API - Start of the HFI1 series for 'TID RDMA' - SRQ support in the hns driver - Support for new IBTA defined 2x lane widths - A big series to consolidate all the driver function pointers into a big struct and have drivers provide a 'static const' version of the struct instead of open coding initialization - New 'advise_mr' uAPI to control device caching/loading of page tables - Support for inline data in SRPT - Modernize how umad uses the driver core and creates cdev's and sysfs files - First steps toward removing 'uobject' from the view of the drivers" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits) RDMA/srpt: Use kmem_cache_free() instead of kfree() RDMA/mlx5: Signedness bug in UVERBS_HANDLER() IB/uverbs: Signedness bug in UVERBS_HANDLER() IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported IB/umad: Start using dev_groups of class IB/umad: Use class_groups and let core create class file IB/umad: Refactor code to use cdev_device_add() IB/umad: Avoid destroying device while it is accessed IB/umad: Simplify and avoid dynamic allocation of class IB/mlx5: Fix wrong error unwind IB/mlx4: Remove set but not used variable 'pd' RDMA/iwcm: Don't copy past the end of dev_name() string IB/mlx5: Fix long EEH recover time with NVMe offloads IB/mlx5: Simplify netdev unbinding IB/core: Move query port to ioctl RDMA/nldev: Expose port_cap_flags2 IB/core: uverbs copy to struct or zero helper IB/rxe: Reuse code which sets port state IB/rxe: Make counters thread safe IB/mlx5: Use the correct commands for UMEM and UCTX allocation ...	2018-12-28 14:57:10 -08:00
Jérôme Glisse	5d6527a784	mm/mmu_notifier: use structure for invalidate_range_start/end callback Patch series "mmu notifier contextual informations", v2. This patchset adds contextual information, why an invalidation is happening, to mmu notifier callback. This is necessary for user of mmu notifier that wish to maintains their own data structure without having to add new fields to struct vm_area_struct (vma). For instance device can have they own page table that mirror the process address space. When a vma is unmap (munmap() syscall) the device driver can free the device page table for the range. Today we do not have any information on why a mmu notifier call back is happening and thus device driver have to assume that it is always an munmap(). This is inefficient at it means that it needs to re-allocate device page table on next page fault and rebuild the whole device driver data structure for the range. Other use case beside munmap() also exist, for instance it is pointless for device driver to invalidate the device page table when the invalidation is for the soft dirtyness tracking. Or device driver can optimize away mprotect() that change the page table permission access for the range. This patchset enables all this optimizations for device drivers. I do not include any of those in this series but another patchset I am posting will leverage this. The patchset is pretty simple from a code point of view. The first two patches consolidate all mmu notifier arguments into a struct so that it is easier to add/change arguments. The last patch adds the contextual information (munmap, protection, soft dirty, clear, ...). This patch (of 3): To avoid having to change many callback definition everytime we want to add a parameter use a structure to group all parameters for the mmu_notifier invalidate_range_start/end callback. No functional changes with this patch. [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc] Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jason Gunthorpe <jgg@mellanox.com> [infiniband] Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-12-28 12:11:50 -08:00
Dan Carpenter	58f7c0bfb4	RDMA/mlx5: Signedness bug in UVERBS_HANDLER() The "num_actions" variable needs to be signed for the error handling to work. The maximum number of actions is less than 256 so int type is large enough for that. Fixes: `cbfdd442c4` ("IB/uverbs: Add helper to get array size from ptr attribute") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-22 16:07:13 -07:00
Yishai Hadas	aa74be6eea	IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported The per-port Q counter is some kernel resource and as such may be used by few UID(s) upon DEVX usage. To enable using it for QP/RQ when DEVX context is used need to allocate it with a sharing mode indication to let firmware allows its usage. The UID = 0xffff was chosen to mark it. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-21 12:15:07 -07:00
Jason Gunthorpe	623d154305	IB/mlx5: Fix wrong error unwind The destroy_workqueue on error unwind is missing, and the code jumps to the wrong exit label. Fixes: `813e90b1ae` ("IB/mlx5: Add advise_mr() support") Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-20 20:49:31 -07:00
YueHaibing	e7c4d8e604	IB/mlx4: Remove set but not used variable 'pd' Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/hw/mlx4/qp.c: In function '_mlx4_ib_destroy_qp': drivers/infiniband/hw/mlx4/qp.c:1612:22: warning: variable 'pd' set but not used [-Wunused-but-set-variable] Fixes: `e00b64f7c5` ("RDMA: Cleanup undesired pd->uobject usage") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-20 20:47:42 -07:00
Huy Nguyen	bb7e22a8ab	IB/mlx5: Fix long EEH recover time with NVMe offloads On NVMe offloads connection with many IO queues, EEH takes long time to recover. The culprit is the synchronize_srcu in the destroy_mkey. The solution is to use synchronize_srcu only for ODP mkey. Fixes: `b4cfe447d4` ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-20 16:28:47 -07:00
Or Gerlitz	842a9c837e	IB/mlx5: Simplify netdev unbinding When dealing with netdev unregister events, we just need to know that this is our currently bounded netdev. There's no need to do any further checks/queries. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-20 15:18:24 -07:00
Yishai Hadas	6e3722baac	IB/mlx5: Use the correct commands for UMEM and UCTX allocation During testing the command format was changed to close a security hole. Revise the driver to use the command format that will actually be supported in GA firmware. Both the UMEM and UCTX are intended only for use by the kernel and cannot be executed using a general command. Since the UMEM and CTX are not part of the general object the caps bits were moved to be some log_xxx location in the general HCA caps. The firmware code was adapted as well to match the above. Fixes: `a8b92ca1b0` ("IB/mlx5: Introduce DEVX") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-20 13:49:48 -07:00
Yishai Hadas	425518cc5e	IB/mlx5: Use uid as part of alloc/dealloc transport domain Use uid as part of alloc/dealloc transport domain to let firmware manages the resources correctly. Fixes: `d2d19121ae` ("IB/mlx5: Set uid as part of TD commands") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-20 13:39:41 -07:00
Jason Gunthorpe	ed50edfb72	Merge branch 'mlx5-next' into rdma.git From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux mlx5 updates taken for dependencies on following patches. * branche 'mlx5-next': (23 commits) IB/mlx5: Introduce uid as part of alloc/dealloc transport domain net/mlx5: Add shared Q counter bits net/mlx5: Continue driver initialization despite debugfs failure net/mlx5: Fold the modify lag code into function net/mlx5: Add lag affinity info to log net/mlx5: Split the activate lag function into two routines net/mlx5: E-Switch, Introduce flow counter affinity IB/mlx5: Unify e-switch representors load approach between uplink and VFs net/mlx5: Use lowercase 'X' for hex values net/mlx5: Remove duplicated include from eswitch.c net/mlx5: Remove the get protocol device interface entry net/mlx5: Support extended destination format in flow steering command net/mlx5: E-Switch, Change vhca id valid bool field to bit flag net/mlx5: Introduce extended destination fields net/mlx5: Revise gre and nvgre key formats net/mlx5: Add monitor commands layout and event data net/mlx5: Add support for plugged-disabled cable status in PME net/mlx5: Add support for PCIe power slot exceeded error in PME net/mlx5: Rework handling of port module events net/mlx5: Move flow counters data structures from flow steering header ...	2018-12-20 13:24:50 -07:00
David S. Miller	2be09de7d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 11:53:36 -08:00
Devesh Sharma	bd1c24ccf9	RDMA/bnxt_re: Increase depth of control path command queue Increasing the depth of control path command queue to 8K entries to handle burst of commands. This feature needs support from FW and the driver/fw compatibility is checked from the interface version number. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-19 16:37:33 -07:00
Selvin Xavier	2b827ea192	RDMA/bnxt_re: Query HWRM Interface version from FW Get HWRM interface major, minor, build and patch version from FW for checking the FW/Driver compatibility. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-19 16:37:32 -07:00
Parvi Kaustubhi	8036e90f92	IB/usnic: Fix potential deadlock Acquiring the rtnl lock while holding usdev_lock could result in a deadlock. For example: usnic_ib_query_port() \| mutex_lock(&us_ibdev->usdev_lock) \| ib_get_eth_speed() \| rtnl_lock() rtnl_lock() \| usnic_ib_netdevice_event() \| mutex_lock(&us_ibdev->usdev_lock) This commit moves the usdev_lock acquisition after the rtnl lock has been released. This is safe to do because usdev_lock is not protecting anything being accessed in ib_get_eth_speed(). Hence, the correct order of holding locks (rtnl -> usdev_lock) is not violated. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-19 16:30:16 -07:00
Gal Pressman	50c582de1d	RDMA/bnxt_re: Make use of destroy AH sleepable flag When in a sleepable (non-atomic) context, wait for firmware completion instead of polling for it. Signed-off-by: Gal Pressman <galpress@amazon.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-19 16:28:04 -07:00
Gal Pressman	90e3edd8cc	RDMA/bnxt_re: Make use of create AH sleepable flag When in a sleepable (non-atomic) context, wait for firmware completion instead of polling for it. Signed-off-by: Gal Pressman <galpress@amazon.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-19 16:28:03 -07:00
Gal Pressman	2553ba217e	RDMA: Mark if destroy address handle is in a sleepable context Introduce a 'flags' field to destroy address handle callback and add a flag that marks whether the callback is executed in an atomic context or not. This will allow drivers to wait for completion instead of polling for it when it is allowed. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-19 16:28:03 -07:00
Gal Pressman	b090c4e3a0	RDMA: Mark if create address handle is in a sleepable context Introduce a 'flags' field to create address handle callback and add a flag that marks whether the callback is executed in an atomic context or not. This will allow drivers to wait for completion instead of polling for it when it is allowed. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-19 16:17:19 -07:00
Doug Ledford	c9e585ebdc	IB/mlx5: Fix compile issue when ODP disabled When CONFIG_INFINIBAND_ON_DEMAND_PAGING is not enabled, we were getting build failures for defined but not used code. Fix that. Fixes: `813e90b1ae` ("IB/mlx5: Add advise_mr() support") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-12-19 13:43:17 -05:00
Shamir Rabinovitch	e00b64f7c5	RDMA: Cleanup undesired pd->uobject usage Drivers should be using udata to determine if a method is invoked from user space or kernel space. A pd does not necessarily say a different objects is kernel or user. Transforming the tests to use udata eliminates a large number of uobject references from the drivers. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 19:15:48 -07:00
Moni Shoua	813e90b1ae	IB/mlx5: Add advise_mr() support The verb advise_mr() is used to give advice to the kernel about an address range that belongs to a MR. Implement the verb and register it on the device. The current implementation supports the only known advice to date, prefetch. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 15:26:19 -07:00
Moni Shoua	cbfdd442c4	IB/uverbs: Add helper to get array size from ptr attribute When the parser of an ioctl command has the knowledge that a ptr attribute in a bundle represents an array of structures, it is useful for it to know the number of elements in the array. This is done by dividing the attribute length with the element size. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 15:19:45 -07:00
Yuval Shaia	350b4c8ac1	IB/mlx4: Utilize macro to calculate SQ spare size The macro MLX4_IB_SQ_HEADROOM calculates the spare room needed to be left. Use it instead of hard-coding the HW prefetch size. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 14:24:34 -07:00
Dan Carpenter	e9dfa53a39	RDMA/hns: Fix an error code in hns_roce_create_srq() The function accidentally returns success on this error path. Fixes: `c7bcb13442` ("RDMA/hns: Add SRQ support for hip08 kernel mode") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 14:21:45 -07:00
Dan Carpenter	5050ae5fa3	IB/qib: Fix an error code in qib_sdma_verbs_send() We accidentally return success on this error path. Fixes: `f931551baf` ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 14:20:41 -07:00
Gal Pressman	ac2f7e623d	RDMA/mlx5: Fix function name typo 'fileds' -> 'fields' Fix typo in 'set_mr_fileds' -> 'set_mr_fields'. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 14:15:24 -07:00
Kamal Heib	b81a327dbc	RDMA/i40iw: Make sure to initialize ib_device_ops The initialization of the ib_device_ops was dropped by mistake when rebasing the ib_device_ops series, this patch fixes that. Fixes: `15644f57cb` ("RDMA/i40iw: Initialize ib_device_ops struct") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 14:12:26 -07:00
Leon Romanovsky	8e3b688301	RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion Handle atomic was left as unimplemented from 2013, remove the code till this part will be developed. Remove the dead code by simplifying SW completion logic which is supposed to be the same for send and receive paths. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # compile tested Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 13:59:34 -07:00
Aviv Heller	7c34ec19e1	net/mlx5: Make RoCE and SR-IOV LAG modes explicit With the introduction of SR-IOV LAG, checking whether LAG is active is no longer good enough, since RoCE and SR-IOV LAG each entails different behavior by both the core and infiniband drivers. This patch introduces facilities to discern LAG type, in addition to mlx5_lag_is_active(). These are implemented in such a way as to allow more complex mode combinations in the future. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-14 13:28:55 -08:00
Saeed Mahameed	64e4cf0dab	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev conflicts. Highlights: 1) Lag refactroing and flow counter affinity bits. 2) mlx5 core cleanups By Roi Dayan (2) and others * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Fold the modify lag code into function net/mlx5: Add lag affinity info to log net/mlx5: Split the activate lag function into two routines net/mlx5: E-Switch, Introduce flow counter affinity IB/mlx5: Unify e-switch representors load approach between uplink and VFs net/mlx5: Use lowercase 'X' for hex values net/mlx5: Remove duplicated include from eswitch.c Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-14 11:15:25 -08:00
Mark Bloch	06cc74af05	IB/mlx5: Unify e-switch representors load approach between uplink and VFs When in switchdev mode and the add function is called by the core level driver, make sure we only register the callbacks, but don't create the mlx5 IB device or initialize anything. With this change all the IB devices in switchdev mode are created only once the load callback is invoked by the e-switch core sub-module. This follows the design paradigm under which the all the Eth representors must be loaded before any of IB reprs is loaded. Signed-off-by: Mark Bloch <markb@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-14 09:58:57 -08:00
Kamal Heib	3023a1e936	RDMA: Start use ib_device_ops Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:16 -07:00
Kamal Heib	20a6b58861	RDMA/vmw_pvrdma: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:15 -07:00
Kamal Heib	e761058190	RDMA/usnic: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:15 -07:00
Kamal Heib	16b0ba9571	RDMA/qib: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:14 -07:00
Kamal Heib	bd59461e57	RDMA/qedr: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:14 -07:00
Kamal Heib	a263c1241a	RDMA/ocrdma: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:14 -07:00
Kamal Heib	5a6c6e71ac	RDMA/nes: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:14 -07:00
Kamal Heib	56e2a43136	RDMA/mthca: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:13 -07:00
Kamal Heib	96458233ee	RDMA/mlx5: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:13 -07:00
Kamal Heib	4725c4ba8d	RDMA/mlx4: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-11 15:15:08 -07:00
Kamal Heib	15644f57cb	RDMA/i40iw: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-11 15:15:08 -07:00
Kamal Heib	7f645a58d0	RDMA/hns: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-11 15:15:08 -07:00
Kamal Heib	e3c320caa1	RDMA/hfi1: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-11 15:15:07 -07:00

... 3 4 5 6 7 ...

6113 Commits