linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-17 12:56:55 +07:00

Author	SHA1	Message	Date
Christoph Hellwig	25556ae6b9	IB: remove xrc_remote_srq_num from struct ib_send_wr The field is only initialized in mlx, but never used. If we want to add proper XRC support it should be done with a new struct ib_xrc_wr. This shrinks the various WR structures by another 4 bytes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Haggai Eran <haggaie@mellanox.com>	2015-10-08 11:09:11 +01:00
Christoph Hellwig	e622f2f4ad	IB: split struct ib_send_wr This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>	2015-10-08 11:09:10 +01:00
Bodong Wang	470a553581	IB/core: Add support of checksum capability reporting for RC and RAW Two enum members IB_DEVICE_RC_IP_CSUM and IB_DEVICE_RAW_IP_CSUM are added to ib_device_cap_flags. Device should set these two flags if they support insertion of UDP and TCP checksum on outgoing IPv4 messages and can verify the validity of checksum for incoming IPv4 messages, for RC IPoIB and RAW over Ethernet respectively. They are similar to IB_DEVICE_UD_IP_CSUM. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-09-28 22:10:15 -04:00
Linus Torvalds	26d2177e97	Changes for 4.3 - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc. mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV7v8wAAoJELgmozMOVy/d2dcP/3PXnGFPgFGJODKE6VCZtTvj nooNXRKXjxv470UT5DiAX7SNcBxzzS7Zl/Lj+831H9iNXUyzuH31KtBOAZ3W03vZ yXwCB2caOStSldTRSUUvPe2aIFPnyNmSpC4i6XcJLJMCFijKmxin5pAo8qE44BQU yjhT+wC9P6LL5wZXsn/nFIMLjOFfu0WBFHNp3gs5j59paxlx5VeIAZk16aQZH135 m7YCyicwrS8iyWQl2bEXRMon2vlCHlX2RHmOJ4f/P5I0quNcGF2+d8Yxa+K1VyC5 zcb3OBezz+wZtvh16yhsDfSPqHWirljwID2VzOgRSzTJWvQjju8VkwHtkq6bYoBW egIxGCHcGWsD0R5iBXLYr/tB+BmjbDObSm0AsR4+JvSShkeVA1IpeoO+19162ixE n6CQnk2jCee8KXeIN4PoIKsjRSbIECM0JliWPLoIpuTuEhhpajftlSLgL5hf1dzp HrSy6fXmmoRj7wlTa7DnYIC3X+ffwckB8/t1zMAm2sKnIFUTjtQXF7upNiiyWk4L /T1QEzJ2bLQckQ9yY4v528SvBQwA4Dy1amIQB7SU8+2S//bYdUvhysWPkdKC4oOT WlqS5PFDCI31MvNbbM3rUbMAD8eBAR8ACw9ZpGI/Rffm5FEX5W3LoxA8gfEBRuqt 30ZYFuW8evTL+YQcaV65 =EHLg -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...	2015-09-09 08:33:31 -07:00
Yishai Hadas	036b106357	IB/uverbs: Enable device removal when there are active user space applications Enables the uverbs_remove_one to succeed despite the fact that there are running IB applications working with the given ib device. This functionality enables a HW device to be unbind/reset despite the fact that there are running user space applications using it. It exposes a new IB kernel API named 'disassociate_ucontext' which lets a driver detaching its HW resources from a given user context without crashing/terminating the application. In case a driver implemented the above API and registered with ib_uverb there will be no dependency between its device to its uverbs_device. Upon calling remove_one of ib_uverbs the call should return after disassociating the open HW resources without waiting to clients disconnecting. In case driver didn't implement this API there will be no change to current behaviour and uverbs_remove_one will return only when last client has disconnected and reference count on uverbs device became 0. In case the lower driver device was removed any application will continue working over some zombie HCA, further calls will ended with an immediate error. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:40 -04:00
Jason Gunthorpe	7dd78647a2	IB/core: Make ib_dealloc_pd return void The majority of callers never check the return value, and even if they did, they can't do anything about a failure. All possible failure cases represent a bug in the caller, so just WARN_ON inside the function instead. This fixes a few random errors: net/rd/iw.c infinite loops while it fails. (racing with EBUSY?) This also lays the ground work to get rid of error return from the drivers. Most drivers do not error, the few that do are broken since it cannot be handled. Since uverbs can legitimately make use of EBUSY, open code the check. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:39 -04:00
Jason Gunthorpe	96249d70dd	IB/core: Guarantee that a local_dma_lkey is available Every single ULP requires a local_dma_lkey to do anything with a QP, so let us ensure one exists for every PD created. If the driver can supply a global local_dma_lkey then use that, otherwise ask the driver to create a local use all physical memory MR associated with the new PD. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Acked-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:33 -04:00
Moni Shoua	e26be1bfef	IB/mlx4: Implement ib_device callbacks get_netdev: get the net_device on the physical port of the IB transport port. In port aggregation mode it is required to return the netdev of the active port. modify_gid: note for a change in the RoCE gid cache. Handle this by writing to the harsware GID table. It is possible that indexes in cahce and hardware tables won't match so a translation is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:12:20 -04:00
Matan Barak	03db3a2d81	IB/core: Add RoCE GID table management RoCE GIDs are based on IP addresses configured on Ethernet net-devices which relate to the RDMA (RoCE) device port. Currently, each of the low-level drivers that support RoCE (ocrdma, mlx4) manages its own RoCE port GID table. As there's nothing which is essentially vendor specific, we generalize that, and enhance the RDMA core GID cache to do this job. In order to populate the GID table, we listen for events: (a) netdev up/down/change_addr events - if a netdev is built onto our RoCE device, we need to add/delete its IPs. This involves adding all GIDs related to this ndev, add default GIDs, etc. (b) inet events - add new GIDs (according to the IP addresses) to the table. For programming the port RoCE GID table, providers must implement the add_gid and del_gid callbacks. RoCE GID management requires us to state the associated net_device alongside the GID. This information is necessary in order to manage the GID table. For example, when a net_device is removed, its associated GIDs need to be removed as well. RoCE mandates generating a default GID for each port, based on the related net-device's IPv6 link local. In contrast to the GID based on the regular IPv6 link-local (as we generate GID per IP address), the default GID is also available when the net device is down (in order to support loopback). Locking is done as follows: The patch modify the GID table code both for new RoCE drivers implementing the add_gid/del_gid callbacks and for current RoCE and IB drivers that do not. The flows for updating the table are different, so the locking requirements are too. While updating RoCE GID table, protection against multiple writers is achieved via mutex_lock(&table->lock). Since writing to a table requires us to find an entry (possible a free entry) in the table and then modify it, this mutex protects both the find_gid and write_gid ensuring the atomicity of the action. Each entry in the GID cache is protected by rwlock. In RoCE, writing (usually results from netdev notifier) involves invoking the vendor's add_gid and del_gid callbacks, which could sleep. Therefore, an invalid flag is added for each entry. Updates for RoCE are done via a workqueue, thus sleeping is permitted. In IB, updates are done in write_lock_irq(&device->cache.lock), thus write_gid isn't allowed to sleep and add_gid/del_gid are not called. When passing net-device into/out-of the GID cache, the device is always passed held (dev_hold). The code uses a single work item for updating all RDMA devices, following a netdev or inet notifier. The patch moves the cache from being a client (which was incorrect, as the cache is part of the IB infrastructure) to being explicitly initialized/freed when a device is registered/removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:50 -04:00
Sagi Grimberg	d9f272c523	IB/core: Drop ib_alloc_fast_reg_mr Fully replaced by a more generic and suitable ib_alloc_mr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:49 -04:00
Sagi Grimberg	9bee178b4f	IB: Modify ib_create_mr API Use ib_alloc_mr with specific parameters. Change the existing callers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:44 -04:00
Sagi Grimberg	8b91ffc1cf	IB/core: Get rid of redundant verb ib_destroy_mr This was added in a thought of uniting all mr allocation and deallocation routines but the fact is we have a single deallocation routine already, ib_dereg_mr. And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr (includes only signature stuff for now). And, fixup the only callers (iser/isert) accordingly. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:44 -04:00
Yotam Kenneth	9268f72dcb	IB/core: Find the network device matching connection parameters In the case of IPoIB, and maybe in other cases, the network device is managed by an upper-layer protocol (ULP). In order to expose this network device to other users of the IB device, let ULPs implement a callback that returns network device according to connection parameters. The IB device and port, together with the P_Key and the GID should be enough to uniquely identify the ULP net device. However, in current kernels there can be multiple IPoIB interfaces created with the same GID. Furthermore, such configuration may be desireable to support ipvlan-like configurations for RDMA CM with IPoIB. To resolve the device in these cases the code will also take the IP address as an additional input. Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 15:48:21 -04:00
Haggai Eran	7c1eb45a22	IB/core: lock client data with lists_rwsem An ib_client callback that is called with the lists_rwsem locked only for read is protected from changes to the IB client lists, but not from ib_unregister_device() freeing its client data. This is because ib_unregister_device() will remove the device from the device list with lists_rwsem locked for write, but perform the rest of the cleanup, including the call to remove() without that lock. Mark client data that is undergoing de-registration with a new going_down flag in the client data context. Lock the client data list with lists_rwsem for write in addition to using the spinlock, so that functions calling the callback would be able to lock only lists_rwsem for read and let callbacks sleep. Since ib_unregister_client() now marks the client data context, no need for remove() to search the context again, so pass the client data directly to remove() callbacks. Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 15:48:21 -04:00
Steve Wise	3403051ebb	RDMA/Core: remove rdma_cap_read_multi_sge() helper This functionality already exists via the max_sge_rd device capability. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-28 23:02:11 -04:00
Chuck Lever	1241d7bf2a	core: Remove the ib_reg_phys_mr() and ib_rereg_phys_mr() verbs The verbs are obsolete. The ib_rereg_phys_mr() verb is not used by kernel ULPs, and the last ib_reg_phys_mr() call site in the kernel tree has now been removed. Two staging tree call sites remain in the Lustre client. The Lustre team has been notified of the deprecation of reg_phys_mr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:28 -04:00
Hal Rosenstock	4139032b48	IB: Add rdma_cap_ib_switch helper and use where appropriate Persuant to Liran's comments on node_type on linux-rdma mailing list: In an effort to reform the RDMA core and ULPs to minimize use of node_type in struct ib_device, an additional bit is added to struct ib_device for is_switch (IB switch). This is needed to be initialized by any IB switch device driver. This is a NEW requirement on such device drivers which are all "out of tree". In addition, an ib_switch helper was added to ib_verbs.h based on the is_switch device bit rather than node_type (although those should be consistent). The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs) as well as (IPoIB and SRP) ULPs are updated where appropriate to use this new helper. In some cases, the helper is now used under the covers of using rdma_[start end]_port rather than the open coding previously used. Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-07-14 13:20:08 -04:00
Ira Weiny	65995fee84	IB/core: Add OPA MAD core capability flag Add OPA MAD support flags to the core capability immutable flags. In addition add the rdma_cap_opa_mad helper function for core functions to use to detect OPA MAD support. OPA MADs share a common header with IBTA MADs but with some differences for increased performance. Sharing a common header with IBTA MADs allows us to share most of the MAD processing code when dealing with OPA MADs in addition to supporting some IBTA MADs on OPA devices. OPA MADs differ in the following ways: 1) MADs are variable size up to 2K IBTA defined MADs remain fixed at 256 bytes 2) OPA SMPs must carry valid PKeys 3) OPA SMP packets are a different format The MAD stack will use this new functionality to determine if OPA MAD processing should occur on individual device ports. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Ira Weiny	4cd7c9479a	IB/mad: Add support for additional MAD info to/from drivers In order to support alternate sized MADs (and variable sized MADs on OPA devices) add in/out MAD size parameters to the process_mad core call. In addition, add an out_mad_pkey_index to communicate the pkey index the driver wishes the MAD stack to use when sending OPA MAD responses. The out MAD size and the out MAD PKey index are required by the MAD stack to generate responses on OPA devices. Furthermore, the in and out MAD parameters are made generic by specifying them as ib_mad_hdr rather than ib_mad. Drivers are modified as needed and are protected by BUG_ON flags if the MAD sizes passed to them is incorrect. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Ira Weiny	337877a466	IB/core: Add ability for drivers to report an alternate MAD size. Add max MAD size to the device immutable data set and have all drivers that support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core. Verify MAD size data in both the MAD core and when reading the immutable data. OPA drivers will report alternate MAD sizes in subsequent patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Matan Barak	2528e33e68	IB/core: Pass hardware specific data in query_device Vendors should be able to pass vendor specific data to/from user-space via query_device uverb. In order to do this, we need to pass the vendors' specific udata. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	24306dc661	IB/core: Add timestamp_mask and hca_core_clock to query_device In order to expose timestamp we need to expose two new attributes in query_device to be used for CQ completion time-stamping: timestamp_mask - how many bits are valid in the timestamp, where timestamp values could be 64bits the most. hca_core_clock - timestamp is given in HW cycles, the frequency in KHZ units of the HCA, necessary in order to convert cycles to seconds. This is added both to ib_query_device and its respective uverbs counterpart. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	b9926b92a4	IB/core: Add CQ creation time-stamping flag Add CQ creation flag which dictates that the created CQ will report completion time-stamp value in the WC. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	8e37210b38	IB/core: Change ib_create_cq to use struct ib_cq_init_attr Currently, ib_create_cq uses cqe and comp_vecotr instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	bcf4c1ea58	IB/core: Change provider's API of create_cq to be extendible Add a new ib_cq_init_attr structure which contains the previous cqe (minimum number of CQ entries) and comp_vector (completion vector) in addition to a new flags field. All vendors' create_cq callbacks are changed in order to work with the new API. This commit does not change any functionality. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2 Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Moni Shoua	db75d0547c	IB/core: Don't advertise SA in RoCE port capabilities The Subnet Administrator (SA) is not a component of the RoCE spec. Therefore, it should not be a capability of a RoCE port. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-10 23:54:33 -04:00
Ira Weiny	73cdaaeed1	IB/core cleanup: Add const to args - agent_send_response In order to support constant callers of agent_send_response we add const specifiers to the its pointer arguments. Adjust the call tree accordingly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:33:13 -04:00
Ira Weiny	a97e2d86a9	IB/core cleanup: Add const on args - device->process_mad The process_mad device function declares some parameters as "in". Make those parameters const and adjust the call tree under process_mad in the various drivers accordingly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:33:13 -04:00
Ira Weiny	5ede928985	IB/core cleanup: Add const to RDMA helpers The ib_device passed to the new RDMA helpers is constant. Declare the ib_device as const in the following functions. rdma_protocol_ib rdma_protocol_roce rdma_protocol_iwarp rdma_ib_or_roce rdma_cap_ib_mad rdma_cap_ib_smi rdma_cap_ib_cm rdma_cap_iw_cm rdma_cap_ib_sa rdma_cap_ib_mcast rdma_cap_af_ib rdma_cap_eth_ah Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:33:13 -04:00
Doug Ledford	175e8efe69	Merge branches 'bart-srp', 'generic-errors', 'ira-cleanups' and 'mwang-v8' into k.o/for-4.2	2015-05-20 16:12:40 -04:00
Ira Weiny	5d9fb04406	IB/core: Change rdma_protocol_iboe to roce After discussion upstream, it was agreed to transition the usage of iboe in the kernel to roce. This keeps our terminology consistent with what was finalized in the IBTA Annex 16 and IBTA Annex 17 publications. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 15:58:19 -04:00
Ira Weiny	f9b22e355d	IB/core: Convert core to use bitfield for caps Remove query_protocol callback Use the new Core Capability bits for: rdma_protocol_* rdma_cap_ib_mad rdma_cap_ib_smi rdma_cap_ib_cm rdma_cap_iw_cm rdma_cap_ib_sa rdma_cap_ib_mcast rdma_cap_af_ib rdma_cap_eth_ah Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:38:43 -04:00
Ira Weiny	7738613e7c	IB/core: Add per port immutable struct to ib_device As of commit `5eb620c81c` "IB/core: Add helpers for uncached GID and P_Key searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in the ib_device. The per port core capability flags to be added later are also immutable data to be stored in the ib_device object. In preparation for this create a structure for per port immutable data and place the pkey and gid table lengths within this structure. "get_port_immutable" is added as a mandatory device function to allow the drivers to fill in this data. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:38:13 -04:00
Sagi Grimberg	2b1b5b6012	IB/core, cma: Nice log-friendly string helpers Some of us keep revisiting the code to decode enumerations that appear in out logs. Let's borrow the nice logging helpers that exists in xprtrdma and rds for CMA events, IB events and WC statuses. Reviewd-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:43:52 -04:00
Ira Weiny	0cf18d7723	IB/core: Create common start/end port functions Previously start_port and end_port were defined in 2 places, cache.c and device.c and this prevented their use in other modules. Make these common functions, change the name to reflect the rdma name space, and update existing users. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:06 -04:00
Michael Wang	296ec00995	IB/Verbs: Improve docs for rdma-helpers Increase the level of documentation for the rdma_cap_* helpers introduced by Michael Wang <yun.wang@profitbricks.com>. This patch is loosely based on a patch Michael wrote to enhance the documentation of these functions, but has been significantly modified in terms of verbiage. In addition, the comments were moved from a kernel Documentation/infiniband/ file to being inline in the header file itself for the functions in question. Finally, the documentation was formated in proper kdoc format. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:06 -04:00
Michael Wang	227128fc68	IB/Verbs: Use management helper rdma_cap_eth_ah() Introduce helper rdma_cap_eth_ah() to help us check if the port of an IB device support Ethernet Address Handler. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:06 -04:00
Michael Wang	30a74ef41d	IB/Verbs: Use management helper rdma_cap_af_ib() Introduce helper rdma_cap_af_ib() to help us check if the port of an IB device support Native Infiniband Address. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	bc0f1d7153	IB/Verbs: Use management helper rdma_cap_read_multi_sge() Introduce helper rdma_cap_read_multi_sge() to help us check if the port of an IB device support RDMA Read Multiple Scatter-Gather Entries. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	a31ad3b0e3	IB/Verbs: Use management helper rdma_cap_ib_mcast() Introduce helper rdma_cap_ib_mcast() to help us check if the port of an IB device support Infiniband Multicast. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	fe53ba2f0c	IB/Verbs: Use management helper rdma_cap_ib_sa() Introduce helper rdma_cap_ib_sa() to help us check if the port of an IB device support Infiniband Subnet Administration. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	042153306d	IB/Verbs: Use management helper rdma_cap_iw_cm() Introduce helper rdma_cap_iw_cm() to help us check if the port of an IB device support IWARP Communication Manager. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	72219cea8e	IB/Verbs: Use management helper rdma_cap_ib_cm() Introduce helper rdma_cap_ib_cm() to help us check if the port of an IB device support Infiniband Communication Manager. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	29541e3add	IB/Verbs: Use management helper rdma_cap_ib_smi() Introduce helper rdma_cap_ib_smi() to help us check if the port of an IB device support Infiniband Subnet Management Interface. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	c757dea816	IB/Verbs: Use management helper rdma_cap_ib_mad() Introduce helper rdma_cap_ib_mad() to help us check if the port of an IB device support Infiniband Management Datagrams. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	de66be9474	IB/Verbs: Implement raw management helpers Add raw helpers: rdma_protocol_ib rdma_protocol_iboe rdma_protocol_iwarp rdma_ib_or_iboe (transition, clean up later) To help us detect which technology the port supported. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:03 -04:00
Michael Wang	6b90a6d66b	IB/Verbs: Implement new callback query_protocol() Add new callback query_protocol() and implement for each HW. Mapping List: node-type link-layer transport protocol nes RNIC ETH IWARP IWARP amso1100 RNIC ETH IWARP IWARP cxgb3 RNIC ETH IWARP IWARP cxgb4 RNIC ETH IWARP IWARP usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP ocrdma IB_CA ETH IB IBOE mlx4 IB_CA IB/ETH IB IB/IBOE mlx5 IB_CA IB IB IB ehca IB_CA IB IB IB ipath IB_CA IB IB IB mthca IB_CA IB IB IB qib IB_CA IB IB IB Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:03 -04:00
Yann Droneaud	43c6116573	Revert "IB/core: Add support for extended query device caps" While commit `7e36ef8205` ("IB/core: Temporarily disable ex_query_device uverb") is correct as it makes the extended QUERY_DEVICE uverb (which came as part of commit `5a77abf9a9` ("IB/core: Add support for extended query device caps") and commit `860f10a799` ("IB/core: Add flags for on demand paging support")) not available to userspace, it doesn't address the initial issue regarding ib_copy_to_udata() [1][2]. Additionally, further discussions around this new uverb seems to conclude it would require a different data structure than the one currently described in <rdma/ib_user_verbs.h> [3]. Both of these issues require a revert of the changes, so this patch partially reverts commit `8cdd312cfe` ("IB/mlx5: Implement the ODP capability query verb") and commit `860f10a799` ("IB/core: Add flags for on demand paging support") and fully reverts commit `5a77abf9a9` ("IB/core: Add support for extended query device caps"). [1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps" http://mid.gmane.org/1418733236.2779.26.camel@opteya.com [2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb" http://mid.gmane.org/1423067503.3030.83.camel@opteya.com [3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask" http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com Cc: Eli Cohen <eli@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-06 00:54:33 -08:00
Haggai Eran	882214e2b1	IB/core: Implement support for MMU notifiers regarding on demand paging regions * Add an interval tree implementation for ODP umems. Create an interval tree for each ucontext (including a count of the number of ODP MRs in this context, semaphore, etc.), and register ODP umems in the interval tree. * Add MMU notifiers handling functions, using the interval tree to notify only the relevant umems and underlying MRs. * Register to receive MMU notifier events from the MM subsystem upon ODP MR registration (and unregister accordingly). * Add a completion object to synchronize the destruction of ODP umems. * Add mechanism to abort page faults when there's a concurrent invalidation. The way we synchronize between concurrent invalidations and page faults is by keeping a counter of currently running invalidations, and a sequence number that is incremented whenever an invalidation is caught. The page fault code checks the counter and also verifies that the sequence number hasn't progressed before it updates the umem's page tables. This is similar to what the kvm module does. In order to prevent the case where we register a umem in the middle of an ongoing notifier, we also keep a per ucontext counter of the total number of active mmu notifiers. We only enable new umems when all the running notifiers complete. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yuval Dagan <yuvalda@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:36 -08:00
Shachar Raindel	8ada2c1c0c	IB/core: Add support for on demand paging regions * Extend the umem struct to keep the ODP related data. * Allocate and initialize the ODP related information in the umem (page_list, dma_list) and freeing as needed in the end of the run. * Store a reference to the process PID struct in the ucontext. Used to safely obtain the task_struct and the mm during fault handling, without preventing the task destruction if needed. * Add 2 helper functions: ib_umem_odp_map_dma_pages and ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses of specific pages of the umem (and, currently, pin them). * Support for page faults only - IB core will keep the reference on the pages used and call put_page when freeing an ODP umem area. Invalidations support will be added in a later patch. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:36 -08:00
Sagi Grimberg	860f10a799	IB/core: Add flags for on demand paging support * Add a configuration option for enable on-demand paging support in the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a later patch, this configuration option will select the MMU_NOTIFIER configuration option to enable mmu notifiers. * Add a flag for on demand paging (ODP) support in the IB device capabilities. * Add a flag to request ODP MR in the access flags to reg_mr. * Fail registrations done with the ODP flag when the low-level driver doesn't support this. * Change the conditions in which an MR will be writable to explicitly specify the access flags. This is to avoid making an MR writable just because it is an ODP MR. * Add a ODP capabilities to the extended query device verb. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Eli Cohen	5a77abf9a9	IB/core: Add support for extended query device caps Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Sagi Grimberg	78eda2bb65	IB/mlx5, iser, isert: Add Signature API additions Expose more signature setting parameters. We modify the signature API to allow usage of some new execution parameters relevant to data integrity feature. This patch modifies ib_sig_domain structure by: - Deprecate DIF type in signature API (operation will be determined by the parameters alone, no DIF type awareness) - Add APPTAG check bitmask (for input domain) - Add REFTAG remap (increment) flag for each domain - Add APPTAG/REFTAG escape options for each domain The mlx5 driver is modified to follow the new parameters in HW signature setup. At the moment the callers (iser/isert) hard-code new parameters (by DIF type). In the future, callers will retrieve them from the scsi command structure. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-10-09 00:10:53 -07:00
Matan Barak	7e6edb9b2e	IB/core: Add user MR re-registration support Memory re-registration is a feature that enables changing the attributes of a memory region registered by user-space, including PD, translation (address and length) and access flags. Add the required support in uverbs and the kernel verbs API. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-08-01 15:11:13 -07:00
Roland Dreier	eeaddf3670	Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next	2014-06-10 10:12:14 -07:00
Roland Dreier	8385fd8414	IB/core: Fix sparse warnings about redeclared functions Fix a few functions that are declared with __attribute_const__ in the ib_verbs.h header file but defined without it in verbs.c. This gets rid of the following sparse warnings: drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-06-04 10:01:42 -07:00
Or Gerlitz	09b93088d7	IB: Add a QP creation flag to use GFP_NOIO allocations This addresses a problem where NFS client writes over IPoIB connected mode may deadlock on memory allocation/writeback. The problem is not directly memory reclamation. There is an indirect dependency between network filesystems writing back pages and ipoib_cm_tx_init() due to how a kworker is used. Page reclaim cannot make forward progress until ipoib_cm_tx_init() succeeds and it is stuck in page reclaim itself waiting for network transmission. Ordinarily this situation may be avoided by having the caller use GFP_NOFS but ipoib_cm_tx_init() does not have that information. To address this, take a general approach and add a new QP creation flag that tells the low-level hardware driver to use GFP_NOIO for the memory allocations related to the new QP. Use the new flag in the ipoib connected mode path, and if the driver doesn't support it, re-issue the QP creation without the flag. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-06-02 14:58:11 -07:00
Roland Dreier	f7eaa7ed8f	Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next	2014-04-03 08:30:17 -07:00
Mike Marciniszyn	ea58a59565	IB/core: Remove overload in ib_sg_dma* The code is replaced by driver specific changes and avoids the pointer NULL test for drivers that don't overload these operations. Suggested-by: <Bart Van Assche <bvanassche@acm.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Vinod Kumar <vinod.kumar@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-01 11:16:32 -07:00
Sagi Grimberg	1b01d33560	IB/core: Introduce signature verbs API Introduce a verbs interface for signature-related operations. A signature handover operation configures the layouts of data and protection attributes both in memory and wire domains. Signature operations are: - INSERT: Generate and insert protection information when handing over data from input space to output space. - validate and STRIP: Validate protection information and remove it when handing over data from input space to output space. - validate and PASS: Validate protection information and pass it when handing over data from input space to output space. Once the signature handover opration is done, the HCA will offload data integrity generation/validation while performing the actual data transfer. Additions: 1. HCA signature capabilities in device attributes Verbs provider supporting signature handover operations fills relevant fields in device attributes structure returned by ib_query_device. 2. QP creation flag IB_QP_CREATE_SIGNATURE_EN Creating a QP that will carry signature handover operations may require some special preparations from the verbs provider. So we add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the created QP may carry out signature handover operations. Expose signature support to verbs layer (no support for now). 3. New send work request IB_WR_REG_SIG_MR Signature handover work request. This WR will define the signature handover properties of the memory/wire domains as well as the domains layout. The purpose of this work request is to bind all the needed information for the signature operation: - data to be transferred: wr->sg_list (ib_sge). * The raw data, pre-registered to a single MR (normally, before signature, this MR would have been used directly for the data transfer) - data protection guards: sig_handover.prot (ib_sge). * The data protection buffer, pre-registered to a single MR, which contains the data integrity guards of the raw data blocks. Note that it may not always exist, only in cases where the user is interested in storing protection guards in memory. - signature operation attributes: sig_handover.sig_attrs. * Tells the HCA how to validate/generate the protection information. Once the work request is executed, the memory region that will describe the signature transaction will be the sig_mr. The application can now go ahead and send the sig_mr.rkey or use the sig_mr.lkey for data transfer. 4. New Verb ib_check_mr_status check_mr_status verb checks the status of the memory region post transaction. The first check that may be used is IB_MR_CHECK_SIG_STATUS, which will indicate if any signature errors are pending for a specific signature-enabled ib_mr. This verb is a lightwight check and is allowed to be taken from interrupt context. An application must call this verb after it is known that the actual data transfer has finished. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:26:49 -08:00
Sagi Grimberg	17cd3a2db8	IB/core: Introduce protected memory regions This commit introduces verbs for creating/destoying memory regions which will allow new types of memory key operations such as protected memory registration. Indirect memory registration is registering several (one of more) pre-registered memory regions in a specific layout. The Indirect region may potentialy describe several regions and some repitition format between them. Protected Memory registration is registering a memory region with various data integrity attributes which will describe protection schemes that will be handled by the HCA in an offloaded manner. These memory regions will be applicable for a new REG_SIG_MR work request introduced later in this patchset. In the future these routines may replace or implement current memory regions creation routines existing today: - ib_reg_user_mr - ib_alloc_fast_reg_mr - ib_get_dma_mr - ib_dereg_mr Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:26:49 -08:00
Moni Shoua	b4a26a2728	IB: Report using RoCE IP based gids in port caps For userspace RoCE UD QPs we need to know the GID format that the kernel uses, e.g when working over older kernels. For that end, add a new port capability IB_PORT_IP_BASED_GIDS and report it when query port is issued. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:46:03 -08:00
Roland Dreier	fb1b5034e4	Merge branch 'ip-roce' into for-next Conflicts: drivers/infiniband/hw/mlx4/main.c	2014-01-22 23:24:21 -08:00
Roland Dreier	8f399921ea	Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next	2014-01-22 23:24:13 -08:00
Upinder Malhi	5db5765e25	IB/core: Add support for RDMA_NODE_USNIC_UDP Add the complementary RDMA_NODE_USNIC_UDP for RDMA_TRANSPORT_USNIC_UDP. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 13:48:54 -08:00
Matan Barak	dd5f03beb4	IB/core: Ethernet L2 attributes in verbs/cm structures This patch add the support for Ethernet L2 attributes in the verbs/cm/cma structures. When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority in a similar manner that the IB L2 (and the L4 PKEY) attributes are used. Thus, those attributes were added to the following structures: * ib_ah_attr - added dmac * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority) * ib_wc - added smac, vlan_id * ib_sa_path_rec - added smac, dmac, vlan_id * cm_av - added smac and vlan_id For the path record structure, extra care was taken to avoid the new fields when packing it into wire format, so we don't break the IB CM and SA wire protocol. On the active side, the CM fills. its internal structures from the path provided by the ULP. We add there taking the ETH L2 attributes and placing them into the CM Address Handle (struct cm_av). On the passive side, the CM fills its internal structures from the WC associated with the REQ message. We add there taking the ETH L2 attributes from the WC. When the HW driver provides the required ETH L2 attributes in the WC, they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core code checks for the presence of these flags, and in their absence does address resolution from the ib_init_ah_from_wc() helper function. ib_modify_qp_is_ok is also updated to consider the link layer. Some parameters are mandatory for Ethernet link layer, while they are irrelevant for IB. Vendor drivers are modified to support the new function signature. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 14:20:54 -08:00
Matan Barak	240ae00e4d	IB/core: Add support for IB L2 device-managed steering This patch adds preliminary support for IB L2 device-managed steering, currently exposed only in the kernel. This flow spec can be used by low-level drivers that need to indicate the link layer type when creating device-managed flow rules. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 14:06:50 -08:00
Matan Barak	90f1d1b41b	IB/core: Add flow steering support for IPoIB UD traffic When creating an IPoIB UD QP, provide a hint to the low level driver that the QP should support flow-steering. This means that privileged user space applications can steer TCP/IP IPoIB traffic from the network stack, in a similar manner done with Ethernet RAW_PACKET QPs. The hint is provided through new QP creation flag called NETIF_QP. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 14:06:50 -08:00
Upinder Malhi	248567f793	IB/core: Add RDMA_TRANSPORT_USNIC_UDP Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:45 -08:00
Yann Droneaud	309243ec14	IB/core: const'ify inbuf in struct ib_udata Userspace input buffer is not modified by kernel, so it can be 'const'. This is also a prerequisite to remove the implicit cast from INIT_UDATA(). Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-12-16 10:38:28 -08:00
Roland Dreier	b4fdf52b3f	Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next	2013-11-17 08:22:19 -08:00
Yann Droneaud	f21519b23c	IB/core: extended command: an improved infrastructure for uverbs commands Commit `400dbc9658` ("IB/core: Infrastructure for extensible uverbs commands") added an infrastructure for extensible uverbs commands while later commit `436f2ad05a` ("IB/core: Export ib_create/destroy_flow through uverbs") exported ib_create_flow()/ib_destroy_flow() functions using this new infrastructure. According to the commit `400dbc9658`, the purpose of this infrastructure is to support passing around provider (eg. hardware) specific buffers when userspace issue commands to the kernel, so that it would be possible to extend uverbs (eg. core) buffers independently from the provider buffers. But the new kernel command function prototypes were not modified to take advantage of this extension. This issue was exposed by Roland Dreier in a previous review[1]. So the following patch is an attempt to a revised extensible command infrastructure. This improved extensible command infrastructure distinguish between core (eg. legacy)'s command/response buffers from provider (eg. hardware)'s command/response buffers: each extended command implementing function is given a struct ib_udata to hold core (eg. uverbs) input and output buffers, and another struct ib_udata to hold the hw (eg. provider) input and output buffers. Having those buffers identified separately make it easier to increase one buffer to support extension without having to add some code to guess the exact size of each command/response parts: This should make the extended functions more reliable. Additionally, instead of relying on command identifier being greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on unused bits in command field: on the 32 bits provided by command field, only 6 bits are really needed to encode the identifier of commands currently supported by the kernel. (Even using only 6 bits leaves room for about 23 new commands). So this patch makes use of some high order bits in command field to store flags, leaving enough room for more command identifiers than one will ever need (eg. 256). The new flags are used to specify if the command should be processed as an extended one or a legacy one. While designing the new command format, care was taken to make usage of flags itself extensible. Using high order bits of the commands field ensure that newer libibverbs on older kernel will properly fail when trying to call extended commands. On the other hand, older libibverbs on newer kernel will never be able to issue calls to extended commands. The extended command header includes the optional response pointer so that output buffer length and output buffer pointer are located together in the command, allowing proper parameters checking. This should make implementing functions easier and safer. Additionally the extended header ensure 64bits alignment, while making all sizes multiple of 8 bytes, extending the maximum buffer size: legacy extended Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) For the purpose of doing proper buffer size accounting, the headers size are no more taken in account in "in_words". One of the odds of the current extensible infrastructure, reading twice the "legacy" command header, is fixed by removing the "legacy" command header from the extended command header: they are processed as two different parts of the command: memory is read once and information are not duplicated: it's making clear that's an extended command scheme and not a different command scheme. The proposed scheme will format input (command) and output (response) buffers this way: - command: legacy header + extended header + command data (core + hw): +----------------------------------------+ \| flags \| 00 00 \| command \| \| in_words \| out_words \| +----------------------------------------+ \| response \| \| response \| \| provider_in_words \| provider_out_words \| \| padding \| +----------------------------------------+ \| \| . <uverbs input> . . (in_words * 8) . \| \| +----------------------------------------+ \| \| . <provider input> . . (provider_in_words * 8) . \| \| +----------------------------------------+ - response, if present: +----------------------------------------+ \| \| . <uverbs output space> . . (out_words * 8) . \| \| +----------------------------------------+ \| \| . <provider output space> . . (provider_out_words * 8) . \| \| +----------------------------------------+ The overall design is to ensure that the extensible infrastructure is itself extensible while begin more reliable with more input and bound checking. Note: The unused field in the extended header would be perfect candidate to hold the command "comp_mask" (eg. bit field used to handle compatibility). This was suggested by Roland Dreier in a previous review[2]. But "comp_mask" field is likely to be present in the uverb input and/or provider input, likewise for the response, as noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in the header. [1]: http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com [2]: http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com [3]: http://marc.info/?i=525C1149.6000701@mellanox.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:09 -08:00
Eli Cohen	1c636f8016	IB/core: Encorce MR access rights rules on kernel consumers Enforce the rule that when requesting remote write or atomic permissions, local write must be indicated as well. See IB spec 11.2.8.2. Spotted by: Hagay Abramovsky <hagaya@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 10:25:32 -08:00
$Upinder Malhi $umalhi$$ Upinder Malhi $umalhi$	180771a370	IB/core: Add Cisco usNIC rdma node and transport types This patch adds new rdma node and new rdma transport, and supporting code used by Cisco's low latency driver called usNIC. usNIC uses its own transport, distinct from IB and iWARP. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-09 02:36:25 -08:00
Roland Dreier	82af24ac6f	Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next	2013-09-03 09:01:08 -07:00
Matan Barak	22878dbc91	IB/core: Better checking of userspace values for receive flow steering - Don't allow unsupported comp_mask values, user should check ibv_query_device to know which features are supported. - Add a check in ib_uverbs_create_flow() to verify the size passed from the user space. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 11:12:48 -07:00
Hadar Hen Zion	436f2ad05a	IB/core: Export ib_create/destroy_flow through uverbs Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to support flow steering for user space applications. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:53:14 -07:00
Hadar Hen Zion	319a441d13	IB/core: Add receive flow steering support The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs, which receive plain Ethernet packets, specifically packets that don't carry any QPN to be matched by the receiving side. Applications using these QPs must be provided with a method to program some steering rule with the HW so packets arriving at the local port can be routed to them. This patch adds ib_create_flow(), which allow providing a flow specification for a QP. When there's a match between the specification and a received packet, the packet is forwarded to that QP, in a the same way one uses ib_attach_multicast() for IB UD multicast handling. Flow specifications are provided as instances of struct ib_flow_spec_yyy, which describe L2, L3 and L4 headers. Currently specs for Ethernet, IPv4, TCP and UDP are defined. Flow specs are made of values and masks. The input to ib_create_flow() is a struct ib_flow_attr, which contains a few mandatory control elements and optional flow specs. struct ib_flow_attr { enum ib_flow_attr_type type; u16 size; u16 priority; u32 flags; u8 num_of_specs; u8 port; /* Following are the optional layers according to user request * struct ib_flow_spec_yyy * struct ib_flow_spec_zzz */ }; As these specs are eventually coming from user space, they are defined and used in a way which allows adding new spec types without kernel/user ABI change, just with a little API enhancement which defines the newly added spec. The flow spec structures are defined with TLV (Type-Length-Value) entries, which allows calling ib_create_flow() with a list of variable length of optional specs. For the actual processing of ib_flow_attr the driver uses the number of specs and the size mandatory fields along with the TLV nature of the specs. Steering rules processing order is according to the domain over which the rule is set and the rule priority. All rules set by user space applicatations fall into the IB_FLOW_DOMAIN_USER domain, other domains could be used by future IPoIB RFS and Ethetool flow-steering interface implementation. Lower numerical value for the priority field means higher priority. The returned value from ib_create_flow() is a struct ib_flow, which contains a database pointer (handle) provided by the HW driver to be used when calling ib_destroy_flow(). Applications that offload TCP/IP traffic can also be written over IB UD QPs. The ib_create_flow() / ib_destroy_flow() API is designed to support UD QPs too. A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING to denote support for flow steering. The ib_flow_attr enum type supports usage of flow steering for promiscuous and sniffer purposes: IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive all Ethernet traffic which isn't steered to any QP IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:51:52 -07:00
Yishai Hadas	73c40c616a	IB/core: Add locking around event dispatching on XRC target QPs Fix a potential race when event occurrs on a target XRC QP and in the middle of reporting that on its shared qps, one of them is destroyed by user space application. Also add note for kernel consumers in ib_verbs.h that they must not destroy the QP from within the handler. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:21:32 -07:00
Jack Morgenstein	0134f16bc9	IB/core: Add reserved values to enums for low-level driver use Continue the approach taken by commit `d2b57063e4` ("IB/core: Reserve bits in enum ib_qp_create_flags for low-level driver use") and add reserved entries to the ib_qp_type and ib_wr_opcode enums. Low-level drivers can then define macros to use these reserved values, giving proper names to the macros for readability. Also add a range of reserved flags to enum ib_send_flags. The mlx5 IB driver uses the new additions. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-07 19:21:21 -07:00
Shani Michaeli	7083e42ee2	IB/core: Add "type 2" memory windows support This patch enhances the IB core support for Memory Windows (MWs). MWs allow an application to have better/flexible control over remote access to memory. Two types of MWs are supported, with the second type having two flavors: Type 1 - associated with PD only Type 2A - associated with QPN only Type 2B - associated with PD and QPN Applications can allocate a MW once, and then repeatedly bind the MW to different ranges in MRs that are associated to the same PD. Type 1 windows are bound through a verb, while type 2 windows are bound by posting a work request. The 32-bit memory key is composed of a 24-bit index and an 8-bit key. The key is changed with each bind, thus allowing more control over the peer's use of the memory key. The changes introduced are the following: * add memory window type enum and a corresponding parameter to ib_alloc_mw. * type 2 memory window bind work request support. * create a struct that contains the common part of the bind verb struct ibv_mw_bind and the bind work request into a single struct. * add the ib_inc_rkey helper function to advance the tag part of an rkey. Consumer interface details: * new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support for these features. Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B memory windows. It can set neither to indicate it doesn't support type 2 windows at all. * modify existing provides and consumers code to the new param of ib_alloc_mw and the ib_mw_bind_info structure Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-21 11:51:45 -08:00
Jack Morgenstein	d2b57063e4	IB/core: Reserve bits in enum ib_qp_create_flags for low-level driver use Reserve bits 26-31 for internal use by low-level drivers. Two such bits are used in the mlx4_b driver SR-IOV implementation. These enum additions guarantee that the core layer will never use these bits, so that low level drivers may safely make use of them. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:28 -07:00
Roland Dreier	cc169165c8	Merge branches 'core', 'cxgb4', 'ipath', 'iser', 'lockdep', 'mlx4', 'nes', 'ocrdma', 'qib' and 'raw-qp' into for-linus	2012-05-21 09:00:47 -07:00
Or Gerlitz	c938a616aa	IB/core: Add raw packet QP type IB_QPT_RAW_PACKET allows applications to build a complete packet, including L2 headers, when sending; on the receive side, the HW will not strip any headers. This QP type is designed for userspace direct access to Ethernet; for example by applications that do TCP/IP themselves. Only processes with the NET_RAW capability are allowed to create raw packet QPs (the name "raw packet QP" is supposed to suggest an analogy to AF_PACKET / SOL_RAW sockets). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-08 11:18:09 -07:00
Or Gerlitz	c3bccbfbb7	IB/core: Use qp->usecnt to track multicast attach/detach Just as we don't allow PDs, CQs, etc. to be destroyed if there are QPs that are attached to them, don't let a QP be destroyed if there are multicast group(s) attached to it. Use the existing usecnt field of struct ib_qp which was added by commit `0e0ec7e` ("RDMA/core: Export ib_open_qp() to share XRC TGT QPs") to track this. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-08 11:16:54 -07:00
Or Gerlitz	d927d505c5	IB: Change CQE "csum_ok" field to a bit flag Use a bit in wc_flags rather then a whole integer to hold the "checksum OK" flag. By itself, this change doesn't reduce the size of struct ib_wc on 64bit machines -- it stays on 56 bytes because of padding. However, it will allow to add more fields in the future without enlarging the struct. Also, it will let us have a unified approach with future libibverbs checksum offload reporting, because a bit flag doesn't break the library ABI. This patch was suggested during conversation with Liran Liss <liranl@mellanox.com>. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-03-08 12:34:27 -08:00
Or Gerlitz	2e96691c31	IB: Use central enum for speed instead of hard-coded values The kernel IB stack uses one enumeration for IB speed, which wasn't explicitly specified in the verbs header file. Add that enum, and use it all over the code. The IB speed/width notation is also used by iWARP and IBoE HW drivers, which use the convention of rate = speed * width to advertise their port link rate. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-03-05 09:25:16 -08:00
Roland Dreier	504255f8d0	Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next	2011-11-01 09:37:08 -07:00
Sean Hefty	0e0ec7e063	RDMA/core: Export ib_open_qp() to share XRC TGT QPs XRC TGT QPs are shared resources among multiple processes. Since the creating process may exit, allow other processes which share the same XRC domain to open an existing QP. This allows us to transfer ownership of an XRC TGT QP to another process. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:49:51 -07:00
Sean Hefty	53d0bd1e7f	RDMA/uverbs: Export XRC domains to user space Allow user space to create XRC domains. Because XRCDs are expected to be shared among multiple processes, we use inodes to identify an XRCD. Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:21:24 -07:00
Sean Hefty	d3d72d909e	RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD XRC TGT QPs are intended to be shared among multiple users and processes. Allow the destruction of an XRC TGT QP to be done explicitly through ib_destroy_qp() or when the XRCD is destroyed. To support destroying an XRC TGT QP, we need to track TGT QPs with the XRCD. When the XRCD is destroyed, all tracked XRC TGT QPs are also cleaned up. To avoid stale reference issues, if a user is holding a reference on a TGT QP, we increment a reference count on the QP. The user releases the reference by calling ib_release_qp. This releases any access to the QP from a user above verbs, but allows the QP to continue to exist until destroyed by the XRCD. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:20:27 -07:00
Sean Hefty	b42b63cf0d	RDMA/core: Add XRC QPs XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). XRC communication is between an initiator (INI) QP and a target (TGT) QP. Target QPs are associated with SRQs through an XRCD. An XRC TGT QP behaves like a receive-only RD QP. XRC INI QPs behave similarly to RC QPs, except that work requests posted to an XRC INI QP must specify the remote SRQ that is the target of the work request. We define two new QP types for XRC, to distinguish between INI and TGT QPs, and update the core layer to support XRC QPs. This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:16:19 -07:00
Sean Hefty	418d51307d	RDMA/core: Add XRC SRQ type XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). XRC defines SRQs that are specifically used by XRC connections. Expand the SRQ code to support XRC SRQs. An XRC SRQ is currently restricted to only XRC use according to the IB XRC Annex. Portions of this patch were derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:14:31 -07:00
Sean Hefty	96104eda01	RDMA/core: Add SRQ type field Currently, there is only a single ("basic") type of SRQ, but with XRC support we will add a second. Prepare for this by defining an SRQ type and setting all current users to IB_SRQT_BASIC. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-13 09:13:26 -07:00
Sean Hefty	59991f94eb	RDMA/core: Add XRC domain support XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). A few new concepts are introduced to support this. This patch adds: - A new device capability flag, IB_DEVICE_XRC, which low-level drivers set to indicate that a device supports XRC. - A new object type, XRC domains (struct ib_xrcd), and new verbs ib_alloc_xrcd()/ib_dealloc_xrcd(). XRCDs are used to limit which XRC SRQs an incoming message can target. This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-12 10:32:26 -07:00
Marcel Apfelbaum	71eeba161d	IB: Add new InfiniBand link speeds Introduce support for the following extended speeds: FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with 64b/66b encoding rather than 8b/10b encoding. FDR: IBA extended speed 14.0625 Gbps. EDR: IBA extended speed 25.78125 Gbps. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-10-11 11:53:47 -07:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Or Gerlitz	761d90ed4c	IB/core: Add GID change event Add IB GID change event type. This is needed for IBoE when the HW driver updates the GID (e.g when new VLANs are added/deleted) table and the change should be reflected to the IB core cache. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 21:04:30 -07:00
Tejun Heo	f06267104d	RDMA: Update workqueue usage * ib_wq is added, which is used as the common workqueue for infiniband instead of the system workqueue. All system workqueue usages including flush_scheduled_work() callers are converted to use and flush ib_wq. * cancel_delayed_work() + flush_scheduled_work() converted to cancel_delayed_work_sync(). * qib_wq is removed and ib_wq is used instead. This is to prepare for deprecation of flush_scheduled_work(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2011-01-16 21:16:31 -08:00
Eli Cohen	a3f5adaf49	IB/core: Add link layer property to ports This patch allows ports to have different link layers: IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET. This is required for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support. For devices that do not provide an implementation for querying the link layer property of a port, we return a default value based on the transport: RMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-09-27 17:51:10 -07:00
Aleksey Senin	a2ebf07ae5	IB: Rename RAW_ETY to RAW_ETHERTYPE Change abbreviated IB_QPT_RAW_ETY to IB_QPT_RAW_ETHERTYPE to make the special QP type easier to understand. cf http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04530.html Signed-off-by: Aleksey Senin <alekseys@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-08-04 10:44:19 -07:00
Ralph Campbell	9a6edb60ec	IB/core: Allow device-specific per-port sysfs files Add a new parameter to ib_register_device() so that low-level device drivers can pass in a pointer to a callback function that will be called for each port that is registered in sysfs. This allows low-level device drivers to create files in /sys/class/infiniband/<hca>/ports/<N>/ without having to poke through the internals of the RDMA sysfs handling. There is no need for an unregister function since the kobject reference will go to zero when ib_unregister_device() is called. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-05-21 10:34:44 -07:00
Vladimir Sokolovsky	5e80ba8ff0	IB/core: Add support for masked atomic operations - Add new IB_WR_MASKED_ATOMIC_CMP_AND_SWP and IB_WR_MASKED_ATOMIC_FETCH_AND_ADD send opcodes that can be used to post "masked atomic compare and swap" and "masked atomic fetch and add" work request respectively. - Add masked_atomic_cap capability. - Add mask fields to atomic struct of ib_send_wr - Add new opcodes to ib_wc_opcode The new operations are described more precisely below: * Masked Compare and Swap (MskCmpSwap) The MskCmpSwap atomic operation is an extension to the CmpSwap operation defined in the IB spec. MskCmpSwap allows the user to select a portion of the 64 bit target data for the “compare” check as well as to restrict the swap to a (possibly different) portion. The pseudo code below describes the operation: \| atomic_response = va \| if (!((compare_add ^ va) & compare_add_mask)) then \| va = (va & ~(swap_mask)) \| (swap & swap_mask) \| \| return atomic_response The additional operands are carried in the Extended Transport Header. Atomic response generation and packet format for MskCmpSwap is as for standard IB Atomic operations. * Masked Fetch and Add (MFetchAdd) The MFetchAdd Atomic operation extends the functionality of the standard IB FetchAdd by allowing the user to split the target into multiple fields of selectable length. The atomic add is done independently on each one of this fields. A bit set in the field_boundary parameter specifies the field boundaries. The pseudo code below describes the operation: \| bit_adder(ci, b1, b2, co) \| { \| value = ci + b1 + b2 \| co = !!(value & 2) \| \| return value & 1 \| } \| \| #define MASK_IS_SET(mask, attr) (!!((mask)&(attr))) \| bit_position = 1 \| carry = 0 \| atomic_response = 0 \| \| for i = 0 to 63 \| { \| if ( i != 0 ) \| bit_position = bit_position << 1 \| \| bit_add_res = bit_adder(carry, MASK_IS_SET(*va, bit_position), \| MASK_IS_SET(compare_add, bit_position), &new_carry) \| if (bit_add_res) \| atomic_response \|= bit_position \| \| carry = ((new_carry) && (!MASK_IS_SET(compare_add_mask, bit_position))) \| } \| \| return atomic_response Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-04-21 16:37:48 -07:00
Alexander Chiang	17a55f79fd	IB/core: Pack struct ib_device a little tighter A small change to reduce the size of ib_device to 1112 bytes (from 1128). Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-02-24 10:23:49 -08:00
Bart Van Assche	55464d461b	IB: Clarify the documentation of ib_post_send() Clarify the behavior of ib_post_send() when a list of work requests is passed in and an immediate error is returned. Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-12-09 14:20:04 -08:00
Harvey Harrison	f3a7c66b5c	net: replace __constant_{endian} uses in net headers Base versions handle constant folding now. For headers exposed to userspace, we must only expose the __ prefixed versions. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-14 22:58:35 -08:00
FUJITA Tomonori	8d8bb39b9e	dma-mapping: add the device argument to dma_mapping_error() Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Steve Wise	96f15c0353	RDMA/core: Add local DMA L_Key support - Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0 STag support and IB HCAs to indicate reserved L_Key support. - Add a u32 local_dma_lkey member to struct ib_device. Drivers fill this in with the appropriate local DMA L_Key (if they support it). - Fix up the drivers using this flag. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:53 -07:00
Ron Livne	47ee1b9f2e	IB/core: Add support for multicast loopback blocking This patch also adds a creation flag for QPs, IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK, which when set means that multicast sends from the QP to a group that the QP is attached to will not be looped back to the QP's receive queue. This can be used to save receive resources when a consumer does not want a local copy of multicast traffic; for example IPoIB must waste CPU time throwing away such local copies of multicast traffic. This patch also adds a device capability flag that shows whether a device supports this feature or not. Signed-off-by: Ron Livne <ronli@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:48 -07:00
Steve Wise	7f624d023b	RDMA/core: Add iWARP protocol statistics attributes in sysfs This patch adds a sysfs attribute group called "proto_stats" under /sys/class/infiniband/$device/ and populates this group with protocol statistics if they exist for a given device. Currently, only iWARP stats are defined, but the code is designed to allow InfiniBand protocol stats if they become available. These stats are per-device and more importantly -not- per port. Details: - Add union rdma_protocol_stats in ib_verbs.h. This union allows defining transport-specific stats. Currently only iwarp stats are defined. - Add struct iw_protocol_stats to define the current set of iwarp protocol stats. - Add new ib_device method called get_proto_stats() to return protocol statistics. - Add logic in core/sysfs.c to create iwarp protocol stats attributes if the device is an RNIC and has a get_proto_stats() method. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:48 -07:00
Steve Wise	00f7ec36c9	RDMA/core: Add memory management extensions support This patch adds support for the IB "base memory management extension" (BMME) and the equivalent iWARP operations (which the iWARP verbs mandates all devices must implement). The new operations are: - Allocate an ib_mr for use in fast register work requests. - Allocate/free a physical buffer lists for use in fast register work requests. This allows device drivers to allocate this memory as needed for use in posting send requests (eg via dma_alloc_coherent). - New send queue work requests: * send with remote invalidate * fast register memory region * local invalidate memory region * RDMA read with invalidate local memory region (iWARP only) Consumer interface details: - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added to indicate device support for these features. - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV, IB_WR_RDMA_READ_WITH_INV are added. - A new consumer API function, ib_alloc_mr() is added to allocate fast register memory regions. - New consumer API functions, ib_alloc_fast_reg_page_list() and ib_free_fast_reg_page_list() are added to allocate and free device-specific memory for fast registration page lists. - A new consumer API function, ib_update_fast_reg_key(), is added to allow the key portion of the R_Key and L_Key of a fast registration MR to be updated. Consumers call this if desired before posting a IB_WR_FAST_REG_MR work request. Consumers can use this as follows: - MR is allocated with ib_alloc_mr(). - Page list memory is allocated with ib_alloc_fast_reg_page_list(). - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key(). - MR made VALID and bound to a specific page list via ib_post_send(IB_WR_FAST_REG_MR) - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV), ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with invalidate operation. - MR is deallocated with ib_dereg_mr() - page lists dealloced via ib_free_fast_reg_page_list(). Applications can allocate a fast register MR once, and then can repeatedly bind the MR to different physical block lists (PBLs) via posting work requests to a send queue (SQ). For each outstanding MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be allocated (the fast_reg_page_list is owned by the low-level driver from the consumer posting a work request until the request completes). Thus pipelining can be achieved while still allowing device-specific page_list processing. The 32-bit fast register memory key/STag is composed of a 24-bit index and an 8-bit key. The application can change the key each time it fast registers thus allowing more control over the peer's use of the key/STag (ie it can effectively be changed each time the rkey is rebound to a page list). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:45 -07:00
Dotan Barak	4deccd6d95	RDMA: Improve include file coding style Remove subversion $Id lines and improve readability by fixing other coding style problems pointed out by checkpatch.pl. Signed-off-by: Dotan Barak <dotanba@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:44 -07:00
Roland Dreier	4c0283fc56	IB/core: Remove IB_DEVICE_SEND_W_INV capability flag In 2.6.26, we added some support for send with invalidate work requests, including a device capability flag to indicate whether a device supports such requests. However, the support was incomplete: the completion structure was not extended with a field for the key contained in incoming send with invalidate requests. Full support for memory management extensions (send with invalidate, local invalidate, fast register through a send queue, etc) is planned for 2.6.27. Since send with invalidate is not very useful by itself, just remove the IB_DEVICE_SEND_W_INV bit before the 2.6.26 final release; we will add an IB_DEVICE_MEM_MGT_EXTENSIONS bit in 2.6.27, which makes things simpler for applications, since they will not have quite as confusing an array of fine-grained bits to check. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-06-09 09:58:42 -07:00
Arthur Kepner	cb9fbc5c37	IB: expand ib_umem_get() prototype Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1 when mapping user-allocated CQs with ib_umem_get(). Signed-off-by: Arthur Kepner <akepner@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jes Sorensen <jes@sgi.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:12 -07:00
Tony Jones	f4e91eb4a8	IB: convert struct class_device to struct device This converts the main ib_device to use struct device instead of struct class_device as class_device is going away. Signed-off-by: Tony Jones <tonyj@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-19 19:10:30 -07:00
Eli Cohen	2dd5716227	IB/core: Add support for modify CQ Add support for modifying CQ parameters for controlling event generation moderation. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:33 -07:00
Roland Dreier	0f39cf3d54	IB/core: Add support for "send with invalidate" work requests Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a "send with invalidate" work request as defined in the iWARP verbs and the InfiniBand base memory management extensions. Also put "imm_data" and a new "invalidate_rkey" member in a new "ex" union in struct ib_send_wr. The invalidate_rkey member can be used to pass in an R_Key/STag to be invalidated. Add this new union to struct ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in ib_uverbs_post_send(). Fix up low-level drivers to deal with the change to struct ib_send_wr, and just remove the imm_data initialization from net/sunrpc/xprtrdma/, since that code never does any send with immediate operations. Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since the iWARP drivers currently in the tree set the bit. The amso1100 driver at least will silently fail to honor the IB_SEND_INVALIDATE bit if passed in as part of userspace send requests (since it does not implement kernel bypass work request queueing). Remove the flag from all existing drivers that set it until we know which ones are OK. The values chosen for the new flag is not consecutive to avoid clashing with flags defined in the XRC patches, which are not merged yet but which are already in use and are likely to be merged soon. This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:32 -07:00
Eli Cohen	c93570f23a	IB/core: Add IPoIB UD LSO support LSO (large send offload) allows the networking stack to pass SKBs with data size larger than the MTU to the IPoIB driver and have the HCA HW fragment the data to multiple MSS-sized packets. Add a device capability flag IB_DEVICE_UD_TSO for devices that can perform TCP segmentation offload, a new send work request opcode IB_WR_LSO, header, hlen and mss fields for the work request structure, and a new IB_WC_LSO completion type. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:27 -07:00
Eli Cohen	b846f25aa2	IB/core: Add creation flags to struct ib_qp_init_attr Add a create_flags member to struct ib_qp_init_attr that will allow a kernel verbs consumer to create a pass special flags when creating a QP. Add a flag value for telling low-level drivers that a QP will be used for IPoIB UD LSO. The create_flags member will also be useful for XRC and ehca low-latency QP support. Since no create_flags handling is implemented yet, add code to all low-level drivers to return -EINVAL if create_flags is non-zero. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:27 -07:00
Roland Dreier	b3d636b0d1	IB: Make struct ib_uobject.id a signed int IDR IDs are signed, so struct ib_uobject.id should be signed. This avoids some sparse pointer signedness warnings. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:01:06 -07:00
Roland Dreier	5128bdc97a	IB/core: Remove unused struct ib_device.flags member Avoid confusion about what it might mean, since it's never initialized. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-08 14:47:26 -08:00
Eli Cohen	e0605d9199	IB/core: Add IP checksum offload support Add a device capability to show when it can handle checksum offload. Also add a send flag for inserting checksums and a csum_ok field to the completion record. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-08 14:37:56 -08:00
Greg Kroah-Hartman	35be068198	Kobject: change drivers/infiniband to use kobject_init_and_add Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <mshefty@ichips.intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:26 -08:00
Adrian Bunk	87ae9afdca	cleanup asm/scatterlist.h includes Not architecture specific code should not #include <asm/scatterlist.h>. This patch therefore either replaces them with #include <linux/scatterlist.h> or simply removes them if they were unused. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-02 08:47:06 +01:00
Dotan Barak	92ddc447ce	IB: Move the macro IB_UMEM_MAX_PAGE_CHUNK() to umem.c After moving the definition of struct ib_umem_chunk from ib_verbs.h to ib_umem.h there isn't any reason for the macro IB_UMEM_MAX_PAGE_CHUNK to stay in ib_verbs.h. Move the macro to umem.c, the only place where it is used. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-08-03 10:45:18 -07:00
Dotan Barak	bfb3ea1251	IB: Include <linux/list.h> and <linux/rwsem.h> from <rdma/ib_verbs.h> ib_verbs.h uses struct list_head and rw_semaphore, so while the files <linux/list.h> and <linux/rwsem.h> seem to be pulled in indirectly by the other header files it includes, the right thing is to include those files directly. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-08-03 10:45:18 -07:00
Yosef Etigin	5eb620c81c	IB/core: Add helpers for uncached GID and P_Key searches Add ib_find_gid() and ib_find_pkey() functions that use uncached device queries. The calls might block but the returns are always up-to-date. Cache P_Key and GID table lengths in core to avoid extra port info queries. Signed-off-by: Yosef Etigin <yosefe@voltaire.com> Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-19 08:51:53 -07:00
Roland Dreier	f7c6a7b5d5	IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules Export ib_umem_get()/ib_umem_release() and put low-level drivers in control of when to call ib_umem_get() to pin and DMA map userspace, rather than always calling it in ib_uverbs_reg_mr() before calling the low-level driver's reg_user_mr method. Also move these functions to be in the ib_core module instead of ib_uverbs, so that driver modules using them do not depend on ib_uverbs. This has a number of advantages: - It is better design from the standpoint of making generic code a library that can be used or overridden by device-specific code as the details of specific devices dictate. - Drivers that do not need to pin userspace memory regions do not need to take the performance hit of calling ib_mem_get(). For example, although I have not tried to implement it in this patch, the ipath driver should be able to avoid pinning memory and just use copy_{to,from}_user() to access userspace memory regions. - Buffers that need special mapping treatment can be identified by the low-level driver. For example, it may be possible to solve some Altix-specific memory ordering issues with mthca CQs in userspace by mapping CQ buffers with extra flags. - Drivers that need to pin and DMA map userspace memory for things other than memory regions can use ib_umem_get() directly, instead of hacks using extra parameters to their reg_phys_mr method. For example, the mlx4 driver that is pending being merged needs to pin and DMA map QP and CQ buffers, but it does not need to create a memory key for these buffers. So the cleanest solution is for mlx4 to call ib_umem_get() in the create_qp and create_cq methods. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-08 18:00:37 -07:00
Roland Dreier	ed23a72778	IB: Return "maybe missed event" hint from ib_req_notify_cq() The semantics defined by the InfiniBand specification say that completion events are only generated when a completions is added to a completion queue (CQ) after completion notification is requested. In other words, this means that the following race is possible: while (CQ is not empty) ib_poll_cq(CQ); // new completion is added after while loop is exited ib_req_notify_cq(CQ); // no event is generated for the existing completion To close this race, the IB spec recommends doing another poll of the CQ after requesting notification. However, it is not always possible to arrange code this way (for example, we have found that NAPI for IPoIB cannot poll after requesting notification). Also, some hardware (eg Mellanox HCAs) actually will generate an event for completions added before the call to ib_req_notify_cq() -- which is allowed by the spec, since there's no way for any upper-layer consumer to know exactly when a completion was really added -- so the extra poll of the CQ is just a waste. Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for ib_req_notify_cq() so that it can return a hint about whether the a completion may have been added before the request for notification. The return value of ib_req_notify_cq() is extended so: < 0 means an error occurred while requesting notification == 0 means notification was requested successfully, and if IB_CQ_REPORT_MISSED_EVENTS was passed in, then no events were missed and it is safe to wait for another event. > 0 is only returned if IB_CQ_REPORT_MISSED_EVENTS was passed in. It means that the consumer must poll the CQ again to make sure it is empty to avoid the race described above. We add a flag to enable this behavior rather than turning it on unconditionally, because checking for missed events may incur significant overhead for some low-level drivers, and consumers that don't care about the results of this test shouldn't be forced to pay for the test. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-06 21:18:11 -07:00
Michael S. Tsirkin	f4fd0b224d	IB: Add CQ comp_vector support Add a num_comp_vectors member to struct ib_device and extend ib_create_cq() to pass in a comp_vector parameter -- this parallels the userspace libibverbs API. Update all hardware drivers to set num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector value. Pass the value of num_comp_vectors to userspace rather than hard-coding a value of 1. We want multiple CQ event vector support (via MSI-X or similar for adapters that can generate multiple interrupts), but it's not clear how many vectors we want, or how we want to deal with policy issues such as how to decide which vector to use or how to set up interrupt affinity. This patch is useful for experimenting, since no core changes will be necessary when updating a driver to support multiple vectors, and we know that we want to make at least these changes anyway. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-06 21:18:11 -07:00
Michael S. Tsirkin	062dbb69f3	IB: Return qp pointer as part of ib_wc struct ib_wc currently only includes the local QP number: this matches the IB spec, but seems mostly useless. The following patch replaces this with the pointer to qp itself, and updates all low level drivers and all users. This has the following advantages: - Ability to get a per-qp context through wc->qp->qp_context - Existing drivers already have the qp pointer ready in poll cq, so this change actually saves a tiny bit (extra memory read) on data path (for ehca it would actually be expensive to find the QP pointer when polling a CQ, but ehca does not support SRQ so we can leave wc->qp as NULL for ehca) - Users that need the QP number can still get it through wc->qp->qp_num Use case: In IPoIB connected mode code, I have a common CQ shared by multiple QPs. To track connection usage, I need a way to get at some per-QP context upon the completion, and I would like to avoid allocating context object per work request just to stick a QP pointer into it. With this code, I can just use wc->qp->qp_context. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-04 14:11:55 -08:00
Michael S. Tsirkin	459d6e2a54	IB: Include <linux/kref.h> explicitly in <rdma/ib_verbs.h> <rdma/ib_verbs.h> uses struct kref, so it should include <linux/kref.h> explicitly to avoid hidden include dependencies. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-04 14:11:55 -08:00
Roland Dreier	c59a3da134	IB: Fix ib_dma_alloc_coherent() wrapper The ib_dma_alloc_coherent() wrapper uses a u64* for the dma_handle parameter, unlike dma_alloc_coherent, which uses dma_addr_t*. This means that we need a temporary variable to handle the case when ib_dma_alloc_coherent() just falls through directly to dma_alloc_coherent() on architectures where sizeof u64 != sizeof dma_addr_t. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-12-15 13:57:26 -08:00
Ben Collins	d1998ef38a	[PATCH] ib_verbs: Use explicit if-else statements to avoid errors with do-while macros At least on PPC, the "op ? op : dma" construct causes a compile failure because the dma_* is a do{}while(0) macro. This turns all of them into proper if/else to avoid this problem. Signed-off-by: Ben Collins <bcollins@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 19:47:13 -08:00
Ralph Campbell	9b513090a3	IB: Add DMA mapping functions to allow device drivers to interpose The QLogic InfiniPath HCAs use programmed I/O instead of HW DMA. This patch allows a verbs device driver to interpose on DMA mapping function calls in order to avoid relying on bus_to_virt() and phys_to_virt() to undo the mappings created by dma_map_single(), dma_map_sg(), etc. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-12-12 14:27:41 -08:00
Tom Tucker	07ebafbaaa	RDMA: iWARP Core Changes. Modifications to the existing rdma header files, core files, drivers, and ulp files to support iWARP, including: - Hook iWARP CM into the build system and use it in rdma_cm. - Convert enum ib_node_type to enum rdma_node_type, which includes the possibility of RDMA_NODE_RNIC, and update everything for this. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:47 -07:00
Ralph Campbell	9bc57e2d19	IB/uverbs: Pass userspace data to modify_srq and modify_qp methods Pass a struct ib_udata to the low-level driver's ->modify_srq() and ->modify_qp() methods, so that it can get to the device-specific data passed in by the userspace driver. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:25 -07:00
Roland Dreier	9ead190bfd	IB/uverbs: Don't serialize with ib_uverbs_idr_mutex Currently, all userspace verbs operations that call into the kernel are serialized by ib_uverbs_idr_mutex. This can be a scalability issue for some workloads, especially for devices driven by the ipath driver, which needs to call into the kernel even for datapath operations. Fix this by adding reference counts to the userspace objects, and then converting ib_uverbs_idr_mutex into a spinlock that only protects the idrs long enough to take a reference on the object being looked up. Because remove operations may fail, we have to do a slightly funky two-step deletion, which is described in the comments at the top of uverbs_cmd.c. This also still leaves ib_uverbs_idr_lock as a single lock that is possibly subject to contention. However, the lock hold time will only be a single idr operation, so multiple threads should still be able to make progress, even if ib_uverbs_idr_lock is being ping-ponged. Surprisingly, these changes even shrink the object code: add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60) Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:44:49 -07:00
Sean Hefty	4e00d69454	IB: Add ib_init_ah_from_wc() Add a function to initialize address handle attributes from a work completion. This functionality is duplicated by both verbs and the CM. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:39 -07:00
Leonid Arsh	63942c9a98	IB: Add client reregister event type Add IB_EVENT_CLIENT_REREGISTER to enum so low-level drivers can generate "client reregister" events. Signed-off-by: Leonid Arsh <leonida@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:35 -07:00
Jack Morgenstein	6fb9cdbf2c	IB: Add caching of ports' LMC Add an LMC cache to struct ib_device, and add a function ib_get_cached_lmc() to query the cache. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:34 -07:00
Jack Morgenstein	bf6a9e31cf	IB: simplify static rate encoding Push translation of static rate to HCA format into low-level drivers, where it belongs. For static rate encoding, use encoding of rate field from IB standard PathRecord, with addition of value 0, for backwards compatibility with current usage. The changes are: - Add enum ib_rate to midlayer includes. - Get rid of static rate translation in IPoIB; just use static rate directly from Path and MulticastGroup records. - Update mthca driver to translate absolute static rate into the format used by hardware. This also fixes mthca's static rate handling for HCAs that are capable of 4X DDR. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-04-10 09:43:47 -07:00
Dotan Barak	abb6e9ba17	IB/mthca: Return actual capacity from create_srq Have mthca's create_srq method return the actual capacity of the SRQ that gets created. Also update comments in <rdma/ib_verbs.h> to clarify that this is what is expected from ib_create_srq(). Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:16 -08:00
Roland Dreier	8a51866f08	IB: Add ib_modify_qp_is_ok() library function The in-kernel mthca driver contains a table of which attributes are valid for each queue pair state transition. It turns out that both other IB drivers -- ipath and ehca -- which are being prepared for merging have copied this table, errors and all. To forestall this code duplication, move this table and the code to check parameters against it into a midlayer library function, ib_modify_qp_is_ok(). Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:12 -08:00
Or Gerlitz	d36f34aadf	IB: Enable FMR pool user to set page size This patch allows the consumer to set the page size of "pages" mapped by the pool FMRs, which is a feature already existing in the base verbs API. On the cosmetic side it changes ib_fmr_attr.page_size field to be named page_shift. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:10 -08:00
Roland Dreier	c5bcbbb9fe	IB: Allow userspace to set node description Expose a writable "node_desc" sysfs attribute for InfiniBand devices. This allows userspace to update the node description with information such as the node's hostname, so that IB network management software can tie its view to the real world. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:09 -08:00
Roland Dreier	33b9b3ee97	IB: Add userspace support for resizing CQs Add support to uverbs to handle resizing userspace CQs (completion queues), including adding an ABI for marshalling requests and responses. The kernel midlayer already has ib_resize_cq(). Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:07 -08:00
Sean Hefty	cf311cd49a	IB: Add node_guid to struct ib_device Add a node_guid field to struct ib_device. It is the responsibility of the low-level driver to initialize this field before registering a device with the midlayer. Convert everyone to looking at this field instead of calling ib_query_device() when all they want is the node GUID, and remove the node_guid field from struct ib_device_attr. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-01-10 07:39:34 -08:00
Roland Dreier	40de2e548c	[IB] Have cq_resize() method take an int, not int* Change the struct ib_device.resize_cq() method to take a plain integer that holds the new CQ size, rather than a pointer to an integer that it uses to return the new size. This makes the interface match the exported ib_resize_cq() signature, and allows the low-level driver to update the CQ size with proper locking if necessary. No in-tree drivers are exporting this method yet. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2005-11-10 10:22:50 -08:00
Sean Hefty	34816ad98e	[IB] Fix MAD layer DMA mappings to avoid touching data buffer once mapped The MAD layer was violating the DMA API by touching data buffers used for sends after the DMA mapping was done. This causes problems on non-cache-coherent architectures, because the device doing DMA won't see updates to the payload buffers that exist only in the CPU cache. Fix this by having all MAD consumers use ib_create_send_mad() to allocate their send buffers, and moving the DMA mapping into the MAD layer so it can be done just before calling send (and after any modifications of the send buffer by the MAD layer). Tested on a non-cache-coherent PowerPC 440SPe system. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2005-10-25 10:51:39 -07:00
Roland Dreier	883a99c702	[IB] uverbs: Add a mask of device methods allowed for userspace Give each device a uverbs_cmd_mask, so that a low-level driver can control which methods may be called on behalf of userspace. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2005-10-17 15:20:30 -07:00
Roland Dreier	274c089163	[IB] uverbs: Add device-specific ABI version attribute Add abi_version attribute to uverbs class devices to allow for ABI versioning of device-specific interfaces. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2005-10-17 15:20:26 -07:00
Roland Dreier	63c47c286d	[IB] uverbs: Close some exploitable races Al Viro pointed out that the current IB userspace verbs interface allows userspace to cause mischief by closing file descriptors before we're ready, or issuing the same command twice at the same time. This patch closes those races, and fixes other obvious problems such as a module reference leak. Some other interface bogosities will require an ABI change to fix properly, so I'm deferring those fixes until 2.6.15. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2005-09-26 13:01:03 -07:00
Roland Dreier	a4d61e8480	[PATCH] IB: move include files to include/rdma Move the InfiniBand headers from drivers/infiniband/include to include/rdma. This allows InfiniBand-using code to live elsewhere, and lets us remove the ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2005-08-26 20:37:38 -07:00

... 2 3 4 5 6 ...

304 Commits