linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-04 08:26:55 +07:00

Author	SHA1	Message	Date
Luis R. Rodriguez	7ea402d01c	x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled We are burying direct access to MTRR code support on x86 in order to take advantage of PAT. In the future, we also want to make the default behaviour of ioremap_nocache() to use strong UC; use of mtrr_add() on those systems would make write-combining void. In order to help both enable us to later make strong UC default and in order to phase out direct MTRR access code, port the driver over to arch_phys_wc_add() and annotate that the device driver requires systems to boot with PAT disabled, with the 'nopat' kernel parameter. This is a workable compromise given that the ipath device driver powers the old HTX bus cards that only work in AMD systems, while the newer IB/qib device driver powers all PCI-e cards. The ipath device driver is obsolete, hardware is hard to find and because of this it's a reasonable compromise to require users of ipath to boot with 'nopat'. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Doug Ledford <dledford@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Roland Dreier <roland@purestorage.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Ville Syrjälä <syrjala@sci.fi> Cc: infinipath@intel.com Cc: jbeulich@suse.com Cc: konrad.wilk@oracle.com Cc: linux-rdma@vger.kernel.org Cc: mchehab@osg.samsung.com Cc: toshi.kani@hp.com Link: http://lkml.kernel.org/r/1434053994-2196-4-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1434356898-25135-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-18 11:23:42 +02:00
Nicholas Bellinger	bc0c94b140	target: Drop unnecessary core_tpg_register TFO parameter This patch drops unnecessary target_core_fabric_ops parameter usage for core_tpg_register() during fabric driver TFO->fabric_make_tpg() se_portal_group creation callback execution. Instead, use the existing se_wwn->wwn_tf->tf_ops pointer to ensure fabric driver is really using the same TFO provided at module_init time. Also go ahead and drop the forward TFO declarations tree-wide, and handling the special case for iscsi-target discovery TPG. Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-06-15 23:23:22 -07:00
Eran Ben Elisha	9616982f3f	net/mlx4_core: Add helper to query counters This is an infrastructure step for querying VF and PF counters. This code was in the IB driver, move it to the mlx4 core driver so it will be accessible for more use cases. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	7193a141eb	IB/mlx4: Set VF to read from QP counters As IB VFs are not capable of reading the port counters through MADs, have them read their own QP counters to gather statistics. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	c3abb51bdb	IB/mlx4: Add RoCE/IB dedicated counters This is an infrastructure step to attach all the QPs opened from the IB driver to a counter in order to collect VF stats from the PF using those counters. If the port's type is Ethernet, the counter policy demands two counters per port (one for RoCE and one for Ethernet). The port default counter (allocated in mlx4_core) is used for the Ethernet netdev QPs and we allocate another counter for RoCE. If the port's traffic is Infiniband, the counter policy demands one counter per port, so it can use the port's default counter. Also, Add 'allocated' flag for each counter in order to clean it at unload. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:02 -07:00
Eran Ben Elisha	47d8417f59	net/mlx4_core: Add sink counter Reserve the last valid counter index for the "sink" counter. When a new counter cannot be allocated, the driver will use this counter. In order to avoid allocating this counter on any other flow, fix the indices bitmap allocation range, and reserve the sink counter index. Add macro for the sink counter index and replace all appearances of the index with the macro. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-15 17:23:01 -07:00
Ira Weiny	8e4349d13f	IB/mad: Add final OPA MAD processing For devices which support OPA MADs 1) Use previously defined SMP support functions. 2) Pass correct base version to ib_create_send_mad when processing OPA MADs. 3) Process out_mad_key_index returned by agents for a response. This is necessary because OPA SMP packets must carry a valid pkey. 4) Carry the correct segment size (OPA vs IBTA) of RMPP messages within ib_mad_recv_wc. 5) Handle variable length OPA MADs by: * Adjusting the 'fake' WC for locally routed SMP's to represent the proper incoming byte_len * out_mad_size is used from the local HCA agents 1) when sending agent responses on the wire 2) when passing responses through the local_completions function NOTE: wc.byte_len includes the GRH length and therefore is different from the in_mad_size specified to the local HCA agents. out_mad_size should _not_ include the GRH length as it is added Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:18 -04:00
Ira Weiny	f28990bc89	IB/mad: Add partial Intel OPA MAD support Add OPA SMP processing functionality. Define the new OPA SMP format, create support functions for this format using the previously defined helper functions as appropriate. These functions are defined in this patch and used in the final OPA MAD support patch. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Ira Weiny	548ead1744	IB/mad: Add partial Intel OPA MAD support This patch is the first of 3 which adds processing of OPA MADs 1) Add Intel Omni-Path Architecture defines 2) Increase max management version to accommodate OPA 3) update ib_create_send_mad If the device supports OPA MADs and the MAD being sent is the OPA base version alter the MAD size and sg lengths as appropriate Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Ira Weiny	4cd7c9479a	IB/mad: Add support for additional MAD info to/from drivers In order to support alternate sized MADs (and variable sized MADs on OPA devices) add in/out MAD size parameters to the process_mad core call. In addition, add an out_mad_pkey_index to communicate the pkey index the driver wishes the MAD stack to use when sending OPA MAD responses. The out MAD size and the out MAD PKey index are required by the MAD stack to generate responses on OPA devices. Furthermore, the in and out MAD parameters are made generic by specifying them as ib_mad_hdr rather than ib_mad. Drivers are modified as needed and are protected by BUG_ON flags if the MAD sizes passed to them are incorrect. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Ira Weiny	c9082e51b6	IB/mad: Convert allocations from kmem_cache to kzalloc This patch implements allocating alternate receive MAD buffers within the MAD stack. Support for OPA to send/recv variable sized MADs is implemented later. 1) Convert MAD allocations from kmem_cache to kzalloc kzalloc is more flexible to support devices with different sized MADs and research and testing showed that the current use of kmem_cache does not provide performance benefits over kzalloc. 2) Change struct ib_mad_private to use a flex array for the mad data 3) Allocate ib_mad_private based on the size specified by devices in rdma_max_mad_size. 4) Carry the allocated size in ib_mad_private to be used when processing ib_mad_private objects. 5) Alter DMA mappings based on the mad_size of ib_mad_private. 6) Replace the use of sizeof and static defines as appropriate 7) Add appropriate casts for the MAD data when calling processing functions. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Ira Weiny	337877a466	IB/core: Add ability for drivers to report an alternate MAD size. Add max MAD size to the device immutable data set and have all drivers that support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core. Verify MAD size data in both the MAD core and when reading the immutable data. OPA drivers will report alternate MAD sizes in subsequent patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Ira Weiny	da2dfaa3a3	IB/mad: Support alternate Base Versions when creating MADs In preparation to support the new OPA MAD Base version, add a base version parameter to ib_create_send_mad and set it to IB_MGMT_BASE_VERSION for current users. Definition of the new base version and its processing will occur in later patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:17 -04:00
Ira Weiny	29869eafa6	IB/mad: Create a generic helper for DR forwarding checks IB and OPA SMPs share the same processing algorithm but have different header formats and permissive LID detection. Add a helper function which is generic to processing the DR forwarding checks which can be used by both IB and OPA SMP code. Use this function in the current IB function smi_check_forward_dr_smp. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:16 -04:00
Ira Weiny	86f0e67a21	IB/mad: Create a generic helper for DR SMP Recv processing IB and OPA SMPs share the same processing algorithm but have different header formats and permissive LID detection. Add a helper function which is generic to processing DR SMP Recv messages which can be used by both IB and OPA SMP code. Use this function in the current IB function smi_handle_dr_smp_recv. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:16 -04:00
Ira Weiny	92f1505604	IB/mad: Create a generic helper for DR SMP Send processing IB and OPA SMPs share the same processing algorithm but have different header formats and permissive LID detection. Add a helper function which is generic to processing DR SMP Send messages which can be used by both IB and OPA SMP code. Use this function in the current IB function smi_handle_dr_smp_send. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:16 -04:00
Ira Weiny	e11ae8aa0c	IB/mad: Split IB SMI handling from MAD Recv handler Make a helper function to process Directed Route SMPs to be called by the IB MAD Recv Handler, ib_mad_recv_done_handler. This cleans up the MAD receive handler code a bit and allows for us to better share the SMP processing code between IB and OPA SMPs. IB and OPA SMPs share the same processing algorithm but have different header formats and permissive LID detection. Therefore this and subsequent patches split the common processing code from the IB specific code in anticipation of sharing those algorithms with the OPA code. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:16 -04:00
Ira Weiny	83a1d22889	IB/mad cleanup: Generalize processing of MAD data ib_find_send_mad only needs access to the MAD header not the full IB MAD. Change the local variable to ib_mad_hdr and change the corresponding cast. This allows for clean usage of this function with both IB and OPA MADs because OPA MADs carry the same header as IB MADs. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:16 -04:00
Ira Weiny	d94bd2667a	IB/mad cleanup: Clean up function params -- find_mad_agent find_mad_agent only needs read only access to the MAD header. Update the ib_mad pointer to be const ib_mad_hdr. Adjust call tree. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:16 -04:00
Matan Barak	4b664c4355	IB/mlx4: Add support for CQ time-stamping This includes: * support allocation of CQ with the TIMESTAMP_COMPLETION creation flag. * add timestamp_mask and hca_core_clock to query_device, reporting the number of supported timestamp bits (mask) and the hca_core_clock frequency. * return hca core clock's offset in query_device vendor's data, this is needed in order to read the HCA's core clock. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	52033cfb5a	IB/mlx4: Add mmap call to map the hardware clock In order to read the HCA's cycle counter efficiently in user space, we need to map the HCA's register. This is done through mmap call. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	2528e33e68	IB/core: Pass hardware specific data in query_device Vendors should be able to pass vendor specific data to/from user-space via query_device uverb. In order to do this, we need to pass the vendors' specific udata. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	24306dc661	IB/core: Add timestamp_mask and hca_core_clock to query_device In order to expose timestamp we need to expose two new attributes in query_device to be used for CQ completion time-stamping: timestamp_mask - how many bits are valid in the timestamp, where timestamp values can be at most 64 bits. hca_core_clock - timestamp is given in HW cycles; this is the frequency of the HCA in kHz units, necessary in order to convert cycles to seconds. This is added both to ib_query_device and its respective uverbs counterpart. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	565197dd8f	IB/core: Extend ib_uverbs_create_cq ib_uverbs_ex_create_cq follows the extension verbs mechanism. New features (for example, CQ creation flags field which is added in a downstream patch) could be used via user-space libraries without breaking the ABI. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	8e37210b38	IB/core: Change ib_create_cq to use struct ib_cq_init_attr Currently, ib_create_cq uses cqe and comp_vector instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Matan Barak	bcf4c1ea58	IB/core: Change provider's API of create_cq to be extendible Add a new ib_cq_init_attr structure which contains the previous cqe (minimum number of CQ entries) and comp_vector (completion vector) in addition to a new flags field. All vendors' create_cq callbacks are changed in order to work with the new API. This commit does not change any functionality. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2 Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-12 14:49:10 -04:00
Saeed Mahameed	facc9699f0	net/mlx5e: Fix HW MTU settings Previously we configured HW MTU to be netdev->mtu, actually we need to configure netdev->mtu + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN). Also, query MTU can not fail, hence make the relevant helper a void function, and add mlx5e_set_dev_port_mtu, a helper function to handle MTU setting. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 15:55:25 -07:00
Hariprasad S	74217d4c6a	iw_cxgb4: support for bar2 qid densities exceeding the page size Handle this configuration: Queues Per Page * SGE BAR2 Queue Register Area Size > Page Size Use cxgb4_bar2_sge_qregs() to obtain the proper location within the bar2 region for a given qid. Rework the DB and GTS write functions to make use of this bar2 info. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-11 12:22:32 -04:00
Doug Ledford	0699ee7ad7	Merge branch 'for-4.2-misc' into k.o/for-4.2	2015-06-11 01:13:30 -04:00
Colin Ian King	4dc5444279	RDMA/ocrdma: fix double free on pd A reorganisation of the PD allocation and deallocation in commit `9ba1377daa` ("RDMA/ocrdma: Move PD resource management to driver.") introduced a double free on pd, as detected by static analysis by smatch: drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:682 ocrdma_alloc_pd() error: double free of 'pd'^ The original call to ocrdma_mbx_dealloc_pd() (which does not kfree pd) was replaced with a call to _ocrdma_dealloc_pd() (which does kfree pd). The kfree following this call causes the double free, so just remove it to fix the problem. Fixes: `9ba1377daa` ("RDMA/ocrdma: Move PD resource management to driver.") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-11 01:12:28 -04:00
Dan Carpenter	fc3aa45b63	IB/usnic: clean up some error handling code This code causes a static checker warning: drivers/infiniband/hw/usnic/usnic_uiom.c:476 usnic_uiom_alloc_pd() warn: passing zero to 'PTR_ERR' This code isn't buggy, but iommu_domain_alloc() doesn't return an error pointer so we can simplify the error handling and silence the static checker warning. The static checker warning is to catch places which do: if (!ptr) return ERR_PTR(ptr); Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dave Goodell <dgoodell@cisco.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-11 01:11:27 -04:00
Fabian Frederick	ed0de4a8c9	IB/mthca: use swap() in mthca_make_profile() Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-11 01:10:59 -04:00
Moni Shoua	9247a8eba6	IB/core: Don't warn on no SA support in event handler Registering an event handler is done for a device. This device may have one RoCE port (no SA cap) and one InfiniBand port (has SA cap). Therefore, warning from the event handler about a specific port that doesn't have SA cap is correct but pollutes the kernel log without a need. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-10 23:54:34 -04:00
Sagi Grimberg	524630d582	iser-target: Fix possible use-after-free iser connection termination process happens in 2 stages: - isert_wait_conn: - resumes rdma disconnect - wait for session commands - wait for flush completions (post a marked wr to signal we are done) - wait for logout completion - queue work for connection cleanup (depends on disconnected/timewait events) - isert_free_conn - last reference put on the connection In case we are terminating during IOs, we might be posting send/recv requests after we posted the last work request which might lead to a use-after-free condition in isert_handle_wc. After we posted the last wr in isert_wait_conn we are guaranteed that no successful completions will follow (meaning no new work request posts may happen) but other flush errors might still come. So before we put the last reference on the connection, we repeat the process of posting a marked work request (isert_wait4flush) in order to make sure all pending completions were flushed. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jenny Falkovich <jennyf@mellanox.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-06-08 22:17:09 -07:00
Sagi Grimberg	2f1b6b7d9a	iser-target: release stale iser connections When receiving a new iser connect request we serialize the pending requests by adding the newly created iser connection to the np accept list and let the login thread process the connect request one by one (np_accept_wait). In case we received a disconnect request before the iser_conn has begun processing (still linked in np_accept_list) we should detach it from the list and clean it up and not have the login thread process a stale connection. We do it only when the connection state is not already terminating (initiator driven disconnect) as this might lead us to access np_accept_mutex after the np was released in live shutdown scenarios. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jenny Falkovich <jennyf@mellanox.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-06-08 22:16:40 -07:00
Sagi Grimberg	9253e667ab	iser-target: Fix variable-length response error completion Since commit "2426bd456a6 target: Report correct response ..." we might get a command with data_size that does not fit to the number of allocated data sg elements. Given that we rely on cmd t_data_nents which might be different than the data_size, we sometimes receive local length error completion. The correct approach would be to take the command data_size into account when constructing the ib sg_list. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jenny Falkovich <jennyf@mellanox.com> Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-06-08 22:16:17 -07:00
Haggai Abramonvsky	4aa17b2879	mlx5: Enable mutual support for IB and Ethernet Ethernet functionality is only available when working in ISSI > 0 mode. Previously, the IB driver wasn't ready to work on that mode, and hence building both the IB driver and the Ethernet functionality in the core driver were disallowed by Kconfigs. Now, once we have all the pre-steps in place, we can remove this limitation. The last steps in the IB driver for getting that setup to work are: create dummy SRQ for the driver's use (until now we could use XRC_SRQ as SRQ and XRC_SRQ, after moving to ISSI > 0, we separate XRC SRQs from basic SRQs) and adapt the create QP function to be compatible with ISSI > 0. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-04 16:41:02 -07:00
Majd Dibbiny	647241ea10	IB/mlx5: Don't create IB instance over Ethernet ports Since we still don't have RoCE support in mlx5, avoid creating IB driver instance over Ethernet ports. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-04 16:41:02 -07:00
Majd Dibbiny	1b5daf11b0	IB/mlx5: Avoid using the MAD_IFC command under ISSI > 0 mode In ISSI > 0 mode, most of the MAD_IFC command features are deprecated, and can't be used. Therefore, when in that mode, we replace all of them with other commands that provide the required functionality. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-04 16:41:02 -07:00
Haggai Abramonvsky	01949d0109	net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0 When working in ISSI > 0 mode, the model exposed by the device for XRCs and SRQs is different. XRCs use XRC SRQs and plain SRQs are based on RMP (Receive Memory Pool). Add helper functions to create, modify, query, and arm XRC SRQs and RMPs. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-04 16:41:01 -07:00
Hariprasad Shenai	a4cfd929c9	cxgb4: Add ethtool support to get adapter stats Add ethtool support to get adapter specific hardware statistics Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-03 23:40:19 -07:00
Bart Van Assche	ba92999252	target: Minimize SCSI header #include directives Only include SCSI initiator header files in target code that needs these header files, namely the SCSI pass-through code and the tcm_loop driver. Change SCSI_SENSE_BUFFERSIZE into TRANSPORT_SENSE_BUFFER in target code because the former is intended for initiator code and the latter for target code. With this patch the only initiator include directives in target code that remain are as follows: $ git grep -nHE 'include .scsi/(scsi.h|scsi_host.h|scsi_device.h|scsi_cmnd.h)' drivers/target drivers/infiniband/ulp/{isert,srpt} drivers/usb/gadget/legacy/tcm_*.[ch] drivers/{vhost,xen} include/{target,trace/events/target.h} drivers/target/loopback/tcm_loop.c:29:#include <scsi/scsi.h> drivers/target/loopback/tcm_loop.c:31:#include <scsi/scsi_host.h> drivers/target/loopback/tcm_loop.c:32:#include <scsi/scsi_device.h> drivers/target/loopback/tcm_loop.c:33:#include <scsi/scsi_cmnd.h> drivers/target/target_core_pscsi.c:39:#include <scsi/scsi_device.h> drivers/target/target_core_pscsi.c:40:#include <scsi/scsi_host.h> drivers/xen/xen-scsiback.c:52:#include <scsi/scsi_host.h> /* SG_ALL */ Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-06-02 08:03:25 -07:00
Doug Ledford	b806ef3bbe	Merge branch 'for-4.2-misc' into k.o/for-4.2	2015-06-02 09:33:22 -04:00
Ira Weiny	73cdaaeed1	IB/core cleanup: Add const to args - agent_send_response In order to support constant callers of agent_send_response we add const specifiers to its pointer arguments. Adjust the call tree accordingly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:33:13 -04:00
Ira Weiny	a97e2d86a9	IB/core cleanup: Add const on args - device->process_mad The process_mad device function declares some parameters as "in". Make those parameters const and adjust the call tree under process_mad in the various drivers accordingly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:33:13 -04:00
Roland Dreier	1156256811	IB/mlx4: Fix error paths in mlx4_ib_create_flow() The unwinding cleanup code at err_create_flow starts at the current index i. That means we shouldn't increment i until we're really sure we won't have to destroy the current flow; otherwise we might increment the index, fail inside an is_bonded block, and end up accessing off the end of the reg_id[] array. This was detected by Coverity (CID 1271229). Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:22:31 -04:00
Roland Dreier	18eaf1f195	RDMA/ocrdma: Fix memory leak in _ocrdma_alloc_pd() If ocrdma_get_pd_num() fails, then we need to free the pd struct we allocated. This was detected by Coverity (CID 1271245). Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:22:31 -04:00
Bart Van Assche	5237496781	IB/ipoib: Fix RCU annotations in ipoib_neigh_hash_init() Avoid that sparse complains about ipoib_neigh_hash_init(). This patch does not change any functionality. See also patch "IPoIB: Fix memory leak in the neigh table deletion flow" (commit ID `66172c0993`). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:22:31 -04:00
Faisal Latif	854ace98e7	RDMA/nes: Enable the use of the tos field in the nes driver RDMA/nes: Enable the use of the tos field in the nes driver Signed-off-by: Faisal Latif <Faisal.Latif@intel.com> Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:22:31 -04:00
Steve Wise	68cdba068d	RDMA/iw_cm: Export tos field to iwarp providers rdma-cma/iw_cm: Export tos field to iwarp providers Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-06-02 09:22:30 -04:00
David S. Miller	dda922c831	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/phy/amd-xgbe-phy.c drivers/net/wireless/iwlwifi/Kconfig include/net/mac80211.h iwlwifi/Kconfig and mac80211.h were both trivial overlapping changes. The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and the bug fix that happened on the 'net' side is already integrated into the rest of the amd-xgbe driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-01 22:51:30 -07:00
Linus Torvalds	dae8f283bf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "These are mostly minor fixes, with the exception of the following that address fall-out from recent v4.1-rc1 changes: - regression fix related to the big fabric API registration changes and configfs_depend_item() usage, that required cherry-picking one of HCH's patches from for-next to address the issue for v4.1 code. - remaining TCM-USER -v2 related changes to enforce full CDB passthrough from Andy + Ilias. Also included is a target_core_pscsi driver fix from Andy that addresses a long standing issue with a Scsi_Host reference being leaked on PSCSI device shutdown" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: iser-target: Fix error path in isert_create_pi_ctx() target: Use a PASSTHROUGH flag instead of transport_types target: Move passthrough CDB parsing into a common function target/user: Only support full command pass-through target/user: Update example code for new ABI requirements target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem target: Drop signal_pending checks after interruptible lock acquire target: Add missing parentheses target: Fix bidi command handling target/user: Disallow full passthrough (pass_level=0) ISCSI: fix minor memory leak	2015-05-31 11:31:42 -07:00
Matan Barak	c66fa19c40	net/mlx4: Add EQ pool Previously, mlx4_en allocated EQs and used them exclusively. This affected RoCE performance, as event-sensitive applications were limited to using only the legacy EQs. Change that by introducing an EQ pool. This pool is managed by mlx4_core. EQs are assigned to ports (when there is a limited number of EQs, multiple ports could be assigned to the same EQs). An exception to this rule is the ASYNC EQ which handles various events. Legacy EQs are completely removed as all EQs could be shared. When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for an EQ serving a specific port. The core driver calculates which EQ should be assigned to that request. Because IRQs are shared between IB and Ethernet modules, their names only include the PCI device BDF address. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-30 23:35:34 -07:00
Matan Barak	48564135cb	net/mlx4_core: Demote simple multicast and broadcast flow steering rules In SRIOV, when simple (i.e., Ethernet L2 only) flow steering rules are created, always create them at MLX4_DOMAIN_NIC priority (instead of the real priority the function created them at). This is done in order to let multiple functions add broadcast/multicast rules without affecting other functions, which is necessary for DPDK in SRIOV. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-30 23:35:34 -07:00
Christoph Hellwig	7ad34a9367	target: target_core_configfs.h is not needed in fabric drivers Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:42:39 -07:00
Bart Van Assche	2fe6e721b5	ib_srpt: Remove set-but-not-used variables Detected these variables by building with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:42:32 -07:00
Bart Van Assche	649ee05499	target: Move task tag into struct se_cmd + support 64-bit tags Simplify target core and target drivers by storing the task tag a.k.a. command identifier inside struct se_cmd. For several transports (e.g. SRP) tags are 64 bits wide. Hence add support for 64-bit tags. (Fix core_tmr_abort_task conversion spec warnings - nab) (Fix up usb-gadget to use 16-bit tags - HCH + bart) Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: <qla2xxx-upstream@qlogic.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Juergen Gross <jgross@suse.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:42:31 -07:00
Christoph Hellwig	2650d71e24	target: move transport ID handling to the core Now that struct se_portal_group contains a protocol identifier field we can take all the code to format and parse protocol identifiers in CDBs into common code instead of leaving this to low-level drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:42:30 -07:00
Christoph Hellwig	2aeeafae6b	target: remove the get_fabric_proto_ident method Now that we store the protocol identifier in the tpg structure we don't need this method. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:42:30 -07:00
Christoph Hellwig	e4aae5af81	target: change core_tpg_register prototype Remove the unneeded fabric_ptr argument, and change the type argument to pass in a SPC protocol identifier. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:42:27 -07:00
Christoph Hellwig	144bc4c2a4	target: move node ACL allocation to core code Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:42:23 -07:00
Christoph Hellwig	c7d6a80392	target: refactor init/drop_nodeacl methods By always allocating and adding, respectively removing and freeing the se_node_acl structure in core code we can remove tons of repeated code in the init_nodeacl and drop_nodeacl routines. Additionally this now respects the get_default_queue_depth method in this code path as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:41:51 -07:00
Christoph Hellwig	e1750d20e6	target: make the tpg_get_default_depth method optional All fabric drivers except for iSCSI always return 1, so implement that as default behavior. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:41:50 -07:00
Bart Van Assche	afc16604c0	target: Remove first argument of target_{get,put}_sess_cmd() The first argument of these two functions is always identical to se_cmd->se_sess. Hence remove the first argument. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Andy Grover <agrover@redhat.com> Cc: <qla2xxx-upstream@qlogic.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 22:41:47 -07:00
Roland Dreier	b2feda4feb	iser-target: Fix error path in isert_create_pi_ctx() We don't assign pi_ctx to desc->pi_ctx until we're certain to succeed in the function. That means the cleanup path should use the local pi_ctx variable, not desc->pi_ctx. This was detected by Coverity (CID 1260062). Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 20:01:04 -07:00
Amir Vadai	f62b8bb8f2	net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality This is the Ethernet part of the driver for the Mellanox ConnectX(R)-4 Single/Dual-Port Adapter supporting 100Gb/s with VPI. The driver extends the existing mlx5 driver with Ethernet functionality. This patch contains the driver entry points but does not include transmit and receive (see the previous patch in the series) routines. It also adds the option MLX5_CORE_EN to Kconfig to enable/disable the Ethernet functionality. Currently, Kconfig is programmed to make Ethernet and Infiniband functionality mutually exclusive. Also changed MLX5_INFINIBAND to be dependent on MLX5_CORE instead of selecting it, since MLX5_CORE could be selected without MLX5_INFINIBAND being selected. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-30 18:24:51 -07:00
Saeed Mahameed	938fe83c8d	net/mlx5_core: New device capabilities handling - Query all supported types of dev caps on driver load. - Store the Cap data outbox per cap type into driver private data. - Introduce new Macros to access/dump stored caps (using the auto generated data types). - Obsolete SW representation of dev caps (no need for SW copy for each cap). - Modify IB driver to use new macros for checking caps. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-30 18:23:22 -07:00
Amir Vadai	64ffaa2159	net/mlx5_core,mlx5_ib: Do not use vmap() on coherent memory As David Daney pointed out in the mlx4_core driver [1], mlx5_core is also misusing the DMA-API. This patch removes the code that calls vmap() on memory allocated by dma_alloc_coherent(). After this patch, users of this driver might fail to allocate resources on memory-fragmented systems. This will be fixed later on. [1] - https://patchwork.ozlabs.org/patch/458531/ CC: David Daney <david.daney@cavium.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-30 18:22:37 -07:00
Luis R. Rodriguez	9c27847dda	kernel/params: constify struct kernel_param_ops uses Most code already uses consts for the struct kernel_param_ops, sweep the kernel for the last offending stragglers. Other than include/linux/moduleparam.h and kernel/params.c all other changes were generated with the following Coccinelle SmPL patch. Merge conflicts between trees can be handled with Coccinelle. In the future git could get Coccinelle merge support to deal with patch --> fail --> grammar --> Coccinelle --> new patch conflicts automatically for us on patches where the grammar is available and the patch is of high confidence. Consider this a feature request. Test compiled on x86_64 against: * allnoconfig * allmodconfig * allyesconfig @ const_found @ identifier ops; @@ const struct kernel_param_ops ops = { }; @ const_not_found depends on !const_found @ identifier ops; @@ -struct kernel_param_ops ops = { +const struct kernel_param_ops ops = { }; Generated-by: Coccinelle SmPL Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Junio C Hamano <gitster@pobox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: cocci@systeme.lip6.fr Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-05-28 11:32:10 +09:30
Or Gerlitz	74d4943fbb	net/mlx4_core: Modify port values when generating EQEs for VFs As part of enabling single ported VFs over IB ports we need to handle some of the flows for generating EQ events for VFs which don't come into play under Eth ports. This mainly includes port management events derived from changes of the physical port (lid change, client re-register, down/up, etc), VF pkey table changes and VF guid changes initiated by the IB driver. (1) make sure that events are generated only for VFs sitting on the relevant physical port (under the ALL_SLAVES flow). (2) before generating the event, convert from physical (one or two) to VF port (always equals one). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-24 23:05:09 -04:00
Or Gerlitz	430910b1b9	IB/mlx4: Convert slave port before building address-handle When multiplexing a MAD sent from a VF, we should convert the port used by the guest to send the packet to the actual physical port which will be used to transmit the packet, before building the relevant address-handle (AH). This is needed under VPI for single ported VFs, since the code that builds the AH (mlx4_ib_query_ah()) makes decisions based on the input port. If we use the port number provided by the guest, it might have a different protocol than the port this packet has to go out from, and hence the result could be wrong. So far, the conversion was done after the AH was built and it worked for single ported Eth VFs which were not enabled under VPI. When adding support for single ported IB VFs and VPI, we hit that. Fixes: `449fc48866` ('net/mlx4: Adapt code for N-Port VF') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-24 23:05:09 -04:00
Matthew Finlay	c07678bb01	IB/cma: Fix broken AF_IB UD support Support for using UD and AF_IB is currently broken. The IB_CM_SIDR_REQ_RECEIVED message is not handled properly in cma_save_net_info() and we end up falling into code that will try and process the request as ipv4/ipv6, which will end up failing. The resolution is to add a check for the SIDR_REQ and call cma_save_ib_info() with a NULL path record. Change cma_save_ib_info() to copy the src sib info from the listen_id when the path record is NULL. Reported-by: Hari Shankar <Hari.Shankar@netapp.com> Signed-off-by: Matt Finlay <matt@mellanox.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 16:15:56 -04:00
Doug Ledford	175e8efe69	Merge branches 'bart-srp', 'generic-errors', 'ira-cleanups' and 'mwang-v8' into k.o/for-4.2	2015-05-20 16:12:40 -04:00
Ira Weiny	5d9fb04406	IB/core: Change rdma_protocol_iboe to roce After discussion upstream, it was agreed to transition the usage of iboe in the kernel to roce. This keeps our terminology consistent with what was finalized in the IBTA Annex 16 and IBTA Annex 17 publications. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 15:58:19 -04:00
Ted Kim	c29ed5a456	ib/cm: Change reject message type when destroying cm_id Problem reported by: Ted Kim <ted.h.kim@oracle.com>: We have a case where a Linux system and a non-Linux system are trying to interoperate. The Linux host is the active side and starts the connection establishment, but later decides to not go through with the connection setup and does rdma_destroy_id(). The rdma_destroy_id() eventually works its way down to cm_destroy_id() in core/cm.c, where a REJ is sent. The non-Linux system has some trouble recognizing the REJ because of: A. CM states which can't receive the REJ; B. some issues about REJ formatting (missing comm ID). ISSUE A: That part of the spec says a Consumer Reject REJ can be sent for a connection abort, but it goes further and says: can send a REJ message with a "Consumer Reject" Reason code if they are in a CM state (i.e. REP Rcvd, MRA(REP) Sent, REQ Rcvd, MRA Sent) that allows a REJ to be sent (lines 35-38). Of the states listed there in that sentence, it would seem to limit the active side to using the Consumer Reject (for the abort case) in just the REP-Rcvd and MRA-REP-Sent states. That is basically only after the active side sees a REP (or alternatively goes down the state transitions to timeout in which case a Timeout REJ is sent). As a fix, in cm_destroy_id() move the IB_CM_MRA_REQ_RCVD case to the same handling as REQ-SENT. Essentially, make a REJ sent after getting an MRA on the active side a timeout rather than a Consumer Reject, which is arguably more correct with the CM state diagrams previous to getting a REP. Signed-off-by: Ted Kim <ted.h.kim@oracle.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>	2015-05-20 12:41:38 -04:00
Ira Weiny	f9b22e355d	IB/core: Convert core to use bitfield for caps Remove the query_protocol callback. Use the new Core Capability bits for: rdma_protocol_*, rdma_cap_ib_mad, rdma_cap_ib_smi, rdma_cap_ib_cm, rdma_cap_iw_cm, rdma_cap_ib_sa, rdma_cap_ib_mcast, rdma_cap_af_ib, rdma_cap_eth_ah. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:38:43 -04:00
Ira Weiny	7738613e7c	IB/core: Add per port immutable struct to ib_device As of commit `5eb620c81c` "IB/core: Add helpers for uncached GID and P_Key searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in the ib_device. The per port core capability flags to be added later are also immutable data to be stored in the ib_device object. In preparation for this create a structure for per port immutable data and place the pkey and gid table lengths within this structure. "get_port_immutable" is added as a mandatory device function to allow the drivers to fill in this data. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:38:13 -04:00
Ira Weiny	26c454288a	IB/user_mad: Fix buggy usage of port index The addition of the rdma_cap_ib_mad is technically broken in ib_umad_remove_one because the loop "i" value is not a port value. This bug resulted in the ib_umad failing to properly remove its resources when the core capability functions were converted to bit fields. NOTE: e17371d73908 did not result in broken behavior on its own. It was only an issue when the implementation of rdma_cap_ib_mad was changed. Pass the port value to rdma_cap_ib_mad. Fixes: e17371d73908 ("IB/Verbs: Use management helper rdma_cap_ib_mad()") Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:37:34 -04:00
Ira Weiny	ab8be619b8	IB/user_mad: Use new start/end port functions Use the new common rdma_[start|end]_port functions instead of using local variables and figuring it out on the fly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:36:17 -04:00
Ira Weiny	f766c58fa3	IB/mad: Add const qualifiers to query only functions The following functions only need read access to the data passed to them: ib_mad_kernel_rmpp_agent, is_rmpp_data_mad, rcv_has_same_gid, ib_find_send_mad. Clarify with const specifiers. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:34:45 -04:00
Ira Weiny	8bf4b30c24	IB/mad: Clean up rcv_has_same_class rcv_has_same_class only needs access to the MAD header; specify WR and Receive WC as const. Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:34:45 -04:00
Ira Weiny	9690930854	IB/mad: Change ib_response_mad signature arguments ib_response_mad only needs read access to the MAD header, not write access to the entire mad struct, so replace struct ib_mad with const struct ib_mad_hdr Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:34:10 -04:00
Ira Weiny	77f60833b8	IB/mad: Change validate_mad signature arguments validate_mad only needs read access to the MAD header, not write access to the entire mad struct, so replace struct ib_mad with const struct ib_mad_hdr Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-20 12:32:58 -04:00
Sagi Grimberg	ea8a1616a7	iser-target: Align to generic logging helpers Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:44:22 -04:00
Sagi Grimberg	871e00afa4	IB/iser: Align to generic logging helpers Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:44:22 -04:00
Sagi Grimberg	57363d98cf	IB/srp: Align to generic logging helpers Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:44:22 -04:00
Sagi Grimberg	2b1b5b6012	IB/core, cma: Nice log-friendly string helpers Some of us keep revisiting the code to decode enumerations that appear in our logs. Let's borrow the nice logging helpers that exist in xprtrdma and rds for CMA events, IB events and WC statuses. Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:43:52 -04:00
Bart Van Assche	985aa49556	IB/srp: Add 64-bit LUN support The SCSI standard defines 64-bit values for LUNs. Large arrays employing large or hierarchical LUN numbers become more and more common. So update the SRP initiator to use 64-bit LUN numbers. See also Hannes Reinecke, commit `9cb78c16f5` ("scsi: use 64-bit LUNs"), June 2014. The largest LUN number that has been tested is 0xd2003fff00000000. Checked the following structure sizes with gdb: * sizeof(struct srp_cmd) = 48 * sizeof(struct srp_tsk_mgmt) = 48 * sizeof(struct srp_aer_req) = 36 The ibmvscsi changes have been compile tested only (on a PPC system). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:56 -04:00
Bart Van Assche	bbac5ccff4	IB/srp: Remove !ch->target tests from the reconnect code Remove the !ch->target tests from the reconnect code. These tests are not needed: upon entry of srp_rport_reconnect() it is guaranteed that all ch->target pointers are non-NULL. None of the functions srp_new_cm_id(), srp_finish_req(), srp_create_ch_ib() nor srp_connect_ch() modifies this pointer. srp_free_ch_ib() is never called concurrently with srp_rport_reconnect(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:56 -04:00
Bart Van Assche	47513cf4f4	IB/srp: Remove a superfluous check from srp_free_req_data() The function srp_free_req_data() does not use ch->target. Hence remove the ch->target != NULL check. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:56 -04:00
Bart Van Assche	33ab3e5ba2	IB/srp: Rearrange module description Move the module version and release date into separate fields. This makes the modinfo output easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:56 -04:00
Bart Van Assche	45c37cad40	IB/srp: Remove superfluous casts A long time ago the data type int64_t was declared as long long on x86 systems and as long on PPC systems. Today that data type is declared as long long on all Linux architectures. This means that the casts from uint64_t into unsigned long long are superfluous. Remove these superfluous casts. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:55 -04:00
Bart Van Assche	a44074f14b	IB/srp: Fix reconnection failure handling Although it is possible to let SRP I/O continue if a reconnect results in a reduction of the number of channels, the current code does not handle this scenario correctly. Instead of making the reconnect code more complex, consider this as a reconnection failure. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: <stable@vger.kernel.org> #v3.19 Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:55 -04:00
Bart Van Assche	c014c8cd31	IB/srp: Fix connection state tracking Reception of a DREQ message only causes the state of a single channel to change. Hence move the 'connected' member variable from the target to the channel data structure. This patch prevents srp_destroy_qp() from reporting the following false positive warning: WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xa6/0x120 [ib_srp]() Call Trace: [<ffffffff8106e10f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8106e16a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa0440226>] srp_destroy_qp+0xa6/0x120 [ib_srp] [<ffffffffa0440322>] srp_free_ch_ib+0x82/0x1e0 [ib_srp] [<ffffffffa044408b>] srp_create_target+0x7ab/0x998 [ib_srp] [<ffffffff81346f60>] dev_attr_store+0x20/0x30 [<ffffffff811dd90f>] sysfs_write_file+0xef/0x170 [<ffffffff8116d248>] vfs_write+0xc8/0x190 [<ffffffff8116d411>] sys_write+0x51/0x90 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: <stable@vger.kernel.org> #v3.19 Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:55 -04:00
Bart Van Assche	8de9fe3a1d	IB/srp: Fix a connection setup race Prevent target->qp_in_error from being reset when a DREQ is received while RDMA channels are being established. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: <stable@vger.kernel.org> #v3.19 Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:55 -04:00
Bart Van Assche	fb49c8bbaa	IB/srp: Remove an extraneous scsi_host_put() from an error path Fix a scsi_host_get() / scsi_host_put() imbalance in the error path of srp_create_target(). See also patch "IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning" (commit ID `34aa654ecb`). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: <stable@vger.kernel.org> #v3.19 Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:55 -04:00
Ira Weiny	b78d28a2af	IB/mad: Clean up comments in smi.c Return values of 0 do not make sense for functions which return enum smi_action Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:24 -04:00
Ira Weiny	c597eee506	IB/mad: Rename is_data_mad to is_rmpp_data_mad is_rmpp_data_mad is more descriptive for this function. Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:24 -04:00
Ira Weiny	0cf18d7723	IB/core: Create common start/end port functions Previously start_port and end_port were defined in 2 places, cache.c and device.c and this prevented their use in other modules. Make these common functions, change the name to reflect the rdma name space, and update existing users. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:06 -04:00
Michael Wang	227128fc68	IB/Verbs: Use management helper rdma_cap_eth_ah() Introduce helper rdma_cap_eth_ah() to help us check whether the port of an IB device supports Ethernet Address Handler. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:06 -04:00
Michael Wang	30a74ef41d	IB/Verbs: Use management helper rdma_cap_af_ib() Introduce helper rdma_cap_af_ib() to help us check whether the port of an IB device supports Native Infiniband Address. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	a31ad3b0e3	IB/Verbs: Use management helper rdma_cap_ib_mcast() Introduce helper rdma_cap_ib_mcast() to help us check whether the port of an IB device supports Infiniband Multicast. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	fe53ba2f0c	IB/Verbs: Use management helper rdma_cap_ib_sa() Introduce helper rdma_cap_ib_sa() to help us check whether the port of an IB device supports Infiniband Subnet Administration. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	042153306d	IB/Verbs: Use management helper rdma_cap_iw_cm() Introduce helper rdma_cap_iw_cm() to help us check whether the port of an IB device supports IWARP Communication Manager. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	72219cea8e	IB/Verbs: Use management helper rdma_cap_ib_cm() Introduce helper rdma_cap_ib_cm() to help us check whether the port of an IB device supports Infiniband Communication Manager. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	29541e3add	IB/Verbs: Use management helper rdma_cap_ib_smi() Introduce helper rdma_cap_ib_smi() to help us check whether the port of an IB device supports Infiniband Subnet Management Interface. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	c757dea816	IB/Verbs: Use management helper rdma_cap_ib_mad() Introduce helper rdma_cap_ib_mad() to help us check whether the port of an IB device supports Infiniband Management Datagrams. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	fef60902ef	IB/Verbs: Reform rest part in IB-core cma Use raw management helpers to reform rest part in IB-core cma. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	7c11147da2	IB/Verbs: Reform cma_acquire_dev() Reform cma_acquire_dev() with management helpers, introduce cma_validate_port() to make the code more clean. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:05 -04:00
Michael Wang	5c9a52828a	IB/Verbs: Reform mcast related part in IB-core cma Use raw management helpers to reform mcast related part in IB-core cma. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	c72f21893e	IB/Verbs: Reform route related part in IB-core cma Use raw management helpers to reform route related part in IB-core cma. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	21655afc62	IB/Verbs: Reform cm related part in IB-core cma/ucm Use raw management helpers to reform cm related part in IB-core cma/ucm. A few checks focus on the device cm type rather than the port capability; passing port 1 directly works for now, but cannot support devices that mix cm types in the future. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	55045b2577	IB/Verbs: Reform IB-core verbs Use raw management helpers to reform IB-core verbs Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	8e37ab68fe	IB/Verbs: Reform IB-ulp ipoib Use raw management helpers to reform IB-ulp ipoib. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	613466cb7f	IB/Verbs: Reform IB-core multicast Use raw management helpers to reform IB-core multicast. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	08e3681ab8	IB/Verbs: Reform IB-core sa_query Use raw management helpers to reform IB-core sa_query. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	091e6a4c42	IB/Verbs: Reform IB-core cm Use raw management helpers to reform IB-core cm. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	827f2a8b0a	IB/Verbs: Reform IB-core mad/agent/user_mad Use raw management helpers to reform IB-core mad/agent/user_mad. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:04 -04:00
Michael Wang	6b90a6d66b	IB/Verbs: Implement new callback query_protocol() Add new callback query_protocol() and implement it for each HW. Mapping list (driver: node-type, link-layer, transport, protocol): nes: RNIC, ETH, IWARP, IWARP; amso1100: RNIC, ETH, IWARP, IWARP; cxgb3: RNIC, ETH, IWARP, IWARP; cxgb4: RNIC, ETH, IWARP, IWARP; usnic: USNIC_UDP, ETH, USNIC_UDP, USNIC_UDP; ocrdma: IB_CA, ETH, IB, IBOE; mlx4: IB_CA, IB/ETH, IB, IB/IBOE; mlx5: IB_CA, IB, IB, IB; ehca: IB_CA, IB, IB, IB; ipath: IB_CA, IB, IB, IB; mthca: IB_CA, IB, IB, IB; qib: IB_CA, IB, IB, IB. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 13:35:03 -04:00
Selvin Xavier	235dfcd47e	RDMA/ocrdma: Update ocrdma version number Updating the driver version to 10.6.0.0 Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Naga Irrinki	72d8a013d7	RDMA/ocrdma: Fail connection for MTU lesser than 512 HW currently restricts the IB MTU to the range 512 to 4096. Fail the connection for MTUs smaller than 512. Signed-off-by: Naga Irrinki <naga.irrinki@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Mitesh Ahuja	d27b2f15eb	RDMA/ocrdma: Fix dmac resolution for link local address rdma_addr_find_dmac_by_grh fails to resolve dmac for link local address. Use rdma_get_ll_mac to resolve the link local address. Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Mitesh Ahuja <mitesh.ahuja@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Mitesh Ahuja	59582d86e2	RDMA/ocrdma: Prevent allocation of DPP PDs if FW doesnt support it If DPP PDs are not supported by the FW, allocate only normal PDs. Signed-off-by: Mitesh Ahuja <mitesh.ahuja@avagotech.com> Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Mitesh Ahuja	038ab8b743	RDMA/ocrdma: Fix the request length for RDMA_QUERY_QP mailbox command to FW. Fix ocrdma_query_qp to pass correct mailbox request length to FW. Signed-off-by: Mitesh Ahuja <mitesh.ahuja@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Devesh Sharma	6f5deab0be	RDMA/ocrdma: Use VID 0 if PFC is enabled and vlan is not configured If the adapter ports are in PFC mode and VLAN is not configured, use vlan tag 0 for RoCE traffic. Also, log an advisory message in system logs. Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Devesh Sharma	fe48822bc6	RDMA/ocrdma: Fix QP state transition in destroy_qp Don't move QP to error state, if QP is in reset state during QP destroy operation. Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Devesh Sharma	5e6f9237f8	RDMA/ocrdma: Report EQ full fatal error Detect when Event Queue (EQ) becomes full and print a warning message. Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Selvin Xavier	314fdf4473	RDMA/ocrdma: Fix EQ destroy failure during driver unload Change the destroy sequence of the mailbox queue and event queues. FW expects the mailbox queue to be destroyed before destroying the EQs. Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-18 10:25:24 -04:00
Joe Perches	f4f01b542c	infiniband: Remove duplicated KERN_<LEVEL> from pr_<level> uses These KERN_<LEVEL> uses are unnecessary with pr_<level> and cause bad logging output so remove them. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-12 15:52:37 -04:00
Mike Marciniszyn	ec40f925e0	IB/qib: fix test of unsigned variable Commit `d4988623cc` ("IB/qib: use arch_phys_wc_add()") adjusted MTRR initialization to use the new interface. Unfortunately, the new interface returns a signed value and the patch tested the unsigned wc_cookie. Fix the issue by changing the type of wc_cookie to int. In the success case ret is left at zero to avoid a warning from the caller. On failure, wc_cookie is used as ret. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-12 13:55:41 -04:00
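The class of bug described in this commit can be reproduced in a few lines of user-space C. This is only a sketch: `fake_wc_add()`, `setup_buggy()` and `setup_fixed()` are hypothetical names, and `fake_wc_add()` merely stands in for an interface that returns a non-negative cookie on success or a negative errno on failure. It is not the qib driver code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an interface that returns a cookie (>= 0)
 * on success or a negative errno on failure. */
static int fake_wc_add(int should_fail)
{
	return should_fail ? -EINVAL : 3;
}

/* Buggy pattern: the result is stored in an unsigned variable, so the
 * "< 0" test can never be true and a failure goes unnoticed. */
static int setup_buggy(int should_fail)
{
	unsigned int cookie = fake_wc_add(should_fail);

	if (cookie < 0)		/* always false for an unsigned type */
		return -1;
	return 0;
}

/* Fixed pattern: keep the cookie signed, return it on failure and
 * return zero on success. */
static int setup_fixed(int should_fail)
{
	int cookie = fake_wc_add(should_fail);

	if (cookie < 0)
		return cookie;
	return 0;
}
```

With these definitions, `setup_buggy(1)` reports success even though the underlying call failed, while `setup_fixed(1)` propagates the error code.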
Tatyana Nikolova	ec04847c0c	RDMA/core: Fix for parsing netlink string attribute The string iwpm_ulib_name is recorded in a nlmsg as a netlink attribute. Without this fix parsing of the nlmsg by the userspace port mapper service fails because of unknown attribute length, causing the port mapper service not to register the client, which has sent the nlmsg. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Cc: <stable@vger.kernel.org> #v3.16 Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-12 13:03:04 -04:00
Steve Wise	940fd304d2	iw_cxgb4: use wildcard mapping for getting remote addr info For listening endpoints bound to the wildcard address, we need to pass the wildcard address mapping to iwpm_get_remote_info() instead of the mapped address of the new child connection. Without this fix, and with iwarp port mapping enabled, each iw_cxgb4 connection that is spawned from a listening endpoint bound to the wildcard address, will generate an annoying dmesg entry about failing to find the remote address mapping info, and the connection state displayed in debugfs under /sys/kernel/debug/iw_cxgb4/<pci-slot-no>/eps will not have the peer's address/port mapping info. The connection still works though. Fixes: `5b6b8fe` ("RDMA/cxgb4: Report the actual address of the remote connecting peer") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-11 17:16:49 -04:00
Nicholas Mc Guire	94634e9861	IB/ehca: use correct destination for memcpy Using an element of a struct as the address for the memcpy of the whole struct may introduce a buffer overflow and does not help readability either. Simply pass the struct itself as the first argument to memcpy. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-11 17:14:37 -04:00
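A minimal illustration of the memcpy pattern this commit refers to, assuming a made-up `struct port_info` (not the ehca structure). The fragile form is shown only in a comment; the function uses the form the commit message recommends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure, used for illustration only. */
struct port_info {
	unsigned char guid[8];
	unsigned int lid;
	unsigned int mtu;
};

/*
 * Fragile pattern (deliberately not compiled):
 *
 *     memcpy(&dst->guid, src, sizeof(*dst));
 *
 * The destination names one member but the length covers the whole
 * struct. It only happens to work while guid is the first member, and
 * static checkers flag it as a write past the end of guid.
 */
static void copy_port_info(struct port_info *dst, const struct port_info *src)
{
	/* Destination and length both describe the whole struct. */
	memcpy(dst, src, sizeof(*dst));
}
```

Plain struct assignment (`*dst = *src;`) would express the same copy; the memcpy form is kept here because it mirrors the call the commit talks about.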
Bart Van Assche	8f71c1a27b	IPoIB/CM: Fix indentation level See also patch "IPoIB/cm: Add connected mode support for devices without SRQs" (commit ID `68e995a295`). Detected by smatch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 13:21:27 -04:00
Hariprasad S	179d03bbfd	iw_cxgb4: Remove negative advice dmesg warnings Remove these log messages in favor of per-endpoint counters as well as device-global counters that can be inspected via debugfs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 13:21:27 -04:00
David Ahern	0d0f738f6a	IB/core: Fix unaligned accesses Addresses the following kernel logs seen during boot of sparc systems: Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm] Signed-off-by: David Ahern <david.ahern@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 13:21:27 -04:00
Honggang LI	471e705832	IB/core: change rdma_gid2ip into void function as it always return zero Signed-off-by: Honggang Li <honli@redhat.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 13:21:27 -04:00
Luis R. Rodriguez	d4988623cc	IB/qib: use arch_phys_wc_add() This driver already makes use of ioremap_wc() on PIO buffers, so convert it to use arch_phys_wc_add(). The qib driver uses a mmap() special case for when PAT is not used. This behaviour used to be determined by a module parameter, but since we have been asked to just remove that module parameter, this checks for the WC cookie instead: if it is not set we can assume PAT was used. If it is set we do what we used to do for the mmap when MTRR was enabled. The removal of the module parameter is OK given that Andy notes that even if users of the module parameter are still around, it will not prevent loading of the module on recent kernels. Cc: Doug Ledford <dledford@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Roland Dreier <roland@purestorage.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Michael S. 
Tsirkin <mst@redhat.com> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: infinipath@intel.com Cc: linux-rdma@vger.kernel.org Cc: linux-fbdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xensource.com Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:02 -04:00
Luis R. Rodriguez	87a26e976c	IB/qib: add acounting for MTRR There is no good reason not to, we eventually delete it as well. Cc: Toshi Kani <toshi.kani@hp.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Airlie <airlied@redhat.com> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Mike Marciniszyn <infinipath@intel.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: linux-fbdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:02 -04:00
Guy Shapiro	325ad0617a	IB/core: dma unmap optimizations While unmapping an ODP writable page, the dirty bit of the page is set. In order to do so, the head of the compound page is found. Currently, the compound head is found even on non-writable pages, where it is never used, leading to unnecessary cpu barrier that impacts performance. This patch moves the search for the compound head to be done only when needed. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Acked-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:02 -04:00
Guy Shapiro	c1d383b578	IB/core: dma map/unmap locking optimizations Currently, while mapping or unmapping pages for ODP, the umem mutex is locked and unlocked once for each page. Each such lock/unlock operation takes a few tens to hundreds of nanoseconds. This has a significant impact when mapping or unmapping a few MBs of memory. To avoid this, the mutex should be locked only once per operation, and not per page. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Acked-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:02 -04:00
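The difference between locking per page and locking per operation can be sketched without any threading at all. In the sketch below, `struct counted_lock` is a hypothetical stand-in for a mutex that only counts acquisitions, and the `map_*` helpers are invented names; none of this is the ODP code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock that only counts acquisitions; it stands in for a
 * real mutex so the difference is visible without threads. */
struct counted_lock {
	int held;
	unsigned long acquisitions;
};

static void counted_lock_acquire(struct counted_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void counted_lock_release(struct counted_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Per-page work that must run with the lock held. */
static void map_one_page(struct counted_lock *l, int *page)
{
	assert(l->held);
	*page = 1;
}

/* Old shape: one lock/unlock round trip for every page. */
static void map_pages_lock_per_page(struct counted_lock *l, int *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		counted_lock_acquire(l);
		map_one_page(l, &pages[i]);
		counted_lock_release(l);
	}
}

/* New shape: a single lock/unlock round trip for the whole operation. */
static void map_pages_lock_once(struct counted_lock *l, int *pages, size_t n)
{
	counted_lock_acquire(l);
	for (size_t i = 0; i < n; i++)
		map_one_page(l, &pages[i]);
	counted_lock_release(l);
}
```

For n pages the first variant pays the lock cost n times and the second pays it once, at the price of holding the lock for the full duration of the loop.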
Steve Wise	5b6b8fe640	RDMA/cxgb4: Report the actual address of the remote connecting peer Get the actual (non-mapped) ip/tcp address of the connecting peer from the port mapper. Also set up the passive side endpoint to correctly display the actual and mapped addresses for the new connection. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
Tatyana Nikolova	230da36ae9	RDMA/nes: Report the actual address of the remote connecting peer Get the actual (non-mapped) ip/tcp address of the connecting peer from the port mapper and report the address info to the user space application at the time of connection establishment Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
Tatyana Nikolova	6eec177461	RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients Add functionality to enable the port mapper on the passive side to provide to its clients the actual (non-mapped) ip/tcp address information of the connecting peer: 1) Adding remote_info_cb() to process the address info of the connecting peer. The address info is provided by the user space port mapper service when the connection is initiated by the peer. 2) Adding a hash list to store the remote address info. 3) Adding functionality to add/remove the remote address info. After the info has been provided to the port mapper client, it is removed from the hash list. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
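The store-then-hand-over behaviour described here (an entry is added when the info arrives and removed once a client has consumed it) might look roughly like the following user-space sketch. Everything in it is an assumption made for illustration: the key (`mapped_port`), the field names, the bucket count and the function names are hypothetical and are not taken from the port mapper code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 16

/* Hypothetical record: keyed by the mapped port, carrying the peer's
 * actual (non-mapped) address and port. */
struct remote_info {
	uint16_t mapped_port;
	uint32_t actual_addr;
	uint16_t actual_port;
	struct remote_info *next;
};

static struct remote_info *buckets[NBUCKETS];

/* Store the info when it arrives. Returns 0 on success, -1 on failure. */
static int remote_info_add(uint16_t mapped_port, uint32_t addr, uint16_t port)
{
	struct remote_info *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->mapped_port = mapped_port;
	e->actual_addr = addr;
	e->actual_port = port;
	e->next = buckets[mapped_port % NBUCKETS];
	buckets[mapped_port % NBUCKETS] = e;
	return 0;
}

/* Look up and remove in one step: once the info has been handed to the
 * caller it no longer lives in the table. Returns 0 on hit, -1 on miss. */
static int remote_info_take(uint16_t mapped_port, struct remote_info *out)
{
	struct remote_info **pp = &buckets[mapped_port % NBUCKETS];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->mapped_port == mapped_port) {
			struct remote_info *e = *pp;

			*pp = e->next;		/* unlink from the chain */
			*out = *e;
			out->next = NULL;
			free(e);
			return 0;
		}
	}
	return -1;
}
```

A second `remote_info_take()` for the same key misses, which matches the commit's statement that the info is removed after it has been provided.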
Hariprasad S	4a75a86c8d	iw_cxgb4: enforce qp/cq id requirements Currently the iw_cxgb4 implementation requires the qp and cq qid densities to match as well as the qp and cq id ranges. So fail a device open if the device configuration doesn't meet the requirements. The reason for these restrictions has to do with the fact that IQ qid X has a UGTS register in the same bar2 page as EQ qid X. Thus both qids need to be allocated to the same user process for security reasons. The logic that does this (the qpid allocator in iw_cxgb4/resource.c) handles this but requires the above restrictions. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
Hariprasad S	09ece8b9e9	iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs For T5, we must not use the kdb/kgts registers, in order to avoid db drops under extreme loads. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
Hariprasad S	6198dd8d7a	iw_cxgb4: 32b platform fixes - get_dma_mr() was using ~0UL which should be ~0ULL. This causes the DMA MR to get set up incorrectly in hardware. - wr_log_show() needed a 64b divide function div64_u64() instead of doing division directly. - fixed warnings about recasting a pointer to a u64 Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
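Why `~0UL` is the wrong constant for a 64-bit value on a 32-bit platform can be shown with fixed-width types. In this sketch `uint32_t` stands in for `unsigned long` as it would be on a 32-bit target; the function names are hypothetical and the code is unrelated to the actual get_dma_mr() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t stands in for "unsigned long" on a 32-bit platform. */
static uint64_t mask_from_32bit_long(void)
{
	uint32_t all_ones = ~(uint32_t)0;	/* what ~0UL is when long is 32 bits */

	return all_ones;	/* zero-extended: the upper 32 bits stay clear */
}

/* unsigned long long is at least 64 bits everywhere, so every bit of
 * the 64-bit result is set. */
static uint64_t mask_from_ull(void)
{
	return ~0ULL;
}
```

The first function yields 0x00000000FFFFFFFF and the second 0xFFFFFFFFFFFFFFFF, so any length or mask meant to cover a full 64-bit range is silently truncated when `~0UL` is used on a 32-bit build.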
Hariprasad S	0b7410471d	iw_cxgb4: Cleanup register defines/MACROS Cleanup macros and register defines for consistency Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
Jason Gunthorpe	285214409a	RDMA/CMA: Canonize IPv4 on IPV6 sockets properly When accepting a new IPv4 connect to an IPv6 socket, the CMA tries to canonize the address family to IPv4, but does not properly process the listening sockaddr to get the listening port, and does not properly set the address family of the canonized sockaddr. Fixes: `e51060f08a` ("IB: IP address based RDMA connection manager") Cc: <stable@vger.kernel.org> Reported-By: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 09:18:01 -04:00
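Canonizing an IPv4-mapped IPv6 address to plain IPv4 involves three things: setting the address family, carrying over the port, and extracting the embedded IPv4 address. The following is a user-space sketch using the POSIX socket headers; `canonize_v4mapped()` is a hypothetical helper and does not reproduce the CMA code or its handling of the listening sockaddr.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Convert an IPv4-mapped IPv6 address (::ffff:a.b.c.d) into a plain
 * IPv4 sockaddr. Returns 0 on success, -1 if the input is not mapped. */
static int canonize_v4mapped(const struct sockaddr_in6 *in6,
			     struct sockaddr_in *out)
{
	if (in6->sin6_family != AF_INET6 ||
	    !IN6_IS_ADDR_V4MAPPED(&in6->sin6_addr))
		return -1;

	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;		/* set the family explicitly */
	out->sin_port = in6->sin6_port;		/* keep the port, network order */
	/* The IPv4 address occupies the last 4 bytes of the IPv6 address. */
	memcpy(&out->sin_addr, &in6->sin6_addr.s6_addr[12], 4);
	return 0;
}
```

Forgetting either the family or the port leaves a sockaddr that looks valid but does not match the connection, which is the kind of defect the commit message describes.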
Linus Torvalds	9ec3a646fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only	2015-04-26 17:22:07 -07:00
Linus Torvalds	c6668726d2	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "Lots of activity in target land the last months. The highlights include: - Convert fabric drivers tree-wide to target_register_template() (hch + bart) - iser-target hardening fixes + v1.0 improvements (sagi) - Convert iscsi_thread_set usage to kthread.h + kill iscsi_target_tq.c (sagi + nab) - Add support for T10-PI WRITE_STRIP + READ_INSERT operation (mkp + sagi + nab) - DIF fixes for CONFIG_DEBUG_SG=y + UNMAP file emulation (akinobu + sagi + mkp) - Extended TCMU ABI v2 for future BIDI + DIF support (andy + ilias) - Fix COMPARE_AND_WRITE handling for NO_ALLLOC drivers (hch + nab) Thanks to everyone who contributed this round with new features, bug-reports, fixes, cleanups and improvements. Looking forward, it's currently shaping up to be a busy v4.2 as well" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (69 commits) target: Put TCMU under a new config option target: Version 2 of TCMU ABI target: fix tcm_mod_builder.py target/file: Fix UNMAP with DIF protection support target/file: Fix SG table for prot_buf initialization target/file: Fix BUG() when CONFIG_DEBUG_SG=y and DIF protection enabled target: Make core_tmr_abort_task() skip TMFs target/sbc: Update sbc_dif_generate pr_debug output target/sbc: Make internal DIF emulation honor ->prot_checks target/sbc: Return INVALID_CDB_FIELD if DIF + sess_prot_type disabled target: Ensure sess_prot_type is saved across session restart target/rd: Don't pass incomplete scatterlist entries to sbc_dif_verify_* target: Remove the unused flag SCF_ACK_KREF target: Fix two sparse warnings target: Fix COMPARE_AND_WRITE with SG_TO_MEM_NOALLOC handling target: simplify the target template registration API target: simplify target_xcopy_init_pt_lun target: remove the unused SCF_CMD_XCOPY_PASSTHROUGH flag target/rd: reduce code 
duplication in rd_execute_rw() tcm_loop: fixup tpgt string to integer conversion ...	2015-04-24 10:22:09 -07:00
Linus Torvalds	7c034dfd58	InfiniBand/RDMA updates for 4.1: - IPoIB fixes from Doug Ledford and Erez Shitrit - iSER updates from Sagi Grimberg - mlx4 GUID handling changes from Yishai Hadas - other misc fixes Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA updates from Roland Dreier: - IPoIB fixes from Doug Ledford and Erez Shitrit - iSER updates from Sagi Grimberg - mlx4 GUID handling changes from Yishai Hadas - other misc fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (51 commits) mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures IB/iser: Rewrite bounce buffer code path IB/iser: Bump version to 1.6 IB/iser: Remove code duplication for a single DMA entry IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr IB/iser: Modify struct iser_mem_reg members IB/iser: Make fastreg pool cache friendly IB/iser: Move PI context alloc/free to routines IB/iser: Move fastreg descriptor pool get/put to helper functions IB/iser: Merge build page-vec into register page-vec IB/iser: Get rid of struct 
iser_rdma_regd IB/iser: Remove redundant assignments in iser_reg_page_vec IB/iser: Move memory reg/dereg routines to iser_memory.c IB/iser: Don't pass ib_device to fall_to_bounce_buff routine IB/iser: Remove a redundant struct iser_data_buf IB/iser: Remove redundant cmd_data_len calculation IB/iser: Fix wrong calculation of protection buffer length IB/iser: Handle fastreg/local_inv completion errors IB/iser: Fix unload during ep_poll wrong dereference ib_srpt: convert printk's to pr_* functions ...	2015-04-22 11:50:05 -07:00
Linus Torvalds	388f997620	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix verifier memory corruption and other bugs in BPF layer, from Alexei Starovoitov. 2) Add a conservative fix for doing BPF properly in the BPF classifier of the packet scheduler on ingress. Also from Alexei. 3) The SKB scrubber should not clear out the packet MARK and security label, from Herbert Xu. 4) Fix oops on rmmod in stmmac driver, from Bryan O'Donoghue. 5) Pause handling is not correct in the stmmac driver because it doesn't take into consideration the RX and TX fifo sizes. From Vince Bridgers. 6) Failure path missing unlock in FOU driver, from Wang Cong. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) net: dsa: use DEVICE_ATTR_RW to declare temp1_max netns: remove BUG_ONs from net_generic() IB/ipoib: Fix ndo_get_iflink sfc: Fix memcpy() with const destination compiler warning. altera tse: Fix network-delays and -retransmissions after high throughput. net: remove unused 'dev' argument from netif_needs_gso() act_mirred: Fix bogus header when redirecting from VLAN inet_diag: fix access to tcp cc information tcp: tcp_get_info() should fetch socket fields once net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state() skbuff: Do not scrub skb mark within the same name space Revert "net: Reset secmark when scrubbing packet" bpf: fix two bugs in verification logic when accessing 'ctx' pointer bpf: fix bpf helpers to use skb->mac_header relative offsets stmmac: Configure Flow Control to work correctly based on rxfifo size stmmac: Enable unicast pause frame detect in GMAC Register 6 stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree stmmac: Add defines and documentation for enabling flow control stmmac: Add properties for transmit and receive fifo sizes stmmac: fix oops on rmmod after assigning ip addr ...	2015-04-17 16:31:08 -04:00
Erez Shitrit	2c15395974	IB/ipoib: Fix ndo_get_iflink Currently, iflink of the parent interface was always accessed, even when the interface didn't have a parent, and hence we crashed there. Handle the interface types properly: for a child interface, return the ifindex of the parent; for a parent interface, return its own ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(); this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: `5aa7add8f1` ('infiniband/ipoib: implement ndo_get_iflink') Reported-by: Honggang Li <honli@redhat.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Honggang Li <honli@redhat.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-17 15:21:04 -04:00
Michal Hocko	f72f116a2a	cxgb4: drop __GFP_NOFAIL allocation set_filter_wr is requesting a __GFP_NOFAIL allocation although it can obviously return ENOMEM without any problems (t4_l2t_set_switching does that already). So the non-failing requirement is too strong without any obvious reason. Drop __GFP_NOFAIL and reorganize the code to make the failure paths simpler. The same applies to _c4iw_write_mem_dma_aligned which uses __GFP_NOFAIL and then checks the return value and returns -ENOMEM on failure. This doesn't make any sense whatsoever. Either the allocation cannot fail or it can. del_filter_wr seems to be safe as well because the filter entry is not marked as pending and the return value is propagated up the stack up to c4iw_destroy_listen. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mel Gorman <mgorman@suse.de> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Jan Kara <jack@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-16 12:03:01 -04:00
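The point that an allocation either cannot fail or must have its failure handled can be illustrated in user space with an injectable allocator. This is a sketch under stated assumptions: `alloc_fn`, `failing_alloc()` and `write_mem()` are hypothetical names, `malloc()` plays the role of a fallible kernel allocation, and nothing here mirrors the cxgb4 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Allocator signature compatible with malloc(); buffers it returns are
 * assumed to be releasable with free(). */
typedef void *(*alloc_fn)(size_t);

/* Allocator that always fails, used to exercise the error path. */
static void *failing_alloc(size_t len)
{
	(void)len;
	return NULL;
}

/* A fallible allocation: the failure is reported to the caller as
 * -ENOMEM instead of being assumed impossible. */
static int write_mem(alloc_fn alloc, const void *data, size_t len)
{
	void *buf = alloc(len);

	if (!buf)
		return -ENOMEM;		/* caller decides how to recover */
	memcpy(buf, data, len);
	/* ... a real driver would hand buf to the hardware here ... */
	free(buf);
	return 0;
}
```

Calling `write_mem(malloc, ...)` exercises the normal path and `write_mem(failing_alloc, ...)` the error path, so both outcomes are covered by the same caller-visible contract.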
Doug Ledford	c1c2fef6cf	Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1	2015-04-15 16:24:49 -04:00
Sagi Grimberg	ba943fb237	IB/iser: Rewrite bounce buffer code path In some rare cases, IO operations may be not aligned to page boundaries. This prevents iser from performing fast memory registration. In order to overcome that iser uses a bounce buffer to carry the transaction. We basically allocate a buffer in the size of the transaction and perform a copy. The buffer allocation using kmalloc is too restrictive since it requires higher order (atomic) allocations for large transactions (which may result in memory exhaustion fairly fast for some workloads). We rewrite the bounce buffer code path to allocate scattered pages and perform a copy between the transaction sg and the bounce sg. Reported-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:14 -04:00
Sagi Grimberg	4fcd1470a0	IB/iser: Bump version to 1.6 Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	ad1e567242	IB/iser: Remove code duplication for a single DMA entry In singleton scatterlists, DMA memory registration code is taken both for Fastreg and FMR code paths. Move it to a function. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	6ef8bb837d	IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr Instead of passing ib_sge as output variable, we pass the mem_reg pointer to have the routines fill the rkey as well. This reduces code duplication and extra assignments. This is a preparation step to unify some registration logics together. Also, pass iser_fast_reg_mr the fastreg descriptor directly. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	90a6684c30	IB/iser: Modify struct iser_mem_reg members No need to keep lkey, va, len variables, we can keep them as struct ib_sge. This will help when we change the memory registration logic. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	8b95aa2c1b	IB/iser: Make fastreg pool cache friendly Memory regions are resources that are saved in the device caches. Increase the probability for a cache hit by adding the MRU descriptor to pool head. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	4dec2a27e3	IB/iser: Move PI context alloc/free to routines Make iser_[create\|destroy]_fastreg_desc shorter, more readable and easily extendable. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	bd8b944eee	IB/iser: Move fastreg descriptor pool get/put to helper functions Instead of open-coding connection fastreg pool get/put, we introduce iser_reg_desc[get\|put] helpers. We aren't setting these static as this will be a per-device routine later on. Also, cleanup iser_unreg_rdma_mem_fastreg a bit. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	f0e35c27a5	IB/iser: Merge build page-vec into register page-vec No need for these two separate. Keep it in a single routine like in the fastreg case. This will also make iser_reg_page_vec closer to iser_fast_reg_mr arguments. This is a preparation step for registration flow refactor. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	b130ededff	IB/iser: Get rid of struct iser_rdma_regd This struct members other than struct iser_mem_reg are unused, so remove it altogether. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	6847fdeb0b	IB/iser: Remove redundant assignments in iser_reg_page_vec Buffer length was assigned twice, and no reason to set va to io_addr and then add the offset, just set va to io_addr + offset. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	d03e61d036	IB/iser: Move memory reg/dereg routines to iser_memory.c As memory registration/de-registration methods, lets move them to their natural location. While we're at it, make iser_reg_page_vec routine static. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	5640832590	IB/iser: Don't pass ib_device to fall_to_bounce_buff routine No need to pass that, we can take it from the task. In a later stage, this function will be invoked according to a device capability. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	e3784bd1d9	IB/iser: Remove a redundant struct iser_data_buf No need to keep two iser_data_buf structures just in case we use mem copy. We can avoid that just by adding a pointer to the original sg. So keep only two iser_data_buf per command (data and protection) and pass the relevant data_buf to bounce buffer routine. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	ecc3993a2a	IB/iser: Remove redundant cmd_data_len calculation This code was added before we had protection data length calculation (in iser_send_command), so we needed to calc the sg data length from the sg itself. This is not needed anymore. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	a065fe6aa2	IB/iser: Fix wrong calculation of protection buffer length This length miscalculation may cause silent data corruption in the DIX case and cause the device to reference an unmapped area. Fixes: `d77e65350f` ('libiscsi, iser: Adjust data_length to include protection information') Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	30bf1d58ae	IB/iser: Handle fastreg/local_inv completion errors Fast registration and local invalidate work requests can also fail. We should call error completion handler for them. Reported-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	c4de4663e0	IB/iser: Fix unload during ep_poll wrong dereference In case the user unloaded ib_iser while ep_connect is in progress, we need to destroy the endpoint although ep_disconnect wasn't invoked (we detect this by the iser conn state != DOWN). However, if we got an REJECTED/UNREACHABLE CM event we move the connection state to DOWN which will prevent us from destroying the endpoint in the module unload stage. Fix this by setting the connection state to TERMINATING in iser_conn_error so we can still destroy the endpoint at unload stage. Reported-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Doug Ledford	9f5d32af09	ib_srpt: convert printk's to pr_* functions The driver already defined the pr_format, it just hadn't been converted to use pr_info, pr_warn, and pr_err instead of the equivalent printks. Convert so that messages from the driver are now properly tagged with their driver name and can be more easily debugged. In addition, a number of these printk's were not newline terminated, so fix that at the same time. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:54 -04:00
Bart Van Assche	56b5390caf	IB/srp: Use P_Key cache for P_Key lookups This change slightly reduces the time needed to log in. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: David Dillow <dave@thedillows.org> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:54 -04:00
Sebastian Ott	cc47d369b5	infiniband/mlx4: check for mapping error Since ib_dma_map_single can fail use ib_dma_mapping_error to check for errors. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:39 -04:00
Sébastien Dugué	a233c4b54c	ib_uverbs: Fix pages leak when using XRC SRQs Hello, When an application using XRCs abruptly terminates, the mmaped pages of the CQ buffers are leaked. This comes from the fact that when resources are released in ib_uverbs_cleanup_ucontext(), we fail to release the CQs because their refcount is not 0. When creating an XRC SRQ, we increment the associated CQ refcount. This refcount is only decremented when the SRQ is released. Therefore we need to release the SRQs prior to the CQs to make sure that all references to the CQs are gone before trying to release these. Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:39 -04:00
Erez Shitrit	ca9b590caa	IB/mlx4: Fix WQE LSO segment calculation The current code decreases from the mss size (which is the gso_size from the kernel skb) the size of the packet headers. It shouldn't do that because the mss that comes from the stack (e.g IPoIB) includes only the tcp payload without the headers. The result is indication to the HW that each packet that the HW sends is smaller than what it could be, and too many packets will be sent for big messages. An easy way to demonstrate one more aspect of the problem is by configuring the ipoib mtu to be less than 2hlen (256) and then run app sending big TCP messages. This will tell the HW to send packets with giant (negative value which under unsigned arithmetics becomes a huge positive one) length and the QP moves to SQE state. Fixes: `b832be1e40` ('IB/mlx4: Add IPoIB LSO support') Reported-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	0e5544d9bf	IB/ipoib: Remove IPOIB_MCAST_RUN bit After Doug Ledford's changes there is no need for that bit; its semantics become a subset of the IPOIB_FLAG_OPER_UP bit. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	1e85b806f9	IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's Whenever there is no path->ah to the destination, keep only defined number of skb's. Otherwise there are cases that the driver can keep infinite list of skb's. For example, when one device want to send unicast arp to the destination, and from some reason the SM doesn't respond, the driver currently keeps all the skb's. If that unicast arp traffic stopped, all these skb's are kept by the path object till the interface is down. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	2c01073095	IB/ipoib: Handle QP in SQE state As the result of a completion error the QP can moved to SQE state by the hardware. Since it's not the Error state, there are no flushes and hence the driver doesn't know about that. The fix creates a task that after completion with error which is not a flush tracks the QP state and if it is in SQE state moves it back to RTS. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	3fd0605caa	IB/ipoib: Update broadcast record values after each successful join request Update the cached broadcast record in the priv object after every new join of this broadcast domain group. These values are needed for the port configuration (MTU size) and to all the new multicast (non-broadcast) join requests initial parameters. For example, SM starts with 2K MTU for all the fabric, and after that it restarts (or handover to new SM) with new port configuration of 4K MTU. Without using the new values, the driver will keep its old configuration of 2K and will not apply the new configuration of 4K. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Erez Shitrit	a44878d100	IB/ipoib: Use one linear skb in RX flow The current code in the RX flow uses two sg entries for each incoming packet, the first one for the IB headers and the second for the rest of the data. That causes two dma map/unmap operations and two allocations, plus a few more actions in the data path. Use only one linear skb for each incoming packet, holding all the data (IB headers and payload). That reduces the packet processing in the data path (only one skb, no frags, the first frag was not used anyway, fewer memory allocations) and the dma handling (only one dma map/unmap per incoming packet instead of two). After commit `73d3fe6d1c` ("gro: fix aggregation for skb using frag_list") from Eric Dumazet, we will get full aggregation for large packets. When running bandwidth tests before and after the change (over the card's numa node), using "netperf -H 1.1.1.3 -T -t TCP_STREAM", the results are ~12Gbs before and ~16Gbs after on my setup (Mellanox's ConnectX3). Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	1c0453d64a	IB/ipoib: drop mcast_mutex usage We needed the mcast_mutex when we had to prevent the join completion callback from having the value it stored in mcast->mc overwritten by a delayed return from ib_sa_join_multicast. By storing the return of ib_sa_join_multicast in an intermediate variable, we prevent a delayed return from ib_sa_join_multicast overwriting the valid contents of mcast->mc, and we no longer need a mutex to force the join callback to run after the return of ib_sa_join_multicast. This allows us to do away with the mutex entirely and protect our critical sections with a just a spinlock instead. This is highly desirable as there were some places where we couldn't use a mutex because the code was not allowed to sleep, and so we were currently using a mix of mutex and spinlock to protect what we needed to protect. Now we only have a spin lock and the locking complexity is greatly reduced. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	d2fe937ce6	IB/ipoib: deserialize multicast joins Allow the ipoib layer to attempt to join all outstanding multicast groups at once. The ib_sa layer will serialize multiple attempts to join the same group, but will process attempts to join different groups in parallel. Take advantage of that. In order to make this happen, change the mcast_join_thread to loop through all needed joins, sending a join request for each one that we still need to join. There are a few special cases we handle though: 1) Don't attempt to join anything but the broadcast group until the join of the broadcast group has succeeded. 2) No longer restart the join task at the end of completion handling. If we completed successfully, we are done. The join task now needs kicked either by mcast_send or mcast_restart_task or mcast_start_thread, but should not need started anytime else except when scheduling a backoff attempt to rejoin. 3) No longer use separate join/completion routines for regular and sendonly joins, pass them all through the same routine and just do the right thing based on the SENDONLY join flag. 4) Only try to join a SENDONLY join twice, then drop the packets and quit trying. We leave the mcast group in the list so that if we get a new packet, all that we have to do is queue up the packet and restart the join task and it will automatically try to join twice and then either send or flush the queue again. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	69911416d8	IB/ipoib: fix MCAST_FLAG_BUSY usage Commit `a9c8ba5884` ("IPoIB: Fix usage of uninitialized multicast objects") added a new flag MCAST_JOIN_STARTED, but was not very strict in how it was used. We didn't always initialize the completion struct before we set the flag, and we didn't always call complete on the completion struct from all paths that complete it. And when we did complete it, sometimes we continued to touch the mcast entry after the completion, opening us up to possible use after free issues. This made it less than totally effective, and certainly made its use confusing. And in the flush function we would use the presence of this flag to signal that we should wait on the completion struct, but we never cleared this flag, ever. In order to make things clearer and aid in resolving the rtnl deadlock bug I've been chasing, I cleaned this up a bit. 1) Remove the MCAST_JOIN_STARTED flag entirely 2) Change MCAST_FLAG_BUSY so it now only means a join is in-flight 3) Test mcast->mc directly to see if we have completed ib_sa_join_multicast (using IS_ERR_OR_NULL) 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize the mcast->done completion struct 5) Make sure that before calling complete(&mcast->done), we always clear the MCAST_FLAG_BUSY bit 6) Take the mcast_mutex before we call ib_sa_multicast_join and also take the mutex in our join callback. This forces ib_sa_multicast_join to return and set mcast->mc before we process the callback. This way, our callback can safely clear mcast->mc if there is an error on the join and we will do the right thing as a result in mcast_dev_flush. 7) Because we need the mutex to synchronize mcast->mc, we can no longer call mcast_sendonly_join directly from mcast_send and instead must add sendonly join processing to the mcast_join_task 8) Make MCAST_RUN mean that we have a working mcast subsystem, not that we have a running task. We know when we need to reschedule our join task thread and don't need a flag to tell us. 9) Add a helper for rescheduling the join task thread A number of different races are resolved with these changes. These races existed with the old MCAST_FLAG_BUSY usage, the MCAST_JOIN_STARTED flag was an attempt to address them, and while it helped, a determined effort could still trip things up. One race looks something like this: Thread 1 Thread 2 ib_sa_join_multicast (as part of running restart mcast task) alloc member call callback ifconfig ib0 down wait_for_completion callback call completes wait_for_completion in mcast_dev_flush completes mcast->mc is PTR_ERR_OR_NULL so we skip ib_sa_leave_multicast return from callback return from ib_sa_join_multicast set mcast->mc = return from ib_sa_multicast We now have a permanently unbalanced join/leave issue that trips up the refcounting in core/multicast.c Another like this: Thread 1 Thread 2 Thread 3 ib_sa_multicast_join ifconfig ib0 down priv->broadcast = NULL join_complete wait_for_completion mcast->mc is not yet set, so don't clear return from ib_sa_join_multicast and set mcast->mc complete return -EAGAIN (making mcast->mc invalid) call ib_sa_multicast_leave on invalid mcast->mc, hang forever By holding the mutex around ib_sa_multicast_join and taking the mutex early in the callback, we force mcast->mc to be valid at the time we run the callback. This allows us to clear mcast->mc if there is an error and the join is going to fail. We do this before we complete the mcast. In this way, mcast_dev_flush always sees consistent state in regards to mcast->mc membership at the time that the wait_for_completion() returns. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	efc82eeeae	IB/ipoib: No longer use flush as a parameter Various places in the IPoIB code had a deadlock related to flushing the ipoib workqueue. Now that we have per device workqueues and a specific flush workqueue, there is no longer a deadlock issue with flushing the device specific workqueues and we can do so unilaterally. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	0b39578bcd	IB/ipoib: Use dedicated workqueues per interface During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes everything from that workqueue means that we would also have to prevent all races between different devices (for instance, ipoib_mcast_restart_task on interface ib0 can race with ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on the rtnl_lock). There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	894021a752	IB/ipoib: Make the carrier_on_task race aware We blindly assume that we can just take the rtnl lock and that will prevent races with downing this interface. Unfortunately, that's not the case. In ipoib_mcast_stop_thread() we will call flush_workqueue() in an attempt to clear out all remaining instances of ipoib_join_task. But, since this task is put on the same workqueue as the join task, the flush_workqueue waits on this thread too. But this thread is deadlocked on the rtnl lock. The better thing here is to use trylock and loop on that until we either get the lock or we see that FLAG_OPER_UP has been cleared, in which case we don't need to do anything anyway and we just return. While investigating which flag should be used, FLAG_ADMIN_UP or FLAG_OPER_UP, it was determined that FLAG_OPER_UP was the more appropriate flag to use. However, there was a mix of these two flags in use in the existing code. So while we check for that flag here as part of this race fix, also cleanup the two places that had used the less appropriate flag for their tests. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	c84ca6d2b1	IB/ipoib: Consolidate rtnl_lock tasks in workqueue The ipoib_mcast_flush_dev routine is called with the rtnl_lock held and needs to keep it held. It also needs to call flush_workqueue() to flush out any outstanding work. In the past, we've had to try and make sure that we didn't flush out any outstanding join completions because they also wanted to grab rtnl_lock() and that would deadlock. It turns out that the only thing in the join completion handler that needs this lock can be safely moved to our carrier_on_task, thereby reducing the potential for the join completion code and the flush code to deadlock against each other. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	be7aa663fc	IB/ipoib: change init sequence ordering In preparation for using per device work queues, we need to move the start of the neighbor thread task to after ipoib_ib_dev_init and move the destruction of the neighbor task to before ipoib_ib_dev_cleanup. Otherwise we will end up freeing our workqueue with work possibly still on it. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	e135106fac	IB/ipoib: factor out ah flushing Create ipoib_flush_ah and ipoib_stop_ah routines to use at appropriate times to flush out all remaining ah entries before we shut the device down. Because neighbors and mcast entries can each have a reference on any given ah, we must make sure to free all of those first before our ah will actually have a 0 refcount and be able to be reaped. This factoring is needed in preparation for having per-device work queues. The original per-device workqueue code resulted in the following error message: <ibdev>: ib_dealloc_pd failed That error was tracked down to this issue. With the changes to which workqueues were flushed when, there were no flushes of the per device workqueue after the last ah's were freed, resulting in an attempt to dealloc the pd with outstanding resources still allocated. This code puts the explicit flushes in the needed places to avoid that problem. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Yann Droneaud	66578b0b2f	IB/core: don't disallow registering region starting at 0x0 In a call to ib_umem_get(), if address is 0x0 and size is already page aligned, check added in commit `8494057ab5` ("IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic") will refuse to register a memory region that could otherwise be valid (provided vm.mmap_min_addr sysctl and mmap_low_allowed SELinux knobs allow userspace to map something at address 0x0). This patch allows back such registration: ib_umem_get() should probably don't care of the base address provided it can be pinned with get_user_pages(). There's two possible overflows, in (addr + size) and in PAGE_ALIGN(addr + size), this patch keep ensuring none of them happen while allowing to pin memory at address 0x0. Anyway, the case of size equal 0 is no more (partially) handled as 0-length memory region are disallowed by an earlier check. Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> # `8494057ab5` ("IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic") Cc: Shachar Raindel <raindel@mellanox.com> Cc: Jack Morgenstein <jackm@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:05:02 -04:00
Yann Droneaud	8abaae62f3	IB/core: disallow registering 0-sized memory region If ib_umem_get() is called with a size equal to 0 and an non-page aligned address, one page will be pinned and a 0-sized umem will be returned to the caller. This should not be allowed: it's not expected for a memory region to have a size equal to 0. This patch adds a check to explicitly refuse to register a 0-sized region. Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Jack Morgenstein <jackm@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:05:02 -04:00
Yishai Hadas	56c1d2335b	IB/mlx4: Change alias guids default to be host assigned Change the default mode to be HOST assigned instead of SM assigned. This is the expected operational mode, because it doesn't depend on SM availability. As PF generates random GUIDs as the initial admin values, this gives out of the box experience. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 15:51:50 -04:00
Yishai Hadas	ee59fa0d7e	IB/mlx4: Request alias GUID on demand Request GIDs from the SM on demand, i.e., when a VF actually needs them, and release them when the GIDs are no longer in use. In cloud environments, this is useful for GID migrations, in which a GID is assigned to a VF on the destination HCA, while the VF on the source HCA is shutdown (but the GID was not administratively released). Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 15:51:50 -04:00
Yishai Hadas	f547960128	IB/mlx4: Change init flow to request alias GUIDs for active VFs Change the init flow to ask GUIDs only for active VFs. This is done for both SM & HOST modes so that there is no need any more to maintain the ownership record type. In case SM mode is used, the initial value will be 0, ask the SM to assign, for the HOST mode the initial value will be the HOST generated GUID. This will enable out of the box experience for both probed and attached VFs. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 15:51:50 -04:00
Yishai Hadas	2350f24774	IB/mlx4: Manage admin alias GUID upon admin request Set the admin alias GUID per the administrator's request via the sysfs mechanism into the core layer. The "get" request returns the current value. However, if the administrator requests the SM to assign a new value by requesting 0, the SM assigned GUID is returned. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 15:51:50 -04:00
Yishai Hadas	99ee4df6aa	IB/mlx4: Alias GUID adding persistency support If the SM rejects an alias GUID request the PF driver keeps trying to acquire the specified GUID indefinitely, utilizing an exponential backoff scheme. Retrying is managed per GUID entry. Each entry that wasn't applied holds its next retry information. Retry requests to the SM consist of records of 8 consecutive GUIDS. Each record that contains GUIDs requiring retries holds its next time-to-run based on the retry information of all its GUID entries. The record having the lowest retry time will run first when that retry time arrives. Since the method (SET or DELETE) as sent to the SM applies to all the GUIDs in the record, we must handle SET requests and DELETE requests in separate SM messages (one for SETs and the other for DELETEs). To avoid race conditions where a GUID entry request (set or delete) was modified after the SM request was sent, we save the method and the requested indices as part of the callback's context -- thus, only the requested indexes are evaluated when the response is received. When an GUID entry is approved we turn off its retry-required bit, this prevents redundant SM retries from occurring on that record. The port down event should be sent only when previously it was up. Likewise, the port up event should be sent only if previously the port was down. Synchronization was added around the flows that change entries and record state to prevent race conditions. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 15:51:49 -04:00
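
The retry scheme described in the alias GUID persistency commit above (99ee4df6aa) can be pictured with a small standalone C sketch. It is not taken from the mlx4 driver: every type, function, and constant name below is hypothetical, the base delay and the cap on the delay are assumptions, and reading "based on the retry information of all its GUID entries" as "the earliest pending retry in the record" is also an assumption. Only the record size of 8 GUIDs comes from the commit text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUIDS_PER_RECORD 8   /* from the commit text: records of 8 consecutive GUIDs */
#define BACKOFF_BASE_SEC 1u  /* hypothetical initial delay */
#define BACKOFF_MAX_SEC  64u /* hypothetical cap; retries still continue indefinitely */

/* Hypothetical per-GUID retry state. */
struct guid_entry {
    bool     needs_retry; /* cleared once the SM approves the GUID */
    uint32_t delay_sec;   /* current backoff delay */
    uint64_t next_try;    /* absolute time of the next attempt */
};

/* Hypothetical record: one SM request covers all of its entries. */
struct guid_record {
    struct guid_entry entries[GUIDS_PER_RECORD];
};

/* The SM rejected this entry: double the delay (up to the cap) and reschedule. */
static void entry_rejected(struct guid_entry *e, uint64_t now)
{
    if (e->delay_sec == 0)
        e->delay_sec = BACKOFF_BASE_SEC;
    else if (e->delay_sec < BACKOFF_MAX_SEC)
        e->delay_sec *= 2;
    e->needs_retry = true;
    e->next_try = now + e->delay_sec;
}

/* The SM approved this entry: turn off its retry-required bit. */
static void entry_approved(struct guid_entry *e)
{
    e->needs_retry = false;
    e->delay_sec = 0;
}

/* Earliest next_try among entries still needing a retry; false if none. */
static bool record_next_run(const struct guid_record *r, uint64_t *when)
{
    bool found = false;

    for (size_t i = 0; i < GUIDS_PER_RECORD; i++) {
        const struct guid_entry *e = &r->entries[i];

        if (e->needs_retry && (!found || e->next_try < *when)) {
            *when = e->next_try;
            found = true;
        }
    }
    return found;
}

/* Index of the record with the lowest next-run time, or -1 if nothing is pending. */
static int pick_record(const struct guid_record *recs, size_t n)
{
    int best = -1;
    uint64_t best_when = 0, when = 0;

    for (size_t i = 0; i < n; i++) {
        if (record_next_run(&recs[i], &when) &&
            (best < 0 || when < best_when)) {
            best = (int)i;
            best_when = when;
        }
    }
    return best;
}
```

A caller would invoke `entry_rejected()` or `entry_approved()` from the SM response handler for the indices that were actually requested, then use `pick_record()` to decide which record to resend first. The SET/DELETE split and the locking the commit mentions are deliberately left out of this sketch.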

4731 Commits