Commit Graph

190 Commits

Author SHA1 Message Date
James Smart
4c2805aab5 lpfc: nvmet: Add support for NVME LS request hosthandle
As the nvmet layer does not have the concept of a remoteport object, which
can be used to identify the entity on the other end of the fabric that is
to receive an LS, the hosthandle was introduced.  The driver passes the
hosthandle, a value representative of the remote port, with a ls request
receive. The LS request will create the association.  The transport will
remember the hosthandle for the association, and if there is a need to
initiate a LS request to the remote port for the association, the
hosthandle will be used. When the driver loses connectivity with the
remote port, it needs to notify the transport that the hosthandle is no
longer valid, allowing the transport to terminate associations related to
the hosthandle.

This patch adds support to the driver for the hosthandle. The driver will
use the ndlp pointer of the remote port for the hosthandle in calls to
nvmet_fc_rcv_ls_req().  The discovery engine is updated to invalidate the
hosthandle whenever connectivity with the remote port is lost.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:34 -06:00
James Smart
3a8070c567 lpfc: Refactor NVME LS receive handling
In preparation for supporting both intiator mode and target mode
receiving NVME LS's, commonize the existing NVME LS request receive
handling found in the base driver and in the nvmet side.

Using the original lpfc_nvmet_unsol_ls_event() and
lpfc_nvme_unsol_ls_buffer() routines as a templates, commonize the
reception of an NVME LS request. The common routine will validate the LS
request, that it was received from a logged-in node, and allocate a
lpfc_async_xchg_ctx that is used to manage the LS request. The role of
the port is then inspected to determine which handler is to receive the
LS - nvme or nvmet. As such, the nvmet handler is tied back in. A handler
is created in nvme and is stubbed out.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:34 -06:00
James Smart
7cacae2ad0 lpfc: Refactor nvmet_rcv_ctx to create lpfc_async_xchg_ctx
To support FC-NVME-2 support (actually FC-NVME (rev 1) with Ammendment 1),
both the nvme (host) and nvmet (controller/target) sides will need to be
able to receive LS requests.  Currently, this support is in the nvmet side
only. To prepare for both sides supporting LS receive, rename
lpfc_nvmet_rcv_ctx to lpfc_async_xchg_ctx and commonize the definition.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:34 -06:00
James Smart
2fcbc569b9 scsi: lpfc: Make debugfs ktime stats generic for NVME and SCSI
Currently driver ktime stats, measuring code paths, is NVME-specific.

Convert the stats routines such that the code paths are generic, providing
status for NVME and SCSI. Added ktime stat calls in SCSI queuecommand and
cmpl routines.

Link: https://lore.kernel.org/r/20200322181304.37655-11-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-29 18:10:58 -04:00
James Smart
c90b448023 scsi: lpfc: Fix scsi host template for SLI3 vports
SCSI layer sends driver IOs with more s/g segments than driver can handle.
This results in "Too many sg segments from dma_map_sg. Config 64, seg_cnt
219" error messages from the lpfc_scsi_prep_dma_buf_s3() routine.

The was due to use the driver using individual templates for pport and
vport, host reset enabled or not, nvme vs scsi, etc. In the end, there was
a combination for a vport that didn't match the pport.

Rather than enumerating more templates and more discretionary assignments,
revert to a base template that is copied to a template specific to the
pport/vport. Then, based on role, attributes and sli type, modify the
fields that are different for that port.  Added a log message to
lpfc_create_port to validate values.

Link: https://lore.kernel.org/r/20200322181304.37655-5-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-26 23:15:08 -04:00
James Smart
df3fe76658 scsi: lpfc: add RDF registration and Link Integrity FPIN logging
This patch modifies lpfc to register for Link Integrity events via the use
of an RDF ELS and to perform Link Integrity FPIN logging.

Specifically, the driver was modified to:

 - Format and issue the RDF ELS immediately following SCR registration.
   This registers the ability of the driver to receive FPIN ELS.

 - Adds decoding of the FPIN els into the received descriptors, with
   logging of the Link Integrity event information. After decoding, the ELS
   is delivered to the scsi fc transport to be delivered to any user-space
   applications.

 - To aid in logging, simple helpers were added to create enum to name
   string lookup functions that utilize the initialization helpers from the
   fc_els.h header.

 - Note: base header definitions for the ELS's don't populate the
   descriptor payloads. As such, lpfc creates it's own version of the
   structures, using the base definitions (mostly headers) and additionally
   declaring the descriptors that will complete the population of the ELS.

Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-18 00:08:38 -05:00
James Smart
e3ba04c9ba scsi: lpfc: Fix Fabric hostname registration if system hostname changes
There are reports of multiple ports on the same system displaying different
hostnames in fabric FDMI displays.

Currently, the driver registers the hostname at initialization and obtains
the hostname via init_utsname()->nodename queried at the time the FC link
comes up. Unfortunately, if the machine hostname is updated after
initialization, such as via DHCP or admin command, the value registered
initially will be incorrect.

Fix by having the driver save the hostname that was registered with FDMI.
The driver then runs a heartbeat action that will check the hostname.  If
the name changes, reregister the FMDI data.

The hostname is used in RSNN_NN, FDMI RPA and FDMI RHBA.

Link: https://lore.kernel.org/r/20191218235808.31922-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-12-21 13:42:42 -05:00
James Smart
d480e57809 scsi: lpfc: fix inlining of lpfc_sli4_cleanup_poll_list()
Compilation can fail due to having an inline function reference where the
function body is not present.

Fix by removing the inline tag.

Fixes: 93a4d6f401 ("scsi: lpfc: Add registration for CPU Offline/Online events")

Link: https://lore.kernel.org/r/20191111230401.12958-4-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-12 22:21:33 -05:00
James Smart
93a4d6f401 scsi: lpfc: Add registration for CPU Offline/Online events
The recent affinitization didn't address cpu offlining/onlining.  If an
interrupt vector is shared and the low order cpu owning the vector is
offlined, as interrupts are managed, the vector is taken offline. This
causes the other CPUs sharing the vector will hang as they can't get io
completions.

Correct by registering callbacks with the system for Offline/Online
events. When a cpu is taken offline, its eq, which is tied to an interrupt
vector is found. If the cpu is the "owner" of the vector and if the
eq/vector is shared by other CPUs, the eq is placed into a polled mode.
Additionally, code paths that perform io submission on the "sharing CPUs"
will check the eq state and poll for completion after submission of new io
to a wq that uses the eq.

Similarly, when a cpu comes back online and owns an offlined vector, the eq
is taken out of polled mode and rearmed to start driving interrupts for eq.

Link: https://lore.kernel.org/r/20191105005708.7399-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-06 00:04:04 -05:00
James Smart
51f8e43ed3 scsi: lpfc: Fix NVMe ABTS in response to receiving an ABTS
When the port, running as a nvme target, receives an ABTS, it submits
commands to the adapter to Abort i/o outstanding in the adapter. The Abort
command formatting routine left a command field set to zero, which
instructs the adapter to generate an ABTS on the wire as part of cleaning
up the I/O. This is common operation for an initiator, but not for a
target.

Fix the driver to check whether an ABTS had been received for the I/O, and
if so, change the Abort command formatting so that the ABTS generation is
disabled (IA=1). No need to ABTS it when the other side already has.

Also refactored the code such that there is a single routine being used for
nvme or nvmet ABORT requests, and IA is an argument.

Link: https://lore.kernel.org/r/20190922035906.10977-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30 22:07:10 -04:00
James Smart
9db6c14c36 scsi: lpfc: Remove bg debugfs buffers
Capturing and downloading dif command data and dif data was done a dozen
years ago and no longer being used. Also creates a potential security hole.

Remove the debugfs buffer for dif debugging.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: KyleMahlkuch <kmahlkuc@linux.vnet.ibm.com>
CC: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-29 18:08:58 -04:00
James Smart
c00f62e6c5 scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair
Currently, each hardware queue, typically allocated per-cpu, consists of a
WQ/CQ pair per protocol. Meaning if both SCSI and NVMe are supported 2
WQ/CQ pairs will exist for the hardware queue. Separate queues are
unnecessary. The current implementation wastes memory backing the 2nd set
of queues, and the use of double the SLI-4 WQ/CQ's means less hardware
queues can be supported which means there may not always be enough to have
a pair per cpu. If there is only 1 pair per cpu, more cpu's may get their
own WQ/CQ.

Rework the implementation to use a single WQ/CQ pair by both protocols.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19 22:41:12 -04:00
James Smart
84f2ddf8cf scsi: lpfc: Fix hang when downloading fw on port enabled for nvme
As part of firmware download, the adapter is reset. On the adapter the
reset causes the function to stop and all outstanding io is terminated
(without responses). The reset path then starts teardown of the adapter,
starting with deregistration of the remote ports with the nvme-fc
transport. The local port is then deregistered and the driver waits for
local port deregistration. This never finishes.

The remote port deregistrations terminated the nvme controllers, causing
them to send aborts for all the outstanding io. The aborts were serviced in
the driver, but stalled due to its state. The nvme layer then stops to
reclaim it's outstanding io before continuing.  The io must be returned
before the reset on the controller is deemed complete and the controller
delete performed.  The remote port deregistration won't complete until all
the controllers are terminated. And the local port deregistration won't
complete until all controllers and remote ports are terminated. Thus things
hang.

The issue is the reset which stopped the adapter also stopped all the
responses that would drive i/o completions, and the aborts were also
stopped that stopped i/o completions. The driver, when resetting the
adapter like this, needs to be generating the completions as part of the
adapter reset so that I/O complete (in error), and any aborts are not
queued.

Fix by adding flush routines whenever the adapter port has been reset or
discovered in error. The flush routines will generate the completions for
the scsi and nvme outstanding io. The abort ios, if waiting, will be caught
and flushed as well.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19 22:41:10 -04:00
Linus Torvalds
ba6d10ab80 SCSI misc on 20190709
This is mostly update of the usual drivers: qla2xxx, hpsa, lpfc, ufs,
 mpt3sas, ibmvscsi, megaraid_sas, bnx2fc and hisi_sas as well as the
 removal of the osst driver (I heard from Willem privately that he
 would like the driver removed because all his test hardware has
 failed).  Plus number of minor changes, spelling fixes and other
 trivia.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXSTl4yYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdcxAQDCJVbd
 fPUX76/V1ldupunF97+3DTharxxbst+VnkOnCwD8D4c0KFFFOI9+F36cnMGCPegE
 fjy17dQLvsJ4GsidHy8=
 =aS5B
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly update of the usual drivers: qla2xxx, hpsa, lpfc, ufs,
  mpt3sas, ibmvscsi, megaraid_sas, bnx2fc and hisi_sas as well as the
  removal of the osst driver (I heard from Willem privately that he
  would like the driver removed because all his test hardware has
  failed). Plus number of minor changes, spelling fixes and other
  trivia.

  The big merge conflict this time around is the SPDX licence tags.
  Following discussion on linux-next, we believe our version to be more
  accurate than the one in the tree, so the resolution is to take our
  version for all the SPDX conflicts"

Note on the SPDX license tag conversion conflicts: the SCSI tree had
done its own SPDX conversion, which in some cases conflicted with the
treewide ones done by Thomas & co.

In almost all cases, the conflicts were purely syntactic: the SCSI tree
used the old-style SPDX tags ("GPL-2.0" and "GPL-2.0+") while the
treewide conversion had used the new-style ones ("GPL-2.0-only" and
"GPL-2.0-or-later").

In these cases I picked the new-style one.

In a few cases, the SPDX conversion was actually different, though.  As
explained by James above, and in more detail in a pre-pull-request
thread:

 "The other problem is actually substantive: In the libsas code Luben
  Tuikov originally specified gpl 2.0 only by dint of stating:

  * This file is licensed under GPLv2.

  In all the libsas files, but then muddied the water by quoting GPLv2
  verbatim (which includes the or later than language). So for these
  files Christoph did the conversion to v2 only SPDX tags and Thomas
  converted to v2 or later tags"

So in those cases, where the spdx tag substantially mattered, I took the
SCSI tree conversion of it, but then also took the opportunity to turn
the old-style "GPL-2.0" into a new-style "GPL-2.0-only" tag.

Similarly, when there were whitespace differences or other differences
to the comments around the copyright notices, I took the version from
the SCSI tree as being the more specific conversion.

Finally, in the spdx conversions that had no conflicts (because the
treewide ones hadn't been done for those files), I just took the SCSI
tree version as-is, even if it was old-style.  The old-style conversions
are perfectly valid, even if the "-only" and "-or-later" versions are
perhaps more descriptive.

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (185 commits)
  scsi: qla2xxx: move IO flush to the front of NVME rport unregistration
  scsi: qla2xxx: Fix NVME cmd and LS cmd timeout race condition
  scsi: qla2xxx: on session delete, return nvme cmd
  scsi: qla2xxx: Fix kernel crash after disconnecting NVMe devices
  scsi: megaraid_sas: Update driver version to 07.710.06.00-rc1
  scsi: megaraid_sas: Introduce various Aero performance modes
  scsi: megaraid_sas: Use high IOPS queues based on IO workload
  scsi: megaraid_sas: Set affinity for high IOPS reply queues
  scsi: megaraid_sas: Enable coalescing for high IOPS queues
  scsi: megaraid_sas: Add support for High IOPS queues
  scsi: megaraid_sas: Add support for MPI toolbox commands
  scsi: megaraid_sas: Offload Aero RAID5/6 division calculations to driver
  scsi: megaraid_sas: RAID1 PCI bandwidth limit algorithm is applicable for only Ventura
  scsi: megaraid_sas: megaraid_sas: Add check for count returned by HOST_DEVICE_LIST DCMD
  scsi: megaraid_sas: Handle sequence JBOD map failure at driver level
  scsi: megaraid_sas: Don't send FPIO to RL Bypass queue
  scsi: megaraid_sas: In probe context, retry IOC INIT once if firmware is in fault
  scsi: megaraid_sas: Release Mutex lock before OCR in case of DCMD timeout
  scsi: megaraid_sas: Call disable_irq from process IRQ poll
  scsi: megaraid_sas: Remove few debug counters from IO path
  ...
2019-07-11 15:14:01 -07:00
James Smart
6f2589f478 lpfc: add support for translating an RSCN rcv into a discovery rescan
This patch updates RSCN receive processing to check for the remote
port being an NVME port, and if so, invoke the nvme_fc callback to
rescan the remote port.  The rescan will generate a discovery udev
event.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-06-21 11:08:38 +02:00
James Smart
f60cb93bbf lpfc: add support to generate RSCN events for nport
This patch adds general RSCN support:

 - The ability to transmit an RSCN to the port on the other end of
   the link (regular port if pt2pt, or fabric controller if fabric).
 - And general recognition of an RSCN ELS when an ELS is received.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-06-21 11:08:37 +02:00
James Smart
d74a89aab9 scsi: lpfc: Separate CQ processing for nvmet_fc upcalls
Currently the driver is notified of new command frame receipt by CQEs. As
part of the CQE processing, the driver upcalls the nvmet_fc transport to
deliver the command. nvmet_fc, as part of receiving the command builds out
a context for it, where one of the first steps is to allocate memory for
the io.

When running with tests that do large ios (1MB), it was found on some
systems, the total number of outstanding I/O's, at 1MB per, completely
consumed the system's memory. Thus additional ios were getting blocked in
the memory allocator.  Given that this blocked the lpfc thread processing
CQEs, there were lots of other commands that were received and which are
then held up, and given CQEs are serially processed, the aggregate delays
for an IO waiting behind the others became cummulative - enough so that the
initiator hit timeouts for the ios.

The basic fix is to avoid the direct upcall and instead schedule a work
item for each io as it is received. This allows the cq processing to
complete very quickly, and each io can then run or block on it's own.
However, this general solution hurts latency when there are few ios.  As
such, implemented the fix such that the driver watches how many CQEs it has
processed sequentially in one run. As long as the count is below a
threshold, the direct nvmet_fc upcall will be made. Only when the count is
exceeded will it revert to work scheduling.

Given that debug of this showed a surprisingly long delay in cq processing,
the io timer stats were updated to better reflect the processing of the
different points.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:21 -04:00
James Smart
0d041215f0 scsi: lpfc: Update 12.2.0.0 file copyrights to 2019
For files modified as part of 12.2.0.0 patches, update copyright to 2019

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-05 22:29:50 -05:00
James Smart
6a828b0f61 scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues
So far MSIX vector allocation assumed it would be 1:1 with hardware
queues. However, there are several reasons why fewer MSIX vectors may be
allocated than hardware queues such as the platform being out of vectors or
adapter limits being less than cpu count.

This patch reworks the MSIX/EQ relationships with the per-cpu hardware
queues so they can function independently. MSIX vectors will be equitably
split been cpu sockets/cores and then the per-cpu hardware queues will be
mapped to the vectors most efficient for them.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-05 22:29:49 -05:00
James Smart
c490850a09 scsi: lpfc: Adapt partitioned XRI lists to efficient sharing
The XRI get/put lists were partitioned per hardware queue. However, the
adapter rarely had sufficient resources to give a large number of resources
per queue. As such, it became common for a cpu to encounter a lack of XRI
resource and request the upper io stack to retry after returning a BUSY
condition. This occurred even though other cpus were idle and not using
their resources.

Create as efficient a scheme as possible to move resources to the cpus that
need them. Each cpu maintains a small private pool which it allocates from
for io. There is a watermark that the cpu attempts to keep in the private
pool.  The private pool, when empty, pulls from a global pool from the
cpu. When the cpu's global pool is empty it will pull from other cpu's
global pool. As there many cpu global pools (1 per cpu or hardware queue
count) and as each cpu selects what cpu to pull from at different rates and
at different times, it creates a radomizing effect that minimizes the
number of cpu's that will contend with each other when the steal XRI's from
another cpu's global pool.

On io completion, a cpu will push the XRI back on to its private pool.  A
watermark level is maintained for the private pool such that when it is
exceeded it will move XRI's to the CPU global pool so that other cpu's may
allocate them.

On NVME, as heartbeat commands are critical to get placed on the wire, a
single expedite pool is maintained. When a heartbeat is to be sent, it will
allocate an XRI from the expedite pool rather than the normal cpu
private/global pools. On any io completion, if a reduction in the expedite
pools is seen, it will be replenished before the XRI is placed on the cpu
private pool.

Statistics are added to aid understanding the XRI levels on each cpu and
their behaviors.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-05 22:29:09 -05:00
James Smart
1fbf974250 scsi: lpfc: Convert ring number to hardware queue for nvme wqe posting.
SLI4 nvme functions are passing the SLI3 ring number when posting wqe to
hardware. This should be indicating the hardware queue to use, not the ring
number.

Replace ring number with the hardware queue that should be used.

Note: SCSI avoided this issue as it utilized an older lfpc_issue_iocb
routine that properly adapts.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-05 22:29:09 -05:00
James Smart
5e5b511d8b scsi: lpfc: Partition XRI buffer list across Hardware Queues
Once the IO buff allocations were made shared, there was a single XRI
buffer list shared by all hardware queues.  A single list isn't great for
performance when shared across the per-cpu hardware queues.

Create a separate XRI IO buffer get/put list for each Hardware Queue.  As
SGLs and associated IO buffers get allocated/posted to the firmware; round
robin their assignment across all available hardware Queues so that there
is an equitable assignment.

Modify SCSI and NVME IO submit code paths to use the Hardware Queue logic
for XRI allocation.

Add a debugfs interface to display hardware queue statistics

Added new empty_io_bufs counter to track if a cpu runs out of XRIs.

Replace common_ variables/names with io_ to make meanings clearer.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-05 22:24:22 -05:00
James Smart
7370d10ac9 scsi: lpfc: Remove extra vector and SLI4 queue for Expresslane
There is a extra queue and msix vector for expresslane. Now that the driver
will be doing queues per cpu, this oddball queue is no longer needed.
Expresslane will utilize the normal per-cpu queues.

Updated debugfs sli4 queue output to go along with the change

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-05 22:22:42 -05:00
James Smart
0794d601d1 scsi: lpfc: Implement common IO buffers between NVME and SCSI
Currently, both NVME and SCSI get their IO buffers from separate
pools. XRI's are associated 1:1 with IO buffers, so XRI's are also split
between protocols.

Eliminate the independent pools and use a single pool. Each buffer
structure now has a common section and a protocol section. Per protocol
routines for SGL initialization are removed and replaced by common
routines. Initialization of the buffers is only done on the common area.
All other fields, which are protocol specific, are initialized when the
buffer is allocated for use in the per-protocol allocation routine.

In the past, the SCSI side allocated IO buffers as part of slave_alloc
calls until the maximum XRIs for SCSI was reached. As all XRIs are now
common and may be used for either protocol, allocation for everything is
done as part of adapter initialization and the scsi side has no action in
slave alloc.

As XRI's are no longer split, the lpfc_xri_split module parameter is
removed.

Adapters based on SLI3 will continue to use the older scsi_buf_list_get/put
routines.  All SLI4 adapters utilize the new IO buffer scheme

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-05 22:22:42 -05:00
James Smart
5021267af1 scsi: lpfc: Adding ability to reset chip via pci bus reset
This patch adds a "pci_bus_reset" option to the board_mode sysfs attribute.
This option uses the pci_reset_bus() api to reset the PCIe link the adapter
is on, which will reset the chip/adapter.  Prior to issuing this option,
all functions on the same chip must be placed in the offline state by the
admin. After the reset, all of the instances may be brought online again.

The primary purpose of this functionality is to support cases where
firmware update required a chip reset but the admin did not want to reboot
the machine in order to instantiate the firmware update.

Sanity checks take place prior to the reset to ensure the adapter is the
sole entity on the PCIe bus and that all functions are in the offline
state.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-19 22:13:08 -05:00
James Smart
1165a5c220 scsi: lpfc: Fix driver release of fw-logging buffers
On driver termination, after the driver stops fw logging by writing a
register on the chip, the driver immediately unmaps and frees the logging
buffer, without confirming in any way that the chip has received the write
and terminated the logging. As termination on the chip is not immediate,
the chip may issue a dma request to the now unmapped dma buffer, resulting
in a iommu fault.

Change the driver to receive a confirmation that logging ahs been
terminated. As the driver always issues an SLI reset with the device as
part of shutdown, and as part of that is receiving confirmation that the
reset is complete - the driver was modified to perform the write to disable
fw logging prior to the SLI reset and only free the fw log buffer after the
SLI reset is complete. That guarantees use of the fw log buffer is fully
terminated when it is unmapped.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 22:35:33 -05:00
James Smart
dea16bdae2 scsi: lpfc: Fix discovery failures during port failovers with lots of vports
The driver is getting hit with 100s of RSCNs during remote port address
changes. Each of those RSCN's ends up generating UNREG_RPI and REG_PRI
mailbox commands.  The discovery engine within the driver doesn't wait for
the mailbox command completions. Instead it sets state flags and moves
forward. At some point, there's a massive backlog of mailbox commands which
take time for the adapter to process. Additionally, it appears there were
duplicate events from the switch so the driver generated duplicate mailbox
commands for the same remote port.  During this window, failures on PLOGI
and PRLI ELS's are see as the adapter is rejecting them as they are for
remote ports that still have pending mailbox commands.

Streamline the discovery engine so that PLOGI log checks for outstanding
UNREG_RPIs and defer the processing until the commands complete. This
better synchronizes the ELS transmission vs the RPI registrations.

Filter out multiple UNREG_RPIs being queued up for the same remote port.

Beef up log messages in this area.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 22:35:32 -05:00
James Smart
7ea92eb458 scsi: lpfc: Implement GID_PT on Nameserver query to support faster failover
The switches seem to respond faster to GID_PT vs GID_FT NameServer
queries.  Add support for GID_PT to be used over GID_FT to enable
faster storage failover detection. Includes addition of new module
parameter to select between GID_PT and GID_FT (GID_FT is default).

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-06 20:42:51 -05:00
James Smart
5cca2ab1b3 scsi: lpfc: Reset link or adapter instead of doing infinite nameserver PLOGI retry
Currently, PLOGI failures are infinitely delayed/retried.  There have
been some fabric situations where the PLOGI's were to the nameserver
and it stopped responding. The retries would never clear up.  A better
resolution in this situation is to retry a couple of times, then drop
the link and reinit. This brings back connectivity to the nameserver.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-06 20:42:50 -05:00
James Smart
d2cc9bcd7f scsi: lpfc: add support to retrieve firmware logs
This patch adds the ability to read firmware logs from the adapter. The driver
registers a buffer with the adapter that is then written to by the adapter.
The adapter posts CQEs to indicate content updates in the buffer. While the
adapter is writing to the buffer in a circular fashion, an application will
poll the driver to read the next amount of log data from the buffer.

Driver log buffer size is configurable via the ras_fwlog_buffsize sysfs
attribute. Verbosity to be used by firmware when logging to host memory is
controlled through the ras_fwlog_level attribute.  The ras_fwlog_func
attribute enables or disables loggy by firmware.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-09-11 20:37:33 -04:00
Johannes Thumshirn
c6668cae16 scsi: lpfc: remove ScsiResult macro
Remove the ScsiResult macro and open code it on all call sites.

This will make subsequent refactoring in this area easier.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-07-10 22:42:47 -04:00
James Smart
4ae2ebde31 scsi: lpfc: Revise copyright for new company language
Change references from "Broadcom Limited" to "Broadcom Inc." in the
copyright message. Update copyright duration if not yet updated for 2018.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-07-10 22:15:09 -04:00
James Smart
bd3061bab3 scsi: lpfc: Streamline NVME Targe6t WQE setup
To reduce latency when initializing WQE content, created templates for the
most common wqes. This reduces the number of operations taken to set the
content. It's not a lot of speed up, but every bit helps.

This patch updates the NVME target path.

[mkp: fixed typo]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-03-12 21:55:23 -04:00
James Smart
5fd1108517 scsi: lpfc: Streamline NVME Initiator WQE setup
To reduce latency when initializing WQE content, create templates for the
most common wqes. This reduces the number of operations taken to set the
content. It's not a lot of speed up, but every bit helps.

This patch updates the NVME initiator path.

[mkp: fixed typo]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-03-12 21:55:23 -04:00
James Smart
128bddacc4 scsi: lpfc: Update 11.4.0.7 modified files for 2018 Copyright
Updated Copyright in files updated 11.4.0.7

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-12 11:43:24 -05:00
James Smart
6e8e1c14c6 scsi: lpfc: Add WQ Full Logic for NVME Target
I/O conditions on the nvme target may have the driver submitting to a
full hardware wq. The hardware wq is a shared resource among all nvme
controllers. When the driver hit a full wq, it failed the io posting
back to the nvme-fc transport, which then escalated it into errors.

Correct by maintaining a sideband queue within the driver that is added
to when the WQ full condition is hit, and drained from as soon as new WQ
space opens up.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-12 11:43:23 -05:00
James Smart
c3725bdcdf scsi: lpfc: Fix driver handling of nvme resources during unload
During driver unload, the driver may crash due to NULL pointers.  The
NULL pointers were due to the driver not protecting itself sufficiently
during some of the teardown paths.  Additionally, the driver was not
waiting for and cleanup up nvme io resources. As such, the driver wasn't
making the callbacks to the transport, stalling the transports
association teardown.

This patch waits for io clean up before tearding down and adds checks
for possible NULL pointers.

Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-04 20:32:55 -05:00
Kees Cook
f22eb4d31c scsi: lpfc: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-11-01 11:27:07 -07:00
Dick Kennedy
66d7ce93a0 scsi: lpfc: Fix MRQ > 1 context list handling
Various oops including cpu LOCKUPs were seen.

For asynchronously received ius where the driver must assign exchange
resources, the resources were on a single get (free) list and put list
(finished, waiting to be put on get list). As all cpus are sharing the
lists, an interrupt for a receive frame may have to wait for all the
other cpus to place their done work onto the put list before it can
acquire the lock to pull from the list.

Fix by breaking the resource lists into per-cpu lists or at least more
than 1 list with cpu's sharing the lists). A cpu would allocate from the
free list for its own cpu, and put its done work on the its own put list
- avoiding the contention. As cpu load may vary, when empty, a cpu may
grab from another cpu, thereby changing resource distribution.  But
searching for a resource only occurs on 1 or a few cpus until a single
resource can be allocated. if the condition reoccurs, it starts looking
at a different cpu.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24 22:29:41 -04:00
Guilherme G. Piccoli
7c9fdfb700 scsi: lpfc: Avoid NULL pointer dereference in lpfc_els_abort()
We might have a NULL pring in lpfc_els_abort(), for example on error
recovery path, since queues are destroyed during error recovery
mechanism.

In this case, we should just drop the abort since the queues will be
recreated anyway. This patch just verifies for NULL pointer and stop the
abortion of the queue in case of a NULL pring.

Also, this patch converts return type of lpfc_els_abort() from int to
void, since it's not checked anywhere.

Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Tested-by: Raphael Silva <raphasil@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: James Smart  <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-31 22:44:13 -04:00
James Smart
a8cf5dfeb4 scsi: lpfc: Added recovery logic for running out of NVMET IO context resources
Previous logic would just drop the IO.

Added logic to queue the IO to wait for an IO context resource from an
IO thats already in progress.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16 21:22:22 -04:00
James Smart
6c621a2229 scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context
Currently IO resources are mapped 1 to 1 with RQ buffers posted

Added logic to separate RQE buffers from IO op resources
(sgl/iocbq/context). During initialization, the driver will determine
how many SGLs it will allocate for NVMET (based on what the firmware
reports) and associate a NVMET IOCBq and NVMET context structure with
each one.

Now that hdr/data buffers are immediately reposted back to the RQ, 512
RQEs for each MRQ is sufficient. Also, since NVMET data buffers are now
128 bytes, lpfc_nvmet_mrq_post is not necessary anymore as we will
always post the max (512) buffers per NVMET MRQ.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16 21:21:47 -04:00
James Smart
3c603be979 scsi: lpfc: Separate NVMET data buffer pool fir ELS/CT.
Using 2048 byte buffer and onle 128 bytes is needed.

Create nee LFPC_NVMET_DATA_BUF_SIZE define to use for NVMET RQ/MRQs.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16 21:21:17 -04:00
James Smart
4492b739c9 scsi: lpfc: Fix panic on BFS configuration
To select the appropriate shost template, the driver is issuing a
mailbox command to retrieve the wwn. Turns out the sending of the
command precedes the reset of the function.  On SLI-4 adapters, this is
inconsequential as the mailbox command location is specified by dma via
the BMBX register. However, on SLI-3 adapters, the location of the
mailbox command submission area changes. When the function is first
powered on or reset, the cmd is submitted via PCI bar memory. Later the
driver changes the function config to use host memory and DMA. The
request to start a mailbox command is the same, a simple doorbell write,
regardless of submission area.  So.. if there has not been a boot driver
run against the adapter, the mailbox command works as defaults are
ok. But, if the boot driver has configured the card and, and if no
platform pci function/slot reset occurs as the os starts, the mailbox
command will fail. The SLI-3 device will use the stale boot driver dma
location. This can cause PCI eeh errors.

Fix is to reset the sli-3 function before sending the mailbox command,
thus synchronizing the function/driver on mailbox location.

Note: The fix uses routines that are typically invoked later in the call
flow to reset the sli-3 device. The issue in using those routines is
that the normal (non-fix) flow does additional initialization, namely
the allocation of the pport structure. So, rather than significantly
reworking the initialization flow so that the pport is alloc'd first,
pointer checks are added to work around it. Checks are limited to the
routines invoked by a sli-3 adapter (s3 routines) as this fix/early call
is only invoked on a sli3 adapter. Nothing changes post the
fix. Subsequent initialization, and another adapter reset, still occur -
both on sli-3 and sli-4 adapters.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Fixes: 96418b5e2c ("scsi: lpfc: Fix eh_deadline setting for sli3 adapters.")
Cc: stable@vger.kernel.org # v4.11+
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-08 21:24:34 -04:00
James Smart
86c6737963 Update ABORT processing for NVMET.
The driver with nvme had this routine stubbed.

Right now XRI_ABORTED_CQE is not handled and the FC NVMET
Transport has a new API for the driver.

Missing code path, new NVME abort API
Update ABORT processing for NVMET

There are 3 new FC NVMET Transport API/ template routines for NVMET:

lpfc_nvmet_xmt_fcp_release
This NVMET template callback routine called to release context
associated with an IO This routine is ALWAYS called last, even
if the IO was aborted or completed in error.

lpfc_nvmet_xmt_fcp_abort
This NVMET template callback routine called to abort an exchange that
has an IO in progress

nvmet_fc_rcv_fcp_req
When the lpfc driver receives an ABTS, this NVME FC transport layer
callback routine is called. For this case there are 2 paths thru the
driver: the driver either has an outstanding exchange / context for the
XRI to be aborted or not.  If not, a BA_RJT is issued otherwise a BA_ACC

NVMET Driver abort paths:

There are 2 paths for aborting an IO. The first one is we receive an IO and
decide not to process it because of lack of resources. An unsolicated ABTS
is immediately sent back to the initiator as a response.
lpfc_nvmet_unsol_fcp_buffer
            lpfc_nvmet_unsol_issue_abort  (XMIT_SEQUENCE_WQE)

The second one is we sent the IO up to the NVMET transport layer to
process, and for some reason the NVME Transport layer decided to abort the
IO before it completes all its phases. For this case there are 2 paths
thru the driver:
the driver either has an outstanding TSEND/TRECEIVE/TRSP WQE or no
outstanding WQEs are present for the exchange / context.
lpfc_nvmet_xmt_fcp_abort
    if (LPFC_NVMET_IO_INP)
        lpfc_nvmet_sol_fcp_issue_abort  (ABORT_WQE)
                lpfc_nvmet_sol_fcp_abort_cmp
    else
        lpfc_nvmet_unsol_fcp_issue_abort
                lpfc_nvmet_unsol_issue_abort  (XMIT_SEQUENCE_WQE)
                        lpfc_nvmet_unsol_fcp_abort_cmp

Context flags:
LPFC_NVMET_IOP - his flag signifies an IO is in progress on the exchange.
LPFC_NVMET_XBUSY  - this flag indicates the IO completed but the firmware
is still busy with the corresponding exchange. The exchange should not be
reused until after a XRI_ABORTED_CQE is received for that exchange.
LPFC_NVMET_ABORT_OP - this flag signifies an ABORT_WQE was issued on the
exchange.
LPFC_NVMET_CTX_RLS  - this flag signifies a context free was requested,
but we are deferring it due to an XBUSY or ABORT in progress.

A ctxlock is added to the context structure that is used whenever these
flags are set/read  within the context of an IO.
The LPFC_NVMET_CTX_RLS flag is only set in the defer_relase routine when
the transport has resolved all IO associated with the buffer. The flag is
cleared when the CTX is associated with a new IO.

An exchange can has both an LPFC_NVMET_XBUSY and a LPFC_NVMET_ABORT_OP
condition active simultaneously. Both conditions must complete before the
exchange is freed.
When the abort callback (lpfc_nvmet_xmt_fcp_abort) is envoked:
If there is an outstanding IO, the driver will issue an ABORT_WQE. This
should result in 3 completions for the exchange:
1) IO cmpl with XB bit set
2) Abort WQE cmpl
3) XRI_ABORTED_CQE cmpl
For this scenerio, after completion #1, the NVMET Transport IO rsp
callback is called.  After completion #2, no action is taken with respect
to the exchange / context.  After completion #3, the exchange context is
free for re-use on another IO.

If there is no outstanding activity on the exchange, the driver will send a
ABTS to the Initiator. Upon completion of this WQE, the exchange / context
is freed for re-use on another IO.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-04-24 09:25:49 +02:00
James Smart
9d3d340d19 Fix crash after issuing lip reset
When RPI is not available, driver sends WQE with invalid RPI value and
rejected by HBA.
lpfc 0000:82:00.3: 1:3154 BLS ABORT RSP failed, data:  x3/xa0320008
and
lpfc :2753 PLOGI failure DID:FFFFFA Status:x3/xa0240008

In this case, driver accesses rpi_ids array out of bounds.

Fix:
Check return value of lpfc_sli4_alloc_rpi(). Do not allocate
lpfc_nodelist entry if RPI is not available.

When RPI is not available, we will get discovery timeouts and
command drops for some of the vports as seen below.

lpfc :0273 Unexpected discovery timeout, vport State x0
lpfc :0230 Unexpected timeout, hba link state x5
lpfc :0111 Dropping received ELS cmd Data: x0 xc90c55 x0

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-04-24 09:25:49 +02:00
James Smart
96418b5e2c scsi: lpfc: Fix eh_deadline setting for sli3 adapters.
A previous change unilaterally removed the hba reset entry point
from the sli3 host template. This was done to allow tape devices
being used for back up from being removed. Why was this done ?
When there was non-responding device on the fabric, the error
escalation policy would escalate to the reset handler. When the
reset handler was called, it would reset the adapter, dropping
link, thus logging out and terminating all i/o's - on any target.
If there was a tape device on the same adapter that wasn't in
error, it would kill the tape i/o's, effectively killing the
tape device state.  With the reset point removed, the adapter
reset avoided the fabric logout, allowing the other devices to
continue to operate unaffected. A hack - yes. Hint: we really
need a transport I_T nexus reset callback added to the eh process
(in between the SCSI target reset and hba reset points), so a
fc logout could occur to the one bad target only and stop the error
escalation process.

This patch commonizes the approach so it can be used for sli3 and sli4
adapters, but mandates the admin, via module parameter, specifically
identify which adapters the resets are to be removed for. Additionally,
bus_reset, which sends Target Reset TMFs to all targets, is also removed
from the template as it too has the same effect as the adapter reset.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by:   Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-06 23:04:22 -05:00
James Smart
d080abe0a8 scsi: lpfc: Update copyrights
Update copyrights to 2017 for all files touched in this patch set

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-22 18:41:44 -05:00
James Smart
d613b6a7aa scsi: lpfc: NVME Target: bind to nvmet_fc api
NVME Target: Tie in to NVME Fabrics nvmet_fc LLDD target api

Adds the routines to:
- register and deregister the FC port as a nvmet-fc targetport
- binding of nvme queues to adapter WQs
- receipt and passing of NVME LS's to transport, sending transport response
- receipt of NVME FCP CMD IUs, processing FCP target io data transmission
  commands; transmission of FCP io response
- Abort operations for tgt io exchanges

[mkp: fixed space at end of file warning]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-22 18:41:43 -05:00
James Smart
2d7dbc4c27 scsi: lpfc: NVME Target: Receive buffer updates
NVME Target: Receive buffer updates

Allocates buffer pools and configures adapter interfaces to handle
receive buffer (asynchronous FCP CMD ius, first burst data)
from the adapter. Splits by protocol, etc.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-22 18:41:43 -05:00