During attachment, the driver writes the EQ doorbell to disable potential
interrupts from an EQ. The current EQ doorbell format used for clearing the
interrupt is incorrect and uses an if_type=2 format, making the operation act
on the wrong EQ.
Correct the code to use the proper if_type=6 EQ doorbell format.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When taking the board offline while performing i/o, unsafe locking errors
occur and the irq level isn't properly managed.
In lpfc_sli_hba_down, spin_lock_irqsave(&phba->hbalock, flags) does not
disable softirqs raised from timer expiry. It is possible that a softirq is
raised from the lpfc_els_retry_delay routine and recursively requests the same
phba->hbalock spinlock causing deadlock.
Address the deadlocks by creating a new port_list lock. The softirq behavior
can then be managed a level deeper into the calling sequences.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When running an mds diagnostic that passes frames with the switch, soft
lockups are detected. The driver is in a CQE processing loop and has a
sufficient amount of traffic that it never exits the ring processing routine,
thus the "lockup".
Cap the number of elements in the work processing routine to 64 elements. This
ensures that the cpu will be given up and the handler rescheduled to process
additional items.
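For illustration only (this is not the driver code), the capped loop can be
sketched as below. The work_queue struct, its pending counter and the
MAX_ENTRIES_PER_PASS name are hypothetical; only the limit of 64 comes from
the description above.

```c
#include <stddef.h>

/* Hypothetical name; the value matches the 64-element cap described above. */
#define MAX_ENTRIES_PER_PASS 64

/* Toy queue: 'pending' counts entries still waiting to be handled. */
struct work_queue {
    size_t pending;
};

/*
 * Handle at most MAX_ENTRIES_PER_PASS entries, then return.
 * Returns the number of entries handled in this pass. The caller is
 * expected to reschedule the handler if q->pending is still non-zero.
 */
static size_t process_bounded(struct work_queue *q)
{
    size_t handled = 0;

    while (q->pending > 0 && handled < MAX_ENTRIES_PER_PASS) {
        q->pending--;   /* stand-in for processing one CQE */
        handled++;
    }
    return handled;
}
```

With 150 entries queued, three passes would be needed (64, 64, 22), and the
cpu can be yielded between passes.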
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On io completion, the driver is taking an adapter wide lock and nulling the
scsi command back pointer. The nulling of the back pointer is to signify the
io was completed and the scsi_done() routine was called. However, the routine
makes no check to see if the abort routine had done the same thing and
possibly nulled the pointer. Thus it may doubly-complete the io.
Make the following mods:
- Check to make sure forward progress (call scsi_done()) only happens if the
command pointer was non-null.
- As the taking of the lock, which is adapter wide, is very costly on a system
under load, null the pointer using an xchg operation rather than under lock.
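A minimal userspace sketch of the exchange-based approach, using C11 atomics
in place of the kernel xchg(). The io_buf and scsi_cmd_stub types and the
complete_once() helper are hypothetical names for illustration; the real
driver structures differ.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for a scsi command; done_calls counts completions. */
struct scsi_cmd_stub {
    int done_calls;
};

struct io_buf {
    _Atomic(struct scsi_cmd_stub *) cmd;   /* back pointer to the command */
};

/*
 * Atomically swap the back pointer with NULL. Whichever path (completion
 * or abort) receives the non-NULL pointer owns the completion; the other
 * path sees NULL and does nothing.
 * Returns 1 if this caller completed the command, 0 otherwise.
 */
static int complete_once(struct io_buf *buf)
{
    struct scsi_cmd_stub *cmd =
        atomic_exchange(&buf->cmd, (struct scsi_cmd_stub *)NULL);

    if (cmd == NULL)
        return 0;       /* already completed by the other path */

    cmd->done_calls++;  /* stand-in for calling scsi_done() */
    return 1;
}
```

Because the swap is a single atomic operation, no adapter wide lock is needed
to decide which path makes forward progress.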
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When nvme is enabled, change the default for two parameters:
- sg_seg_cnt: raise the per-io sg list size so that 1MB ios are
  supported (based on a 4k buffer per element).
- iocb_cnt: raise the number of buffers used for things like
  NVME LS request/responses to allow more concurrent requests
  for larger nvme configs.
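The sizing arithmetic behind the sg list change can be checked with a small
helper. sg_elements_needed() is a hypothetical name used only for this
illustration; the actual defaults are whatever the patch sets.

```c
#include <stddef.h>

/* Number of fixed-size sg elements needed to cover io_bytes, rounded up. */
static size_t sg_elements_needed(size_t io_bytes, size_t bytes_per_element)
{
    return (io_bytes + bytes_per_element - 1) / bytes_per_element;
}
```

For a 1MB io with a 4k buffer per element this gives
sg_elements_needed(1024 * 1024, 4096) == 256.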
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver allocates a sg list per io structure based on a fixed maximum
size. When it registers with the protocol transports and indicates the max sg
list size it supports, the driver manipulates the fixed value to report a
lesser amount so that it has reserved space for sg elements that are used for
DIF.
The driver initialization path sets the cfg_sg_seg_cnt field to the
manipulated value for scsi. NVME initialization ran afterward and capped its
maximum by the manipulated value for SCSI. This erroneously made NVME report
the SCSI reduced-for-DIF value, which reduced the max io size for nvme and
wasted sg elements.
Rework the driver so that cfg_sg_seg_cnt becomes the overall maximum size and
allow the max size to be tunable. A separate (new) scsi sg count is then
setup with the scsi-modified reduced value. NVME then initializes based off
the overall maximum.
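A rough sketch of the reworked relationship, under stated assumptions: the
sg_limits struct, its scsi and nvme fields and the dif_reserved parameter are
hypothetical names, and the number of elements reserved for DIF is not given
in this description.

```c
/* Hypothetical container for the limits discussed above. */
struct sg_limits {
    unsigned int total;   /* overall maximum (cfg_sg_seg_cnt in the text) */
    unsigned int scsi;    /* separate SCSI count, reduced for DIF */
    unsigned int nvme;    /* NVME count, based on the overall maximum */
};

/*
 * Derive the per-protocol limits from the overall maximum. SCSI gives up
 * 'dif_reserved' elements; NVME is no longer capped by the SCSI value.
 */
static struct sg_limits derive_sg_limits(unsigned int total,
                                         unsigned int dif_reserved)
{
    struct sg_limits l;

    l.total = total;
    l.scsi = (total > dif_reserved) ? total - dif_reserved : 0;
    l.nvme = total;
    return l;
}
```

The point of the split is that changing the SCSI reservation no longer
changes what NVME reports.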
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver only sends NVME PRLI to a device that also supports FCP. This results
in remote ports that don't have fc_remote_ports created for them. The driver
is clearing the nlp_fc4_type for a ndlp at the wrong time.
Fix by moving the nlp_fc4_type clearing to the discovery engine in the
DEVICE_RECOVERY state. Also ensure that rport registration is done for all
nlp_fc4_types.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fixes an issue where, when a switch command fails, the current code
increments the retry count twice. This results in fewer retries than intended.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current code relies on the switch to provide a unique combination of WWPN +
NPORTID to track an FC port. This patch tries to detect a case where the switch
database is corrupted such that multiple WWPNs have the same Nport ID.
The 1st Nport ID on the list will be kept while the duplicate Nport ID will be
discarded.
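The keep-first policy can be sketched as follows. This is an illustration
only: the scan_entry struct and drop_duplicate_ids() are hypothetical, and the
driver's scan list handling is more involved than an in-place array pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scan result: one WWPN and the Nport ID the switch reported. */
struct scan_entry {
    uint64_t wwpn;
    uint32_t nport_id;
};

/*
 * Compact 'e' in place so that each nport_id appears only once, keeping
 * the first entry that used it. Returns the new entry count.
 * Quadratic in n, which is acceptable for a sketch.
 */
static size_t drop_duplicate_ids(struct scan_entry *e, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        int dup = 0;

        for (size_t j = 0; j < kept; j++) {
            if (e[j].nport_id == e[i].nport_id) {
                dup = 1;    /* a different WWPN already claimed this ID */
                break;
            }
        }
        if (!dup)
            e[kept++] = e[i];
    }
    return kept;
}
```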
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When qla2xxx and Target Core get out of sync during command cleanup, qla2xxx
will not free the command until it is out of the firmware's hands and Target
Core has called release on the command.
This patch adds synchronization using cmd_lock and a release flag. If the
release flag is set, then qla2xxx will free up the command using
qlt_free_cmd(); otherwise transport_generic_free_cmd() will be responsible for
release of the command.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the chip is unable to fully initialize, use the full shutdown sequence to
clear out any stale FW state.
Fixes: e315cd28b9 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On PLOGI complete + RSCN received, the driver tries to handle the RSCN but
fails to reset the session back to the beginning to restart the login process.
Instead the session is left in the PLOGI complete state without moving forward.
This patch will push the session state back to the delete state and restart
the connection.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Task abort can take 2 paths: 1) serial/synchronous abort, where the calling
thread is put to sleep, waits for completion and frees the cmd resource. 2)
async abort, where the cmd is freed by the completion thread. For path 2, the
driver is freeing the SRB too early.
Fixes: f6145e86d2 ("scsi: qla2xxx: Fix race between switch cmd completion and timeout")
Cc: stable@vger.kernel.org # 4.19
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add ability to allow each physical port to control operating mode. Current
code forces all ports to behave in one mode (i.e. initiator, target or
dual). This patch allows the user to select the operating mode for each port.
- Driver must be loaded in dual mode to allow resource allocation
    modprobe qla2xxx qlini_mode=dual
- In addition, the user can adjust exchange resources using the following
  commands
    echo 1024 > /sys/class/scsi_host/host<x>/ql2xiniexchg
    echo 1024 > /sys/class/scsi_host/host<x>/ql2xexchoffld
- Trigger mode change and new setting of ql2xexchoffld|ql2xiniexchg
    echo [<value>] > /sys/class/scsi_host/host<x>/qlini_mode
  where value can be one of the following
    - enabled
    - disabled
    - dual
    - exclusive
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For Loop topology + Initiator, FW is in control of PLOGI/PRLI. When link is
reset, driver will try to cleanup the session by doing an Implicit Logout.
Instead, the code is doing an Explicit Logout. The explicit logout interferes
with FW state machine in trying to reconnect. The implicit logout was meant
for FW to flush commands. In loop, it is not needed because FW will auto
flush.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When FW rejects a command due to an "entry_status" error (malformed IOCB), the
srb resource needs to be returned for cleanup. The filter to catch this is
in the wrong location.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Clear port speed value on chip reset.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During the adapter shutdown process, check for register disconnect before
proceeding to call PCI functions.
Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An abort IOCB request can take up to 40s, i.e. 2 ABTS timeouts. We will wait
for the ABTS response for 20s. On a timeout, a second ABTS can go out with
another 20s timeout. On the 2nd ABTS timeout, FW will automatically do a Logout.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch allows FC-NVMe under-run to be handled by the transport.
Signed-off-by: Darren Trapp <darren.trapp@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current abort code defaults to legacy single queue where hardware_lock is used
to protect command search. This patch moves this code behind the QPair where
the qp_lock_ptr will reference the appropriate lock for either legacy/single
queue or MQ.
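The lock indirection can be pictured with the sketch below. It is not the
driver code: toy_lock, hw_data, the qpair layout shown here and qpair_init()
are simplified, hypothetical stand-ins, with only the qp_lock_ptr and
hardware_lock names taken from the description.

```c
/* Toy lock so the sketch builds in userspace; not a real spinlock. */
struct toy_lock {
    int held;
};

/* Stand-in for the adapter wide data that owns hardware_lock. */
struct hw_data {
    struct toy_lock hardware_lock;
};

struct qpair {
    struct toy_lock qp_lock;        /* per-queue lock, used for MQ */
    struct toy_lock *qp_lock_ptr;   /* the lock callers should take */
};

/*
 * Point qp_lock_ptr at the right lock once, at setup time. Code such as
 * the abort path then takes *qp->qp_lock_ptr without needing to know
 * whether it runs on the legacy single queue or on an MQ qpair.
 */
static void qpair_init(struct qpair *qp, struct hw_data *hw, int multiqueue)
{
    qp->qp_lock.held = 0;
    qp->qp_lock_ptr = multiqueue ? &qp->qp_lock : &hw->hardware_lock;
}
```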
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Using the GPNFT/GNNFT commands covers the switch database with fewer scans.
This patch removes the Get NportID with provided WWPN (GIDPN) switch command.
In a large fabric with lots of remote ports or NPIV ports and a noisy SAN, the
number of GIDPN commands issued by a port when it detects a large number of
remote ports going away or coming back can overwhelm the switch and make it
unresponsive. In a case where the fabric has not changed, GIDPN is not
required.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- Reduce sess_lock holding to prevent CPU Lock up. sess_lock was held across
fc_port registration and deletion. These calls can be blocked by upper
layer. Sess_lock is also being accessed by interrupt thread.
- Reduce number of loops in processing work_list to prevent kernel complaint
of CPU lockup or holding sess_lock.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, qla2x00_[get_sp|rel_sp] routines do {get|release} of srb
resource/srb_mempool directly from qla_hw_data. qla2x00_start_sp() is used to
issue management commands through the default Request Q 0 & Response Q 0 or
base_qpair. This patch moves access of these resources through
base_qpair. Instead of having knowledge of specific Q number and lock to
rsp/req queue, this change will key off the qpair that is assigned to the srb
resource. This lays the ground work for other routines to see this resource
through the qpair.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add sysfs support to control the zio6 interrupt threshold. Using this sysfs
hook, the user can set when to generate interrupts. This value will be used to
tell firmware to generate an interrupt at a certain interval. If the number of
exchanges/commands falls below the defined setting, then the interrupt will be
generated immediately by the firmware.
By default ZIO6 will coalesce interrupts to a specified interval
regardless of low traffic or high traffic.
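As a simplified model of the behavior described (the decision is made by the
firmware, not by driver code like this), the threshold check amounts to the
following. The irq_mode enum and pick_irq_mode() are hypothetical names.

```c
enum irq_mode {
    IRQ_IMMEDIATE,   /* interrupt generated right away */
    IRQ_COALESCED    /* interrupt held until the coalescing interval */
};

/*
 * Model of the threshold rule: below the threshold, interrupt
 * immediately; at or above it, coalesce to the configured interval.
 */
static enum irq_mode pick_irq_mode(unsigned int outstanding,
                                   unsigned int threshold)
{
    return (outstanding < threshold) ? IRQ_IMMEDIATE : IRQ_COALESCED;
}
```

In this model a threshold of 0 never selects the immediate path, which
resembles the default of coalescing regardless of traffic level.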
[mkp: fixed several typos]
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following changes are added by this patch:
- Prevent ABTS Response from getting in front of Termination of exchange.
Firmware requires driver to cleanup exchanges before ABTS response can be
sent. This reduces ABTS response error which triggers extra command
re-termination and re-sending of ABTS response.
- Add bits in the driver to track CTIO/ATIO attribute bits for proper command
Termination. A copy of the ATTR bits will be kept in the ABTS task
management command as a backup copy, in case an ABTS response encounters an
error.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ABTS error completion can trigger an exchange cleanup from the driver and
another ABTS response will be generated. This retry of the ABTS response can
cause a loop between the driver trying to send ABTS and firmware returning an
error. This patch fixes this issue by adding logic to check for unresolved
exchanges and clean up before ABTS is retried. This patch also adds a fix to
use the same qpair as the ABTS completion for the termination of the exchange
as well as the retry of the ABTS response.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the driver detects CTIO_INVALID_RX_ID status for a CTIO, print a message
with correct information to help with debugging.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move ATIO queue processing out of hardware_lock to prevent deadlock.
Fixes: 3bb67df5b5 ("qla2xxx: Check for online flag instead of active reset when transmitting responses")
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For driver MBX submission, use mbox_busy to serialize request. For Userspace
MBX submission, use optrom mutex to serialize request.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the driver receives PLOGI/PRLI from FW, the WWPN value will be provided.
If it is not, then the driver will terminate it. The WWPN allows the driver to
locate the session or create a new session.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For target mode, any chip reset triggered before target mode is enabled will
be held off until user is ready to enable. This prevents the chip from
starting or running before it is intended.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When switch responds with error for Get Port Speed Command (GPSC), driver
should not proceed with telling FW about the speed of the remote port.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When all fabric scan retries fail, remove all RPorts and DMA resources for the
command. Otherwise we are left with stale Rports.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Turn ON logout_on_delete flag to make sure firmware resource for fcport is
cleaned up on ADISC error.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Turn off the IOCB timeout timer on IOCB completion instead of turning it off in
a deferred task. This prevents a false alarm if the deferred task is stalled.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Decrement login retry count only for plogi instead of number of attempts made
for login.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, the rport registration is being called from a single work element
that is used to process the QLA internal "work_list". This work_list is meant
for quick and simple tasks (i.e. no sleep). The Rport registration process can
sometimes be delayed by the upper layer. This causes back pressure on the
internal queue, where other jobs are unable to move forward.
This patch will schedule the registration process with a new work element
(fc_port.reg_work). While the RPort is being registered, the current state of
the fcport will not move forward until the registration is done. If the state
of the fabric has changed, a new field/next_disc_state will record the next
action on whether to 'DELETE' or 'Reverify the session/ADISC'.
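The deferral can be sketched as a small state holder. The port struct, the
next_action enum and both helpers are hypothetical; only next_disc_state and
the DELETE / ADISC choices come from the description above.

```c
/* Hypothetical action codes for the deferred decision. */
enum next_action {
    NEXT_NONE,
    NEXT_DELETE,
    NEXT_ADISC
};

struct port {
    int registering;                    /* registration work in flight */
    enum next_action next_disc_state;   /* action recorded while busy */
};

/*
 * A fabric change arrives. If registration is still running, record the
 * wanted action and return NEXT_NONE; otherwise return the action so the
 * caller can run it right away.
 */
static enum next_action on_fabric_change(struct port *p,
                                         enum next_action wanted)
{
    if (p->registering) {
        p->next_disc_state = wanted;
        return NEXT_NONE;
    }
    return wanted;
}

/* Registration finished: hand back the recorded action and clear it. */
static enum next_action on_registration_done(struct port *p)
{
    enum next_action a = p->next_disc_state;

    p->registering = 0;
    p->next_disc_state = NEXT_NONE;
    return a;
}
```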
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the redundant check for whether the fcport is deleted or being deleted.
The same check is already in the deletion routine.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rename rscn_rcvd field to scan_needed to be more meaningful.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On Abort of an initiator scsi command, the abort needs to follow the same qpair
as the scsi command to prevent out of order processing.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch improves performance for 16G and above adapters by removing an
additional call to process_response_queue().
[mkp: typo]
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>