The initial lpfc_desc_set_adisc implementation in commit
dea3101e0a ("lpfc: add Emulex FC driver version 8.0.28") enabled ADISC if
cfg_use_adisc && RSCN_MODE && FCP_2_DEVICE
In commit 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of
SLI-3") this changed to
(cfg_use_adisc && RSCN_MODE) || FCP_2_DEVICE
and later in commit ffc954936b ("[SCSI] lpfc 8.3.13: FC Discovery Fixes
and enhancements.") to
(cfg_use_adisc && RSCN_MODE) || (FCP_2_DEVICE && FCP_TARGET)
A customer reports that after a devloss, an ADISC failure is logged. It
turns out the ADISC flag is set even if the user explicitly set
lpfc_use_adisc = 0.
[Sat Dec 22 22:55:58 2018] lpfc 0000:82:00.0: 2:(0):0203 Devloss timeout on WWPN 50:01:43:80:12:8e:40:20 NPort x05df00 Data: x82000000 x8 xa
[Sat Dec 22 23:08:20 2018] lpfc 0000:82:00.0: 2:(0):2755 ADISC failure DID:05DF00 Status:x9/x70000
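The three conditions quoted above can be compared side by side in a small
user-space sketch. The struct and function names are hypothetical stand-ins
for the driver state named in this message, not the driver's own code:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state named in the commit message. */
struct adisc_inputs {
	bool cfg_use_adisc;	/* lpfc_use_adisc module parameter */
	bool rscn_mode;		/* port is in RSCN mode */
	bool fcp_2_device;	/* remote node is an FCP-2 device */
	bool fcp_target;	/* remote node is an FCP target */
};

/* dea3101e0a: all three conditions must hold. */
static bool adisc_v1(const struct adisc_inputs *in)
{
	return in->cfg_use_adisc && in->rscn_mode && in->fcp_2_device;
}

/* 92d7f7b0cd: an FCP-2 device enables ADISC regardless of the setting. */
static bool adisc_v2(const struct adisc_inputs *in)
{
	return (in->cfg_use_adisc && in->rscn_mode) || in->fcp_2_device;
}

/* ffc954936b: narrowed to FCP-2 targets, still ignoring the setting. */
static bool adisc_v3(const struct adisc_inputs *in)
{
	return (in->cfg_use_adisc && in->rscn_mode) ||
	       (in->fcp_2_device && in->fcp_target);
}
```

With cfg_use_adisc = false and an FCP-2 target, only adisc_v1() returns
false, which matches the customer report.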
[mkp: fixed Hannes' email]
Fixes: 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3")
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: James Smart <james.smart@broadcom.com>
Link: https://lore.kernel.org/r/20191022072112.132268-1-dwagner@suse.de
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After IOP reset completion, no AIF request command is issued to the
controller. Make the driver schedule a worker thread that issues an AIF
request command after IOP reset completion.
[mkp: fix zeroday warning]
Link: https://lore.kernel.org/r/1571120524-6037-7-git-send-email-balsundar.p@microsemi.com
Acked-by: Balsundar P <Balsundar.P@microchip.com>
Signed-off-by: Balsundar P <balsundar.p@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the driver waits for the firmware to respond to a command IOCTL.
If the firmware enters a nonresponsive state, the driver does not respond
until the firmware is responsive again.
Check that the firmware is alive, otherwise return -EBUSY.
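A minimal sketch of the fail-fast pattern, assuming a hypothetical
fake_adapter structure with a liveness flag (the real driver determines
firmware health differently):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical adapter state, for illustration only. */
struct fake_adapter {
	bool fw_alive;
};

/* Returns 0 on success or -EBUSY if the firmware is not responding. */
static int fake_send_ioctl(const struct fake_adapter *dev)
{
	if (!dev->fw_alive)
		return -EBUSY;	/* fail fast instead of blocking the caller */

	/* ... issue the command and wait for the firmware's reply ... */
	return 0;
}
```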
[mkp: clarified commit desc]
Link: https://lore.kernel.org/r/1571120524-6037-6-git-send-email-balsundar.p@microsemi.com
Signed-off-by: Balsundar P <balsundar.p@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
INTX mode is selected before the IOP reset is issued. This triggers an
MSGU lockup and ends in a basecode assert. Use the DROP_IO command when the
IOP reset is sent in preparation for an interrupt mode switch.
Link: https://lore.kernel.org/r/1571120524-6037-4-git-send-email-balsundar.p@microsemi.com
Signed-off-by: Balsundar P <balsundar.p@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver detects that the FastResponse bit is set and saves it in the
Fib's flags so that the IO response status is not checked, but it never
clears the bit for the next IO. Hence the next IO picks up the
FastResponse bit, skips the IO response status check, and fails to report
any type of IO error to the kernel.
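One way to keep such a flag from leaking into the next IO is to clear it
when it is consumed. The sketch below uses a hypothetical fake_fib
structure and flag value, and is not taken from the driver:

```c
#include <stdbool.h>

#define FAKE_FIB_FAST_RESPONSE 0x1u	/* hypothetical flag bit */

struct fake_fib {
	unsigned int flags;
};

/* Completion path: remember that the controller signalled a fast response. */
static void fake_fib_complete(struct fake_fib *fib, bool fast_response)
{
	if (fast_response)
		fib->flags |= FAKE_FIB_FAST_RESPONSE;
}

/* Returns true if the caller must inspect the IO response status. */
static bool fake_fib_must_check_status(struct fake_fib *fib)
{
	bool fast = (fib->flags & FAKE_FIB_FAST_RESPONSE) != 0;

	/* Clear the bit so a reused fib starts from a clean state. */
	fib->flags &= ~FAKE_FIB_FAST_RESPONSE;
	return !fast;
}
```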
Link: https://lore.kernel.org/r/1571120524-6037-3-git-send-email-balsundar.p@microsemi.com
Signed-off-by: Balsundar P <balsundar.p@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The scsi async probe process calls blk_pm_runtime_init for each lun, and
those request queues are then monitored by the block layer pm engine
(blk-pm.c). This is, however, not the case for scsi-passthrough queues
created by bsg_setup_queue().
So the ufs-bsg driver might send various commands, disregarding the pm
status of the device. This is wrong, regardless of whether its request
queue is pm-aware or not.
Fixes: df032bf27a (scsi: ufs: Add a bsg endpoint that supports UPIUs)
Link: https://lore.kernel.org/r/1570696267-8487-1-git-send-email-avri.altman@wdc.com
Reported-by: Yuliy Izrailov <yuliy.izrailov@wdc.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The queue pointer might not be valid. The rest of the code checks the
pointer before accessing it. lpfc_sli4_process_missed_mbox_completions is
the only place where the check is missing.
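The missing check follows the usual pattern of skipping unpopulated slots
before dereferencing them. A minimal sketch with hypothetical names
(fake_queue, fake_drain_all):

```c
#include <stddef.h>

struct fake_queue {
	int pending;
};

/* Hypothetical per-queue work: returns the number of entries handled. */
static int fake_drain(struct fake_queue *q)
{
	int n = q->pending;

	q->pending = 0;
	return n;
}

/* Walk an array of queue pointers, tolerating NULL slots. */
static int fake_drain_all(struct fake_queue **queues, size_t count)
{
	int total = 0;

	for (size_t i = 0; i < count; i++) {
		if (!queues[i])
			continue;	/* queue was never set up */
		total += fake_drain(queues[i]);
	}
	return total;
}
```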
Fixes: 657add4e5e ("scsi: lpfc: Fix poor use of hardware queues if fewer irq vectors")
Cc: James Smart <jsmart2021@gmail.com>
Link: https://lore.kernel.org/r/20191018162111.8798-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
DRIVER_ERROR is a driver byte setting, not a host byte. The qla2xxx
driver should rather return DID_ERROR here to be in line with the other
drivers.
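The sketch below shows why the mix-up matters. The constants and accessors
are local copies modelled on the kernel's SCSI result layout of that era
(host byte in bits 16-23, driver byte in bits 24-31). It assumes the value
was being shifted into the host-byte position:

```c
/* Local copies of the relevant values, for illustration only. */
#define DID_BAD_TARGET	0x04	/* host byte */
#define DID_ERROR	0x07	/* host byte */
#define DRIVER_ERROR	0x04	/* driver byte */

static unsigned int fake_host_byte(unsigned int result)
{
	return (result >> 16) & 0xff;
}

static unsigned int fake_driver_byte(unsigned int result)
{
	return (result >> 24) & 0xff;
}
```

Under that assumption, DRIVER_ERROR << 16 is decoded as host byte 0x04,
which reads as DID_BAD_TARGET rather than a generic error, and the driver
byte stays zero.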
Link: https://lore.kernel.org/r/20191018140458.108278-1-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi
Pull scsi fixes from Martin Petersen:
"These two commits were in a separate postmerge branch due to a
dependency on changes merged for 5.4 in the block tree.
They fix two issues in the intersection of the request cleanup changes
from block (b7e9e1fb7a) and the request batching changes
(8930a6c207) that were made to SCSI during the 5.4 cycle"
* tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: core: fix dh and multipathing for SCSI hosts without request batching
scsi: core: fix missing .cleanup_rq for SCSI hosts without request batching
As noted in commit f2c2cbcc35 ("powerpc: Use pr_warn instead of
pr_warning"), pr_warning is being removed so that all logging messages use
a consistent <prefix>_warn style. Do the same here.
Link: http://lkml.kernel.org/r/20191018031850.48498-21-wangkefeng.wang@huawei.com
To: linux-kernel@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
The BUILD_NVME define never got defined anywhere, causing NVMe commands to
be treated as SCSI commands when freeing the buffers. This was causing a
stuck discovery and a horrible crash in lpfc_set_rrq_active() later on.
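The failure mode is easy to reproduce in isolation: code guarded by a macro
that nothing defines is silently compiled out. In this sketch the function
names are hypothetical, and fake_classify_fixed() shows only one possible
remedy (a run-time check), not necessarily the one applied here:

```c
/* BUILD_NVME is deliberately left undefined, mirroring the bug. */

enum fake_buf_kind { FAKE_BUF_SCSI, FAKE_BUF_NVME };

static enum fake_buf_kind fake_classify_broken(int is_nvme_cmd)
{
#ifdef BUILD_NVME
	if (is_nvme_cmd)
		return FAKE_BUF_NVME;
#else
	(void)is_nvme_cmd;	/* the NVMe branch above is never compiled */
#endif
	return FAKE_BUF_SCSI;
}

static enum fake_buf_kind fake_classify_fixed(int is_nvme_cmd)
{
	return is_nvme_cmd ? FAKE_BUF_NVME : FAKE_BUF_SCSI;
}
```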
Link: https://lore.kernel.org/r/20191017150019.75769-1-hare@suse.de
Fixes: c00f62e6c5 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We have a test case like block/001 in blktests, which creates a scsi
device by loading the scsi_debug module and then tries to delete the device
through the sysfs interface. At the same time, it may remove the scsi_debug
module. This results in an invalid paging request BUG like the following:
[ 34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8
[ 34.629189] Oops: 0000 [#1] SMP PTI
[ 34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G W 5.4.0-rc3+ #473
[ 34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0
[ 34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0
[ 34.644545] Call Trace:
[ 34.644907] scsi_host_dev_release+0x6b/0x1f0
[ 34.645511] device_release+0x74/0x110
[ 34.646046] kobject_put+0x116/0x390
[ 34.646559] put_device+0x17/0x30
[ 34.647041] scsi_target_dev_release+0x2b/0x40
[ 34.647652] device_release+0x74/0x110
[ 34.648186] kobject_put+0x116/0x390
[ 34.648691] put_device+0x17/0x30
[ 34.649157] scsi_device_dev_release_usercontext+0x2e8/0x360
[ 34.649953] execute_in_process_context+0x29/0x80
[ 34.650603] scsi_device_dev_release+0x20/0x30
[ 34.651221] device_release+0x74/0x110
[ 34.651732] kobject_put+0x116/0x390
[ 34.652230] sysfs_unbreak_active_protection+0x3f/0x50
[ 34.652935] sdev_store_delete.cold.4+0x71/0x8f
[ 34.653579] dev_attr_store+0x1b/0x40
[ 34.654103] sysfs_kf_write+0x3d/0x60
[ 34.654603] kernfs_fop_write+0x174/0x250
[ 34.655165] __vfs_write+0x1f/0x60
[ 34.655639] vfs_write+0xc7/0x280
[ 34.656117] ksys_write+0x6d/0x140
[ 34.656591] __x64_sys_write+0x1e/0x30
[ 34.657114] do_syscall_64+0xb1/0x400
[ 34.657627] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 34.658335] RIP: 0033:0x7f156f337130
While the scsi target is being deleted, the scsi_debug module has already
been removed. sdebug_driver_template, which belongs to the module, can then
no longer be accessed, resulting in the scsi_proc_hostdir_rm() BUG.
To fix the bug, add scsi_device_get() in sdev_store_delete() to try to
increase the refcount of the module, preventing the module from being
removed.
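The idea of pinning the owning module for the duration of the delete can be
modelled in user space. Everything below (fake_module and its helpers) is a
hypothetical model of module reference counting, not the kernel API:

```c
#include <errno.h>
#include <stdbool.h>

struct fake_module {
	int refcnt;
	bool unloading;
};

/* Pin the module; fails once unloading has begun. */
static bool fake_module_get(struct fake_module *m)
{
	if (m->unloading)
		return false;
	m->refcnt++;
	return true;
}

static void fake_module_put(struct fake_module *m)
{
	m->refcnt--;
}

/* Unloading is only permitted while nobody holds a reference. */
static bool fake_module_try_unload(struct fake_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->unloading = true;
	return true;
}

/* Delete handler: hold a reference across the whole teardown. */
static int fake_store_delete(struct fake_module *owner)
{
	if (!fake_module_get(owner))
		return -ENODEV;
	/* ... device teardown runs here; the owner's data stays valid ... */
	fake_module_put(owner);
	return 0;
}
```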
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the reset handler returning before the outstanding commands for the
device have completed.
Link: https://lore.kernel.org/r/157107623870.17997.11208813089704833029.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS
(qla2xxx) and two in the core. The last two are mostly about removing
incorrect messages from the kernel log: the resid message is
definitely wrong and the sync cache on protected drive problem is
arguably wrong.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS
(qla2xxx) and two in the core.
The last two are mostly about removing incorrect messages from the
kernel log: the resid message is definitely wrong and the sync cache
on protected drive problem is arguably wrong"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: MAINTAINERS: Update qla2xxx driver
scsi: zfcp: fix reaction on bit error threshold notification
scsi: core: save/restore command resid for error handling
scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry()
scsi: sd: Ignore a failure to sync cache due to lack of authorization
Code that iterates over all standard PCI BARs typically uses
PCI_STD_RESOURCE_END. However, that requires the unusual test
"i <= PCI_STD_RESOURCE_END" rather than the more typical
"i < PCI_STD_NUM_BARS".
Add a definition for PCI_STD_NUM_BARS and change loops to use the more
idiomatic C style to help avoid fencepost errors.
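Both loop styles visit the same six BARs. The sketch below uses local
copies of the constants (0, 5 and 6) so that it builds outside the kernel,
and the function names are illustrative only:

```c
/* Local copies of the PCI constants, for illustration only. */
#define PCI_STD_RESOURCES	0
#define PCI_STD_RESOURCE_END	5
#define PCI_STD_NUM_BARS	6

/* Old style: inclusive upper bound, easy to get wrong by one. */
static int fake_count_bars_inclusive(void)
{
	int n = 0;

	for (int i = PCI_STD_RESOURCES; i <= PCI_STD_RESOURCE_END; i++)
		n++;
	return n;
}

/* New style: the usual half-open range. */
static int fake_count_bars_idiomatic(void)
{
	int n = 0;

	for (int i = 0; i < PCI_STD_NUM_BARS; i++)
		n++;
	return n;
}
```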
Link: https://lore.kernel.org/r/20190927234026.23342-1-efremov@linux.com
Link: https://lore.kernel.org/r/20190927234308.23935-1-efremov@linux.com
Link: https://lore.kernel.org/r/20190916204158.6889-3-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # arch/s390/
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> # video/fbdev/
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> # pci/controller/dwc/
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> # scsi/pm8001/
Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi/pm8001/
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # memstick/
Clearing ch->device in ch_release() is wrong because that pointer must
remain valid until ch_remove() is called. This patch fixes the following
crash the second time a ch device is opened:
BUG: kernel NULL pointer dereference, address: 0000000000000790
RIP: 0010:scsi_device_get+0x5/0x60
Call Trace:
ch_open+0x4c/0xa0 [ch]
chrdev_open+0xa2/0x1c0
do_dentry_open+0x13a/0x380
path_openat+0x591/0x1470
do_filp_open+0x91/0x100
do_sys_open+0x184/0x220
do_syscall_64+0x5f/0x1a0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 085e56766f ("scsi: ch: add refcounting")
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191009173536.247889-1-bvanassche@acm.org
Reported-by: Rob Turk <robtu@rtist.nl>
Suggested-by: Rob Turk <robtu@rtist.nl>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When building a kernel with SCSI_SNI_53C710 enabled, Kconfig warns:
WARNING: unmet direct dependencies detected for 53C700_LE_ON_BE
Depends on [n]: SCSI_LOWLEVEL [=y] && SCSI [=y] && SCSI_LASI700 [=n]
Selected by [y]:
- SCSI_SNI_53C710 [=y] && SCSI_LOWLEVEL [=y] && SNI_RM [=y] && SCSI [=y]
Add the missing depends SCSI_SNI_53C710 to 53C700_LE_ON_BE to fix it.
Link: https://lore.kernel.org/r/20191009151128.32411-1-tbogendoerfer@suse.de
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/megaraid/megaraid_sas_fp.c: In function MR_GetSpanBlock:
drivers/scsi/megaraid/megaraid_sas_fp.c:400:16: warning: variable debugBlk set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fp.c: In function mr_spanset_get_phy_params:
drivers/scsi/megaraid/megaraid_sas_fp.c:713:25: warning: variable fusion set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fp.c: In function MR_GetPhyParams:
drivers/scsi/megaraid/megaraid_sas_fp.c:815:25: warning: variable fusion set but not used [-Wunused-but-set-variable]
'debugBlk' was introduced by commit 9c915a8c99 ("[SCSI] megaraid_sas:
Add 9565/9285 specific code") but never used, so remove it.
'fusion' has not been used since commit c365178f31 ("scsi: megaraid_sas:
use adapter_type for all gen controllers").
Link: https://lore.kernel.org/r/1570605824-89133-1-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some arrays are not capable of returning RTPG data during state
transitioning, but rather return a 'LUN not accessible, asymmetric access
state transition' sense code. In these cases we can set the state to
'transitioning' directly and don't need to evaluate the RTPG data (which we
won't have anyway).
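A minimal sketch of recognising that sense code, assuming the SPC encoding
NOT READY (sense key 0x02) with ASC 0x04 / ASCQ 0x0a. The struct and
function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_SENSE_NOT_READY 0x02

struct fake_sense_hdr {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* True if the target reports an asymmetric access state transition. */
static bool fake_is_alua_transitioning(const struct fake_sense_hdr *s)
{
	return s->sense_key == FAKE_SENSE_NOT_READY &&
	       s->asc == 0x04 && s->ascq == 0x0a;
}
```

A caller would set the port group state to 'transitioning' when this
returns true and skip parsing the RTPG buffer.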
Link: https://lore.kernel.org/r/20191007135701.32389-1-hare@suse.de
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
alloc_workqueue is not checked for errors and as a result a potential
NULL dereference could occur.
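The pattern is the usual one of checking an allocation before use. The
sketch below is a user-space model with hypothetical names. The allocator
is passed in as a function pointer only so that the failure path can be
exercised:

```c
#include <errno.h>
#include <stdlib.h>

struct fake_wq {
	int unused;
};

struct fake_host {
	struct fake_wq *wq;
};

typedef struct fake_wq *(*fake_wq_alloc_fn)(void);

static struct fake_wq *fake_wq_alloc_ok(void)
{
	return calloc(1, sizeof(struct fake_wq));
}

static struct fake_wq *fake_wq_alloc_fail(void)
{
	return NULL;	/* simulate allocation failure */
}

/* Returns 0 on success or -ENOMEM if the workqueue could not be created. */
static int fake_host_init(struct fake_host *h, fake_wq_alloc_fn alloc)
{
	h->wq = alloc();
	if (!h->wq)
		return -ENOMEM;
	return 0;
}
```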
Link: https://lore.kernel.org/r/1568824618-4366-1-git-send-email-allen.pais@oracle.com
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, the MSI-X vector name that appears in /proc/interrupts is
"megasas", which is the same for all vectors. This patch provides a unique
name for all megaraid_sas controllers and their associated MSI-X
interrupts.
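A sketch of building one name per controller and vector. The format string
and helper name are assumptions chosen for illustration, and the driver's
actual naming scheme may differ:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Compose a name from the host number and the vector index.
 * Returns the length snprintf() reports (excluding the terminator).
 */
static int fake_format_vector_name(char *buf, size_t len,
				   unsigned int host_no, unsigned int vec)
{
	return snprintf(buf, len, "megasas%u-msix%u", host_no, vec);
}
```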
Link: https://lore.kernel.org/r/20191007051828.12294-1-chandrakanth.patil@broadcom.com
Suggested-by: Konstantin Shalygin <k0ste@k0ste.ru>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Formatting changes, no functional changes.
Link: https://lore.kernel.org/r/157048753005.11757.2228541207280057256.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Removed some unused manifest constants.
Link: https://lore.kernel.org/r/157048752420.11757.3464951542864727227.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Obtain the unique IDs from the RLL and RPL instead of VPD page 83h.
Link: https://lore.kernel.org/r/157048751833.11757.11996314786914610803.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/157048751247.11757.1727592925624138646.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/157048750649.11757.7811056360633694725.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add support for a timeout on LUN resets.
Link: https://lore.kernel.org/r/157048750055.11757.9689400788261610618.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add timeout field in RAID IU.
Link: https://lore.kernel.org/r/157048749461.11757.10013040278241807855.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: koshyaji <ajish.koshy@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use sas_phy_delete rather than sas_phy_free which, according to
comments, should not be called for PHYs that have been set up
successfully.
Link: https://lore.kernel.org/r/157048748876.11757.17773443136670011786.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/157048748297.11757.3872221216800537383.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This line is indented too far so it's a bit confusing.
Link: https://lore.kernel.org/r/20191004100615.GA823@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix sparse warnings:
drivers/scsi/lpfc/lpfc_nportdisc.c:290:1: warning: symbol 'lpfc_defer_pt2pt_acc' was not declared. Should it be static?
Link: https://lore.kernel.org/r/1570183477-137273-1-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On an MPI heartbeat stop Async Event, capture an MPI FW dump and reset the
chip. The FW indicates which function to capture the FW dump for.
Link: https://lore.kernel.org/r/20190912180918.6436-13-hmadhani@marvell.com
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a mailbox timeout check for ISP 27xx/28xx during the FW dump procedure.
Without the timeout check, the hardware lock can be held for a long period.
This patch shortens the dump procedure if a timeout condition is
encountered.
Link: https://lore.kernel.org/r/20190912180918.6436-12-hmadhani@marvell.com
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During driver unload, the remove flag will be set for all
scsi_qla_host/NPIV. This allows each NPIV to see the flag instead of
reaching for base_vha to search for it.
Link: https://lore.kernel.org/r/20190912180918.6436-11-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add error handling logic to ELS Passthrough relating to NVME devices.
The current code does not parse the error code to take the proper recovery
action. Instead, it logs in again with the same login parameters that
encountered the error, e.g. an nport handle collision.
Link: https://lore.kernel.org/r/20190912180918.6436-10-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some storage arrays advertise FCP LUNs and NVMe namespaces behind the same
WWN. The driver now offers a user option by way of NVRAM parameter to
allow users to choose, on a per port basis, the kind of FC-4 type they
would like to prioritize for login.
Link: https://lore.kernel.org/r/20190912180918.6436-9-hmadhani@marvell.com
Signed-off-by: Michael Hernandez <mhernandez@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Twelve patches mostly small but obvious fixes or cosmetic but small
updates.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Twelve patches mostly small but obvious fixes or cosmetic but small
updates"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix Nport ID display value
scsi: qla2xxx: Fix N2N link up fail
scsi: qla2xxx: Fix N2N link reset
scsi: qla2xxx: Optimize NPIV tear down process
scsi: qla2xxx: Fix stale mem access on driver unload
scsi: qla2xxx: Fix unbound sleep in fcport delete path.
scsi: qla2xxx: Silence fwdump template message
scsi: hisi_sas: Make three functions static
scsi: megaraid: disable device when probe failed after enabled device
scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue
scsi: qedf: Remove always false 'tmp_prio < 0' statement
scsi: ufs: skip shutdown if hba is not powered
scsi: bnx2fc: Handle scope bits when array returns BUSY or TSF
When a non-passthrough command is terminated with CHECK CONDITION, request
sense is executed by hijacking the command descriptor. Since
scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() do not save/restore the
original command resid, the value returned on failure of the original
command is lost and replaced with the value set by the execution of the
request sense command. This value may in many instances be unaligned to the
device sector size, causing sd_done() to print a warning message about the
incorrect unaligned resid before the command is retried.
Fix this problem by saving the original command residual in struct
scsi_eh_save using scsi_eh_prep_cmnd() and restoring it in
scsi_eh_restore_cmnd(). In addition, to make sure that the request sense
command is executed with a correctly initialized command structure, also
reset the residual to 0 in scsi_eh_prep_cmnd() after saving the original
command value in struct scsi_eh_save.
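The save, reset and restore sequence described above can be sketched with
simplified stand-ins for the command and the save area. These are
hypothetical structs, not the kernel's scsi_cmnd or scsi_eh_save:

```c
struct fake_cmd {
	unsigned int resid;	/* residual byte count */
};

struct fake_eh_save {
	unsigned int resid;
};

/* Before request sense: stash the original residual and start from 0. */
static void fake_eh_prep(struct fake_cmd *cmd, struct fake_eh_save *ses)
{
	ses->resid = cmd->resid;
	cmd->resid = 0;
}

/* After request sense: put the original command's residual back. */
static void fake_eh_restore(struct fake_cmd *cmd,
			    const struct fake_eh_save *ses)
{
	cmd->resid = ses->resid;
}
```

Whatever residual the request sense leaves behind is discarded on restore,
so the upper layer only ever sees the value from the original command.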
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191001074839.1994-1-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix sparse warning:
drivers/scsi/bfa/bfad.c:1491:1: warning:
symbol 'restart_bfa' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20190930094327.46836-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 88263208dd ("scsi: qla2xxx: Complain if sp->done() is not called
from the completion path") introduced the WARN_ON_ONCE in
qla2x00_status_cont_entry(). The assumption was that there is only one
status continuations element. According to the firmware documentation it is
possible that multiple status continuations are emitted by the firmware.
Fixes: 88263208dd ("scsi: qla2xxx: Complain if sp->done() is not called from the completion path")
Link: https://lore.kernel.org/r/20190927073031.62296-1-dwagner@suse.de
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
I've got a report about a UAS drive enclosure reporting back Sense: Logical
unit access not authorized if the drive it holds is password protected.
While the drive is obviously unusable in that state as a mass storage
device, it still exists as a sd device and when the system is asked to
perform a suspend of the drive, it will be sent a SYNCHRONIZE CACHE. If
that fails due to password protection, the error must be ignored.
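A sketch of such a filter, assuming the sense data carries ASC 0x74 / ASCQ
0x71 for "logical unit access not authorized". The names are hypothetical
and the surrounding error handling is omitted:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct fake_sense {
	bool valid;
	uint8_t asc;
	uint8_t ascq;
};

/*
 * Map a failed SYNCHRONIZE CACHE to a return code: 0 if the failure is
 * harmless for suspend, -EIO otherwise.
 */
static int fake_sync_cache_result(const struct fake_sense *s)
{
	if (s->valid && s->asc == 0x74 && s->ascq == 0x71)
		return 0;	/* drive is locked; nothing could be cached */
	return -EIO;
}
```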
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190903101840.16483-1-oneukum@suse.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since commit fc8d0590d9 ("libcxgbi: Add ipv6 api to driver") was
introduced, no call to csk_print_port() or csk_print_ip() is made.
Hence a kernel build with clang emits the following warnings:
drivers/scsi/cxgbi/libcxgbi.c:2287:19: warning: unused function 'csk_print_port' [-Wunused-function]
static inline int csk_print_port(struct cxgbi_sock *csk, char *buf)
^
drivers/scsi/cxgbi/libcxgbi.c:2298:19: warning: unused function 'csk_print_ip' [-Wunused-function]
static inline int csk_print_ip(struct cxgbi_sock *csk, char *buf)
^
Remove csk_print_port() and csk_print_ip() to stop warning.
Link: https://lore.kernel.org/r/20190924093716.GA78230@LGEARND20B15
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add sysfs attributes for the ATA information page and Supported VPD Pages
page.
Link: https://lore.kernel.org/r/20190926162216.56591-1-ryanattard@ryanattard.info
Signed-off-by: Ryan Attard <ryanattard@ryanattard.info>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are some statements that are indented too deeply, remove the
extraneous tabs and rejoin split lines.
Link: https://lore.kernel.org/r/20190927095840.26377-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A couple of users have requested that the SCSI command age be printed along
with command failure errors. This is a small change, but it gives users
more information about the command that failed, which helps them debug
command failures.
Link: https://lore.kernel.org/r/20190926052501.GA8352@machine1
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add qedf_get_host_port_id() to the transport template.
The fc_transport_template initializes the port_id member to the default
value of -1. The new getter ensures that the sysfs entry shows the current
value and not the default one, e.g. by using 'lsscsi -H -t'.
Link: https://lore.kernel.org/r/20190924072906.23737-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rework from previous work by:
Sujit Reddy Thumma <sthumma@codeaurora.org>
Override auto suspend tunables for UFS device LUNs during initialization so
as to efficiently manage background operations and the power consumption.
Link: https://lore.kernel.org/r/1568649411-5127-3-git-send-email-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rework from previous work by:
Sujit Reddy Thumma <sthumma@codeaurora.org>
Until now the scsi mid-layer forbids runtime suspend until userspace enables
it. This is mainly to quarantine disks that have broken runtime power
management or high latencies executing suspend/resume callbacks. If
userspace doesn't enable runtime suspend, the underlying hardware is
always on, even when it is not doing any useful work, and thus wastes
power.
Some low-level drivers for the controllers can efficiently use runtime
power management to reduce power consumption and improve battery life.
Allow runtime suspend parameters override within the LLD itself instead of
waiting for userspace to control the power management.
Link: https://lore.kernel.org/r/1568649411-5127-2-git-send-email-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Simplify this function implementation by using a known function.
Generated by: scripts/coccinelle/api/ptr_ret.cocci
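The ptr_ret.cocci script targets the "if IS_ERR then return PTR_ERR, else
return 0" idiom, so the known function is presumably PTR_ERR_OR_ZERO(). The
sketch below re-creates the helpers in user space under lowercase
hypothetical names to show that the two forms are equivalent:

```c
#include <errno.h>
#include <stdint.h>

#define FAKE_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *fake_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Error pointers occupy the top FAKE_MAX_ERRNO addresses. */
static int fake_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-FAKE_MAX_ERRNO;
}

static long fake_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Open-coded form that the script looks for. */
static int fake_result_verbose(const void *res)
{
	if (fake_is_err(res))
		return (int)fake_ptr_err(res);
	return 0;
}

/* Single-helper form that replaces it. */
static int fake_ptr_err_or_zero(const void *ptr)
{
	return fake_is_err(ptr) ? (int)fake_ptr_err(ptr) : 0;
}
```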
[mkp: applied by hand]
Link: https://lore.kernel.org/r/9e667f19-434e-ed30-78cb-9ddc6323c51e@web.de
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don't populate the array setup_attrs on the stack but instead make it
static const. Makes the object code smaller by 181 bytes.
Before:
   text    data     bss     dec     hex filename
   2140     224       0    2364     93c drivers/scsi/ufs/ufshcd-dwc.o
After:
   text    data     bss     dec     hex filename
   1863     320       0    2183     887 drivers/scsi/ufs/ufshcd-dwc.o
(gcc version 9.2.1, amd64)
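A small sketch of the pattern, under the assumption that the table is a
read-only list of attribute/value pairs. struct attr_pair and the
function names are hypothetical and not taken from ufshcd-dwc.c; whether
the compiler really emits per-call setup code for the automatic array
depends on the compiler and optimisation level.

```c
#include <assert.h>
#include <stddef.h>

struct attr_pair {
	unsigned int attr;
	unsigned int value;
};

static unsigned int sum_values(const struct attr_pair *tbl, size_t n)
{
	unsigned int sum = 0;

	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++)
		sum += tbl[i].value;
	return sum;
}

/* Automatic array: may be initialised on the stack each time it runs. */
static unsigned int sum_from_stack_table(void)
{
	const struct attr_pair tbl[] = { { 1, 10 }, { 2, 20 }, { 3, 30 } };

	return sum_values(tbl, sizeof(tbl) / sizeof(tbl[0]));
}

/* static const array: emitted once as data, no per-call setup needed. */
static unsigned int sum_from_static_table(void)
{
	static const struct attr_pair tbl[] = { { 1, 10 }, { 2, 20 }, { 3, 30 } };

	return sum_values(tbl, sizeof(tbl) / sizeof(tbl[0]));
}
```

The two functions behave identically; only where the table lives differs,
which is why the size change shows up as less text and slightly more data.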
Link: https://lore.kernel.org/r/20190906170104.10450-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don't populate the array 'options' on the stack but instead make it static
const. Makes the object code smaller by 143 bytes.
Before:
   text    data     bss     dec     hex filename
  94483   11272    1184  106939   1a1bb drivers/scsi/ips.o
After:
   text    data     bss     dec     hex filename
  94244   11368    1184  106796   1a12c drivers/scsi/ips.o
(gcc version 9.2.1, amd64)
Link: https://lore.kernel.org/r/20190906164522.5644-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don't populate the array dev_cmd_err on the stack but instead make it
static const. Makes the object code smaller by 79 bytes.
Before:
   text    data     bss     dec     hex filename
  21461    1564       0   23025    59f1 drivers/scsi/fnic/vnic_dev.o
After:
   text    data     bss     dec     hex filename
  21318    1628       0   22946    59a2 drivers/scsi/fnic/vnic_dev.o
(gcc version 9.2.1, amd64)
Link: https://lore.kernel.org/r/20190906163945.3889-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The variable rc is being initialized with a value that is never read and is
being re-assigned a little later on. The assignment is redundant and hence
can be removed.
Link: https://lore.kernel.org/r/20190905135017.23772-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The pointer host is being initialized with a value that is never read and
is being re-assigned a little later on. The assignment is redundant and
hence can be removed.
Link: https://lore.kernel.org/r/20190905134229.21194-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/smartpqi/smartpqi_init.c: In function 'pqi_driver_version_show':
drivers/scsi/smartpqi/smartpqi_init.c:6164:24: warning:
variable 'ctrl_info' set but not used [-Wunused-but-set-variable]
commit 6d90615f13 ("scsi: smartpqi: add sysfs entries") added it but
it was never used. Also remove variable 'shost'.
[mkp: commit desc]
Link: https://lore.kernel.org/r/20190831130348.20552-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A statement is indented one level too deeply. Remove the tab, re-join the
broken line and remove some empty lines.
Link: https://lore.kernel.org/r/20190831073903.7834-1-colin.king@canonical.com
Addresses-Coverity: ("Indentation does not match nesting")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the driver is loaded with the module parameter "max_msix_vectors", the
value provided is not used by the mpt3sas driver. The driver loads with the
maximum MSI-X vector count supported by the controller.
In _base_alloc_irq_vectors, use reply_queue_count, which is determined using
the user-provided MSI-X value, instead of ioc->msix_vector_count, which holds
the maximum MSI-X value supported by the controller.
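A rough sketch of how such a count could be derived. The function
pick_reply_queue_count() is hypothetical, and treating a non-positive
max_msix_vectors as "not set" is an assumption of this sketch rather than
something stated above.

```c
#include <assert.h>

/*
 * controller_max: vectors the controller supports (ioc->msix_vector_count).
 * user_max:       value of max_msix_vectors; non-positive is treated as
 *                 "not set" (an assumption of this sketch).
 */
static int pick_reply_queue_count(int controller_max, int user_max)
{
	assert(controller_max > 0);
	if (user_max > 0 && user_max < controller_max)
		return user_max;
	return controller_max;
}
```

With this shape the user value can only lower the number of vectors
requested, never raise it above what the controller supports.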
Link: https://lore.kernel.org/r/1568379890-18347-13-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a faulty application issues an NVMe Encapsulated command to an HBA which
doesn't support the NVMe protocol, the driver should return the command as
invalid with the following message.
"HBA doesn't support NVMe. Rejecting NVMe Encapsulated request."
Otherwise the page fault kernel panic below will be observed while building
the PRPs, as there are no PRP pools allocated for an HBA which doesn't
support NVMe drives.
RIP: 0010:_base_build_nvme_prp+0x3b/0xf0 [mpt3sas]
Call Trace:
_ctl_do_mpt_command+0x931/0x1120 [mpt3sas]
_ctl_ioctl_main.isra.11+0xa28/0x11e0 [mpt3sas]
? prepare_to_wait+0xb0/0xb0
? tty_ldisc_deref+0x16/0x20
_ctl_ioctl+0x1a/0x20 [mpt3sas]
do_vfs_ioctl+0xaa/0x620
? vfs_read+0x117/0x140
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x60/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mkp: tweaked error string]
Link: https://lore.kernel.org/r/1568379890-18347-12-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The firmware image layout has been changed for Aero controllers. All
compatible HBAs have to get Firmware Package version from Component Image
Header layout.
The Signature field in FW header is set to 0xEB000042 for products
compatible with Component Image Header.
For compatible controllers, driver fetches firmware package version from
ApplicationSpecific field of Component Image Header.
Link: https://lore.kernel.org/r/1568379890-18347-11-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added a new status flag named MPT3_DIAG_BUFFER_IS_APP_OWNED. It is set
whenever an application registers the diag buffer and cleared when the
application unregisters the buffer.
When this flag is set and an application issues a diag buffer register
command without releasing the buffer, the register command fails with
-EINVAL status, indicating that this buffer is already registered by the
application.
When a user issues a trace buffer register command through the sysfs
parameter and the trace buffer is in released state but not yet unregistered
by the application which owned it, the driver unregisters the buffer
itself and freshly registers the 1MB sized trace buffer with the HBA
firmware.
Link: https://lore.kernel.org/r/1568379890-18347-9-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The diag buffer which is allocated during driver load time or through sysfs
parameter is marked as driver allocated diag buffer.
MPT3_DIAG_BUFFER_IS_DRIVER_ALLOCATED bit will be set for this buffer.
This buffer won't be de-allocated even when an application issues an
unregister command; the driver just clears the registered status bit. The
same buffer will be reused when any application re-registers the same diag
buffer type.
When re-registering the same diag buffer type, the application has to
register with the same size the buffer was allocated with at driver load
time. The application can read this buffer size by issuing the diag
'query' command.
This ensures that memory is always available for applications to collect
the firmware logs. The only drawback is that the application cannot
re-register the diag buffer with a different size, but the buffer size
allocated at driver load time will be enough to collect the firmware logs
in most cases.
Link: https://lore.kernel.org/r/1568379890-18347-8-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Clear the MPT3_DIAG_BUFFER_IS_RELEASED bit once the diag buffer is
re-registered after reading the buffer. Otherwise the driver won't release
the buffer and will fail the 'diag release' command with -EINVAL status,
saying that the buffer is already released.
Link: https://lore.kernel.org/r/1568379890-18347-7-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Application A has registered a diag buffer and is waiting for a particular
event to happen before releasing and reading the trace buffer. Meanwhile
application B has unregistered the diag buffer, and now application A can't
get the required diag buffer. Proper diag buffer ownership is missing.
Each application has to maintain its own Unique ID. Now driver has to save
the Application's UniqueID for each diag buffer type when diag buffer is
registered. And driver has to allow 'release', 'read' & 'unregister' diag
commands only if application's UniqueID matches with saved UniqueID for the
corresponding diag buffer type.
When diag buffer is registered by the driver, then the UniqueID saved by
the driver is "BRCM" (i.e. 0x4252434D) for SAS3 and above generations HBA
devices. For SAS2 HBAs, driver keeps the legacy UniqueID 0x07075900 for
maintaining compatibility with the legacy SAS2 application and this
improvement won't be applicable for SAS2 HBA devices.
Any application can own the buffer registered by the driver by sending
diag register request to driver with same buffer type and size
(Application can get the buffer size by sending 'query' command). Then
driver changes the ownership of the buffer by saving application's
UniqueID for that corresponding buffer type.
Also, an application can re-register the diag buffer with the same size
without un-registering it, but the diag buffer should be released before
re-registering it. By allowing this, the driver doesn't need to deallocate
and allocate a new buffer for the re-register command; the same buffer can
be re-used.
Link: https://lore.kernel.org/r/1568379890-18347-6-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Memory leak can happen when diag buffer is released but not unregistered
(where buffer is deallocated) by the user. During module unload time driver
is not deallocating the buffer if the buffer is in released state.
Deallocate the diag buffer during module unload time without any diag
buffer status checks.
Link: https://lore.kernel.org/r/1568379890-18347-5-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a user issues a diag register command from an application with the
required size and the driver is unable to allocate the memory, it fails the
register command. While failing the register command, the driver is not
currently clearing the MPT3_CMD_PENDING bit in the ctl_cmds.status variable,
which was set before trying to allocate the memory. As this bit is set,
subsequent register commands fail with BUSY status even when the user wants
to register the trace buffer with less memory.
Clear MPT3_CMD_PENDING bit in ctl_cmds.status before returning the diag
register command with no memory status.
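A minimal model of the error path. CMD_PENDING stands in for
MPT3_CMD_PENDING, and diag_register() with its avail parameter (used to
simulate an allocation failure) is a hypothetical simplification of the
real register path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CMD_PENDING 0x0002u	/* stands in for MPT3_CMD_PENDING */

struct ctl_cmds {
	unsigned int status;
};

/*
 * avail models how much memory an allocation could obtain; a request
 * larger than avail is treated as an allocation failure.
 */
static int diag_register(struct ctl_cmds *ctl, size_t size, size_t avail)
{
	assert(ctl != NULL);
	if (ctl->status & CMD_PENDING)
		return -EBUSY;
	ctl->status |= CMD_PENDING;

	if (size > avail) {
		ctl->status &= ~CMD_PENDING;	/* the fix: undo before bailing out */
		return -ENOMEM;
	}

	/* ... the register request would be issued here ... */
	ctl->status &= ~CMD_PENDING;
	return 0;
}
```

Without the clear in the failure branch, the second, smaller request in
this model would return -EBUSY instead of succeeding.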
Link: https://lore.kernel.org/r/1568379890-18347-4-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Display message before releasing the diag buffer so that user knows which
event caused the release of diag buffer.
Releasing of diag buffer means HBA firmware stops posting the firmware logs
on the registered diag buffer.
Link: https://lore.kernel.org/r/1568379890-18347-3-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if user wishes to enable the host trace buffer during driver load
time, then user has to load the driver with module parameter
'diag_buffer_enable' set to one.
Alternatively now the user can enable host trace buffer by enabling the
following fields in manufacturing page11 in NVDATA (nvdata xml is used
while building HBA firmware image):
* HostTraceBufferMaxSizeKB - Maximum trace buffer size in KB that the host
  can allocate,
* HostTraceBufferMinSizeKB - Minimum trace buffer size in KB that the host
  should at least allocate,
* HostTraceBufferDecrementSizeKB - Size by which the host can reduce the
  buffer size and retry the allocation when allocation failed with the
  previously calculated buffer size.
The driver will register the trace buffer automatically without any module
parameter during boot time when above fields are enabled in manufacturing
page11 in HBA firmware.
Driver follows the following algorithm for enabling the host trace buffer
during driver load time:
* If the user has loaded the driver with module parameter
  'diag_buffer_enable' set to one, then the driver allocates a 2MB buffer
  and registers this buffer with HBA firmware for capturing the firmware
  trace logs.
* Else the driver reads manufacturing page11 data and checks whether the
  HostTraceBufferMaxSizeKB field is zero or not:
  - If HostTraceBufferMaxSizeKB is non-zero then the driver tries to
    allocate HostTraceBufferMaxSizeKB size of memory. If the buffer
    allocation is successful, then it will register this buffer with HBA
    firmware, else in a loop the driver will try again by reducing the
    current buffer size by HostTraceBufferDecrementSizeKB until memory
    allocation is successful or the buffer size falls below
    HostTraceBufferMinSizeKB. If the memory allocation is successful, then
    the buffer will be registered with the firmware. Else, if the buffer
    size falls below HostTraceBufferMinSizeKB, then the driver won't
    register a trace buffer with HBA firmware.
  - If HostTraceBufferMaxSizeKB is zero, then the driver won't register a
    trace buffer with HBA firmware.
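The algorithm above can be sketched as a size-selection function. All
names here are hypothetical; try_alloc_kb() is a stand-in that "succeeds"
when the request fits in avail_kb, so that the retry loop can be
exercised without a real allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_TRACE_KB 2048u	/* the 2MB buffer used with diag_buffer_enable */

/* Stand-in allocator: succeeds when the request fits in avail_kb. */
static bool try_alloc_kb(uint32_t size_kb, uint32_t avail_kb)
{
	return size_kb <= avail_kb;
}

/* Returns the size in KB of the buffer to register, or 0 for none. */
static uint32_t pick_trace_buffer_kb(bool enabled_by_param,
				     uint32_t max_kb, uint32_t min_kb,
				     uint32_t dec_kb, uint32_t avail_kb)
{
	uint32_t size_kb;

	if (enabled_by_param)
		return try_alloc_kb(DEFAULT_TRACE_KB, avail_kb) ?
		       DEFAULT_TRACE_KB : 0;

	if (max_kb == 0)
		return 0;	/* not enabled in manufacturing page11 */

	for (size_kb = max_kb; ; size_kb -= dec_kb) {
		if (try_alloc_kb(size_kb, avail_kb))
			return size_kb;
		/* Stop if we cannot shrink or would drop below the minimum. */
		if (dec_kb == 0 || size_kb < dec_kb ||
		    size_kb - dec_kb < min_kb)
			return 0;
	}
}
```

For example, with a 4096 KB maximum, a 1024 KB minimum and 1024 KB
decrements, an environment that can only provide 2500 KB ends up with a
2048 KB buffer after two retries.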
Link: https://lore.kernel.org/r/1568379890-18347-2-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Local variable fcp_txcmplq_cnt is initialized to 0 and then displayed in
lpfc driver message 0387.
Presumed residual (or unused) code from previous commit.
Removed fcp_txcmplq_cnt.
Link: https://lore.kernel.org/r/20190922035906.10977-20-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
T10 PI is not supported on SLI-4-based FCoE adapters. A prior
commit in the 12.4.0.0 stream added device recognition that would prevent
T10 PI enablement. However, it didn't contain a complete device list. Thus
some SLI-4 FCoE adapters still had T10 PI enabled.
Fix by expanding the device list that identifies FCoE devices.
Link: https://lore.kernel.org/r/20190922035906.10977-19-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch updates ACQE handling to:
- handle an EEPROM failure error reported by the adapter.
- ensure that all data for any ACQE, recognized or not, is logged.
- reduce the data logged in the default (unrecognized) case, given that
  all data is now logged unconditionally.
Link: https://lore.kernel.org/r/20190922035906.10977-18-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In lpfc_release_io_buf, an lpfc_io_buf is returned to the 'available' pool
before any associated sgl or cmd and rsp buffers are returned via their
respective 'put' routines. If xri rebalancing occurs and an lpfc_io_buf
structure is reused quickly, there may be a race condition between release
of old and association of new resources.
Re-ordered lpfc_release_io_buf to release sgl and cmd/rsp
buffer lists before releasing the lpfc_io_buf structure for re-use.
Fixes: d79c9e9d4b ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Link: https://lore.kernel.org/r/20190922035906.10977-17-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Many of the sgl-per-hdwq paths are locking with spin_lock_irq() and
spin_unlock_irq() and may unwittingly re-enable irqs when they shouldn't.
Hard deadlocks were seen around lpfc_scsi_prep_cmnd().
Fix by converting the locks to irqsave/irqrestore.
Fixes: d79c9e9d4b ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Link: https://lore.kernel.org/r/20190922035906.10977-16-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While reviewing the CT behavior, issues with spinlock_irq were seen. The
driver should be using spinlock_irqsave/irqrestore in the els flush
routine.
Changed to spinlock_irqsave/irqrestore.
Link: https://lore.kernel.org/r/20190922035906.10977-15-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After study, it was determined there was a double free of a CT iocb during
execution of lpfc_offline_prep and lpfc_offline. The prep routine issued
an abort for some CT iocbs, but the aborts did not complete fast enough for
a subsequent routine that waits for completion. Thus the driver proceeded
to lpfc_offline, which releases any pending iocbs. Unfortunately, the
completions for the aborts were then received which re-released the ct
iocbs.
Turns out the issue for why the aborts didn't complete fast enough was not
their time on the wire/in the adapter. It was the lpfc_work_done routine,
which requires the adapter state to be UP before it calls
lpfc_sli_handle_slow_ring_event() to process the completions. The issue is
the prep routine takes the link down as part of its processing.
To fix, the following was performed:
- Prevent the offline routine from releasing iocbs that have had aborts
issued on them. Defer to the abort completions. Also means the driver
fully waits for the completions. Given this change, the recognition of
"driver-generated" status which then releases the iocb is no longer
valid. As such, this patch reverts the changes made in
commit 296012285c ("scsi: lpfc: Fix leak of ELS completions on adapter reset")
- Modify lpfc_work_done to allow slow path completions so that the abort
completions aren't ignored.
- Updated the fdmi path to recognize a CT request that fails due to the
port being unusable. This stops FDMI retries. FDMI will be restarted on
next link up.
Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Scenarios were seen where a host hung when the system booted or the host
was very slow in booting. The link would not come up and no luns were
visible to the host.
After investigation, this was found to be due to the introduction of a new
ACQE that the adapter may generate to report an adapter hw warning. The
ACQE was delivered to the driver very early in adapter initialization, when
the driver did not expect command completion. As part of handling this
unexpected interrupt, the EQEs are consumed and discarded and the EQ is
rearmed. The issue is that the CQ that caused the EQE, and thus the
interrupt, was not processed and was left unarmed, meaning it would no
longer generate a new interrupt condition. Subsequent mailbox commands used
to initialize the adapter use the same CQ, and as there was no completion
interrupt generated, the driver never saw the mailbox commands complete and
it would wait out long command timeouts.
Fix by having the early flush routine also process the related CQ and rearm
the CQ.
Link: https://lore.kernel.org/r/20190922035906.10977-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Coverity flagged several scenarios where checking of null pointer values
wasn't consistent.
Fix the code to be consistent on checking.
Link: https://lore.kernel.org/r/20190922035906.10977-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the port, running as an nvme target, receives an ABTS, it submits
commands to the adapter to abort I/O outstanding in the adapter. The Abort
command formatting routine left a command field set to zero, which
instructs the adapter to generate an ABTS on the wire as part of cleaning
up the I/O. This is common operation for an initiator, but not for a
target.
Fix the driver to check whether an ABTS had been received for the I/O, and
if so, change the Abort command formatting so that the ABTS generation is
disabled (IA=1). No need to ABTS it when the other side already has.
Also refactored the code such that there is a single routine being used for
nvme or nvmet ABORT requests, and IA is an argument.
Link: https://lore.kernel.org/r/20190922035906.10977-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An issue was seen discovering all SCSI Luns when a target device undergoes
link bounce.
The driver currently does not qualify the FC4 support on the target.
Therefore it will send a SCSI PRLI and an NVMe PRLI. The expectation is
that the target will reject the PRLI if it is not supported. If a PRLI
times out, the driver will retry. The driver will not proceed with the
device until both SCSI and NVMe PRLIs are resolved. In the failure case,
the device is FCP only and does not respond to the NVMe PRLI, thus
initiating the wait/retry loop in the driver. During that time, a RSCN is
received (device bounced) causing the driver to issue a GID_FT. The GID_FT
response comes back before the PRLI mess is resolved and it prematurely
cancels the PRLI retry logic and leaves the device in a STE_PRLI_ISSUE
state. Discovery with the target never completes or resets.
Fix by resetting the node state back to STE_NPR_NODE when GID_FT completes,
thereby restarting the discovery process for the node.
Link: https://lore.kernel.org/r/20190922035906.10977-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Faults are seen with RIP of lpfc_scsi_cmd_iocb_cmpl(). The failure is when
lpfc_update_status is being called as part of the completion. After
debugging, it was seen the issue was the shost pointer that the driver
derived from the scsi cmd. The crash showed the cmd->device pointer being
bogus, which is likely as the scsi devices were offlined prior. The bogus
device pointer caused subsequent pointers derived from the location,
specifically the vport, to be bogus.
Fix by adjusting the calling sequence to pass in the vport rather than
having to derive it from the cmd structure.
Link: https://lore.kernel.org/r/20190922035906.10977-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Symptoms were seen of the driver not having valid data for mailbox
commands. After debugging, the following sequence was found:
The driver maintains a port-wide pointer of the mailbox command that is
currently in execution. Once finished, the port-wide pointer is cleared
(done in lpfc_sli4_mq_release()). The next mailbox command issued will set
the next pointer and so on.
The mailbox response data is only copied if there is a valid port-wide
pointer.
In the failing case, it was seen that a new mailbox command was being
attempted in parallel with the completion. The parallel path was seeing
the mailbox no longer in use (flag check under lock) and thus set the port
pointer. The completion path had cleared the active flag under lock, but
had not touched the port pointer. The port pointer is cleared after the
lock is released. In this case, the completion path cleared the just-set
value by the parallel path.
Fix by making the calls that clear mbox state/port pointer while under
lock. Also slightly cleaned up the error path.
Link: https://lore.kernel.org/r/20190922035906.10977-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When target-side fault injections are made, the driver isn't reconnecting
to the remote port. The driver is logging "2753" error messages which
state:
"PLOGI failure DID:1B2400 Status:x3/xf0240008"
The failure status indicates an Illegal field error, which points to
the Temporary RPI field being used for the ELS. This error typically means
the driver used an RPI that was already registered (shouldn't be registered
if using it in this context).
Study has found that if the driver were in discovery attempts and
encountered an error, it wouldn't flag the temporary rpi in error. Yet the
rpi was released for reallocation in these error paths and another ELS
could allocate the rpi. In the failure situation a retry was done on an ELS
that had encountered an error, and as the rpi wasn't marked in error, the
ELS reused the rpi it originally allocated. But that rpi had been allocated
by a different ELS issued after the original error and before the retry
attempt. The different ELS had succeeded and the RPI was registered.
Fix by marking the rpi state for the node to be in error, i.e. as needing
reallocation, upon an error in the els processing. Error state marking is
always done prior to release back to the internal rpi free list, which the
driver wasn't doing in cases prior.
Also enhanced some of the logging to help in the next case of problem
troubleshooting.
Link: https://lore.kernel.org/r/20190922035906.10977-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A prior use-after-free mailbox fix solved its problem by null'ing an ndlp
pointer. However, further testing has shown that this change causes a
later state change to occasionally be skipped, which results in a reference
count never being decremented thus the rpi is never released, which causes
a vport delete to never succeed.
Revise the fix in the prior patch to no longer null the ndlp. Instead the
RELEASE_RPI flag is set which will drive the release of the rpi.
Given the new code was added at a deep indentation level, refactor the code
block using a new routine that avoids the indentation issues.
Fixes: 9b16406864 ("scsi: lpfc: Fix use-after-free mailbox cmd completion")
Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The nvme-fc transport may call to abort an io on controller reset. If the
driver is out of resources to issue an abort command, it just gives up and
does nothing. The transport expects the lldd to always be able to terminate
an io it has issued. At that point, the controller hangs waiting for
aborted ios to be returned. Note: flagged by "6136" and "6176" error
messages.
The root issue was that the number of resources allocated for the adapter's
command entries was wrong.
Convert the driver to allocate command resources based on the number of
xris supported by the FC port: one resource for the original command and
one resource for the abort request.
Link: https://lore.kernel.org/r/20190922035906.10977-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Coverity flagged missing status check on register read that flags a
poisoned data return value.
Add checking of register read status.
Link: https://lore.kernel.org/r/20190922035906.10977-4-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use of spin_lock_irq may re-enable interrupts prematurely.
Convert to spin_lock. Note: code is under the phba->hba_lock which has been
locked with irqsave.
Link: https://lore.kernel.org/r/20190922035906.10977-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After exchanging PLOGI on an SLI-3 adapter, the PRLI exchange failed. Link
trace showed the port was assigned a non-zero n_port_id, but didn't use the
address on the PRLI. The assigned address is set on the port by the
CONFIG_LINK mailbox command. The driver responded to the PRLI before the
mailbox command completed. Thus the PRLI response used the old n_port_id.
Defer the PRLI response until CONFIG_LINK completes.
Link: https://lore.kernel.org/r/20190922035906.10977-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"Some later additions that weren't quite done for the first pull
request, and also a few fixes that have arrived since.
This contains:
- Kill silly pktcdvd warning on attempting to register a non-scsi
passthrough device (me)
- Use symbolic constants for the block t10 protection types, and
switch to handling it in core rather than in the drivers (Max)
- libahci platform missing node put fix (Nishka)
- Small series of fixes for BFQ (Paolo)
- Fix possible nbd crash (Xiubo)"
* tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block:
block: drop device references in bsg_queue_rq()
block: t10-pi: fix -Wswitch warning
pktcdvd: remove warning on attempting to register non-passthrough dev
ata: libahci_platform: Add of_node_put() before loop exit
nbd: fix possible page fault for nbd disk
nbd: rename the runtime flags as NBD_RT_ prefixed
block, bfq: push up injection only after setting service time
block, bfq: increase update frequency of inject limit
block, bfq: reduce upper bound for inject limit to max_rq_in_driver+1
block, bfq: update inject limit only after injection occurred
block: centralize PI remapping logic to the block layer
block: use symbolic constants for t10_pi type
For N2N, the NPort ID is assigned by driver in the PLOGI ELS. According to
FW Spec the byte order for SID is not the same as DID.
Link: https://lore.kernel.org/r/20190912180918.6436-8-hmadhani@marvell.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During link up/bounce, the qla driver would do a command flush as part of
cleanup. In this case, the flush can interfere with FW state. This patch
allows FW to be in control of link up.
Link: https://lore.kernel.org/r/20190912180918.6436-7-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix stalled link recovery for N2N with FC-NVMe connection.
Link: https://lore.kernel.org/r/20190912180918.6436-6-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When an NPIV port is being torn down, this patch sets a flag to indicate
VPORT_DELETE. This prevents a relogin from being triggered.
Link: https://lore.kernel.org/r/20190912180918.6436-5-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On driver unload, the 'remove_one' thread was allowed to advance while
session cleanup still lagged behind. This patch ensures session deletion
finishes before remove_one can advance.
Link: https://lore.kernel.org/r/20190912180918.6436-4-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are instances, though rare, where a LOGO request cannot be sent out
and the thread in free session done can wait indefinitely. Fix this by
putting an upper bound on the sleep.
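The idea of bounding the wait can be sketched in plain userspace C. This is
only an illustration, not the driver code: wait_bounded(), countdown_done()
and the poll count are hypothetical names, and a real driver would sleep
between polls rather than spin.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: poll done() at most max_polls times instead of
 * waiting forever. Returns true if the condition was met, false if the
 * upper bound was reached first.
 */
static bool wait_bounded(bool (*done)(void *), void *ctx,
			 unsigned int max_polls)
{
	while (max_polls--) {
		if (done(ctx))
			return true;
	}
	return false;
}

/* Example condition: counts an int down and reports when it hits zero. */
static bool countdown_done(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}
```

With the bound in place, a condition that never becomes true makes the
caller give up after max_polls attempts instead of hanging.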
Link: https://lore.kernel.org/r/20190912180918.6436-3-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Print if fwdt template is present or not, only when
ql2xextended_error_logging is enabled.
Link: https://lore.kernel.org/r/20190912180918.6436-2-hmadhani@marvell.com
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix sparse warnings:
drivers/scsi/hisi_sas/hisi_sas_main.c:3686:6:
warning: symbol 'hisi_sas_debugfs_release' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3708:5:
warning: symbol 'hisi_sas_debugfs_alloc' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3799:6:
warning: symbol 'hisi_sas_debugfs_bist_init' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20190923054035.19036-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
storvsc doesn't use a dedicated hardware queue for a given CPU queue. When
issuing I/O, it selects the returning CPU (hardware queue) dynamically
based on vmbus channel usage across all channels.
This patch advertises num_present_cpus() as the number of hardware queues.
This makes the upper layer set up a 1:1 mapping between hardware queues and
CPU queues and avoids unnecessary locking when issuing I/O.
Link: https://lore.kernel.org/r/1567790660-48142-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since tmp_prio is declared as u8, the following statement is always false.
tmp_prio < 0
So remove 'always false' statement.
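The claim can be checked with a small standalone C sketch. The u8 typedef
below stands in for the kernel type, and prio_is_negative() is a
hypothetical helper used only for illustration.

```c
#include <stdint.h>

typedef uint8_t u8;	/* stand-in for the kernel's u8 */

/*
 * Hypothetical helper: tmp_prio is promoted to int before the comparison,
 * but its value is always in 0..255, so the result is always 0.
 */
static int prio_is_negative(u8 tmp_prio)
{
	return tmp_prio < 0;
}
```

Compilers may flag such a comparison (for example with -Wtype-limits),
which is usually how this kind of dead check gets noticed.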
Link: https://lore.kernel.org/r/20190919075548.GA112801@LGEARND20B15
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In some cases, the HBA may go through the shutdown flow without successful
initialization, which then hangs the system.
For example, if ufshcd_change_power_mode() gets an error and leads to
ufshcd_hba_exit() releasing the resources of the host, a later shutdown
flow may hang the system since the host registers will be accessed in an
unpowered state.
To solve this issue, add a check that skips shutdown in this kind of
situation.
Link: https://lore.kernel.org/r/1568780438-28753-1-git-send-email-stanley.chu@mediatek.com
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The qla2xxx driver had this issue as well when newer array firmware
returned the retry_delay_timer in the fcp_rsp. The bnx2fc driver does not
mask the scope bits either, so the retry_delay_timestamp value ends up
being a large value added to the timer timestamp, delaying I/O for up to
27 minutes. This patch adds similar handling to the bnx2fc driver to avoid
the huge delay.
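A minimal userspace sketch of the masking, under stated assumptions: the
scope is taken to be the upper 2 bits of the 16-bit retry_delay_timer, the
remaining 14 bits a delay in units of 100 ms, and the macro and function
names are hypothetical. This is not the driver code.

```c
#include <stdint.h>

#define RETRY_DELAY_SCOPE_MASK		0xC000u	/* assumed: upper 2 bits */
#define RETRY_DELAY_QUALIFIER_MASK	0x3FFFu	/* assumed: lower 14 bits */

/*
 * Hypothetical helper: strip the scope bits and convert the remaining
 * value to milliseconds, assuming units of 100 ms.
 */
static uint32_t retry_delay_ms(uint16_t retry_delay_timer)
{
	uint16_t qualifier = retry_delay_timer & RETRY_DELAY_QUALIFIER_MASK;

	return (uint32_t)qualifier * 100u;
}

/* The same conversion without masking, to show the inflated delay. */
static uint32_t retry_delay_ms_unmasked(uint16_t retry_delay_timer)
{
	return (uint32_t)retry_delay_timer * 100u;
}
```

Under these assumptions, a raw value of 0x4001 yields 100 ms when masked
but 1638500 ms (roughly 27 minutes) when the scope bits are left in.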
Link: https://lore.kernel.org/r/1568210202-12794-1-git-send-email-loberman@redhat.com
Signed-off-by: Laurence Oberman <loberman@redhat.com>
Reported-by: David Jeffery <djeffery@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A check for the new scsi_mq_ops_no_commit was missing from
scsi_device_from_queue(). That ops structure was introduced by linux-next
commit 8930a6c207 ("scsi: core: add support for request batching") from
Martin's scsi/5.4/scsi-queue or James' scsi/misc.
Only devicehandler code seems to call scsi_device_from_queue():
*** drivers/scsi/scsi_dh.c:
scsi_dh_activate[255] sdev = scsi_device_from_queue(q);
scsi_dh_set_params[302] sdev = scsi_device_from_queue(q);
scsi_dh_attach[325] sdev = scsi_device_from_queue(q);
scsi_dh_attached_handler_name[363] sdev = scsi_device_from_queue(q);
Fixes multipath tools follow-on errors:
$ multipath -v6
...
libdevmapper: ioctl/libdm-iface.c(1887): device-mapper: reload ioctl on mpatha failed: No such device
...
mpatha: failed to load map, error 19
...
showing also as kernel messages:
device-mapper: table: 252:0: multipath: error attaching hardware handler
device-mapper: ioctl: error adding target to table
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This was missing from scsi_mq_ops_no_commit of linux-next commit
8930a6c207 ("scsi: core: add support for request batching") from Martin's
scsi/5.4/scsi-queue or James' scsi/misc.
See also linux-next commit b7e9e1fb7a ("scsi: implement .cleanup_rq
callback") from block/for-next.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi,
lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates.
The only core change this time around is the addition of request
batching for virtio. Since batching requires an additional flag to
use, it should be invisible to the rest of the drivers.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi,
lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates. The
only core change this time around is the addition of request batching
for virtio. Since batching requires an additional flag to use, it
should be invisible to the rest of the drivers"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (264 commits)
scsi: hisi_sas: Fix the conflict between device gone and host reset
scsi: hisi_sas: Add BIST support for phy loopback
scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation
scsi: hisi_sas: Remove some unused function arguments
scsi: hisi_sas: Remove redundant work declaration
scsi: hisi_sas: Remove hisi_sas_hw.slot_complete
scsi: hisi_sas: Assign NCQ tag for all NCQ commands
scsi: hisi_sas: Update all the registers after suspend and resume
scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device
scsi: hisi_sas: Remove sleep after issue phy reset if sas_smp_phy_control() fails
scsi: hisi_sas: Directly return when running I_T_nexus reset if phy disabled
scsi: hisi_sas: Use true/false as input parameter of sas_phy_reset()
scsi: hisi_sas: add debugfs auto-trigger for internal abort time out
scsi: virtio_scsi: unplug LUNs when events missed
scsi: scsi_dh_rdac: zero cdb in send_mode_select()
scsi: fcoe: fix null-ptr-deref Read in fc_release_transport
scsi: ufs-hisi: use devm_platform_ioremap_resource() to simplify code
scsi: ufshcd: use devm_platform_ioremap_resource() to simplify code
scsi: hisi_sas: use devm_platform_ioremap_resource() to simplify code
scsi: ufs: Use kmemdup in ufshcd_read_string_desc()
...
Pull networking updates from David Miller:
1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski.
2) Use bio_vec in the networking instead of custom skb_frag_t, from
Matthew Wilcox.
3) Make use of xmit_more in r8169 driver, from Heiner Kallweit.
4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen.
5) Support all variants of 5750X bnxt_en chips, from Michael Chan.
6) More RTNL avoidance work in the core and mlx5 driver, from Vlad
Buslov.
7) Add TCP syn cookies bpf helper, from Petar Penkov.
8) Add 'nettest' to selftests and use it, from David Ahern.
9) Add extack support to drop_monitor, add packet alert mode and
support for HW drops, from Ido Schimmel.
10) Add VLAN offload to stmmac, from Jose Abreu.
11) Lots of devm_platform_ioremap_resource() conversions, from
YueHaibing.
12) Add IONIC driver, from Shannon Nelson.
13) Several kTLS cleanups, from Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits)
mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
mlxsw: spectrum: Register CPU port with devlink
mlxsw: spectrum_buffers: Prevent changing CPU port's configuration
net: ena: fix incorrect update of intr_delay_resolution
net: ena: fix retrieval of nonadaptive interrupt moderation intervals
net: ena: fix update of interrupt moderation register
net: ena: remove all old adaptive rx interrupt moderation code from ena_com
net: ena: remove ena_restore_ethtool_params() and relevant fields
net: ena: remove old adaptive interrupt moderation code from ena_netdev
net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*()
net: ena: enable the interrupt_moderation in driver_supported_features
net: ena: reimplement set/get_coalesce()
net: ena: switch to dim algorithm for rx adaptive interrupt moderation
net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
ethtool: implement Energy Detect Powerdown support via phy-tunable
xen-netfront: do not assume sk_buff_head list is empty in error handling
s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb”
net: ena: don't wake up tx queue when down
drop_monitor: Better sanitize notified packets
...
Currently t10_pi_prepare/t10_pi_complete functions are called during the
NVMe and SCSI layers' command preparation/completion, but their actual
place should be the block layer since T10-PI is a general data integrity
feature that is used by block storage protocols. Introduce .prepare_fn
and .complete_fn callbacks within the integrity profile that each type
can implement according to its needs.
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Fixed to not call queue integrity functions if BLK_DEV_INTEGRITY
isn't defined in the config.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
"In this cycle we've finally managed to contribute the patch set
sorting out LED naming issues. Besides that there are many changes
scattered among various LED class drivers and triggers.
LED naming related improvements:
- add new 'function' and 'color' fwnode properties and deprecate
'label' property which has been frequently abused for conveying
vendor specific names that have been available in sysfs anyway
- introduce a set of standard LED_FUNCTION* definitions
- introduce a set of standard LED_COLOR_ID* definitions
- add a new {devm_}led_classdev_register_ext() API with the
capability of automatic LED name composition basing on the
properties available in the passed fwnode; the function is
backwards compatible in a sense that it uses 'label' data, if
present in the fwnode, for creating LED name
- add tools/leds/get_led_device_info.sh script for retrieving LED
vendor, product and bus names, if applicable; it also performs
basic validation of an LED name
- update following drivers and their DT bindings to use the new LED
registration API:
- leds-an30259a, leds-gpio, leds-as3645a, leds-aat1290, leds-cr0014114,
leds-lm3601x, leds-lm3692x, leds-lp8860, leds-lt3593, leds-sc27xx-blt
Other LED class improvements:
- replace {devm_}led_classdev_register() macros with inlines
- allow to call led_classdev_unregister() unconditionally
- switch to use fwnode instead of be stuck with OF one
LED triggers improvements:
- led-triggers:
- fix dereferencing of null pointer
- fix a memory leak bug
- ledtrig-gpio:
- GPIO 0 is valid
Drop superseded apu2/3 support from leds-apu since for apu2+ a newer,
more complete driver exists, based on a generic driver for the AMD
SOCs gpio-controller, supporting LEDs as well as other devices:
- drop profile field from priv data
- drop iosize field from priv data
- drop enum_apu_led_platform_types
- drop superseded apu2/3 led support
- add pr_fmt prefix for better log output
- fix error message on probing failure
Other misc fixes and improvements to existing LED class drivers:
- leds-ns2, leds-max77650:
- add of_node_put() before return
- leds-pwm, leds-is31fl32xx:
- use struct_size() helper
- leds-lm3697, leds-lm36274, leds-lm3532:
- switch to use fwnode_property_count_uXX()
- leds-lm3532:
- fix brightness control for i2c mode
- change the define for the fs current register
- fixes for the driver for stability
- add full scale current configuration
- dt: Add property for full scale current.
- avoid potentially unpaired regulator calls
- move static keyword to the front of declarations
- fix optional led-max-microamp prop error handling
- leds-max77650:
- add of_node_put() before return
- add MODULE_ALIAS()
- Switch to fwnode property API
- leds-as3645a:
- fix misuse of strlcpy
- leds-netxbig:
- add of_node_put() in netxbig_leds_get_of_pdata()
- remove legacy board-file support
- leds-is31fl319x:
- simplify getting the adapter of a client
- leds-ti-lmu-common:
- fix coccinelle issue
- move static keyword to the front of declaration
- leds-syscon:
- use resource managed variant of device register
- leds-ktd2692:
- fix a typo in the name of a constant
- leds-lp5562:
- allow firmware files up to the maximum length
- leds-an30259a:
- fix typo
- leds-pca953x:
- include the right header"
* tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: (72 commits)
leds: lm3532: Fix optional led-max-microamp prop error handling
led: triggers: Fix dereferencing of null pointer
leds: ti-lmu-common: Move static keyword to the front of declaration
leds: lm3532: Move static keyword to the front of declarations
leds: trigger: gpio: GPIO 0 is valid
leds: pwm: Use struct_size() helper
leds: is31fl32xx: Use struct_size() helper
leds: ti-lmu-common: Fix coccinelle issue in TI LMU
leds: lm3532: Avoid potentially unpaired regulator calls
leds: syscon: Use resource managed variant of device register
leds: Replace {devm_}led_classdev_register() macros with inlines
leds: Allow to call led_classdev_unregister() unconditionally
leds: lm3532: Add full scale current configuration
dt: lm3532: Add property for full scale current.
leds: lm3532: Fixes for the driver for stability
leds: lm3532: Change the define for the fs current register
leds: lm3532: Fix brightness control for i2c mode
leds: Switch to use fwnode instead of be stuck with OF one
leds: max77650: Switch to fwnode property API
led: triggers: Fix a memory leak bug
...
Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- Two NVMe pull requests:
- ana log parse fix from Anton
- nvme quirks support for Apple devices from Ben
- fix missing bio completion tracing for multipath stack devices
from Hannes and Mikhail
- IP TOS settings for nvme rdma and tcp transports from Israel
- rq_dma_dir cleanups from Israel
- tracing for Get LBA Status command from Minwoo
- Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
- Some consolidation between the fabrics transports for handling
the CAP register
- reset race with ns scanning fix for fabrics (move fabrics
commands to a dedicated request queue with a different lifetime
from the admin request queue)."
- controller reset and namespace scan races fixes
- nvme discovery log change uevent support
- naming improvements from Keith
- multiple discovery controllers reject fix from James
- some regular cleanups from various people
- Series fixing (and re-fixing) null_blk debug printing and nr_devices
checks (André)
- A few pull requests from Song, with fixes from Andy, Guoqing,
Guilherme, Neil, Nigel, and Yufen.
- REQ_OP_ZONE_RESET_ALL support (Chaitanya)
- Bio merge handling unification (Christoph)
- Pick default elevator correctly for devices with special needs
(Damien)
- Block stats fixes (Hou)
- Timeout and support devices nbd fixes (Mike)
- Series fixing races around elevator switching and device add/remove
(Ming)
- sed-opal cleanups (Revanth)
- Per device weight support for BFQ (Fam)
- Support for blk-iocost, a new model that can properly account cost of
IO workloads. (Tejun)
- blk-cgroup writeback fixes (Tejun)
- paride queue init fixes (zhengbin)
- blk_set_runtime_active() cleanup (Stanley)
- Block segment mapping optimizations (Bart)
- lightnvm fixes (Hans/Minwoo/YueHaibing)
- Various little fixes and cleanups
* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
null_blk: format pr_* logs with pr_fmt
null_blk: match the type of parameter nr_devices
null_blk: do not fail the module load with zero devices
block: also check RQF_STATS in blk_mq_need_time_stamp()
block: make rq sector size accessible for block stats
bfq: Fix bfq linkage error
raid5: use bio_end_sector in r5_next_bio
raid5: remove STRIPE_OPS_REQ_PENDING
md: add feature flag MD_FEATURE_RAID0_LAYOUT
md/raid0: avoid RAID0 data corruption due to layout confusion.
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
raid5: don't increment read_errors on EILSEQ return
nvmet: fix a wrong error status returned in error log page
nvme: send discovery log page change events to userspace
nvme: add uevent variables for controller devices
nvme: enable aen regardless of the presence of I/O queues
nvme-fabrics: allow discovery subsystems accept a kato
nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
nvme: Remove redundant assignment of cq vector
nvme: Assign subsys instance from first ctrl
...
Merge tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 updates from Tony Luck:
"The big change here is removal of support for SGI Altix"
* tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits)
genirq: remove the is_affinity_mask_valid hook
ia64: remove CONFIG_SWIOTLB ifdefs
ia64: remove support for machvecs
ia64: move the screen_info setup to common code
ia64: move the ROOT_DEV setup to common code
ia64: rework iommu probing
ia64: remove the unused sn_coherency_id symbol
ia64: remove the SGI UV simulator support
ia64: remove the zx1 swiotlb machvec
ia64: remove CONFIG_ACPI ifdefs
ia64: remove CONFIG_PCI ifdefs
ia64: remove the hpsim platform
ia64: remove now unused machvec indirections
ia64: remove support for the SGI SN2 platform
drivers: remove the SGI SN2 IOC4 base support
drivers: remove the SGI SN2 IOC3 base support
qla2xxx: remove SGI SN2 support
qla1280: remove SGI SN2 support
misc/sgi-xp: remove SGI SN2 support
char/mspec: remove SGI SN2 support
...
Currently blk_set_runtime_active() is checking if q->dev is null by
itself, thus remove the same checking in its user: scsi_dev_type_resume().
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When a device is gone, the driver checks whether a reset is in progress
and, if not, sends an internal task abort. If a reset begins before the
internal task abort returns, the reset path checks whether SAS_PHY_UNUSED
is set and, if not, calls hisi_sas_init_device(). By that time
domain_device may already be fully or partly freed, so
hisi_sas_init_device() may dereference a NULL pointer. It may occur as
follows:
    thread0                             thread1
hisi_sas_dev_gone()
    check whether in RESET(no)
    internal task abort
                                        reset prep
                                        soft_reset
                                        ... (part of reset_done)
    internal task abort failed
    release resource anyway
    clear_itct
    device->lldd_dev=NULL
                                        hisi_sas_reset_init_all_device
                                        check sas_dev->dev_type is SAS_PHY_UNUSED and
                                        !device
    set dev_type SAS_PHY_UNUSED
    sas_free_device
                                        hisi_sas_init_device
                                        ...
Semaphore hisi_hba.sema is used to synchronize the device-gone and
host-reset paths.
To solve the issue, expand the scope that the semaphore protects so that
the two never run concurrently.
Some places also check whether domain_device is NULL to judge whether the
device is gone, so sas_dev->sas_device needs to be cleared when the device
is gone.
Link: https://lore.kernel.org/r/1567774537-20003-14-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add BIST (built in self test) support for phy loopback.
Through the new debugfs interface, the user can configure loopback
mode/linkrate/phy id/code mode before enabling it. The user can also
enable or disable the BIST function.
Link: https://lore.kernel.org/r/1567774537-20003-13-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We extract the memory allocation code into a new function, which should be
convenient for subsequent optimization.
Link: https://lore.kernel.org/r/1567774537-20003-12-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some function arguments are unused, so remove them.
Also move the timeout print from the wait_cmds_complete_timeout_vX_hw()
callsites into that same function.
Link: https://lore.kernel.org/r/1567774537-20003-11-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the NCQ tag is only assigned for FPDMA READ and FPDMA WRITE
commands, and for other NCQ commands (such as FPDMA SEND), their NCQ tags
are set in the delivery command to 0.
So for all the NCQ commands, we also need to assign normal NCQ tag for
them, so drop the command type check in hisi_sas_get_ncq_tag() [drop
hisi_sas_get_ncq_tag() altogether actually], and always use the ATA command
NCQ tag when appropriate.
Link: https://lore.kernel.org/r/1567774537-20003-8-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After suspend and resume, the HW registers will be set back to their
initial value. We use init_reg_v3_hw() to set some registers, but some
registers are set via firmware in ACPI "_RST" method, so add reset handler
before init_reg_v3_hw().
Link: https://lore.kernel.org/r/1567774537-20003-7-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When initializing a device for SAS disks, the driver sends TMF IO to clear
the disks. If the TMF IO is interrupted at that time by an operation such
as a controller reset injected from a HW RAS event, the TMF IO times out
and the device is eventually gone. The log is as follows:
hisi_sas_v3_hw 0000:74:02.0: dev[240:1] found
...
hisi_sas_v3_hw 0000:74:02.0: controller resetting...
hisi_sas_v3_hw 0000:74:02.0: phyup: phy7 link_rate=10(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy2 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy3 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy6 link_rate=10(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy5 link_rate=11
hisi_sas_v3_hw 0000:74:02.0: phyup: phy4 link_rate=11
hisi_sas_v3_hw 0000:74:02.0: controller reset complete
hisi_sas_v3_hw 0000:74:02.0: abort tmf: TMF task timeout and not done
hisi_sas_v3_hw 0000:74:02.0: dev[240:1] is gone
sas: driver on host 0000:74:02.0 cannot handle device 5000c500a75a860d,
error:5
To improve reliability, retry the TMF IO up to 3 times for SAS disks, which
is the same as softreset does.
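The retry policy can be sketched in userspace C as follows. The limit of 3
comes from the text above; TMF_RETRY_MAX, retry_op() and fail_n_times() are
hypothetical names and do not correspond to driver symbols.

```c
#define TMF_RETRY_MAX 3	/* hypothetical name; the count is from the text */

/*
 * Hypothetical sketch: call op() until it succeeds (returns 0) or the
 * retry limit is reached, and return the last result.
 */
static int retry_op(int (*op)(void *), void *ctx)
{
	int rc = -1;
	int i;

	for (i = 0; i < TMF_RETRY_MAX; i++) {
		rc = op(ctx);
		if (rc == 0)
			break;
	}
	return rc;
}

/* Example operation: fails while the counter in ctx is positive. */
static int fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -1;
	}
	return 0;
}
```

An operation that fails twice and then succeeds is absorbed by the retries,
while one that keeps failing still returns an error after the third attempt.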
Link: https://lore.kernel.org/r/1567774537-20003-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In an expander environment, we delay after issuing a phy reset to wait for
the hardware to handle it. But if sas_smp_phy_control() fails, the delay is
unnecessary, so remove it.
Link: https://lore.kernel.org/r/1567774537-20003-5-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In hisi_sas_debug_I_T_nexus_reset(), we call sas_phy_reset() to reset a
phy. But if the phy is disabled, sas_phy_reset() returns -ENODEV directly
without issuing a phy reset request.
In that case, we can return -ENODEV to libsas directly before issuing a
phy reset.
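The early-return pattern described above, as a minimal userspace sketch.
struct phy_state and reset_phy() are hypothetical stand-ins and do not
mirror the libsas structures.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the phy state the driver would inspect. */
struct phy_state {
	bool enabled;
	int resets_issued;
};

/*
 * Hypothetical sketch: bail out with -ENODEV before doing any work when
 * the phy is disabled; otherwise record that a reset was issued.
 */
static int reset_phy(struct phy_state *phy)
{
	if (!phy->enabled)
		return -ENODEV;

	phy->resets_issued++;
	return 0;
}
```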
Link: https://lore.kernel.org/r/1567774537-20003-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When calling sas_phy_reset(), we need to specify whether the reset type
is hard reset or link reset - use true/false for clarity.
Link: https://lore.kernel.org/r/1567774537-20003-3-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The event handler calls scsi_scan_host() when events are missed, which will
hotplug new LUNs. However, this function won't remove any unplugged LUNs.
The result is that hotunplug doesn't work properly when the number of
unplugged LUNs exceeds the event queue size (currently 8).
Scan existing LUNs when events are missed to check if they are still
present. If not, remove them.
Link: https://lore.kernel.org/r/20190905181903.29756-1-mlupfer@ddn.com
Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
cdb in send_mode_select() is not zeroed and is only partially filled in
rdac_failover_get(), which leads to some random data getting to the
device. Users have reported storage responding to such commands with
INVALID FIELD IN CDB. Code before commit 3278255741 was not affected, as
it called blk_rq_set_block_pc().
Fix this by zeroing out the cdb first.
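A minimal userspace sketch of the fix pattern: clear the whole buffer
before filling in the few bytes that matter. CDB_LEN and build_cdb() are
hypothetical; the opcode and length byte positions are illustrative only.

```c
#include <stdint.h>
#include <string.h>

#define CDB_LEN 16	/* assumed buffer size for illustration */

/*
 * Hypothetical builder that sets only two bytes of the CDB. Without the
 * memset(), the remaining bytes would keep whatever the buffer held.
 */
static void build_cdb(uint8_t cdb[CDB_LEN], uint8_t opcode, uint8_t param_len)
{
	memset(cdb, 0, CDB_LEN);	/* zero every byte first */
	cdb[0] = opcode;
	cdb[4] = param_len;
}
```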
Identified & fix proposed by HPE.
Fixes: 3278255741 ("scsi_dh_rdac: switch to scsi_execute_req_flags()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20190904155205.1666-1-martin.wilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Acked-by: Ales Novak <alnovak@suse.cz>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In fcoe_if_init(), if fc_attach_transport(&fcoe_vport_fc_functions)
fails, we need to free the previously allocated memory and return failure;
otherwise a null-ptr-deref read is triggered in fc_release_transport():
fcoe_exit
fcoe_if_exit
fc_release_transport(fcoe_vport_scsi_transport)
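The unwinding described above can be sketched roughly as below. All names (model_attach, model_release, model_if_init and the two template pointers) are hypothetical; the sketch only models "undo the first attach when the second one fails" and is not the fcoe code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two transport templates. */
static int model_object;
static void *model_host_tmpl;
static void *model_vport_tmpl;

/* Models an attach call that can fail; 'fail' forces the error path. */
static void *model_attach(int fail)
{
	return fail ? NULL : &model_object;
}

static void model_release(void **tmpl)
{
	*tmpl = NULL;
}

/* Returns 0 on success, -1 on failure with nothing left attached. */
static int model_if_init(int fail_first, int fail_second)
{
	model_host_tmpl = model_attach(fail_first);
	if (!model_host_tmpl)
		return -1;

	model_vport_tmpl = model_attach(fail_second);
	if (!model_vport_tmpl) {
		/* Undo the first attach before reporting failure. */
		model_release(&model_host_tmpl);
		return -1;
	}
	return 0;
}
```

Because the init function now reports failure, the exit path is never entered with a half-initialized pair of templates.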
Link: https://lore.kernel.org/r/1566279789-58207-1-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Link: https://lore.kernel.org/r/20190904130457.24744-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Link: https://lore.kernel.org/r/20190904130348.24772-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Link: https://lore.kernel.org/r/20190904130256.24704-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use kmemdup rather than duplicating its implementation
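For readers unfamiliar with the helper, a user-space analogue of the allocate-then-copy pattern that kmemdup() replaces is sketched here. model_memdup() is a hypothetical name, and malloc() stands in for the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* User-space analogue of kmemdup(): allocate, then copy len bytes. */
static void *model_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```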
Link: https://lore.kernel.org/r/20190831124424.18642-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The UFS_RESET pin on Qualcomm SoCs is controlled by TLMM and exposed
through the GPIO framework. Acquire the device-reset GPIO and use this to
implement the device_reset vops, to allow resetting the attached memory.
Based on downstream support implemented by Subhash Jadavani
<subhashj@codeaurora.org>.
Link: https://lore.kernel.org/r/20190828191756.24312-3-bjorn.andersson@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some UFS memory devices need their reset line toggled in order to get them
into a good state for initialization. Provide a new vops to allow the
platform driver to implement this operation.
Link: https://lore.kernel.org/r/20190828191756.24312-2-bjorn.andersson@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A recent patch unconditionally marks the hba as in error as part of
resetting the adapter. The driver flow that called the adapter reset was a
recovery path, which expects the adapter to not be in an error state in
order to finish the recovery. Given the new error state being set, the
recovery fails and the adapter is left in limbo.
Revise the adapter reset routine so that it will only mark the adapter in
error if it was unable to reset the adapter.
Fixes: 8c24a4f643 ("scsi: lpfc: Fix crash due to port reset racing vs adapter error handling")
Link: https://lore.kernel.org/r/20190903215441.10490-1-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Convert the remaining %pf users to %ps to prepare for the removal of the
old %pf conversion specifier support.
Fixes: 3235066449 ("scsi: lpfc: Migrate to %px and %pf in kernel print calls")
Link: https://lore.kernel.org/r/20190904160423.3865-1-sakari.ailus@linux.intel.com
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On a fast cable pull, where the driver is unable to detect from switch info
that a device has disappeared and come back, qla2xxx would not re-login even
though the remote port has already invalidated the session. This causes IO
timeouts. This patch makes the driver relogin to the remote device for an
RSCN-affected port.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20190830222402.23688-6-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The login session was stuck on cable pull. When FW is in the middle of PRLI
PENDING and the driver is in Initiator mode, the driver fails to check back
with FW to see if the PRLI has completed. This patch re-checks with FW to
make sure the PRLI completes before pushing forward with relogin.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20190830222402.23688-5-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
HINT_MBX_INT_PENDING is not guaranteed to be cleared by firmware. Remove
the check that prevents driver load with ISP82XX.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20190830222402.23688-4-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use the adapter-specific callback to read flash instead of a call tied to
one specific ISP adapter.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20190830222402.23688-3-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch updates the log message to indicate the number of vectors used by
the driver instead of reporting a failure to get the maximum requested
vectors. The driver always requests the maximum number of vectors during
initialization. If it cannot get that many, it adjusts to the number
allocated. This is normal and does not imply a failure in the driver.
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Link: https://lore.kernel.org/r/20190830222402.23688-2-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For commands completing with a resid not aligned on the device logical
sector size, also print the command CDB in addition to the current message
to help debug hardware generating such incorrect command completion
information.
Link: https://lore.kernel.org/r/20190828053511.14818-1-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pci_alloc_irq_vectors() returns number of vectors allocated. Fix the check
for error condition.
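A small sketch of the return-value convention involved. model_alloc_vectors() is a hypothetical model that, like pci_alloc_irq_vectors(), returns the number of vectors granted or a negative errno; the specific error code and the 'available' parameter are assumptions for the example.

```c
#include <errno.h>

/*
 * Hypothetical model of an allocator that, like pci_alloc_irq_vectors(),
 * returns the number of vectors granted or a negative errno.
 */
static int model_alloc_vectors(int min_vecs, int max_vecs, int available)
{
	if (available < min_vecs)
		return -ENOSPC;
	return available < max_vecs ? available : max_vecs;
}

/* Returns 0 on success; a positive count is not an error. */
static int model_setup_irqs(int wanted, int available)
{
	int n = model_alloc_vectors(wanted, wanted, available);

	/* A plain nonzero test cannot tell a count from a negative errno. */
	if (n < 0)
		return n;
	return 0;
}
```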
Fixes: cca678dfba ("scsi: fnic: switch to pci_alloc_irq_vectors")
Link: https://lore.kernel.org/r/20190827211340.1095-1-gvaradar@cisco.com
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Acked-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Just a single lpfc fix adjusting the number of available queues for
high CPU count systems.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXXK/9iYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishUTCAP9C9a9W
sUBdDpe1bedPFJBBqT3540rucXGlSINXpm20RAEA7C9BkrHk7wFpCmieZscdDG2v
T5o0P6RYDEShcm91HLk=
=lrs3
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"Just a single lpfc fix adjusting the number of available queues for
high CPU count systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Raise config max for lpfc_fcp_mq_threshold variable
Using the helper blk_queue_required_elevator_features(), set the
elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request
queue of SCSI ZBC disks.
This feature requirement can always be satisfied as the mq-deadline
elevator is always selected for in-kernel compilation when
CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Port speed printing was added by commit d948e6383e ("scsi: fnic: Add port
speed stat to fnic debug stats"). As currently configured, this will cause
the port speed to be printed to syslog every 2 seconds. To prevent log
spamming, only print the vnic port speed at driver initialization and if
the speed changes. Also clean up a small typo in fnic_trace.c.
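The log-on-change behaviour can be sketched as below. struct model_port and model_poll_speed() are hypothetical and only model the idea; the message text and units are likewise assumptions, not the fnic output.

```c
#include <stdio.h>

/* Hypothetical per-adapter state; 0 means no speed has been logged yet. */
struct model_port {
	unsigned int last_speed;
	unsigned int log_lines;
};

/* Called from a periodic poll; logs only when the speed value changes. */
static void model_poll_speed(struct model_port *p, unsigned int speed)
{
	if (speed == p->last_speed)
		return;
	printf("port speed: %u Mb/s\n", speed);
	p->last_speed = speed;
	p->log_lines++;
}
```

The first poll after initialization logs once because the stored value starts at 0; repeated polls with the same speed stay silent.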
Fixes: d948e6383e ("scsi: fnic: Add port speed stat to fnic debug stats")
Signed-off-by: John Pittman <jpittman@redhat.com>
Reviewed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_process_unsol_compl:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:636:30: warning: variable task set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_process_ofld_cmpl:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:1125:21: warning: variable port set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_init_seq_cleanup_task:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:1468:30: warning: variable orig_task set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_initiate_seq_cleanup:
drivers/scsi/bnx2fc/bnx2fc_io.c:932:19: warning: variable lport set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_initiate_cleanup:
drivers/scsi/bnx2fc/bnx2fc_io.c:1001:19: warning: variable lport set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_process_scsi_cmd_compl:
drivers/scsi/bnx2fc/bnx2fc_io.c:1882:20: warning: variable host set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_fcoe.c: In function bnx2fc_rcv:
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:431:26: warning: variable fh set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 8.42.3.0.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a race between the fipvlan request and response paths:
=====
qedf_fcoe_process_vlan_resp:113]:2: VLAN response, vid=0xffd.
qedf_initiate_fipvlan_req:165]:2: vlan = 0x6ffd already set.
qedf_set_vlan_id:139]:2: Setting vlan_id=0ffd prio=3.
======
The request thread sees that vlan is already set and fails to call
ctrl_link_up.
Fix:
- While setting vlan_id use local variable and before setting vlan_id.
- Call fcoe_ctlr_link_up in next iteration of fipvlan request.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The list of rports might become stale so we should rather traverse the
discovery list when trying relogin.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prevent a race where we get a link update while the module is being removed.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Problem Statement:
- Driver has fc_id of 0xcc0200
- Driver gets link down (due to test) and calls fcoe_ctlr_link_down().
- At this point, the fc_id of the initiator port is zeroed out.
- Driver gets a link up 14 seconds later.
- Driver performs FIP VLAN request, gets a response from the switch.
- No change in VLAN is detected.
- Driver then notifies libfcoe via fcoe_ctlr_link_up().
- Libfcoe then issues a multicast discovery solicitation as expected.
- Cisco FCF responds to that correctly.
- Libfcoe at this point starts a 3 sec count-down to allow any other FCFs
to be discovered. However, at this point, it has been 20 seconds since
the last FKA from the driver (which would have been sent prior to
backlink toggle), which causes the CVL to be issued from Cisco. The CVL
from the switch is dropped by the driver as the vx_port identification
descriptor is present and has value of 0xcc0200, which does not match
the driver's value of 0. Libfcoe completes the 3 sec count down and
proceeds to issue FLOGI as per protocol. The switch rejects the FLOGI
request. All subsequent FLOGI requests from libfc are rejected by the
switch (possibly because it is now expecting a new solicitation). This
situation will continue until the next link toggle.
Solution:
The Vx_port descriptor in the CVL has three fields:
1. MAC address
2. Fabric ID
3. Port Name
Today, the code checks for both #1 and #2 above. In the case where we went
through a link down, both these will be zero until FLOGI succeeds.
We should change our code to check if any one of these 3 is valid and if
so, handle the CVL (basically switching from AND to OR). The port name
field is definitely expected to be valid always.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Log s_id, d_id, type and command to the log message.
[mkp: fixed warning]
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current code doesn't support 20Gbps speed for current and supported
speed. Add support for it.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver was wrongly interpreting the supported cap value returned by qed.
Solution: Use the QED-defined macros instead of the OS-defined ones when
interpreting supported speeds.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver was attempting to print cdb[0], which is not set for resets coming
from SCSI ioctls. Check for cmd_len before accessing cmnd.
Crash info:
[84790.864747] BUG: unable to handle kernel NULL pointer dereference at (null)
[84790.864783] IP: qedf_initiate_tmf+0x7a/0x6e0 [qedf]
[84790.865204] Call Trace:
[84790.865246] scsi_try_target_reset+0x2b/0x90 [scsi_mod]
[84790.865266] scsi_ioctl_reset+0x20f/0x2a0 [scsi_mod]
[84790.865284] scsi_ioctl+0x131/0x3a0 [scsi_mod]
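The guard can be sketched as follows. struct model_cmd is a hypothetical stand-in carrying only a CDB pointer and length, and model_opcode_for_log() with its -1 sentinel is an invention of this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two command fields used here. */
struct model_cmd {
	unsigned char *cmnd;
	unsigned short cmd_len;
};

/* Returns the opcode to log, or -1 when the command carries no CDB. */
static int model_opcode_for_log(const struct model_cmd *sc)
{
	if (!sc->cmd_len)
		return -1;
	return sc->cmnd[0];
}
```

A reset issued through the ioctl path is modeled by a command with a NULL CDB pointer and zero length; the length check keeps the pointer from being dereferenced.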
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- On some setups fipvlan can be retried for a long duration when there is
no connection to the switch, so it never gets a reply.
- During unload this thread was hanging.
Problem Resolution:
Check if unload is in progress, then quit from fipvlan thread.
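A rough model of a retry loop that gives up once unload starts. struct model_fip_ctx, its flags, and model_fipvlan_retry() are hypothetical names for this sketch; the real thread also sleeps between attempts, which is left out here.

```c
#include <stdbool.h>

struct model_fip_ctx {
	bool unloading;		/* set by the unload path */
	bool reply_seen;	/* set when a VLAN response arrives */
	int requests_sent;
};

/* Returns true if a reply arrived, false on unload or retry exhaustion. */
static bool model_fipvlan_retry(struct model_fip_ctx *ctx, int max_retries)
{
	while (max_retries-- > 0) {
		if (ctx->unloading)
			return false;	/* quit instead of blocking unload */
		ctx->requests_sent++;
		if (ctx->reply_seen)
			return true;
	}
	return false;
}
```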
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Print messages during exiting condition to help debugging.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add:
PM8222 VID_9005, DID_028F, SVID_1BD4 and SDID_004F
3101E-4i (1G, no GB) VID_9005, DID_028F, SVID_9005 and SDID_0808
3102E-8i (2G, no GB) VID_9005, DID_028F, SVID_9005 and SDID_0809
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Return -EINPROGRESS when a rescan worker is queued.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When each ld is deleted, a rescan event is triggered in the driver. These
can stack up waiting on mutex_lock.
Change to mutex_trylock() and schedule a rescan for later.
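The non-blocking pattern can be modeled in user space as below. The kernel's mutex_trylock() returns 1 when the lock is taken and 0 on contention; here a C11 atomic_flag stands in for the mutex, and a boolean stands in for the delayed rescan work. All model_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag model_scan_lock = ATOMIC_FLAG_INIT;
static bool model_rescan_pending;	/* stands in for a delayed work item */
static int model_scans_run;

/* Try-lock built on atomic_flag; returns true when the lock was taken. */
static bool model_trylock(void)
{
	return !atomic_flag_test_and_set(&model_scan_lock);
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_scan_lock);
}

/* Rescan event handler: never blocks, defers if a scan is running. */
static void model_rescan_event(void)
{
	if (!model_trylock()) {
		model_rescan_pending = true;
		return;
	}
	model_scans_run++;
	model_unlock();
}
```

Events that arrive while a scan holds the lock collapse into a single pending rescan instead of queuing up behind the lock.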
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Return the "Phys_Bay_in_Box" field of the identify physical device data as
bay_identifier.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- serial number
- model
- vendor
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Expose physical devices before logical devices.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The 12.4.0.0 patch that merged WQ/CQ pairs into single per-cpu pair
contained a bug: a local variable was set to the queue pair by index. This
should have allowed the local variable to be natively used. Instead, the
code reused the index relative to the local variable, obtaining a random
pointer value that, when used, eventually faulted the system.
Convert the offending code to use the local variable.
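The class of bug described above can be illustrated with a small sketch. struct model_qp and both functions are hypothetical; the sketch shows a generic double-indexing mistake and is not the lpfc code.

```c
/* Hypothetical queue-pair record; only the id matters for the sketch. */
struct model_qp {
	int id;
};

/* Buggy form: qp already points at entry idx, then is indexed again. */
static int model_qp_id_buggy(struct model_qp *table, int idx)
{
	struct model_qp *qp = &table[idx];

	return qp[idx].id;	/* reads table[2 * idx] */
}

/* Fixed form: use the local pointer directly. */
static int model_qp_id_fixed(struct model_qp *table, int idx)
{
	struct model_qp *qp = &table[idx];

	return qp->id;
}
```

For a large enough index the buggy form reads past the end of the table, which is how an unrelated pointer value can end up being used.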
Fixes: c00f62e6c5 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Raise the config max for lpfc_fcp_mq_threshold variable to 256.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Capturing and downloading dif command data and dif data was implemented a
dozen years ago and is no longer used. It also creates a potential security
hole. Remove the debugfs buffer for dif debugging.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: KyleMahlkuch <kmahlkuc@linux.vnet.ibm.com>
CC: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Per Dan Carpenter:
The patch d79c9e9d4b: "scsi: lpfc: Support dynamic unbounded SGL lists on
G7 hardware." from Aug 14, 2019, leads to the following static checker
warning:
drivers/scsi/lpfc/lpfc_init.c:4107 lpfc_new_io_buf()
error: not allocating enough data 784 vs 768
There was no need to compare sizes nor to allocate size based on a define.
Change the allocation to use the actual structure length.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/ufs/ufs-qcom.c: In function ufs_qcom_pwr_change_notify:
drivers/scsi/ufs/ufs-qcom.c:808:6: warning: variable val set but not used [-Wunused-but-set-variable]
Fixes: 1e1e465c6d ("scsi/ufs: qcom: Remove ufs_qcom_phy_*() calls from host")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a spelling mistake in a ql_log message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c: In function cq_interrupt_v1_hw:
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:1542:6: warning: variable irq_value set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch provides a module parameter and sysfs interface to select
whether the queue depth for each device should be based on the
protocol-specific value set by the driver (the default) or the maximum
supported by the controller (can_queue).
Although we have a sysfs interface per sdev to change the queue depth
of individual scsi devices, this implementation provides a single
sysfs entry per shost to switch between the controller max and the
driver default.
[mkp: tweaked commit desc]
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
According to the firmware documentation a status type 0 IOCB can be
followed by one or more status continuation type 0 IOCBs. Hence do not
complain if the completion function is not called from inside the status
type 0 IOCB handler.
WARNING: CPU: 10 PID: 425 at drivers/scsi/qla2xxx/qla_isr.c:2784
qla2x00_status_entry.isra.7+0x484/0x17b0 [qla2xxx]
CPU: 10 PID: 425 Comm: kworker/10:1 Tainted: G E 5.3.0-rc4-next-20190813-autotest-autotest #1
Workqueue: qla2xxx_wq qla25xx_free_rsp_que [qla2xxx]
Call Trace:
qla2x00_status_entry.isra.7+0x1484/0x17b0 [qla2xxx] (unreliable)
qla24xx_process_response_queue+0x7d8/0xbd0 [qla2xxx]
qla25xx_free_rsp_que+0x1a0/0x220 [qla2xxx]
process_one_work+0x25c/0x520
worker_thread+0x8c/0x5e0
kthread+0x154/0x1a0
ret_from_kernel_thread+0x5c/0x7c
Cc: Himanshu Madhani <hmadhani@marvell.com>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently bits in hba->outstanding_tasks are cleared only after their
corresponding task management commands are successfully done by
__ufshcd_issue_tm_cmd().
If a timeout happens in a task management command, its corresponding bit in
hba->outstanding_tasks will not be cleared until the next task management
command using the same tag finishes successfully.
This is wrong and can lead to problems such as power issues. For example,
ufshcd_release() and ufshcd_gate_work() will do nothing if
hba->outstanding_tasks is not zero even if both UFS host and devices are
actually idle.
Solution is referred from error handling of device commands: bits in
hba->outstanding_tasks shall be cleared regardless of their execution
results.
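A minimal model of the bookkeeping change, assuming a single bitmap word. struct model_hba, model_issue_tm_cmd() and the 'timed_out' switch are hypothetical; the sketch only shows the bit being cleared on every exit path.

```c
#include <errno.h>

/* Hypothetical host state: one bit per in-flight task management slot. */
struct model_hba {
	unsigned long outstanding_tasks;
};

/* Issues a command in 'slot'; 'timed_out' forces the failure path. */
static int model_issue_tm_cmd(struct model_hba *hba, int slot, int timed_out)
{
	int err = 0;

	hba->outstanding_tasks |= 1UL << slot;

	if (timed_out)
		err = -ETIMEDOUT;

	/* Clear the bit on every exit path, success or not. */
	hba->outstanding_tasks &= ~(1UL << slot);
	return err;
}
```

With the bitmap back at zero after a timeout, idle checks that test for a zero bitmap are no longer blocked by a stale bit.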
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pointer fh is being assigned a return value from the call to
skb_transport_header(); however, this value is never read and fh is
re-assigned immediately afterwards with a new value. Since there are no
side-effects from calling skb_transport_header(), the call is redundant and
can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some UFS devices have issues if LCC is enabled. So we are setting
PA_LOCAL_TX_LCC_Enable to 0 before link startup which will make sure that
both host and device TX LCC are disabled once link startup is completed.
Signed-off-by: Anil Varughese <aniljoy@cadence.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Variable error is being initialized with a value that is never read and
error is being re-assigned a little later on. The assignment is redundant
and hence can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move ASPM definitions and function prototypes from include/linux/pci-aspm.h
to include/linux/pci.h so users only need to include <linux/pci.h>:
PCIE_LINK_STATE_L0S
PCIE_LINK_STATE_L1
PCIE_LINK_STATE_CLKPM
pci_disable_link_state()
pci_disable_link_state_locked()
pcie_no_aspm()
No functional changes intended.
Link: https://lore.kernel.org/r/20190827095620.11213-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Four fixes, three for edge conditions which don't occur very often.
The lpfc fix mitigates memory exhaustion for some high CPU systems.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXWEBrCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishds+AQCyQlgV
TzSFQ1zvbAb3SNFdNsCzzb8Aq2vJC+RojF2VFgD/cJfE2fix9E7Nk8PCGwH1sgnf
m5Glsvv8BEmmtoikrb8=
=Ti6g
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four fixes, three for edge conditions which don't occur very often.
The lpfc fix mitigates memory exhaustion for some high CPU systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ
scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()
scsi: target: tcmu: avoid use-after-free after command timeout
scsi: qla2xxx: Fix gnl.l memory leak on adapter init failure
Hi Linus,
Please, pull the following patches that mark switch cases where we are
expecting to fall through.
- Fix fall-through warnings on arm and mips for multiple
configurations.
Thanks
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl1clmEACgkQRwW0y0cG
2zGqbg/9HPC3Cf3oYq4o0/kV+cfS0ir6iJCz1mspFfbBloaS/EU7A2CF35bDz7k3
XUzl/ci82EQCnuJv/X6ddayUF1S/vFWLnQXRznz07kJspUnNpu7JKgsZr2qsHaRe
CfCj62J/Kuhnke8EUjuWEuga6YXYsYlcevgg/tpVXsTmxrpq2A15tWyut7WEe4JQ
kWPELwYbPsDvTj2siZrgMRBx4gVzQKQVo5TpZiuADeJu9RuFT/64PI9TDQGE7c+X
fFq4ijd1YPj/E+WI7k5VdUbXYiPIIXmkJ4VAPcu5VWmUS7y7bTeye0Jc3uYAxI1r
7rykYhNzniGn3SZL+wq8rHchL3dTLBYhd34HhTlb5xdGFwmbzKgHBqdlGpH8HOo+
CLu8kPYdmnzYCth4md0ENwgBVkj0tweyZuMzCys1qR6RFhOipxWLNGEvIXWZ0Sp8
uNyXnPdCrZTmlwubwY4FOOLsGKW06GnD64cfmEYoCMcmT2j7clbjasWYM4PXQvbt
0dVtt8k4M5LJBLh8qTX7RMZHDQYMiiYiMnLLAXf4wB0VUTqgNuLc4k0PpX3kBYtO
4b0lU/LQH+8811BMNVBHK55StQ8DjM0C2yfQWx610eoohjV70JTyxOWoqeHFL5hq
DIFdLDOgvJCqtyYgJDjmCmH9x6lgfvmxAKq66h9Z7vt25KLUizQ=
=fQZm
-----END PGP SIGNATURE-----
Merge tag 'Wimplicit-fallthrough-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull more fallthrough fixes from Gustavo A. R. Silva:
"Fix fall-through warnings on arm and mips for multiple configurations"
* tag 'Wimplicit-fallthrough-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
video: fbdev: acornfb: Mark expected switch fall-through
scsi: libsas: sas_discover: Mark expected switch fall-through
MIPS: Octeon: Mark expected switch fall-through
power: supply: ab8500_charger: Mark expected switch fall-through
watchdog: wdt285: Mark expected switch fall-through
mtd: sa1100: Mark expected switch fall-through
drm/sun4i: tcon: Mark expected switch fall-through
drm/sun4i: sun6i_mipi_dsi: Mark expected switch fall-through
ARM: riscpc: Mark expected switch fall-through
dmaengine: fsldma: Mark expected switch fall-through
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: mtx1_defconfig mips):
drivers/scsi/libsas/sas_discover.c: In function ‘sas_discover_domain’:
./include/linux/printk.h:309:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/libsas/sas_discover.c:459:3: note: in expansion of macro ‘pr_notice’
pr_notice("ATA device seen but CONFIG_SCSI_SAS_ATA=N so cannot attach\n");
^~~~~~~~~
drivers/scsi/libsas/sas_discover.c:462:2: note: here
default:
^~~~~~~
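A small sketch of an annotated fall-through of the kind this warning asks for. The enum, the function, and the error code are hypothetical and only loosely modeled on the case quoted above; the comment form of the annotation is one that GCC's -Wimplicit-fallthrough recognizes at its default level.

```c
#include <errno.h>

enum model_dev_type { MODEL_END, MODEL_SATA, MODEL_UNKNOWN };

/* Returns 0 for an end device, -ENXIO for anything that cannot attach. */
static int model_discover(enum model_dev_type type)
{
	int error = 0;

	switch (type) {
	case MODEL_END:
		break;
	case MODEL_SATA:
		/* No ATA support in this model: handle like an unknown type. */
		/* Fall through */
	default:
		error = -ENXIO;
		break;
	}
	return error;
}
```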
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Update lpfc version to 12.4.0.0
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, each hardware queue, typically allocated per-cpu, consists of a
WQ/CQ pair per protocol. Meaning if both SCSI and NVMe are supported 2
WQ/CQ pairs will exist for the hardware queue. Separate queues are
unnecessary. The current implementation wastes memory backing the 2nd set
of queues, and the use of double the SLI-4 WQ/CQ's means less hardware
queues can be supported which means there may not always be enough to have
a pair per cpu. If there is only 1 pair per cpu, more cpu's may get their
own WQ/CQ.
Rework the implementation to use a single WQ/CQ pair by both protocols.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
FC-NVMe-2 added support for sequence level error recovery in the FC-NVME
protocol. This allows for the detection of errors and lost frames and
immediate retransmission of data to avoid exchange termination, which
escalates into NVMeoFC connection and association failures. A significant
RAS improvement.
The driver is modified to indicate support for SLER in the NVMe PRLI it
issues and to check for support in the PRLI response. When both sides
support it, the driver will set a bit in the WQE to enable the recovery
behavior on the exchange. The adapter will take care of all detection and
retransmission.
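A minimal userspace sketch of the negotiation rule described above: the
recovery bit is set only when both the local PRLI and the PRLI response
indicate SLER support. The function name and the bit value are hypothetical
and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_WQE_SLER_BIT 0x1u	/* hypothetical bit position, for illustration */

/* Enable recovery on the exchange only when both ends advertised support. */
static uint32_t demo_wqe_flags(bool local_sler, bool remote_sler)
{
	return (local_sler && remote_sler) ? DEMO_WQE_SLER_BIT : 0u;
}
```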
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Typical SLI-4 hardware supports up to 2 4KB pages to be registered per XRI
to contain the exchange's Scatter/Gather List. This caps the number of SGL
elements that can be in the SGL. There are no extensions to extend the
list out of the 2 pages.
The G7 hardware adds a SGE type that allows the SGL to be vectored to a
different scatter/gather list segment. And that segment can contain a SGE
to go to another segment and so on. The initial segment must still be
pre-registered for the XRI, but it can be a much smaller amount (256Bytes)
as it can now be dynamically grown. This much smaller allocation can
handle the SG list for most normal I/O, and the dynamic aspect allows it to
support many MB's if needed.
The implementation creates a pool which contains "segments" and which is
initially sized to hold the initial small segment per xri. If an I/O
requires additional segments, they are allocated from the pool. If the
pool has no more segments, the pool is grown based on what is now
needed. After the I/O completes, the additional segments are returned to
the pool for use by other I/Os. Once allocated, the additional segments are
not released under the assumption of "if needed once, it will be needed
again". Pools are kept on a per-hardware queue basis, which is typically
1:1 per cpu, but may be shared by multiple cpus.
The switch to the smaller initial allocation significantly reduces the
memory footprint of the driver (which only grows if large ios are
issued). Based on the several K of XRIs for the adapter, the 8KB->256B
reduction can conserve 32MBs or more.
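The arithmetic behind that estimate can be sketched as follows. The per-XRI
sizes come from the text (two 4KB pages before, one 256-byte segment after);
the XRI count passed in is an assumption, since the text only says "several
K".

```c
#include <stdint.h>

/* Sizes taken from the text: two 4KB pages before, one 256-byte segment after. */
#define OLD_SGL_BYTES_PER_XRI (2u * 4096u)
#define NEW_SGL_BYTES_PER_XRI 256u

/* Bytes no longer pre-registered for a given (assumed) XRI count. */
static uint64_t sgl_bytes_saved(uint32_t xri_count)
{
	return (uint64_t)xri_count *
	       (OLD_SGL_BYTES_PER_XRI - NEW_SGL_BYTES_PER_XRI);
}
```

With an assumed 4096 XRIs this gives 4096 * 7936 = 32505856 bytes, roughly
32 MB.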
It has been observed with per-cpu resource pools that a resource allocated
on CPU A may be put back on CPU B. While the get routines are distributed
evenly, only a limited subset of CPUs may be handling the put routines.
This can put a strain on the lpfc_put_cmd_rsp_buf_per_cpu routine because
all the resources are being put on a limited subset of CPUs.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added code to support driver loopback with MDS Diagnostics. This style of
diagnostics passes frames from the fabric to the driver, which then echoes
them back out the link. SEND_FRAME WQEs are used to transmit the frames.
Added the SOF and EOF field location definitions for use by SEND_FRAME.
Also ensure that enable_mds_diags is a RW parameter.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To aid better hardware detection when there are issues, report the first
and second level hardware revisions from the READ_REV command. Add the
elements to the existing hardware id string.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In order to see real addresses, replace %p with %px for kernel addresses
and replace %p with %pf for functions.
While converting, standardize on "x%px" throughout (not %px or 0x%px).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While performing code review, several relatively simple optimizations can
be done in the fast path.
Add these optimizations (unlikely designators).
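As an illustration of the kind of annotation meant here, a userspace sketch
follows. The macro is a stand-in for the kernel's unlikely() built on the
GCC/Clang builtin, and submit_io() is a hypothetical routine, not a driver
function.

```c
/* Userspace stand-in for the kernel's unlikely(); relies on a GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical fast-path routine: the error branch is marked as the cold one. */
static int submit_io(const void *buf, unsigned int len)
{
	if (unlikely(!buf || len == 0))
		return -1;	/* rare error path */
	return 0;		/* common path */
}
```

The hint changes only code layout and branch prediction, not behaviour.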
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Running on Coverity produced the following errors:
- coding style (indentation)
- memset size mismatch errors
note: comment cases where it is purposely a mismatch
Fix the errors.
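One common form of a memset size mismatch is passing the size of a pointer
where the size of the object was intended. The sketch below shows that
pattern with a hypothetical structure; whether this is the exact form
reported for the driver is an assumption.

```c
#include <string.h>

/* Hypothetical structure, used only to illustrate the pattern. */
struct demo_ctx {
	unsigned int flags;
	unsigned char payload[60];
};

static void demo_ctx_clear(struct demo_ctx *ctx)
{
	/* Buggy form: memset(ctx, 0, sizeof(ctx)) clears only pointer-size bytes. */
	memset(ctx, 0, sizeof(*ctx));	/* size of the object, not of the pointer */
}
```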
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
modinfo for lpfc_nvme_enable_fb is incorrect. FirstBurst on lpfc target is
not fully supported.
Update the attribute description.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is allowing the user to change lpfc_enable_bg while loading the
driver against a FCoE adapter. This is not supported.
No check is made for the adapter type when applying the blockguard
enablement value.
Fix by verifying the adapter type before setting the enablement flag.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
GetTrunkInfo is displaying an incorrect link speed when the link is a trunk
and the link has gone down. The driver is not clearing the logical speed
as part of the link down transition.
Fix by setting the logical speed to UNKNOWN SPEED when the link goes down.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Max Frame Size value is shown as 34816 in fdmishow from Switch.
The driver uses bbRcvSize in common service param which is obtained from
the READ_SPARM mailbox command. The bbRcvSize field which is displayed is a
three nibble field but the driver is printing a full four nibbles.
Fix by masking off the upper nibble.
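A small sketch of the masking, assuming the value is held as a 16-bit
quantity of which only the low three nibbles are the receive size. The
function name is hypothetical.

```c
#include <stdint.h>

/* Keep only the low three nibbles (12 bits) of the 16-bit value. */
static uint16_t max_frame_size(uint16_t raw_bb_rcv_size)
{
	return raw_bb_rcv_size & 0x0FFF;
}
```

For the reported value, 34816 is 0x8800; masking off the upper nibble leaves
0x800, which is 2048.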
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The scsi transport fc bsg interface does not expect the bsg_job_done()
callback to be done if the bsg request call returns failure. Several of the
HST_VENDOR cases in the driver unconditionally call bsg_job_done()
regardless of the returning value.
Fix the code to only call bsg_job_done() if the call to lpfc_bsg_request()
will return success.
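The contract can be sketched in userspace as follows. All names are
hypothetical stand-ins; demo_job_done() plays the role of bsg_job_done() and
only records that it was called.

```c
/* Hypothetical stand-in for a bsg job; only tracks whether "done" was signalled. */
struct demo_job {
	int done_called;
	int result;
};

static void demo_job_done(struct demo_job *job, int result)
{
	job->result = result;
	job->done_called = 1;
}

/* Returns 0 on success, negative on failure, mirroring the request-call contract. */
static int demo_vendor_request(struct demo_job *job, int rc)
{
	if (rc == 0)
		demo_job_done(job, 0);	/* complete the job only when returning success */
	return rc;			/* on failure no completion callback is made */
}
```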
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When forcing the use of MSI (vs MSI-X) the driver is crashing in
pci_irq_get_affinity.
The driver was not using the new pci_alloc_irq_vectors interface in the MSI
path.
Fix by using pci_alloc_irq_vectors() with PCI_IRQ_MSI in the MSI path.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is currently reporting a non-zero nvme sg_seg_cnt value of 256
when nvme is disabled. It should be zero.
Fix by ensuring the value is cleared.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an unsolicited ABTS was received, the driver looks up the exchange it
references. It does various searches looking for the exchange
context. When one is eventually matched and it is associated with an XRI
context, the driver sends an ABORT WQE to terminate the exchange. Current
code looks at whether the transport had taken action on the XRI yet or not
(no action if set to LPFC_NVMET_STE_RCV; action if non-LPFC_NVMET_STE_RCV).
Based on action or not one of two (sol vs unsol) issue abort routines are
called. The unsol version cheats and transmits a sequence containing an
ABTS with no interaction with the adapter. The sol version issues an Abort
WQE and lets the adapter manage whether the ABTS is sent or not.
The issue is the unsol version is sending ABTS unconditionally for the
exchange that received the ABTS. It's unnecessary.
Remove the conditional and just call the adapter command-based routine to
let the adapter manage the ABTS.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As part of firmware download, the adapter is reset. On the adapter the
reset causes the function to stop and all outstanding io is terminated
(without responses). The reset path then starts teardown of the adapter,
starting with deregistration of the remote ports with the nvme-fc
transport. The local port is then deregistered and the driver waits for
local port deregistration. This never finishes.
The remote port deregistrations terminated the nvme controllers, causing
them to send aborts for all the outstanding io. The aborts were serviced in
the driver, but stalled due to its state. The nvme layer then stops to
reclaim its outstanding io before continuing. The io must be returned
before the reset on the controller is deemed complete and the controller
delete performed. The remote port deregistration won't complete until all
the controllers are terminated. And the local port deregistration won't
complete until all controllers and remote ports are terminated. Thus things
hang.
The issue is the reset which stopped the adapter also stopped all the
responses that would drive i/o completions, and the aborts, which would
otherwise have driven i/o completions, were stalled as well. The driver,
when resetting the adapter like this, needs to be generating the
completions as part of the adapter reset so that I/Os complete (in error),
and any aborts are not queued.
Fix by adding flush routines whenever the adapter port has been reset or
discovered in error. The flush routines will generate the completions for
the scsi and nvme outstanding io. The abort ios, if waiting, will be caught
and flushed as well.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This issue is specific to SLI-3 adapters, specifically when DIF is used.
Once seen, this message floods the logs:
9064 BLKGRD: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from
dma_map_sg
The driver, upon detecting an error such as too many elements in an sglist,
misrepresents the error by treating it as a temporary resource issue by
returning MLQUEUE_HOST_BUSY. In these cases, no retry will fix it and it
should have been a hard error. The repeated retry was causing the spamming
of the log.
As for the initial reason of why an I/O encountered this issue at all is
not clear as parameters set by the driver should have avoided this. The
dm multipath maintainer has been notified of the issue.
Fix by changing the return code for the dma mapping routines to indicate
cases that are not retryable and return DID_ERROR on those cases.
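The distinction can be sketched as a mapping from the dma-mapping outcome to
the queuing result. The enum names and values below are hypothetical and are
not the kernel's MLQUEUE_HOST_BUSY or DID_ERROR definitions; they only model
"retry later" versus "fail the command".

```c
/* Hypothetical outcome codes for this sketch; not the kernel's numeric values. */
enum demo_map_status { DEMO_MAP_OK, DEMO_MAP_NO_RESOURCE, DEMO_MAP_TOO_MANY_SEGS };
enum demo_queue_rc { DEMO_QUEUED, DEMO_HOST_BUSY_RETRY, DEMO_HARD_ERROR };

static enum demo_queue_rc demo_queuecommand_rc(enum demo_map_status st)
{
	switch (st) {
	case DEMO_MAP_OK:
		return DEMO_QUEUED;
	case DEMO_MAP_NO_RESOURCE:
		return DEMO_HOST_BUSY_RETRY;	/* transient: a retry may succeed */
	case DEMO_MAP_TOO_MANY_SEGS:
	default:
		return DEMO_HARD_ERROR;		/* a retry cannot fix this */
	}
}
```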
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the adapter encounters a condition which causes the adapter to fail
(driver must detect the failure) simultaneously to a request to the driver
to reset the adapter (such as a host_reset), the reset path will be racing
with the asynchronously-detected adapter failure path. In the failing
situation, one path has started to tear down the adapter data structures
(io_wq's) while the other path has initiated a repeat of the teardown and
is in the lpfc_sli_flush_xxx_rings path and attempting to access the
just-freed data structures.
Fix by the following:
- In cases where an adapter failure is detected, rather than explicitly
calling offline_eratt() to start the teardown, change the adapter state
and let the later calls of posted work to the slowpath thread invoke the
adapter recovery. In essence, this means all requests to reset are
serialized on the slowpath thread.
- Clean up the routine that restarts the adapter. If there is a failure
from brdreset, don't immediately error and leave things in a partial
state. Instead, ensure the adapter state is set and finish the teardown
of structures before returning.
- If in the scsi host reset handler and the board fails to reset and
restart (which can be due to parallel reset/recovery paths), instead of
hard failing and explicitly calling offline_eratt() (which gets into the
redundant path), just fail out and let the asynchronous path resolve the
adapter state.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During cable pull testing a deadlock was seen between lpfc_nlp_counters()
vs lpfc_mbox_process_link_up() vs lpfc_work_list_done(). They are all
waiting on the shost->host_lock.
Issue is all of these cases raise irq when taking out the lock but use
spin_unlock_irq() when unlocking. The unlock path will unconditionally
re-enable interrupts in cases where irq state should be preserved. The
re-enablement allowed the other paths to execute which then causes the
deadlock.
Fix by converting the lock/unlock to irqsave/irqrestore.
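The difference between the two unlock styles can be modelled in userspace
with a single flag standing in for the cpu's interrupt-enable state. This is
a simulation only; the sim_* helpers are hypothetical and perform no real
locking.

```c
#include <stdbool.h>

/* Simulated interrupt-enable state; a real kernel tracks this in hardware. */
static bool sim_irqs_enabled = true;

static void sim_lock_irq(void)   { sim_irqs_enabled = false; }
static void sim_unlock_irq(void) { sim_irqs_enabled = true; }	/* unconditional enable */

static bool sim_lock_irqsave(void)
{
	bool flags = sim_irqs_enabled;	/* remember the state on entry */

	sim_irqs_enabled = false;
	return flags;
}

static void sim_unlock_irqrestore(bool flags)
{
	sim_irqs_enabled = flags;	/* restore, rather than force-enable */
}

/* Inner section using the irq variant; reports the state left behind. */
static bool inner_with_irq_variant(void)
{
	sim_lock_irq();
	sim_unlock_irq();
	return sim_irqs_enabled;	/* always true, whatever the caller had */
}

/* Inner section using the irqsave variant; the caller's state is preserved. */
static bool inner_with_irqsave_variant(void)
{
	bool flags = sim_lock_irqsave();

	sim_unlock_irqrestore(flags);
	return sim_irqs_enabled;
}
```

When the caller already had interrupts disabled, the irq variant leaves them
enabled on return while the irqsave variant leaves them disabled.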
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In a test with high nvme remote port counts connected via a multi-hop FC
switch config where switches were systematically reset (e.g. fabric
partitioning and re-establishment), the nvme remote ports would switch
addresses based on the switch reconfiguration events. The driver would get
into a situation where the nvme port changed address, PLOGI and PRLI would
succeed, nvme transport registration occurred, but subsequent LS requests by
the nvme subsystem failed due to a bad ndlp state and connectivity to the
device failed.
The driver hit a race condition on multiple devices that address swapped
simultaneously. In cases where the driver notices the remote port structure
came back as the same value as previously (meaning a nvme_rport structure
was re-enabled and did not go through devloss_tmo/connect_tmo_failures on
all controllers) the driver would unconditionally exit assuming the ndlp
information was correct. But, if the ndlp's had been swapped, the ndlp had
stale port state information, which when used by the LS request commands,
would fail the commands.
Fix by checking whether a node swap had occurred, and only exit if no ndlp
swap had occurred.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In situations where zoning is not being used, thus NVMe initiators see
other NVMe initiators as well as NVMe targets, a link bounce on an
initiator will cause the NVMe initiators to spew "6169" State Error
messages.
The driver is not qualifying whether the remote port is a NVMe target or
not before calling the lpfc_nvme_rescan_port(), which validates the role
and prints the message if it's only an NVMe initiator.
Fix by the following:
- Before calling lpfc_nvme_rescan_port() ensure that the node is a NVMe
storage target or a NVMe discovery controller.
- Clean up implementation of lpfc_nvme_rescan_port. remoteport pointer
will always be NULL if a NVMe initiator only. But, grabbing of
remoteport pointer should be done under lock to coincide with the
registering of the remote port with the fc transport.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On an SLI-3 adapter which does not support NVMe, but with the driver global
attribute to enable nvme on any adapter if it does support NVMe
(e.g. module parameter lpfc_enable_fc4_type=3), the SGL and total SGE
values are being munged by the protocol enablement when it shouldn't be.
Correct by changing the location of where the NVME sgl information is being
applied, which will avoid any SLI-3-based adapter.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If admin changes the devloss_tmo on an rport via the fc_remote_port rport
dev_loss_tmo attribute, the value is only set on the scsi stack. The change
is not propagated to NVMe.
The set routine in the lldd lacks the call to
nvme_fc_set_remoteport_devloss() to set the value.
Fix by adding the call to the lldd set routine.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In tests with remote ports constantly logging out/logging in coupled with
occasional local link bounce, if a remote port is disconnected for longer
than devloss_tmo and then subsequently reconnected, eventually the test
will fail to login with the remote port and remote port connectivity is
lost.
When devloss_tmo expires, the driver does not free the node struct until
the port or npiv instance is being deleted. The node is left allocated but
the state set to UNUSED. If the node was in the process of logging in when
the local link drop occurred, meaning the RPI was allocated for the node in
order to send the ELS, but not yet registered which comes after successful
login, the node is moved to the NPR state, and if devloss expires, to
UNUSED state. If the remote port comes back, the node associated with it
is restarted and this path happens to allocate a new RPI and overwrites the
prior RPI value. In the cases where the port was logged in and logs out,
the path did release the RPI but did not set the node rpi value. In the
cases where the remote port never finished logging in, the path never did
the call to release the rpi. In this latter case, when the node is
subsequently restored, the new rpi allocation overwrites the rpi that was
not released, and the rpi is now leaked. Eventually the port will run out
of RPI resources to log into new remote ports.
Fix by following changes:
- When an rpi is released, do so under locks and ensure the node rpi value
is set to a non-allocated value (LPFC_RPI_ALLOC_ERROR). Note:
refactored to a small service routine to avoid indentation issues.
- When re-enabling a node, check the rpi value to determine if a new
allocation is necessary. If already set, use the prior rpi.
Enhanced logging to help in the future.
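The two changes listed above can be sketched in userspace as follows. The
sentinel name LPFC_RPI_ALLOC_ERROR comes from the text, but its numeric value
here is an assumption; the demo_* names, the pool size and the omission of
locking are simplifications for illustration.

```c
#include <stdint.h>

/* Sentinel named in the text; the numeric value here is an assumption. */
#define LPFC_RPI_ALLOC_ERROR 0xFFFFu
#define DEMO_MAX_RPI 8u

static uint8_t demo_rpi_in_use[DEMO_MAX_RPI];

struct demo_node { uint16_t rpi; };

static uint16_t demo_rpi_alloc(void)
{
	for (uint16_t i = 0; i < DEMO_MAX_RPI; i++) {
		if (!demo_rpi_in_use[i]) {
			demo_rpi_in_use[i] = 1;
			return i;
		}
	}
	return LPFC_RPI_ALLOC_ERROR;
}

/* Release the rpi and mark the node as holding none (locking omitted here). */
static void demo_node_release_rpi(struct demo_node *n)
{
	if (n->rpi != LPFC_RPI_ALLOC_ERROR) {
		demo_rpi_in_use[n->rpi] = 0;
		n->rpi = LPFC_RPI_ALLOC_ERROR;
	}
}

/* Re-enable: allocate only when the node does not already hold an rpi. */
static void demo_node_enable(struct demo_node *n)
{
	if (n->rpi == LPFC_RPI_ALLOC_ERROR)
		n->rpi = demo_rpi_alloc();
}
```

Re-enabling a node that still holds an rpi reuses it, so no second rpi is
taken from the pool and nothing is leaked.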
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a remote port is removed and remains removed for devloss_tmo, if an RSCN
is subsequently received indicating the presence of the remote port, the
driver does not login to and rediscover the remote port.
Currently, in order for a port to be rediscovered post an RSCN, the node
state must be NPR to reflect not logged in. When devloss expires, the node
state is marked UNUSED. When an RSCN occurs, the nodes referenced by the
RSCN will have a NPR_2B_DISC flag set, but the re-login will only be
attempted if the node is in NPR_NODE state. Thus the node is skipped over.
Fix by recognizing the NPR_2B_DISC and UNUSED and transition the node back
to NPR state to allow the re-login to take place.
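A simplified model of the revised check follows. The state and flag names
echo the text, but the types and values are hypothetical and do not match the
driver's definitions.

```c
/* Hypothetical, simplified node model; not the driver's definitions. */
enum demo_state { DEMO_UNUSED_NODE, DEMO_NPR_NODE };
#define DEMO_NPR_2B_DISC 0x1u

struct demo_ndlp { enum demo_state state; unsigned int flags; };

/* Returns 1 if a re-login should be attempted for this node after an RSCN. */
static int demo_rscn_should_login(struct demo_ndlp *n)
{
	if (!(n->flags & DEMO_NPR_2B_DISC))
		return 0;
	if (n->state == DEMO_UNUSED_NODE)
		n->state = DEMO_NPR_NODE;	/* devloss expired earlier: move back to NPR */
	return n->state == DEMO_NPR_NODE;
}
```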
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an admin updates lpfc's devloss_tmo sysfs attribute, the kernel will
oops.
Coding of a loop allowed a new value (rport) to be set/checked for null
followed by an older value (remoteport) checked for null to allow progress
where the new value, even though null, will be referenced.
Rework the logic to validate and prevent any reference to the null ptr.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It's possible for the driver to initiate an FLOGI and before it completes,
another link down/up transition occurs requiring a new FLOGI. Currently,
nothing is done to abort/noop the older FLOGI request to the adapter, so if
this transition occurs and the FLOGI completion is received after the link
down/up transition, the driver may erroneously act on the older FLOGI. In
most cases, the adapter properly terminates/fails the FLOGI, but there is a
timing condition where the FLOGI may complete on the wire prior to the
transition, but the response may not be seen/processed by the driver before
the driver sees the link transition.
Fix by having the link down handler in the driver run through any
outstanding ELS's and change the completion handler of the ELS so that it
will be no-op'd and released.
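The handler swap can be sketched with plain function pointers. Everything
below is a hypothetical userspace model; the structure and handler names are
not the driver's.

```c
#include <stddef.h>

struct demo_els;
typedef void (*demo_cmpl_fn)(struct demo_els *els);

struct demo_els {
	demo_cmpl_fn cmpl;
	int acted_on;	/* set when the normal handler processed the response */
	int released;	/* set when resources were given back */
};

static void demo_els_cmpl_normal(struct demo_els *els)
{
	els->acted_on = 1;
	els->released = 1;
}

/* Replacement handler: ignore the response and only release resources. */
static void demo_els_cmpl_noop(struct demo_els *els)
{
	els->released = 1;
}

/* Link-down handling: re-point every outstanding ELS at the no-op handler. */
static void demo_link_down(struct demo_els *outstanding, size_t count)
{
	for (size_t i = 0; i < count; i++)
		outstanding[i].cmpl = demo_els_cmpl_noop;
}
```

A completion that arrives after the link transition then runs the no-op
handler, so the stale FLOGI result is never acted on.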
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When tearing down the adapter for a reset, online/offline, or driver
unload, the queue free routine would hit a GPF oops. This only occurs on
conditions where the number of hardware queues created is fewer than the
number of cpus in the system. In this condition cpus share a hardware
queue. And of course, it's the 2nd cpu that shares a hardware queue that
attempted to free it a second time and hit the oops.
Fix by reworking the cpu to hardware queue mapping such that:
Assignment of hardware queues to cpus occur in two passes:
first pass: is first time assignment of a hardware queue to a cpu.
This will set the LPFC_CPU_FIRST_IRQ flag for the cpu.
second pass: for cpus that did not get a hardware queue they will
be assigned one from a primary cpu (one set in first pass).
Deletion of hardware queues is driven by cpu iteration, and queues
will only be deleted if the LPFC_CPU_FIRST_IRQ flag is set.
Also contains a few small cleanup fixes and a little better logging.
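The two-pass assignment and the flag-guarded deletion can be sketched as
below. The flag name LPFC_CPU_FIRST_IRQ is from the text; its value, the cpu
and queue counts, and the round-robin choice of which primary cpu to share
with are assumptions made for this illustration.

```c
#define DEMO_NCPU 4
#define DEMO_NHWQ 2
#define LPFC_CPU_FIRST_IRQ 0x1	/* flag name from the text; value assumed */

struct demo_cpu_map { int hdwq; unsigned int flag; };

static struct demo_cpu_map demo_map[DEMO_NCPU];
static int demo_hdwq_freed[DEMO_NHWQ];	/* counts frees per queue */

static void demo_assign(void)
{
	int cpu;

	/* First pass: one cpu per hardware queue becomes its primary owner. */
	for (cpu = 0; cpu < DEMO_NCPU; cpu++) {
		demo_map[cpu].flag = 0;
		demo_map[cpu].hdwq = -1;
		if (cpu < DEMO_NHWQ) {
			demo_map[cpu].hdwq = cpu;
			demo_map[cpu].flag = LPFC_CPU_FIRST_IRQ;
		}
	}
	/* Second pass: remaining cpus share a queue owned by a primary cpu. */
	for (cpu = 0; cpu < DEMO_NCPU; cpu++)
		if (demo_map[cpu].hdwq < 0)
			demo_map[cpu].hdwq = demo_map[cpu % DEMO_NHWQ].hdwq;
}

static void demo_free_all(void)
{
	/* Iterate cpus, but free a queue only through its primary cpu. */
	for (int cpu = 0; cpu < DEMO_NCPU; cpu++)
		if (demo_map[cpu].flag & LPFC_CPU_FIRST_IRQ)
			demo_hdwq_freed[demo_map[cpu].hdwq]++;
}
```

Each queue is freed exactly once even though two cpus reference it.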
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The adapter reset path (lpfc_sli_hba_down) is taking/releasing a lock with
irq. But, the path is already under the hbalock which raised irq so it's
unnecessary.
Convert to simple lock/unlock.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_nvme_register_port hit a null prev_ndlp pointer in a test with lots of
target ports swapping addresses. The oldport value was stale, thus its
ndlp (prev_ndlp set to it) was used.
Fix by moving oldrport pointer checks, and if used prev_ndlp pointer
assignment, to be done while the lock is held.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is inadvertently trying to issue an INIT_VPI mailbox command on
an SLI-3 driver. The command is specific to SLI-4. When the call is made to
send the command, if on an SLI-3 adapter, an array pointer is NULL and the
driver will oops.
Fix by restricting the command to SLI-4 adapters only.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a target issues an ADISC to the port and the target is a NVME target,
the driver is inadvertently invalidating the login and marking the remote
port as logged out. Communication with the target is lost.
Revise the ADISC check so that FCP or NVME targets will be marked valid at
the end of ADISC processing. Enhance logging to recognize condition
better.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some remote ports may be slow in registering their GID_FT protocol
information with the fabric. If the remote port is an initiator, it may
send PLOGI to the port before the GID_FT logic is complete. Meaning, after
accepting the PLOGI, the driver may see no response to the GID_FT that
is issued after the login to determine the protocols supported so that
proper PRLI's may be transmitted. If the driver has no fc4 information, it
currently stops and the remote port is not discovered.
Fix by issuing a LOGO when there is no GID_FT information. The LOGO
completion handling will attempt to re-login if the nport_id is still
present.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In cases of remote-port-side cable pull/replug, there happens to be a
target that upon replug will send the port a PLOGI, a PRLI, and a LOGO.
When this sequence is received by the driver, the PLOGI is accepted and a
GFT_ID is issued to find the protocol support for the remote port. While
the GFT_ID is outstanding, a LOGO is received. The driver logs the remote
port out and unregisters the RPI and schedules a new PLOGI transmission.
However, the GFT_ID was not terminated. When it completed, the driver
attempted to transition the remote port to PRLI transmission, which cancels
the PLOGI scheduling. The PRLI transmit attempt is rejected by the adapter
as the remote port is not logged in. No retry is attempted as it's expected
the logout is noted and the supposedly scheduled PLOGI should address the
state. As there is no PLOGI, the remote port does not get re-discovered.
Fix by aborting the outstanding GFT_ID if the related remote port is logged
out.
Ensure a PRLI transmit attempt only occurs if the remote port is logging
in. This avoids the incorrect attempt while logged out.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the adapter is reset while there are outstanding ELS's, subsequent
reinitialization of the adapter will fail as it has not recovered all of
the io contexts relative to the ELS's.
If an ELS timed out or otherwise failed and an abort of the ELS was
attempted (which changes the ELS completion context), in cases where the
driver generates completions for the outstanding IO as the adapter would
not due to being reset, the driver released only the ELS context and failed
to release the abort context. When the adapter went to reinit, as it had
not received all of the contexts, it failed to reinit.
Fix by having the ELS completion handler identify the driver-generated
completion status and release the abort context.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Unusually high IO latency can be observed with little IO in progress. The
latency may remain high regardless of amount of IO and can only be cleared
by forcing lpfc_fcp_imax values to non-zero and then back to zero.
The driver's eq_delay mechanism that scales the interrupt coalescing based
on io completion load failed to reduce or turn off coalescing when load
decreased. Specifically, if no io completed on a cpu within an eq_delay
polling window, the eq delay processing was skipped and no change was made
to the coalescing values. This left the coalescing values set when they
were no longer applicable.
Fix by always clearing the percpu counters for each time period and always
running the eq_delay calculations if an eq has a non-zero coalescing value.
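A rough model of the corrected polling step follows. The scaling rule (one
microsecond of delay per 1024 completions, capped at 128) is invented for the
sketch and is not the driver's formula; the point is only that the counter is
always cleared and that an idle window still recalculates a non-zero delay.

```c
struct demo_eq { unsigned int cmpl_count; unsigned int delay_usec; };

/* Hypothetical scaling rule: 1 usec of delay per 1024 completions, capped. */
static unsigned int demo_calc_delay(unsigned int cmpls)
{
	unsigned int d = cmpls / 1024;

	return d > 128 ? 128 : d;
}

static void demo_eq_delay_tick(struct demo_eq *eq)
{
	unsigned int cmpls = eq->cmpl_count;

	eq->cmpl_count = 0;		/* always reset the counter for the next window */
	if (cmpls || eq->delay_usec)	/* also run when idle but still coalescing */
		eq->delay_usec = demo_calc_delay(cmpls);
}
```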
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a timer routine uses workqueues, it could fire before the workqueue is
allocated.
Fix by allocating the workqueue before the timer routines are set up.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After seeing some interoperability issues with ADISC, it was determined the
ELS definitions in lpfc were using types that allowed the compiler to add
pad to the structure, causing the structure to no longer be per spec. The
offending structures are ADISC, FAN, and RNID.
This patch implements the simple fix of eliminating the pad by forcing the
compiler to pack the structure. Care was taken to ensure field accesses
won't be by operations that would hit a bad field alignment.
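The effect of packing can be illustrated with a hypothetical two-field
layout; it is not the ADISC, FAN, or RNID definition. The packed attribute is
a GCC/Clang extension, and the accessor copies the field out with memcpy so
that no misaligned load is issued.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout: 1-byte field followed by a 4-byte field. */
struct demo_els_unpacked {
	uint8_t  code;
	uint32_t did;		/* compiler typically inserts pad bytes before this */
};

struct demo_els_packed {
	uint8_t  code;
	uint32_t did;		/* no padding: sits at byte offset 1 */
} __attribute__((packed));	/* GCC/Clang extension */

/* Read the possibly misaligned field via memcpy instead of a direct load. */
static uint32_t demo_get_did(const struct demo_els_packed *p)
{
	uint32_t v;

	memcpy(&v, (const uint8_t *)p + offsetof(struct demo_els_packed, did),
	       sizeof(v));
	return v;
}
```

The packed structure occupies exactly 5 bytes, matching the sum of its field
sizes, whereas the unpacked one is usually larger.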
The better solution would be to convert to the uapi fc header definitions,
but the number of changes required to do is rather intrusive so this course
of action was deferred.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When connected to a high number of remote ports, the driver is encountering
PLOGI errors. The errors are due to adapter detected failures indicating
illegal field values.
Turns out the driver was prematurely clearing an RPI bitmask before waiting
for an UNREG_RPI mailbox completion. This allowed the RPI to be reused
before it was actually available.
Fix by clearing RPI bitmask only after UNREG_RPI mailbox completion.
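The ordering can be modelled with a small bitmask allocator. The demo_*
functions are hypothetical; the issue step stands in for sending UNREG_RPI
and the completion step for its mailbox completion.

```c
#include <stdint.h>

static uint32_t demo_rpi_bmask;	/* bit n set: rpi n is not available for reuse */

static int demo_rpi_get(void)
{
	for (int i = 0; i < 32; i++) {
		if (!(demo_rpi_bmask & (UINT32_C(1) << i))) {
			demo_rpi_bmask |= UINT32_C(1) << i;
			return i;
		}
	}
	return -1;
}

/* Issue the (simulated) UNREG_RPI; the bit stays set while it is in flight. */
static void demo_unreg_rpi_issue(int rpi)
{
	(void)rpi;	/* nothing cleared here on purpose */
}

/* Mailbox completion: only now may the rpi be handed out again. */
static void demo_unreg_rpi_cmpl(int rpi)
{
	demo_rpi_bmask &= ~(UINT32_C(1) << rpi);
}
```

While the unregister is outstanding, a new allocation gets a different rpi;
the original one becomes available again only after the completion runs.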
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>