Commit Graph

147 Commits

Author SHA1 Message Date
Helge Deller
2f4843b172 scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()
The mptscsih_remove() function triggers a kernel oops if the Scsi_Host
pointer (ioc->sh) is NULL, as can be seen in this syslog:

 ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
 Begin: Waiting for root file system ...
 scsi host2: error handler thread failed to spawn, error = -4
 mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
 Backtrace:
  [<000000001045b7cc>] mptspi_probe+0x248/0x3d0 [mptspi]
  [<0000000040946470>] pci_device_probe+0x1ac/0x2d8
  [<0000000040add668>] really_probe+0x1bc/0x988
  [<0000000040ade704>] driver_probe_device+0x160/0x218
  [<0000000040adee24>] device_driver_attach+0x160/0x188
  [<0000000040adef90>] __driver_attach+0x144/0x320
  [<0000000040ad7c78>] bus_for_each_dev+0xd4/0x158
  [<0000000040adc138>] driver_attach+0x4c/0x80
  [<0000000040adb3ec>] bus_add_driver+0x3e0/0x498
  [<0000000040ae0130>] driver_register+0xf4/0x298
  [<00000000409450c4>] __pci_register_driver+0x78/0xa8
  [<000000000007d248>] mptspi_init+0x18c/0x1c4 [mptspi]

This patch adds the necessary NULL-pointer checks.  Successfully tested on
a HP C8000 parisc workstation with buggy SCSI drives.

Link: https://lore.kernel.org/r/20201022090005.GA9000@ls3530.fritz.box
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-26 16:57:18 -04:00
Linus Torvalds
55e0500eb5 SCSI misc on 20201013
This series consists of the usual driver updates (ufs, qla2xxx, tcmu,
 ibmvfc, lpfc, smartpqi, hisi_sas, qedi, qedf, mpt3sas) and minor bug
 fixes.  There are only three core changes: adding sense codes,
 cleaning up noretry and adding an option for limitless retries.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX4YulyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishaZDAQCT7rwG
 UEZYHgYkU9EX9ERVBQM0SW4mLrxf3g3P5ioJsAEAtkclCM4QsIOP+MIPjIa0EyUY
 khu0kcrmeFR2YwA8zhw=
 =4w4S
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "The usual driver updates (ufs, qla2xxx, tcmu, ibmvfc, lpfc, smartpqi,
  hisi_sas, qedi, qedf, mpt3sas) and minor bug fixes.

  There are only three core changes: adding sense codes, cleaning up
  noretry and adding an option for limitless retries"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (226 commits)
  scsi: hisi_sas: Recover PHY state according to the status before reset
  scsi: hisi_sas: Filter out new PHY up events during suspend
  scsi: hisi_sas: Add device link between SCSI devices and hisi_hba
  scsi: hisi_sas: Add check for methods _PS0 and _PR0
  scsi: hisi_sas: Add controller runtime PM support for v3 hw
  scsi: hisi_sas: Switch to new framework to support suspend and resume
  scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling calling synchronize_irq()
  scsi: qedf: Remove redundant assignment to variable 'rc'
  scsi: lpfc: Remove unneeded variable 'status' in lpfc_fcp_cpu_map_store()
  scsi: snic: Convert to use DEFINE_SEQ_ATTRIBUTE macro
  scsi: qla4xxx: Delete unneeded variable 'status' in qla4xxx_process_ddb_changed
  scsi: sun_esp: Use module_platform_driver to simplify the code
  scsi: sun3x_esp: Use module_platform_driver to simplify the code
  scsi: sni_53c710: Use module_platform_driver to simplify the code
  scsi: qlogicpti: Use module_platform_driver to simplify the code
  scsi: mac_esp: Use module_platform_driver to simplify the code
  scsi: jazz_esp: Use module_platform_driver to simplify the code
  scsi: mvumi: Fix error return in mvumi_io_attach()
  scsi: lpfc: Drop nodelist reference on error in lpfc_gen_req()
  scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
  ...
2020-10-14 15:15:35 -07:00
Jason Yan
bef7afbf3b scsi: mptscsih: Remove set but not used 'timeleft'
This addresses the following gcc warning with "make W=1":

drivers/message/fusion/mptscsih.c: In function ‘mptscsih_IssueTaskMgmt’:
drivers/message/fusion/mptscsih.c:1519:17: warning: variable ‘timeleft’
set but not used [-Wunused-but-set-variable]
 1519 |  unsigned long  timeleft;
      |                 ^~~~~~~~

Link: https://lore.kernel.org/r/20200827125925.428357-1-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-01 22:16:46 -04:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Tomas Henzl
afe89f115e scsi: mptscsih: Fix read sense data size
The sense data buffer in sense_buf_pool is allocated with size of
MPT_SENSE_BUFFER_ALLOC(64) (multiplied by req_depth) while SNS_LEN(sc)(96)
is used when reading the data.  That may lead to a read from unallocated
area, sometimes from another (unallocated) page.  To fix this, limit the
read size to MPT_SENSE_BUFFER_ALLOC.

Link: https://lore.kernel.org/r/20200616150446.4840-1-thenzl@redhat.com
Co-developed-by: Stanislav Saner <ssaner@redhat.com>
Signed-off-by: Stanislav Saner <ssaner@redhat.com>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-06-24 00:23:17 -04:00
Gustavo A. R. Silva
c2b9975080 scsi: mptscsih: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

drivers/message/fusion/mptscsih.c: In function  mptscsih_io_done :
drivers/message/fusion/mptscsih.c:741:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    if ( ioc->bus_type == SAS ) {
       ^
drivers/message/fusion/mptscsih.c:790:3: note: here
   case MPI_IOCSTATUS_SCSI_TASK_TERMINATED: /* 0x0048 */
   ^~~~
drivers/message/fusion/mptscsih.c:884:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
    scsi_set_resid(sc, 0);
    ^~~~~~~~~~~~~~~~~~~~~
drivers/message/fusion/mptscsih.c:885:3: note: here
   case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR: /* 0x0040 */
   ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-03 23:45:59 -04:00
Colin Ian King
244830a0dc scsi: mptfusion: fix indentation issues
There are several statements and code blocks there are incorrectly
indented. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-19 17:17:08 -04:00
Hannes Reinecke
cfd2aff711 scsi: mpt: Move scsi_remove_host() out of mptscsih_remove_host()
Commit c5ce0abeb6 ("scsi: sas: move scsi_remove_host call...")  moved
the call to scsi_remove_host() into sas_remove_host(), but forgot to
modify the mpt drivers.

Fixes: c5ce0abeb6 ("scsi: sas: move scsi_remove_host call into sas_remove_host")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-24 18:21:17 -04:00
Christoph Hellwig
4e6f767dba scsi: mptscsih: Remove bogus interpretation of request->ioprio
Having an I/O priority does not mean we should send all requests as HEAD
OF QUEUE tags.

Reported-by: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-08 17:29:48 -05:00
Christoph Hellwig
db5ed4dfd5 scsi: drop reason argument from ->change_queue_depth
Drop the now unused reason argument from the ->change_queue_depth method.
Also add a return value to scsi_adjust_queue_depth, and rename it to
scsi_change_queue_depth now that it can be used as the default
->change_queue_depth implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24 14:45:27 +01:00
Christoph Hellwig
c8b09f6fb6 scsi: don't set tagging state from scsi_adjust_queue_depth
Remove the tagged argument from scsi_adjust_queue_depth, and just let it
handle the queue depth.  For most drivers those two are fairly separate,
given that most modern drivers don't care about the SCSI "tagged" status
of a command at all, and many old drivers allow queuing of multiple
untagged commands in the driver.

Instead we start out with the ->simple_tags flag set before calling
->slave_configure, which is how all drivers actually looking at
->simple_tags except for one worke anyway.  The one other case looks
broken, but I've kept the behavior as-is for now.

Except for that we only change ->simple_tags from the ->change_queue_type,
and when rejecting a tag message in a single driver, so keeping this
churn out of scsi_adjust_queue_depth is a clear win.

Now that the usage of scsi_adjust_queue_depth is more obvious we can
also remove all the trivial instances in ->slave_alloc or ->slave_configure
that just set it to the cmd_per_lun default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2014-11-12 11:19:43 +01:00
Christoph Hellwig
8f88dc4192 mptfusion: don't change queue type in ->change_queue_depth
This function shouldn't change the queue type, just the depth.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:19:42 +01:00
Christoph Hellwig
609aa22f3b scsi: remove ordered_tags scsi_device field
Remove the ordered_tags field, we haven't been issuing ordered tags based
on it since the big barrier rework in 2010.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2014-11-12 11:19:40 +01:00
Joe Lawrence
9f21316fc2 mptfusion: tweak null pointer checks
Fixes the following smatch warnings:

  drivers/message/fusion/mptbase.c:652 mptbase_reply() warn: variable
    dereferenced before check 'reply' (see line 639)

      [JL: No-brainer, the enclosing switch statement dereferences
       reply, so we can't get here unless reply is valid.]

  drivers/message/fusion/mptsas.c:1255 mptsas_taskmgmt_complete() error:
    we previously assumed 'pScsiTmReply' could be null (see line 1227)

      [HCH: Reading the code in mptsas_taskmgmt_complete it's pretty
       obvious that it can't do anything useful if mr/pScsiTmReply are
       NULL, so I suspect it would be best to just return at the
       beginning of the function.

       I'd love to understand if it actually could ever be zero, which I
       doubt.  Maybe the LSI people can shed some light on that?]

  drivers/message/fusion/mptsas.c:3888 mptsas_not_responding_devices()
    error: we previously assumed 'port_info->phy_info' could be null
    (see line 3875)

      [HCH: It's pretty obvious from reading mptsas_sas_io_unit_pg0 that
       we never register a port_info with a NULL phy_info in the lists,
       so all NULL checks on it could be deleted.]

  drivers/message/fusion/mptscsih.c:1284 mptscsih_info() error:
    we previously assumed 'h' could be null (see line 1274)

      [HCH: shost_priv can't return NULL, so the if (h) should be
       removed.]

  drivers/message/fusion/mptscsih.c:1388 mptscsih_qcmd() error: we
    previously assumed 'vdevice' could be null (see line 1373)

      [HCH: vdevice can't ever be NULL here, it's allocated in
       ->slave_alloc and thus guaranteed to be around when
       ->queuecommand is called.]

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-25 17:16:58 -04:00
Hannes Reinecke
9cb78c16f5 scsi: use 64-bit LUNs
The SCSI standard defines 64-bit values for LUNs, and large arrays
employing large or hierarchical LUN numbers become more and more
common.

So update the linux SCSI stack to use 64-bit LUN numbers.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:37 +02:00
Matthew Wilcox
a48ac9e5d4 fusion: Remove use of DEF_SCSI_QCMD
Removing the host_lock from the I/O submission path gives a huge
scalability improvement.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Praveen Krishnamoorthy <Praveen.krishnamoorthy@lsi.com>
Acked-by: Sreekanth Reddy <Sreekanth.reddy@lsi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-05-28 18:13:25 +02:00
Al Viro
cac197031c fusion: switch to ->show_info()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:18 -04:00
Alan Cox
3012d60b39 drivers/message/fusion/mptscsih.c: missing break
This happens to do the right thing in all cases on fibre channel but not on
other media types

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Cc: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 15:02:12 -08:00
nagalakshmi.nandigama@lsi.com
513382c9a2 [SCSI] mptfusion: Added check for SILI bit in READ_16 CDB for DATA UNDERRUN ERRATA
When READ_16 command is issued, the setting of SILI Bit in CDB is confirmed
and if SILI bit is off, the processing of relavent Errata is executed.

Added code for checking SILI bit for READ_16 as described in "SSC-4".

Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-09-22 15:15:08 +04:00
kashyap.desai@lsi.com
98cbe371fd [SCSI] mptfusion: Fix for device offline while doing aggressive HBA reset
[Resend patch as per Bernd Schubert comment ]

Issue:

Device goes offline while doing aggressive HBA reset
along with IO using some utility.

Root cause:

FW goes into bad state due to aggressive reset. Softreset does not
help to recover FW. And also aggressive reset open up the window for
Error handling thread to kicked off at the same time HBA will be in
constant RESET loop as part of aggressive reset test case can lead
Device to goes offline.

Changes:

1. Added extra check as below inside eh_timed_out call back as below.
   if(ioc->ioc_reset_in_progress) Rc = EH_TIMER_RESET

2. Removed " DOORBELL_ACTIVE" check for SAS controller from task
   management context.  Since SAS controller uses high priority queue
   for task management. This check is not required for SAS controller.

3. Moved SoftReset call to HardReset from Task Mgmt context.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:35:53 -06:00
kashyap.desai@lsi.com
e62cca19a9 [SCSI] mptfusion: Better handling of DEAD IOC PCI-E Link down error condition
Find Non-Operation IOC and remove it from OS: Detecting
dead(non-functional) ioc will be done reading doorbell register value
from fault reset thread, which has been called from work thread
context after each specific interval. If doorbell value is 0xFFFFFFFF,
it will be considered as IOC is non-operational and marked as dead
ioc.

Once Dead IOC has been detected, it will be removed at pci layer using
"pci_remove_bus_device" API.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:35:16 -06:00
Kashyap, Desai
e466e1c6e1 [SCSI] mptfusion : Added check for SILI bit in READ_6 CDB for DATA UNDERRUN ERRATA
When READ_6 command is issued, the setting of SILI Bit in CDB is confirmed and
if SILI bit is off, the processing of relavent Errata is executed.

Earlier we did not have check for SILI bit in READ_6 CDB.
As described in "ssc-r22", Now Driver is checking SILI bit for READ_6.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-07-27 14:20:02 +04:00
Kashyap, Desai
56cee8d577 [SCSI] mptfusion: Remove debug print from mptscsih_qcmd()
Remove debug print from mptscsih_qcmd function call.
This debug print cause flood of prints and difficult to debug other issues.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-05-01 11:55:27 -05:00
Christoph Hellwig
aeaeb5cec5 [SCSI] fusion: do not check serial_number in the abort handler
The SCSI midlayer stops all command processing when in error handling, which
means there is no chance for command reuse when the abort handler is called.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-05-01 10:23:54 -05:00
Kashyap, Desai
bcfe42e980 [SCSI] mptfusion: Fix Incorrect return value in mptscsih_dev_reset
There's a branch at the end of this function that
is supposed to normalize the return value with what
the mid-layer expects. In this one case, we get it wrong.

Also increase the verbosity of the INFO level printk
at the end of mptscsih_abort to include the actual return value
and the scmd->serial_number. The reason being success
or failure is actually determined by the state of
the internal tag list when a TMF is issued, and not the
return value of the TMF cmd. The serial_number is also
used in this decision, thus it's useful to know for debugging
purposes.

Cc: stable@kernel.org
Reported-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-12 12:51:08 -06:00
Kashyap, Desai
c9de7dc483 [SCSI] mptfusion: Block Error handling for deleting devices or Device in DMD
Issue description:
In multipath topology, when device deletion is in transient state,
multipath driver can call blk_flush_queue() as part of path failure.
Before device get deleted from OS, Device may go OFFLINE as part of error
handling kicked off triggered from multipathing driver. Above condition hits
more frequently if device missing delay timer (which is LSI specific firmware
parameter) is non zero value.

root cause of this issue is Error handling thread is getting kicked off for
device which is not really present(in transient state of deleting).

This patch has solution for this issue. driver is now using eh_timed_out
callback. See below.

mptsas_transport_template->eh_timed_out = mptsas_eh_timed_out

Using mptsas_eh_timed_out function, driver can decide weather vdevice is
under Device missing delay or deleting state.

for either of those cases, there is BLK_EH_RESET_TIMER return to scsi mid
and error handling thread will not be kicked off for that particular scsi
command.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-07-28 09:07:46 -05:00
Kei Tokunaga
97009a29e8 [SCSI] mptfusion: print Doorbell register in a case of hard reset and timeout
Printing Doorbell register in a case of hard reset and timeout
should be useful for figuring out the state of the system.

Signed-off-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-07-27 12:03:52 -05:00
Kashyap, Desai
b68bf096d4 [SCSI] mptfusion: schedule_target_reset from all Reset context
Issue:
target reset will be queued to driver's internal queue to get schedule
later. When driver add target into internal target_reset queue we will block IOs
on those target using scsi midlayer API. Now due to some cause driver is not
executing those target_reset list and it is always in block state.

Changes:
now we are clearing target_reset queue from all other Callback context
instead of only DeviceReset context.Now wherever driver is clearing
taskmgmt_in_progress flag it is considering target_reset queue cleanup
also.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-07-27 12:02:34 -05:00
Kashyap, Desai
4d0695664e [SCSI] mptfusion: Use DID_TRANSPORT_DISRUPTED instead of DID_BUS_BUSY
Changed the return value for Nexus Loss IOs to be DID_TRANSPORT_DISRUPTED.
What this will allow is the multi-path driver to delay the fail over
process. They would like the path to keep up as long as the nexus loss
Loginfo is return from firmware. With DID_BUS_BUSY the path fails over
immediately.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-07-27 12:02:28 -05:00
Ryan Kuester
2a1b7e575b [SCSI] mptsas: fix hangs caused by ATA pass-through
I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
pass-through commands, in particular by smartctl.

First, my version of the symptoms.  On an LSI SAS1068E B3 HBA running
01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
occasional task, bus, and host resets, some of which lead to hard faults of
the HBA requiring a reboot.  Abusively looping the smartctl command,

    # while true; do smartctl -a /dev/sdb > /dev/null; done

dramatically increases the frequency of these failures to nearly one per
minute.  A high IO load through the HBA while looping smartctl seems to
improve the chance of a full scsi host reset or a non-recoverable hang.

I reduced what smartctl was doing down to a simple test case which
causes the hang with a single IO when pointed at the sd interface.  See
the code at the bottom of this e-mail.  It uses an SG_IO ioctl to issue
a single pass-through ATA identify device command.  If the buffer
userspace gives for the read data has certain alignments, the task is
issued to the HBA but the HBA fails to respond.  If run against the sg
interface, neither the test code nor smartctl causes a hang.

sd and sg handle the SG_IO ioctl slightly differently.  Unless you
specifically set a flag to do direct IO, sg passes a buffer of its own,
which is page-aligned, to the block layer and later copies the result
into the userspace buffer regardless of its alignment.  sd, on the other
hand, always does direct IO unless the userspace buffer fails an
alignment test at block/blk-map.c line 57, in which case a page-aligned
buffer is created and used for the transfer.

The alignment test currently checks for word-alignment, the default
setup by scsi_lib.c; therefore, userspace buffers of almost any
alignment are given directly to the HBA as DMA targets.  The LSI 1068
hardware doesn't seem to like at least a couple of the alignments which
cross a page boundary (see the test code below).  Curiously, many
page-boundary-crossing alignments do work just fine.

So, either the hardware has an bug handling certain alignments or the
hardware has a stricter alignment requirement than the driver is
advertising.  If stricter alignment is required, then in no case should
misaligned buffers from userspace be allowed through without being
bounced or at least causing an error to be returned.

It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
a stricter alignment requirement.  If it does, sd does the right thing and
bounces misaligned buffers (see block/blk-map.c line 57).  The following
patch to 2.6.34-rc5 makes my symptoms go away.  I'm sure this is the wrong
place for this code, but it gets my idea across.

Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-07-27 12:01:01 -05:00
Ben Hutchings
9b8f77a184 fusion: fix kernel-doc notation
The function name must be followed by a space, hypen, space, and a
short description.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-24 07:31:20 -07:00
Kashyap, Desai
69b2e9b443 [SCSI] mptfusion: Task abort is not supported for Volumes
1) corrected return value as SUCCESS instead of 0.
2) Added check in mptscsih_abort.
mptfusion do not support task abort for Volumes.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-04-11 09:24:11 -05:00
Kashyap, Desai
08f5c5c23d [SCSI] mptfusion: sanity check for vdevice pointer is added
Added sanity checks before accessing vdevice and added vdevice->deleted
setting for mptfc.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-04-11 09:24:10 -05:00
Kashyap, Desai
48959f1eae [SCSI] mptfusion: mpt_detach is called properly at the time of rmmod
Current design of mptsas is as follow.
MPTSAS will do probe() if pci id matches for available card in
system, irrespective of mode of controller. If controller is I/T mode
or I mode, things are fine. If controller is only in T mode, mptsas is
not doing complete process of mptsas_probe(). It will only make
sure IOC structure is created and IOC reference is available for
mptstm driver. Now While removing module we should take care
case of Target mode only mptsas. If we are removing IOC which is
only in Target mode, We should only detach IOC instead of
following rest of the cleanup process which is only required for T
mode controller. Now For T mode controller, only part clean up is
done instead of complete cleanup. mpt_detach will call early in case
of Target mode only controller.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-04-11 09:24:07 -05:00
Kashyap, Desai
d0f698c461 [SCSI] mptfusion: Added new less expensive RESET (Message Unit Reset)
Message Unit Reset - instructs the IOC to reset the Reply Post and
Free FIFO's. All the Message Frames on Reply Free FIFO are
discarded. All posted buffers are freed, and event notification is
turned off.  IOC doesnt reply to any outstanding request. This will
transfer IOC to READY state.  Message unit ready is less expensive
operations than Hard Reset.  soft reset will not force Firmware to
reload again, it only do clean up of Message units.

mpt_Soft_Hard_ResetHandler will first try for Soft Reset,if
it fails then go for big hammer reset which is Hard Reset.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-04-11 09:24:04 -05:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Linus Torvalds
654451748b Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (158 commits)
  [SCSI] Fix printing of failed 32-byte commands
  [SCSI] Fix printing of variable length commands
  [SCSI] libsrp: fix bug in ADDITIONAL CDB LENGTH interpretation
  [SCSI] scsi_dh_alua: Add IBM Power Virtual SCSI ALUA device to dev list
  [SCSI] scsi_dh_alua: add netapp to dev list
  [SCSI] qla2xxx: Update version number to 8.03.02-k1.
  [SCSI] qla2xxx: EEH: Restore PCI saved state during pci slot reset.
  [SCSI] qla2xxx: Add firmware ETS burst support.
  [SCSI] qla2xxx: Correct loop-resync issues during SNS scans.
  [SCSI] qla2xxx: Correct use-after-free issue in terminate_rport_io callback.
  [SCSI] qla2xxx: Correct EH bus-reset handling.
  [SCSI] qla2xxx: Proper clean-up of BSG requests when request times out.
  [SCSI] qla2xxx: Initialize payload receive length in failure path of vendor commands
  [SCSI] fix duplicate removal on error path in scsi_sysfs_add_sdev
  [SCSI] fix refcounting bug in scsi_get_host_dev
  [SCSI] fix memory leak in scsi_report_lun_scan
  [SCSI] lpfc: correct PPC build failure
  [SCSI] raid_class: add raid1e
  [SCSI] mpt2sas: Do not call sas_is_tlr_enabled for RAID volumes.
  [SCSI] zfcp: Introduce header file for qdio structs and inline functions
  ...
2010-02-26 16:55:27 -08:00
Kashyap, Desai
9858ae3801 [SCSI] mptfusion : mptscsih_abort return value should be SUCCESS instead of value 0.
retval should be SUCCESS/FAILED which is defined at scsi.h
retval = 0 is directing wrong return value. It must be retval = SUCCESS.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-02-08 13:40:17 -06:00
Kashyap, Desai
65f89c2396 [SCSI] mptfusion: Added MPI_SCSIIO_CONTROL_HEADOFQ priority
There is a 'ioprio' field in the BIO and the Request structure.
check this priority field and set MPI_SCSIIO_CONTROL_HEADOFQ
to pass down I/O priority.
An enhancement to the LSI Disk Array Controller firmware is being
developed to look at the Head Of Queue bit to allow I/Os with the HOQ bit
set to be processed before I/Os which do not have the HOQ bit set.
In order to set the HOQ bit, the mpt fusion driver  needs to look at the
'ioprio' field in the request structure associated with the scsi command.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-01-18 10:48:10 -06:00
Mike Christie
e881a172da [SCSI] modify change_queue_depth to take in reason why it is being called
This patch modifies scsi_host_template->change_queue_depth so that
it takes an argument indicating why it is being called. This will be
used so that if a LLD needs to do some extra processing when
handling queue fulls or later ramp ups, it can do so.

This is a simple port of the drivers setting a change_queue_depth
callback. In the patch I just have these LLDs adjust the queue depth
if the user was requesting it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>

[Vasu.Dev: v2
	Also converted pmcraid_change_queue_depth and then verified
all modules compile  using "make allmodconfig" for any new build
warnings on X86_64.

	Updated original description after combing two original
patches from Mike to make this patch git bisectable.]
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
[jejb: fixed up 53c700]
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-04 12:00:41 -06:00
Kashyap, Desai
9b53b39243 [SCSI] mptspi: Fix for incorrect data underrun errata
Errata:
Certain conditions on the scsi bus may casue the 53C1030 to incorrectly signal
a SCSI_DATA_UNDERRUN to the host.

Workaround 1:
For an Errata on LSI53C1030 When the length of request data
and transfer data are different with result of command (READ or VERIFY),
DID_SOFT_ERROR is set.

Workaround 2:
For potential trouble on LSI53C1030. It is checked whether the length of
request data is equal to the length of transfer and residual.
MEDIUM_ERROR is set by incorrect data.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-29 13:03:24 -04:00
Kashyap, Desai
fea984034b [SCSI] mptsas : Send DID_NO_CONNECT for pending IOs of removed device
Driver is modified to return DID_NO_CONNECT for all pending I/O
requests for bus type SAS, if it founds the target is removed at
the firmware level.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-12 09:35:28 -05:00
Kashyap, Desai
79a3ec1ace [SCSI] mptsas : set max_id to infinite value.
Do not set max_id value received from FW. Once SAS transport layer is
introduced max_id value is missleading to SCSI mid layer. Use max_id to
infinite value.

logic of can queue of scsi host is changed.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:26 -05:00
Kashyap, Desai
d23321b488 [SCSI] mptsas : Handle INSUFFICIENT resources status as similar to IOC BUSY status
Handle insufficient resources status as similar to busy status.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:25 -05:00
Kashyap, Desai
a247fa4521 [SCSI] mptsas : Removed mptscsih_timer_expired.
Removed mptscsih_timer_expired. This timer is no more use.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:24 -05:00
Randy Dunlap
9cf46a35d2 fusion: fix recent kernel-doc problems
Fix recent fusion driver kernel-doc fatal error and warnings.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Eric.Moore@lsi.com
Cc: support@lsi.com
Cc: DL-MPTFusionLinux@lsi.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-14 13:52:01 -07:00
James Bottomley
fc847ab431 [SCSI] mpt fusion: fix up doc book comments
Several of the doc book in the previous patches had incorrect multi-line short
function descriptors.  Fixed it all to be the correct single line descriptor.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 18:05:10 -05:00
Kashyap, Desai
db7051b298 [SCSI] mpt fusion: Added support for Broadcast primitives Event handling
Firmware is able to handle Broadcast primitives, but upstream driver does not
have support for broadcast primitive handling. Now this patch is mainly to
support broadcast primitives.

Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 17:45:10 -05:00
Kashyap, Desai
a7938b0bb3 [SCSI] mpt fusion: RAID device handling and Dual port Raid support is added
1. Handle integrated Raid device(Add/Delete) and error condition and check
   related to Raid device. is_logical_volume will represent logical volume
   device.
2. Raid device dual port support is added. Main functions to support this
   feature are mpt_raid_phys_disk_get_num_paths and mpt_raid_phys_disk_pg1.

Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 17:44:11 -05:00
Kashyap, Desai
2f187862e5 [SCSI] mpt fusion: Code Cleanup patch
Resending patch considering Grants G's code review.

Main goal to submit this patch is code cleaup.
1. Better driver debug prints and code indentation.
2. fault_reset_work_lock is not used anywhere. driver is using taskmgmt_lock
instead of fault_reset_work_lock.
3. setting pci_set_drvdata properly.
4. Ingore config request when IOC is in reset state.( ioc_reset_in_progress
is set).
5. Init/clear managment frame proprely.(INITIALIZE_MGMT_STATUS and
CLEAR_MGMT_STATUS)

Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 17:43:06 -05:00