Commit Graph

390 Commits

Author SHA1 Message Date
Brian King
3a9794d329 sd: Fix max transfer length for 4k disks
The following patch fixes an issue observed with 4k sector disks
where the max_hw_sectors attribute was getting set too large in
sd_revalidate_disk. Since sdkp->max_xfer_blocks is in units
of SCSI logical blocks and queue_max_hw_sectors is in units of
512 byte blocks, on a 4k sector disk, every time we went through
sd_revalidate_disk, we were taking the current value of
queue_max_hw_sectors and increasing it by a factor of 8. Fix
this by only shifting sdkp->max_xfer_blocks.

Cc: stable@vger.kernel.org
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 13:46:29 +01:00
Hannes Reinecke
2104551969 scsi: use per-cpu buffer for formatting sense
Convert sense buffer logging to use the per-cpu buffer to avoid line
breakup.

Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-01-09 15:44:30 +01:00
Martin K. Petersen
e461338b6c sd: tweak discard heuristics to work around QEMU SCSI issue
7985090aa0 changed the discard heuristics to give preference to the
WRITE SAME commands that (unlike UNMAP) guarantee deterministic results.

Ming Lei discovered that QEMU SCSI's WRITE SAME implementation
internally relied on limits that were only communicated for the UNMAP
case. And therefore discard commands backed by WRITE SAME would fail.

Tweak the heuristics so we still pick UNMAP in the LBPRZ=0 case and only
prefer the WRITE SAME variants if the device has the LBPRZ flag set.

Reported-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-12-30 13:30:38 +01:00
Hannes Reinecke
eb846d9f14 scsi: rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16
SPC-3 defines SERVICE ACTION IN(12) and SERVICE ACTION IN(16).
So rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16 to be
consistent with SPC and to allow for better distinction.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-24 20:01:40 +01:00
Christoph Hellwig
3af6b35261 scsi: remove scsi_driver owner field
The driver core driver structure has grown an owner field and now
requires it to be set for all modular drivers.  Set it up for
all scsi_driver instances and get rid of the now superflous
scsi_driver owner field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Shane M Seymour <shane.seymour@hp.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24 20:01:28 +01:00
Christoph Hellwig
3c356bde19 scsi: stop passing a gfp_mask argument down the command setup path
There is no reason for ULDs to pass in a flag on how to allocate the S/G
lists.  While we don't need GFP_ATOMIC for the blk-mq case because we
don't hold locks, that decision can be made way down the chain without
having to pass a pointless gfp_mask argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24 19:56:55 +01:00
Martin K. Petersen
7985090aa0 sd: disable discard_zeroes_data for UNMAP
The T10 SBC UNMAP command does not provide any hard guarantees that
blocks will return zeroes on a subsequent READ. This is due to the fact
that the device server is free to silently ignore all or parts of the
request.

The only way to ensure that a block consistently returns zeroes after
being unmapped is to use WRITE SAME with the UNMAP bit set. Should the
device be unable to unmap one or more blocks described by the command it
is required to manually write zeroes to them.

Until now we have preferred UNMAP over the WRITE SAME variants to
accommodate thinly provisioned devices that predated the final SBC-3
spec. This patch changes the heuristic so that we favor WRITE SAME(16)
or (10) over UNMAP if these commands are marked as supported in the
Logical Block Provisioning VPD page.

The patch also disables discard_zeroes_data for devices operating in
UNMAP mode.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:19:14 +01:00
Christoph Hellwig
21a9d4c9d6 sd: fix up ->compat_ioctl
No need to verify the passthrough ioctls, the real handler will
take care of that.  Also make sure not to block for resets on
O_NONBLOCK fds.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:16:11 +01:00
Christoph Hellwig
906d15fbd2 scsi: split scsi_nonblockable_ioctl
The calling conventions for this function are bad as it could return
-ENODEV both for a device not currently online and a not recognized ioctl.

Add a new scsi_ioctl_block_when_processing_errors function that wraps
scsi_block_when_processing_errors with the a special case for the
SG_SCSI_RESET ioctl command, and handle the SG_SCSI_RESET case itself
in scsi_ioctl.  All callers of scsi_ioctl now must call the above helper
to check for the EH state, so that the ioctl handler itself doesn't
have to.

Reported-by: Robert Elliott <Elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:16:11 +01:00
Hannes Reinecke
ef61329db7 scsi: remove scsi_show_result()
Open-code scsi_print_result in sd.c, and cleanup logging to
not print duplicate informations.
Also remove the call to scsi_show_result() in ufshcd.c
to be consistent with other callers of scsi_execute().

With that we can remove scsi_show_result in constants.c

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:16:05 +01:00
Hannes Reinecke
d811b848eb scsi: use sdev as argument for sense code printing
We should be using the standard dev_printk() variants for
sense code printing.

[hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen]
[hch: folded bracing fix from Dan Carpenter]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:15:58 +01:00
Hannes Reinecke
ad3819c09b sd: remove scsi_print_sense() in sd_done()
sd_done() was calling scsi_print_sense() for a sense code
of 'NO_SENSE'.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:15:56 +01:00
Linus Torvalds
e75437fb93 Merge branch 'for-3.18/drivers' of git://git.kernel.dk/linux-block
Pull block layer driver update from Jens Axboe:
 "This is the block driver pull request for 3.18.  Not a lot in there
  this round, and nothing earth shattering.

   - A round of drbd fixes from the linbit team, and an improvement in
     asender performance.

   - Removal of deprecated (and unused) IRQF_DISABLED flag in rsxx and
     hd from Michael Opdenacker.

   - Disable entropy collection from flash devices by default, from Mike
     Snitzer.

   - A small collection of xen blkfront/back fixes from Roger Pau Monné
     and Vitaly Kuznetsov"

* 'for-3.18/drivers' of git://git.kernel.dk/linux-block:
  block: disable entropy contributions for nonrot devices
  xen, blkfront: factor out flush-related checks from do_blkif_request()
  xen-blkback: fix leak on grant map error path
  xen/blkback: unmap all persistent grants when frontend gets disconnected
  rsxx: Remove deprecated IRQF_DISABLED
  block: hd: remove deprecated IRQF_DISABLED
  drbd: use RB_DECLARE_CALLBACKS() to define augment callbacks
  drbd: compute the end before rb_insert_augmented()
  drbd: Add missing newline in resync progress display in /proc/drbd
  drbd: reduce lock contention in drbd_worker
  drbd: Improve asender performance
  drbd: Get rid of the WORK_PENDING macro
  drbd: Get rid of the __no_warn and __cond_lock macros
  drbd: Avoid inconsistent locking warning
  drbd: Remove superfluous newline from "resync_extents" debugfs entry.
  drbd: Use consistent names for all the bi_end_io callbacks
  drbd: Use better variable names
2014-10-18 12:12:45 -07:00
Linus Torvalds
d3dc366bba Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block
Pull core block layer changes from Jens Axboe:
 "This is the core block IO pull request for 3.18.  Apart from the new
  and improved flush machinery for blk-mq, this is all mostly bug fixes
  and cleanups.

   - blk-mq timeout updates and fixes from Christoph.

   - Removal of REQ_END, also from Christoph.  We pass it through the
     ->queue_rq() hook for blk-mq instead, freeing up one of the request
     bits.  The space was overly tight on 32-bit, so Martin also killed
     REQ_KERNEL since it's no longer used.

   - blk integrity updates and fixes from Martin and Gu Zheng.

   - Update to the flush machinery for blk-mq from Ming Lei.  Now we
     have a per hardware context flush request, which both cleans up the
     code should scale better for flush intensive workloads on blk-mq.

   - Improve the error printing, from Rob Elliott.

   - Backing device improvements and cleanups from Tejun.

   - Fixup of a misplaced rq_complete() tracepoint from Hannes.

   - Make blk_get_request() return error pointers, fixing up issues
     where we NULL deref when a device goes bad or missing.  From Joe
     Lawrence.

   - Prep work for drastically reducing the memory consumption of dm
     devices from Junichi Nomura.  This allows creating clone bio sets
     without preallocating a lot of memory.

   - Fix a blk-mq hang on certain combinations of queue depths and
     hardware queues from me.

   - Limit memory consumption for blk-mq devices for crash dump
     scenarios and drivers that use crazy high depths (certain SCSI
     shared tag setups).  We now just use a single queue and limited
     depth for that"

* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
  block: Remove REQ_KERNEL
  blk-mq: allocate cpumask on the home node
  bio-integrity: remove the needless fail handle of bip_slab creating
  block: include func name in __get_request prints
  block: make blk_update_request print prefix match ratelimited prefix
  blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
  block: fix alignment_offset math that assumes io_min is a power-of-2
  blk-mq: Make bt_clear_tag() easier to read
  blk-mq: fix potential hang if rolling wakeup depth is too high
  block: add bioset_create_nobvec()
  block: use bio_clone_fast() in blk_rq_prep_clone()
  block: misplaced rq_complete tracepoint
  sd: Honor block layer integrity handling flags
  block: Replace strnicmp with strncasecmp
  block: Add T10 Protection Information functions
  block: Don't merge requests if integrity flags differ
  block: Integrity checksum flag
  block: Relocate bio integrity flags
  block: Add a disk flag to block integrity profile
  block: Add prefix to block integrity profile flags
  ...
2014-10-18 11:53:51 -07:00
Mike Snitzer
b277da0a8a block: disable entropy contributions for nonrot devices
Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
QUEUE_FLAG_NONROT.

Historically, all block devices have automatically made entropy
contributions.  But as previously stated in commit e2e1a148 ("block: add
sysfs knob for turning off disk entropy contributions"):
    - On SSD disks, the completion times aren't as random as they
      are for rotational drives. So it's questionable whether they
      should contribute to the random pool in the first place.
    - Calling add_disk_randomness() has a lot of overhead.

There are more reliable sources for randomness than non-rotational block
devices.  From a security perspective it is better to err on the side of
caution than to allow entropy contributions from unreliable "random"
sources.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-10-04 10:55:32 -06:00
Martin K. Petersen
c611529e7c sd: Honor block layer integrity handling flags
A set of flags introduced in the block layer enable better control over
how protection information is handled. These flags are useful for both
error injection and data recovery purposes. Checking can be enabled and
disabled for controller and disk, and the guard tag format is now a
per-I/O property.

Update sd_protect_op to communicate the relevant information to the
low-level device driver via a set of flags in scsi_cmnd.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-30 15:17:35 -06:00
Subhash Jadavani
6fe8c1dbef scsi: balance out autopm get/put calls in scsi_sysfs_add_sdev()
SCSI Well-known logical units generally don't have any scsi driver
associated with it which means no one will call scsi_autopm_put_device()
on these wlun scsi devices and this would result in keeping the
corresponding scsi device always active (hence LLD can't be suspended as
well). Same exact problem can be seen for other scsi device representing
normal logical unit whose driver is yet to be loaded. This patch fixes
the above problem with this approach:

- make the scsi_autopm_put_device call at the end of scsi_sysfs_add_sdev
  to make it balance out the get earlier in the function.
- let drivers do paired get/put calls in their probe methods.

Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-15 16:02:05 -07:00
Sujit Reddy Thumma
2eefd57b97 sd: Avoid sending medium write commands if device is write protected
The SYNCHRONIZE_CACHE command is a medium write command and hence can
fail when the device is write protected. Avoid sending such commands by
making sure that write-cache-enable is disabled even though the device
claim to support it.

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-15 16:01:57 -07:00
K. Y. Srinivasan
26b9fd8b34 sd: fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout
Commit ID: 7e660100d8 added code to derive the
FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use the
basic I/O timeout of the device. Fix this bug.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-25 17:16:42 -04:00
Martin K. Petersen
c1d40a527e scsi: add a blacklist flag which enables VPD page inquiries
Despite supporting modern SCSI features some storage devices continue to
claim conformance to an older version of the SPC spec. This is done for
compatibility with legacy operating systems.

Linux by default will not attempt to read VPD pages on devices that
claim SPC-2 or older. Introduce a blacklist flag that can be used to
trigger VPD page inquiries on devices that are known to support them.

Reported-by: KY Srinivasan <kys@microsoft.com>
Tested-by: KY Srinivasan <kys@microsoft.com>
Reviewed-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-25 17:16:41 -04:00
Christoph Hellwig
fd2eb9034e scsi: move the writeable field from struct scsi_device to struct scsi_cd
We currently set the field in common code based on the device type,
but then only use it in the cdrom driver which also overrides the
value previously set in the generic code.

Just leave this entirely to the CDROM driver to make everyones life
simpler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2014-07-25 17:16:41 -04:00
Christoph Hellwig
87949eee7e sd: split sd_init_command
Factor out a function to initialize regular read/write commands and leave
sd_init_command as a simple dispatcher to the different prepare routines.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:29 +02:00
Christoph Hellwig
e4200f8ee3 sd: retry discard commands
Currently cmd->allowed is initialized from rq->retries for discard
commands, but retries is always 0 for non-BLOCK_PC requests.  Set it
to the standard number of retries instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:28 +02:00
Christoph Hellwig
a25ee54851 sd: retry write same commands
Currently cmd->allowed is initialized from rq->retries for write same
commands, but retries is always 0 for non-BLOCK_PC requests.  Set it
to the standard number of retries instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:27 +02:00
Christoph Hellwig
6a7b43985d sd: don't use scsi_setup_blk_pc_cmnd for discard requests
Simplify handling of discard requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:26 +02:00
Christoph Hellwig
59b1134c5a sd: don't use scsi_setup_blk_pc_cmnd for write same requests
Simplify handling of write same requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
2014-07-17 22:12:47 +02:00
Christoph Hellwig
a118c6c1d9 sd: don't use scsi_setup_blk_pc_cmnd for flush requests
Simplify handling of flush requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Also rename scsi_setup_flush_cmnd to sd_setup_flush_cmnd for consistency.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:12:33 +02:00
Christoph Hellwig
5158a899d8 scsi: set sc_data_direction in common code
The data direction fiel in the SCSI command is derived only from the block
request structure.  Move setting it up into common code instead of
duplicating it in the ULDs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:11:41 +02:00
Christoph Hellwig
3868cf8ea7 scsi: restructure command initialization for TYPE_FS requests
We should call the device handler prep_fn for all TYPE_FS requests,
not just simple read/write calls that are handled by the disk driver.

Restructure the common I/O code to call the prep_fn handler and zero
out the CDB, and just leave the call to scsi_init_io to the ULDs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:11:27 +02:00
Martin K. Petersen
bcdb247c6b sd: Limit transfer length
Until now the per-command transfer length has exclusively been gated by
the max_sectors parameter in the scsi_host template. Given that the size
of this parameter has been bumped to an unsigned int we have to be
careful not to exceed the target device's capabilities.

If the if the device specifies a Maximum Transfer Length in the Block
Limits VPD we'll use that value. Otherwise we'll use 0xffffffff for
devices that have use_16_for_rw set and 0xffff for the rest. We then
combine the chosen disk limit with max_sectors in the host template. The
smaller of the two will be used to set the max_hw_sectors queue limit.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:33 +02:00
Clément Calmels
8d964478b2 sd: bad return code of init_sd
In init_sd function, if kmem_cache_create or mempool_create_slab_pools
calls fail, the error will not be correclty reported because
class_register previously set the value of err to 0.

Signed-off-by: Clément Calmels <clement.calmels@free.fr>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:32 +02:00
Vaughan Cao
cb2fb68d06 sd: notify block layer when using temporary change to cache_type
This is a fix for commit 39c60a0948

  "sd: fix array cache flushing bug causing performance problems"

We must notify the block layer via q->flush_flags after a temporary change
of the cache_type to write through.  Without this, a SYNCHRONIZE CACHE
command will still be generated.  This patch factors out a helper that
can be called from sd_revalidate_disk and cache_type_store.

Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:31 +02:00
Akinobu Mita
e430cbc8bb sd: use READ_16 or WRITE_16 when transfer length is greater than 0xffff
This change makes the scsi disk driver handle the requests whose
transfer length is greater than 0xffff with READ_16 or WRITE_16.

However, this is a preparation for extending the data type of
max_sectors in struct Scsi_Host and scsi_host_template.  So, it is
impossible to happen this condition for now, because SCSI low-level
drivers can not specify max_sectors greater than 0xffff due to the
data type limitation.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:30 +02:00
Alan Stern
b14bf2d0c0 usb-storage/SCSI: Add broken_fua blacklist flag
Some buggy JMicron USB-ATA bridges don't know how to translate the FUA
bit in READs or WRITEs.  This patch adds an entry in unusual_devs.h
and a blacklist flag to tell the sd driver not to use FUA.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Michael Büsch <m@bues.ch>
Tested-by: Michael Büsch <m@bues.ch>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 22:47:18 -07:00
Linus Torvalds
1c54fc1efe SCSI for-linus on 20140609
This patch consists of the usual driver updates (qla2xxx, qla4xxx, lpfc,
 be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to maintained status
 of a long neglected driver for older hardware.  In addition there are a lot of
 minor fixes and cleanups and some more updates to make scsi mq ready.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJTlcwiAAoJEDeqqVYsXL0M5qsIALzVPLd4yxA16zCiaPQUeIV5
 mfYmwISFlN+qW3AcUeSH4D13YgegCjEBfqaDMWvIkgouxLy/7jpxtChutq3MCzUE
 cDT1B9+ZrzoqBISRNHEh/gx5F1MOF2VPuqG2pe0J90wyRCNzJscB6PbtWMAo86CA
 2eu7wq3K9FXxCC1qY0PzwBLXHqUcgk5GWiK9CM/k4W0NiTVeNmwPeh5i91IQnBHx
 E2l7NAXgNLyCf5tyeswvZ4pW0T3hlaswNmBB4qC8oJm4U6UqMN+tk4ML63Pz7uPe
 4mlHG0uI8Vbdi13iv1EDUZ9Vo8iqVrzP2UAhakgP9poKSGqE4d/MD0EKNGQB2Es=
 =cBY8
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This patch consists of the usual driver updates (qla2xxx, qla4xxx,
  lpfc, be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to
  maintained status of a long neglected driver for older hardware.  In
  addition there are a lot of minor fixes and cleanups and some more
  updates to make scsi mq ready"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (130 commits)
  include/scsi/osd_protocol.h: remove unnecessary __constant
  mvsas: Recognise device/subsystem 9485/9485 as 88SE9485
  Revert "be2iscsi: Fix processing cqe for cxn whose endpoint is freed"
  mptfusion: fix msgContext in mptctl_hp_hostinfo
  acornscsi: remove linked command support
  scsi/NCR5380: dprintk macro
  fusion: Remove use of DEF_SCSI_QCMD
  fusion: Add free msg frames to the head, not tail of list
  mpt2sas: Add free smids to the head, not tail of list
  mpt2sas: Remove use of DEF_SCSI_QCMD
  mpt2sas: Remove uses of serial_number
  mpt3sas: Remove use of DEF_SCSI_QCMD
  mpt3sas: Remove uses of serial_number
  qla2xxx: Use kmemdup instead of kmalloc + memcpy
  qla4xxx: Use kmemdup instead of kmalloc + memcpy
  qla2xxx: fix incorrect debug printk
  be2iscsi: Bump the driver version
  be2iscsi: Fix processing cqe for cxn whose endpoint is freed
  be2iscsi: Fix destroy MCC-CQ before MCC-EQ is destroyed
  be2iscsi: Fix memory corruption in MBX path
  ...
2014-06-09 18:54:06 -07:00
David Jeffery
2a863ba8f6 sd: medium access timeout counter fails to reset
There is an error with the medium access timeout feature of the sd driver. The
sdkp->medium_access_timed_out value is reset to zero in sd_done() in the wrong
place.  Currently it is reset to zero only when a command returns sense data.
This can result in cases where the medium access check falsely triggers from
timed out commands which are hours or days apart.

For example, an I/O command times out and is aborted.  It then retries and
succeeds.  But with no sense data generated and returned, the
medium_access_timed_out value is not reset.  If no sd command returns sense
data, then the next command to time out (however far in time from the first
failure) will trigger the medium access timeout and put the device offline.

The resetting of sdkp->medium_access_timed_out should occur before the check
for sense data.

To reproduce using scsi_debug, use SCSI_DEBUG_OPT_TIMEOUT or
SCSI_DEBUG_OPT_MAC_TIMEOUT to force an I/O command to timeout.  Then, remove
the opt value so the I/O will succeed on retry.  Perform more I/O as desired.
Finally, repeat the process to make a new I/O command time out.  Without the
patch, the device will be marked offline even though many I/O commands have
succeeded between the 2 instances of timed out commands.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by:  Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-05-19 12:54:21 +02:00
Christoph Hellwig
a1b73fc194 scsi: reintroduce scsi_driver.init_command
Instead of letting the ULD play games with the prep_fn move back to
the model of a central prep_fn with a callback to the ULD.  This
already cleans up and shortens the code by itself, and will be required
to properly support blk-mq in the SCSI midlayer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-05-19 12:35:09 +02:00
Jens Axboe
dc4a93078b sd/skd: stuff discard page in request->completion_data
Store the pointer to the page there, so we can always safely
reference it from end_io context where ->bio may have been
cleared.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-16 21:37:30 -06:00
Jens Axboe
b4f42e2831 block: remove struct request buffer member
This was used in the olden days, back when onions were proper
yellow. Basically it mapped to the current buffer to be
transferred. With highmem being added more than a decade ago,
most drivers map pages out of a bio, and rq->buffer isn't
pointing at anything valid.

Convert old style drivers to just use bio_data().

For the discard payload use case, just reference the page
in the bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-15 14:03:02 -06:00
Linus Torvalds
b7e70ca9c7 Merge branch 'async-scsi-resume' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci
Pull async SCSI resume support from Dan Williams:
 "Allow disks and other devices to resume in parallel.

  This provides a tangible speed up for a non-esoteric use case (laptop
  resume):

    https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach"

* 'async-scsi-resume' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci:
  scsi: async sd resume
2014-04-11 17:23:52 -07:00
Dan Williams
3c31b52f96 scsi: async sd resume
async_schedule() sd resume work to allow disks and other devices to
resume in parallel.

This moves the entirety of scsi_device resume to an async context to
ensure that scsi_device_resume() remains ordered with respect to the
completion of the start/stop command.  For the duration of the resume,
new command submissions (that do not originate from the scsi-core) will
be deferred (BLKPREP_DEFER).

It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
of these operations.  Like scsi_sd_probe_domain it is flushed at
sd_remove() time to ensure async ops do not continue past the
end-of-life of the sdev.  The implementation explicitly refrains from
reusing scsi_sd_probe_domain directly for this purpose as it is flushed
at the end of dpm_resume(), potentially defeating some of the benefit.
Given sdevs are quiesced it is permissible for these resume operations
to bleed past the async_synchronize_full() calls made by the driver
core.

We defer the resolution of which pm callback to call until
scsi_dev_type_{suspend|resume} time and guarantee that the callback
parameter is never NULL.  With this in place the type of resume
operation is encoded in the async function identifier.

There is a concern that async resume could trigger PSU overload.  In the
enterprise, storage enclosures enforce staggered spin-up regardless of
what the kernel does making async scanning safe by default.  Outside of
that context a user can disable asynchronous scanning via a kernel
command line or CONFIG_SCSI_SCAN_ASYNC.  Honor that setting when
deciding whether to do resume asynchronously.

Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown <len.brown@intel.com>
Cc: Phillip Susi <psusi@ubuntu.com>
[alan: bug fix and clean up suggestion]
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-04-10 15:30:35 -07:00
Martin K. Petersen
b2bff6ceb6 [SCSI] sd: Quiesce mode sense error messages
Messages about discovered disk properties are only printed once unless
they are found to have changed. Errors encountered during mode sense,
however, are printed every time we revalidate.

Quiesce mode sense errors so they are only printed during the first
scan.

[jejb: checkpatch fixes]
Bugzilla: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=733565
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-27 08:26:33 -07:00
Alan Stern
7aae51347b [SCSI] sd: don't fail if the device doesn't recognize SYNCHRONIZE CACHE
Evidently some wacky USB-ATA bridges don't recognize the SYNCHRONIZE
CACHE command, as shown in this email thread:

	http://marc.info/?t=138978356200002&r=1&w=2

The fact that we can't tell them to drain their caches shouldn't
prevent the system from going into suspend.  Therefore sd_sync_cache()
shouldn't return an error if the device replies with an Invalid
Command ASC.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Sven Neumann <s.neumann@raumfeld.com>
Tested-by: Daniel Mack <zonque@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-15 10:19:22 -07:00
Linus Torvalds
f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Jens Axboe
b28bc9b38c Linux 3.13-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJSwLfoAAoJEHm+PkMAQRiGi6QH/1U1B7lmHChDTw3jj1lfm9gA
 189Si4QJlnxFWCKHvKEL+pcaVuACU+aMGI8+KyMYK4/JfuWVjjj5fr/SvyHH2/8m
 LdSK8aHMhJ46uBS4WJ/l6v46qQa5e2vn8RKSBAyKm/h4vpt+hd6zJdoFrFai4th7
 k/TAwOAEHI5uzexUChwLlUBRTvbq4U8QUvDu+DeifC8cT63CGaaJ4qVzjOZrx1an
 eP6UXZrKDASZs7RU950i7xnFVDQu4PsjlZi25udsbeiKcZJgPqGgXz5ULf8ZH8RQ
 YCi1JOnTJRGGjyIOyLj7pyB01h7XiSM2+eMQ0S7g54F2s7gCJ58c2UwQX45vRWU=
 =/4/R
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc6' into for-3.14/core

Needed to bring blk-mq uptodate, since changes have been going in
since for-3.14/core was established.

Fixup merge issues related to the immutable biovec changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-flush.c
	fs/btrfs/check-integrity.c
	fs/btrfs/extent_io.c
	fs/btrfs/scrub.c
	fs/logfs/dev_bdev.c
2013-12-31 09:51:02 -07:00
Geert Uytterhoeven
ef80d1e18b [SCSI] sd: Do not call do_div() with a 64-bit divisor
do_div() is meant for divisions of 64-bit number by 32-bit numbers.
Passing 64-bit divisor types caused issues in the past on 32-bit platforms,
cfr. commit ea077b1b96 ("m68k: Truncate base
in do_div()").

As scsi_device.sector_size is unsigned (int), factor should be unsigned
int, too.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-12-19 07:39:03 -08:00
James Bottomley
2451079bc2 [SCSI] Fix erratic device offline during EH
Commit 18a4d0a22e
(Handle disk devices which can not process medium access commands)
was introduced to offline any device which cannot process medium
access commands.
However, commit 3eef6257de
(Reduce error recovery time by reducing use of TURs) reduced
the number of TURs by sending it only on the first failing
command, which might or might not be a medium access command.
So in combination this results in an erratic device offlining
during EH; if the command where the TUR was sent upon happens
to be a medium access command the device will be set offline,
if not everything proceeds as normal.

This patch moves the check to the final test, eliminating
this problem.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-12-19 07:39:02 -08:00
Martin K. Petersen
54b2b50c20 [SCSI] Disable WRITE SAME for RAID and virtual host adapter drivers
Some host adapters do not pass commands through to the target disk
directly. Instead they provide an emulated target which may or may not
accurately report its capabilities. In some cases the physical device
characteristics are reported even when the host adapter is processing
commands on the device's behalf. This can lead to adapter firmware hangs
or excessive I/O errors.

This patch disables WRITE SAME for devices connected to host adapters
that provide an emulated target. Driver writers can disable WRITE SAME
by setting the no_write_same flag in the host adapter template.

[jejb: fix up rejections due to eh_deadline patch]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-11-29 08:48:39 +04:00
Kent Overstreet
a4ad39b1d1 block: Convert bio_iovec() to bvec_iter
For immutable biovecs, we'll be introducing a new bio_iovec() that uses
our new bvec iterator to construct a biovec, taking into account
bvec_iter->bi_bvec_done - this patch updates existing users for the new
usage.

Some of the existing users really do need a pointer into the bvec array
- those uses are all going to be removed, but we'll need the
functionality from immutable to remove them - so for now rename the
existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple
patches.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
2013-11-23 22:33:49 -08:00
Linus Torvalds
0d522ee749 SCSI for-linus on 20131110
This patch set is driver updates for qla4xxx, scsi_debug, pm80xx, fcoe/libfc,
 eas2r, lpfc, be2iscsi and megaraid_sas plus some assorted bug fixes and
 cleanups.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSfw30AAoJEDeqqVYsXL0MZPEIAK6GBHFw8JsU3NQ4SbM5hzdM
 ywPryTn7DO9jyj0J04i6TNtbS6om9E8tjLyr3SnmTQNiTDXGv44rIEfJyHR9ko2n
 E2hRu4xaGEWK4dkuktQuOqj2fuXRyeXr2maYIXjkmFI0hesLqozYKgLAeWTHvabE
 2HICwG/lfCzesqVl69Y3V8n1vZvtJqAls6liwY09i9eSDRe39DynRn7bjLXzkPkc
 ynjJYl22CIZ7nb+PgzqQ+xEUIdXqGG890CvGaqg7+x3ZUmmOtfECaDUkCjVeiiE6
 sy72V6E4ET/YMrkhRmIUKyZxGbl/tMxYPuGaBhq2fSNRx8x1R+Ajfh9UM2AZTh4=
 =hCHG
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull first round of SCSI updates from James Bottomley:
 "This patch set is driver updates for qla4xxx, scsi_debug, pm80xx,
  fcoe/libfc, eas2r, lpfc, be2iscsi and megaraid_sas plus some assorted
  bug fixes and cleanups"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (106 commits)
  [SCSI] scsi_error: Escalate to LUN reset if abort fails
  [SCSI] Add 'eh_deadline' to limit SCSI EH runtime
  [SCSI] remove check for 'resetting'
  [SCSI] dc395: Move 'last_reset' into internal host structure
  [SCSI] tmscsim: Move 'last_reset' into host structure
  [SCSI] advansys: Remove 'last_reset' references
  [SCSI] dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset
  [SCSI] dpt_i2o: Remove DPTI_STATE_IOCTL
  [SCSI] megaraid_sas: Fix synchronization problem between sysPD IO path and AEN path
  [SCSI] lpfc: Fix typo on NULL assignment
  [SCSI] scsi_dh_alua: ALUA handler attach should succeed while TPG is transitioning
  [SCSI] scsi_dh_alua: ALUA check sense should retry device internal reset unit attention
  [SCSI] esas2r: Cleanup snprinf formatting of firmware version
  [SCSI] esas2r: Remove superfluous mask of pcie_cap_reg
  [SCSI] esas2r: Fixes for big-endian platforms
  [SCSI] esas2r: Directly call kernel functions for atomic bit operations
  [SCSI] lpfc 8.3.43: Update lpfc version to driver version 8.3.43
  [SCSI] lpfc 8.3.43: Fixed not processing task management IOCB response status
  [SCSI] lpfc 8.3.43: Fixed spinlock hang.
  [SCSI] lpfc 8.3.43: Fixed invalid Total_Data_Placed value received for els and ct command responses
  ...
2013-11-14 12:25:38 +09:00
Jens Axboe
e37459b8e2 Merge branch 'blk-mq/core' into for-3.13/core
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-timeout.c
2013-11-08 09:08:12 -07:00
Jens Axboe
5953316dbf block: make rq->cmd_flags be 64-bit
We have officially run out of flags in a 32-bit space. Extend it
to 64-bit even on 32-bit archs.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25 11:55:59 +01:00
James Bottomley
7e660100d8 [SCSI] Derive the FLUSH_TIMEOUT from the basic I/O timeout
Rather than having a separate constant for specifying the timeout on FLUSH
operations, use the basic I/O timeout value that is already configurable
on a per target basis to derive the FLUSH timeout. Looking at the current
definitions of these timeout values, the FLUSH operation is supposed to have
a value that is twice the normal timeout value. This patch preserves this
relationship while leveraging the flexibility of specifying the I/O timeout.

Based on a prior patch by KY Srinivasan <kys@microsoft.com>

Reviewed-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 09:58:16 +01:00
Oliver Neukum
95897910a5 [SCSI] sd: Add error handling during flushing caches
It makes no sense to flush the cache of a device without medium.
Errors during suspend must be handled according to their causes.
Errors due to missing media or unplugged devices must be ignored.
Errors due to devices being offlined must also be ignored.
The error returns must be modified so that the generic layer
understands them.

[jejb: fix up whitespace and other formatting problems]
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 09:58:13 +01:00
Bernd Schubert
af73623f5f [SCSI] sd: Reduce buffer size for vpd request
Somehow older areca firmware versions have issues with
scsi_get_vpd_page() and a large buffer, the firmware
seems to crash and the scsi error-handler will start endless
recovery retries.
Limiting the buf-size to 64-bytes fixes this issue with older
firmware versions (<1.49 for my controller).

Fixes a regression with areca controllers and older firmware versions
introduced by commit: 66c28f9712

Reported-by: Nix <nix@esperi.org.uk>
Tested-by: Nix <nix@esperi.org.uk>
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Cc: stable@vger.kernel.org # delay inclusion for 2 months for testing
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 09:58:12 +01:00
Aaron Lu
10c580e423 [SCSI] sd: call blk_pm_runtime_init before add_disk
Sujit has found a race condition that would make q->nr_pending
unbalanced, it occurs as Sujit explained:

"
sd_probe_async() ->
	add_disk() ->
		disk_add_event() ->
			schedule(disk_events_workfn)
	sd_revalidate_disk()
	blk_pm_runtime_init()
return;

Let's say the disk_events_workfn() calls sd_check_events() which tries
to send test_unit_ready() and because of sd_revalidate_disk() trying to
send another commands the test_unit_ready() might be re-queued as the
tagged command queuing is disabled.

So the race condition is -

Thread 1 			  |		Thread 2
sd_revalidate_disk()		  |	sd_check_events()
...nr_pending = 0 as q->dev = NULL|	scsi_queue_insert()
blk_runtime_pm_init()		  | 	blk_pm_requeue_request() ->
				  |	nr_pending = -1 since
				  |	q->dev != NULL
"

The problem is, the test_unit_ready request doesn't get counted the
first time it is queued, so the later decrement of q->nr_pending in
blk_pm_requeue_request makes it unbalanced.

Fix this by calling blk_pm_runtime_init before add_disk so that all
requests initiated there will all be counted.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reported-and-tested-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-23 14:09:18 +01:00
Alan Stern
984f1733fc [SCSI] sd: Fix potential out-of-bounds access
This patch fixes an out-of-bounds error in sd_read_cache_type(), found
by Google's AddressSanitizer tool.  When the loop ends, we know that
"offset" lies beyond the end of the data in the buffer, so no Caching
mode page was found.  In theory it may be present, but the buffer size
is limited to 512 bytes.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-11 09:44:00 -07:00
Greg Kroah-Hartman
e1ea2351fb [SCSI] sd: convert class code to use dev_groups
The dev_attrs field of struct class is going away soon, dev_groups
should be used instead.  This converts the scsi disk class code to use
the correct field.

It required some functions to be moved around to place the show and
store functions next to each other, the old order seemed to make no
sense at all.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-21 10:10:48 -07:00
Ewan D. Milne
085b513f97 [SCSI] sd: fix crash when UA received on DIF enabled device
sd_prep_fn will allocate a larger CDB for the command via mempool_alloc
for devices using DIF type 2 protection.  This CDB was being freed
in sd_done, which results in a kernel crash if the command is retried
due to a UNIT ATTENTION.  This change moves the code to free the larger
CDB into sd_unprep_fn instead, which is invoked after the request is
complete.

It is no longer necessary to call scsi_print_command separately for
this case as the ->cmnd will no longer be NULL in the normal code path.

Also removed conditional test for DIF type 2 when freeing the larger
CDB because the protection_type could have been changed via sysfs while
the command was executing.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-07-23 07:41:53 -07:00
Linus Torvalds
84cbd7222b SCSI misc on 20130702
The patch set is mostly driver updates (usf, zfcp, lpfc, mpt2sas,
 megaraid_sas, bfa, ipr) and a few bug fixes.  Also of note is that the
 Buslogic driver has been rewritten to a better coding style and 64 bit support
 added.  We also removed the libsas limitation on 16 bytes for the command size
 (currently no drivers make use of this).
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJR0ugCAAoJEDeqqVYsXL0MX2sH+gOkWuy5p3igz+VEim8TNaOA
 VV5EIxG1v7Q0ZiXCp/wcF6eqhgQkWvkrKSxWkaN0yzq8LEWfQeY7VmFDbGgFeVUZ
 XMlX5ay8+FLCIK9M76oxwhV7VAXYbeUUZafh+xX6StWCdKrl0eJbicOGoUk/pjsi
 ZjCBpK5BM0SW+s2gMSDQhO2eMsgMp9QrJMiCJHUF1wWPN8Yez6va1tg4b9iW39BZ
 dd3sJq+PuN6yDbYAJIjEpiGF9gDaaYxSE6bTKJuY+oy08+VsP/RRWjorTENs9Aev
 rQXZIC3nwsv26QRSX7RDSj+UE+kFV6FcPMWMU3HN2UG6ttprtOxT8tslVJf7LcA=
 =BxtF
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull first round of SCSI updates from James Bottomley:
 "The patch set is mostly driver updates (usf, zfcp, lpfc, mpt2sas,
  megaraid_sas, bfa, ipr) and a few bug fixes.  Also of note is that the
  Buslogic driver has been rewritten to a better coding style and 64 bit
  support added.  We also removed the libsas limitation on 16 bytes for
  the command size (currently no drivers make use of this)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (101 commits)
  [SCSI] megaraid: minor cut and paste error fixed.
  [SCSI] ufshcd-pltfrm: remove unnecessary dma_set_coherent_mask() call
  [SCSI] ufs: fix register address in UIC error interrupt handling
  [SCSI] ufshcd-pltfrm: add missing empty slot in ufs_of_match[]
  [SCSI] ufs: use devres functions for ufshcd
  [SCSI] ufs: Fix the response UPIU length setting
  [SCSI] ufs: rework link start-up process
  [SCSI] ufs: remove version check before IS reg clear
  [SCSI] ufs: amend interrupt configuration
  [SCSI] ufs: wrap the i/o access operations
  [SCSI] storvsc: Update the storage protocol to win8 level
  [SCSI] storvsc: Increase the value of scsi timeout for storvsc devices
  [SCSI] MAINTAINERS: Add myself as the maintainer for BusLogic SCSI driver
  [SCSI] BusLogic: Port driver to 64-bit.
  [SCSI] BusLogic: Fix style issues
  [SCSI] libiscsi: Added new boot entries in the session sysfs
  [SCSI] aacraid: Fix for arrays are going offline in the system. System hangs
  [SCSI] ipr: IOA Status Code(IOASC) update
  [SCSI] sd: Update WRITE SAME heuristics
  [SCSI] fnic: potential dead lock in fnic_is_abts_pending()
  ...
2013-07-04 12:30:30 -07:00
Kees Cook
02aa2a3763 drivers: avoid format string in dev_set_name
Calling dev_set_name with a single paramter causes it to be handled as a
format string.  Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents,
including wrappers like device_create*() and bdi_register().

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:41 -07:00
Martin K. Petersen
66c28f9712 [SCSI] sd: Update WRITE SAME heuristics
SATA drives located behind a SAS controller would incorrectly receive
WRITE SAME commands. Tweak the heuristics so that:

 - If REPORT SUPPORTED OPERATION CODES is provided we will use that to
   choose between WRITE SAME(16), WRITE SAME(10) and disabled. This also
   fixes an issue with the old code which would issue WRITE SAME(10)
   despite the command not being whitelisted in REPORT SUPPORTED
   OPERATION CODES.

 - If REPORT SUPPORTED OPERATION CODES is not provided we will fall back
   to WRITE SAME(10) unless the device has an ATA Information VPD page.
   The assumption is that a SATL which is smart enough to implement
   WRITE SAME would also provide REPORT SUPPORTED OPERATION CODES.

To facilitate the new heuristics scsi_report_opcode() has been modified
to so we can distinguish between "operation not supported" and "RSOC not
supported".

Reported-by: H. Peter Anvin <hpa@zytor.com>
Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-06-26 17:56:18 -07:00
Ben Hutchings
2ee3e26c67 [SCSI] sd: Fix parsing of 'temporary ' cache mode prefix
Commit 39c60a0948 '[SCSI] sd: fix array cache flushing bug causing
performance problems' added temp as a pointer to "temporary " and used
sizeof(temp) - 1 as its length.  But sizeof(temp) is the size of the
pointer, not the size of the string constant.  Change temp to a static
array so that sizeof() does what was intended.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-06-26 08:44:33 -07:00
Hannes Reinecke
0761df9c4b [SCSI] sd: avoid deadlocks when running under multipath
When multipathed systems run into an all-paths-down scenario
all devices might be dropped, too. This causes 'del_gendisk'
to be called, which will unregister the kobj_map->probe()
function for all disk device numbers.
When the device comes back the default ->probe() function
is run which will call __request_module(), which will
deadlock.
As 'del_gendisk' typically does _not_ trigger a module unload
the default ->probe() function is pointless anyway.
This patch implements a dummy ->probe() function, which will
just return NULL if the disk is not registered.
This will avoid the deadlock. Plus it'll speed up device
scanning.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-06-04 11:16:23 -07:00
James Bottomley
297b8a0734 Merge branch 'postmerge' into for-linus
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10 07:54:01 -07:00
James Bottomley
832e77bc11 Merge branch 'misc' into for-linus
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10 07:53:40 -07:00
Al Viro
db2a144bed block_device_operations->release() should return void
The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-07 02:16:21 -04:00
Lin Ming
6df339a51e [SCSI] sd: change to auto suspend mode
Uses block layer runtime pm helper functions in
scsi_runtime_suspend/resume for devices that take advantage of it.

Remove scsi_autopm_* from sd open/release path and check_events path.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-06 12:48:31 -07:00
Lin Ming
9b21493c45 [SCSI] sd: use REQ_PM in sd's runtime suspend operation
With the introduction of REQ_PM, modify sd's runtime suspend operation
functions to use that flag so that the operations to put the device into
runtime suspended state(i.e. sync cache and stop device) will not affect
its runtime PM status.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-06 12:48:17 -07:00
James Bottomley
39c60a0948 [SCSI] sd: fix array cache flushing bug causing performance problems
Some arrays synchronize their full non volatile cache when the sd driver sends
a SYNCHRONIZE CACHE command.  Unfortunately, they can have Terrabytes of this
and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a
writeback cache.  This leads to massive slowdowns on journalled filesystems.

The fix is to allow userspace to turn off the writeback cache setting as a
temporary measure (i.e. without doing the MODE SELECT to write it back to the
device), so even though the device reported it has a writeback cache, the
user, knowing that the cache is non volatile and all they care about is
filesystem correctness, can turn that bit off in the kernel and avoid the
performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands.

The way you do this is add a 'temporary' prefix when performing the usual
cache setting operations, so

echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type

Reported-by: Ric Wheeler <rwheeler@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-02 16:15:54 -07:00
Aaron Lu
691e3d3175 [SCSI] sd: update sd to use the new pm callbacks
Update sd driver to use the callbacks defined in dev_pm_ops.

sd_freeze is NULL, the bus level callback has taken care of quiescing
the device so there should be nothing needs to be done here.
Consequently, sd_thaw is not needed here either.

suspend, poweroff and runtime suspend share the same routine sd_suspend,
which will sync flush and then stop the drive, this is the same as before.

resume, restore and runtime resume share the same routine sd_resume,
which will start the drive by putting it into active power state, this
is also the same as before.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-30 09:28:34 +00:00
Aaron Lu
a01475637c [SCSI] sd: put to stopped power state when runtime suspend
When device is runtime suspended, put it to stopped power state to save
some power.

This will also make the behaviour consistent with what the scsi_pm.c
thinks about sd as the comment says:
sd treats runtime suspend, system suspend and system hibernate identical.
With this patch, it is now identical.
And sd_shutdown will also do nothing when it finds the device has been
runtime suspended, if we do not spin down the disk in runtime suspend
by putting it into stopped power state, the disk will be shut down
incorrectly.
And the the same problem can be solved for runtime power off after
runtime suspended case by this change.

With the current runtime scheme for disk, it will only be runtime
suspended when no process opens the disk, so this shouldn't happen a
lot, which makes it acceptable to spin down the disk when runtime
suspended. If some day a more aggressive runtime scheme is used, like
the 'request based runtime pm for disk' that Alan Stern and Lin Ming
has been working, we can introduce some policy to control this. But for
now, make it simple and correct by spinning down the disk.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-30 09:22:09 +00:00
Jason J. Herne
53ad570be6 [SCSI] sd: Use SCSI read/write(16) with > 32-bit LBA drives
Force large capacity (> 0xFFFFFFFF blocks) drives to use READ/WRITE(16) instead
of READ/WRITE(10). Some(most/all?) USB enclosures do not like READ(10) commands
when a large capacity drive is installed. This issue was reported and discussed
here: http://marc.info/?l=linux-usb&m=135247705222324

Signed-off-by: Jason J. Herne <hernejj@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-27 09:00:38 +04:00
Joel D. Diaz
afd5e34b2b [SCSI] sd: Reshuffle init_sd to avoid crash
scsi_register_driver will register a prep_fn() function, which
in turn migh need to use the sd_cdp_pool for DIF.
Which hasn't been initialised at this point, leading to
a crash. So reshuffle the init_sd() and exit_sd() paths
to have the driver registered last.

Signed-off-by: Joel D. Diaz <joeldiaz@us.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-27 08:59:34 +04:00
Martin K. Petersen
5db44863b6 [SCSI] sd: Implement support for WRITE SAME
Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
driver.

 - We set the default maximum to 0xFFFF because there are several
   devices out there that only support two-byte block counts even with
   WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the
   device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
   LIMITS VPD.

 - max_write_same_blocks can be overriden per-device basis in sysfs.

 - The UNMAP discovery heuristics remain unchanged but the discard
   limits are tweaked to match the "real" WRITE SAME commands.

 - In the error handling logic we now distinguish between WRITE SAME
   with and without UNMAP set.

The discovery process heuristics are:

 - If the device reports a SCSI level of SPC-3 or greater we'll issue
   READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
   supported. If that's the case we will use it.

 - If the device supports the block limits VPD and reports a MAXIMUM
   WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16).

 - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
   0xFFFFFFFF or the block count exceeds 0xFFFF.

 - no_write_same is set for ATA, FireWire and USB.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-13 22:45:42 -08:00
Martin K. Petersen
26e85fcd15 [SCSI] sd: Permit merged discard requests
Support requests with more than one bio payload for discards. The total
number of bytes to be discarded is stored in req->__data_len and used in
sd_done() to complete the I/O.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-13 21:23:53 -08:00
Martin K. Petersen
fe542396da [SCSI] sd: Ensure we correctly disable devices with unknown protection type
We set the capacity to zero when we discovered a device formatted with
an unknown DIF protection type. However, the read_capacity code would
override the capacity and cause the device to be enabled regardless.

Make sd_read_protection_type() return an error if the protection type is
unknown. Also prevent duplicate printk lines when the device is being
revalidated.

Reported-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-09-24 13:01:24 +04:00
Martin K. Petersen
8172499aae [SCSI] sd: Allow protection_type to be overridden
We have encountered a few devices that misbehaved when operating in T10
PI mode. Allow T10 PI protection type to be overridden from userland.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-09-24 12:10:59 +04:00
Martin K. Petersen
8c579ab69d [SCSI] sd: Avoid remapping bad reference tags
It does not make sense to translate ref tags with unexpected values.
Instead we simply ignore them and let the upper layers catch the
problem. Ref tags that contain the expected value are still remapped.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-09-24 12:10:59 +04:00
Namjae Jeon
b81478d82e [SCSI] set to WCE if usb cache quirk is present.
Make use of USB quirk method to identify such HDD while reading
the cache status in sd_probe(). If cache quirk is present for
the HDD, lets assume that cache is enabled and make WCE bit
equal to 1.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:59:00 +01:00
Josh Hunt
9e1a15376b [SCSI] properly initialize atomic_t
Initialize atomic_t scsi_host_next_hn and ioerr_cntas per the guidelines
defined in Documentation/atomic_ops.txt

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:36 +01:00
Alan Stern
6a0bdffa00 SCSI & usb-storage: add try_rc_10_first flag
Several bug reports have been received recently for USB mass-storage
devices that don't handle READ CAPACITY(16) commands properly.  They
report bogus sizes, in some cases becoming unusable as a result.

The bugs were triggered by commit
09b6b51b0b (SCSI & usb-storage: add
flags for VPD pages and REPORT LUNS), which caused usb-storage to stop
overriding the SCSI level reported by devices.  By default, the sd
driver will try READ CAPACITY(16) first for any device whose level is
above SCSI_SPC_2.

It seems likely that any device large enough to require the use of
READ CAPACITY(16) (i.e., 2 TB or more) would be able to handle READ
CAPACITY(10) commands properly.  Indeed, I don't know of any devices
that don't handle READ CAPACITY(10) properly.

Therefore this patch (as1559) adds a new flag telling the sd driver
to try READ CAPACITY(10) before READ CAPACITY(16), and sets this flag
for every USB mass-storage device.  If a device really is larger than
2 TB, sd will fall back to READ CAPACITY(16) just as it used to.

This fixes Bugzilla #43391.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Hans de Goede <hdegoede@redhat.com>
CC: "James E.J. Bottomley" <JBottomley@parallels.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 22:05:31 -07:00
Dan Williams
a7a20d1039 [SCSI] sd: limit the scope of the async probe domain
sd injects and synchronizes probe work on the global kernel-wide domain.
This runs into conflict with PM that wants to perform resume actions in
async context:

[  494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
[  494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  494.360809] kworker/u:3     D 0000000000000000     0   554      2 0x00000000
[  494.420739]  ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
[  494.484392]  ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
[  494.548038]  ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
[  494.611685] Call Trace:
[  494.632649]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  494.674687]  [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112
[  494.734177]  [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50
[  494.787134]  [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48
[  494.837900]  [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17
[  494.891567]  [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70  <-- here we wait for async contexts to complete
[  494.943783]  [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a
[  495.002547]  [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod]
[  495.051861]  [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf
[  495.104807]  [<ffffffff812fe9bd>] device_release_driver+0x25/0x32  <-- here we take device_lock()

[  853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
[  853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  853.635119] kworker/u:4     D ffff88013097b5d0     0   549      2 0x00000000
[  853.695129]  ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
[  853.758990]  ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
[  853.822796]  0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
[  853.886633] Call Trace:
[  853.907631]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  853.949670]  [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351
[  854.001225]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.049082]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.097011]  [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36   <-- here we wait for device_lock()
[  854.145591]  [<ffffffff81304bd7>] device_resume+0x58/0x1c4
[  854.192066]  [<ffffffff81304d61>] async_resume+0x1e/0x45
[  854.237019]  [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173  <-- ...while running in async context

Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
be flushed without regard for the state of PM, and allow for the resume
path to handle devices that have transitioned from SDEV_QUIESCE to
SDEV_DEL prior to resume.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
[alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[jejb: remove unneeded config guards in include file]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 09:10:46 +01:00
Linus Torvalds
a75ee6ecd4 SCSI updates on 20120331
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPdrIWAAoJEDeqqVYsXL0Mny4IAMTzXGOXCykpWhdIe2R8w0Ys
 eIoTJhBKoQWnLTV8cOODtwmtZcoQLeXkZmizZiAJvX6O1tOgueg+W4AFa9grxXGI
 O0d1bSb2ardzU7VZrZSY60Hd4bylMwn4Xv/0dRrQMwTJO0LEeGWsJPV2+2BuXwMB
 lGCNB67oUBXgMOI1jUZQRwx/mBzQ3e/gINjnpZTNKHia7YkX/yVTFISq7htgfDN7
 1wRGxymbHtVap3NbtUO96BUUndAiF5vom+4WNvaQUyPrCc6aoGWjv+J9DQXY/zgv
 AYjujAluK396D6YncGFAWBzYOg9WFbq54v0PRUanjcTTAu5ILs2BxqWdhmnvl14=
 =IH8T
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

Pull SCSI updates from James Bottomley:
 "This is primarily another round of driver updates (lpfc, bfa, fcoe,
  ipr) plus a new ufshcd driver.  There shouldn't be anything
  controversial in here (The final deletion of scsi proc_ops which
  caused some build breakage has been held over until the next merge
  window to give us more time to stabilise it).

  I'm afraid, with me moving continents at exactly the wrong time,
  anything submitted after the merge window opened has been held over to
  the next merge window."

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (63 commits)
  [SCSI] ipr: Driver version 2.5.3
  [SCSI] ipr: Increase alignment boundary of command blocks
  [SCSI] ipr: Increase max concurrent oustanding commands
  [SCSI] ipr: Remove unnecessary memory barriers
  [SCSI] ipr: Remove unnecessary interrupt clearing on new adapters
  [SCSI] ipr: Fix target id allocation re-use problem
  [SCSI] atp870u, mpt2sas, qla4xxx use pci_dev->revision
  [SCSI] fcoe: Drop the rtnl_mutex before calling fcoe_ctlr_link_up
  [SCSI] bfa: Update the driver version to 3.0.23.0
  [SCSI] bfa: BSG and User interface fixes.
  [SCSI] bfa: Fix to avoid vport delete hang on request queue full scenario.
  [SCSI] bfa: Move service parameter programming logic into firmware.
  [SCSI] bfa: Revised Fabric Assigned Address(FAA) feature implementation.
  [SCSI] bfa: Flash controller IOC pll init fixes.
  [SCSI] bfa: Serialize the IOC hw semaphore unlock logic.
  [SCSI] bfa: Modify ISR to process pending completions
  [SCSI] bfa: Add fc host issue lip support
  [SCSI] mpt2sas: remove extraneous sas_log_info messages
  [SCSI] libfc: fcoe_transport_create fails in single-CPU environment
  [SCSI] fcoe: reduce contention for fcoe_rx_list lock [v2]
  ...
2012-03-31 13:31:23 -07:00
Petr Uzel
2db93ce8cc [SCSI] sd: make comment and printk string match code
Adapt comment and printk string after renaming sd_init_command to sd_prep_fn
Adapt comment and printk string after renaming sd_attach to sd_probe

Signed-off-by: Petr Uzel <petr.uzel@suse.cz>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-03-27 08:26:28 +01:00
Linus Torvalds
424a6f6ef9 SCSI updates on 20120319
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPZxSnAAoJEDeqqVYsXL0M0Y4IAMX0vrTVZbg6psA5/gMcWGRP
 CkFXEQ8n0PL2SCaj6BoDqamJFe5Nc7dnqxM0fGawB4S9vr3rHhiOlwO+NbV9zFYC
 2skBTpeL3sjgtN/jTBdfeeAa7xTYpu/XGyei0NS1A5c2AyMVXV0uYV2s4VNZxe44
 tVIn1OEzM2giZ9EB1OZslDMrg5XXm3MBIUECP0LbWUhBm/35caSFKzMXRwhh7WiK
 +AVmc2AZYtdEwuknDyiH7KlsaoB3vGL9pPrAUJzIgEhy2pOo2A7W72HfA4Fj+y6a
 uF9HBS5zciMp1+sGWry62AjNbWgin9BRlozBEO/lJhIfMGDV1nXEIJsOkOgkdoE=
 =1TxB
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

SCSI updates from James Bottomley:
 "The update includes the usual assortment of driver updates (lpfc,
  qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge
  amount of infrastructure work in the SAS library and transport class
  as well as an iSCSI update.  There's also a new SCSI based virtio
  driver."

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits)
  [SCSI] qla4xxx: Update driver version to 5.02.00-k15
  [SCSI] qla4xxx: trivial cleanup
  [SCSI] qla4xxx: Fix sparse warning
  [SCSI] qla4xxx: Add support for multiple session per host.
  [SCSI] qla4xxx: Export CHAP index as sysfs attribute
  [SCSI] scsi_transport: Export CHAP index as sysfs attribute
  [SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry
  [SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry
  [SCSI] pm8001: fix endian issue with code optimization.
  [SCSI] pm8001: Fix possible racing condition.
  [SCSI] pm8001: Fix bogus interrupt state flag issue.
  [SCSI] ipr: update PCI ID definitions for new adapters
  [SCSI] qla2xxx: handle default case in qla2x00_request_firmware()
  [SCSI] isci: improvements in driver unloading routine
  [SCSI] isci: improve phy event warnings
  [SCSI] isci: debug, provide state-enum-to-string conversions
  [SCSI] scsi_transport_sas: 'enable' phys on reset
  [SCSI] libsas: don't recover end devices attached to disabled phys
  [SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
  [SCSI] libsas: set attached device type and target protocols for local phys
  ...
2012-03-22 12:55:29 -07:00
Lan Tianyu
4e2247b2bd [SCSI] sd: Add runtime pm in the sd_check_events()
The sd_check_event() will be called periodly even when the device is in the
suspended status to check media event. The scsi_test_unit_ready() in the
sd_check_event() will issue scsi cmd request. Issuing scsi request when the
device is in the suspeneded status will cause problem. For example, when a usb
flash disk in the suspended status, scsi_test_unit_ready() issues a scsi
request. The request will be returned as failed because the usb device is not
active. The patch adds scsi_autopm_get_device() and scsi_autopm_put_device()
around scsi_test_unit_ready() in the sd_check_event() to resolve such problem.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-03-19 11:34:18 +00:00
Martin K. Petersen
18a4d0a22e [SCSI] Handle disk devices which can not process medium access commands
We have experienced several devices which fail in a fashion we do not
currently handle gracefully in SCSI. After a failure these devices will
respond to the SCSI primary command set (INQUIRY, TEST UNIT READY, etc.)
but any command accessing the storage medium will time out.

The following patch adds an callback that can be used by upper level
drivers to inspect the results of an error handling command. This in
turn has been used to implement additional checking in the SCSI disk
driver.

If a medium access command fails twice but TEST UNIT READY succeeds both
times in the subsequent error handling we will offline the device. The
maximum number of failed commands required to take a device offline can
be tweaked in sysfs.

Also add a new error flag to scsi_debug which allows this scenario to be
easily reproduced.

[jejb: fix up integer parsing to use kstrtouint]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 10:14:52 -06:00
Martin K. Petersen
89730393f2 [SCSI] sd: Make sure provisioning mode is reported correctly
The provisioning_mode parameter in sysfs did not get updated in the
SD_LBP_DISABLE case. Make sure the provisioning mode is always set
correctly.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 09:39:09 -06:00
Alan Stern
09b6b51b0b SCSI & usb-storage: add flags for VPD pages and REPORT LUNS
This patch (as1507) adds a skip_vpd_pages flag to struct scsi_device
and a no_report_luns flag to struct scsi_target.  The first is used to
control whether sd will look at VPD pages for information on block
provisioning, limits, and characteristics.  The second prevents
scsi_report_lun_scan() from issuing a REPORT LUNS command.

The patch also modifies usb-storage to set the new flag bits for all
USB devices and targets, and to stop adjusting the scsi_level value.

Historically we have seen that USB mass-storage devices often don't
support VPD pages or REPORT LUNS properly.  Until now we have avoided
these things by setting the scsi_level to SCSI_2 for all USB devices.
But this has the side effect of storing the LUN bits into the second
byte of each CDB, and now we have a report of a device which doesn't
like that.  The best solution is to stop abusing scsi_level and
instead have separate flags for VPD pages and REPORT LUNS.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Perry Wagle <wagle@mac.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-08 17:36:41 -08:00
Paolo Bonzini
0bfc96cb77 block: fail SCSI passthrough ioctls on partition devices
Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
will pass the command to the underlying block device.  This is
well-known, but it is also a large security problem when (via Unix
permissions, ACLs, SELinux or a combination thereof) a program or user
needs to be granted access only to part of the disk.

This patch lets partitions forward a small set of harmless ioctls;
others are logged with printk so that we can see which ioctls are
actually sent.  In my tests only CDROM_GET_CAPABILITY actually occurred.
Of course it was being sent to a (partition on a) hard disk, so it would
have failed with ENOTTY and the patch isn't changing anything in
practice.  Still, I'm treating it specially to avoid spamming the logs.

In principle, this restriction should include programs running with
CAP_SYS_RAWIO.  If for example I let a program access /dev/sda2 and
/dev/sdb, it still should not be able to read/write outside the
boundaries of /dev/sda2 independent of the capabilities.  However, for
now programs with CAP_SYS_RAWIO will still be allowed to send the
ioctls.  Their actions will still be logged.

This patch does not affect the non-libata IDE driver.  That driver
however already tests for bd != bd->bd_contains before issuing some
ioctl; it could be restricted further to forbid these ioctls even for
programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.

Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Make it also print the command name when warning - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-14 15:07:24 -08:00
Paolo Bonzini
577ebb374c block: add and use scsi_blk_cmd_ioctl
Introduce a wrapper around scsi_cmd_ioctl that takes a block device.

The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.

Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-14 15:07:24 -08:00
Lin Ming
54f5758846 [SCSI] sd: check runtime PM status in sd_shutdown
sd_shutdown is called during reboot/poweroff.
It may fail if parent device, for example, ata port, was runtime suspended.

Fix it by checking runtime PM status of sd.
Exit immediately if sd was runtime suspended already.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-01-08 19:14:57 -05:00
Dave Kleikamp
21208ae5a2 [SCSI] sd: remove arbitrary SD_MAX_DISKS namespace limit
There is no reason to limit the SCSI disk namespace to sdXXX.

Add new error messages to sd_probe() in the unlikely event that either
ida_get_new() or sd_format_disk_name() fail.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-30 12:58:11 +04:00
Nao Nishijima
fe2d1851e9 [SCSI] sd: Use sd_printk() instead of printk()
sd_ioctl() still use printk() for log output.
It should use sd_printk() instead of printk(), as well as other sd_*.

All SCSI messages should output via s*_printk() instead of printk().

Signed-off-by: Nao Nishijima <nao.nishijima.xt@hitachi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-29 00:16:20 -07:00
Luben Tuikov
0bcaa11154 [SCSI] Retrieve the Caching mode page (version 2)
Some kernel transport drivers unconditionally disable
retrieval of the Caching mode page. One such for example is
the BBB/CBI transport over USB. Such a restraint is too
harsh as some devices do support the Caching mode
page. Unconditionally enabling the retrieval of this mode
page over those transports at their transport code level may
result in some devices failing and becoming unusable.

This patch implements a method of retrieving the Caching
mode page without unconditionally enabling it in the
transports which unconditionally disable it. The idea is to
ask for all supported pages, page code 0x3F, and then search
for the Caching mode page in the mode parameter data
returned. The sd driver already asks for all the mode pages
supported by the attached device by setting the page code to
0x3F in order to find out if the media is write protected by
reading the WP bit in the Device Specific Parameter
field. It then attempts to retrieve only the Caching mode
page by setting the page code to 8 and actually attempting
to retrieve it if and only if the transport allows it.

The method implemented here is that if the transport doesn't
allow retrieval of the Caching mode page and the device is
not RBC, then we ask for all pages supported by setting the
page code to 0x3F (similarly to how the WP bit is retrieved
above), and then we search for the Caching mode page in the
mode parameter data returned.

With this patch, devices over SATA, report this (no change):

Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Smart devices report their Caching mode page. This is a
change where we'd previously see the kernel making
assumption about the device's cache being write-through:

Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] 610472646 4096-byte logical blocks: (2.50 TB/2.27 TiB)
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Mode Sense: 47 00 10 08
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA

And "dumb" devices over BBB, are correctly shown not to
support reporting the Caching mode page:

Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] 15663104 512-byte logical blocks: (8.01 GB/7.46 GiB)
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Write Protect is off
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Mode Sense: 23 00 00 00
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] No Caching mode page present
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Assuming drive cache: write through

Version 2 adds this:

Some devices don't support page code 0x3F, and others require a
fixed transfer length of 192 bytes. This single commit includes a
patch by Alan Stern which fixes this.

Reported-and-tested-by: Richard Senior <richard@r-senior.demon.co.uk>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
2011-05-24 12:43:52 -04:00
Martin K. Petersen
2a8cfad06e [SCSI] sd: Unmap discard alignment needs to be converted to bytes
The block layer discard alignment is reported in bytes, not in units of
the logical block size.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
2011-05-24 12:38:15 -04:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
James Bottomley
3dea642afd [SCSI] Revert "[SCSI] Retrieve the Caching mode page"
This reverts commit 24d720b726.

Previously we thought there was little possibility that devices would
crash with this, but some have been found.

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-23 12:53:09 -05:00
Martin K. Petersen
09b9cc44c9 sd: Fail discard requests when logical block provisioning has been disabled
Ensure that we kill discard requests after logical block provisioning
has been disabled in sysfs.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22 09:35:53 -07:00
Martin K. Petersen
c98a0eb0e9 [SCSI] sd: Logical Block Provisioning update
SBC3r26 contains many changes to the Logical Block Provisioning
interfaces (formerly known as Thin Provisioning ditto). This patch
implements support for both the old and new schemes using the same
heuristic as before (whether the LBP VPD page is present).

The new code also allows the provisioning mode (i.e. choice of command)
to be overridden on a per-device basis via sysfs. Two additional modes
are supported in this version:

 - WRITE SAME(10) with the UNMAP bit set

 - WRITE SAME(10) without the UNMAP bit set. This allows us to support
   devices that predate the TP/LBP enhancements in SBC3 and which work
   by way zero-detection

Switching between modes has been consolidated in a helper function that
also updates the block layer topology according to the limitations of
the chosen command.

I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10)
if WRITE SAME(16) fails, etc. but found several devices that got
cranky. So for now we'll disable discard if one of the commands
fail. The user still has the option of selecting a different mode in
sysfs.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-14 18:37:34 -05:00
Tejun Heo
f4013c3879 [SCSI] sd,sr: kill compat SDEV_MEDIA_CHANGE event
SDEV_MEDIA_CHANGE event was first added by commit a341cd0f (SCSI: add
asynchronous event notification API) for SATA AN support and then
extended to cover generic media change events by commit 285e9670
([SCSI] sr,sd: send media state change modification events).

This event was mapped to block device in userland with all properties
stripped to simulate CHANGE event on the block device, which, in turn,
was used to trigger further userspace action on media change.

The recent addition of disk event framework kept this event for
backward compatibility but it turns out to be unnecessary and causes
erratic and inefficient behavior.  The new disk event generates proper
events on the block devices and the compat events are mapped to block
device with all properties stripped, so the block device ends up
generating multiple duplicate events for single actual event.

This patch removes the compat event generation from both sr and sd as
suggested by Kay Sievers.  Both existing and newer versions of udev
and the associated tools will behave better with the removal of these
events as they from the beginning were expecting events on the block
devices.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-01-14 09:17:35 -06:00
Tejun Heo
2bae0093ca [SCSI] sd: implement sd_check_events()
Replace sd_media_change() with sd_check_events().

* Move media removed logic into set_media_not_present() and
  media_not_present() and set sdev->changed iff an existing media is
  removed or the device indicates UNIT_ATTENTION.

* Make sd_check_events() sets sdev->changed if previously missing
  media becomes present.

* Event is reported only if sdev->changed is set.

This makes media presence event reported if scsi_disk->media_present
actually changed or the device indicated UNIT_ATTENTION.  For backward
compatibility, SDEV_EVT_MEDIA_CHANGE is generated each time
sd_check_events() detects media change event.

[jejb: fix boot failure]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-01-14 09:17:34 -06:00
Linus Torvalds
275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00
James Bottomley
a8733c7baf [SCSI] fix medium error problems with some arrays which can cause data corruption
Our current handling of medium error assumes that data is returned up
to the bad sector.  This assumption holds good for all disk devices,
all DIF arrays and most ordinary arrays.  However, an LSI array engine
was recently discovered which reports a medium error without returning
any data.  This means that when we report good data up to the medium
error, we've reported junk originally in the buffer as good.  Worse,
if the read consists of requested data plus a readahead, and the error
occurs in readahead, we'll just strip off the readahead and report
junk up to userspace as good data with no error.

The fix for this is to have the error position computation take into
account the amount of data returned by the driver using the scsi
residual data.  Unfortunately, not every driver fills in this data,
but for those who don't, it's set to zero, which means we'll think a
full set of data was transferred and the behaviour will be identical
to the prior behaviour of the code (believe the buffer up to the error
sector).  All modern drivers seem to set the residual, so that should
fix up the LSI failure/corruption case.

Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-12-22 23:26:48 -06:00
Jens Axboe
fcc57045d5 Revert "sd: implement sd_check_events()"
This reverts commit c8d2e93735.

We run into merging problems with the SCSI tree, revert this one
so it can be handled by a postmerge tree there.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-22 09:48:49 +01:00
Luben Tuikov
24d720b726 [SCSI] Retrieve the Caching mode page
Some kernel transport drivers unconditionally disable
retrieval of the Caching mode page. One such for example is
the BBB/CBI transport over USB.  Such a restraint is too
harsh as some devices do support the Caching mode
page. Unconditionally enabling the retrieval of this mode
page over those transports at their transport code level may
result in some devices failing and becoming unusable.

This patch implements a method of retrieving the Caching
mode page without unconditionally enabling it in the
transports which unconditionally disable it. The idea is to
ask for all supported pages, page code 0x3F, and then search
for the Caching mode page in the mode parameter data
returned. The sd driver already asks for all the mode pages
supported by the attached device by setting the page code to
0x3F in order to find out if the media is write protected by
reading the WP bit in the Device Specific Parameter
field. It then attempts to retrieve only the Caching mode
page by setting the page code to 8 and actually attempting
to retrieve it if and only if the transport allows it.

The method implemented here is that if the transport doesn't
allow retrieval of the Caching mode page and the device is
not RBC, then we ask for all pages supported by setting the
page code to 0x3F (similarly to how the WP bit is retrieved
above), and then we search for the Caching mode page in the
mode parameter data returned.

With this patch, devices over SATA, report this (no change):

Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Smart devices report their Caching mode page. This is a
change where we'd previously see the kernel making
assumption about the device's cache being write-through:

Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] 610472646 4096-byte logical blocks: (2.50 TB/2.27 TiB)
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Mode Sense: 47 00 10 08
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA

And "dumb" devices over BBB, are correctly shown not to
support reporting the Caching mode page:

Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] 15663104 512-byte logical blocks: (8.01 GB/7.46 GiB)
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Write Protect is off
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Mode Sense: 23 00 00 00
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] No Caching mode page present
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Assuming drive cache: write through

Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-12-21 12:23:53 -06:00
Alan Stern
3ff5588d3f [SCSI] sd: improve logic and efficiecy of media-change detection
This patch (as1415) improves the formerly incomprehensible logic in
sd_media_changed() (the current code refers to "changed" as a state,
whereas in fact it is a relation between two states).  It also adds a
big comment so that everyone can understand what is really going on.

The patch also improves efficiency by not reporting a media change
when no medium was ever present.  If no medium was present the last
time we checked and there's still no medium, it's not necessary to
tell the caller that a change occurred.  Doing so merely causes the
caller to attempt to revalidate a non-existent disk, which is a waste
of time.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-12-21 12:23:52 -06:00
Tejun Heo
c8d2e93735 sd: implement sd_check_events()
Replace sd_media_change() with sd_check_events().  sd used to set the
changed state whenever the device is not ready, which can cause event
loop while the device is not ready.  Media presence handling code is
changed such that the changed state is set iff the media presence
actually changes.  UA still always sets the changed state and
NOT_READY always (at least where it used to set ->changed) clears
media presence, so no event is lost.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-16 17:53:39 +01:00
Tejun Heo
9f8a2c23c6 scsi: replace sr_test_unit_ready() with scsi_test_unit_ready()
The usage of TUR has been confusing involving several different
commits updating different parts over time.  Currently, the only
differences between scsi_test_unit_ready() and sr_test_unit_ready()
are,

* scsi_test_unit_ready() also sets sdev->changed on NOT_READY.

* scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or
  NOT_READY.

Due to the above two differences, sr is using its own
sr_test_unit_ready(), but sd - the sole user of the above extra
handling - doesn't even need them.

Where scsi_test_unit_ready() is used in sd_media_changed(), the code
is looking for device ready w/ media present state which is true iff
TUR succeeds w/o sense data or UA, and when the device is not ready
for whatever reason sd_media_changed() explicitly marks media as
missing so there's no reason to set sdev->changed automatically from
scsi_test_unit_ready() on NOT_READY.

Drop both special handlings from scsi_test_unit_ready(), which makes
it equivalant to sr_test_unit_ready(), and replace
sr_test_unit_ready() with scsi_test_unit_ready().  Also, drop the
unnecessary explicit NOT_READY check from sd_media_changed().
Checking return value is enough for testing device readiness.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-16 17:53:39 +01:00
Arnd Bergmann
451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Martin K. Petersen
518fa8e39b [SCSI] sd: Export effective protection mode in sysfs
Create a sysfs entry that reports the negotiated DIX/DIF protection mode
for a SCSI disk. This depends on the protection type the disk is
formatted with as well as the protection capabilities advertised by the
controller.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-10-25 14:57:44 -05:00
Linus Torvalds
5cc1035062 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (141 commits)
  USB: mct_u232: fix broken close
  USB: gadget: amd5536udc.c: fix error path
  USB: imx21-hcd - fix off by one resource size calculation
  usb: gadget: fix Kconfig warning
  usb: r8a66597-udc: Add processing when USB was removed.
  mxc_udc: add workaround for ENGcm09152 for i.MX35
  USB: ftdi_sio: add device ids for ScienceScope
  USB: musb: AM35x: Workaround for fifo read issue
  USB: musb: add musb support for AM35x
  USB: AM35x: Add musb support
  usb: Fix linker errors with CONFIG_PM=n
  USB: ohci-sh - use resource_size instead of defining its own resource_len macro
  USB: isp1362-hcd - use resource_size instead of defining its own resource_len macro
  USB: isp116x-hcd - use resource_size instead of defining its own resource_len macro
  USB: xhci: Fix compile error when CONFIG_PM=n
  USB: accept some invalid ep0-maxpacket values
  USB: xHCI: PCI power management implementation
  USB: xHCI: bus power management implementation
  USB: xHCI: port remote wakeup implementation
  USB: xHCI: port power management implementation
  ...

Manually fix up (non-data) conflict: the SCSI merge gad renamed the
'hw_sector_size' member to 'physical_block_size', and the USB tree
brought a new use of it.
2010-10-22 20:30:48 -07:00
Linus Torvalds
c70b5296e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (84 commits)
  [SCSI] be2iscsi: SGE Len == 64K
  [SCSI] be2iscsi: Remove premature free of cid
  [SCSI] be2iscsi: More time for FW
  [SCSI] libsas: fix bug for vacant phy
  [SCSI] sd: Fix overflow with big physical blocks
  [SCSI] st: add MTWEOFI to write filemarks without flushing drive buffer
  [SCSI] libsas: Don't issue commands to devices that have been hot-removed
  [SCSI] megaraid_sas: Add Online Controller Reset to MegaRAID SAS drive
  [SCSI] lpfc 8.3.17: Update lpfc driver version to 8.3.17
  [SCSI] lpfc 8.3.17: Replace function reset methodology
  [SCSI] lpfc 8.3.17: SCSI fixes
  [SCSI] lpfc 8.3.17: BSG fixes
  [SCSI] lpfc 8.3.17: SLI Additions and Fixes
  [SCSI] lpfc 8.3.17: Code Cleanup and Locking fixes
  [SCSI] zfcp: Remove scsi_cmnd->serial_number from debug traces
  [SCSI] ipr: fix array error logging
  [SCSI] aha152x: enable PCMCIA on 64bit
  [SCSI] scsi_dh_alua: Handle all states correctly
  [SCSI] cxgb4i: connection and ddp setting update
  [SCSI] cxgb3i: fixed connection over vlan
  ...
2010-10-22 17:34:15 -07:00
Hans de Goede
5ce524bdff scsi/sd: add a no_read_capacity_16 scsi_device flag
I seem to have a knack for digging up buggy usb devices which don't work
with Linux, and I'm crazy enough to try to make them work.  So this time a
friend of mine asked me to get an mp4 player (an mp3 player which can play
videos on a small screen) to work with Linux.

It is based on the well known rockbox chipset for which we already have an
unusual devs entries to work around some of its bugs.  But this model
comes with an additional twist.

This model chokes on read_capacity_16 calls.  Now normally we don't make
those calls, but this model comes with an sdcard slot and when there is no
card in there (and shipped from the factory there is none), it reports a
size of 0.  However this time the programmers actually got the
read_capacity_10 response right!  So they substract one from the size as
stored internally in the mp3 player before reporting it back, resulting in
an answer of ...  0xffffffff sectors, causing sd.c to try a
read_capacity_16, on which the device crashes.

This patch adds a flag to scsi_device to indicate that a a device cannot
handle read_capacity_16, and when this flag is set if a device reports an
lba of 0xffffffff as answer to a read_capacity_10, assumes it tries to
report a size of 0.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-22 10:22:05 -07:00
Jens Axboe
fa251f8990 Merge branch 'v2.6.36-rc8' into for-2.6.37/barrier
Conflicts:
	block/blk-core.c
	drivers/block/loop.c
	mm/swapfile.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-19 09:13:04 +02:00
Martin K. Petersen
526f7c7950 [SCSI] sd: Fix overflow with big physical blocks
The hw_sector_size variable could overflow if a device reported huge
physical blocks.  Switch to the more accurate physical_block_size
terminology and make sure we use an unsigned int to match the range
permitted by READ CAPACITY(16).

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-10-11 17:33:20 -05:00
Michael Reed
1a03ae0f55 [SCSI] sd name space exhaustion causes system hang
Following a site power outage which re-enabled all the ports on my FC
switches, my system subsequently booted with far too many luns!  I had
let it run hoping it would make multi-user.  It didn't.  :(  It hung solid
after exhausting the last sd device, sdzzz, and attempting to create sdaaaa
and beyond.  I was unable to get a dump.

Discovered using a 2.6.32.13 based system.

correct this by detecting when the last index is utilized and failing
the sd probe of the device.  Patch applies to scsi-misc-2.6.

Signed-off-by: Michael Reed <mdr@sgi.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-10-07 17:16:28 -05:00
Martin K. Petersen
045d3fe766 [SCSI] sd: Update thin provisioning support
Add support for the Thin Provisioning VPD page and use the TPU and TPWS
bits to switch between UNMAP and WRITE SAME(16) for discards.  If no TP
VPD page is present we fall back to old scheme where the max descriptor
count combined with the max lba count are used trigger UNMAP.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-17 13:07:55 -04:00
Tejun Heo
4913efe456 block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush()
Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
requests.  Deprecate barrier.  All REQ_HARDBARRIERs are failed with
-EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
blk_queue_flush().

blk_queue_flush() takes combinations of REQ_FLUSH and FUA.  If a
device has write cache and can flush it, it should set REQ_FLUSH.  If
the device can handle FUA writes, it should also set REQ_FUA.

All blk_queue_ordered() users are converted.

* ORDERED_DRAIN is mapped to 0 which is the default value.
* ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
* ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:36 +02:00
Tejun Heo
6958f14545 block: kill QUEUE_ORDERED_BY_TAG
Nobody is making meaningful use of ORDERED_BY_TAG now and queue
draining for barrier requests will be removed soon which will render
the advantage of tag ordering moot.  Kill ORDERED_BY_TAG.  The
following users are affected.

* brd: converted to ORDERED_DRAIN.
* virtio_blk: ORDERED_TAG path was already marked deprecated.  Removed.
* xen-blkfront: ORDERED_TAG case dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:36 +02:00
Alan Stern
7e44331240 [SCSI] sd: fix medium-removal bug
Commit 409f3499a2 (scsi/sd: remove big
kernel lock) introduced a bug in the sd_release routine.  Medium
removal should be allowed when the number of open file references
drops to 0, not when it becomes non-zero.

This patch (as1414) adjusts the test to fix the bug.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-07 10:31:54 -05:00
Mike Christie
e3b3e62467 [SCSI] scsi/block: increase flush/sync timeout
We have been seeing the flush request timeout with a wide
range of hardware from tgt+iser to FC targets from a major vendor.

After discussions about if the value should be configurable and
what the best value should be, this patch just increases the flush/sync
cache timeout to 1 minute. 2 minutes was determined to be too long, and
making it configurable was troublesome for users.

This patch was made over Linus's tree. It is not made over scsi-misc
or scsi-rc-fixes, because Linus's had block layer changes that my
patch was built over.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-05 14:57:04 -03:00
David Miller
2e4c332913 [SCSI] sd, sym53c8xx: Remove warnings after vsprintf %pV introducation.
GCC warns about empty printf format strings, and after
the addition of %pV these existing such cases in the
scsi driver layer were exposed enough for the compiler
to start seeing them.

Based almost entirely upon a patch by Joe Perches.

[jejb: fix up sym53c8xx msg]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-02 17:23:20 -03:00
H Hartley Sweeten
439d77f70f scsi/sd.c: quiet all sparse noise
In sd_store_cache_type the symbol 'len' is declared twice.  Remove the
second declaration to quiet the following sparse warning.

warning: symbol 'len' shadows an earlier one

In sd_probe the variable 'index' is declared as a u32.  This variable is
used in a call to ida_get_new which is expecting an int *.  Make the
variable an int to quiet the following sparse warning.

warning: incorrect type in argument 2 (different signedness)

There are 4 symbols in the file that are not exported and produce
the following sparse warnings.

warning: symbol 'sd_cdb_cache' was not declared. Should it be static?
warning: symbol 'sd_cdb_pool' was not declared. Should it be static?
warning: symbol 'sd_read_protection_type' was not declared. Should it be static?
warning: symbol 'sd_read_app_tag_own' was not declared. Should it be static?

Make them static to quiet the warnings.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: James E.J. Bottomley <James.Bottomley@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:01 -07:00
Linus Torvalds
2f9e825d3e Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
  block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
  xen-blkfront: fix missing out label
  blkdev: fix blkdev_issue_zeroout return value
  block: update request stacking methods to support discards
  block: fix missing export of blk_types.h
  writeback: fix bad _bh spinlock nesting
  drbd: revert "delay probes", feature is being re-implemented differently
  drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
  drbd: Disable delay probes for the upcomming release
  writeback: cleanup bdi_register
  writeback: add new tracepoints
  writeback: remove unnecessary init_timer call
  writeback: optimize periodic bdi thread wakeups
  writeback: prevent unnecessary bdi threads wakeups
  writeback: move bdi threads exiting logic to the forker thread
  writeback: restructure bdi forker loop a little
  writeback: move last_active to bdi
  writeback: do not remove bdi from bdi_list
  writeback: simplify bdi code a little
  writeback: do not lose wake-ups in bdi threads
  ...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
2010-08-10 15:22:42 -07:00
FUJITA Tomonori
e96f6abe02 scsi: use REQ_TYPE_FS for flush request
scsi-ml uses REQ_TYPE_BLOCK_PC for flush requests from file
systems. The definition of REQ_TYPE_BLOCK_PC is that we don't retry
requests even when we can (e.g. UNIT ATTENTION) and we send the
response to the callers (then the callers can decide what they want).
We need a workaround such as the commit
77a4229719 to retry BLOCK_PC flush
requests. We will need the similar workaround for discard requests too
since SCSI-ml handle them as BLOCK_PC internally.

This uses REQ_TYPE_FS for flush requests from file systems instead of
REQ_TYPE_BLOCK_PC.

scsi-ml retries only REQ_TYPE_FS requests that have data to
transfer when we can retry them (e.g. UNIT_ATTENTION). However, we
also need to retry REQ_TYPE_FS requests without data because the
callers don't.

This also changes scsi_check_sense() to retry all the REQ_TYPE_FS
requests when appropriate. Thanks to scsi_noretry_cmd(),
REQ_TYPE_BLOCK_PC requests don't be retried as before.

Note that basically, this reverts the commit
77a4229719 since now we use REQ_TYPE_FS
for flush requests.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:41 +02:00
FUJITA Tomonori
6a32a8aed5 scsi: convert discard to REQ_TYPE_FS from REQ_TYPE_BLOCK_PC
Jens, any reason why this isn't included in your for-2.6.36 yet?

=
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH resend] scsi: convert discard to REQ_TYPE_FS from REQ_TYPE_BLOCK_PC

The block layer (file systems) sends discard requests as REQ_TYPE_FS
(the role of REQ_TYPE_FS is that setting up commands and interpreting
the results). But SCSI-ml treats discard requests as
REQ_TYPE_BLOCK_PC.

scsi-ml can handle discard requests as REQ_TYPE_FS
easily. scsi_setup_discard_cmnd() sets up struct request and the bio
nicely. Only remaining issue is that discard requests can't be
completed partially so we need to modify sd_done.

This conversion also fixes the problem that discard requests aren't
retried when possible (e.g. UNIT ATTENTION).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:31 +02:00
Arnd Bergmann
409f3499a2 scsi/sd: remove big kernel lock
Every user of the BKL in the sd driver is the
result of the pushdown from the block layer
into the open/close/ioctl functions.

The only place that used to rely on the BKL is
the sdkp->openers variable, which gets converted
into an atomic_t.

Nothing else seems to rely on the BKL, since the
functions do not touch global data without holding
another lock, and the open/close functions are
still protected from concurrent execution using
the bdev->bd_mutex.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-scsi@vger.kernel.org
Cc: "James E.J. Bottomley" <James.Bottomley@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:26:08 +02:00
Arnd Bergmann
6e9624b8ca block: push down BKL into .open and .release
The open and release block_device_operations are currently
called with the BKL held. In order to change that, we must
first make sure that all drivers that currently rely
on this have no regressions.

This blindly pushes the BKL into all .open and .release
operations for all block drivers to prepare for the
next step. The drivers can subsequently replace the BKL
with their own locks or remove it completely when it can
be shown that it is not needed.

The functions blkdev_get and blkdev_put are the only
remaining users of the big kernel lock in the block
layer, besides a few uses in the ioctl code, none
of which need to serialize with blkdev_{get,put}.

Most of these two functions is also under the protection
of bdev->bd_mutex, including the actual calls to
->open and ->release, and the common code does not
access any global data structures that need the BKL.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:25:34 +02:00
Arnd Bergmann
8a6cfeb6de block: push down BKL into .locked_ioctl
As a preparation for the removal of the big kernel
lock in the block layer, this removes the BKL
from the common ioctl handling code, moving it
into every single driver still using it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:25:00 +02:00
FUJITA Tomonori
610a63498f scsi: fix discard page leak
We leak a page allocated for discard on some error conditions
(e.g. scsi_prep_state_check returns BLKPREP_DEFER in
scsi_setup_blk_pc_cmnd).

We unprep on requests that weren't prepped in the error path of
scsi_init_io. It makes the error path to clean up scsi commands messy.

Let's strictly apply the rule that we can't unprep on a request that
wasn't prepped.

Calling just scsi_put_command() in the error path of scsi_init_io() is
enough. We don't set REQ_DONTPREP yet.

scsi_setup_discard_cmnd can safely free a page on the error case with
the above rule.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:24:28 +02:00
FUJITA Tomonori
82b6d57fb1 scsi: need to reset unprep_rq_fn in sd_remove
This is for block's for-2.6.36.

We need to reset q->unprep_rq_fn in sd_remove. Otherwise we hit kernel
oops if we access to a scsi disk device via sg after removing scsi
disk module.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:24:15 +02:00
FUJITA Tomonori
00fff26539 block: remove q->prepare_flush_fn completely
This removes q->prepare_flush_fn completely (changes the
blk_queue_ordered API).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:24:15 +02:00
FUJITA Tomonori
90467c294a scsi: stop using q->prepare_flush_fn
scsi-ml builds flush requests via q->prepare_flush_fn(), however,
builds discard requests via q->prep_rq_fn.

Using two different mechnisms for the similar requests (building
commands in SCSI ULD) doesn't make sense.

Handing both via q->prep_rq_fn makes the code design simpler.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:23:58 +02:00
FUJITA Tomonori
802447c1c0 scsi: remove unused free discard page in sd_done
- sd_done isn't called for pc request so we never call the code.
- we use sd_unprep to free discard page now.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:23:51 +02:00
FUJITA Tomonori
f1126e950d scsi: add sd_unprep_fn to free discard page
This fixes discard page leak by using q->unprep_rq_fn facility.

q->unprep_rq_fn is called when all the data buffer (req->bio and
scsi_data_buffer) in the request is freed.

sd_unprep() uses rq->buffer to free discard page allocated in
sd_prepare_discard().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:23:49 +02:00
Christoph Hellwig
66ac028019 block: don't allocate a payload for discard request
Allocating a fixed payload for discard requests always was a horrible hack,
and it's not coming to byte us when adding support for discard in DM/MD.

So change the code to leave the allocation of a payload to the lowlevel
driver.  Unfortunately that means we'll need another hack, which allows
us to update the various block layer length fields indicating that we
have a payload.  Instead of hiding this in sd.c, which we already partially
do for UNMAP support add a documented helper in the core block layer for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:23:08 +02:00
Christoph Hellwig
33659ebbae block: remove wrappers for request type/flags
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests.  This allows much easier grepping for different request
types instead of unwinding through macros.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:17:56 +02:00
Alan Stern
478a8a0543 [SCSI] sd: add support for runtime PM
This patch (as1399) adds runtime-PM support to the sd driver.  The
support is unsophisticated: If a SCSI disk device is mounted, or if
its device file is held open, then the device will not be
runtime-suspended; otherwise it will (provided userspace gives
permission by writing "auto" to the sysfs power/control attribute).

In order to make this work, a dev_set_drvdata() call had to be moved
from sd_probe_async() to sd_probe().  Also, a few lines of code were
changed to use a local variable instead of recalculating the address
of an embedded struct device.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-07-28 09:07:50 -05:00
Tejun Heo
72ec24bd77 SCSI: implement sd_unlock_native_capacity()
Implement sd_unlock_native_capacity() method which calls into
hostt->unlock_native_capacity() if implemented.  This will be invoked
by block layer if partitions extend beyond the end of the device and
can be used to implement, for example, on-demand ATA host protected
area unlocking.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-06-02 13:50:04 -04:00
James Bottomley
95bb335c0e [SCSI] Merge scsi-misc-2.6 into scsi-rc-fixes-2.6
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-05-18 10:37:41 -04:00
Hannes Reinecke
c213e1407b [SCSI] Enable retries for SYNCRONIZE_CACHE commands to fix I/O error
Some arrays are giving I/O errors with ext3 filesystems when
SYNCHRONIZE_CACHE gets a UNIT_ATTENTION.  What is happening is that
these commands have no retries, so the UNIT_ATTENTION causes the
barrier to fail.  We should be enable retries here to clear any
transient error and allow the barrier to succeed.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-05-05 12:13:26 -04:00
James Bottomley
3233ac1981 [SCSI] sd: retry read_capacity on UNIT_ATTENTION
Hazard testing uncovered yet another bug in sd. Under heavy reset
activity the retry counter might be exhausted and the command will be
returned with sense UNIT_ATTENTION/0x29/00 (POWER ON, RESET, OR BUS
DEVICE RESET OCCURRED). In those cases we should just increase the
retry counter again, retrying one more to clear up this Unit Attention
state.

[jejb: update to work with RC16 devices and not to loop endlessly]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-04-11 13:36:26 -05:00
Hannes Reinecke
f87146bba5 [SCSI] sd: quiet spurious error messages in READ_CAPACITY(16)
sd always tries to submit a READ_CAPACITY(16) CDB,
regardless whether the host actually supports it.
queuecommand() will then return DID_ABORT, which is
not qualified enough to detect the true cause here.
So better check in sd_try_rc16 first if the cdblen
is supported.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-04-11 13:25:06 -05:00
Linus Torvalds
2f4084209a Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
  cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
  loop: Update mtime when writing using aops
  block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
  backing-dev: Handle class_create() failure
  Block: Fix block/elevator.c elevator_get() off-by-one error
  drbd: lc_element_by_index() never returns NULL
  cciss: unlock on error path
  cfq-iosched: Do not merge queues of BE and IDLE classes
  cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
  i2o: Remove the dangerous kobj_to_i2o_device macro
  block: remove 16 bytes of padding from struct request on 64bits
  cfq-iosched: fix a kbuild regression
  block: make CONFIG_BLK_CGROUP visible
  Remove GENHD_FL_DRIVERFS
  block: Export max number of segments and max segment size in sysfs
  block: Finalize conversion of block limits functions
  block: Fix overrun in lcm() and move it to lib
  vfs: improve writeback_inodes_wb()
  paride: fix off-by-one test
  drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
  ...
2010-04-09 11:50:29 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Jens Axboe
b4b7a4ef09 Merge branch 'master' into for-linus
Conflicts:
	block/Kconfig

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-03-19 08:05:10 +01:00
Linus Torvalds
961cde93de Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (69 commits)
  [SCSI] scsi_transport_fc: Fix synchronization issue while deleting vport
  [SCSI] bfa: Update the driver version to 2.1.2.1.
  [SCSI] bfa: Remove unused header files and did some cleanup.
  [SCSI] bfa: Handle SCSI IO underrun case.
  [SCSI] bfa: FCS and include file changes.
  [SCSI] bfa: Modified the portstats get/clear logic
  [SCSI] bfa: Replace bfa_get_attr() with specific APIs
  [SCSI] bfa: New portlog entries for events (FIP/FLOGI/FDISC/LOGO).
  [SCSI] bfa: Rename pport to fcport in BFA FCS.
  [SCSI] bfa: IOC fixes, check for IOC down condition.
  [SCSI] bfa: In MSIX mode, ignore spurious RME interrupts when FCoE ports are in FW mismatch state.
  [SCSI] bfa: Fix Command Queue (CPE) full condition check and ack CPE interrupt.
  [SCSI] bfa: IOC recovery fix in fcmode.
  [SCSI] bfa: AEN and byte alignment fixes.
  [SCSI] bfa: Introduce a link notification state machine.
  [SCSI] bfa: Added firmware save clear feature for BFA driver.
  [SCSI] bfa: FCS authentication related changes.
  [SCSI] bfa: PCI VPD, FIP and include file changes.
  [SCSI] bfa: Fix to copy fpma MAC when requested by user space application.
  [SCSI] bfa: RPORT state machine: direct attach mode fix.
  ...
2010-03-18 16:54:31 -07:00
NeilBrown
97fedbbe10 Remove GENHD_FL_DRIVERFS
This flag is not used, so best discarded.

Signed-off-by: NeilBrown <neilb@suse.de>
--
Hi Jens,
 I came across this recently - these are the only two occurances
 of "GENHD_FL_DRIVERFS" in the kernel, so it cannot be needed.
NeilBrown
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-03-16 08:55:32 +01:00