Add some tracepoints for ata_qc_issue, ata_qc_complete, and
ata_eh_link_autopsy.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
If NCQ autosense or the sense data reporting feature is enabled
the LBA of the offending command should be stored in the sense
data 'information' field.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
ACS-4 defines a sense data reporting feature set.
This patch implements support for it.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Some newer devices support NCQ autosense (cf ACS-4), so we should
be using it to retrieve the sense code and speed up recovery.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
If READ_LOG_DMA_EXT is supported we should try to use it for
reading the log pages.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
megaraid_sas, ses) plus an assortment of minor updates. There's also an
update to ufs which adds new phy drivers and finally a new logging
infrastructure for SCSI.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJU2Ty5AAoJEDeqqVYsXL0M9rAH/1xNpAxXuxQq+dW5Z+uOaX60
5RRIu7/xA1HEfzkT5FTHrolmogDjVqawu4PZS66iHDeo05RBVUlbTA8qCK+MlRcN
U6s0cLEw59eH3EaCfOGuYp/MnbhuV0eNxe0btmqJIQwuW3+gwZKGJdOq6LS2YasJ
k/DyIBVmkJAVsN56vm9q2vbtcZp+Bg+ngqBS+SC4TF7vV1WCtFmS6yaUf62PYW3D
+Irx37qHZntDR5wdw3dsuKDi5U8bl6myPjaVLnVJqg/WIF9RlCkjk5xpWT99AmVO
NmtYQxLLBlAQ5K+sIlBUwxZe+8q1l+Aj4TTmJHAfFtyfp25s7JR9I6/QtOyC5Kw=
=odol
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
"This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
megaraid_sas, ses) plus an assortment of minor updates.
There's also an update to ufs which adds new phy drivers and finally a
new logging infrastructure for SCSI"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (114 commits)
scsi_logging: return void for dev_printk() functions
scsi: print single-character strings with seq_putc
scsi: merge consecutive seq_puts calls
scsi: replace seq_printf with seq_puts
aha152x: replace seq_printf with seq_puts
advansys: replace seq_printf with seq_puts
scsi: remove SPRINTF macro
sg: remove an unused variable
hpsa: Use local workqueues instead of system workqueues
hpsa: add in P840ar controller model name
hpsa: add in gen9 controller model names
hpsa: detect and report failures changing controller transport modes
hpsa: shorten the wait for the CISS doorbell mode change ack
hpsa: refactor duplicated scan completion code into a new routine
hpsa: move SG descriptor set-up out of hpsa_scatter_gather()
hpsa: do not use function pointers in fast path command submission
hpsa: print CDBs instead of kernel virtual addresses for uncommon errors
hpsa: do not use a void pointer for scsi_cmd field of struct CommandList
hpsa: return failed from device reset/abort handlers
hpsa: check for ctlr lockup after command allocation in main io path
...
We should only try to evaluate the cdb if this is an ATAPI
device, for any other device the 'cdb' field and the cdb_len
has no meaning.
Fixes: cbba5b0ee4
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
libata already uses an internal buffer, so we should be using
__scsi_format_command() here.
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Remove the FIXME comment in atapi_eh_request_sense () asking whether
memset of sense buffer is necessary. The buffer may be partially or
fully filled by the device. We want it to be cleared.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The driver sata_dwc_460ex is using this symbol. To build it as a
module we have to have the symbol exported. This patch adds
EXPORT_SYMBOL_GPL() macro for that.
tj: Updated to use EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as
the only known user is an in-tree driver. Suggested by Sergei.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Add new ATA device type for ZAC devices.
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
libata-eh.c should handle AMNF error condition (error byte bit 0,
usually code 0x01) in libata-eh.c along with UNC as a media error so
SCSI stack can handle it properly (translation code 0x01 is already
present in libata-scsi.c) but was never passed down due to lack of
handling in EH.
While using linux-based machine (AMD 6550M-based notebook, PCI IDs for the
controller are 1022:7801 subsys 1025:059d) and ddrescue to salvage data
from failing hard drive (WD7500BPVT 2.5" 750G SATA2), I've found that pure
AMNF 0x01 error code generates generic "device error" that is retried
several times by SCSI stack instead of "media error" that is passed up to
software.
So we may assume deprecated AMNF error code is surely not dead yet, and
it's better for it to be handled properly. As we may see it is used by
modern enough devices, and used properly: drive returned AMNF only when IDs
for track cannot be read completely due to dying head or positioning,
otherwise it returned UNC(orrectables).
Not handling it causes wrong generic error code ("device error") reporting
down the stack, can damage failing drives further because of excessive
retries, and slows salvaging down a lot. Also, there is handling code in
libata-scsi.c for 0x01 AMNF error already.
https://bugzilla.kernel.org/show_bug.cgi?id=80031
tj: Shortened $SUBJ and moved its content to the first paragraph.
Signed-off-by: Alexey Asemov <alex@alex-at.ru>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun says:
"At least for libata, worrying about suspend/resume failures don't make
whole lot of sense. If suspend failed, just proceed with suspend. If
the device can't be woken up afterwards, that's that. There isn't
anything we could have done differently anyway. The same for resume, if
spinup fails, the device is dud and the following commands will invoke
EH actions and will eventually fail. Again, there really isn't any
*choice* to make. Just making sure the errors are handled gracefully
(ie. don't crash) and the following commands are handled correctly
should be enough."
The only libata user that actually cares about the result from a suspend
operation is libsas. However, it only cares about whether queuing a new
operation collides with an in-flight one. All libsas does with the
error is retry, but we can just let libata wait for the previous
operation before continuing.
Other cleanups include:
1/ Unifying all ata port pm operations on an ata_port_pm_ prefix
2/ Marking all ata port pm helper routines as returning void, only
ata_port_pm_ entry points need to fake a 0 return value.
3/ Killing ata_port_{suspend|resume}_common() in favor of calling
ata_port_request_pm() directly
4/ Killing the wrappers that just do a to_ata_port() conversion
5/ Clearly marking the entry points that do async operations with an
_async suffix.
Reference: http://marc.info/?l=linux-scsi&m=138995409532286&w=2
Cc: Phillip Susi <psusi@ubuntu.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Prompted by the social effort in the US to discourage usage of the
adjective "retarded".
In this case we needlessly anthropomorphize hard drives. The
implication is that due to design deficiencies in the device reset
recovery time is negatively impacted. We can simply clearly state that
fact. "Exceptional devices cause outliers in reset recovery time." This
steers clear of any unintended comparison of such devices to humans with
cognitive disabilities.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Remove an unnecessary arithmetic operation from a call to snprintf, because
the size parameter of snprintf includes the trailing null space.
Also, initialize the buffer on definition instead of a memset call.
Signed-off-by: Levente Kurusa <levex@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.
[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull libata changes from Tejun Heo:
"Nothing too interesting. Only two minor fixes in libata core. Most
changes are specific to hardware which isn't too common"
* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: Add Device IDs for Intel Wildcat Point-LP
sata_rcar: Convert to clk_prepare/unprepare
drivers/libata: Set max sector to 65535 for Slimtype DVD A DS8A9SH drive
libata: Add some missing command descriptions
sata_highbank: clear whole array in highbank_initialize_phys()
ahci: disabled FBS prior to issuing software reset
libata: Fix display of sata speed
ahci: imx: setup power saving methods
ata_piix: minor typo and a printk fix
ahci: Changing two module params with static and __read_mostly
Add some missing command enumerations from the ATA-8 ACS-3 spec into
include/linux/ata.h, and add the corresponding human-readable command
descriptions in libata-eh.c.
Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
libata EH decrements scmd->retries when the command failed for reasons
unrelated to the command itself so that, for example, commands aborted
due to suspend / resume cycle don't get penalized; however,
decrementing scmd->retries isn't enough for ATA passthrough commands.
Without this fix, ATA passthrough commands are not resend to the
drive, and no error is signalled to the caller because:
- allowed retry count is 1
- ata_eh_qc_complete fill the sense data, so result is valid
- sense data is filled with untouched ATA registers.
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Jeff moved on to a greener pasture.
s/Maintained by: Jeff Garzik/Maintained by: Tejun Heo/g
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
We need to do different things for system PM and runtime PM, e.g. we do
not need to enable runtime wake for ZPODD when we are doing system
suspend, etc.
Currently, we use PMSG_SUSPEND for both system suspend and runtime
suspend and PMSG_ON for both system resume and runtime resume. Change
this by using PMSG_AUTO_SUSPEND for runtime suspend and PMSG_AUTO_RESUME
for runtime resume. And since PMSG_ON means no transition, it is changed
to PMSG_RESUME for ata port's system resume.
The ata_acpi_set_state is modified accordingly, and the sata case and
pata case is seperated for easy reading.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
When ata port is runtime suspended, it will check if the ODD attched to
it is a zero power(ZP) capable ODD and if the ZP capable ODD is in zero
power ready state. And if this is not the case, the highest acpi state
will be limited to ACPI_STATE_D3_HOT to avoid powering off the ODD. And
if the ODD can be powered off, runtime wake capability needs to be
enabled and powered_off flag will be set to let resume code knows that
the ODD was in powered off state.
And on resume, before it is powered on, if it was powered off during
suspend, runtime wake capability needs to be disabled. After it is
recovered, the ODD is considered functional, post power on processing
like eject tray if the ODD is drawer type is done, and several ZPODD
related fields will also be reset.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Per the Mount Fuji spec, the ODD is considered zero power ready when:
- For slot type ODD, no media inside;
- For tray type ODD, no media inside and tray closed.
The information can be retrieved by either the returned information of
command GET_EVENT_STATUS_NOTIFICATION(the command is used to poll for
media event) or sense code.
The information provided by the media status byte is not accurate, it
is possible that after a new disc is just inserted, the status byte
still returns media not present. So this information can not be used as
the deciding factor, we use sense code to decide if zpready status is
true.
When we first sensed the ODD in the zero power ready state, the
zp_sampled will be set and timestamp will be recoreded. And after ODD
stayed in this state for some pre-defined period, the ODD is considered
as power off ready and the zp_ready flag will be set. The zp_ready flag
serves as the deciding factor other code will use to see if power off is
OK for the ODD.
The Mount Fuji spec suggests a delay should be used here, to avoid the
case user ejects the ODD and then instantly inserts a new one again, so
that we can avoid a power transition. And some ODDs may be slow to place
its head to the home position after disc is ejected, so a delay here is
generally a good idea. And the delay time can be changed via the module
param zpodd_poweroff_delay.
The zero power ready status check is performed in the ata port's runtime
suspend code path, when port is not frozen yet, as we need to issue some
IOs to the ODD.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
It should be a mistake introduced by commit 8d899e70c1.
qc->flags can't be set AC_ERR_*
Signed-off-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ata_device->dma_mode's initial value is zero, which is not a valid dma
mode, but ata_dma_enabled will return true for this value. This patch
sets dma_mode to 0xff in reset function, so that ata_dma_enabled will
not return true for this case, or it will cause problem for pata_acpi.
The corrsponding bugzilla page is at:
https://bugzilla.kernel.org/show_bug.cgi?id=49151
Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Tested-by: Szymon Janc <szymon@janc.net.pl>
Tested-by: Dutra Julio <dutra.julio@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This is a large set of updates, mostly for drivers (qla2xxx [including support
for new 83xx based card], qla4xxx, mpt2sas, bfa, zfcp, hpsa, be2iscsi, isci,
lpfc, ipr, ibmvfc, ibmvscsi, megaraid_sas). There's also a rework for tape
adding virtually unlimited numbers of tape drives plus a set of dif fixes for
sd and a fix for a live lock on hot remove of SCSI devices.
This round includes a signed tag pull of isci-for-3.6
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAABAgAGBQJQaqCFAAoJEDeqqVYsXL0MKJ4IALg/Obnk0/fNvBUNIrh5zRmj
r9UlXFJnlEDT03qRGdn8okgWMChbgaD1ZrwDTQnjNsabVQoTXI6oO6/uL2c8crpY
BFBwJvkNJS99nbcZv10CpJ3K7ykmRnKlkYon12iknhGwdtU+XJ14Z4PUcZkI9jmg
sBQQ6uNVWyosaONNE+k6o+dw6OTttJkzRX8e9in3thstxNTcG+h9iB1zZ/ETkSEj
tD4MyOgDiPf8kPV2awQThQGpni9Tu3SQr5dEn/iUUktUjiYsDNQuyaAk+QzyhUU7
D35iIJnIHlXTSTMQkrG4qpJHBvqPkWlYJzaOmheQryQ3vzp2C5Ly/hS9il45uIQ=
=49u9
-----END PGP SIGNATURE-----
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
"This is a large set of updates, mostly for drivers (qla2xxx [including
support for new 83xx based card], qla4xxx, mpt2sas, bfa, zfcp, hpsa,
be2iscsi, isci, lpfc, ipr, ibmvfc, ibmvscsi, megaraid_sas).
There's also a rework for tape adding virtually unlimited numbers of
tape drives plus a set of dif fixes for sd and a fix for a live lock
on hot remove of SCSI devices.
This round includes a signed tag pull of isci-for-3.6
Signed-off-by: James Bottomley <JBottomley@Parallels.com>"
Fix up trivial conflict in drivers/scsi/qla2xxx/qla_nx.c due to new PCI
helper function use in a function that was removed by this pull.
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (198 commits)
[SCSI] st: remove st_mutex
[SCSI] sd: Ensure we correctly disable devices with unknown protection type
[SCSI] hpsa: gen8plus Smart Array IDs
[SCSI] qla4xxx: Update driver version to 5.03.00-k1
[SCSI] qla4xxx: Disable generating pause frames for ISP83XX
[SCSI] qla4xxx: Fix double clearing of risc_intr for ISP83XX
[SCSI] qla4xxx: IDC implementation for Loopback
[SCSI] qla4xxx: update copyrights in LICENSE.qla4xxx
[SCSI] qla4xxx: Fix panic while rmmod
[SCSI] qla4xxx: Fail probe_adapter if IRQ allocation fails
[SCSI] qla4xxx: Prevent MSI/MSI-X falling back to INTx for ISP82XX
[SCSI] qla4xxx: Update idc reg in case of PCI AER
[SCSI] qla4xxx: Fix double IDC locking in qla4_8xxx_error_recovery
[SCSI] qla4xxx: Clear interrupt while unloading driver for ISP83XX
[SCSI] qla4xxx: Print correct IDC version
[SCSI] qla4xxx: Added new mbox cmd to pass driver version to FW
[SCSI] scsi_dh_alua: Enable STPG for unavailable ports
[SCSI] scsi_remove_target: fix softlockup regression on hot remove
[SCSI] ibmvscsi: Fix host config length field overflow
[SCSI] ibmvscsi: Remove backend abstraction
...
Device Sleep is a feature as described in AHCI 1.3.1 Technical Proposal.
This feature enables an HBA and SATA storage device to enter the DevSleep
interface state, enabling lower power SATA-based systems.
Aggressive Device Sleep enables the HBA to assert the DEVSLP signal as
soon as there are no commands outstanding to the device and the port
specific Device Sleep idle timer has expired. This enables autonomous
entry into the DevSleep interface state without waiting for software
in power sensitive systems.
This patch enables Aggressive Device Sleep only if both host controller
and device support it.
Tested on AMD reference board together with Device Sleep supported device
sample.
Signed-off-by: Shane Huang <shane.huang@amd.com>
Reviewed-by: Aaron Lu <aaron.lwe@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Hotplug testing with libsas currently encounters a 55 second wait for
link recovery to give up. In the case where the user trusts the
response time of their devices permit the recovery attempts to be
limited to one.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Two bits were appended to the end of the bitfield
list in struct scsi_device. Resolve that conflict
by including both bits.
Conflicts:
include/scsi/scsi_device.h
The function ata_ering_clear_cb is only referenced in this file and
should be marked static to prevent it from being exposed globally.
This quiets the sparse warning:
warning: symbol 'ata_ering_clear_cb' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).
Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time. Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
ATA and SATA drives have had built-in retries for media errors
for as long as they've been commonplace in computers (early 1990s).
When libata stumbles across a bad sector, it can waste minutes
sitting there doing retry after retry before finally giving up
and letting the higher layers deal with it.
This patch removes retries for media errors only.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Commit d902747("[libata] Add ATA transport class") introduced
ATA_EFLAG_OLD_ER to mark entries in the error ring as cleared.
But ata_count_probe_trials_cb() didn't check this flag and it still
counts the old error history. So wrong probe trials count is returned
and it causes problem, for example, SATA link speed is slowed down from
3.0Gbps to 1.5Gbps.
Fix it by checking ATA_EFLAG_OLD_ER in ata_count_probe_trials_cb().
Cc: stable <stable@vger.kernel.org> # 2.6.37+
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata. Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).
Need to add a new transport_sas_phy_reset() since the current
sas_phy_reset() is a utility function to libsas lldds. They are not
prepared for it to loop back into eh.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Reenable sending SRST to devices connected behind a Sil3726 PMP.
This allow staggered spinups and handles drives that spins up slowly.
While the drives spin up, the PMP will not accept SRST.
Most controller reissues the reset until the drive is ready, while
some [Sil3124] returns an error.
In ata_eh_error, wait 10s before reset the ATA port and try again.
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
They were getting this implicitly by an include of module.h
from device.h -- but we are going to clean that up and break
that include chain, so include export.h explicitly now.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
libata EH intentionally left a port frozen if it failed
ata_eh_reset(). The intention was avoiding continuous loop of resets
when the controller or attached device is flaky and reporting spurious
hotplug events. Once port enters this state, it can be recovered with
manual rescan, which seemed reasonable.
However, outside of my convoluted test setup, there have been very few
reports justifying this choice while there have been more cases where
the automatic freezing of the port after hotplug attempt of a faulty
device caused confusion and led to unnecessary resets.
This patch changes the behavior so that the port is thawed after reset
failure. This change doesn't necessarily solve but makes it easier
and more intuitive to work around hotplug related problems
(ie. re-pluggin or power cycling the device) as reported in the
followings.
https://bugzilla.kernel.org/show_bug.cgi?id=34712http://thread.gmane.org/gmane.linux.kernel/1123265/focus=49548
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Reartes Guillermo <rtguille@gmail.com>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
To work around controllers which can't properly plug events while
reset, ata_eh_reset() clears error states and ATA_PFLAG_EH_PENDING
after reset but before RESET is marked done. As reset is the final
recovery action and full verification of devices including onlineness
and classfication match is done afterwards, this shouldn't lead to
lost devices or missed hotplug events.
Unfortunately, it forgot to thaw the port when clearing EH_PENDING, so
if the condition happens after resetting an empty port, the port could
be left frozen and EH will end without thawing it, making the port
unresponsive to further hotplug events.
Thaw if the port is frozen after clearing EH_PENDING. This problem is
reported by Bruce Stenning in the following thread.
http://thread.gmane.org/gmane.linux.kernel/1123265
stable: I think we should weather this patch a bit longer in -rcX
before sending it to -stable. Please wait at least a month
after this patch makes upstream. Thanks.
-v2: Fixed spelling in the comment per Dave Howorth.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Cc: stable@kernel.org
Cc: Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Give users the option of completely powering off unoccupied
SATA ports using the existing min_power link_power_management_policy
option. When the use selects this option on an empty port, we
will power the port off by setting DET to off. For occupied ports,
behavior is unchanged.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used. Implement ATA_FLAG_NO_DIPM and apply it.
This problem was reported by Stefan Bader in the following thread.
http://thread.gmane.org/gmane.linux.ide/48841
stable: applicable to 2.6.37 and 38.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Right at the moment, the libata error handler is incredibly
monolithic. This makes it impossible to use from composite drivers
like libsas and ipr which have to handle error themselves in the first
instance.
The essence of the change is to split the monolithic error handler
into two components: one which handles a queue of ata commands for
processing and the other which handles the back end of readying a
port. This allows the upper error handler fine grained control in
calling libsas functions (and making sure they only get called for ATA
commands whose lower errors have been fixed up).
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The SCSI host eh_cmd_q should be protected by the host lock (not the
port lock). This probably doesn't matter that much at the moment,
since we try to serialise the add and eh pieces, but it might matter
in future for more convenient error handling. Plus this switches
libata to the standard eh pattern where you lock, remove from the cmd
queue to a local list and unlock and then operate on the local list.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ata_eh_analyze_serror() suppresses hotplug notifications if LPM is
being used because LPM generates spurious hotplug events. It compared
whether link->lpm_policy was different from ATA_LPM_MAX_POWER to
determine whether LPM is enabled; however, this is incorrect as for
drivers which don't implement LPM, lpm_policy is always
ATA_LPM_UNKNOWN. This disabled hotplug detection for all drivers
which don't implement LPM.
Fix it by comparing whether lpm_policy is greater than
ATA_LPM_MAX_POWER.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Low level drivers may behave differently depending on the current
link->lpm_policy. During ata_eh_set_lpm(), DIPM enable commands are
issued after the successful completion of ap->ops->set_lpm(), which
means that the controller is already in the target state. This causes
DIPM enable commands to be processed with mismatching controller power
state and link->lpm_policy value.
In ahci, link->lpm_policy is used to ignore certain PHY events if LPM
is enabled; however, as DIPM commands are issued with stale
link->lpm_policy, they sometimes end up triggering these conditions
and get aborted leading to LPM configuration failure.
Fix it by updating link->lpm_policy before issuing DIPM enable
commands.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
In libata, the non-EH code paths should always take and release
ap->lock explicitly when accessing hardware or shared data structures.
However, once EH is active, it's assumed that the port is owned by EH
and EH methods don't explicitly take ap->lock unless race from irq
handler or other code paths are expected. However, libata EH didn't
guarantee exclusion among EHs for ports of the same host. IOW,
multiple EHs may execute in parallel on multiple ports of the same
controller.
In many cases, especially in SATA, the ports are completely
independent of each other and this doesn't cause problems; however,
there are cases where different ports share the same resource, which
lead to obscure timing related bugs such as the one fixed by commit
213373cf (ata_piix: fix locking around SIDPR access).
This patch implements exclusion among EHs of the same host. When EH
begins, it acquires per-host EH ownership by calling ata_eh_acquire().
When EH finishes, the ownership is released by calling
ata_eh_release(). EH ownership is also released whenever the EH
thread goes to sleep from ata_msleep() or explicitly and reacquired
after waking up.
This ensures that while EH is actively accessing the hardware, it has
exclusive access to it while allowing EHs to interleave and progress
in parallel as they hit waiting stages, which dominate the time spent
in EH. This achieves cross-port EH exclusion without pervasive and
fragile changes while still allowing parallel EH for the most part.
This was first reported by yuanding02@gmail.com more than three years
ago in the following bugzilla. :-)
https://bugzilla.kernel.org/show_bug.cgi?id=8223
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reported-by: yuanding02@gmail.com
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>