Commit Graph

45212 Commits

Author SHA1 Message Date
Dan Williams
1a34c06401 [SCSI] libsas: fix port->dev_list locking
port->dev_list maintains a list of devices attached to a given sas root port.
It needs to be mutated under a lock as contexts outside of the
single-threaded-libsas-workqueue access the list via sas_find_dev_by_rphy().
Fixup locations where the list was being mutated without a lock.

This is a follow-up to commit 5911e963 "[SCSI] libsas: remove expander
from dev list on error", where Luben noted [1]:

    > 2/ We have unlocked list manipulations in sas_ex_discover_end_dev(),
    > sas_unregister_common_dev(), and sas_ex_discover_end_dev()

    Yes, I can see that and that is very unfortunate.

[1]: http://marc.info/?l=linux-scsi&m=131480962006471&w=2

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-16 10:54:02 -05:00
Bhanu Prakash Gollapudi
814740d5f6 [SCSI] fcoe,libfcoe: Move common code for fcoe_get_lesb to fcoe_transport
Except for obtaining the netdev from lport, fcoe_get_lesb is the common code
for the LLDs.

Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Acked-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-16 10:38:01 -05:00
Dan Williams
ac013ed1cb [SCSI] isci: export phy events via ->lldd_control_phy()
Allow the sas-transport-class to update events for local phys via a new
PHY_FUNC_GET_EVENTS command to ->lldd_control_phy().  Fixup drivers that
are not prepared for new enum phy_func values, and unify
->lldd_control_phy() error codes.

These are the SAS defined phy events that are reported in a
smp-report-phy-error-log command:
 * /sys/class/sas_phy/<phyX>/invalid_dword_count
 * /sys/class/sas_phy/<phyX>/running_disparity_error_count
 * /sys/class/sas_phy/<phyX>/loss_of_dword_sync_count
 * /sys/class/sas_phy/<phyX>/phy_reset_problem_count

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 13:24:26 -05:00
Dan Williams
b50102d3e9 [SCSI] isci: atapi support
Based on original implementation from Jiangbi Liu and Maciej Trela.

ATAPI transfers happen in two-to-three stages.  The two stage atapi
commands are those that include a dma data transfer.  The data transfer
portion of these operations is handled by the hardware packet-dma
acceleration.  The three-stage commands do not have a data transfer and
are handled without hardware assistance in raw frame mode.

stage1: transmit host-to-device fis to notify the device of an incoming
atapi cdb.  Upon reception of the pio-setup-fis repost the task_context
to perform the dma transfer of the cdb+data (go to stage3), or repost
the task_context to transmit the cdb as a raw frame (go to stage 2).

stage2: wait for hardware notification of the cdb transmission and then
go to stage 3.

stage3: wait for the arrival of the terminating device-to-host fis and
terminate the command.

To keep the implementation simple we only support ATAPI packet-dma
protocol (for commands with data) to avoid needing to handle the data
transfer manually (like we do for SATA-PIO).  This may affect
compatibility for a small number of devices (see
ATA_HORKAGE_ATAPI_MOD16_DMA).

If the data-transfer underruns, or encounters an error the
device-to-host fis is expected to arrive in the unsolicited frame queue
to pass to libata for disposition.  However, in the DONE_UNEXP_FIS (data
underrun) case it appears we need to craft a response.  In the
DONE_REG_ERR case we do receive the UF and propagate it to libsas.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 13:20:03 -05:00
Vasu Dev
49a198898e [SCSI] libfc: cache align struct fc_exch fields
cache aligned xid and ex_lock beside
removing holes.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:56:26 -05:00
Vasu Dev
ed26cfece6 [SCSI] libfc: cache align struct fc_fcp_pkt fields
Re-arrange its fields to avoid padding and have better
cacheline alignments.

Removed not used start_time, end_time and last_pkt_time
fields.

This all reduced this struct size to 448 from 480 and
that also reduced one cacheline on x86_64 beside
eliminating 8 pads. However kept logical fields together.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:55:07 -05:00
Dan Williams
05a2a17317 [SCSI] libsas: fix warnings when checking sata/stp protocol
Several sas drivers legitimately check the protocol against the union of
SAS_PROTOCOL_SATA and SAS_PROTOCOL_STP.  Provide a SAS_PROTOCOL_STP_ALL
to silence warnings like:

drivers/scsi/pm8001/pm8001_sas.c:438:3: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]
drivers/scsi/mvsas/mv_sas.c:798:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]
drivers/scsi/mvsas/mv_sas.c:1783:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]
drivers/scsi/mvsas/mv_sas.c:1886:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]
drivers/scsi/isci/request.c:3565:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:51:53 -05:00
Dan Williams
d962480e9a [SCSI] libsas: fix try_test_sas_gpio_gp_bit() build error
If the user has disabled CONFIG_SCSI_SAS_HOST_SMP then libsas drivers
will not be receiving smp-gpio frames and do not need this lookup code.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Tested-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:47:40 -05:00
Dan Williams
f6e67035a9 [SCSI] libsas,libata: fix ->change_queue_{depth|type} for sata devices
Pass queue_depth change requests to libata, and prevent queue_type
changes for ATA devices.

Otherwise:
1/ we do not honor the libata specific restrictions on the queue depth
2/ libsas drivers that do not set sdev->tagged_supported are unable to
   change the queue_depth of ata devices via sysfs

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:30:30 -05:00
Luben Tuikov
ffaac8f45b [SCSI] libsas: Allow expander T-T attachments
Allow expander table-to-table attachments for
expanders that support it.

Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:23:11 -05:00
Dan Williams
8ec6552f4a [SCSI] libsas: sgpio write support
Add SFF-8485 v0.7 / SAS-1 smp-write-gpio register support to libsas.
Defer SAS-2 support unless/until it defines an sgpio interface.

Minimum implementation needed to get the lights blinking.
try_test_sas_gpio_gp_bit() provides a common method to parse the
incoming write data (raw bitstream), and the to_sas_gpio_gp_bit() helper
routine can be used as a basis for the set/clear operations for the
'read' implementation.  Host implementations parse as many bits
(ODx.[012]) as are locally supported and report the number of registers
successfully written.  If the submitted data overruns the internal
number of registers available report the write as a success with the
number of bytes remaining reported in ->resid_len.

Example (assuming an active backplane) set the "identify" pattern for
the first 21 devices:

smp_write_gpio --count=2 --data=92,49,24,92,24,92,49,24 -t 4 --index=1 /dev/bsg/sas_hostX

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-09-22 14:59:09 +04:00
Hannes Reinecke
6c3633d08a [SCSI] scsi_dh: Implement match callback function
Some device handler types are not tied to the vendor/model
but rather to a specific capability. Eg ALUA is supported
if the 'TPGS' setting in the standard inquiry is set.
This patch implements a 'match' callback for device handler
which supersedes the original vendor/model lookup and
implements the callback for the ALUA handler.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-30 12:28:30 -07:00
Hannes Reinecke
d7c48feb38 [SCSI] scsi_dh_alua: Evaluate TPGS setting from inquiry data
Instead of issuing a standard inquiry from within the
alua device handler we can evaluate the TPGS setting from
the existing inquiry data of the sdev and save us the I/O.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-30 12:26:49 -07:00
Nao Nishijima
a72c5e5eb7 [SCSI] genhd: add a new attribute "alias" in gendisk
This patch allows the user to set an "alias" of the disk via sysfs interface.

This patch only adds a new attribute "alias" in gendisk structure.
To show the alias instead of the device name in kernel messages,
we need to revise printk messages and use alias_name() in them.

Example:
(current) printk("disk name is %s\n", disk->disk_name);
(new)     printk("disk name is %s\n", alias_name(disk));

Users can use alphabets, numbers, '-' and '_' in "alias" attribute. A disk can
have an "alias" which length is up to 255 bytes. This attribute is write-once.

Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Suggested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Nao Nishijima <nao.nishijima.xt@hitachi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-29 00:16:19 -07:00
Mike Christie
76e4e12ff2 [SCSI] scsi scan: don't fail scans when host is in recovery
The problem is that if we are doing a scsi scan then the device goes
into recovery then we will wait for the recovery to complete. It waits
because scsi-ml will send inquiries or report luns and the queueing code
will have been blocked due to the host not being ready. However, if we
are in recovery and then a scan is started the scan will silently fail
and some devices will not be added.

It is easy to hit the problem where devices do not show up with
FC where we are doing tests that disrupt the target controllers.
When the controller is disruprted (reboot, or setting firmware, etc),
and we cause the dev loss tmo to fire then devices will be removed
Then when the problem has been fixed, the rport will be scanned and
devices should be added back. But if we cause another disruption before
scanning has started then devices will not get added back. If the problem
is not started until the scan is started then the devices will be added
back.

This patch fixes that problem by not failing scans when the host
is in recovery. We will let scsi-ml send the IO and let the queueing
and scsi error handling deal with it like is done if we went into
recovery while scanning.

For recovery cases where the host is being torn down then with the
patch we will still fail the scan since there is not point in scanning.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-29 00:16:19 -07:00
Vikas Chaudhary
2944369144 [SCSI] scsi: Added support for adapter and firmware reset
Added new sysfs attr 'host_reset' in scsi_sysfs.c to
perform adapter or firmware reset as suggested by
Mike Christie here:
http://marc.info/?l=linux-scsi&m=127359347111167&w=2

user/application can write "adapter" or "firmware" on
this attr and it will call newly added function hook
in scsi_host_template to call LDD adapter or firmware
reset implementation.

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:46 -06:00
Vikas Chaudhary
fcb5124e03 [SCSI] scsi_transport_iscsi: Added support to update initiator iscsi port
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:37 -06:00
Vikas Chaudhary
8c7d40fb6b [SCSI] scsi_transport_iscsi: Added support to update mtu
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:28 -06:00
Manish Rangankar
17fa575eec [SCSI] scsi_transport_iscsi: Add conn login, kernel to user, event to support offload session login.
Offload drivers like qla4xxx will offload the sending of the login/logout
pdus still, so this patch adds iscsi_conn_login_event which is
used by these types of drivers to notify userspace that the connection
has changed state.

It also adds a iscsi_is_session_online helper so the lld
can query the sessions state field.

Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com>
Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:23 -06:00
Mike Christie
90eeb01a03 [SCSI] iscsi class: add bsg support to iscsi class
This patch adds bsg support to the iscsi class. There is only
1 request, the host vendor one, supported. It is expected that
this would be used for things like flash updates.

This patch is made over this one
http://marc.info/?l=linux-scsi&m=131149780020992&w=2

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:21 -06:00
Mike Christie
4223b9e919 [SCSI] iscsi class: expand vlan support
Add support to set vlan priority and enable/disble a vlan.

Patch based on code from Vikas Chaudhary.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:18 -06:00
Mike Christie
f27fb2ef7b [SCSI] iscsi class: sysfs group is_visible callout for iscsi host attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's host attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:14 -06:00
Mike Christie
b78dbba005 [SCSI] iscsi class: remove iface param mask
We can replace the iface param mask with the
attr_is_visible callback.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:12 -06:00
Mike Christie
1d063c1729 [SCSI] iscsi class: sysfs group is_visible callout for session attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's session attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:06 -06:00
Mike Christie
3128c6c73c [SCSI] iscsi cls: sysfs group is_visible callout for conn attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and drivers to use the attribute container
sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:03 -06:00
Mike Christie
8d07913dbe [SCSI] iscsi class: add iface representation
A iscsi host can have multiple interfaces. This patch
adds a new iface iscsi class for this. It exports the
network settings now, and will be extended to also
export iscsi initiator port settings like the isid
and initiator name for drivers that can support multiple
initiator ports.

Based on patch from Lalit Chandivade.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:00 -06:00
Mike Christie
56c155b5ca [SCSI] iscsi_transport: add support for net settings
Allows user space (iscsiadm) to send down network configuration
parameters for LLD to set private network configuration on the iSCSI
adapters.

Based on patch from Lalit Chandivade.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:35:56 -06:00
Bhanu Prakash Gollapudi
d834895c41 [SCSI] fcoe: Move common functions to fcoe_transport library
Export fcoe_get_wwn, fcoe_validate_vport_create and fcoe_wwn_to_str so that all
LLDs can use these common function.

Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:35:46 -06:00
Dan Williams
4fcf812ca3 [SCSI] libsas: export sas_alloc_task()
Now that isci has added a 3rd open coded user of this functionality just
share the libsas version.

Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:35:13 -06:00
Linus Torvalds
5ccc38740a Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
  Revert "cfq: Remove special treatment for metadata rqs."
  block: fix flush machinery for stacking drivers with differring flush flags
  block: improve rq_affinity placement
  blktrace: add FLUSH/FUA support
  Move some REQ flags to the common bio/request area
  allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
  xen/blkback: Make description more obvious.
  cfq-iosched: Add documentation about idling
  block: Make rq_affinity = 1 work as expected
  block: swim3: fix unterminated of_device_id table
  block/genhd.c: remove useless cast in diskstats_show()
  drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
  drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
  bsg-lib: add module.h include
  cfq-iosched: Reduce linked group count upon group destruction
  blk-throttle: correctly determine sync bio
  loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
  loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
  loop: add management interface for on-demand device allocation
  loop: replace linked list of allocated devices with an idr index
  ...
2011-08-19 10:47:07 -07:00
Linus Torvalds
0c3bef6128 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: OF: Don't crash when bridge parent is NULL.
  PCI: export pcie_bus_configure_settings symbol
  PCI: code and comments cleanup
  PCI: make cardbus-bridge resources optional
  PCI: make SRIOV resources optional
  PCI : ability to relocate assigned pci-resources
  PCI: honor child buses add_size in hot plug configuration
  PCI: Set PCI-E Max Payload Size on fabric
2011-08-19 10:02:37 -07:00
Linus Torvalds
b4fd4ae6c6 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
2011-08-17 13:15:25 -07:00
Ian Campbell
aa462abe8a mm: fix __page_to_pfn for a const struct page argument
This allows the cast in lowmem_page_address (introduced as a warning
fixup to 33dd4e0ec9 "mm: make some struct page's const") to be
removed.

Propagate const'ness to page_to_section() as well since it is required
by __page_to_pfn.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:00:20 -07:00
Ian Campbell
f991879473 mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const.
Followup to 33dd4e0ec9 "mm: make some struct page's const" which missed the
HASHED_PAGE_VIRTUAL case.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:00:20 -07:00
Linus Torvalds
6cac952960 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rtc: Limit RTC PIE frequency
  rtc: Fix hrtimer deadlock
  rtc: Handle errors correctly in rtc_irq_set_state()

Fixup trivial conflicts in drivers/rtc/interface.c due to slightly
trivially versions of the same patch coming in two different ways.
2011-08-17 10:28:33 -07:00
Linus Torvalds
950d0a10d1 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq: Track the owner of irq descriptor
  irq: Always set IRQF_ONESHOT if no primary handler is specified
  genirq: Fix wrong bit operation
2011-08-17 10:23:50 -07:00
Jeff Moyer
4853abaae7 block: fix flush machinery for stacking drivers with differring flush flags
Commit ae1b153962, block: reimplement
FLUSH/FUA to support merge, introduced a performance regression when
running any sort of fsyncing workload using dm-multipath and certain
storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
that dm-multipath always advertised flush+fua support, and passed
commands on down the stack, where those flags used to get stripped off.
The above commit changed that behavior:

static inline struct request *__elv_next_request(struct request_queue *q)
{
        struct request *rq;

        while (1) {
-               while (!list_empty(&q->queue_head)) {
+               if (!list_empty(&q->queue_head)) {
                        rq = list_entry_rq(q->queue_head.next);
-                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
-                           (rq->cmd_flags & REQ_FLUSH_SEQ))
-                               return rq;
-                       rq = blk_do_flush(q, rq);
-                       if (rq)
-                               return rq;
+                       return rq;
                }

Note that previously, a command would come in here, have
REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

struct request *blk_do_flush(struct request_queue *q, struct request *rq)
{
        unsigned int fflags = q->flush_flags; /* may change, cache it */
        bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
        bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
        bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
        REQ_FUA);
        unsigned skip = 0;
...
        if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                rq->cmd_flags &= ~REQ_FLUSH;
		if (!has_fua)
			rq->cmd_flags &= ~REQ_FUA;
	        return rq;
	}

So, the flush machinery was bypassed in such cases (q->flush_flags == 0
&& rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

Now, however, we don't get into the flush machinery at all.  Instead,
__elv_next_request just hands a request with flush and fua bits set to
the scsi_request_fn, even if the underlying request_queue does not
support flush or fua.

The agreed upon approach is to fix the flush machinery to allow
stacking.  While this isn't used in practice (since there is only one
request-based dm target, and that target will now reflect the flush
flags of the underlying device), it does future-proof the solution, and
make it function as designed.

In order to make this work, I had to add a field to the struct request,
inside the flush structure (to store the original req->end_io).  Shaohua
had suggested overloading the union with rb_node and completion_data,
but the completion data is used by device mapper and can also be used by
other drivers.  So, I didn't see a way around the additional field.

I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
the lost performance.  Comments and other testers, as always, are
appreciated.

Cheers,
Jeff

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-15 21:37:25 +02:00
Linus Torvalds
97c24d1d45 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: remove unused "ddr" parameter in struct mmc_ios
  mmc: dw_mmc: Fix DDR mode support.
  mmc: core: use defined R1_STATE_PRG macro for card status
  mmc: sdhci: use f_max instead of host->clock for timeouts
  mmc: sdhci: move timeout_clk calculation farther down
  mmc: sdhci: check host->clock before using it as a denominator
  mmc: Revert "mmc: sdhci: Fix SDHCI_QUIRK_TIMEOUT_USES_SDCLK"
  mmc: tmio: eliminate unused variable 'mmc' warning
  mmc: esdhc-imx: fix card interrupt loss on freescale eSDHC
  mmc: sdhci-s3c: Fix build for header change
  mmc: dw_mmc: Fix mask in IDMAC_SET_BUFFER1_SIZE macro
  mmc: cb710: fix possible pci_dev leak in cb710_pci_configure()
  mmc: core: Detect eMMC v4.5 ext_csd entries
  mmc: mmc_test: avoid stalled file in debugfs
  mmc: sdhci-s3c: add BROKEN_ADMA_ZEROLEN_DESC quirk
  mmc: sdhci: pxav3: controller needs 32 bit ADMA addressing
  mmc: sdhci: fix retuning timer wrongly deleted in sdhci_tasklet_finish
2011-08-14 12:28:15 -07:00
Rafael J. Wysocki
17f2ae7f67 PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
Function genpd_queue_power_off_work() is not defined for
CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build
error to happen in that case.  Fix the problem by making
pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-14 13:34:31 +02:00
Linus Torvalds
17987783e5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ASoC: Fix compile warning in wm8750.c
  ASoC: omap: Update e-mail address of Jarkko Nikula
  ASoC: SAMSUNG: Add I2S0 internal dma driver
  ASoC: Terminate WM8750 SPI device ID table
  ASoC: Add missing break in WM8994 probe
  ALSA: snd-usb-caiaq: Correct offset fields of outbound iso_frame_desc
  ALSA: azt3328 - adjust error handling code to include debugging code
  ALSA: hda - Add CONFIG_SND_HDA_POWER_SAVE to stac_vrefout_set()
  ALSA: usb-audio - Add quirk for BOSS Micro BR-80
  ASoC: Fix typo in wm8750 spi_ids
  ASoC: Fix warning in Speyside WM8962
  ASoC: Fix SPI driver binding for WM8987
  ASoC: Fix binding of WM8750 on Jive
  ASoC: WM8903: Free IRQ on device removal
  ASoC: Tegra: wm8903 machine driver: Allow re-insertion of module
  ASoC: Tegra: tegra_pcm_deallocate_dma_buffer: Don't OOPS
2011-08-13 18:36:28 -07:00
Jaehoon Chung
7fd781e8f9 mmc: remove unused "ddr" parameter in struct mmc_ios
"mmc: dw_mmc: Fix DDR mode support" removed the last user.

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2011-08-13 14:50:32 -04:00
Linus Torvalds
73e0881d31 Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  dt: add empty of_get_property for non-dt
2011-08-12 21:59:09 -07:00
Linus Torvalds
ce8a84ef1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  e1000e: increase driver version number
  e1000e: alternate MAC address update
  e1000e: do not disable receiver on 82574/82583
  e1000e: alternate MAC address does not work on device id 0x1060
  PCnet: Fix section mismatch
  bnx2x: disable dcb on 578xx since not supported yet
  bnx2x: properly clean indirect addresses
  bnx2x: prevent race between undi_unload and load flows
  bnx2x: fix select_queue when FCoE is disabled
  bnx2x: init FCOE FP only once
  ipv4: some rt_iif -> rt_route_iif conversions
  net/bridge/netfilter/ebtables.c: use available error handling code
  net/netlabel/netlabel_kapi.c: add missing cleanup code
  net/irda: sh_sir: tidyup compile warning
  net/irda: sh_sir: add missing header
  net/irda: sh_irda: add missing header
  slcan: ldisc generated skbs are received in softirq context
  scm: Capture the full credentials of the scm sender
  tcp: initialize variable ecn_ok in syncookies path
  drivers/net/wireless/wl1251: add missing kfree
  ...
2011-08-12 06:43:53 -07:00
Jarkko Nikula
7ec41ee5ad ASoC: omap: Update e-mail address of Jarkko Nikula
My gmail account got disabled and I'm not going to reopen it.

Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
Acked-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-08-12 11:45:10 +09:00
Vasiliy Kulikov
72fa59970f move RLIMIT_NPROC check from set_user() to do_execve_common()
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only.  But the check created new security threat: many poorly
written programs simply don't check setuid() return code and believe it
cannot fail if executed with root privileges.  So, the check is removed
in this patch because of too often privilege escalations related to
buggy programs.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes.  Most of daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested to re-check whether NPROC limit is still
exceeded at the moment of execve().  If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter step down
under the limit, the defered execve() failure because NPROC limit was
exceeded days ago would be unexpected.  If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.

Similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 11:24:42 -07:00
Namhyung Kim
c09c47caed blktrace: add FLUSH/FUA support
Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
FUA follows WRITE, use the same 'F' flag for both cases and
distinguish them by their (relative) position. The end results
look like (other flags might be shown also):

 - WRITE:            W
 - WRITE_FLUSH:      FW
 - WRITE_FUA:        WF
 - WRITE_FLUSH_FUA:  FWF

Note that we reuse TC_BARRIER due to lack of bit space of act_mask
so that the older versions of blktrace tools will report flush
requests as barriers from now on.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-11 10:36:05 +02:00
Matthew Wilcox
8e4bf84474 Move some REQ flags to the common bio/request area
REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as
on a request, so relocate them to the shared part of the enum.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-11 10:36:03 +02:00
Stephen Warren
89272b8c0d dt: add empty of_get_property for non-dt
The patch adds empty function of_get_property for non-dt build, so that
drivers migrating to dt can save some '#ifdef CONFIG_OF'.

This also fixes the current Tegra compile problem in linux-next.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-08-09 11:27:16 -06:00
Linus Torvalds
6bb615bc39 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  sound: pss - don't use the deprecated function check_region
  ALSA: timer - Add NULL-check for invalid slave timer
  ALSA: timer - Fix Oops at closing slave timer
  ASoC: Acknowledge WM8996 interrupts before acting on them
  ASoC: Rename WM8915 to WM8996
  ALSA: Fix dependency of CONFIG_SND_TEA575X
  ALSA: asihpi - use kzalloc()
  ALSA: snd-usb-caiaq: Fix keymap for RigKontrol3
  ALSA: snd-usb: Fix uninitialized variable usage
  ALSA: hda - Fix a complile warning in patch_via.c
  ALSA: hdspm - Fix uninitialized compile warnings
  ALSA: usb-audio - add quirk for Keith McMillen StringPort
  ALSA: snd-usb: operate on given mixer interface only
  ALSA: snd-usb: avoid dividing by zero on invalid input
  ALSA: snd-usb: Accept UAC2 FORMAT_TYPE descriptors with bLength > 6
  sound: oss/pas2: Remove CLOCK_TICK_RATE dependency from PAS16 driver
  ALSA: hda - Use auto-parser for ASUS UX50, Eee PC P901, S101 and P1005
  ALSA: hda - Fix digital-mic mono recording on ASUS Eee PC
  ASoC: sgtl5000: fix cache handling
  ASoC: Disable wm_hubs periodic DC servo update
2011-08-09 08:41:36 -07:00
Peter Zijlstra
5c723ba5b7 mm: Fix fixup_user_fault() for MMU=n
In commit 2efaca927f ("mm/futex: fix futex writes on archs with SW
tracking of dirty & young") we forgot about MMU=n.  This patch fixes
that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/1311761831.24752.413.camel@twins
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08 12:11:02 -07:00