linux_dsm_epyc7002/drivers
Mauricio Faria de Oliveira 47796938c4 Revert "dm mpath: fix stalls when handling invalid ioctls"
This reverts commit a1989b3300.

That commit introduced a regression at least for the case of the SG_IO ioctl()
running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
than blocking due to queue_if_no_path until a path becomes active, for example.

That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
(qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
from multipath devices; which leads to SCSI/filesystem errors in such a guest.

More general scenarios can hit that regression too. The following demonstration
employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
(some output & user changes omitted for brevity and comments added for clarity).

Reverting that commit restores normal operation (queueing) in failing scenarios;
tested on linux-next (next-20151022).

1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)

    $ cat sg_simple0.c
    ... see [3] ...
    $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
    $ gcc sgio_inquiry.c -o sgio_inquiry

2) The ioctl() works fine with active paths present.

    # multipath -l 85ag56
    85ag56 (...) dm-19 IBM     ,2145
    size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=active
    | |- 8:0:11:0  sdz  65:144  active undef running
    | `- 9:0:9:0   sdbf 67:144  active undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      |- 8:0:12:0  sdae 65:224  active undef running
      `- 9:0:12:0  sdbo 68:32   active undef running

    $ ./sgio_inquiry /dev/mapper/85ag56
    Some of the INQUIRY command's response:
        IBM       2145              0000
    INQUIRY duration=0 millisecs, resid=0

3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
   for unprivileged users (rather than blocking due to queue_if_no_path).

    # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
          do multipathd -k"fail path $path"; done

    # multipath -l 85ag56
    85ag56 (...) dm-19 IBM     ,2145
    size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=enabled
    | |- 8:0:11:0  sdz  65:144  failed undef running
    | `- 9:0:9:0   sdbf 67:144  failed undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      |- 8:0:12:0  sdae 65:224  failed undef running
      `- 9:0:12:0  sdbo 68:32   failed undef running

    $ ./sgio_inquiry /dev/mapper/85ag56
    sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device

4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
   it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().

    $ dmesg
    <...>
    [] device-mapper: multipath: Failing path 65:144.
    [] device-mapper: multipath: Failing path 67:144.
    [] device-mapper: multipath: Failing path 65:224.
    [] device-mapper: multipath: Failing path 68:32.
    [] sgio_inquiry: sending ioctl 2285 to a partition!

5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
   (then queueing happens -- in this example, queue_if_no_path is set);
   this is due to a conditional check in scsi_verify_blk_ioctl().

    # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
    sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device

    # ./sgio_inquiry /dev/mapper/85ag56 &
    [1] 72830

    # cat /proc/72830/stack
    [<c00000171c0df700>] 0xc00000171c0df700
    [<c000000000015934>] __switch_to+0x204/0x350
    [<c000000000152d4c>] msleep+0x5c/0x80
    [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
    [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
    [<c0000000003128e4>] block_ioctl+0x64/0xd0
    [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
    [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
    [<c000000000009358>] system_call+0x38/0xd0

6) This is the function call chain exercised in this analysis:

SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
    -> do_vfs_ioctl()
        -> vfs_ioctl()
            ...
            error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
            ...
                -> dm_blk_ioctl() @ drivers/md/dm.c
                    -> multipath_ioctl() @ drivers/md/dm-mpath.c
                        ...
                        (bdev = NULL, due to no active paths)
                        ...
                        if (!bdev || <...>) {
                            int err = scsi_verify_blk_ioctl(NULL, cmd);
                            if (err)
                                r = err;
                        }
                        ...
                            -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                ...
                                if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                    return 0;
                                ...
                                if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                    return 0;
                                ...
                                printk_ratelimited(KERN_WARNING
                                           "%s: sending ioctl %x to a partition!\n" <...>);

                                return -ENOIOCTLCMD;
                            <-
                        ...
                        return r ? : <...>
                    <-
            ...
            if (error == -ENOIOCTLCMD)
                error = -ENOTTY;
             out:
                return error;
            ...

Links:
[1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
[2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
[3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-10-31 18:53:51 -04:00
..
accessibility
acpi Merge branch 'acpi-ec' 2015-10-01 22:30:35 +02:00
amba
android mm: mark most vm_operations_struct const 2015-09-10 13:29:01 -07:00
ata Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2015-09-03 16:55:55 -07:00
atm solos-pci: Increase headroom on received packets 2015-09-17 21:29:07 -07:00
auxdisplay
base Merge branches 'pm-cpuidle', 'pm-opp' and 'pm-tools' 2015-10-01 22:30:47 +02:00
bcma
block nvme: move to a new drivers/nvme/host directory 2015-10-09 10:40:37 -06:00
bluetooth Bluetooth: hci_bcm: Fix crash on suspend 2015-08-28 21:09:14 +02:00
bus regmap: Changes for v4.3 2015-09-08 16:48:55 -07:00
cdrom cdrom: Random writing support for BD-RE media 2015-09-25 09:07:57 -06:00
char Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2015-09-26 21:05:23 -04:00
clk A few driver fixes for tegra, rockchip, and st SoCs and a two-liner 2015-09-19 20:17:40 -07:00
clocksource clocksource/drivers/keystone: Fix bad NO_IRQ usage 2015-09-29 14:33:51 +02:00
connector
cpufreq cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get() 2015-09-16 02:17:49 +02:00
cpuidle Additional power management and ACPI material for v4.3-rc1 2015-09-11 19:11:06 -07:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2015-09-26 21:05:23 -04:00
dca
devfreq PM / devfreq: Fix incorrect type issue. 2015-09-11 14:23:30 +09:00
dio
dma dmaengine fixes for 4.3-rc4 2015-10-02 14:46:15 -04:00
dma-buf
edac edac updates for v4.3-rc1 2015-09-11 16:21:12 -07:00
eisa
extcon extcon: Fix attached value returned by is_extcon_changed 2015-09-21 15:07:19 +09:00
firewire
firmware arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions 2015-10-01 12:51:28 +02:00
fmc
gpio Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-18 08:11:42 -07:00
gpu drm/dp/mst: add some defines for logical/physical ports 2015-10-02 15:34:42 +10:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-09-04 12:02:11 -07:00
hsi mm: mark most vm_operations_struct const 2015-09-10 13:29:01 -07:00
hv Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc 2015-09-20 22:44:51 -07:00
hwmon hwmon: (pwm-fan) Fix module autoload for OF platform driver 2015-09-20 17:50:19 -07:00
hwspinlock
hwtracing/coresight
i2c Merge branch 'i2c/for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2015-09-08 16:16:26 -07:00
ide
idle intel_idle: Skylake Client Support - updated 2015-09-10 14:03:44 -04:00
iio This is the bulk of GPIO changes for the v4.3 kernel cycle: 2015-09-04 10:07:45 -07:00
infiniband Changes for 4.3-rc4 2015-10-01 16:38:52 -04:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-10-02 17:53:25 -04:00
iommu Merge git://git.infradead.org/intel-iommu 2015-10-02 07:59:29 -04:00
ipack
irqchip Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-10-04 11:40:09 +01:00
isdn libnvdimm for 4.3: 2015-09-08 14:35:59 -07:00
leds leds:lp55xx: Correct Kconfig dependency for f/w user helper 2015-09-17 10:02:20 +02:00
lguest
macintosh powerpc updates for 4.3 2015-09-03 16:41:38 -07:00
mailbox Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration 2015-09-05 18:11:04 -07:00
mcb
md Revert "dm mpath: fix stalls when handling invalid ioctls" 2015-10-31 18:53:51 -04:00
media media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
memory IOMMU Updates for Linux v4.3 2015-09-08 17:22:35 -07:00
memstick
message mptfusion: prevent some memory corruption 2015-08-26 07:11:45 -07:00
mfd genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
misc Char/Misc driver fixes for 4.3-rc3 2015-09-26 20:53:15 -04:00
mmc mmc: core: fix dead loop of mmc_retune 2015-09-30 14:54:22 +02:00
mtd UBI: return ENOSPC if no enough space available 2015-09-29 12:47:05 +02:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-01 21:55:35 -04:00
nfc This is the bulk of GPIO changes for the v4.3 kernel cycle: 2015-09-04 10:07:45 -07:00
ntb NTB: Fix range check on memory window index 2015-09-07 15:27:12 -04:00
nubus
nvdimm pmem: add proper fencing to pmem_rw_page() 2015-09-17 11:49:28 -04:00
nvme nvme: add missing endianess annotations in nvme_pr_command 2015-10-22 09:15:48 -06:00
nvmem
of Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-09-26 06:01:33 -04:00
oprofile
parisc PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code" 2015-09-15 13:18:04 -05:00
parport
pci Merge branches 'pm-pci' and 'acpi-pci' 2015-10-01 22:30:12 +02:00
pcmcia pcmcia: soc_common: remove skt_dev_info's clk pointer 2015-09-03 16:01:03 +01:00
perf
phy This is the bulk of GPIO changes for the v4.3 kernel cycle: 2015-09-04 10:07:45 -07:00
pinctrl Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-18 08:11:42 -07:00
platform platform-drivers-x86 for 4.3-2 2015-09-17 21:41:02 -07:00
pnp
power power supply and reset fixes for the v4.3 series 2015-09-17 12:25:42 -07:00
powercap powercap / RAPL: disable the 2nd power limit properly 2015-08-29 01:46:40 +02:00
pps
ps3
ptp
pwm pwm: Changes for v4.3-rc1 2015-09-09 10:55:32 -07:00
rapidio
ras
regulator Merge commit 'b8c93646fd5c' into omap-for-v4.3/fixes 2015-09-24 16:23:20 -07:00
remoteproc
reset reset: ath79: Fix missing spin_lock_init 2015-09-01 14:48:40 +02:00
rpmsg
rtc rtc: abx80x: fix RTC write bit 2015-09-05 19:37:31 +02:00
s390 virtio: fixes on top of 4.3-rc1 2015-09-18 09:28:20 -07:00
sbus
scsi sd: implement the Persistent Reservation API 2015-10-21 14:46:58 -06:00
sfi
sh SH Drivers Updates for v4.3 2015-09-21 12:02:27 -07:00
sn
soc genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
spi Merge remote-tracking branches 'spi/fix/spidev' and 'spi/fix/xtfpga' into spi-linus 2015-09-22 09:48:41 -07:00
spmi genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
ssb
staging Staging driver fixes for 4.3-rc3 2015-09-26 20:56:50 -04:00
target iscsi-target: Avoid OFMarker + IFMarker negotiation 2015-09-24 23:24:46 -07:00
tc
thermal thermal: avoid division by zero in power allocator 2015-10-01 21:42:35 -04:00
thunderbolt thunderbolt: Allow loading of module on recent Apple MacBooks with thunderbolt 2 controller 2015-09-20 15:20:11 -07:00
tty tty: serial: Add missing module license for 8250_base.ko 2015-09-22 09:09:15 -07:00
uio
usb USB: whiteheat: fix potential null-deref at probe 2015-09-23 12:15:19 -07:00
uwb
vfio
vhost virtio: fixes on top of 4.3-rc1 2015-09-18 09:28:20 -07:00
video Merge branch 'akpm' (patches from Andrew) 2015-09-10 18:19:42 -07:00
virt
virtio virtio_balloon: do not change memory amount visible via /proc/meminfo 2015-09-08 13:32:11 +03:00
vlynq
vme
w1 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2015-09-01 18:46:42 -07:00
watchdog watchdog: iTCO: Fix dependencies on I2C 2015-09-28 10:56:10 +02:00
xen Merge branch 'akpm' (patches from Andrew) 2015-09-10 18:19:42 -07:00
zorro
Kconfig nvme: move to a new drivers/nvme/host directory 2015-10-09 10:40:37 -06:00
Makefile nvme: move to a new drivers/nvme/host directory 2015-10-09 10:40:37 -06:00