linux_dsm_epyc7002/drivers/ata
David Jeffery ce75145267 libata: prevent HSM state change race between ISR and PIO
It is possible for ata_sff_flush_pio_task() to set ap->hsm_task_state to
HSM_ST_IDLE in between the time __ata_sff_port_intr() checks for HSM_ST_IDLE
and before it calls ata_sff_hsm_move() causing ata_sff_hsm_move() to BUG().

This problem is hard to reproduce making this patch hard to verify, but this
fix will prevent the race.

I have not been able to reproduce the problem, but here is a crash dump from
a 2.6.32 kernel.

On examining the ata port's state, its hsm_task_state field has a value of HSM_ST_IDLE:

crash> struct ata_port.hsm_task_state ffff881c1121c000
  hsm_task_state = 0

Normally, this should not be possible as ata_sff_hsm_move() was called from ata_sff_host_intr(),
which checks hsm_task_state and won't call ata_sff_hsm_move() if it has a HSM_ST_IDLE value.

PID: 11053  TASK: ffff8816e846cae0  CPU: 0   COMMAND: "sshd"
 #0 [ffff88008ba03960] machine_kexec at ffffffff81038f3b
 #1 [ffff88008ba039c0] crash_kexec at ffffffff810c5d92
 #2 [ffff88008ba03a90] oops_end at ffffffff8152b510
 #3 [ffff88008ba03ac0] die at ffffffff81010e0b
 #4 [ffff88008ba03af0] do_trap at ffffffff8152ad74
 #5 [ffff88008ba03b50] do_invalid_op at ffffffff8100cf95
 #6 [ffff88008ba03bf0] invalid_op at ffffffff8100bf9b
    [exception RIP: ata_sff_hsm_move+317]
    RIP: ffffffff813a77ad  RSP: ffff88008ba03ca0  RFLAGS: 00010097
    RAX: 0000000000000000  RBX: ffff881c1121dc60  RCX: 0000000000000000
    RDX: ffff881c1121dd10  RSI: ffff881c1121dc60  RDI: ffff881c1121c000
    RBP: ffff88008ba03d00   R8: 0000000000000000   R9: 000000000000002e
    R10: 000000000001003f  R11: 000000000000009b  R12: ffff881c1121c000
    R13: 0000000000000000  R14: 0000000000000050  R15: ffff881c1121dd78
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88008ba03d08] ata_sff_host_intr at ffffffff813a7fbd
 #8 [ffff88008ba03d38] ata_sff_interrupt at ffffffff813a821e
 #9 [ffff88008ba03d78] handle_IRQ_event at ffffffff810e6ec0
--- <IRQ stack> ---
    [exception RIP: pipe_poll+48]
    RIP: ffffffff81192780  RSP: ffff880f26d459b8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff880f26d459c8  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff881a0539fa80
    RBP: ffffffff8100bb8e   R8: ffff8803b23324a0   R9: 0000000000000000
    R10: ffff880f26d45dd0  R11: 0000000000000008  R12: ffffffff8109b646
    R13: ffff880f26d45948  R14: 0000000000000246  R15: 0000000000000246
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
    RIP: 00007f26017435c3  RSP: 00007fffe020c420  RFLAGS: 00000206
    RAX: 0000000000000017  RBX: ffffffff8100b072  RCX: 00007fffe020c45c
    RDX: 00007f2604a3f120  RSI: 00007f2604a3f140  RDI: 000000000000000d
    RBP: 0000000000000000   R8: 00007fffe020e570   R9: 0101010101010101
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fffe020e5f0
    R13: 00007fffe020e5f4  R14: 00007f26045f373c  R15: 00007fffe020e5e0
    ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

Somewhere between the ata_sff_hsm_move() check and the ata_sff_host_intr() check, the value changed.
On examining the other cpus to see what else was running, another cpu was running the error handler
routines:

PID: 326    TASK: ffff881c11014aa0  CPU: 1   COMMAND: "scsi_eh_1"
 #0 [ffff88008ba27e90] crash_nmi_callback at ffffffff8102fee6
 #1 [ffff88008ba27ea0] notifier_call_chain at ffffffff8152d515
 #2 [ffff88008ba27ee0] atomic_notifier_call_chain at ffffffff8152d57a
 #3 [ffff88008ba27ef0] notify_die at ffffffff810a154e
 #4 [ffff88008ba27f20] do_nmi at ffffffff8152b1db
 #5 [ffff88008ba27f50] nmi at ffffffff8152aaa0
    [exception RIP: _spin_lock_irqsave+47]
    RIP: ffffffff8152a1ff  RSP: ffff881c11a73aa0  RFLAGS: 00000006
    RAX: 0000000000000001  RBX: ffff881c1121deb8  RCX: 0000000000000000
    RDX: 0000000000000246  RSI: 0000000000000020  RDI: ffff881c122612d8
    RBP: ffff881c11a73aa0   R8: ffff881c17083800   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff881c1121c000
    R13: 000000000000001f  R14: ffff881c1121dd50  R15: ffff881c1121dc60
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #6 [ffff881c11a73aa0] _spin_lock_irqsave at ffffffff8152a1ff
 #7 [ffff881c11a73aa8] ata_exec_internal_sg at ffffffff81396fb5
 #8 [ffff881c11a73b58] ata_exec_internal at ffffffff81397109
 #9 [ffff881c11a73bd8] atapi_eh_request_sense at ffffffff813a34eb

Before it tried to acquire a spinlock, ata_exec_internal_sg() called ata_sff_flush_pio_task().
This function will set ap->hsm_task_state to HSM_ST_IDLE, and has no locking around setting this
value. ata_sff_flush_pio_task() can then race with the interrupt handler and potentially set
HSM_ST_IDLE at a fatal moment, which will trigger a kernel BUG.

v2: Fixup comment in ata_sff_flush_pio_task()

tj: Further updated comment.  Use ap->lock instead of shost lock and
    use the [un]lock_irq variant instead of the irqsave/restore one.

Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
2015-01-19 14:11:23 -05:00
..
acard-ahci.c AHCI: Move host activation code into ahci_host_activate() 2014-10-06 11:43:35 -04:00
ahci_da850.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
ahci_imx.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
ahci_mvebu.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
ahci_platform.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
ahci_st.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
ahci_sunxi.c Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
ahci_tegra.c ata: ahci_tegra: Read calibration fuse 2014-08-26 10:48:27 -04:00
ahci_xgene.c ahci_xgene: Fix the DMA state machine lockup for the ATA_CMD_PACKET PIO mode command. 2015-01-05 09:02:56 -05:00
ahci.c ahci: Remove Device ID for Intel Sunrise Point PCH 2015-01-13 10:32:29 -05:00
ahci.h AHCI: Do not read HOST_IRQ_STAT reg in multi-MSI mode 2014-10-06 11:43:36 -04:00
ata_generic.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
ata_piix.c ata_piix: Add Device IDs for Intel 9 Series PCH 2014-08-28 08:53:40 -04:00
Kconfig ata: pata_at91: depend on !ARCH_MULTIPLATFORM 2015-01-13 15:56:02 -05:00
libahci_platform.c AHCI: Move host activation code into ahci_host_activate() 2014-10-06 11:43:35 -04:00
libahci.c ahci: Use dev_info() to inform about the lack of Device Sleep support 2015-01-09 17:04:12 -05:00
libata-acpi.c ACPI and power management updates for 3.15-rc1 2014-04-01 12:48:54 -07:00
libata-core.c libata: allow sata_sil24 to opt-out of tag ordered submission 2015-01-19 09:10:07 -05:00
libata-eh.c libata: export ata_get_cmd_descript() 2015-01-05 11:22:49 -05:00
libata-pmp.c ata: enable quirk from jmicron JMB350 for JMB394 2014-01-31 07:05:44 -05:00
libata-scsi.c libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2015-01-08 10:35:40 -05:00
libata-sff.c libata: prevent HSM state change race between ISR and PIO 2015-01-19 14:11:23 -05:00
libata-transport.c libata: Implement ATA_DEV_ZAC 2014-11-05 11:22:06 -05:00
libata-transport.h [libata] Add ATA transport class 2010-10-21 20:21:03 -04:00
libata-zpodd.c libata: zpodd: eliminate odd_can_poweroff 2014-03-14 11:23:47 -04:00
libata.h scsi: use 64-bit LUNs 2014-07-17 22:07:37 +02:00
Makefile ata: Add support for the Tegra124 SATA controller 2014-07-18 17:52:33 -04:00
pata_acpi.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_ali.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_amd.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_arasan_cf.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_artop.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_at32.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_at91.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_atiixp.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_atp867x.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_bf54x.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_cmd64x.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_cmd640.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_cs5520.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_cs5530.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_cs5535.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_cs5536.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_cypress.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_efar.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_ep93xx.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_hpt3x2n.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
pata_hpt3x3.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_hpt37x.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
pata_hpt366.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_icside.c Drivers: ata: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
pata_imx.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_isapnp.c pata_isapnp: Don't use invalid I/O ports 2013-10-07 15:17:32 -04:00
pata_it821x.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_it8213.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_ixp4xx_cf.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_jmicron.c ata: Disabling the async PM for JMicron chip 363/361 2014-09-01 08:38:06 -04:00
pata_legacy.c pata_legacy: Remove dead code 2014-03-11 08:30:53 -04:00
pata_macio.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_marvell.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_mpc52xx.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_mpiix.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_netcell.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_ninja32.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_ns87410.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_ns87415.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_octeon_cf.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_of_platform.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_oldpiix.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_opti.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_optidma.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_palmld.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_pcmcia.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
pata_pdc202xx_old.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_pdc2027x.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_piccolo.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_platform.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_pxa.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_radisys.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_rb532_cf.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_rdc.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_rz1000.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_samsung_cf.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
pata_sc1200.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_scc.c pata_scc: propagate return value of scc_wait_after_reset 2014-08-18 09:15:21 -04:00
pata_sch.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_serverworks.c pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller 2014-10-07 17:10:14 -04:00
pata_sil680.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_sis.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_sl82c105.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_triflex.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pata_via.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
pdc_adma.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
sata_dwc_460ex.c sata_dwc_460ex: fix resource leak on error path 2015-01-07 10:33:47 -05:00
sata_fsl.c Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
sata_highbank.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
sata_inic162x.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
sata_mv.c ata: drop owner assignment from platform_drivers 2014-10-20 16:20:17 +02:00
sata_nv.c scsi: drop reason argument from ->change_queue_depth 2014-11-24 14:45:27 +01:00
sata_promise.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
sata_promise.h
sata_qstor.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
sata_rcar.c Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
sata_sil24.c libata: allow sata_sil24 to opt-out of tag ordered submission 2015-01-19 09:10:07 -05:00
sata_sil.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
sata_sis.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
sata_svw.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
sata_sx4.c ata: remove superfluous casts 2014-03-26 12:36:53 -04:00
sata_uli.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
sata_via.c ata: use CONFIG_PM_SLEEP instead of CONFIG_PM where applicable in host drivers 2014-05-09 22:37:49 -04:00
sata_vsc.c ata: delete non-required instances of include <linux/init.h> 2014-02-13 16:40:56 -05:00
sis.h