linux_dsm_epyc7002/drivers/mmc/host
Jean-Nicolas Graux 5cad24d835 mmc: mmci: avoid clearing ST Micro busy end interrupt mistakenly
This fixes a race condition that may occur whenever the ST Micro busy end
interrupt is raised just after being unmasked but before leaving the mmci
interrupt context.

A deadlock has been observed when connecting the mmci ST Micro variant
whose amba id is 0x10480180 to some new eMMC that supports internal caches.
Whenever the mmci driver enables cache control by programming the eMMC's
EXT_CSD register, the block driver may request a flush of the eMMC internal
caches, causing the mmci driver to send an MMC_SWITCH command to the card
with the FLUSH_CACHE operation. Because the busy end interrupt may be
mistakenly cleared before it has been processed, this mmc request may never
complete. As a result, the mmcqd task may be stuck forever.

Here is an instance caught by lockup detector which shows that mmcqd task
was hung while waiting for mmc_flush_cache command to complete:

..
[  240.251595] INFO: task mmcqd/1:52 blocked for more than 120 seconds.
[  240.257973]       Not tainted 4.1.13-00510-g9d91424 #2
[  240.263109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.270955] mmcqd/1         D c047504c     0    52      2 0x00000000
[  240.277359] [<c047504c>] (__schedule) from [<c04754a0>] (schedule+0x40/0x98)
[  240.284418] [<c04754a0>] (schedule) from [<c0477d40>] (schedule_timeout+0x148/0x188)
[  240.292191] [<c0477d40>] (schedule_timeout) from [<c0476040>] (wait_for_common+0xa4/0x170)
[  240.300491] [<c0476040>] (wait_for_common) from [<c02efc1c>] (mmc_wait_for_req_done+0x4c/0x13c)
[  240.309224] [<c02efc1c>] (mmc_wait_for_req_done) from [<c02efd90>] (mmc_wait_for_cmd+0x64/0x84)
[  240.317953] [<c02efd90>] (mmc_wait_for_cmd) from [<c02f5b14>] (__mmc_switch+0xa4/0x2a8)
[  240.325964] [<c02f5b14>] (__mmc_switch) from [<c02f5d40>] (mmc_switch+0x28/0x30)
[  240.333389] [<c02f5d40>] (mmc_switch) from [<c02f0984>] (mmc_flush_cache+0x54/0x80)
[  240.341073] [<c02f0984>] (mmc_flush_cache) from [<c02ff0c4>] (mmc_blk_issue_rq+0x114/0x4e8)
[  240.349459] [<c02ff0c4>] (mmc_blk_issue_rq) from [<c03008d4>] (mmc_queue_thread+0xc0/0x180)
[  240.357844] [<c03008d4>] (mmc_queue_thread) from [<c003cf90>] (kthread+0xdc/0xf4)
[  240.365339] [<c003cf90>] (kthread) from [<c0010068>] (ret_from_fork+0x14/0x2c)
..
..
[  240.664311] INFO: task partprobe:564 blocked for more than 120 seconds.
[  240.670943]       Not tainted 4.1.13-00510-g9d91424 #2
[  240.676078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.683922] partprobe       D c047504c     0   564    486 0x00000000
[  240.690318] [<c047504c>] (__schedule) from [<c04754a0>] (schedule+0x40/0x98)
[  240.697396] [<c04754a0>] (schedule) from [<c0477d40>] (schedule_timeout+0x148/0x188)
[  240.705149] [<c0477d40>] (schedule_timeout) from [<c0476040>] (wait_for_common+0xa4/0x170)
[  240.713446] [<c0476040>] (wait_for_common) from [<c01f3300>] (submit_bio_wait+0x58/0x64)
[  240.721571] [<c01f3300>] (submit_bio_wait) from [<c01fbbd8>] (blkdev_issue_flush+0x60/0x88)
[  240.729957] [<c01fbbd8>] (blkdev_issue_flush) from [<c010ff84>] (blkdev_fsync+0x34/0x44)
[  240.738083] [<c010ff84>] (blkdev_fsync) from [<c0109594>] (do_fsync+0x3c/0x64)
[  240.745319] [<c0109594>] (do_fsync) from [<c000ffc0>] (ret_fast_syscall+0x0/0x3c)
..

Here is the detailed sequence showing when this issue may happen:

1) At probe time, mmci device is initialized and card busy detection based
on DAT[0] monitoring is enabled.

2) Later during run time, since the card reported support for internal
caches, an MMC_SWITCH command is sent to the eMMC device with the
FLUSH_CACHE operation. On receiving this command, the eMMC may enter busy
state (for a relatively short time in the case of the deadlock).

3) Then mmci interrupt is raised and mmci_irq() is called:

MMCISTATUS register is read and is equal to 0x01000440. So the following
status bits are set:
- MCI_CMDRESPEND (bit 6)
- MCI_DATABLOCKEND (bit 10)
- MCI_ST_CARDBUSY (bit 24)

Since MMCIMASK0 register is 0x3FF, status variable is set to 0x00000040 and
the MCI_CMDRESPEND bit is cleared by writing MMCICLEAR register.
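
The masking arithmetic of this first pass can be checked with a small
sketch. It is a simplified model, not the driver code: masked_status() is a
hypothetical helper, and the bit positions are the ones quoted in the list
above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as quoted in the description above. */
#define MCI_CMDRESPEND   (1u << 6)
#define MCI_DATABLOCKEND (1u << 10)
#define MCI_ST_CARDBUSY  (1u << 24)

/*
 * Hypothetical helper: the subset of raw MMCISTATUS bits that the
 * MMCIMASK0 value lets through to the handler's status variable.
 */
static uint32_t masked_status(uint32_t raw_status, uint32_t mask0)
{
	return raw_status & mask0;
}
```

With raw status 0x01000440 and mask 0x3FF, only MCI_CMDRESPEND survives:
MCI_DATABLOCKEND (bit 10) and MCI_ST_CARDBUSY (bit 24) both lie outside the
ten low bits covered by the mask.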

Then mmci_cmd_irq() is called. Considering the following conditions:
- host->busy_status is 0,
- this is a "busy response",
- reading MMCISTATUS register again gives 0x01000400,
MMCIMASK0 is updated to unmask MCI_ST_BUSYEND bit.

Thus, MMCIMASK0 is set to 0x010003FF and host->busy_status is set to wait
for busy end completion.

Back again in the status loop of mmci_irq(), we quickly go through
mmci_data_irq() as there is no data in that case.  And we finally reach the
following test at the end of the while(status) loop:

/*
 * Don't poll for busy completion in irq context.
 */
if (host->variant->busy_detect && host->busy_status)
	status &= ~host->variant->busy_detect_flag;

Because status variable is not yet zero (it is equal to 0x40), we do not
leave interrupt context yet but go around the while(status) loop again. So
we go through the following steps:

a) MMCISTATUS register is read again and this time is equal to 0x01000400.
So the following bits are set:
- MCI_DATABLOCKEND (bit 10)
- MCI_ST_CARDBUSY (bit 24)

Since MMCIMASK0 register is equal to 0x010003FF:

b) status variable is set to 0x01000000.
c) MCI_ST_CARDBUSY bit is cleared by writing MMCICLEAR register.

Then, mmci_cmd_irq() is called one more time. Since host->busy_status is
set and MCI_ST_CARDBUSY is set in status variable, we just return from
this function.

Back again in mmci_irq(), status variable is set to 0 and we finally leave
the while(status) loop. As a result we leave interrupt context, waiting for
busy end interrupt event.

Now, consider that busy end completion is raised IN BETWEEN steps 3.a) and
3.c). In such a case, we may mistakenly clear the busy end interrupt at step
3.c) before it has been processed. The mmc command then waits forever for a
busy end completion that will never arrive.
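
The window between steps 3.a) and 3.c) can be illustrated with a toy model.
This is only a sketch under a stated assumption: busy start and busy end
share a single latched event bit, and the toy_* names are hypothetical and
do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MCI_ST_CARDBUSY (1u << 24) /* bit position quoted in the text above */

/* Toy model (an assumption, not the real hardware): one latched event
 * bit shared by the busy start and busy end edges. */
struct toy_mmci {
	uint32_t status;
};

/* Hardware side: the card leaves busy state and latches the event bit. */
static void toy_busy_end_event(struct toy_mmci *hw)
{
	hw->status |= MCI_ST_CARDBUSY;
}

/* Driver side: a write to the clear register drops the selected bits. */
static void toy_write_clear(struct toy_mmci *hw, uint32_t bits)
{
	hw->status &= ~bits;
}

/*
 * Replays the second pass of the pre-fix loop. Returns 1 if a busy end
 * event is still pending once the pass has finished, 0 if it was lost.
 */
static int toy_busy_end_survives(int end_fires_before_clear)
{
	struct toy_mmci hw = { .status = MCI_ST_CARDBUSY }; /* start latched */
	uint32_t status = hw.status;      /* step 3.a: read status */

	if (end_fires_before_clear)
		toy_busy_end_event(&hw);  /* busy end lands in the window */

	toy_write_clear(&hw, status);     /* step 3.c: blanket clear */

	if (!end_fires_before_clear)
		toy_busy_end_event(&hw);  /* busy end lands after the clear */

	return (hw.status & MCI_ST_CARDBUSY) != 0;
}
```

In this model toy_busy_end_survives(1) reports a lost event, while
toy_busy_end_survives(0) leaves the event pending for the next interrupt.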

To fix the problem, this patch implements the following changes:

Considering that the mmci seems to be triggering the IRQ on both edges
while monitoring DAT0 for busy completion and that same status bit is used
to monitor start and end of busy detection, special care must be taken to
make sure that both start and end interrupts are always cleared one after
the other.

1) Clearing of the card busy bit is moved into the mmci_cmd_irq() function,
where unmasking of the busy end bit is effectively handled.
2) Just before unmasking the busy end event, the busy start event is cleared
by writing the card busy bit in MMCICLEAR register.
3) Finally, once we are no longer busy with a command, the busy end event is
cleared by writing the card busy bit in MMCICLEAR register again.
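
The ordering of these three changes can be sketched with a toy model. It is
not the patch itself: the toy_* names are hypothetical, the shared latched
event bit is an assumption, and the real handler's treatment of the busy
level is more involved than shown here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BUSY (1u << 24) /* same bit position as MCI_ST_CARDBUSY */

struct toy_host {
	uint32_t status;      /* latched event bits (toy hardware) */
	uint32_t mask;        /* events allowed to raise the IRQ */
	uint32_t busy_status; /* non-zero while waiting for busy end */
	int completed;        /* set once the busy end has been handled */
};

/* Changes 1 and 2: on busy start, clear the start event first, then
 * unmask the busy end event and remember that we are waiting. */
static void toy_on_busy_start(struct toy_host *h)
{
	h->status &= ~TOY_BUSY;
	h->mask |= TOY_BUSY;
	h->busy_status = 1;
}

/* Toy hardware: the card leaves busy state and latches the event. */
static void toy_busy_end_event(struct toy_host *h)
{
	h->status |= TOY_BUSY;
}

/* Change 3: the busy end event is cleared only while it is handled. */
static void toy_irq(struct toy_host *h)
{
	uint32_t status = h->status & h->mask;

	if (h->busy_status && (status & TOY_BUSY)) {
		h->status &= ~TOY_BUSY; /* clear busy end */
		h->mask &= ~TOY_BUSY;   /* mask it again */
		h->busy_status = 0;
		h->completed = 1;
	}
}

/* Returns 1 when the command completes and no busy state is left over. */
static int toy_fixed_sequence_completes(void)
{
	struct toy_host h = { .status = TOY_BUSY }; /* busy start latched */

	toy_on_busy_start(&h);
	toy_irq(&h);            /* nothing pending: must not complete yet */
	if (h.completed)
		return 0;
	toy_busy_end_event(&h); /* may arrive any time after the clear */
	toy_irq(&h);
	return h.completed && !(h.status & TOY_BUSY) && !(h.mask & TOY_BUSY);
}
```

Because no blanket clear touches the busy bit after it has been unmasked,
a busy end event that fires at any later point stays latched until the
handler processes it.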

This patch has been tested on the ST Accordo5 machine, which is not yet
supported upstream but relies on the mmci driver.

Signed-off-by: Sarang Mairal <sarang.mairal@garmin.com>
Signed-off-by: Jean-Nicolas Graux <jean-nicolas.graux@st.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-02-08 12:22:27 +01:00
..
android-goldfish.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
atmel-mci.c mmc: atmel-mci: Remove redundant runtime PM calls 2016-05-02 10:33:20 +02:00
au1xmmc.c mmc: host: drop owner assignment from platform_drivers 2014-10-20 16:20:56 +02:00
bfin_sdh.c mmc: bfin_sdh: remove the MMC_DATA_STREAM flag 2016-02-29 11:02:59 +01:00
cb710-mmc.c
cb710-mmc.h mmc: cb710: use to_platform_device() 2016-01-05 18:04:57 +01:00
davinci_mmc.c mmc: davinci: request gpios using gpio descriptors 2016-11-29 09:04:53 +01:00
dw_mmc-exynos.c mmc: dw_mmc: exynos: fix to call suspend callback 2016-12-05 10:31:14 +01:00
dw_mmc-exynos.h mmc: dw_mmc: exynos: Support eMMC's HS400 mode 2015-03-23 14:13:28 +01:00
dw_mmc-k3.c mmc: dw_mmc-k3: deploy runtime PM facilities 2016-11-29 09:00:39 +01:00
dw_mmc-pci.c mmc: dw_mmc-pci: deploy runtime PM facilities 2016-11-29 09:00:40 +01:00
dw_mmc-pltfm.c mmc: dw_mmc-pltfm: deploy runtime PM facilities 2016-11-29 09:00:41 +01:00
dw_mmc-pltfm.h
dw_mmc-rockchip.c mmc: dw_mmc: disable biu clk if possible 2016-11-29 09:00:38 +01:00
dw_mmc.c mmc: dw_mmc: force setup bus if active slots exist 2017-01-23 10:19:30 +01:00
dw_mmc.h mmc: dw_mmc: display the clock message only one time when card is polling 2016-12-05 10:31:17 +01:00
jz4740_mmc.c mmc: delete is_first_req parameter from pre-request callback 2016-11-29 09:05:27 +01:00
Kconfig mmc: sdhci-cadence: add Cadence SD4HC support 2016-12-08 15:02:52 +01:00
Makefile mmc: sdhci-cadence: add Cadence SD4HC support 2016-12-08 15:02:52 +01:00
meson-gx-mmc.c MMC: meson: avoid possible NULL dereference 2017-01-10 11:53:00 +01:00
mmc_spi.c mmc: mmc_spi: Add Card Detect comments and fix CD GPIO case 2016-03-16 12:36:09 +01:00
mmci_qcom_dml.c mmc: mmci: Add qcom dml support to the driver. 2014-09-09 13:58:46 +02:00
mmci_qcom_dml.h mmc: mmci: Add qcom dml support to the driver. 2014-09-09 13:58:46 +02:00
mmci.c mmc: mmci: avoid clearing ST Micro busy end interrupt mistakenly 2017-02-08 12:22:27 +01:00
mmci.h mmc: mmci: refactor ST Micro busy detection 2016-11-29 09:00:47 +01:00
moxart-mmc.c mmc: moxart: fix wait_for_completion_interruptible_timeout return variable type 2016-09-26 21:31:07 +02:00
mtk-sd.c mmc: delete is_first_req parameter from pre-request callback 2016-11-29 09:05:27 +01:00
mvsdio.c mmc: mvsdio: delete platform data code path 2015-12-22 11:32:12 +01:00
mvsdio.h
mxcmmc.c mmc: host: use the defined function to check whether card is removable 2016-07-25 10:34:21 +02:00
mxs-mmc.c mmc: mxs-mmc: Fix additional cycles after transmission stop 2017-01-12 12:31:00 +01:00
of_mmc_spi.c mmc: of_mmc_spi: fix unused warning 2016-03-17 14:54:40 +01:00
omap_hsmmc.c mmc: delete is_first_req parameter from pre-request callback 2016-11-29 09:05:27 +01:00
omap.c mmc: omap: Initialize dma_slave_config to avoid random data in it's fields 2016-09-14 13:59:33 +02:00
pxamci.c mmc: pxamci: fix potential oops 2016-07-18 11:50:40 +02:00
pxamci.h
rtsx_pci_sdmmc.c mmc: delete is_first_req parameter from pre-request callback 2016-11-29 09:05:27 +01:00
rtsx_usb_sdmmc.c mmc: rtsx_usb_sdmmc: Enable runtime PM autosuspend 2016-11-29 09:00:28 +01:00
s3cmci.c mmc: s3cmci: Use DMA slave map rather than exported DMA filter 2016-11-29 09:00:45 +01:00
s3cmci.h mmc: s3cmci: Register cpufreq notifier only on S3C24xx 2016-07-25 10:34:46 +02:00
sdhci_f_sdh30.c mmc: sdhci-pltfm: Drop define for SDHCI_PLTFM_PMOPS 2016-07-29 11:29:04 +02:00
sdhci-acpi.c mmc: sdhci-acpi: Only powered up enabled acpi child devices 2017-01-12 12:15:20 +01:00
sdhci-bcm-kona.c mmc: sdhci-bcm-kona: fix error return code in sdhci_bcm_kona_probe() 2016-09-26 21:31:08 +02:00
sdhci-brcmstb.c mmc: sdhci-brcmstb: Fix incorrect capability 2016-09-26 21:31:28 +02:00
sdhci-cadence.c mmc: sdhci-cadence: add Socionext UniPhier specific compatible string 2016-12-20 11:40:52 +01:00
sdhci-cns3xxx.c mmc: sdhci-pltfm: Drop define for SDHCI_PLTFM_PMOPS 2016-07-29 11:29:04 +02:00
sdhci-dove.c mmc: sdhci-pltfm: Drop define for SDHCI_PLTFM_PMOPS 2016-07-29 11:29:04 +02:00
sdhci-esdhc-imx.c mmc: sdhci-esdhc-imx: Correct two register accesses 2016-10-13 08:58:03 +02:00
sdhci-esdhc.h mmc: sdhci-of-esdhc: support both BE and LE host controller 2015-10-26 16:00:08 +01:00
sdhci-iproc.c mmc: sdhci-iproc: support standard byte register accesses 2016-11-29 09:01:00 +01:00
sdhci-msm.c sdhci: sdhci-msm: update dll configuration 2016-11-29 09:05:20 +01:00
sdhci-of-arasan.c mmc: sdhci-of-arasan: add sdhci_arasan_voltage_switch for arasan, 5.1 2016-10-10 14:01:33 +02:00
sdhci-of-at91.c mmc: sdhci-of-at91: Fix module autoload 2016-11-29 09:00:29 +01:00
sdhci-of-esdhc.c mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0 2016-11-29 09:17:21 +01:00
sdhci-of-hlwd.c mmc: sdhci-pltfm: Drop define for SDHCI_PLTFM_PMOPS 2016-07-29 11:29:04 +02:00
sdhci-pci-core.c mmc: sdhci-pci: Use ACPI to get max frequency for Intel NI byt sdio 2016-12-05 10:31:19 +01:00
sdhci-pci-data.c mmc: sdhci-pci: Add support for drive strength selection for SPT 2015-06-01 09:07:14 +02:00
sdhci-pci-o2micro.c mmc: sdhci-pci: Make sdhci_pci_o2_fujin2_pci_init() static 2015-10-26 16:00:05 +01:00
sdhci-pci-o2micro.h mmc: sdhci-pci: Make sdhci_pci_o2_fujin2_pci_init() static 2015-10-26 16:00:05 +01:00
sdhci-pci.h mmc: sdhci-pci: Add support for Intel GLK 2016-11-29 09:05:20 +01:00
sdhci-pic32.c mmc: sdhci-pic32: remove owner assignment 2016-05-02 10:33:25 +02:00
sdhci-pltfm.c mmc: sdhci: Remove ->platform_init() callback as it's no longer used 2016-09-26 21:31:16 +02:00
sdhci-pltfm.h mmc: sdhci: remove unneeded (void *) casts in sdhci_(pltfm_)priv() 2016-11-29 09:01:00 +01:00
sdhci-pxav2.c mmc: sdhci-pltfm: Drop define for SDHCI_PLTFM_PMOPS 2016-07-29 11:29:04 +02:00
sdhci-pxav3.c mmc: sdhci: Rename sdhci_set_power() to sdhci_set_power_noreg() 2016-10-10 14:20:41 +02:00
sdhci-s3c-regs.h
sdhci-s3c.c mmc: sdhci-s3c: add spin_unlock_irq() before calling clk_round_rate 2016-12-05 10:31:17 +01:00
sdhci-sirf.c mmc: sdhci-sirf: Remove non needed #ifdef CONFIG_PM* for dev_pm_ops 2016-07-27 11:25:23 +02:00
sdhci-spear.c Update Viresh Kumar's email address 2015-07-17 16:39:53 -07:00
sdhci-st.c mmc: sdhci-st: Handle interconnect clock 2016-09-12 10:31:43 +02:00
sdhci-tegra.c mmc: tegra: Mark 64-bit DMA broken on Tegra124 2016-09-26 21:31:23 +02:00
sdhci.c mmc: sdhci: Ignore unexpected CARD_INT interrupts 2017-01-31 11:26:49 +01:00
sdhci.h mmc: sdhci: export sdhci_execute_tuning() 2016-12-08 15:02:45 +01:00
sdricoh_cs.c mmc: sdricoh_cs: Less checks in sdricoh_init_mmc() after, error detection 2016-02-29 11:02:45 +01:00
sh_mmcif.c mmc: sh_mmcif: Use a 10s timeout in the error recovery path 2016-07-25 10:34:25 +02:00
sh_mobile_sdhi.c mmc: sh_mobile_sdhi: Add tuning support 2016-11-29 09:00:58 +01:00
sunxi-mmc.c mmc: sunxi: Prevent against null dereference for vmmc 2016-11-29 09:00:31 +01:00
tifm_sd.c mmc: Convert pr_warning to pr_warn 2014-09-24 10:13:09 +02:00
tmio_mmc_dma.c mmc: tmio: merge distributed include files 2016-05-02 10:33:40 +02:00
tmio_mmc_pio.c mmc: tmio: remove SDIO from TODO list 2016-11-29 09:01:04 +01:00
tmio_mmc.c mmc: TMIO: Use devm_request_irq() 2015-06-01 09:06:48 +02:00
tmio_mmc.h mmc: tmio: Add tuning support 2016-11-29 09:00:57 +01:00
toshsd.c PM / Runtime: Move ignore_children flag under CONFIG_PM 2016-04-22 01:32:37 +02:00
toshsd.h mmc: add Toshiba PCI SD controller driver 2014-11-26 14:30:58 +01:00
usdhi6rol0.c mmc: usdhi6rol0: add pinctrl to set pin drive strength 2016-05-02 10:36:06 +02:00
ushc.c mmc: ushc: Fix incorrect parameter in sizeof 2014-02-25 15:42:20 -05:00
via-sdmmc.c
vub300.c mmc: vub300: don't print error when allocating urb fails 2016-09-26 21:31:09 +02:00
wbsd.c mmc: wbsd: implement check for dma mapping error 2016-11-29 09:01:02 +01:00
wbsd.h
wmt-sdmmc.c mmc: constify of_device_id array 2015-03-23 14:13:49 +01:00