linux_dsm_epyc7002/drivers/mtd
Han Xu 4d0721cb10 mtd: rawnand: gpmi: Fix the random DMA timeout issue
[ Upstream commit 7671edeb193910482a9b0c22cd32176e7de7b2ed ]

To get better performance, current gpmi driver collected and chained all
small DMA transfers in gpmi_nfc_exec_op, the whole chain triggered and
wait for complete at the end.

But some random DMA timeout found in this new driver, with the help of
ftrace, we found the root cause is as follows:

Take gpmi_ecc_read_page() as an example, gpmi_nfc_exec_op collected 6
DMA transfers and the DMA chain triggered at the end. It waits for bch
completion and check jiffies if it's timeout. The typical function graph
shown below,

   63.216351 |   1)               |  gpmi_ecc_read_page() {
   63.216352 |   1)   0.750 us    |    gpmi_bch_layout_std();
   63.216354 |   1)               |    gpmi_nfc_exec_op() {
   63.216355 |   1)               |      gpmi_chain_command() {
   63.216356 |   1)               |        mxs_dma_prep_slave_sg() {
   63.216357 |   1)               |          /* mxs chan ccw idx: 0 */
   63.216358 |   1)   1.750 us    |        }
   63.216359 |   1)               |        mxs_dma_prep_slave_sg() {
   63.216360 |   1)               |          /* mxs chan ccw idx: 1 */
   63.216361 |   1)   2.000 us    |        }
   63.216361 |   1)   6.500 us    |      }
   63.216362 |   1)               |      gpmi_chain_command() {
   63.216363 |   1)               |        mxs_dma_prep_slave_sg() {
   63.216364 |   1)               |          /* mxs chan ccw idx: 2 */
   63.216365 |   1)   1.750 us    |        }
   63.216366 |   1)               |        mxs_dma_prep_slave_sg() {
   63.216367 |   1)               |          /* mxs chan ccw idx: 3 */
   63.216367 |   1)   1.750 us    |        }
   63.216368 |   1)   5.875 us    |      }
   63.216369 |   1)               |      /* gpmi_chain_wait_ready */
   63.216370 |   1)               |      mxs_dma_prep_slave_sg() {
   63.216372 |   1)               |        /* mxs chan ccw idx: 4 */
   63.216373 |   1)   3.000 us    |      }
   63.216374 |   1)               |      /* gpmi_chain_data_read */
   63.216376 |   1)               |      mxs_dma_prep_slave_sg() {
   63.216377 |   1)               |        /* mxs chan ccw idx: 5 */
   63.216378 |   1)   2.000 us    |      }
   63.216379 |   1)   1.125 us    |      mxs_dma_tx_submit();
   63.216381 |   1)   1.000 us    |      mxs_dma_enable_chan();
   63.216712 |   0)   2.625 us    |  mxs_dma_int_handler();
   63.216717 |   0)   4.250 us    |  bch_irq();
   63.216723 |   0)   1.250 us    |  mxs_dma_tasklet();
   63.216723 |   1)               |      /* jiffies left 250 */
   63.216725 |   1) ! 372.000 us  |    }
   63.216726 |   1)   2.625 us    |    gpmi_count_bitflips();
   63.216730 |   1) ! 379.125 us  |  }

but it's not gurantee that bch irq handled always after dma irq handled,
sometimes bch_irq comes first and gpmi_nfc_exec_op won't wait anymore,
another gpmi_nfc_exec_op may get invoked before last DMA chain IRQ
handled, this messed up the next DMA chain and causes DMA timeout. Check
the trace log when issue happened.

   63.218923 |   1)               |  gpmi_ecc_read_page() {
   63.218924 |   1)   0.625 us    |    gpmi_bch_layout_std();
   63.218926 |   1)               |    gpmi_nfc_exec_op() {
   63.218927 |   1)               |      gpmi_chain_command() {
   63.218928 |   1)               |        mxs_dma_prep_slave_sg() {
   63.218929 |   1)               |          /* mxs chan ccw idx: 0 */
   63.218929 |   1)   1.625 us    |        }
   63.218931 |   1)               |        mxs_dma_prep_slave_sg() {
   63.218931 |   1)               |          /* mxs chan ccw idx: 1 */
   63.218932 |   1)   1.750 us    |        }
   63.218933 |   1)   5.875 us    |      }
   63.218934 |   1)               |      gpmi_chain_command() {
   63.218934 |   1)               |        mxs_dma_prep_slave_sg() {
   63.218935 |   1)               |          /* mxs chan ccw idx: 2 */
   63.218936 |   1)   1.875 us    |        }
   63.218937 |   1)               |        mxs_dma_prep_slave_sg() {
   63.218938 |   1)               |          /* mxs chan ccw idx: 3 */
   63.218939 |   1)   1.625 us    |        }
   63.218939 |   1)   5.875 us    |      }
   63.218940 |   1)               |      /* gpmi_chain_wait_ready */
   63.218941 |   1)               |      mxs_dma_prep_slave_sg() {
   63.218942 |   1)               |        /* mxs chan ccw idx: 4 */
   63.218942 |   1)   1.625 us    |      }
   63.218943 |   1)               |      /* gpmi_chain_data_read */
   63.218944 |   1)               |      mxs_dma_prep_slave_sg() {
   63.218945 |   1)               |        /* mxs chan ccw idx: 5 */
   63.218947 |   1)   2.375 us    |      }
   63.218948 |   1)   0.625 us    |      mxs_dma_tx_submit();
   63.218949 |   1)   1.000 us    |      mxs_dma_enable_chan();
   63.219276 |   0)   5.125 us    |  bch_irq();                  <----
   63.219283 |   1)               |      /* jiffies left 250 */
   63.219285 |   1) ! 358.625 us  |    }
   63.219286 |   1)   2.750 us    |    gpmi_count_bitflips();
   63.219289 |   1) ! 366.000 us  |  }
   63.219290 |   1)               |  gpmi_ecc_read_page() {
   63.219291 |   1)   0.750 us    |    gpmi_bch_layout_std();
   63.219293 |   1)               |    gpmi_nfc_exec_op() {
   63.219294 |   1)               |      gpmi_chain_command() {
   63.219295 |   1)               |        mxs_dma_prep_slave_sg() {
   63.219295 |   0)   1.875 us    |  mxs_dma_int_handler();      <----
   63.219296 |   1)               |          /* mxs chan ccw idx: 6 */
   63.219297 |   1)   2.250 us    |        }
   63.219298 |   1)               |        mxs_dma_prep_slave_sg() {
   63.219298 |   0)   1.000 us    |  mxs_dma_tasklet();
   63.219299 |   1)               |          /* mxs chan ccw idx: 0 */
   63.219300 |   1)   1.625 us    |        }
   63.219300 |   1)   6.375 us    |      }
   63.219301 |   1)               |      gpmi_chain_command() {
   63.219302 |   1)               |        mxs_dma_prep_slave_sg() {
   63.219303 |   1)               |          /* mxs chan ccw idx: 1 */
   63.219304 |   1)   1.625 us    |        }
   63.219305 |   1)               |        mxs_dma_prep_slave_sg() {
   63.219306 |   1)               |          /* mxs chan ccw idx: 2 */
   63.219306 |   1)   1.875 us    |        }
   63.219307 |   1)   6.000 us    |      }
   63.219308 |   1)               |      /* gpmi_chain_wait_ready */
   63.219308 |   1)               |      mxs_dma_prep_slave_sg() {
   63.219309 |   1)               |        /* mxs chan ccw idx: 3 */
   63.219310 |   1)   2.000 us    |      }
   63.219311 |   1)               |      /* gpmi_chain_data_read */
   63.219312 |   1)               |      mxs_dma_prep_slave_sg() {
   63.219313 |   1)               |        /* mxs chan ccw idx: 4 */
   63.219314 |   1)   1.750 us    |      }
   63.219315 |   1)   0.625 us    |      mxs_dma_tx_submit();
   63.219316 |   1)   0.875 us    |      mxs_dma_enable_chan();
   64.224227 |   1)               |      /* jiffies left 0 */

In the first gpmi_nfc_exec_op, bch_irq comes first and gpmi_nfc_exec_op
exits, but DMA IRQ still not happened yet until the middle of following
gpmi_nfc_exec_op, the first DMA transfer index get messed and DMA get
timeout.

To fix the issue, when there is bch ops in DMA chain, the
gpmi_nfc_exec_op should wait for both completions rather than bch
completion only.

Fixes: ef347c0cfd ("mtd: rawnand: gpmi: Implement exec_op")
Signed-off-by: Han Xu <han.xu@nxp.com>
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201209035104.22679-3-han.xu@nxp.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:53:49 +01:00
..
chips mtd: Replace HTTP links with HTTPS ones 2020-08-02 22:17:19 +02:00
devices mtd: spear_smi: Enable compile testing 2020-10-02 09:09:08 +02:00
hyperbus mtd: hyperbus: Fix build failure when only RPCIF_HYPERBUS is enabled 2020-10-12 21:12:08 +02:00
lpddr mtd: lpddr: fix excessive stack usage with clang 2020-08-27 14:36:07 +02:00
maps mtd: maps: vmu-flash: fix typos for struct memcard 2020-10-02 09:08:27 +02:00
nand mtd: rawnand: gpmi: Fix the random DMA timeout issue 2020-12-30 11:53:49 +01:00
parsers mtd: parsers: bcm63xx: Do not make it modular 2020-10-02 09:09:08 +02:00
spi-nor mtd: spi-nor: atmel: fix unlock_all() for AT25FS010/040 2020-12-30 11:53:41 +01:00
tests treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 326 2019-06-05 17:37:06 +02:00
ubi ubi: check kthread_should_stop() after the setting of task state 2020-09-17 22:55:59 +02:00
ftl.c
inftlcore.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
inftlmount.c mtd: fix spelling mistake "BlockMultiplerBits" -> "BlockMultiplierBits" 2020-03-11 14:49:30 +01:00
Kconfig mtd: Support kmsg dumper based on pstore/blk 2020-05-31 19:49:01 -07:00
Makefile mtd: Support kmsg dumper based on pstore/blk 2020-05-31 19:49:01 -07:00
mtd_blkdevs.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102 2019-05-24 17:39:00 +02:00
mtdblock_ro.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102 2019-05-24 17:39:00 +02:00
mtdblock.c mtd: clear cache_state to avoid writing to bad blocks repeatedly 2020-06-05 10:16:14 +02:00
mtdchar.c mtd: properly check all write ioctls for permissions 2020-07-24 23:03:11 +02:00
mtdconcat.c mtd: mtdconcat: map: remove redundant assignment to variable 'size' 2020-09-11 18:49:34 +02:00
mtdcore.c NAND Core changes: 2020-10-17 10:45:42 -07:00
mtdcore.h mtd: Provide fs_context-aware mount_mtd() replacement 2019-09-05 14:34:23 -04:00
mtdoops.c mtd: mtdoops: Don't write panic data twice 2020-09-07 14:22:12 +02:00
mtdpart.c mtd: Add support for emulated SLC mode on MLC NANDs 2020-05-11 09:51:41 +02:00
mtdpstore.c iov_iter: Move unnecessary inclusion of crypto/hash.h 2020-06-30 09:34:23 -04:00
mtdsuper.c mtd: Kill mount_mtd() 2019-09-05 14:34:26 -04:00
mtdswap.c mtd: no need to check return value of debugfs_create functions 2019-11-14 10:57:38 +01:00
nftlcore.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
nftlmount.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
rfd_ftl.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
sm_ftl.c mtd: sm_ftl: fix NULL pointer warning 2020-01-09 20:09:55 +01:00
sm_ftl.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
ssfdc.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00