linux_dsm_epyc7002/drivers
Thorsten Leemhuis 720639ef01 nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
commit 538e4a8c571efdf131834431e0c14808bcfb1004 upstream.

Some Kingston A2000 NVMe SSDs sooner or later get confused and stop
working when they use the deepest APST sleep while running Linux. The
system then crashes and one has to cold boot it to get the SSD working
again.

Kingston seems to have known about this since at least mid-September 2020:
https://bbs.archlinux.org/viewtopic.php?pid=1926994#p1926994

Someone working for a German company representing Kingston to the German
press confirmed to me Kingston engineering is aware of the issue and
investigating; the person stated that to their current knowledge only
the deepest APST sleep state causes trouble. Therefore, make Linux avoid
it for now by applying the NVME_QUIRK_NO_DEEPEST_PS to this SSD.

I have two such SSDs, but it seems the problem doesn't occur with them.
I hence couldn't verify if this patch really fixes the problem, but all
the data in front of me suggests it should.

This patch can easily be reverted or improved upon if a better solution
surfaces.

FWIW, there are many reports about the issue scattered around the web;
most of the users disabled APST completely to make things work, some
just made Linux avoid the deepest sleep state:

https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c73
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c74
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c78
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c79
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c80
https://askubuntu.com/questions/1222049/nvmekingston-a2000-sometimes-stops-giving-response-in-ubuntu-18-04dell-inspir
https://community.acer.com/en/discussion/604326/m-2-nvme-ssd-aspire-517-51g-issue-compatibility-kingston-a2000-linux-ubuntu

For the record, some data from 'nvme id-ctrl /dev/nvme0':

NVME Identify Controller:
vid       : 0x2646
ssvid     : 0x2646
mn        : KINGSTON SA2000M81000G
fr        : S5Z42105
[...]
ps    0 : mp:9.00W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:4.60W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:3.80W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0450W non-operational enlat:2000 exlat:2000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0040W non-operational enlat:15000 exlat:15000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
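
To illustrate the effect on the table above, here is a minimal, self-contained sketch of how the deepest usable idle state could be chosen. It is a simplification and not the driver code: the struct, the function name deepest_idle_state() and the APST_DEMO_MAIN guard are hypothetical, and the real driver builds a full transition table rather than returning a single state. The 100000 us limit is assumed to match the default of nvme_core.default_ps_max_latency_us; 5500 us is only an illustrative value between the exlat of ps 3 and ps 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, reduced view of one power state descriptor. */
struct power_state {
    bool non_operational;   /* "non-operational" in the id-ctrl output */
    uint32_t entry_lat_us;  /* enlat */
    uint32_t exit_lat_us;   /* exlat */
};

/* ps 0..4 as reported by the KINGSTON SA2000M81000G above. */
static const struct power_state a2000_states[] = {
    { false,     0,     0 },
    { false,     0,     0 },
    { false,     0,     0 },
    { true,   2000,  2000 },
    { true,  15000, 15000 },
};
#define A2000_NUM_STATES ((int)(sizeof(a2000_states) / sizeof(a2000_states[0])))

/*
 * Return the index of the deepest non-operational state that may be used
 * as an idle target, or -1 if none qualifies (APST effectively off).
 * no_deepest_ps models the effect of NVME_QUIRK_NO_DEEPEST_PS.
 */
int deepest_idle_state(const struct power_state *ps, int count,
                       uint64_t max_latency_us, bool no_deepest_ps)
{
    assert(ps != NULL && count > 0);

    /* A latency limit of 0 means: do not use APST at all. */
    if (max_latency_us == 0)
        return -1;

    for (int state = count - 1; state >= 0; state--) {
        /* Quirk: never pick the last (deepest) state. */
        if (no_deepest_ps && state == count - 1)
            continue;
        /* Only non-operational states are useful idle targets. */
        if (!ps[state].non_operational)
            continue;
        /* Skip states that take too long to wake up from. */
        if (ps[state].exit_lat_us > max_latency_us)
            continue;
        return state;
    }
    return -1;
}

#ifdef APST_DEMO_MAIN
int main(void)
{
    printf("no quirk, 100000 us: ps %d\n",
           deepest_idle_state(a2000_states, A2000_NUM_STATES, 100000, false));
    printf("quirk,    100000 us: ps %d\n",
           deepest_idle_state(a2000_states, A2000_NUM_STATES, 100000, true));
    printf("no quirk,   5500 us: ps %d\n",
           deepest_idle_state(a2000_states, A2000_NUM_STATES, 5500, false));
    return 0;
}
#endif
```

Built with -DAPST_DEMO_MAIN, the sketch compares the quirk with a lowered latency limit. For this SSD both should end up excluding ps 4 and leaving ps 3 as the deepest idle state, which corresponds to what some of the users mentioned above did by hand.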

Cc: stable@vger.kernel.org # 4.14+
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10 09:29:19 +01:00
accessibility speakup: fix uninitialized flush_lock 2020-12-30 11:53:44 +01:00
acpi ACPI/IORT: Do not blindly trust DMA masks from firmware 2021-02-03 23:28:50 +01:00
amba
android binder: add flag to clear buffer on txn complete 2020-12-30 11:54:09 +01:00
ata
atm atm: idt77252: call pci_disable_device() on error path 2021-01-12 20:18:09 +01:00
auxdisplay
base driver core: Extend device_is_dependent() 2021-01-27 11:55:18 +01:00
bcma
block xen-blkfront: allow discard-* nodes to be optional 2021-02-03 23:28:44 +01:00
bluetooth Bluetooth: revert: hci_h5: close serdev device and free hu in h5_close 2021-01-12 20:18:16 +01:00
bus bus: fsl-mc: fix error return code in fsl_mc_object_allocate() 2020-12-30 11:53:46 +01:00
cdrom
char um: random: Register random as hwrng-core device 2021-01-06 14:56:55 +01:00
clk clk: qcom: gcc-sm250: Use floor ops for sdcc clks 2021-02-03 23:28:44 +01:00
clocksource clocksource/drivers/arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI 2020-12-30 11:53:37 +01:00
connector
counter counter:ti-eqep: remove floor 2021-01-27 11:55:12 +01:00
cpufreq cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get() 2021-01-17 14:17:00 +01:00
cpuidle cpuidle: tegra: Annotate tegra_pm_set_cpu_in_lp2() with RCU_NONIDLE 2020-11-16 13:24:32 +01:00
crypto crypto: marvel/cesa - Fix tdma descriptor on 64-bit 2021-02-03 23:28:40 +01:00
dax device-dax: Fix range release 2021-01-06 14:56:56 +01:00
dca
devfreq
dio
dma dmaengine: xilinx_dma: fix mixed_enum_type coverity warning 2021-01-17 14:17:02 +01:00
dma-buf dmabuf: fix use-after-free of dmabuf's file->f_inode 2021-01-12 20:18:24 +01:00
edac EDAC/amd64: Fix PCI component registration 2020-12-30 11:54:11 +01:00
eisa
extcon extcon: max77693: Fix modalias string 2020-12-30 11:53:49 +01:00
firewire
firmware firmware: imx: select SOC_BUS to fix firmware build 2021-02-03 23:28:46 +01:00
fpga fpga: Specify HAS_IOMEM dependency for FPGA_DFL 2020-12-01 18:46:24 +01:00
fsi fsi: Aspeed: Add mutex to protect HW access 2020-12-30 11:53:46 +01:00
gnss
gpio gpiolib: free device name on error path to fix kmemleak 2021-02-10 09:29:16 +01:00
gpu drm/amd/display: Revert "Fix EDID parsing after resume from suspend" 2021-02-10 09:29:19 +01:00
greybus
hid HID: multitouch: Apply MT_QUIRK_CONFIDENCE quirk for multi-input devices 2021-01-30 13:55:17 +01:00
hsi HSI: omap_ssi: Don't jump to free ID in ssi_add_controller() 2020-12-30 11:53:24 +01:00
hv x86/hyperv: Fix kexec panic/hang issues 2021-01-27 11:54:57 +01:00
hwmon hwmon: (pwm-fan) Ensure that calculation doesn't discard big period values 2021-01-19 18:27:25 +01:00
hwspinlock
hwtracing stm class: Fix module init return on allocation failure 2021-01-27 11:55:15 +01:00
i2c i2c: tegra: Create i2c_writesl_vi() to use with VI I2C for filling TX FIFO 2021-02-07 15:37:15 +01:00
i3c i3c master: fix missing destroy_workqueue() on error in i3c_master_register 2021-01-06 14:56:53 +01:00
ide scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT 2021-01-12 20:18:15 +01:00
idle intel_idle: Build fix 2020-12-03 10:00:23 +01:00
iio iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free() 2021-01-27 11:55:12 +01:00
infiniband RDMA/cxgb4: Fix the reported max_recv_sge value 2021-02-03 23:28:46 +01:00
input Input: i8042 - unbreak Pegatron C15B 2021-02-10 09:29:11 +01:00
interconnect interconnect: imx8mq: Use icc_sync_state 2021-01-27 11:55:29 +01:00
iommu iommu/vt-d: Do not use flush-queue when caching-mode is on 2021-02-07 15:37:13 +01:00
ipack
irqchip irqchip/mips-cpu: Set IPI domain parent chip 2021-01-27 11:55:13 +01:00
isdn misdn: dsp: select CONFIG_BITREVERSE 2021-01-19 18:27:26 +01:00
leds leds: trigger: fix potential deadlock with libata 2021-02-03 23:28:41 +01:00
lightnvm lightnvm: fix memory leak when submit fails 2021-01-27 11:55:22 +01:00
macintosh macintosh/adb-iop: Send correct poll command 2020-12-30 11:53:39 +01:00
mailbox mailbox: arm_mhu_db: Fix mhu_db_shutdown by replacing kfree with devm_kfree 2020-12-30 11:53:28 +01:00
mcb
md bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES 2021-02-03 23:28:39 +01:00
media media: rc: ensure that uevent can be read directly after rc device register 2021-02-03 23:28:38 +01:00
memory memory: renesas-rpc-if: Fix unbalanced pm_runtime_enable in rpcif_{enable,disable}_rpm 2020-12-30 11:54:27 +01:00
memstick memstick: r592: Fix error return in r592_probe() 2020-12-30 11:53:34 +01:00
message
mfd mfd: cpcap: Fix interrupt regression with regmap clear_ack 2020-12-30 11:53:16 +01:00
misc habanalabs: disable FW events on device removal 2021-02-07 15:37:17 +01:00
mmc mmc: core: Limit retries when analyse of SDIO tuples fails 2021-02-10 09:29:18 +01:00
most
mtd mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine 2021-01-27 11:54:50 +01:00
mux
net net: ipa: pass correct dma_handle to dma_free_coherent() 2021-02-10 09:29:15 +01:00
nfc nfc: s3fwrn5: Release the nfc firmware 2020-12-30 11:53:53 +01:00
ntb
nubus
nvdimm libnvdimm/dimm: Avoid race between probe and available_slots_show() 2021-02-10 09:29:17 +01:00
nvme nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs 2021-02-10 09:29:19 +01:00
nvmem
of of/device: Update dma_range_map only when dev has valid dma-ranges 2021-02-03 23:28:50 +01:00
opp opp: Call the missing clk_put() on error 2021-01-06 14:56:49 +01:00
oprofile
parisc
parport
pci PCI: Fix pci_slot_release() NULL pointer dereference 2020-12-30 11:54:28 +01:00
pcmcia
perf
phy phy: cpcap-usb: Fix warning for missing regulator_disable 2021-02-07 15:37:13 +01:00
pinctrl pinctrl: qcom: Don't clear pending interrupts when enabling 2021-01-27 11:55:27 +01:00
platform platform/x86: thinkpad_acpi: Add P53/73 firmware to fan_quirk_table for dual fan control 2021-02-07 15:37:16 +01:00
pnp
power power: supply: bq24190_charger: fix reference leak 2020-12-30 11:53:25 +01:00
powercap Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux 2020-11-10 10:02:31 -08:00
pps
ps3 powerpc/ps3: use dma_mapping_error() 2020-12-30 11:53:53 +01:00
ptp phy: dp83640: select CONFIG_CRC32 2021-01-17 14:17:02 +01:00
pwm pwm: sun4i: Remove erroneous else branch 2020-12-30 11:53:59 +01:00
rapidio
ras
regulator regulator: bd718x7: Add enable times 2021-01-19 18:27:24 +01:00
remoteproc remoteproc: sysmon: Ensure remote notification ordering 2020-12-30 11:54:28 +01:00
reset
rpmsg
rtc rtc: pcf2127: only use watchdog when explicitly available 2021-01-09 13:46:22 +01:00
s390 s390/vfio-ap: No need to disable IRQ after queue reset 2021-02-03 23:28:39 +01:00
sbus
scsi scsi: ibmvfc: Set default timeout to avoid crash during migration 2021-02-07 15:37:15 +01:00
sfi
sh
siox
slimbus slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew() 2020-12-30 11:53:47 +01:00
soc ARM: imx: fix imx8m dependencies 2021-02-03 23:28:45 +01:00
soundwire soundwire: master: use pm_runtime_set_active() on add 2020-12-30 11:53:28 +01:00
spi spi: altera: Fix memory leak on error path 2021-02-03 23:28:46 +01:00
spmi
ssb
staging media: hantro: Fix reset_raw_fmt initialization 2021-02-03 23:28:37 +01:00
target scsi: target: tcmu: Fix use-after-free of se_cmd->priv 2021-01-27 11:54:50 +01:00
tc
tee tee: optee: replace might_sleep with cond_resched 2021-02-03 23:28:43 +01:00
thermal thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed 2020-12-30 11:54:29 +01:00
thunderbolt thunderbolt: Fix possible NULL pointer dereference in tb_acpi_add_link() 2021-02-10 09:29:15 +01:00
tty tty: avoid using vfs_iocb_iter_write() for redirected console writes 2021-02-03 23:28:36 +01:00
uio uio: Fix use-after-free in uio_unregister_device() 2020-11-09 18:54:30 +01:00
usb xhci: fix bounce buffer usage for non-sg list case 2021-02-10 09:29:17 +01:00
vdpa vdpa/mlx5: Restore the hardware used index after change map 2021-02-10 09:29:15 +01:00
vfio vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU 2020-12-30 11:54:03 +01:00
vhost vhost_net: fix ubuf refcount incorrectly when sendmsg fails 2021-01-12 20:18:13 +01:00
video fbcon: Disable accelerated scrolling 2021-01-06 14:56:51 +01:00
virt nitro_enclaves: Fixup type and simplify logic of the poll mask setup 2020-11-09 18:20:36 +01:00
virtio virtio_ring: Fix two use after free bugs 2020-12-30 11:54:00 +01:00
visorbus
vlynq
vme
w1
watchdog watchdog: rti-wdt: fix reference leak in rti_wdt_probe 2021-01-06 14:56:54 +01:00
xen xen: Fix XenStore initialisation for XS_LOCAL 2021-02-03 23:28:41 +01:00
zorro
Kconfig
Makefile vdpa: mlx5: fix vdpa/vhost dependencies 2020-12-02 04:09:56 -05:00