linux_dsm_epyc7002/drivers
Brian King 309bd27121 [SCSI] scsi: Device scanning oops for offlined devices (resend)
If a device gets offlined as a result of the Inquiry sent
during scanning, the following oops can occur. After the
disk gets put into the SDEV_OFFLINE state, the error handler
sends back the failed inquiry, which wakes the thread doing
the scan. This starts a race between the scanning thread
freeing the scsi device and the error handler calling
scsi_run_host_queues to restart the host. Since the disk
is in the SDEV_OFFLINE state, scsi_device_get will still
work, which results in __scsi_iterate_devices getting
a reference to the scsi disk when it shouldn't.

The following execution thread causes the oops:

CPU 0 (scan)				CPU 1 (eh)

---------------------------------------------------------
scsi_probe_and_add_lun
                        ....
                                        scsi_eh_offline_sdevs
                                        scsi_eh_flush_done_q
scsi_destroy_sdev
scsi_device_dev_release
                                        scsi_restart_operations
                                         scsi_run_host_queues
                                          __scsi_iterate_devices
                                           get_device
scsi_device_dev_release_usercontext
                                          scsi_run_queue
                                            <---OOPS--->

The patch fixes this by changing the state of the sdev to SDEV_DEL
before doing the final put_device, which should prevent the race
from occurring.

Original oops follows:

Badness in kref_get at lib/kref.c:32
Call Trace:
[C00000002F4476D0] [C00000000000EE20] .show_stack+0x68/0x1b0 (unreliable)
[C00000002F447770] [C00000000037515C] .program_check_exception+0x1cc/0x5a8
[C00000002F447840] [C00000000000446C] program_check_common+0xec/0x100
 Exception: 700 at .kref_get+0x10/0x28
    LR = .kobject_get+0x20/0x3c
[C00000002F447B30] [C00000002F447BC0] 0xc00000002f447bc0 (unreliable)
[C00000002F447BB0] [C000000000254BDC] .get_device+0x20/0x3c
[C00000002F447C30] [D000000000063188] .scsi_device_get+0x34/0xdc [scsi_mod]
[C00000002F447CC0] [D0000000000633EC] .__scsi_iterate_devices+0x50/0xbc [scsi_mod]
[C00000002F447D60] [D00000000006A910] .scsi_run_host_queues+0x34/0x5c [scsi_mod]
[C00000002F447DF0] [D000000000069054] .scsi_error_handler+0xdb4/0xe44 [scsi_mod]
[C00000002F447EE0] [C00000000007B4E0] .kthread+0x128/0x178
[C00000002F447F90] [C000000000025E84] .kernel_thread+0x4c/0x68
Unable to handle kernel paging request for <7>PCI: Enabling device: (0002:41:01.1), cmd 143
data at address 0x000001b8
Faulting instruction address: 0xd0000000000698e4
sym1: <1010-66> rev 0x1 at pci 0002:41:01.1 irq 216
sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym1: SCSI BUS has been reset.
scsi2 : sym-2.2.2
cpu 0x0: Vector: 300 (Data Access) at [c00000002f447a30]
    pc: d0000000000698e4: .scsi_run_queue+0x2c/0x218 [scsi_mod]
    lr: d00000000006a904: .scsi_run_host_queues+0x28/0x5c [scsi_mod]
    sp: c00000002f447cb0
   msr: 9000000000009032
   dar: 1b8
 dsisr: 40000000
  current = 0xc0000000045fecd0
  paca    = 0xc00000000048ee80
    pid   = 1123, comm = scsi_eh_1
enter ? for help
[c00000002f447d60] d00000000006a904 .scsi_run_host_queues+0x28/0x5c [scsi_mod]
[c00000002f447df0] d000000000069054 .scsi_error_handler+0xdb4/0xe44 [scsi_mod]
[c00000002f447ee0] c00000000007b4e0 .kthread+0x128/0x178
[c00000002f447f90] c000000000025e84 .kernel_thread+0x4c/0x68

Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-06-28 12:39:56 -04:00
..
acorn
acpi
amba
atm
base Enable minimal per-device resume tracing 2006-06-24 14:47:59 -07:00
block [PATCH] CCISS: tidy up product table indentation 2006-06-25 10:01:22 -07:00
bluetooth
cdrom [PATCH] cdrom/mcdx: section fixes 2006-06-25 10:01:16 -07:00
char [PATCH] synclink_gt: add GT2 adapter support 2006-06-25 10:01:24 -07:00
connector
cpufreq
crypto
dio
dma
edac
eisa
fc4
firmware [PATCH] DMI: cleanup kernel-doc, add to DocBook 2006-06-25 10:01:24 -07:00
hwmon
i2c
ide [PATCH] ide-floppy: fix debug-only syntax error 2006-06-25 10:01:20 -07:00
ieee1394 [PATCH] ieee1394: nodemgr: do not peek into struct semaphore 2006-06-25 10:00:54 -07:00
infiniband Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband 2006-06-25 16:07:58 -07:00
input [PATCH] random: remove redundant SA_SAMPLE_RANDOM from touchscreen drivers 2006-06-25 10:01:00 -07:00
isdn
leds [PATCH] LED: add LED heartbeat trigger 2006-06-25 10:01:23 -07:00
macintosh [PATCH] Rewritten backlight infrastructure for portable Apple computers 2006-06-25 10:00:59 -07:00
mca
md
media Fixes some sync issues between V4L/DVB development and GIT 2006-06-25 02:05:24 -03:00
message
mfd
misc
mmc
mtd
net [PATCH] m68knommu: 532x FEC eth struct map 2006-06-25 17:43:33 -07:00
nubus
oprofile [PATCH] oprofile: convert from semaphores to mutexes 2006-06-25 10:01:04 -07:00
parisc
parport [PATCH] parport: add to kernel-doc 2006-06-25 10:01:25 -07:00
pci
pcmcia
pnp [PATCH] pnp: card_probe(): fix memory leak 2006-06-25 10:01:01 -07:00
rapidio
rtc [PATCH] RTC: add rtc-ds1742 driver 2006-06-25 10:01:14 -07:00
s390 [PATCH] kernel/sys.c: cleanups 2006-06-25 10:01:06 -07:00
sbus [PATCH] mm: remove VM_LOCKED before remap_pfn_range and drop VM_SHM 2006-06-25 10:00:55 -07:00
scsi [SCSI] scsi: Device scanning oops for offlined devices (resend) 2006-06-28 12:39:56 -04:00
serial [PATCH] m68knommu: 532x UART support 2006-06-25 17:43:33 -07:00
sh
sn
spi
tc
telephony
usb Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ 2006-06-25 06:44:44 -04:00
video Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb 2006-06-25 10:09:31 -07:00
w1
zorro
Kconfig
Makefile