linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Benjamin Herrenschmidt	b302545744	block/swim3: Locking fixes The old PowerMac swim3 driver has some "interesting" locking issues, using a private lock and failing to lock the queue before completing requests, which triggered WARN_ONs among others. This rips out the private lock, makes everything operate under the block queue lock, and generally makes things simpler. We used to also share a queue between the two possible instances which was problematic since we might pick the wrong controller in some cases, so make the queue and the current request per-instance and use queuedata to point to our private data which is a lot cleaner. We still share the queue lock but then, it's nearly impossible to actually use 2 swim3's simultaneously: one would need to have a Wallstreet PowerBook, the only machine afaik with two of these on the motherboard, and populate both hotswap bays with a floppy drive (the machine ships only with one), so nobody cares... While at it, add a little fix to clear up stale interrupts when loading the driver or plugging a floppy drive in a bay. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-12-12 12:42:12 +01:00
Finn Thain	ed04c97d51	m68k/mac: cleanup forward declarations Move some forward declarations into header files and adjust includes. Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>	2011-12-10 19:52:46 +01:00
Josh Durgin	51703306b3	rbd: remove buggy rollback functionality This doesn't interact with resizing well, since it doesn't set the size of the device to the size at the snapshot. It's also an expensive operation to be synchronous. Rollback can still be done with the userspace rbd tool. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-07 10:46:19 -08:00
Josh Durgin	81e759fbf7	rbd: return an error when an invalid header is read This protects against opening future rbd images that have incompatible format changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-07 10:46:10 -08:00
Justin P. Mattock	42b2aa86c6	treewide: Fix typos in various parts of the kernel, and fix some comments. The below patch fixes some typos in various parts of the kernel, as well as fixes some comments. Please let me know if I missed anything, and I will try to get it changed and resent. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-12-02 14:57:31 +01:00
Lukas Czerner	dfaf3c036c	loop: Fix discard_alignment default setting discard_alignment is not relevant to the loop driver since it is supposed to be set as a workaround for the old sector 63 alignments. So set it to zero rather than block size. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-12-02 14:47:03 +01:00
Stephen M. Cameron	59bd71a81b	cciss: fix flush cache transfer length We weren't filling in the transfer length of the flush cache command (it transfers 4 bytes of zeroes). Firmware didn't seem to be bothered by this, but it should be fixed. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-28 20:12:05 +01:00
Stephen M. Cameron	6225da4815	cciss: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler IRQF_SHARED is required for older controllers that don't support MSI(X) and which may end up sharing an interrupt. Also remove deprecated IRQF_DISABLED. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-28 20:12:05 +01:00
Dave Young	ae95757a90	loop: fix loop block driver discard and encryption comment The loop driver does not support discard if encryption is enabled, fix the comment. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-25 09:41:25 +01:00
Dan Carpenter	3e54a3d1b8	mtip32xx: uninitialized variable in mtip_quiesce_io() We recently introduce new continue in the loop which make gcc complain. In theory if MTIP_FLAG_SVC_THD_ACTIVE_BIT is set, we could hit continue over and over until eventually we time out of the loop. In that case "active" should be set as true, but right now it's uninitialized. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-24 12:59:00 +01:00
Asai Thambi S P	60ec0eecfa	mtip32xx: updates based on feedback * queue ncq commands when a non-ncq is in progress or error handling is active * merge variables 'internal_cmd_in_progress' and 'eh_active' into new variable 'flags' * get rid of read/write semaphore 'internal_sem' * new service thread to issue queued commands * use macros from ata.h for command codes * return ENOTTY for BLKFLSBUF ioctl * style changes Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-23 08:29:24 +01:00
Li Dongyang	ae18be11b5	xen-blkback: convert hole punching to discard request on loop devices As of `dfaa2ef68e`, loop devices support discard request now. We could just issue a discard request, and the loop driver will punch the hole for us, so we don't need to touch the internals of loop device and punch the hole ourselves, Thanks. V0->V1: rebased on devel/for-jens-3.3 Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-11-18 13:28:05 -05:00
Konrad Rzeszutek Wilk	421463526f	xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_io .. and move it to its own function that will deal with the discard operation. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-11-18 13:28:03 -05:00
Konrad Rzeszutek Wilk	5ea4298669	xen/blk[front\|back]: Enhance discard support with secure erasing support. Part of the blkdev_issue_discard(xx) operation is that it can also issue a secure discard operation that will permanantly remove the sectors in question. We advertise that we can support that via the 'discard-secure' attribute and on the request, if the 'secure' bit is set, we will attempt to pass in REQ_DISCARD \| REQ_SECURE. CC: Li Dongyang <lidongyang@novell.com> [v1: Used 'flag' instead of 'secure:1' bit] [v2: Use 'reserved' uint8_t instead of adding a new value] [v3: Check for nseg when mapping instead of operation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-11-18 13:28:01 -05:00
Konrad Rzeszutek Wilk	97e36834f5	xen/blk[front\|back]: Squash blkif_request_rw and blkif_request_discard together In a union type structure to deal with the overlapping attributes in a easier manner. Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-11-18 13:27:59 -05:00
Dan Carpenter	a2c2a0e668	paride: fix potential information leak in pg_read() Smatch has a new check for Rosenberg type information leaks where structs are copied to the user with uninitialized stack data in them. i In this case, the pg_write_hdr struct has a hole in it. struct pg_write_hdr { char magic; /* 0 1 / char func; / 1 1 / / XXX 2 bytes hole, try to pack / int dlen; / 4 4 */ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tim Waugh <tim@cyberelk.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:50 +01:00
Stephen M. Cameron	0007a4c90a	cciss: auto engage SCSI mid layer at driver load time A long time ago, probably in 2002, one of the distros, or maybe more than one, loaded block drivers prior to loading the SCSI mid layer. This meant that the cciss driver, being a block driver, could not engage the SCSI mid layer at init time without panicking, and relied on being poked by a userland program after the system was up (and the SCSI mid layer was therefore present) to engage the SCSI mid layer. This is no longer the case, and cciss can safely rely on the SCSI mid layer being present at init time and engage the SCSI mid layer straight away. This means that users will see their tape drives and medium changers at driver load time without need for a script in /etc/rc.d that does this: for x in /proc/driver/cciss/cciss* do echo "engage scsi" > $x done However, if no tape drives or medium changers are detected, the SCSI mid layer will not be engaged. If a tape drive or medium change is later hot-added to the system it will then be necessary to use the above script or similar for the device(s) to be acceesible. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:49 +01:00
Dmitry Monakhov	7035b5df3c	loop: cleanup set_status interface 1) Anyone who has read access to loopdev has permission to call set_status and may change important parameters such as lo_offset, lo_sizelimit and so on, which contradicts to read access pattern and definitely equals to write access pattern. 2) Add lo_offset over i_size check to prevent blkdev_size overflow. ##Testcase_bagin #dd if=/dev/zero of=./file bs=1k count=1 #losetup /dev/loop0 ./file /* userspace_application / struct loop_info64 loinf; fd = open("/dev/loop0", O_RDONLY); ioctl(fd, LOOP_GET_STATUS64, &loinf); / Set offset to any value which is bigger than i_size, and sizelimit * to nonzero value/ loinf.lo_offset = 40961024; loinf.lo_sizelimit = 1024; ioctl(fd, LOOP_SET_STATUS64, &loinf); /* After this loop device will have size similar to 0x7fffffffffxxxx */ #blockdev --getsz /dev/loop0 ##OUTPUT: 36028797018955968 ##Testcase_end [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:49 +01:00
Dmitry Monakhov	3bb9068278	loop: prevent information leak after failed read If read was not fully successful we have to fail whole bio to prevent information leak of old pages ##Testcase_begin dd if=/dev/zero of=./file bs=1M count=1 losetup /dev/loop0 ./file -o 4096 truncate -s 0 ./file # OOps loop offset is now beyond i_size, so read will silently fail. # So bio's pages would not be cleared, may which result in information leak. hexdump -C /dev/loop0 ##testcase_end Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:48 +01:00
Matthew Garrett	1937335856	The Windows driver .inf disables ASPM on all cciss devices. Do the same. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: iss_storagedev@hp.com Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-11 22:05:54 +01:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Linus Torvalds	06d381484f	Merge branch 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: net: xen-netback: use API provided by xenbus module to map rings block: xen-blkback: use API provided by xenbus module to map rings xen: use generic functions instead of xen_{alloc, free}_vm_area()	2011-11-06 18:31:36 -08:00
Jens Axboe	a71f483d79	mtip32xx: update to new ->make_request() API Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:36:21 +01:00
Jens Axboe	0e838c624e	mtip32xx: add module.h include to avoid conflict with moduleh tree Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Jens Axboe	3ff147d3a8	mtip32xx: mark a few more items static Missed two items: mtip_major, and mtip_pci_driver. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Jens Axboe	6316668fbc	mtip32xx: ensure that all local functions are static Kill the declarations in the header file and mark them as static. Reshuffle a few functions to ensure that everything is properly declared before being used. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Jens Axboe	ef0f158734	mtip32xx: cleanup compat ioctl handling Do the conversion/copy up front instead of passing in a compat flag to the ioctl handler and subsequently to the exec_drive_taskfile() function. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Jens Axboe	16d02c040b	mtip32xx: fix warnings/errors on 32-bit compiles We need to clean up the compat ioctl handling, but this makes it work for now at least. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Sam Bradshaw	88523a6155	block: Add driver for Micron RealSSD pcie flash cards This adds mtip32xx, a driver supporting Microns line of pci-express flash storage cards. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-11-05 08:35:10 +01:00
Linus Torvalds	3d0a8d10cf	Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-block * 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits) virtio-blk: use ida to allocate disk index hpsa: add small delay when using PCI Power Management to reset for kump cciss: add small delay when using PCI Power Management to reset for kump xen/blkback: Fix two races in the handling of barrier requests. xen/blkback: Check for proper operation. xen/blkback: Fix the inhibition to map pages when discarding sector ranges. xen/blkback: Report VBD_WSECT (wr_sect) properly. xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests. xen-blkfront: plug device number leak in xlblk_init() error path xen-blkfront: If no barrier or flush is supported, use invalid operation. xen-blkback: use kzalloc() in favor of kmalloc()+memset() xen-blkback: fixed indentation and comments xen-blkfront: fix a deadlock while handling discard response xen-blkfront: Handle discard requests. xen-blkback: Implement discard requests ('feature-discard') xen-blkfront: add BLKIF_OP_DISCARD and discard request struct drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd() drivers/block/loop.c: emit uevent on auto release drivers/block/cpqarray.c: use pci_dev->revision loop: always allow userspace partitions and optionally support automatic scanning ... Fic up trivial header file includsion conflict in drivers/block/loop.c	2011-11-04 17:22:14 -07:00
Linus Torvalds	b4fdcb02f1	Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-block * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits) block: don't call blk_drain_queue() if elevator is not up blk-throttle: use queue_is_locked() instead of lockdep_is_held() blk-throttle: Take blkcg->lock while traversing blkcg->policy_list blk-throttle: Free up policy node associated with deleted rule block: warn if tag is greater than real_max_depth. block: make gendisk hold a reference to its queue blk-flush: move the queue kick into blk-flush: fix invalid BUG_ON in blk_insert_flush block: Remove the control of complete cpu from bio. block: fix a typo in the blk-cgroup.h file block: initialize the bounce pool if high memory may be added later block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown block: drop @tsk from attempt_plug_merge() and explain sync rules block: make get_request[_wait]() fail if queue is dead block: reorganize throtl_get_tg() and blk_throtl_bio() block: reorganize queue draining block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg() block: pass around REQ_* flags instead of broken down booleans during request alloc/free block: move blk_throtl prototypes to block/blk.h block: fix genhd refcounting in blkio_policy_parse_and_set() ... Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion and making the request functions be of type "void" instead of "int" in - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c} - drivers/staging/zram/zram_drv.c	2011-11-04 17:06:58 -07:00
Matthew Wilcox	f1938f6e1e	NVMe: Implement doorbell stride capability The doorbell stride allows devices to spread out their doorbells instead of packing them tightly. This feature was added as part of ECN 003. This patch also enables support for more than 512 queues :-) Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:05 -04:00
Matthew Wilcox	ce38c14957	NVMe: Version 0.7 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:05 -04:00
Matthew Wilcox	2b2c189687	NVMe: Don't probe namespace 0 ECN 001 documented that namespace 0 is not valid. Sending an Identify with CNS of 0 and Namespace of 0 is an undefined command. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Nisheeth Bhat	0d1bc91258	Fix calculation of number of pages in a PRP List The existing calculation underestimated the number of pages required as it did not take into account the pointer at the end of each page. The replacement calculation may overestimate the number of pages required if the last page in the PRP List is entirely full. By using ->npages as a counter as we fill in the pages, we ensure that we don't try to free a page that was never allocated. Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	bc5fc7e4b2	NVMe: Create nvme_identify and nvme_get_features functions Instead of open-coding calls to nvme_submit_admin_cmd, these small wrappers are simpler to use (the patch removes 14 lines from nvme_dev_add() for example). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	684f5c2025	NVMe: Fix memory leak in nvme_dev_add() The driver was allocating 8k of memory, then freeing 4k of it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Nisheeth Bhat	d1a490e026	NVMe: Fix calls to dma_unmap_sg dma_unmap_sg() must be called with the same 'nents' passed to dma_map_sg(), not the number returned from dma_map_sg(). Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	d0ba1e497b	NVMe: Correct sg list setup in nvme_map_user_pages Our SG list was constructed to always fill the entire first page, even if that was more than the length of the I/O. This is probably harmless, but some IOMMUs might do something bad. Correcting the first call to sg_set_page() made it look a lot closer to the sg_set_page() in the loop, so fold the first call to sg_set_page() into the loop. Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	6413214c5d	Fix bug in NVME_IOCTL_SUBMIT_IO Missing 'break' in the switch statement meant that we'd fall through to the 'return -EINVAL' case.	2011-11-04 15:53:04 -04:00
Matthew Wilcox	6bbf1acdde	NVMe: Rework ioctls Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE and ACTIVATE_FIRMWARE commands. Replace them with a generic ADMIN_CMD ioctl that can submit any admin command. Add a new ID ioctl that returns the namespace ID of the queried device. It corresponds to the SCSI Idlun ioctl. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	eac623ba7a	NVMe: Add the nvme thread to the wait queue before waking it up If the I/O was not completed by a single NVMe command, we add the bio to the congestion list and wake up the kthread to resubmit it. But the kthread calls remove_wait_queue() unconditionally, which will oops if it's not on the wait queue. So add the kthread to the wait queue before waking it up. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	6f0f54499f	NVMe: Return real error from nvme_create_queue nvme_setup_io_queues() was assuming that a NULL return from nvme_create_queue() was an out-of-memory error. That's not necessarily true; the adapter might return -EIO, for example. Change the calling convention to return an ERR_PTR on failure instead of NULL. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	be5e094840	NVMe: Version 0.6 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	184d2944cb	NVMe: Add a few calling convention notes For the benefit of reviewers, add comments to a few functions describing their calling context Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	b77954cbdd	NVMe: Handle failures from memory allocations in nvme_setup_prps If any of the memory allocations in nvme_setup_prps fail, handle it by modifying the passed-in data length to reflect the number of bytes we are actually able to send. Also allow the caller to specify the GFP flags they need; for user-initiated commands, we can use GFP_KERNEL allocations. The various callers are updated to handle this possibility; the main I/O path is already prepared for this possibility (as it may happen due to nvme_map_bio being unable to map all the segments of the I/O). The other callers return -ENOMEM instead of doing partial I/Os. Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	5aff9382dd	NVMe: Use an IDA to allocate minor numbers The current approach of using the namespace ID as the minor number doesn't work when there are multiple adapters in the machine. Rather than statically partitioning the number of namespaces between adapters, dynamically allocate minor numbers to namespaces as they are detected. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	fd63e9ceee	NVMe: Add include of delay.h for msleep Previously it was being implicitly included through some other header file Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	8de055350f	NVMe: Add support for timing out I/Os In the kthread, walk the list of outstanding I/Os and check they've not hit the timeout. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	21075bdee0	NVMe: Rename cancel_cmdid_data to cancel_cmdid The trailing '_data' on the end was annoying and inconsistent. Also, make it actually return the data since this is needed for timing out commands. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	09a58f5364	NVMe: Fix bug in error handling When an I/O completed with an error, we would call bio_endio twice (once with -EIO and once with 0). Found by inspection. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	22605f9681	NVMe: Time out initialisation after a few seconds THe device reports (in its capability register) how long it will take to initialise. If that time elapses before the ready bit becomes set, conclude the device is broken and refuse to initialise it. Log a nice error message so the user knows why we did nothing. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	aba2080f3f	NVMe: Fix warning in free_irq We need to clear the affinity mask before calling free_irq() Reported-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	7f53f9d242	NVMe: Correct the Controller Configuration settings The arbitration field was extended by one bit, shifting the shutdown notification bits by one. Also, the SQ/CQ entry size was made configurable for future extensions. Reported-by: Paul Luse <paul.e.luse@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	8ef700678f	NVMe: Version 0.5 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	6c7d49455c	NVMe: Change the definition of nvme_user_io The read and write commands don't define a 'result', so there's no need to copy it back to userspace. Remove the ability of the ioctl to submit commands to a different namespace; it's just asking for trouble, and the use case I have in mind will be addressed througha different ioctl in the future. That removes the need for both the block_shift and nsid arguments. Check that the opcode is one of 'read' or 'write'. Future opcodes may be added in the future, but we will need a different structure definition for them. The nblocks field is redefined to be 0-based. This allows the user to request the full 65536 blocks. Don't byteswap the reftag, apptag and appmask. Martin Petersen tells me these are calculated in big-endian and are transmitted to the device in big-endian. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	4948168280	NVMe: Add compat_ioctl Make ioctls work for 32-bit applications on 64-bit kernels. The structures are defined to be the same for both 32- and 64-bit applications, so we can use the same handler for both. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	9ecdc94621	NVMe: Simplify queue lookup Fill in all the num_possible_cpus() entries with duplicate pointers. This reduces the complexity of the frequently-called get_nvmeq(), as well as avoiding a bug in it when there are fewer queues than CPUs. Reported-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	3cb967c039	NVMe: Remove the kthread from the wait queue Once there are no more bios on the congestion list, we can stop waking up the nvme kthread every time a completion happens. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	7523d834dd	NVMe: Fix off-by-one when filling in PRP lists If the last element in the PRP list fits on the end of the page, there's no need to allocate an extra page to put that single element in. It can fit on the end of the page. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	ac88c36a38	NVMe: Fix interpretation of 'Number of Namespaces' field The spec says this is a 0s based value. We don't need to handle the maximal value because it's reserved to mean "every namespace". Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	19e899b2f9	NVMe: Remove outdated comments The head can never overrun the tail since we won't allocate enough command IDs to let that happen. The status codes are in sync with the spec. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	fa92282149	NVMe: Fix comment formatting Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	714a7a2288	NVMe: Convert comments to kernel-doc notation Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	b57ab0fada	NVMe: Version 0.4 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	e6d15f79f9	NVMe: Reduce maximum queue depth by 1 The spec says we're not allowed to completely fill the submission queue. Solve this by reducing the number of allocatable cmdids by 1. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	d8ee9d69f2	NVMe: Fix discontiguous accesses When we submit subsequent portions of the I/O, we need to access the updated block, not start reading again from the original position. This was showing up as miscompares in the XFS randholes testcase. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	1ad2f8932a	NVMe: Handle bios that contain non-virtually contiguous addresses NVMe scatterlists must be virtually contiguous, like almost all I/Os. However, when the filesystem lays out files with a hole, it can be that adjacent LBAs map to non-adjacent virtual addresses. Handle this by submitting one NVMe command at a time for each virtually discontiguous range. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	00df5cb4eb	NVMe: Implement Flush Linux implements Flush as a bit in the bio. That means there may also be data associated with the flush; if so the flush should be sent before the data. To avoid completing the bio twice, I add CMD_CTX_FLUSH to indicate the completion routine should do nothing. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	c42705592b	NVMe: Mark CMD_CTX_CANCELLED as being unlikely Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	7547881d09	NVMe: Correct SQ doorbell semantics The value written to the doorbell needs to be the first free index in the queue, not the most recently used index in the queue. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	740216fc59	NVMe: Let the kthread take care of devices earlier If interrupts are misconfigured, the kthread will be needed to process admin queue completions. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	b348b7d543	NVMe: Rename nr_queues to nr_io_queues I got confused about whether this included the admin queue or not, and had to resort to reading the spec. It doesn't include the admin queue, so make that clear in the name. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	ca1615424c	NVMe: Remove setting of 'flags' in rw command This was the data transfer bit until spec rev 0.92 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	ad8a5df97c	NVMe: Release 0.3 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	1fa6aeadf1	NVMe: Add a kthread to handle the congestion list Instead of trying to resubmit I/Os in the I/O completion path (in interrupt context), wake up a kthread which will resubmit I/O from user context. This allows mke2fs to run to completion. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	eeee322647	NVMe: Handle failures differently in nvme_submit_bio_queue() Return -EBUSY if the queue is full or -ENOMEM if we failed to allocate memory (or map a scatterlist). Also use GFP_ATOMIC to allocate the nvme_bio and move the locking to the callers of nvme_submit_bio_queue(). In nvme_make_request(), don't permit an I/O to jump the queue -- if the congestion list already has an entry, just add to the tail, rather than trying to submit. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	768308400f	NVMe: Handle physical merging of bvec entries In order to not overrun the sg array, we have to merge physically contiguous pages into a single sg entry. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	1974b1ae88	NVMe: Check for DMA mapping failure If dma_map_sg returns 0 (failure), we need to fail the I/O. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	d567760c40	NVMe: Pass the nvme_dev to nvme_free_prps and nvme_setup_prps We were passing the nvme_queue to access the q_dmadev for the dma_alloc_coherent calls, but since we moved to the dma pool API, we really only need the nvme_dev. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	99802a7aee	NVMe: Optimise memory usage for I/Os between 4k and 128k Add a second memory pool for smaller I/Os. We can pack 16 of these on a single page instead of using an entire page for each one. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	091b609258	NVMe: Switch to use DMA Pool API Calling dma_free_coherent from interrupt context causes warnings. Using the DMA pools delays freeing until pool destruction, so avoids the problem. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	d534df3c73	NVMe: Rename nvme_req_info to nvme_bio There are too many things called 'info' in this driver. This data structure is auxiliary information for a struct bio, so call it nvme_bio, or nbio when used as a variable. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Shane Michael Matthews	e025344c56	NVMe: Initial PRP List support Add a pointer to the nvme_req_info to hold a new data structure (nvme_prps) which contains a list of the pages allocated to this particular request for holding PRP list entries. nvme_setup_prps() now returns this pointer. To allocate and free the memory used for PRP lists, we need a struct device, so we need to pass the nvme_queue pointer to many functions which didn't use to need it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	51882d00f0	NVMe: Advance the sg pointer when filling in an sg list For multipage BIOs, we were always using sg[0] instead of advancing through the list. Oops :-) Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	d2d8703481	NVMe: Renumber the special context values If POISON_POINTER_DELTA isn't defined, ensure they're in page 0 which should never be mapped. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	9294bbed78	NVMe: Handle the congestion list a little better In the bio completion handler, check for bios on the congestion list for this NVM queue. Also, lock the congestion list in the make_request function as the queue may end up being shared between multiple CPUs. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	e85248e516	NVMe: Record the timeout for each command In addition to recording the completion data for each command, record the anticipated completion time. Choose a timeout of 5 seconds for normal I/Os and 60 seconds for admin I/Os. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	ec6ce618d6	NVMe: Need to lock queue during interrupt handling If we're sharing a queue between multiple CPUs and we cancel a sync I/O, we must have the queue locked to avoid corrupting the stack of the thread that submitted the I/O. It turns out this is the same locking that's needed for the threaded irq handler, so share that code. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	48e3d39816	NVMe: Detect command IDs completing that are out of range If the adapter completes a command ID that is outside the bounds of the array, return CMD_CTX_INVALID instead of random data, and print a message in the sync_completion handler (which is rapidly becoming the misc completion handler :-) Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	b36235df01	NVMe: Detect commands that are completed twice Set the context value to CMD_CTX_COMPLETED, and print a message in the sync_completion handler if we see it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	be7b62754e	NVMe: Use a symbolic name to represent cancelled commands instead of 0 I have plans for other special values in sync_completion. Plus, this is more self-documenting, and lets us detect bogus usages. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	58ffacb545	NVMe: Add a module parameter to use a threaded interrupt We're currently calling bio_endio from hard interrupt context. This is not a good idea for preemptible kernels as it will cause longer latencies. Using a threaded interrupt will run the entire queue processing mechanism (including bio_endio) in a thread, which can be preempted. Unfortuantely, it also adds about 7us of latency to the single-I/O case, so make it a module parameter for the moment. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	b1ad37efca	NVMe: Call put_nvmeq() before calling nvme_submit_sync_cmd() We can't have preemption disabled when we call schedule(). Accept the possibility that we'll get preempted, and it'll cost us some cacheline bounces. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	3c0cf138d7	NVMe: Allow fatal signals to interrupt I/O If the user sends a fatal signal, sleeping in the TASK_KILLABLE state permits the task to be aborted. The only wrinkle is making sure that if/when the command completes later that it doesn't upset anything. Handle this by setting the data pointer to 0, and checking the value isn't NULL in the sync completion path. Eventually, bios can be cancelled through this path too. Note that the cmdid isn't freed to prevent reuse. We should also abort the command in the future, but this is a good start. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	db5d0c198d	NVMe: Release 0.2 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Matthew Wilcox	6ee44cdced	NVMe: Add download / activate firmware ioctls Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Matthew Wilcox	388f037f4e	NVMe: Move sysfs entries to the right place Because I wasn't setting driverfs_dev, the devices were showing up under /sys/devices/virtual/block. Now they appear underneath the PCI device which they belong to. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Shane Michael Matthews	5911f20039	NVMe: Disable the device before we write the admin queues In case the card has been left in a partially-configured state, write 0 to the Enable bit. Signed-off-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Matthew Wilcox	574e8b95bc	NVMe: Request I/O regions Calling pci_request_selected_regions() reserves these regions for our use. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00

1 2 3 4 5 ...

2252 Commits