linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-15 10:26:42 +07:00

Author	SHA1	Message	Date
Jens Axboe	849c6e7746	NVMe: fix race condition in nvme_submit_sync_cmd() If we have a race between the schedule timing out and the command completing, we could have the task issuing the command exit nvme_submit_sync_cmd() while the irq is running sync_completion(). If that happens, we could be corrupting memory, since the stack that held 'cmdinfo' is no longer valid. Fix this by always calling nvme_abort_cmd_info(). Once that call completes, we know that we have either run sync_completion() if the completion came in, or that we will never run it since we now have special_completion() as the command callback handler. Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-12-12 08:53:40 -07:00
Jens Axboe	fe54303ee2	NVMe: fix retry/error logic in nvme_queue_rq() The logic around retrying and erroring IO in nvme_queue_rq() is broken in a few ways: - If we fail allocating dma memory for a discard, we return retry. We have the 'iod' stored in ->special, but we free the 'iod'. - For a normal request, if we fail dma mapping of setting up prps, we have the same iod situation. Additionally, we haven't set the callback for the request yet, so we also potentially leak IOMMU resources. Get rid of the ->special 'iod' store. The retry is uncommon enough that it's not worth optimizing for or holding on to resources to attempt to speed it up. Additionally, it's usually best practice to free any request related resources when doing retries. Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-12-11 13:58:39 -07:00
Indraneel M	285dffc910	NVMe: Fix FS mount issue (hot-remove followed by hot-add) After Hot-remove of a device with a mounted partition, when the device is hot-added again, the new node reappears as nvme0n1. Mounting this new node fails with the error: mount: mount /dev/nvme0n1p1 on /mnt failed: File exists. The old nodes's FS entries still exist and the kernel can't re-create procfs and sysfs entries for the new node with the same name. The patch fixes this issue. Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Indraneel M <indraneel.m@samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-12-11 08:24:18 -07:00
Jens Axboe	97fe383222	NVMe: fix error return checking from blk_mq_alloc_request() We return an error pointer or the request, not NULL. Half the call paths got it right, the others didn't. Fix those up. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-12-10 13:02:44 -07:00
Sam Bradshaw	c87fd5407e	NVMe: fix freeing of wrong request in abort path We allocate 'abort_req', but free 'req' in case of an error submitting the IO. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-12-10 13:00:31 -07:00
Keith Busch	9af8785a38	NVMe: Fix command setup on IO retry On retry, the req->special is pointing to an already setup IOD, but we still need to setup the command context and callback, otherwise you'll see false twice completed errors and leak requests. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-12-03 19:10:45 -07:00
Keith Busch	c78b47136f	NVMe: Update module version major number It's already near impossible to tell what bits someone is running based on a 'modinfo nvme', and I don't want to try guessing if someone is running blk-mq or bio-based. Let's make it obvious with the module version that the blk-mq conversion is a major change. Future bio-based versions can increment to 0.10 in a fork if revisions occur. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-21 18:22:35 -07:00
Jens Axboe	be7837e89d	NVMe: fail pci initialization if the device doesn't have any BARs The PCI init of NVMe doesn't check for valid bars before proceeding to map and use BAR 0. If the device is hosed (or firmware is), then we should catch this case and give up early. This fixes a: [ 1662.035778] WARNING: CPU: 0 PID: 4 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xa7/0xc0() and later badness on such a device. Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-20 11:10:06 -07:00
Jens Axboe	2c30540b38	NVMe: add ->exit_hctx() hook If we do teardown and setup of the queue and block related parts of the driver, then we should clear nvmeq->hctx once we kill the hardware queue. Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-20 11:10:03 -07:00
Jens Axboe	e32efbfc35	NVMe: make setup work for devices that don't do INTx The setup/probe part currently relies on INTx being there and working, that's not always the case. For devices that don't advertise INTx, enable a single MSIx vector early on and disable it again before we ask for our full range of queue vecs. Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-19 12:46:17 -07:00
Jens Axboe	1397688953	NVMe: enable IO stats by default Before the blk-mq conversion they were on by default, we should not change behavior there. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-18 08:45:31 -07:00
Jens Axboe	6dcc0cf6cb	NVMe: nvme_submit_async_admin_req() must use atomic rq allocation We are called for async event notification issues, and the nvmeq lock is already held. If we fail the request allocation, we'll just retry next time. Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-18 08:21:18 -07:00
Jens Axboe	9d135bb8c2	NVMe: replace blk_put_request() with blk_mq_free_request() No point in using blk_put_request(), since we know we are blk-mq. This only makes sense in core code where we could be dealing with either legacy or blk-mq drivers. Additionally, use blk_mq_free_hctx_request() for the request completion fast path, where we already know the mapping from request to hardware queue. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-17 10:43:42 -07:00
kbuild test robot	a64e6bb4db	NVMe: __nvme_submit_admin_cmd() can be static drivers/block/nvme-core.c:865:5: sparse: symbol '__nvme_submit_admin_cmd' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-10 09:27:01 -07:00
Dan Carpenter	9f173b3384	NVMe: blk_mq_alloc_request() returns error pointers We recently converted this to blk_mq but the error checks have to be updated to check for IS_ERR() instead of NULL. Fixes: `a4aea5623d` ('NVMe: Convert to blk-mq') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-05 13:47:03 -07:00
Matias Bjørling	a4aea5623d	NVMe: Convert to blk-mq This converts the NVMe driver to a blk-mq request-based driver. The NVMe driver is currently bio-based and implements queue logic within itself. By using blk-mq, a lot of these responsibilities can be moved and simplified. The patch is divided into the following blocks: * Per-command data and cmdid have been moved into the struct request field. The cmdid_data can be retrieved using blk_mq_rq_to_pdu() and id maintenance are now handled by blk-mq through the rq->tag field. * The logic for splitting bio's has been moved into the blk-mq layer. The driver instead notifies the block layer about limited gap support in SG lists. * blk-mq handles timeouts and is reimplemented within nvme_timeout(). This both includes abort handling and command cancelation. * Assignment of nvme queues to CPUs are replaced with the blk-mq version. The current blk-mq strategy is to assign the number of mapped queues and CPUs to provide synergy, while the nvme driver assign as many nvme hw queues as possible. This can be implemented in blk-mq if needed. * NVMe queues are merged with the tags structure of blk-mq. * blk-mq takes care of setup/teardown of nvme queues and guards invalid accesses. Therefore, RCU-usage for nvme queues can be removed. * IO tracing and accounting are handled by blk-mq and therefore removed. * Queue suspension logic is replaced with the logic from the block layer. Contributions in this patch from: Sam Bradshaw <sbradshaw@micron.com> Jens Axboe <axboe@fb.com> Keith Busch <keith.busch@intel.com> Robert Nelson <rlnelson@google.com> Acked-by: Keith Busch <keith.busch@intel.com> Acked-by: Jens Axboe <axboe@fb.com> Updated for new ->queue_rq() prototype. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:18:52 -07:00
Keith Busch	9dbbfab7d5	NVMe: Do not over allocate for discard requests Discard requests are often for very large ranges. The discard size is not representative of the data transfer size so we don't need to allocate for such a large prp list. This patch requests allocating only enough for the memory needed for the data transfer and saves a little over 8k of memory per max discard request. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:18:37 -07:00
Keith Busch	9e60352cf8	NVMe: Do not open disks that are being deleted It is possible the block layer will request to open a block device after the driver deleted it. Subsequent releases will cause a double free, or the disk's private_data is pointing to freed memory. This patch protects the driver's freed disks from being opened and accessed: the nvme namespaces are freed only when the device's refcount is 0, so at that moment there were no active openers and no more should be allowed, and it is safe to clear the disk's private_data that is about to be freed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Henry Chow <henry.chow@oracle.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:18:32 -07:00
Keith Busch	5940c8578f	NVMe: Clear QUEUE_FLAG_STACKABLE The nvme namespace request_queue's flags are initialized to QUEUE_FLAG_DEFAULT, which currently sets QUEUE_FLAG_STACKABLE. The device-mapper indicates this flag means the block driver is requset based, though this driver is bio-based and problems will occur if an nvme namespace is used with a request based dm device. This patch clears the stackable flag. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:18:10 -07:00
Keith Busch	387caa5a2b	NVMe: Fix device probe waiting on kthread If we ever do parallel device probing, we need to wake up all processes waiting for nvme kthread to start, not just one. This is currently serialized so the bug is not reachable today, but fixing this anyway in the hopes we implement parallel or asynchronous probe in the future. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:10 -07:00
Keith Busch	7963e52181	NVMe: Passthrough IOCTL for IO commands The NVME_IOCTL_SUBMIT_IO only works for IO commands with block data transfers and isn't usable for other NVMe commands like flush, data set management, or any sort of vendor unique command. The NVME_IOCTL_ADMIN_CMD, however, can easily be modified to accept arbitrary IO commands in addition to arbitrary admin commands without breaking backward compatibility. This patch just adds a new IOCTL to distinguish if the driver should submit the command on an IO or Admin queue. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	1b9dbf7fe0	NVMe: Add revalidate_disk callback This adds a callback to revalidate the disk and change its block size and capacity if needed. Before, a user would have to remove + rescan an entire device if they changed the logical block size using an NVMe Format or other vendor specific command; now they can just run something that issues the BLKRRPART IOCTL, like # hdparm -z /dev/nvmeXnY This can also be used in response to the 1.2 Spec's Namespace Attribute Change asynchronous event. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	7be50e93fb	NVMe: Fix nvmeq waitqueue entry initialization We need to update the nvme queue's wait_queue_t entry during each initialization since the nvme_thread may be ended and restarted when the device is reset. If a device reset occurs during a large amount of buffered IO, it would take a lot longer to complete the outstanding requests due to the 1 second polling instead of waking up as completions occur. Fixes: `b9afca3efb` Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	b4ff9c8ddb	NVMe: Translate NVMe status to errno This returns a more appropriate error for the "capacity exceeded" status. In case other NVMe statuses have a better errno, this patch adds a convience function to translate an NVMe status code to an errno for IO commands, defaulting to the current -EIO. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	e179729a82	NVMe: Remove duplicate compat SG_IO code We can return -ENOIOCTLCMD and the ioctl will be handled by fs/compat_ioctl.c instead. This removes a lot of duplicate code in the nvme driver. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	a96d4f5c2d	NVMe: Reference count pci device If an nvme device is removed but user space has an open reference, the nvme driver would have been holding an invalid reference to its pci device. You may get a general protection fault on x86 h/w when the driver uses that reference in dma_map_sg(), as is done in nvme_map_user_pages() from the IOCTL interface. This patch fixes the fault by taking a reference on the pci device and holding it even after device removal until all opens on the nvme device are closed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Nilesh Choudhury <nilesh.choudhury@oracle.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Andreea-Cristina Bernat	062261be4e	nvme: Replace rcu_assign_pointer() with RCU_INIT_POINTER() The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Sam Bradshaw	5905535610	NVMe: Correctly handle IOCTL_SUBMIT_IO when cpus > online queues nvme_submit_io_cmd() uses smp_processor_id() to pick an IO queue index. This patch fixes the case where there are more cpus from which the ioctl call can originate than online queues, which can happen when a device supports or was allocated fewer interrupt vectors than exist cpu cores. Thanks to Keith Busch for the implementation suggestion. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	302c6727e5	NVMe: Fix filesystem sync deadlock on removal This changes the order of deleting the gendisks so it happens after the nvme IO queues are freed. If a device is removed while a filesystem has associated dirty data, the removal will wait on these to complete before proceeding from del_gendisk, which could have caused deadlock before. The implication of this is that an orderly removal of a responsive device won't necessarily wait for dirty data to be written, but we are not guaranteed the device is even going to respond at this point either. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	f435c2825b	NVMe: Call nvme_free_queue directly Rather than relying on call_rcu, this patch directly frees the nvme_queue's memory after ensuring no readers exist. Some arch specific dma_free_coherent implementations may not be called from a call_rcu's soft interrupt context, hence the change. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Matthew Minter <matthew_minter@xyratex.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Dan McLeran	2484f40780	NVMe: Add shutdown timeout as module parameter. The current implementation hard-codes the shutdown timeout to 2 seconds. Some devices take longer than this to complete a normal shutdown. Changing the shutdown timeout to a module parameter with a default timeout of 5 seconds. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	7c1b245038	NVMe: Skip orderly shutdown on failed devices Rather than skipping shutdown only for devices that have been removed, skip the orderly shutdown on failed devices to avoid the long timeout handling that inevitably happens when deleting queues on such a device. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	a67394790a	NVMe: Whitespace fixes Fixing tabs inadvertently converted to spaces. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	c81f49758a	NVMe: Use pci_stop_and_remove_bus_device_locked() Race conditions are theoretically possible between the NVMe PCI device removal and the generic PCI bus rescan and device removal that can be triggered via sysfs. To avoid those race conditions make the NVMe code use pci_stop_and_remove_bus_device_locked(). Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Keith Busch	badc34d415	NVMe: Handling devices incapable of I/O This is a minor refactor for handling devices that are incapable of IO. The driver previously used special error codes to know that IO queues are unavailable, but we have an online queue count now. This also fixes an issue where the driver successfully sets the queue count, but either is unable to allocate an IO queue or the device can't create one for some reason. If the driver can successfully enable the device and get responses to admin commands, the driver will bring up a character device for managment but not create block devices. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Dan McLeran	01079522f9	NVMe: Change nvme_enable_ctrl to set EN and manage CC thru ctrl_config. Change the behavior of nvme_enable_ctrl to set EN. Clear CC.SH for both nvme_enable_ctrl and nvme_disable_ctrl. Remove reading of the CC register and manage the state in dev->ctrl_config. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> [removed an unwanted write to CC] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Keith Busch	1d09062460	NVMe: Mismatched host/device page size support Adds support for devices with max page size smaller than the host's. In the case we encounter such a host/device combination, the driver will split a page into as many PRP entries as necessary for the device's page size capabilities. If the device's reported minimum page size is greater than the host's, the driver will not attempt to enable the device and return an error instead. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Keith Busch	6fccf9383b	NVMe: Async event request Submits NVMe asynchronous event requests, one event up to the controller maximum or number of possible different event types (8), whichever is smaller. Events successfully returned by the controller are logged. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Joe Perches	4d51abf9bc	block: Use dma_zalloc_coherent Use the zeroing function instead of dma_alloc_coherent & memset(,0,) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Mike Snitzer	b277da0a8a	block: disable entropy contributions for nonrot devices Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set QUEUE_FLAG_NONROT. Historically, all block devices have automatically made entropy contributions. But as previously stated in commit `e2e1a148` ("block: add sysfs knob for turning off disk entropy contributions"): - On SSD disks, the completion times aren't as random as they are for rotational drives. So it's questionable whether they should contribute to the random pool in the first place. - Calling add_disk_randomness() has a lot of overhead. There are more reliable sources for randomness than non-rotational block devices. From a security perspective it is better to err on the side of caution than to allow entropy contributions from unreliable "random" sources. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-10-04 10:55:32 -06:00
Linus Torvalds	b55b390202	Merge git://git.infradead.org/users/willy/linux-nvme Pull NVMe update from Matthew Wilcox: "Mostly bugfixes again for the NVMe driver. I'd like to call out the exported tracepoint in the block layer; I believe Keith has cleared this with Jens. We've had a few reports from people who're really pounding on NVMe devices at scale, hence the timeout changes (and new module parameters), hotplug cpu deadlock, tracepoints, and minor performance tweaks" [ Jens hadn't seen that tracepoint thing, but is ok with it - it will end up going away when mq conversion happens ] * git://git.infradead.org/users/willy/linux-nvme: (22 commits) NVMe: Fix START_STOP_UNIT Scsi->NVMe translation. NVMe: Use Log Page constants in SCSI emulation NVMe: Define Log Page constants NVMe: Fix hot cpu notification dead lock NVMe: Rename io_timeout to nvme_io_timeout NVMe: Use last bytes of f/w rev SCSI Inquiry NVMe: Adhere to request queue block accounting enable/disable NVMe: Fix nvme get/put queue semantics NVMe: Delete NVME_GET_FEAT_TEMP_THRESH NVMe: Make admin timeout a module parameter NVMe: Make iod bio timeout a parameter NVMe: Prevent possible NULL pointer dereference NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD) NVMe: Update data structures for NVMe 1.2 NVMe: Enable BUILD_BUG_ON checks NVMe: Update namespace and controller identify structures to the 1.1a spec NVMe: Flush with data support NVMe: Configure support for block flush NVMe: Add tracepoints NVMe: Protect against badly formatted CQEs ...	2014-06-15 15:58:03 -10:00
Keith Busch	f3db22feb5	NVMe: Fix hot cpu notification dead lock There is a potential dead lock if a cpu event occurs during nvme probe since it registered with hot cpu notification. This fixes the race by having the module register with notification outside of probe rather than have each device register. The actual work is done in a scheduled work queue instead of in the notifier since assigning IO queues has the potential to block if the driver creates additional queues. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-13 10:43:34 -04:00
Matthew Wilcox	bd67608a61	NVMe: Rename io_timeout to nvme_io_timeout It's positively immoral to have a global variable called 'io_timeout'. Keep the module parameter called io_timeout, though. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 23:04:30 -04:00
Sam Bradshaw	b4e75cbf13	NVMe: Adhere to request queue block accounting enable/disable Recently, a new sysfs control "iostats" was added to selectively enable or disable io statistics collection for request queues. This patch hooks that control. IO statistics collection is rather expensive on large, multi-node machines with drives pushing millions of iops. Having the ability to disable collection if not needed can improve throughput significantly. As a data point, on a quad E5-4640, I see more than 50% throughput improvement when io statistics accounting is disabled during heavily multi-threaded small block random read benchmarks where device performance is in the million iops+ range. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 22:58:54 -04:00
Keith Busch	a51afb5433	NVMe: Fix nvme get/put queue semantics The routines to get and lock nvme queues required the caller to "put" or "unlock" them even if getting one returned NULL. This patch fixes that. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 22:58:34 -04:00
Keith Busch	9d43cf646e	NVMe: Make admin timeout a module parameter Signed-off-by: Keith Busch <keith.busch@intel.com> [made admin_timeout static] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 22:57:56 -04:00
Keith Busch	61e4ce086d	NVMe: Make iod bio timeout a parameter This was originally set to 4 times the IO timeout, but that was when the IO timeout was 5 seconds instead of 30. 20 seconds for total time to failure seemed more reasonable than 2 minutes for most, but other users have requested to make this a module parameter instead. Signed-off-by: Keith Busch <keith.busch@intel.com> [renamed the module parameter to retry_time] [made retry_time static] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 22:56:43 -04:00
Santosh Y	6808c5fb7f	NVMe: Prevent possible NULL pointer dereference kmalloc() used by the nvme_alloc_iod() to allocate memory for 'iod' can fail. So check the return value. Signed-off-by: Santosh Y <santosh.sy@samsung.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 16:43:30 -04:00
Keith Busch	f0d54a5417	NVMe: Implement PCIe reset notification callback Quiesce and shutdown the device prior to reset, then restart the device and resume IO after. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-05-27 11:12:29 -06:00
Matthew Wilcox	21bd78bcf4	NVMe: Enable BUILD_BUG_ON checks Since _nvme_check_size() wasn't being called from anywhere, the compiler was optimising it away ... along with all the link-time build failures that would result if any of the structures were the wrong size. Call it from nvme_exit() for no particular reason. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-05-09 22:42:39 -04:00

1 2 3

136 Commits