Commit Graph

589142 Commits

Author SHA1 Message Date
Bob Liu
7b427a5953 xen-blkfront: save uncompleted reqs in blkfront_resume()
Uncompleted reqs used to be 'saved and resubmitted' in blkfront_recover() during
migration, but that's too late after multi-queue was introduced.

After a migrate to another host (which may not have multiqueue support), the
number of rings (block hardware queues) may be changed and the ring and shadow
structure will also be reallocated.

The blkfront_recover() then can't 'save and resubmit' the real
uncompleted reqs because shadow structure have been reallocated.

This patch fixes this issue by moving the 'save' logic out of
blkfront_recover() to earlier place in blkfront_resume().

The 'resubmit' is not changed and still in blkfront_recover().

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@vger.kernel.org
2016-06-29 12:32:39 -04:00
Bob Liu
2a6f71ad99 xen-blkfront: fix resume issues after a migration
After a migrate to another host (which may not have multiqueue
support), the number of rings (block hardware queues)
may be changed and the ring info structure will also be reallocated.

This patch fixes two related bugs:
 * call blk_mq_update_nr_hw_queues() to make blk-core know the number
   of hardware queues have been changed.
 * Don't store rinfo pointer to hctx->driver_data, because rinfo may be
   reallocated so use hctx->queue_num to get the rinfo structure instead.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-06-08 13:54:46 -04:00
Bob Liu
efd1535270 xen-blkfront: don't call talk_to_blkback when already connected to blkback
Sometimes blkfront may twice receive blkback_changed() notification
(XenbusStateConnected) after migration, which will cause
talk_to_blkback() to be called twice too and confuse xen-blkback.

The flow is as follow:
   blkfront                                        blkback
blkfront_resume()
 > talk_to_blkback()
  > Set blkfront to XenbusStateInitialised
                                                front changed()
                                                 > Connect()
                                                  > Set blkback to XenbusStateConnected

blkback_changed()
 > Skip talk_to_blkback()
   because frontstate == XenbusStateInitialised
 > blkfront_connect()
  > Set blkfront to XenbusStateConnected

-----
And here we get another XenbusStateConnected notification leading
to:
-----
blkback_changed()
 > because now frontstate != XenbusStateInitialised
   talk_to_blkback() is also called again
  > blkfront state changed from
  XenbusStateConnected to XenbusStateInitialised
    (Which is not correct!)

						front_changed():
                                                 > Do nothing because blkback
                                                   already in XenbusStateConnected

Now blkback is in XenbusStateConnected but blkfront is still
in XenbusStateInitialised - leading to no disks.

Poking of the XenbusStateConnected state is allowed (to deal with
block disk change) and has to be dealt with. The most likely
cause of this bug are custom udev scripts hooking up the disks
and then validating the size.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-06-08 13:54:39 -04:00
Javier González
116f7d4a21 lightnvm: reserved space calculation incorrect
The nvm_dev->max_pages_per_blk variable was removed in favor of the new
nvm->sec_per_blk variable. The ->max_pages_per_blk variable was still
used in rrpc_capacity, reporting the reserved capacity to zero. Replace
with ->sec_per_blk to calculate the reserved area again.

Signed-off-by: Javier González <javier@cnexlabs.com>
Updated patch description. Was "lightnvm: eliminate redundant variable"
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
6d5be9590b lightnvm: rename nr_pages to nr_ppas on nvm_rq
The number of ppas contained on a request is not necessarily the number
of pages that it maps to neither on the target nor on the device side.
In order to avoid confusion, rename nr_pages to nr_ppas since it is what
the variable actually contains.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
df414b33bb lightnvm: add is_cached entry to struct ppa_addr
A target requires a method to identify PPAs that are either cached in
memory or on disk. This can efficiently be maintained within the PPA.
The target host-side translation table can then lookup a PPA and know
from the PPA if it is cached or on disk. In the case it is cached, it is
the responsibility of the target to maintain this cache.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
04a8aa173b lightnvm: expose gennvm_mark_blk to targets
Targets can update a block state when having a reference to an
in-memory virtual block. In the case that a target does not keep the
block metadata in memory, it does not have a way to update this
structure.

Therefore, expose gennvm_mark_blk() through the media managers
->mark_blk() callback and let targets update the state structure through
this callback.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
976bdfcae3 lightnvm: remove mgt targets on mgt removal
Targets associated with a device manager are not freed on device
removal. They have to be manually removed before shutdown. Make sure
any outstanding targets are freed upon shutdown.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Arnd Bergmann
45bbd0529e lightnvm: pass dma address to hardware rather than pointer
A recent change to lightnvm added code to pass a kernel pointer
to the hardware, which gcc complained about:

drivers/nvme/host/lightnvm.c: In function 'nvme_nvm_rqtocmd':
drivers/nvme/host/lightnvm.c:472:32: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  c->ph_rw.metadata = cpu_to_le64(rqd->meta_list);

It looks like this has no way of working anyway, so this changes
the code to pass the dma_address instead. This was most likely
what was intended here. Neither of the two are currently ever
written to, so the effect is the same for now.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: a34b1eb78e21 ("lightnvm: enable metadata to be sent to device")
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
cca87bc9d3 lightnvm: do not assume sequential lun alloc.
When doing GC, rrpc calculates the physical LUN to which the rrpc block
belongs too. This calculation is based on the assumption that LUNs are
assigned sequentially to the LUN list. Use the reference to the LUN
instead. This saves us the calculation and allows us to align LUNs in a
different manner to, for example, take advantage of devide parallelism.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Sagi Grimberg
b86d8d363e nvme/lightnvm: Log using the ctrl named device
Align with the rest of the nvme subsystem.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
75b8564932 lightnvm: rename dma helper functions
Until now, the dma pool have been exclusively used to allocate the ppa
list being sent to the device. In pblk (upcoming), we use these pools to
allocate metadata too. Thus, we generalize the names of some variables
on the dma helper functions to make the code more readable.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
003fad376b lightnvm: enable metadata to be sent to device
Enable metadata buffer to be sent to the device through the metadata
field on the physical rw nvme command. The size of the metadata buffer
must follow dev->oob_size * # of PPAs.

Signed-off-by: Javier González <javier@cnexlabs.com>
Updated description.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
57682b4915 lightnvm: do not free unused metadata on rrpc
rrpc does not save any metadata on a given request. Thus, do not attempt
to free the metadata dma region.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
293a6e8e27 lightnvm: fix out of bound ppa lun id on bb tbl
The ppa configured for retrieving the bad block table uses the internal
lun id to setup the get bad block ppa. This increases monotonically
with the number luns available. When configuring a ppa, the channel and
lun must be specified separately, leading to an out of bound memory
access in gennvm_block_bb when lun id goes beyond the luns available
within a channel.

Additional, remove out of bound check in gennvm_block_bb(), as it was a
buggy to begin with.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
00ee6cc3b7 lightnvm: refactor set_bb_tbl for accepting ppa list
The set_bb_tbl takes struct nvm_rq and only uses its ppa_list and
nr_pages internally. Instead, make these two variables explicit.
This allows a user to call it without initializing a struct nvm_rq
first.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
a63d5cf203 lightnvm: move responsibility for bad blk mgmt to target
We move the responsibility of managing the persistent bad block table to
the target. The target may choose to mark a block bad or retry writing
to it. Never the less, it should be the target that makes the decision
and not the media manager.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
5ebc7d9fe1 lightnvm: make nvm_set_rqd_ppalist() aware of vblks
A virtual block enables a block to identify multiple physical blocks.
This is useful for metadata where a device media supports multiple
planes. In that case, a block, with multiple planes can be managed
as a single vblk. Reducing the metadata required by one forth.

nvm_set_rqd_ppalist() takes care of expanding a ppa_list with vblks
automatically. However, for some use-cases, where only a single physical
block is required, the ppa_list should not be expanded.

Therefore, add a vblk parameter to nvm_set_rqd_ppalist(), and only
expand the ppa_list if vblk is set.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
6659d4d80c lightnvm: remove struct factory_blks
Now that device ops->get_bb_table no longer uses a callback, the
struct factory_blks can be removed.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
e11903f5df lightnvm: refactor device ops->get_bb_tbl()
The device ops->get_bb_tbl() takes a callback, that allows the caller
to use its own callback function to update its data structures in the
returning function.

This makes it difficult to send parameters to the callback, and usually
is circumvented by small private structures, that both carry the callers
state and any flags needed to fulfill the update.

Refactor ops->get_bb_tbl() to fill a data buffer with the status of the
blocks returned, and let the user call the callback function manually.
That will provide the necessary flags and data structures and simplify
the logic around ops->get_bb_tbl().

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
5136061ce7 lightnvm: introduce nvm_for_each_lun_ppa() macro
Users that wish to iterate all luns on a device. Must create a
struct ppa_addr and separate iterators for channels and luns. To set the
iterators, two loops are required, one to iterate channels, and another
to iterate luns. This leads to decrease in readability.

Introduce nvm_for_each_lun_ppa, which implements the nested loop and
sets ppa, channel, and lun variable for each loop body, eliminating
the boilerplate code.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Simon A. F. Lund
6f8645cba5 lightnvm: refactor dev->online_target to global nvm_targets
A target name must be unique. However, a per-device registration of
targets is maintained on a dev->online_targets list, with a per-device
search for targets upon registration.

This results in a name collision when two targets, with the same name,
are created on two different targets, where the per-device list is not
shared.

Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Simon A. F. Lund
6063fe399d lightnvm: rename nvm_targets to nvm_tgt_type
The functions nvm_register_target(), nvm_unregister_target() and
associated list refers to a target type that is being registered by a
target type module. Rename nvm_*_targets() to nvm_*_tgt_type(), so that
the intension is clear.

This enables target instances to use the _nvm_*_targets() naming.

Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Wenwei Tao
909049a719 lightnvm: store rrpc->soffset in device sector size
Since we mainly use soffset in device sector size, we therefore store
this value in rrpc->soffset, instead of the offset in 512byte sector
size. This eliminates the "(ilog2(dev->sec_size) - 9)" calculation on
each I/O.

Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Updated patch description.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Wenwei Tao
66e3d07f75 lightnvm: calculate rrpc total blocks and sectors up front
Calculate rrpc total blocks and sectors up front, make sense
to use them. For example, we use rrpc->nr_sects to calculate rrpc
area size, but it makes no sense if we don't initialize it up front,
since it would be zero until we finish rrpc luns init.

Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
7f7c5d03c0 lightnvm: avoid memory leak when lun_map kcalloc fails
A memory leak occurs if the lower page table is initialized and the
following dev->lun_map fails on allocation.

Rearrange the initialization of lower page table to allow dev->lun_map
to fail gracefully without memory leak.

Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Move kfree of dev->lun_map to nvm_free()
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
22e8c9766a lightnvm: move block fold outside of get_bb_tbl()
The get block table command returns a list of blocks and planes
with their associated state. Users, such as gennvm and sysblk,
manages all planes as a single virtual block.

It was therefore  natural to fold the bad block list before it is
returned. However, to allow users, which manages on a per-plane
block level, to also use the interface, the get_bb_tbl interface is
changed to not fold by default and instead let the caller fold if
necessary.

Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
4891d120b9 lightnvm: add fpg_size and pfpg_size to struct nvm_dev
The flash page size (fpg) and size across planes (pfpg) are convenient
to know when allocating buffer sizes. This has previously been a
calculated in various places. Replace with the pre-calculated values.

Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
1145e6351a lightnvm: implement nvm_submit_ppa_list
The nvm_submit_ppa function assumes that users manage all plane
blocks as a single block. Extend the API with nvm_submit_ppa_list
to allow the user to send its own ppa list. If the user submits more
than a single PPA, the user must take care to allocate and free
the corresponding ppa list.

Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
ecfb40c6aa lightnvm: handle submit_io failure
The device ->submit_io() callback might fail to submit I/O to device.
In that case, the nvm_submit_ppa function should not wait for
completion. Instead return the ->submit_io() error.

Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Jeff Mahoney
57aac2f1be lightnvm: fix "warning: ‘ret’ may be used uninitialized"
This fixes the following warnings:
drivers/lightnvm/sysblk.c:125:9: warning: ‘ret’ may be used
uninitialized in this function

drivers/lightnvm/sysblk.c:275:15: warning: ‘ret’ may be used
uninitialized in this function

In both cases, ret is only set from within a loop that may not be entered.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Keith Busch
87c3207781 NVMe: Fix reset/remove race
This fixes a scenario where device is present and being reset, but a
request to unbind the driver occurs.

A previous patch series addressing a device failure removal scenario
flushed reset_work after controller disable to unblock reset_work waiting
on a completion that wouldn't occur. This isn't safe as-is. The broken
scenario can potentially be induced with:

  modprobe nvme && modprobe -r nvme

To fix, the reset work is flushed immediately after setting the controller
removing flag, and any subsequent reset will not proceed with controller
initialization if the flag is set.

The controller status must be polled while active, so the watchdog timer
is also left active until the controller is disabled to cleanup requests
that may be stuck during namespace removal.

[Fixes: ff23a2a15a]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-03 14:00:29 -06:00
Ming Lin
b7b9c22787 nvme: fix nvme_ns_remove() deadlock
On receipt of a namespace attribute changed AER, we acquire the
namespace mutex lock before proceeding to scan and validate the
namespace list. In case of namespace detach/delete command,
nvme_ns_remove function deadlocks trying to acquire the already held
lock.

All callers, except nvme_remove_namespaces(), of nvme_ns_remove()
already held namespaces_mutex. So we can simply fix the deadlock by
not acquiring the mutex in nvme_ns_remove() and acquiring it in
nvme_remove_namespaces().

Reported-by: Sunad Bhandary S <sunad.s@samsung.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimerg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:16:13 -06:00
Ming Lin
0bf77e9dbb nvme: switch to RCU freeing the namespace
Switch to RCU freeing the namespace structure so that
nvme_start_queues, nvme_stop_queues and nvme_kill_queues would
be able to get away with only a RCU read side critical section.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimerg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:16:11 -06:00
Wang Sheng-Hui
a5b714ad39 NVMe: correct comment for offset enum of controller registers in nvme.h
Section 3.1 gives the comment for the offset of controller registers
in the specification 1.2a.

Some are mis-copied in the header file nvme.h. Correct them.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:13:35 -06:00
Ming Lin
6904242db1 nvme: add helper nvme_cleanup_cmd()
This hides command cleanup into nvme.h and fabrics drivers will
also use it.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:11:58 -06:00
Christoph Hellwig
f866fc4282 nvme: move AER handling to common code
The transport driver still needs to do the actual submission, but all the
higher level code can be shared.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:09:26 -06:00
Christoph Hellwig
5955be2144 nvme: move namespace scanning to core
Move the scan work item and surrounding code to the common code.  For now
we need a new finish_scan method to allow the PCI driver to set the
irq affinity hints, but I have plans in the works to obsolete this as well.

Note that this moves the namespace scanning from nvme_wq to the system
workqueue, but as we don't rely on namespace scanning to finish from reset
or I/O this should be fine.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:09:24 -06:00
Christoph Hellwig
92911a55d4 nvme: tighten up state check for namespace scanning
We only should be scanning namespaces if the controller is live.  Currently
we call the function just before setting it live, so fix the code up to
move the call to nvme_queue_scan to just below the state change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:09:23 -06:00
Christoph Hellwig
bb8d261e08 nvme: introduce a controller state machine
Replace the adhoc flags in the PCI driver with a state machine in the
core code.  Based on code from Sagi Grimberg for the Fabrics driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:09:21 -06:00
Christoph Hellwig
04a934d4c7 nvme: remove the io_incapable method
It's unused since "NVMe: Move error handling to failed reset handler".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:09:20 -06:00
Wang Sheng-Hui
23bd63ceea NVMe: nvme_core_exit() should do cleanup in the reverse order as nvme_core_init does
nvme_core_init does:
    1) register_blkdev
    2) __register_chrdev
    3) class_create

nvme_core_exit should do cleanup in the reverse order.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:05:03 -06:00
Keith Busch
3b24774e1f NVMe: Fix check_flush_dependency warning
If the controller fails and is degraded after a reset, we need to kill
off all requests queues before removing the inaccessble namespaces. This
will prevent del_gendisk from syncing dirty data, which we can't due
from a WQ_MEM_RECLAIM work queue.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 09:03:06 -06:00
Wang Sheng-Hui
b31356dfde NVMe: small typo in section BLK_DEV_NVME_SCSI of host/Kconfig
"as well as " is miss typed "as well a " in section
"config BLK_DEV_NVME_SCSI"

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-26 08:31:50 -06:00
Christoph Hellwig
76e3914ae5 nvme: fix cntlid type
Controller IDs in NVMe are unsigned 16-bit types.  In the Fabrics driver we
actually pass ctrl->id by reference, so we need it to have the correct type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-26 08:31:22 -06:00
Jeff Moyer
49bdedb362 skd: remove broken discard support
Simply creating a file system on an skd device, followed by mount and
fstrim will result in errors in the logs and then a BUG().  Let's remove
discard support from that driver.  As far as I can tell, it hasn't
worked right since it was merged.  This patch also has a side-effect of
cleaning up an unintentional shadowed declaration inside of
skd_end_request.

I tested to ensure that I can still do I/O to the device using xfstests
./check -g quick.  I didn't do anything more extensive than that,
though.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-25 19:12:38 -06:00
Jens Axboe
c888a8f95a block: kill off q->flush_flags
Now that we converted everything to the newer block write cache
interface, kill off the queue flush_flags and queueable flush
entries.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-13 13:33:19 -06:00
Guilherme G. Piccoli
c875a7093f nvme: Avoid reset work on watchdog timer function during error recovery
This patch adds a check on nvme_watchdog_timer() function to avoid the
call to reset_work() when an error recovery process is ongoing on
controller. The check is made by looking at pci_channel_offline()
result.

If we don't check for this on nvme_watchdog_timer(), error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error) and so the
controller is removed from the system before error recovery mechanism
can perform slot reset (which would allow the adapter to recover).

In this patch we also have split the huge condition expression on
nvme_watchdog_timer() by introducing an auxiliary function to help
make the code more readable.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-13 08:15:39 -06:00
Jens Axboe
7e19793096 NVMe: silence warning about unused 'dev'
Depending on options, we might not be using dev in nvme_cancel_io():

drivers/nvme/host/pci.c: In function ‘nvme_cancel_io’:
drivers/nvme/host/pci.c:970:19: warning: unused variable ‘dev’ [-Wunused-variable]
  struct nvme_dev *dev = data;
                   ^

So get rid of it, and just cast for the dev_dbg_ratelimited() call.

Fixes: 82b4552b91 ("nvme: Use blk-mq helper for IO termination")
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-12 16:11:11 -06:00
Jens Axboe
2245f6de6c block: kill blk_queue_flush()
We don't have any drivers left using it, so kill it off. Update
documentation to use the newer blk_queue_write_cache().

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-12 16:00:39 -06:00