So far harmless, but it's confusing and a bug waiting to happen if the
shifts grow larger than 4.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In case the queue is not LIVE (fully functional and connected at the nvmf
level), we cannot allow any commands other than connect to pass through.
Add a new queue state flag NVME_FC_Q_LIVE which is set after nvmf connect
and cleared in queue teardown.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
The driver can handle tracking only one AEN request, so this patch
removes handling for multiple ones.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This was being saved in a structure, but never used anywhere. The queue
size is obtained through other means, so there's no reason to duplicate
this without a user for it.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All the transports were unnecessarilly duplicating the AEN request
accounting. This patch defines everything in one place.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the lldd api, a lldd may unregister a remoteport (loss of connectivity
or driver unload) or localport (driver unload). The lldd must wait for the
remoteport_delete or localport_delete before completing its actions post
the unregister. The xxx_deletes currently occur only when the xxxport
structure is fully freed after all references are removed. Thus the lldd
may be held hostage until an app or in-kernel entity that has a namespace
open finally closes so the namespace can be removed, the controller
removed, thus the transport objects, thus the lldd.
This patch decouples the transport and os-facing objects from the lldd
and the remoteport and localport. There is a point in all deletions
where the transport will no longer interact with the lldd on behalf of
a controller. That point centers around the association established
with the target/subsystem. It will access the lldd whenever it attempts
to create an association and while the association is active. New
associations may only be created if the remoteport is live (thus the
localport is live). It will not access the lldd after deleting the
association.
Therefore, the patch tracks the count of active controllers - those with
associations being created or that are active - on a remoteport. It also
tracks the number of remoteports that have active controllers, on a
a localport. When a remoteport is unregistered, as soon as there are no
active controllers, the lldd's remoteport_delete may be called and the
lldd may continue. Similarly, when a localport is unregistered, as soon
as there are no remoteports with active controllers, the localport_delete
callback may be made. This significantly speeds up unregistration with
the lldd.
The transport objects continue in suspended status with reconnect timers
running, and upon expiration, normal ref-counting will occur and the
objects will be freed. The transport object may still be held hostage
by the application/kernel module, but that is acceptable.
With this change, the lldd may be fully unloaded and reloaded, and
if registrations occur prior to the timeouts, the nvme controller and
namespaces will resume normally as if a link bounce.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The localport resume was not updating the lldd ops structure. If the
lldd is unloaded and reloaded, the ops pointers will differ.
Additionally, as there are device references taken by the localport,
ensure that resume only resumes if the device matches as well.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When a remoteport is unregistered (connectivity lost), the following
actions are taken:
- the remoteport is marked DELETED
- the time when dev_loss_tmo would expire is set in the remoteport
- all controllers on the remoteport are reset.
After a controller resets, it will stall in a RECONNECTING state waiting
for one of the following:
- the controller will continue to attempt reconnect per max_retries and
reconnect_delay. As no remoteport connectivity, the reconnect attempt
will immediately fail. If max reconnects has not been reached, a new
reconnect_delay timer will be schedule. If the current time plus
another reconnect_delay exceeds when dev_loss_tmo expires on the remote
port, then the reconnect_delay will be shortend to schedule no later
than when dev_loss_tmo expires. If max reconnect attempts are reached
(e.g. ctrl_loss_tmo reached) or dev_loss_tmo ix exceeded without
connectivity, the controller is deleted.
- the remoteport is re-registered prior to dev_loss_tmo expiring.
The resume of the remoteport will immediately attempt to reconnect
each of its suspended controllers.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[hch: updated to use nvme_delete_ctrl]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Check remoteport connectivity before initiating reconnects
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
connectivity loss.
The transport initializes the value in the nvme_fc_register_remoteport()
call. If the value is not set, a default of 60s is set.
Add a new routine to the api, nvme_fc_set_remoteport_devloss() routine,
which allows the lldd to dynamically update the value on an existing
remoteport.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Clean up some of the controller state checks and add the
RESETTING->RECONNECTING state transition.
Specifically:
- the movement of the RESETTING state change and schedule of reset_work
to core doesn't work wiht nvme_fc_error_recovery setting state to
RECONNECTING before attempting to reset. Remove the state change as
the reset request does it.
- In the rare cases where an error occurs right as we're transitioning
to LIVE, defer the controller start actions.
- In error handling on teardown of associations while performing initial
controller creation - avoid quiesce calls on the admin_q. They are
unneeded.
- Add the RESETTING->RECONNECTING transition in the reset handler.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Prevent racing controller reset and delete flows. reset_work must not
ever self-requeue so flushing it suffices.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
No change in behavior except that the FC code cancels two work items a
little later now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Move the ->delete_work and the associated helpers to common code instead
of duplicating them in every driver. This also adds the missing reference
get/put for the loop driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
No need to have two functions doing the same thing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There's no need to wait for the full nvme_wq, which is now shared,
to flush. flush only the delete_work item.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sgi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The define is an arbitrary limit to the io size on the initiator,
capping the io to 1MB-4KB.
Remove the define from the transport. I/O size will solely be limited
by the LLDD sg limits.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Adds support for the duplicate_connect option. When set to true,
checks whether there's an existing controller via the same host port
and target port for the same host (hostnqn, hostid) to the same
subsystem. Fails the connection request if an existing controller.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Instead of allocating a separate struct device for the character device
handle embedd it into struct nvme_ctrl and use it for the main controller
refcounting. This removes double refcounting and gets us an automatic
reference for the character device operations. We keep ctrl->device as a
pointer for now to avoid chaning printks all over, but in the future we
could look into message printing helpers that take a controller structure
similar to what other subsystems do.
Note the delete_ctrl operation always already has a reference (either
through sysfs due this change, or because every open file on the
/dev/nvme-fabrics node has a refernece) when it is entered now, so we
don't need to do the unless_zero variant there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
The transport io timeout behavior wasn't quite correct. It ignored
that the io error handler is supposed to be synchronous so it possibly
allowed the blk request to be restarted while the io associated was
still aborting. Timeouts on reserved commands, those used for
association create, were never timing out thus they hung out forever.
To correct:
If an io is times out while a remoteport is not connected, just
restart the io timer. The lack of connectivity will simultaneously
be resetting the controller, so the reset path will abort and terminate
the io.
If an io is times out while it was marked for transport abort, just
reset the io timer. The abort process is underway and will complete
the io.
Otherwise, if an io times out, abort the io. If the abort was
unsuccessful (unlikely) give up and return not handled.
If the abort was successful, as the abort process is underway it will
terminate the io, so rather than synchronously waiting, just restart
the io timer.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The io completion handling for i/o's that are failing due to
to a transport error or association termination had issues, causing
io failures (DNR set so retries didn't kick in) or long stalls.
Change the io completion handler for the following items:
When an io has been completed due to a transport abort (based on an
exchange error) or when marked as aborted as part of an association
termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
NVME_SC_ABORTED. By default, do not set DNR on the status so that a
retry can be attempted after association recreate.
In cases where an io is failed (non-successful nvme status including
aborted), if the controller is being deleted (blk_queue_dying) or
the io was part of the ios used for association creation (ctrl state
is NEW or RECONNECTING), then additionally set the DNR bit so the io
will not be retried. If the failed io was part of association creation,
the failure will tear down the partially completioned association and
typically restart a new reconnect attempt (another create association
later).
Rearranged code flow to remove a largely unneeded local variable.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Since commit b86dd81
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Currently, if a frame is lost of command fails as part of initial
association create for a new controller, the new controller connection
request will immediately fail.
Add in an immediate 3 retry loop before giving up.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Move blk_mq_reinit_tagset from blk-mq to nvme core
as the only user of it. Current transports that use
it (rdma, fc) simply implement .reinit_request op.
This patch does not change any functionality.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
move nvme_fc_rport_get/put and rport free to higher in the file to
avoid adding prototypes to resolve references in upcoming code additions
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Added a new fc class and a device node for udev events under it. I
expect the fc class will eventually be the location where the FC SCSI and
FC NVME merge in the future. Therefore names are kept somewhat generic.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
To support auto-connecting to FC-NVME devices upon their dynamic
appearance, add a uevent that can kick off connection scripts.
uevent is posted against the fc_udev device.
patch set tested with the following rule to kick an nvme-cli connect-all
for the FC initiator and FC target ports. This is just an example for
testing and not intended for real life use.
ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
RUN+="/bin/sh -c '/usr/local/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR} >> /tmp/nvme_fc.log'"
I will post proposed udev/systemd scripts for possible kernel support.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Sync with NVM Express spec change and FC-NVME 1.18.
FC transport sets SGL type to Transport SGL Data Block Descriptor and
subtype to transport-specific value 0x0A.
Removed the warn-on's on the PRP fields. They are unneeded. They were
to check for values from the upper layer that weren't set right, and
for the most part were fine. But, with Async events, which reuse the
same structure and 2nd time issued the SGL overlay converted them to
the Transport SGL values - the warn-on's were errantly firing.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The FC-NVME transport used the FC-specific error codes in cases where
it had to fabricate an error to go back up stack. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the LLDD resets or detaches from an fc port, the LLDD will
deregister all remoteports seen by the fc port and deregister the
localport associated with the fc port. The teardown of the localport
structure will be held off due to reference counting until all the
remoteports are removed (and they are held off until all
controllers/associations to terminated). Currently, if the fc port
is reinit/reattached and registered again as a localport it is
treated as an independent entity from the prior localport and all
prior remoteports and controllers cannot be revived. They are
created as new and separate entities.
This patch changes the localport registration to look at the known
localports that are waiting to be torndown. If they are the same port
based on wwn's, the local port is transitioned out of the teardown
state. This allows the remote ports and controller connections to
be reestablished and resumed as long as the localport can also be
reregistered within the timeout windows.
The patch adds a new routine nvme_fc_attach_to_unreg_lport() with
the functionality and moves the lport get/put routines to avoid
forward references.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Smart <james.smart@broadcom.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The FC-NVME spec hasn't locked down on the format string for TRADDR.
Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>"
where the wwn's are hex values but not prefixed by 0x.
Most implementations so far expect a string format of
"nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
uses the match_u64 parser which requires a leading 0x prefix to set
the base properly. If it's not there, a match will either fail or return
a base 10 value.
The resolution in T11 is pushing out. Therefore, to fix things now and
to cover any eventuality and any implementations already in the field,
this patch adds support for both formats.
The change consists of replacing the token matching routine with a
routine that validates the fixed string format, and then builds
a local copy of the hex name with a 0x prefix before calling
the system parser.
Note: the same parser routine exists in both the initiator and target
transports. Given this is about the only "shared" item, we chose to
replicate rather than create an interdendency on some shared code.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are cases where threads are in the process of submitting new
io when the LLDD calls in to remove the remote port. In some cases,
the next io actually goes to the LLDD, who knows the remoteport isn't
present and rejects it. To properly recovery/restart these i/o's we
don't want to hard fail them, we want to treat them as temporary
resource errors in which a delayed retry will work.
Add a couple more checks on remoteport connectivity and commonize the
busy response handling when it's seen.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Usually before we teardown the controller we want to:
1. complete/cancel any ctrl inflight works
2. remove ctrl namespaces (only for removal though, resets
shouldn't remove any namespaces).
but we do not want to destroy the controller device as
we might use it for logging during the teardown stage.
This patch adds nvme_start_ctrl() which queues inflight
controller works (aen, ns scan, queue start and keep-alive
if kato is set) and nvme_stop_ctrl() which cancels the works
namespace removal is left to the callers to handle.
Move nvme_uninit_ctrl after we are done with the
controller device.
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues
quiescing/unquiescing respects the submission path rcu grace.
Also, make sure to unquiesce before cleanup the admin queue.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-By: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
We might have more/less queues once we reconnect/reset. For
example due to cpu going online/offline or controller constraints.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Its what the user passed, so its probably a better
idea to keep it intact. Also, limit the number of
I/O queues to max online cpus and the lport maximum
hw queues.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
All transports use either a private cache of controller cap or an on-stack
copy, move it to the generic struct nvme_ctrl. In the future it will also
be maintained by the core.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
All all transports use the queue_count in exactly the same, so move it to
the generic struct nvme_ctrl. In the future it will also be maintained by
the core.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-By: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Currently, the fc transport invokes nvme_fc_error_recovery() on every
io in which the transport detects an error. Which means:
a) it's really noisy on large io loads that all get hit by a link down.
b) we repeatively call nvme_stop_queues() even though queues are
stopped upon the first error or as first steps of reset_work.
Correct by:
Errors are only meaningful if the controller is in the LIVE state.
Thus, enact the reset_work only if LIVE. If called repeatively, state
will have already transitioned.
There's no need to stop the queues here. Let the first steps of
reset_work do the queue stopping.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a controller connection is attempted (say to a subsystem that
does not exist), the first attempt errors out. If another connect
is attempted, it crashes.
Issue is the prior controller has yet execute it's final put, thus
its still on lists. However, opts points on it have been cleared, thus
causing the crash if they are referenced.
Fix is to add the missing put after the nvme_uninit_ctrl() call on
the attachment failure.
Signed-off-by: Paul Ely <Paul.Ely@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Per the recommendation by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html
Wait for io aborts to complete wait converted from msleep look to
using a struct completion.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Current fc transport code, on io termination, is calling
nvme_cleanup_cmd() followed by the transport dma unmap routine
which also calls nvme_cleanup_cmd(). Which means two kfrees occur
on the same address, raising havoc. This resulted in odd data errors,
effectively corruption..
Fix by removing the extraneous double calls. Call now occurs only in
teardown paths and as part of dma unmap routine.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures. Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it. For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No need to differentiate fabrics from pci/loop, also lower
it to 32 as we don't really need 256 inflight admin commands.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This moves the nvme_reset function from the PCIe driver to common code,
renaming it to nvme_reset_ctrl in the process. Additionally a new
helper nvme_reset_ctrl_sync is added for the case where we want to
wait for the reset. To facilitate that the reset_work work structure is
move to the common nvme_ctrl structure and the ->reset_ctrl method is
removed. For now the drivers initialize the reset_work with their own
callback, but longer term we should move to callouts for specific
parts of the reset process and move even more code to the core.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>