Commit Graph

456 Commits

Author SHA1 Message Date
Max Gurtovoy
a536b49785 nvmet: fix error flow during ns enable
In case we fail to enable p2pmem on the current namespace, disable the
backing store device before exiting.

Cc: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-03-28 18:15:03 +01:00
Ming Lei
02db99548d nvmet: fix building bvec from sg list
There are two mistakes for building bvec from sg list for file
backed ns:

- use request data length to compute number of io vector, this way
doesn't consider sg->offset, and the result may be smaller than required
io vectors

- bvec->bv_len isn't capped by sg->length

This patch fixes this issue by building bvec from sg directly, given
the whole IO stack is ready for multi-page bvec.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 3a85a5de29 ("nvme-loop: add a NVMe loopback host driver")

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-03-28 18:15:02 +01:00
Christoph Hellwig
005c674f70 nvmet: ignore EOPNOTSUPP for discard
NVMe DSM is a pure hint, so if the underlying device / file system
does not support discard-like operations we should not fail the
operation but rather return success.

Fixes: 3b031d1599 ("nvmet: add error log support for bdev backend")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-13 12:57:34 -06:00
James Smart
404ec31df4 nvmet-fc: bring Disconnect into compliance with FC-NVME spec
The FC-NVME spec, when finally approved, modified the disconnect LS
such that the only scope available is the association.

Rework the Disconnect LS processing to be in accordance with the
change.

Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-13 12:57:34 -06:00
James Smart
0191e7405b nvmet-fc: fix issues with targetport assoc_list list walking
There are two changes:

1) The logic in the __nvmet_fc_free_assoc() routine is bad. It uses
"safe" routines assuming pointers will come back valid.  However, the
intervening next structure being linked can be removed from the list and
the resulting safe pointers are bad, resulting in NULL ptrs being hit.

Correct by scheduling a work element to perform the association delete,
which can be done while under lock.

2) Prior patch that added the work element scheduling left a possible
reference on the object if the work element couldn't be scheduled.

Correct by doing the put on a failing schedule_work() call.

Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-13 12:57:30 -06:00
Yufen Yu
d11de63f2b nvme-loop: init nvmet_ctrl fatal_err_work when allocate
After commit 4d43d395fe (workqueue: Try to catch flush_work() without
INIT_WORK()), it can cause warning when delete nvme-loop device, trace
like:

[   76.601272] Call Trace:
[   76.601646]  ? del_timer+0x72/0xa0
[   76.602156]  __cancel_work_timer+0x1ae/0x270
[   76.602791]  cancel_work_sync+0x14/0x20
[   76.603407]  nvmet_ctrl_free+0x1b7/0x2f0 [nvmet]
[   76.604091]  ? free_percpu+0x168/0x300
[   76.604652]  nvmet_sq_destroy+0x106/0x240 [nvmet]
[   76.605346]  nvme_loop_destroy_admin_queue+0x30/0x60 [nvme_loop]
[   76.606220]  nvme_loop_shutdown_ctrl+0xc3/0xf0 [nvme_loop]
[   76.607026]  nvme_loop_delete_ctrl_host+0x19/0x30 [nvme_loop]
[   76.607871]  nvme_do_delete_ctrl+0x75/0xb0
[   76.608477]  nvme_sysfs_delete+0x7d/0xc0
[   76.609057]  dev_attr_store+0x24/0x40
[   76.609603]  sysfs_kf_write+0x4c/0x60
[   76.610144]  kernfs_fop_write+0x19a/0x260
[   76.610742]  __vfs_write+0x1c/0x60
[   76.611246]  vfs_write+0xfa/0x280
[   76.611739]  ksys_write+0x6e/0x120
[   76.612238]  __x64_sys_write+0x1e/0x30
[   76.612787]  do_syscall_64+0xbf/0x3a0
[   76.613329]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

We fix it by moving fatal_err_work init to nvmet_alloc_ctrl(), which may
more reasonable.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-13 12:05:39 -06:00
Christoph Hellwig
77141dc6ce nvmet: convert to SPDX identifiers
Update license to use SPDX-License-Identifier instead of verbose license
text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2019-02-20 07:22:42 -07:00
Christoph Hellwig
3641bd323f nvmet-rdma: convert to SPDX identifiers
Update license to use SPDX-License-Identifier instead of verbose license
text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2019-02-20 07:22:39 -07:00
Christoph Hellwig
d0ad69043d nvme-loop: convert to SPDX identifiers
Update license to use SPDX-License-Identifier instead of verbose license
text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2019-02-20 07:22:36 -07:00
Christoph Hellwig
a4b74fcc29 nvmet-fcloop: convert to SPDX identifiers
Update license to use SPDX-License-Identifier instead of verbose license
text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2019-02-20 07:22:34 -07:00
Christoph Hellwig
4f80fc77fc nvmet-fc: convert to SPDX identifiers
Update license to use SPDX-License-Identifier instead of verbose license
text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2019-02-20 07:22:31 -07:00
Bart Van Assche
5bc373ff25 nvmet: fix indentation
This patch avoids that smatch complains about inconsistent indentation.

Fixes: a07b4970f4 ("nvmet: add a generic NVMe target") # v4.10
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-02-20 07:17:49 -07:00
Raju Rangoju
5cbab6303b nvmet-rdma: fix null dereference under heavy load
Under heavy load if we don't have any pre-allocated rsps left, we
dynamically allocate a rsp, but we are not actually allocating memory
for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
fields (req->rsp->status) in nvmet_req_init() will result in crash.

To fix this, allocate the memory for nvme_completion by calling
nvmet_rdma_alloc_rsp()

Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load")

Cc: <stable@vger.kernel.org>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-23 17:16:59 -07:00
Sagi Grimberg
fb8658581a nvmet-tcp: fix uninitialized variable access
If we end up in nvmet_tcp_try_recv_one with a bogus state
queue receive state we will access result which is uninitialized.

Initialize restult to 0 which will be considered as if no data
was received by the tcp socket.

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-16 09:44:20 -07:00
Sagi Grimberg
26c682274e nvme-fabrics: allow nvmf_connect_io_queue to poll
Preparation for polling support for fabrics. Polling support
means that our completion queues are not generating any interrupts
which means we need to poll for the nvmf io queue connect as well.

Reviewed by Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18 17:50:48 +01:00
Christoph Hellwig
f4d10b5c85 nvmet-tcp: fix endianess annotations
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
2018-12-18 17:50:45 +01:00
Chaitanya Kulkarni
5698b805fb nvmet: use a macro for default error location
This patch defines a new macro NVMET_NO_ERROR_LOC to represent the
default error location value in the nvme-error-log-page.
This is a pure cleanup patch and it does not change any functionality.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18 17:50:44 +01:00
Colin Ian King
66c6afbd73 nvmet: fix comparison of a u16 with -1
Currently the u16 req->error_loc is being compared to -1 which
will always be false.  Fix this by casting -1 to u16 to fix this.

Detected by clang:
  warning: result of comparison of constant -1 with expression of
  type 'u16' (aka 'unsigned short') is always false
  [-Wtautological-constant-out-of-range-compare]

Fixes: 76574f37bf ("nvmet: add interface to update error-log page")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18 17:50:43 +01:00
Chaitanya Kulkarni
23454d59cc nvmet: update smart log with num err log entries
Now that we have error log page implementation update smart log command
handler to provide number of error log entries in the lifetime of the
controller field.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:07 +01:00
Chaitanya Kulkarni
11ad507784 nvmet: add error log page cmd handler
Now that we have support for all the major parts of the target we add
a NVMe error log page handler so that host can read the log page.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:06 +01:00
Chaitanya Kulkarni
c6aa3542e0 nvmet: add error log support for file backend
This patch adds support for the file backend to populate the
error log entries. Here we map the errno to the NVMe status codes.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:06 +01:00
Chaitanya Kulkarni
3b031d1599 nvmet: add error log support for bdev backend
This patch adds the support for the block device backend to populate the
error log entries. Here we map the blk_status_t to the NVMe status.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:05 +01:00
Chaitanya Kulkarni
2da6e00580 nvmet: add error log support for admin-cmd
This patch adds the support to maintain the error log page for admin
commands.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:05 +01:00
Chaitanya Kulkarni
762a11dfee nvmet: add error log support for rdma backend
This patch adds the support to maintain the error log page for rdma
transport, we mainly focus here on the NVME_INVALID_FIELD errors.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:04 +01:00
Chaitanya Kulkarni
84faf42b8a nvmet: add error log support for fabrics-cmd
This patch adds the support to maintain error log page for the fabrics
prop get, prop set, and admin connect commands. Here we also update the
discovery.c and add update set/get features and parse functions to
support error log page.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:04 +01:00
Chaitanya Kulkarni
e81446afc7 nvmet: add error log support in the core
This patch adds the support to maintain error log page for the
nvmet-core.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:03 +01:00
Chaitanya Kulkarni
76574f37bf nvmet: add interface to update error-log page
This patch adds nvmet_req based interface to the nvmet-core so that
we can update the error log page. We update error log page in
the request completion path when status is not set to NVME_SC_SUCCESS.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:03 +01:00
Chaitanya Kulkarni
e4a976254e nvmet: add error-log definitions
This patch adds necessary fields in the target data structures to
support error log page. For a target controller, we add a new error log
field to maintain the error log, at any given point we maintain error
entries equal to NVMET_ERROR_LOG_SLOTS for each controller. In the
following patch, we also update the error log page entry in the I/O
completion path so we introduce a spinlock for synchronization of the
log.

For nvmet_req, we add a new field error_loc to hold the location of
the error in the command when the actual error occurs for each request
and a starting LBA if applicable.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:02 +01:00
Chaitanya Kulkarni
b7c8f3663d nvme: remove nvme_common command cdw10 array
This is a preparation patch which removes the nvme common command cdw10
array and replace with individual fields. This is needed for the nvmet
error log page implementation make is error log page entry offset
assignment easier.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:01 +01:00
Sagi Grimberg
16d3a280d4 nvmet: remove unused variable
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:59:00 +01:00
Sagi Grimberg
ad4f530e95 nvmet: allow configfs tcp trtype configuration
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:58 +01:00
Sagi Grimberg
872d26a391 nvmet-tcp: add NVMe over TCP target driver
This patch implements the TCP transport driver for the NVMe over Fabrics
target stack. This allows exporting NVMe over Fabrics functionality over
good old TCP/IP.

The driver implements the TP 8000 of how nvme over fabrics capsules and
data are encapsulated in nvme-tcp pdus and exchaged on top of a TCP byte
stream. nvme-tcp header and data digest are supported as well.

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:57 +01:00
Sagi Grimberg
1672ddb8d6 nvmet: Add install_queue callout
nvmet-tcp will implement it to allocate queue commands which
are only known at nvmf connect time (sq size).

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:55 +01:00
Jens Axboe
96f774106e Linux 4.20-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwNpb0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGwGwH/00UHnXfxww3ixxz
 zwTVDzptA6SPm6s84yJOWatM5fXhPiAltZaHSYF9lzRzNU71NCq7Frhq3fQUIXKM
 OxqDn9nfSTWcjWTk2q5n2keyRV/KIn67YX7UgqFc1bO/mqtVjEgNWaMyblhI+e9E
 giu1ZXayHr43jK1cDOmGExZubXUq7Vsc9TOlrd+d2SwIqeEP7TCMrPhnHDwCNvX2
 UU5dtANpVzGtHaBcr37wJj+L8kODCc0f+PQ3g2ar5jTHst5SLlHp2u0AMRnUmgdi
 VkGx+mu/uk8mtwUqMIMqhplklVoqK6LTeLqsY5Xt32SKruw9UqyJGdphLjW2QP/g
 MkmA1lI=
 =7kaD
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc6' into for-4.21/block

Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the
two corruption fixes. We're going to be overhauling the direct dispatch
path, and we need to do that on top of the changes we made for that
in mainline.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-09 17:45:40 -07:00
Chaitanya Kulkarni
5a3a6d6965 nvmet: fix the structure member indentation
This is a cleanup patch which fixes the structure member indentation
introduced by the p2p:

commit c6925093d0 ("nvmet: Optionally use PCI P2P memory").
We don't change any functionality in this patch.

This is needed so that any future members will also follow the uniform
indentation.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:58 -07:00
Chaitanya Kulkarni
cb019da3da nvmet: use unlikely for req status check
This patch adds unlikely in the nvmet request completion path for the
status check in the low level function __nvmet_req_complete.
This is helpful in the scenario where host and target connection is
working smoothly.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:58 -07:00
Israel Rukshin
ad1f824948 nvmet-rdma: Add unlikely for response allocated check
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:58 -07:00
Sagi Grimberg
9b95d2fb85 nvmet: expose support for fabrics SQ flow control disable in treq
Technical Proposal introduces an indication for SQ flow control
disable support. Expose it since we are able to operate in this mode.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:57 -07:00
Sagi Grimberg
0445e1b5a2 nvmet: don't override treq upon modification.
Only override the allowed parts of it.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
[hch: slight tweak to the NVME_TREQ_SECURE_CHANNEL_MASK definition]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:57 -07:00
Sagi Grimberg
e6a622fd6d nvmet: support fabrics sq flow control
Technical proposal 8005 "fabrics SQ flow control" introduces a mode
where a host and controller agree to omit sq_head pointer updates
when sending nvme completions.

In case the host indicated desire to operate in this mode (connect attribute)
the controller will return back a connect completion with sq_head value
of 0xffff as indication that it will omit sq_head pointer updates.

This mode saves us an atomic update in the I/O path.

Reviewed-by: Hannes Reinecke <hare@suse.com>
[hch: suggested better implementation]
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:57 -07:00
James Smart
6e2e312ea7 nvmet-fc: remove the IN_ISR deferred scheduling options
All target lldd's call the cmd receive and op completions in non-isr
thread contexts. As such the IN_ISR options are not necessary.
Remove the functionality and flags, which also removes cpu assignments
to queues.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:57 -07:00
Christoph Hellwig
03198c4d9f nvmet: mark nvmet_genctr static
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:57 -07:00
Jay Sternberg
b662a07857 nvmet: enable Discovery Controller AENs
Add functions to find connections requesting Discovery Change events
and send a notification to hosts that maintain an explicit persistent
connection and have and active Asynchronous Event Request pending.
Only Hosts that have access to the Subsystem effected by the change
will receive notifications of Discovery Change event.

Call these functions each time there is a configfs change that effects
the Discover Log Pages.

Set the OAES field in the Identify Controller response to advertise the
support for Asynchronous Event Notifications.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Phil Cayton <phil.cayton@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:57 -07:00
Sagi Grimberg
253928eec6 nvmet: allow host connect even if no allowed subsystems are exported
It is perfectly valid that a host connects to a discovery subsystem
and gets an empty discovery log page since no subsystems are
provisioned to it. No reason to disallow connecting to the discovery
subsystem all together.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Phil Cayton <phil.cayton@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:57 -07:00
Jay Sternberg
6a8ec0ac5e nvmet: add support to Discovery controllers for commands
Add custom get/set features to commands allowed by Discovery controllers.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:56 -07:00
Jay Sternberg
f301c2b136 nvmet: add defines for discovery change async events
Add AEN/AER values as defined by the specification

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:56 -07:00
Jay Sternberg
90107455cc nvmet: make kato and AEN processing for use by other controllers
Make common process of get/set features available to other controllers by
making simple functions static inline and others not static and prototypes
in nvmet.h file

Also remove static from nvmet_execute_async_event and add prototype to
nvmet.h to allow used by other controllers

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:56 -07:00
Jay Sternberg
f9362ac173 nvmet: allow Keep Alive for Discovery controller
Per change to specification allowing Discovery controllers to have
explicit persistent connections, remove restriction on Discovery
controllers allowing kato on connect.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:56 -07:00
Jay Sternberg
7114ddeb40 nvmet: change aen mask functions to use bit numbers
Functions nvmet_aen_disabled and nvmet_clear_aen were using
values not bit numbers ie 1 << 9 not 9 for bit function clear_bit
and test_and_set_bit.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Phil Cayton <phil.cayton@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:56 -07:00
Jay Sternberg
6c8312ad50 nvmet: provide aen bit functions for multiple controller types
Move nvmet_aen_disabled and nvmet_clear_aen in preparation for other types
of controllers to use, initially the discovery controller.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:56 -07:00
Chaitanya Kulkarni
50a909db36 nvmet: use IOCB_NOWAIT for file-ns buffered I/O
This patch optimizes read command behavior when file-ns configured
with buffered I/O. Instead of offloading the buffered I/O read operations
to the worker threads, we first issue the read operation with IOCB_NOWAIT
and try and access the data from the cache. Here we only offload the
request to the worker thread and complete the request in the worker
thread context when IOCB_NOWAIT request fails.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:56 -07:00
Sagi Grimberg
c09305ae49 nvmet: support for traffic based keep-alive
A controller that supports traffic based keep-alive can restart the keep
alive timer even when no keep-alive was not received in the kato period
as long as other admin or I/O commands were received.  For each command
set ctrl->cmd_seen to true, and when keep-alive timer expires, if any
commands were seen, resched ka_work instead of escalating to a fatal
error.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:56 -07:00
Sagi Grimberg
12b2117161 nvme: introduce ctrl attributes enumeration
We are growing more controller attributes, so use a proper enumeration
for it.  For now just add the 128-bit hostid which we support.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07 22:26:55 -07:00
Israel Rukshin
d7dcdf9d4e nvmet-rdma: fix response use after free
nvmet_rdma_release_rsp() may free the response before using it at error
flow.

Fixes: 8407879 ("nvmet-rdma: fix possible bogus dereference under heavy load")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-07 07:11:11 -08:00
Jens Axboe
e7d9439107 nvme: remove opportunistic polling from bdev target
It doesn't set HIPRI on the bio, so polling for it is pretty silly.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-26 08:25:48 -07:00
Linus Torvalds
dc5db21865 for-linus-20181109
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlvl3xoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsoUEACecROMzIey+QOJ9o0YSaK5tlBWoC1OyK72
 ncHCyxw+uW+qWa4Csosib7BRomz3E0nsQzf0b3MJiiZJGXnbfxRP0GWDrriK+635
 UeVmWQfU2Ga2QmXGpnlSSF7mHHZWOWG7hUonGUBgCOiN2tLymHrnjmmPf7TuDuav
 CliIgjan124MamohXyjcIa0EtySZmkTlJAxOtoOlxXwitLWnd28Vb3kOhG4dHXQs
 lKO94R2souY00PX/e/8GtkAf1Em+5yJea/LAcOW8+qHA1WYzDqp1/TG/STLX0MVP
 rgqg4E8j1LzCCmunaneIuSeo6yQhZPF2OHMGD8ERn+5c5RY0J4wsBAW2ZO0aqmBP
 nfv+ZdZa71NU+pEEkOe79B5l1JiQNwTCu1ay/kfSGQpAnd2ejnY6QqiLnfCEgqWG
 mcZi/JKcUst8I+MCV9vSjzl9GvMY7Wn5Fy031T4Qog2F0ZPns3FyNLuOLXXjkH7C
 h5uliwuvLV3TpVWgWfVh8R7/j3M0dLndl5G5EYHwkb7It59r4rmNfgm1llZPFhWq
 TENDvq5vLJaPXh5NIyxR1TowJNfgVembr5XiSJ7nOt8IFfJvkcfVIbQqRcpBk8vG
 ygde6xKwGSxsfRoIhNTFU0mlphFpNgske600oZkm6PZq6DnivS9/4PD77nIH7p5v
 6kR8aG43uQ==
 =Nry6
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20181109' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:

 - Two fixes for an ubd regression, one for missing locking, and one for
   a missing initialization of a field. The latter was an old latent
   bug, but it's now visible and triggers (Me, Anton Ivanov)

 - Set of NVMe fixes via Christoph, but applied manually due to a git
   tree mixup (Christoph, Sagi)

 - Fix for a discard split regression, in three patches (Ming)

 - Update libata git trees (Geert)

 - SPDX identifier for sata_rcar (Kuninori Morimoto)

 - Virtual boundary merge fix (Johannes)

 - Preemptively clear memory we are going to pass to userspace, in case
   the driver does a short read (Keith)

* tag 'for-linus-20181109' of git://git.kernel.dk/linux-block:
  block: make sure writesame bio is aligned with logical block size
  block: cleanup __blkdev_issue_discard()
  block: make sure discard bio is aligned with logical block size
  Revert "nvmet-rdma: use a private workqueue for delete"
  nvme: make sure ns head inherits underlying device limits
  nvmet: don't try to add ns to p2p map unless it actually uses it
  sata_rcar: convert to SPDX identifiers
  ubd: fix missing initialization of io_req
  block: Clear kernel memory before copying to user
  MAINTAINERS: Fix remaining pointers to obsolete libata.git
  ubd: fix missing lock around request issue
  block: respect virtual boundary mask in bvecs
2018-11-09 16:31:51 -06:00
Christoph Hellwig
d39aa49792 Revert "nvmet-rdma: use a private workqueue for delete"
This reverts commit 2acf70ade7.

The commit never really fixed the intended issue and caused all
kinds of other issues, including a use before initialization.

Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 06:14:47 -07:00
Sagi Grimberg
21d3bbdd4c nvmet: don't try to add ns to p2p map unless it actually uses it
Even without CONFIG_P2PDMA this results in a error print:
nvmet: no peer-to-peer memory is available that's supported by rxe0 and /dev/nullb0

Fixes: c6925093d0 ("nvmet: Optionally use PCI P2P memory")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09 06:14:47 -07:00
Linus Torvalds
9931a07d51 Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull AFS updates from Al Viro:
 "AFS series, with some iov_iter bits included"

* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  missing bits of "iov_iter: Separate type from direction and use accessor functions"
  afs: Probe multiple fileservers simultaneously
  afs: Fix callback handling
  afs: Eliminate the address pointer from the address list cursor
  afs: Allow dumping of server cursor on operation failure
  afs: Implement YFS support in the fs client
  afs: Expand data structure fields to support YFS
  afs: Get the target vnode in afs_rmdir() and get a callback on it
  afs: Calc callback expiry in op reply delivery
  afs: Fix FS.FetchStatus delivery from updating wrong vnode
  afs: Implement the YFS cache manager service
  afs: Remove callback details from afs_callback_break struct
  afs: Commit the status on a new file/dir/symlink
  afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
  afs: Don't invoke the server to read data beyond EOF
  afs: Add a couple of tracepoints to log I/O errors
  afs: Handle EIO from delivery function
  afs: Fix TTL on VL server and address lists
  afs: Implement VL server rotation
  afs: Improve FS server rotation error handling
  ...
2018-11-01 19:58:52 -07:00
Linus Torvalds
bd6bf7c104 pci-v4.20-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlvPV7IUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyaUg//WnCaRIu2oKOp8c/bplZJDW5eT10d
 oYAN9qeyptU9RYrg4KBNbZL9UKGFTk3AoN5AUjrk8njxc/dY2ra/79esOvZyyYQy
 qLXBvrXKg3yZnlNlnyBneGSnUVwv/kl2hZS+kmYby2YOa8AH/mhU0FIFvsnfRK2I
 XvwABFm2ZYvXCqh3e5HXaHhOsR88NQ9In0AXVC7zHGqv1r/bMVn2YzPZHL/zzMrF
 mS79tdBTH+shSvchH9zvfgIs+UEKvvjEJsG2liwMkcQaV41i5dZjSKTdJ3EaD/Y2
 BreLxXRnRYGUkBqfcon16Yx+P6VCefDRLa+RhwYO3dxFF2N4ZpblbkIdBATwKLjL
 npiGc6R8yFjTmZU0/7olMyMCm7igIBmDvWPcsKEE8R4PezwoQv6YKHBMwEaflIbl
 Rv4IUqjJzmQPaA0KkRoAVgAKHxldaNqno/6G1FR2gwz+fr68p5WSYFlQ3axhvTjc
 bBMJpB/fbp9WmpGJieTt6iMOI6V1pnCVjibM5ZON59WCFfytHGGpbYW05gtZEod4
 d/3yRuU53JRSj3jQAQuF1B6qYhyxvv5YEtAQqIFeHaPZ67nL6agw09hE+TlXjWbE
 rTQRShflQ+ydnzIfKicFgy6/53D5hq7iH2l7HwJVXbXRQ104T5DB/XHUUTr+UWQn
 /Nkhov32/n6GjxQ=
 =58I4
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - Fix ASPM link_state teardown on removal (Lukas Wunner)

 - Fix misleading _OSC ASPM message (Sinan Kaya)

 - Make _OSC optional for PCI (Sinan Kaya)

 - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set
   (Patrick Talbert)

 - Remove x86 and arm64 node-local allocation for host bridge structures
   (Punit Agrawal)

 - Pay attention to device-specific _PXM node values (Jonathan Cameron)

 - Support new Immediate Readiness bit (Felipe Balbi)

 - Differentiate between pciehp surprise and safe removal (Lukas Wunner)

 - Remove unnecessary pciehp includes (Lukas Wunner)

 - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)

 - Tolerate PCIe Slot Presence Detect being hardwired to zero to
   workaround broken hardware, e.g., the Wilocity switch/wireless device
   (Lukas Wunner)

 - Unify pciehp controller & slot structs (Lukas Wunner)

 - Constify hotplug_slot_ops (Lukas Wunner)

 - Drop hotplug_slot_info (Lukas Wunner)

 - Embed hotplug_slot struct into users instead of allocating it
   separately (Lukas Wunner)

 - Initialize PCIe port service drivers directly instead of relying on
   initcall ordering (Keith Busch)

 - Restore PCI config state after a slot reset (Keith Busch)

 - Save/restore DPC config state along with other PCI config state
   (Keith Busch)

 - Reference count devices during AER handling to avoid race issue with
   concurrent hot removal (Keith Busch)

 - If an Upstream Port reports ERR_FATAL, don't try to read the Port's
   config space because it is probably unreachable (Keith Busch)

 - During error handling, use slot-specific reset instead of secondary
   bus reset to avoid link up/down issues on hotplug ports (Keith Busch)

 - Restore previous AER/DPC handling that does not remove and
   re-enumerate devices on ERR_FATAL (Keith Busch)

 - Notify all drivers that may be affected by error recovery resets
   (Keith Busch)

 - Always generate error recovery uevents, even if a driver doesn't have
   error callbacks (Keith Busch)

 - Make PCIe link active reporting detection generic (Keith Busch)

 - Support D3cold in PCIe hierarchies during system sleep and runtime,
   including hotplug and Thunderbolt ports (Mika Westerberg)

 - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots
   are empty or occupied (Jon Derrick)

 - Remove duplicated include from pci/pcie/err.c and unused variable
   from cpqphp (YueHaibing)

 - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza
   Pawandeep)

 - Uninline PCI bus accessors for better ftracing (Keith Busch)

 - Remove unused AER Root Port .error_resume method (Keith Busch)

 - Use kfifo in AER instead of a local version (Keith Busch)

 - Use threaded IRQ in AER bottom half (Keith Busch)

 - Use managed resources in AER core (Keith Busch)

 - Reuse pcie_port_find_device() for AER injection (Keith Busch)

 - Abstract AER interrupt handling to disconnect error injection (Keith
   Busch)

 - Refactor AER injection callbacks to simplify future improvments
   (Keith Busch)

 - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)

 - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)

 - Add switch fall-through annotations (Gustavo A. R. Silva)

 - Remove unused Switchtec quirk variable (Joshua Abraham)

 - Fix pci.c kernel-doc warning (Randy Dunlap)

 - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)

 - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)

 - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid
   useless dmesg errors (Logan Gunthorpe)

 - Update Switchtec NTB documentation (Wesley Yung)

 - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)

 - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)

 - Add PCI support for peer-to-peer DMA (Logan Gunthorpe)

 - Add sysfs group for PCI peer-to-peer memory statistics (Logan
   Gunthorpe)

 - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan
   Gunthorpe)

 - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan
   Gunthorpe)

 - Add PCI peer-to-peer DMA driver writer's documentation (Logan
   Gunthorpe)

 - Add block layer flag to indicate driver support for PCI peer-to-peer
   DMA (Logan Gunthorpe)

 - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P
   memory (Logan Gunthorpe)

 - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan
   Gunthorpe)

 - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan
   Gunthorpe)

 - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise,
   Christoph Hellwig, Logan Gunthorpe)

 - Cache VF config space size to optimize enumeration of many VFs
   (KarimAllah Ahmed)

 - Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)

 - Fix VMD AERSID quirk Device ID matching (Jon Derrick)

 - Fix Cadence PHY handling during probe (Alan Douglas)

 - Signal Cadence Endpoint interrupts via AXI region 0 instead of last
   region (Alan Douglas)

 - Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan
   Douglas)

 - Remove redundant controller tests for "device_type == pci" (Rob
   Herring)

 - Document R-Car E3 (R8A77990) bindings (Tho Vu)

 - Add device tree support for R-Car r8a7744 (Biju Das)

 - Drop unused mvebu PCIe capability code (Thomas Petazzoni)

 - Add shared PCI bridge emulation code (Thomas Petazzoni)

 - Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni)

 - Add aardvark Root Port emulation (Thomas Petazzoni)

 - Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)

 - Add initial power management for i.MX7 (Leonard Crestez)

 - Add PME_Turn_Off support for i.MX7 (Leonard Crestez)

 - Fix qcom runtime power management error handling (Bjorn Andersson)

 - Update TI dra7xx unaligned access errata workaround for host mode as
   well as endpoint mode (Vignesh R)

 - Fix kirin section mismatch warning (Nathan Chancellor)

 - Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare)

 - Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I)

 - Update Keystone to use MRRS quirk for host bridge instead of open
   coding (Kishon Vijay Abraham I)

 - Refactor Keystone link establishment (Kishon Vijay Abraham I)

 - Simplify and speed up Keystone link training (Kishon Vijay Abraham I)

 - Remove unused Keystone host_init argument (Kishon Vijay Abraham I)

 - Merge Keystone driver files into one (Kishon Vijay Abraham I)

 - Remove redundant Keystone platform_set_drvdata() (Kishon Vijay
   Abraham I)

 - Rename Keystone functions for uniformity (Kishon Vijay Abraham I)

 - Add Keystone device control module DT binding (Kishon Vijay Abraham
   I)

 - Use SYSCON API to get Keystone control module device IDs (Kishon
   Vijay Abraham I)

 - Clean up Keystone PHY handling (Kishon Vijay Abraham I)

 - Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I)

 - Clean up Keystone config space access checks (Kishon Vijay Abraham I)

 - Get Keystone outbound window count from DT (Kishon Vijay Abraham I)

 - Clean up Keystone outbound window configuration (Kishon Vijay Abraham
   I)

 - Clean up Keystone DBI setup (Kishon Vijay Abraham I)

 - Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I)

 - Fix Keystone IRQ status checking (Kishon Vijay Abraham I)

 - Add debug messages for all Keystone errors (Kishon Vijay Abraham I)

 - Clean up Keystone includes and macros (Kishon Vijay Abraham I)

 - Fix Mediatek unchecked return value from devm_pci_remap_iospace()
   (Gustavo A. R. Silva)

 - Fix Mediatek endpoint/port matching logic (Honghui Zhang)

 - Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui
   Zhang)

 - Remove redundant Mediatek PM domain check (Honghui Zhang)

 - Convert Mediatek to pci_host_probe() (Honghui Zhang)

 - Fix Mediatek MSI enablement (Honghui Zhang)

 - Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang)

 - Add Mediatek loadable module support (Honghui Zhang)

 - Detach VMD resources after stopping root bus to prevent orphan
   resources (Jon Derrick)

 - Convert pcitest build process to that used by other tools (iio, perf,
   etc) (Gustavo Pimentel)

* tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
  PCI/AER: Refactor error injection fallbacks
  PCI/AER: Abstract AER interrupt handling
  PCI/AER: Reuse existing pcie_port_find_device() interface
  PCI/AER: Use managed resource allocations
  PCI: pcie: Remove redundant 'default n' from Kconfig
  PCI: aardvark: Implement emulated root PCI bridge config space
  PCI: mvebu: Convert to PCI emulated bridge config space
  PCI: mvebu: Drop unused PCI express capability code
  PCI: Introduce PCI bridge emulated config space common logic
  PCI: vmd: Detach resources after stopping root bus
  nvmet: Optionally use PCI P2P memory
  nvmet: Introduce helper functions to allocate and free request SGLs
  nvme-pci: Add support for P2P memory in requests
  nvme-pci: Use PCI p2pmem subsystem to manage the CMB
  IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
  block: Add PCI P2P flag for request queue
  PCI/P2PDMA: Add P2P DMA driver writer's documentation
  docs-rst: Add a new directory for PCI documentation
  PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
  PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
  ...
2018-10-25 06:50:48 -07:00
David Howells
aa563d7bca iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements.  This makes it easier to add further
iterator types.  Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself.  Only the direction is required.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
Logan Gunthorpe
c6925093d0 nvmet: Optionally use PCI P2P memory
Create a configfs attribute in each nvme-fabrics namespace to enable P2P
memory use.  The attribute may be enabled (with a boolean) or a specific
P2P device may be given (with the device's PCI name).

When enabled, the namespace will ensure the underlying block device
supports P2P and is compatible with any specified P2P device.  If no device
was specified it will ensure there is compatible P2P memory somewhere in
the system.  Enabling a namespace with P2P memory will fail with EINVAL
(and an appropriate dmesg error) if any of these conditions are not met.

Once a controller is set up on a specific port, the P2P device to use for
each namespace will be found and stored in a radix tree by namespace ID.
When memory is allocated for a request, the tree is used to look up the P2P
device to allocate memory against.  If no device is in the tree (because no
appropriate device was found), or if allocation of P2P memory fails, fall
back to using regular memory.

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[hch: partial rewrite of the initial code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-17 12:18:24 -05:00
Logan Gunthorpe
5b2322e48c nvmet: Introduce helper functions to allocate and free request SGLs
Add helpers to allocate and free the SGL in a struct nvmet_req:

  int nvmet_req_alloc_sgl(struct nvmet_req *req)
  void nvmet_req_free_sgl(struct nvmet_req *req)

This will be expanded in a future patch to implement peer-to-peer memory
DMAs and should be common with all target drivers.

The new helpers are used in nvmet-rdma.  Seeing we use req.transfer_len as
the length of the SGL it is set earlier and cleared on any error.  It also
seems to be unnecessary to accumulate the length as the map_sgl functions
should only ever be called once per request.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
2018-10-17 12:18:23 -05:00
Bart Van Assche
1216e9ef18 nvmet-fcloop: suppress a compiler warning
Building with W=1 enables the compiler warning -Wimplicit-fallthrough=3. That
option does not recognize the fall-through comment in the fcloop driver. Add
a fall-through comment that is recognized for -Wimplicit-fallthrough=3. This
patch avoids that the compiler reports the following warning when building
with W=1:

drivers/nvme/target/fcloop.c:647:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
   if (op == NVMET_FCOP_READDATA)
      ^

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17 08:58:31 +02:00
Bart Van Assche
1c4665272c nvmet-fc: fix kernel-doc headers
This patch avoids that the kernel-doc tool complains about two function
headers when building with W=1.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17 08:58:30 +02:00
Bart Van Assche
8eacd1bd21 nvmet: avoid integer overflow in the discard code
Although I'm not sure whether it is a good idea to support large discard
commands, I think integer overflow for discard ranges larger than 4 GB
should be avoided. This patch avoids that smatch reports the following:

drivers/nvme/target/io-cmd-file.c:249:1 nvmet_file_execute_discard() warn: should '((range.nlb)) << req->ns->blksize_shift' be a 64 bit type?

Fixes: d5eff33ee6 ("nvmet: add simple file backed ns support")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17 08:58:27 +02:00
Bart Van Assche
0d3ebdec93 nvmet-rdma: declare local symbols static
This patch avoids that sparse complains about missing declarations.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17 08:58:27 +02:00
Bart Van Assche
5eadc9cce1 nvmet: use strlcpy() instead of strcpy()
Although the code modified by this patch looks fine to me, this patch avoids
that Coverity reports the following complaint (ID 1364971 and ID 1364973):
"You might overrun the 256-character fixed-size string id->subnqn".

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17 08:58:26 +02:00
Bart Van Assche
43a6f8fb61 nvmet: use strcmp() instead of strncmp() for subsystem lookup
strncmp() stops comparing when either the end of one of the first two arguments
is reached or when 'n' characters have been compared, whichever comes first.
That means that strncmp(s1, s2, n) is equivalent to strcmp(s1, s2) if n exceeds
the length of s1 or the length of s2. Since that is the case in
nvmet_find_get_subsys(), change strncmp() into strcmp(). This patch avoids that
the following warning is reported by smatch:

drivers/nvme/target/core.c:940:1 nvmet_find_get_subsys() error: strncmp() '"nqn.2014-08.org.nvmexpress.discovery"' too small (37 vs 223)

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17 08:58:24 +02:00
Chaitanya Kulkarni
48440ab6dc nvmet: remove unreachable code
Get rid of the unreachable code in the nvmet_parse_discovery_cmd().
Keep the error message identical to the admin-cmd.c and io-cmd*.c

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17 08:58:23 +02:00
Sagi Grimberg
2acf70ade7 nvmet-rdma: use a private workqueue for delete
Queue deletion is done asynchronous when the last reference on the queue
is dropped.  Thus, in order to make sure we don't over allocate under a
connect/disconnect storm, we let queue deletion complete before making
forward progress.

However, given that we flush the system_wq from rdma_cm context which
runs from a workqueue context, we can have a circular locking complaint
[1]. Fix that by using a private workqueue for queue deletion.

[1]:
======================================================
WARNING: possible circular locking dependency detected
4.19.0-rc4-dbg+ #3 Not tainted
------------------------------------------------------
kworker/5:0/39 is trying to acquire lock:
00000000a10b6db9 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]

but task is already holding lock:
00000000331b4e2c ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 ((work_completion)(&queue->release_work)){+.+.}:
       process_one_work+0x474/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

-> #2 ((wq_completion)"events"){+.+.}:
       flush_workqueue+0xf3/0x970
       nvmet_rdma_cm_handler+0x133d/0x1734 [nvmet_rdma]
       cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_req_handler+0x135b/0x1c30 [ib_cm]
       cm_work_handler+0x2b7/0x38cd [ib_cm]
       process_one_work+0x4ae/0xa20
nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: disconnected (10): status 0 id 0000000040357082
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
nvme nvme0: Reconnecting in 10 seconds...

-> #1 (&id_priv->handler_mutex/1){+.+.}:
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_req_handler+0x135b/0x1c30 [ib_cm]
       cm_work_handler+0x2b7/0x38cd [ib_cm]
       process_one_work+0x4ae/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

-> #0 (&id_priv->handler_mutex){+.+.}:
       lock_acquire+0xc5/0x200
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       rdma_destroy_id+0x6f/0x440 [rdma_cm]
       nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
       process_one_work+0x4ae/0xa20
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30

Fixes: 777dc82395 ("nvmet-rdma: occasionally flush ongoing controller teardown")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Bart Van Assche <bvanassche@acm.org>

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-05 09:25:18 +02:00
Sagi Grimberg
73383adfad nvmet: don't split large I/Os unconditionally
If we know that the I/O size exceeds our inline bio vec, no
point using it and split the rest to begin with. We could
in theory reuse the inline bio and only allocate the bio_vec,
but its really not worth optimizing for.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01 14:16:13 -07:00
James Smart
ea96d6496f nvmet_fc: support target port removal with nvmet layer
Currently, if a targetport has been connected to via the nvmet config
(in other words, the add_port() transport routine called, and the nvmet
port pointer stored for using in upcalls on new io), and if the
targetport is then removed (say the lldd driver decides to unload or
fully reset its hardware) and then re-added (the lldd driver reloads or
reinits its hardware), the port pointer has been lost so there's no way
to continue to post commands up to nvmet via the transport port.

Correct by allocating a small "port context" structure that will be
linked to by the targetport. The context will save the targetport WWN's
and the nvmet port pointer to use for it.  Initial allocation will occur
when the targetport is bound to via add_port.  The context will be
deallocated when remove_port() is called.  If a targetport is removed
while nvmet has the active port context, the targetport will be unlinked
from the port context before removal.  If a new targetport is registered,
the port contexts without a binding are looked through and if the WWN's
match (so it's the same as nvmet's port context) the port context is
linked to the new target port.  Thus new io can be received on the new
targetport and operation resumes with nvmet.

Additionally, this also resolves nvmet configuration changing out from
underneath of the nvme-fc target port (for example: a nvmetcli clear).

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01 14:16:10 -07:00
Milan P. Gandhi
d4e4230c8f nvme-fc: fix for a minor typos
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01 14:16:09 -07:00
Chaitanya Kulkarni
d93cb3927c nvmet: remove redundant module prefix
This patch removes the redundant module prefix used in the pr_err() when
nvmet_get_smart_log_nsid() failed to find the namespace provided as a part
of smart-log command.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01 14:16:09 -07:00
Hannes Reinecke
be1277f5eb nvme: count all ANA groups for ANA Log page
When issuing a short read on the ANA log page the number of groups
should not change, even though the final returned data might contain
less groups than that number.

Signed-off-by: Hannes Reinecke <hare@suse.com>
[switched to a for loop]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-09-17 15:49:40 +02:00
Sagi Grimberg
8407879c4e nvmet-rdma: fix possible bogus dereference under heavy load
Currently we always repost the recv buffer before we send a response
capsule back to the host. Since ordering is not guaranteed for send
and recv completions, it is posible that we will receive a new request
from the host before we got a send completion for the response capsule.

Today, we pre-allocate 2x rsps the length of the queue, but in reality,
under heavy load there is nothing that is really preventing the gap to
expand until we exhaust all our rsps.

To fix this, if we don't have any pre-allocated rsps left, we dynamically
allocate a rsp and make sure to free it when we are done. If under memory
pressure we fail to allocate a rsp, we silently drop the command and
wait for the host to retry.

Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
[hch: dropped a superflous assignment]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-09-05 12:18:01 -07:00
Chaitanya Kulkarni
04db0e5ec5 nvmet: free workqueue object if module init fails
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-28 08:40:44 +02:00
James Smart
afd299ca99 nvme-fcloop: Fix dropped LS's to removed target port
When a targetport is removed from the config, fcloop will avoid calling
the LS done() routine thinking the targetport is gone. This leaves the
initiator reset/reconnect hanging as it waits for a status on the
Create_Association LS for the reconnect.

Change the filter in the LS callback path. If tport null (set when
failed validation before "sending to remote port"), be sure to call
done. This was the main bug. But, continue the logic that only calls
done if tport was set but there is no remoteport (e.g. case where
remoteport has been removed, thus host doesn't expect a completion).

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-28 08:40:43 +02:00
Jason Gunthorpe
0a3173a5f0 Merge branch 'linus/master' into rdma.git for-next
rdma.git merge resolution for the 4.19 merge window

Conflicts:
 drivers/infiniband/core/rdma_core.c
   - Use the rdma code and revise with the new spelling for
     atomic_fetch_add_unless
 drivers/nvme/host/rdma.c
   - Replace max_sge with max_send_sge in new blk code
 drivers/nvme/target/rdma.c
   - Use the blk code and revise to use NULL for ib_post_recv when
     appropriate
   - Replace max_sge with max_recv_sge in new blk code
 net/rds/ib_send.c
   - Use the net code and revise to use NULL for ib_post_recv when
     appropriate

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 14:21:29 -06:00
Jason Gunthorpe
89982f7cce Linux 4.18
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltwm2geHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGITkH/iSzkVhT2OxHoir0
 mLVzTi7/Z17L0e/ELl7TvAC0iLFlWZKdlGR0g3b4/QpXLPmNK4HxiDRTQuWn8ke0
 qDZyDq89HqLt+mpeFZ43PCd9oqV8CH2xxK3iCWReqv6bNnowGnRpSStlks4rDqWn
 zURC/5sUh7TzEG4s997RrrpnyPeQWUlf/Mhtzg2/WvK2btoLWgu5qzjX1uFh3s7u
 vaF2NXVJ3X03gPktyxZzwtO1SwLFS1jhwUXWBZ5AnoJ99ywkghQnkqS/2YpekNTm
 wFk80/78sU+d91aAqO8kkhHj8VRrd+9SGnZ4mB2aZHwjZjGcics4RRtxukSfOQ+6
 L47IdXo=
 =sJkt
 -----END PGP SIGNATURE-----

Merge tag 'v4.18' into rdma.git for-next

Resolve merge conflicts from the -rc cycle against the rdma.git tree:

Conflicts:
 drivers/infiniband/core/uverbs_cmd.c
  - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
  - Merge removal of file->ucontext in for-next with new code in -rc
 drivers/infiniband/core/uverbs_main.c
  - for-next removed code from ib_uverbs_write() that was modified
    in for-rc

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 13:12:00 -06:00
Linus Torvalds
73ba2fb33c for-4.19/block-20180812
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltwvasQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpv65EACTq5gSLnJBI6ZPr1RAHruVDnjfzO2Veitl
 tUtjm0XfWmnEiwQ3dYvnyhy99xbyaG3900d9BClCTlH6xaUdSiQkDpcKG/R2F36J
 5mZitYukQcpFAQJWF8YKsTTE7JPl4VglCIDqYiC4+C3rOSVi8lrKn2qp4J4MMCFn
 thRg3jCcq7c5s9Eigsop1pXWQSasubkXfk55Krcp4oybKYpYRKXXf74Mj14QAbwJ
 QHN3VisyAUWoBRg7UQZo1Npe2oPk6bbnJypnjf8M0M2EnlvddEkIlHob91sodka8
 6p4APOEu5cbyXOBCAQsw/koff14mb8aEadqeQA68WvXfIdX9ZjfxCX0OoC3sBEXk
 yqJhZ0C980AM13zIBD8ejv4uasGcPca8W+47mE5P8sRiI++5kBsFWDZPCtUBna0X
 2Kh24NsmEya9XRR5vsB84dsIPQ3tLMkxg/IgQRVDaSnfJz0c/+zm54xDyKRaFT4l
 5iERk2WSkm9+8jNfVmWG0edrv6nRAXjpGwFfOCPh6/LCSCi4xQRULYN7sVzsX8ZK
 FRjt24HftBI8mJbh4BtweJvg+ppVe1gAk3IO3HvxAQhv29Hz+uvFYe9kL+3N8LJA
 Qosr9n9O4+wKYizJcDnw+5iPqCHfAwOm9th4pyedR+R7SmNcP3yNC8AbbheNBiF5
 Zolos5H+JA==
 =b9ib
 -----END PGP SIGNATURE-----

Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "First pull request for this merge window, there will also be a
  followup request with some stragglers.

  This pull request contains:

   - Fix for a thundering heard issue in the wbt block code (Anchal
     Agarwal)

   - A few NVMe pull requests:
      * Improved tracepoints (Keith)
      * Larger inline data support for RDMA (Steve Wise)
      * RDMA setup/teardown fixes (Sagi)
      * Effects log suppor for NVMe target (Chaitanya Kulkarni)
      * Buffered IO suppor for NVMe target (Chaitanya Kulkarni)
      * TP4004 (ANA) support (Christoph)
      * Various NVMe fixes

   - Block io-latency controller support. Much needed support for
     properly containing block devices. (Josef)

   - Series improving how we handle sense information on the stack
     (Kees)

   - Lightnvm fixes and updates/improvements (Mathias/Javier et al)

   - Zoned device support for null_blk (Matias)

   - AIX partition fixes (Mauricio Faria de Oliveira)

   - DIF checksum code made generic (Max Gurtovoy)

   - Add support for discard in iostats (Michael Callahan / Tejun)

   - Set of updates for BFQ (Paolo)

   - Removal of async write support for bsg (Christoph)

   - Bio page dirtying and clone fixups (Christoph)

   - Set of bcache fix/changes (via Coly)

   - Series improving blk-mq queue setup/teardown speed (Ming)

   - Series improving merging performance on blk-mq (Ming)

   - Lots of other fixes and cleanups from a slew of folks"

* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
  blkcg: Make blkg_root_lookup() work for queues in bypass mode
  bcache: fix error setting writeback_rate through sysfs interface
  null_blk: add lock drop/acquire annotation
  Blk-throttle: reduce tail io latency when iops limit is enforced
  block: paride: pd: mark expected switch fall-throughs
  block: Ensure that a request queue is dissociated from the cgroup controller
  block: Introduce blk_exit_queue()
  blkcg: Introduce blkg_root_lookup()
  block: Remove two superfluous #include directives
  blk-mq: count the hctx as active before allocating tag
  block: bvec_nr_vecs() returns value for wrong slab
  bcache: trivial - remove tailing backslash in macro BTREE_FLAG
  bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
  bcache: set max writeback rate when I/O request is idle
  bcache: add code comments for bset.c
  bcache: fix mistaken comments in request.c
  bcache: fix mistaken code comments in bcache.h
  bcache: add a comment in super.c
  bcache: avoid unncessary cache prefetch bch_btree_node_get()
  bcache: display rate debug parameters to 0 when writeback is not running
  ...
2018-08-14 10:23:25 -07:00
Chaitanya Kulkarni
dedf0be544 nvmet: add ns write protect support
This patch implements the Namespace Write Protect feature described in
"NVMe TP 4005a Namespace Write Protect". In this version, we implement
No Write Protect and Write Protect states for target ns which can be
toggled by set-features commands from the host side.

For write-protect state transition, we need to flush the ns specified
as a part of command so we also add helpers for carrying out synchronous
flush operations.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: fixed an incorrect endianess conversion, minor cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-08 12:00:53 +02:00
Chaitanya Kulkarni
b369b30cf5 nvmet: use Retain Async Event bit to clear AEN
In the current implementation, we clear the AEN bit when we get the
"get log page" command if given log page is associated with AEN.
This patch allows optionally retaining the AEN for the ctrl
under consideration when Retain Asynchronous Event (RAE) bit is set
as a part of "get log page" command.

This allows the host to read the Log page and optionally retaining the
AEN associated with this log page when using userspace tools like
nvme-cli.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: also use the new helper in the just merged ANA code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-27 19:14:31 +02:00
Christoph Hellwig
62ac0d32f7 nvmet: support configuring ANA groups
Allow creating non-default ANA groups (group ID > 1).  Groups are created
either by assigning the group ID to a namespace, or by creating a configfs
group object under a specific port.  All namespaces assigned to a group
that doesn't have a configfs object for a given port are marked as
inaccessible.

Allow changing the ANA state on a per-port basis by creating an
ana_groups directory under each port, and another directory with an
ana_state file in it.  The default ANA group 1 directory is created
automatically for each port.

For all changes in ANA configuration the ANA change AEN is sent.  We only
keep a global changecount instead of additional per-group changecounts to
keep the implementation as simple as possible.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-07-27 19:13:06 +02:00
Christoph Hellwig
72efd25dcf nvmet: add minimal ANA support
Add support for Asynchronous Namespace Access as specified in NVMe 1.3
TP 4004.

Just add a default ANA group 1 that is optimized on all ports.  This is
(and will remain) the default assignment for any namespace not epxlicitly
assigned to another ANA group.  The ANA state can be manually changed
through the configfs interface, including the change state.

Includes fixes and improvements from Hannes Reinecke.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-07-27 19:13:02 +02:00
Christoph Hellwig
793c7cfce0 nvmet: track and limit the number of namespaces per subsystem
TP 4004 introduces a new 'Maximum Number of Allocated Namespaces' field
in the Identify controller data to help the host size resources.  Put
an upper limit on the supported namespaces to be able to support this
value as supporting 32-bits worth of namespaces would lead to very
large buffers.  The limit is completely arbitrary at this point.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-07-27 19:13:01 +02:00
Christoph Hellwig
4ee4328048 nvmet: keep a port pointer in nvmet_ctrl
This will be needed for the ANA AEN code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-07-27 19:12:52 +02:00
Hannes Reinecke
405a751960 nvmet: only check for filebacking on -ENOTBLK
We only need to check for a file-backed namespace if
nvmet_bdev_ns_enable() returns -ENOTBLK. For any other error
it's pointless as the open() error will remain the same.

Fixes: d5eff33e ("nvmet: add simple file backed ns support")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-25 13:14:04 +02:00
Hannes Reinecke
5613d31214 nvmet: fixup crash on NULL device path
When writing an empty string into the device_path attribute the kernel
will crash with

nvmet: failed to open block device (null): (-22)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000

This patch sanitizes the error handling for invalid device path settings.

Fixes: a07b4970 ("nvmet: add a generic NVMe target")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-25 13:14:03 +02:00
Bart Van Assche
23f96d1f15 nvmet-rdma: Simplify ib_post_(send|recv|srq_recv)() calls
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 16:06:36 -06:00
Andy Shevchenko
1b0d274523 nvmet: don't use uuid_le type
Don't use sizeof(uuid_le) where none of the parameters is type of uuid_le.
Since both arguments are u8 [16], use size of destination there.

Moreover, uuid_le is a deprecated type, and nvmet is using uuid_t
already.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:51 +02:00
Sagi Grimberg
9c891c1398 nvmet: check fileio lba range access boundaries
Fail out-of-bounds with a proper status code.

Fixes: d5eff33ee6 ("nvmet: add simple file backed ns support")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:51 +02:00
Sagi Grimberg
1b72b71fac nvmet: fix file discard return status
If nvmet_copy_from_sgl failed, we falsly return successful
completion status.

Fixes: d5eff33ee6 ("nvmet: add simple file backed ns support")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 15:55:50 +02:00
James Smart
6cdefc6e2a nvme: if_ready checks to fail io to deleting controller
The revised if_ready checks skipped over the case of returning error when
the controller is being deleted.  Instead it was returning BUSY, which
caused the ios to retry, which caused the ns delete to hang waiting for
the ios to drain.

Stack trace of hang looks like:
 kworker/u64:2   D    0    74      2 0x80000000
 Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
 Call Trace:
  ? __schedule+0x26d/0x820
  schedule+0x32/0x80
  blk_mq_freeze_queue_wait+0x36/0x80
  ? remove_wait_queue+0x60/0x60
  blk_cleanup_queue+0x72/0x160
  nvme_ns_remove+0x106/0x140 [nvme_core]
  nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
  nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
  process_one_work+0x160/0x350
  worker_thread+0x1c3/0x3d0
  kthread+0xf5/0x130
  ? process_one_work+0x350/0x350
  ? kthread_bind+0x10/0x10
  ret_from_fork+0x1f/0x30

Extend nvmf_fail_nonready_command() to supply the controller pointer so
that the controller state can be looked at. Fail any io to a controller
that is deleting.

Fixes: 3bc32bb118 ("nvme-fabrics: refactor queue ready check")
Fixes: 35897b920c ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
2018-07-24 13:44:40 +02:00
James Smart
d082dc1562 nvmet-fc: fix target sgl list on large transfers
The existing code to carve up the sg list expected an sg element-per-page
which can be very incorrect with iommu's remapping multiple memory pages
to fewer bus addresses. To hit this error required a large io payload
(greater than 256k) and a system that maps on a per-page basis. It's
possible that large ios could get by fine if the system condensed the
sgl list into the first 64 elements.

This patch corrects the sg list handling by specifically walking the
sg list element by element and attempting to divide the transfer up
on a per-sg element boundary. While doing so, it still tries to keep
sequences under 256k, but will exceed that rule if a single sg element
is larger than 256k.

Fixes: 48fa362b6c ("nvmet-fc: simplify sg list handling")
Cc: <stable@vger.kernel.org> # 4.14
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24 13:44:09 +02:00
Sagi Grimberg
59e29ce66b nvme: cache struct nvme_ctrl reference to struct nvme_request
We will need to reference the controller in the setup and completion
time for tracing and future traffic based keep alive support.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:18 +02:00
Max Gurtovoy
202093848c nvmet-rdma: add an error flow for post_recv failures
Posting receive buffer operation can fail, thus we should make
sure to have an error flow during initialization phase. While
we're here, add a debug print in case of a failure.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:17 +02:00
Max Gurtovoy
2fc464e216 nvmet-rdma: add unlikely check in the fast path
ib_post_send operation should succeed unless something unusual
happened to the ib device.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:16 +02:00
Steve Wise
0d5ee2b2ab nvmet-rdma: support max(16KB, PAGE_SIZE) inline data
The patch enables inline data sizes using up to 4 recv sges, and capping
the size at 16KB or at least 1 page size.  So on a 4K page system, up to
16KB is supported, and for a 64K page system 1 page of 64KB is supported.

We avoid > 0 order page allocations for the inline buffers by using
multiple recv sges, one for each page.  If the device cannot support
the configured inline data size due to lack of enough recv sges, then
log a warning and reduce the inline size.

Add a new configfs port attribute, called param_inline_data_size,
to allow configuring the size of inline data for a given nvmf port.
The maximum size allowed is still enforced by nvmet-rdma with
NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE).
And the default size, if not specified via configfs, is still PAGE_SIZE.
This preserves the existing behavior, but allows larger inline sizes
for small page systems.  If the configured inline data size exceeds
NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is
reduced.  If param_inline_data_size is set to 0, then inline data is
disabled for that nvmf port.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:16 +02:00
Chaitanya Kulkarni
55eb942eda nvmet: add buffered I/O support for file backed ns
Add a new "buffered_io" attribute, which disabled direct I/O and thus
enables page cache based caching when enabled.   The attribute can only
be changed when the namespace is disabled as the file has to be reopend
for the change to take effect.

The possibly blocking read/write are deferred to a newly introduced
global workqueue.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:14 +02:00
Chaitanya Kulkarni
0866bf0c37 nvmet: add commands supported and effects log page
This patch adds support for Commands Supported and Effects log page
(Log Identifier 05h) for NVMeOF. This also makes it easier to find
which commands are supported, e.g. :-

subnqn    : testnqn1
Admin Command Set
ACS2     [Get Log Page                    ] 00000001
ACS6     [Identify                        ] 00000001
ACS8     [Abort                           ] 00000001
ACS9     [Set Features                    ] 00000001
ACS10    [Get Features                    ] 00000001
ACS12    [Asynchronous Event Request      ] 00000001
ACS24    [Keep Alive                      ] 00000001

NVM Command Set
IOCS0    [Flush                           ] 00000001
IOCS1    [Write                           ] 00000001
IOCS2    [Read                            ] 00000001
IOCS8    [Write Zeroes                    ] 00000001
IOCS9    [Dataset Management              ] 00000001

This partticular functionality can be used from the host side to examine
the NVMeOF ctrl commands supported.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23 09:35:13 +02:00
Max Gurtuvoy
d68a90e148 nvmet: reset keep alive timer in controller enable
Controllers that are not yet enabled should not really enforce keep alive
timeouts, but we still want to track a timeout and cleanup in case a host
died before it enabled the controller.  Hence, simply reset the keep
alive timer when the controller is enabled.

Suggested-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-20 14:20:51 +02:00
Steve Wise
33023fb85a IB/core: add max_send_sge and max_recv_sge attributes
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths.  For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16.  Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.

Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 13:17:28 -06:00
Christoph Hellwig
3bc32bb118 nvme-fabrics: refactor queue ready check
Move the is_connected check to the fibre channel transport, as it has no
meaning for other transports.  To facilitate this split out a new
nvmf_fail_nonready_command helper that is called by the transport when
it is asked to handle a command on a queue that is not ready.

Also avoid a function call for the queue live fast path by inlining
the check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
2018-06-15 11:21:00 +02:00
Chaitanya Kulkarni
c42d7a30ab nvmet: free smart-log buffer after use
Free smart-log buffer allocated in the function after use.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-06-11 16:18:05 +02:00
Sagi Grimberg
9ba2a5cb88 nvmet: filter newlines from user input
We should avoid consuming the newlines in traddr, trsvcid and
device_path. Add minimal processing to make sure they are gone.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-08 12:51:10 -06:00
Christoph Hellwig
f39ae4719b nvmet: return all zeroed buffer when we can't find an active namespace
Quote from Figure 106 in NVMe 1.3a:

  The Identify Namespace data structure is returned to the host for the
  namespace specified in the Namespace Identifier (CDW1.NSID) field if it
  is an active NSID. If the specified namespace is not an active NSID,
  then the controller returns a zero filled data structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@rimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-08 12:51:09 -06:00
Christoph Hellwig
55fdd6b613 nvmet: mask pending AENs
Per section 5.2 of the NVMe 1.3 spec:

  "When the controller posts a completion queue entry for an outstanding
  Asynchronous Event Request command and thus reports an asynchronous
  event, subsequent events of that event type are automatically masked by
  the controller until the host clears that event. An event is cleared by
  reading the log page associated with that event using the Get Log Page
  command (see section 5.14)."

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-06-01 14:37:35 +02:00
Christoph Hellwig
c86b8f7b41 nvmet: add AEN configuration support
AEN configuration via the 'Get Features' and 'Set Features' admin
command is mandatory, so we should be implemeting handling for it.

Signed-off-by: Hannes Reinecke <hare@suse.com>
[hch: use WRITE_ONCE, check for invalid values]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-06-01 14:37:35 +02:00
Christoph Hellwig
c16734ea98 nvmet: implement the changed namespaces log
Just keep a per-controller buffer of changed namespaces and copy it out
in the get log page implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-06-01 14:37:35 +02:00
Christoph Hellwig
8ab0805f11 nvmet: split log page implementation
Remove the common code to allocate a buffer and copy it into the SGL.
Instead the two no-op implementations just zero the SGL directly, and
the smart log allocates a buffer on its own.  This prepares for the
more elaborate ANA log page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-06-01 14:37:35 +02:00
Christoph Hellwig
c7759fff22 nvmet: add a new nvmet_zero_sgl helper
Zeroes the SGL in the payload.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-06-01 14:37:35 +02:00
Wei Yongjun
1367bc8285 nvmet: fix error return code in nvmet_file_ns_enable()
Fix to return error code -ENOMEM from the memory alloc fail error
handling case instead of 0, as done elsewhere in this function.

Fixes: d5eff33ee6 ("nvmet: add simple file backed ns support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.e>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-31 18:46:46 +02:00
Wei Yongjun
81cf54e01a nvmet: fix a typo in nvmet_file_ns_enable()
Fix a typo in nvmet_file_ns_enable().

Fixes: d5eff33ee6 ("nvmet: add simple file backed ns support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.e>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-31 18:46:46 +02:00
Christoph Hellwig
fe4a97918d nvme-loop: add support for multiple ports
This is useful at least for multipath testing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-05-30 08:05:18 +02:00
Jens Axboe
b7405176b5 Merge branch 'nvme-4.18-2' of git://git.infradead.org/nvme into for-4.18/block
Pull NVMe changes from Christoph:

"Here is the current batch of nvme updates for 4.18, we have a few more
 patches in the queue, but I'd like to get this pile into your tree
 and linux-next ASAP.

 The biggest item is support for file-backed namespaces in the NVMe
 target from Chaitanya, in addition to that we mostly small fixes from
 all the usual suspects."

* 'nvme-4.18-2' of git://git.infradead.org/nvme:
  nvme: fixup memory leak in nvme_init_identify()
  nvme: fix KASAN warning when parsing host nqn
  nvmet-loop: use nr_phys_segments when map rq to sgl
  nvmet-fc: increase LS buffer count per fc port
  nvmet: add simple file backed ns support
  nvmet: remove duplicate NULL initialization for req->ns
  nvmet: make a few error messages more generic
  nvme-fabrics: allow duplicate connections to the discovery controller
  nvme-fabrics: centralize discovery controller defaults
  nvme-fabrics: remove unnecessary controller subnqn validation
  nvme-fc: remove setting DNR on exception conditions
  nvme-rdma: stop admin queue before freeing it
  nvme-pci: Fix AER reset handling
  nvme-pci: set nvmeq->cq_vector after alloc cq/sq
  nvme: host: core: fix precedence of ternary operator
  nvme: fix lockdep warning in nvme_mpath_clear_current_path
2018-05-29 12:56:20 -06:00
Christoph Hellwig
db8c48e4b2 nvme: return BLK_EH_DONE from ->timeout
NVMe always completes the request before returning from ->timeout, either
by polling for it, or by disabling the controller.  Return BLK_EH_DONE so
that the block layer doesn't even try to complete it again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29 08:59:21 -06:00
Chaitanya Kulkarni
eb464833a2 nvmet-loop: use nr_phys_segments when map rq to sgl
Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check
if a command contains data to me mapped.  This fixes the case where
a struct requests contains LBAs, but no data will actually be send,
e.g. the pending Write Zeroes support.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25 16:50:12 +02:00
James Smart
17d78252ee nvmet-fc: increase LS buffer count per fc port
Todays limit on concurrent LS's is very small - 4 buffers. With large
subsystem counts or large numbers of initiators connecting, the limit
may be exceeded.

Raise the LS buffer count to 256.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25 16:50:12 +02:00
Chaitanya Kulkarni
d5eff33ee6 nvmet: add simple file backed ns support
This patch adds simple file backed namespace support for NVMeOF target.

The new file io-cmd-file.c is responsible for handling the code for I/O
commands when ns is file backed. Also, we introduce mempools based slow
path using sync I/Os for file backed ns to ensure forward progress under
reclaim.

The old block device based implementation is moved to io-cmd-bdev.c and
use a "nvmet_bdev_" symbol prefix.  The enable/disable calls are also
move into the respective files.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: updated changelog, fixed double req->ns lookup in bdev case]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25 16:50:12 +02:00
Chaitanya Kulkarni
618cff4285 nvmet: remove duplicate NULL initialization for req->ns
Remove the duplicate NULL initialization for req->ns.  req->ns is always
initialized to NULL in nvmet_req_init(), so there is no need to reset
it later on failures unless we have previously assigned a value to it.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25 16:50:12 +02:00
Chaitanya Kulkarni
b40b83e365 nvmet: make a few error messages more generic
"nvmet_check_ctrl_status()" is called from admin-cmd.c along
with io-cmd.c, make the error message more generic.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25 16:50:12 +02:00
Linus Torvalds
eb4f959b26 First pull request for 4.17-rc
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
 - SPDX license tag cleanups (new tag Linux-OpenIB)
 - RoCE GID fixes related to default GIDs
 - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
   mlx4, mlx5, and hfi1 (medium batch)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJa7JXPAAoJELgmozMOVy/dc0AP/0i7EajAmgl1ihka6BYVj2pa
 DV8iSrVMDPulh9AVnAtwLJSbdwmgN/HeVzLzcutHyMYk6tAf8RCs6TsyoB36XiOL
 oUh5+V2GyNnyh9veWPwyGTgZKCpPJc3uQFV6502lZVDYwArMfGatumApBgQVKiJ+
 YdPEXEQZPNIs6YZB1WXkNYV/ra9u0aBByQvUrxwVZ2AND+srJYO82tqZit2wBtjK
 UXrhmZbWXGWMFg8K3/lpfUkQhkG3Arj+tMKkCfqsVzC7wUPhlTKBHR9NmvdLIiy9
 5Vhv7Xp78udcxZKtUeTFsbhaMqqK7x7sKHnpKAs7hOZNZ/Eg47BrMwMrZVLOFuDF
 nBLUL1H+nJ1mASZoMWH5xzOpVew+e9X0cot09pVDBIvsOIh97wCG7hgptQ2Z5xig
 fcDiMmg6tuakMsaiD0dzC9JI5HR6Z7+6oR1tBkQFDxQ+XkkcoFabdmkJaIRRwOj7
 CUhXRgcm0UgVd03Jdta6CtYXsjSODirWg4AvSSMt9lUFpjYf9WZP00/YojcBbBEH
 UlVrPbsKGyncgrm3FUP6kXmScESfdTljTPDLiY9cO9+bhhPGo1OHf005EfAp178B
 jGp6hbKlt+rNs9cdXrPSPhjds+QF8HyfSlwyYVWKw8VWlh/5DG8uyGYjF05hYO0q
 xhjIS6/EZjcTAh5e4LzR
 =PI8v
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "This is our first pull request of the rc cycle. It's not that it's
  been overly quiet, we were just waiting on a few things before sending
  this off.

  For instance, the 6 patch series from Intel for the hfi1 driver had
  actually been pulled in on Tuesday for a Wednesday pull request, only
  to have Jason notice something I missed, so we held off for some
  testing, and then on Thursday had to respin the series because the
  very first patch needed a minor fix (unnecessary cast is all).

  There is a sizable hns patch series in here, as well as a reasonably
  largish hfi1 patch series, then all of the lines of uapi updates are
  just the change to the new official Linux-OpenIB SPDX tag (a bunch of
  our files had what amounts to a BSD-2-Clause + MIT Warranty statement
  as their license as a result of the initial code submission years ago,
  and the SPDX folks decided it was unique enough to warrant a unique
  tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
  and core/cache/cma updates to round out the bunch.

  None of it was overly large by itself, but in the 2 1/2 weeks we've
  been collecting patches, it has added up :-/.

  As best I can tell, it's been through 0day (I got a notice about my
  last for-next push, but not for my for-rc push, but Jason seems to
  think that failure messages are prioritized and success messages not
  so much). It's also been through linux-next. And yes, we did notice in
  the context portion of the CMA query gid fix patch that there is a
  dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
  and remove it anywhere we can.

  Summary:

   - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)

   - SPDX license tag cleanups (new tag Linux-OpenIB)

   - RoCE GID fixes related to default GIDs

   - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
     mlx4, mlx5, and hfi1 (medium batch)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
  RDMA/cma: Do not query GID during QP state transition to RTR
  IB/mlx4: Fix integer overflow when calculating optimal MTT size
  IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
  IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
  IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
  IB/hfi1: Fix loss of BECN with AHG
  IB/hfi1 Use correct type for num_user_context
  IB/hfi1: Fix handling of FECN marked multicast packet
  IB/core: Make ib_mad_client_id atomic
  iw_cxgb4: Atomically flush per QP HW CQEs
  IB/uverbs: Fix kernel crash during MR deregistration flow
  IB/uverbs: Prevent reregistration of DM_MR to regular MR
  RDMA/mlx4: Add missed RSS hash inner header flag
  RDMA/hns: Fix a couple misspellings
  RDMA/hns: Submit bad wr
  RDMA/hns: Update assignment method for owner field of send wqe
  RDMA/hns: Adjust the order of cleanup hem table
  RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
  RDMA/hns: Remove some unnecessary attr_mask judgement
  RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
  ...
2018-05-04 20:51:10 -10:00
Johannes Thumshirn
8bfc3b4c6f nvmet: switch loopback target state to connecting when resetting
After commit bb06ec3145 ("nvme: expand nvmf_check_if_ready checks")
resetting of the loopback nvme target failed as we forgot to switch
it's state to NVME_CTRL_CONNECTING before we reconnect the admin
queues. Therefore the checks in nvmf_check_if_ready() choose to go to
the reject_io case and thus we couldn't sent out an identify
controller command to reconnect.

Change the controller state to NVME_CTRL_CONNECTING after tearing down
the old connection and before re-establishing the connection.

Fixes: bb06ec3145 ("nvme: expand nvmf_check_if_ready checks")
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-03 09:37:50 -06:00
Greg Thelen
d6fc6a22fc nvmet-rdma: depend on INFINIBAND_ADDR_TRANS
NVME_TARGET_RDMA code depends on INFINIBAND_ADDR_TRANS provided symbols.
So declare the kconfig dependency.  This is necessary to allow for
enabling INFINIBAND without INFINIBAND_ADDR_TRANS.

Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:15:43 -04:00
James Smart
bb06ec3145 nvme: expand nvmf_check_if_ready checks
The nvmf_check_if_ready() checks that were added are very simplistic.
As such, the routine allows a lot of cases to fail ios during windows
of reset or re-connection. In cases where there are not multi-path
options present, the error goes back to the callee - the filesystem
or application. Not good.

The common routine was rewritten and calling syntax slightly expanded
so that per-transport is_ready routines don't need to be present.
The transports now call the routine directly. The routine is now a
fabrics routine rather than an inline function.

The routine now looks at controller state to decide the action to
take. Some states mandate io failure. Others define the condition where
a command can be accepted.  When the decision is unclear, a generic
queue-or-reject check is made to look for failfast or multipath ios and
only fails the io if it is so marked. Otherwise, the io will be queued
and wait for the controller state to resolve.

Admin commands issued via ioctl share a live admin queue with commands
from the transport for controller init. The ioctls could be intermixed
with the initialization commands. It's possible for the ioctl cmd to
be issued prior to the controller being enabled. To block this, the
ioctl admin commands need to be distinguished from admin commands used
for controller init. Added a USERCMD nvme_req(req)->rq_flags bit to
reflect this division and set it on ioctls requests.  As the
nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(),
ensure that commands allocated by the ioctl path (actually anything
in core.c) preps the nvme_req(req) before starting the io. This will
preserve the USERCMD flag during execution and/or retry.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.e>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-12 09:58:27 -06:00
Daniel Verkamp
c739969849 nvmet: fix space padding in serial number
Commit 42de82a8b5 previously attempted to fix this, and it did
correctly pad the MN and FR fields with spaces, but the SN field still
contains 0 bytes.  The current code fills out the first 16 bytes with
hex2bin, leaving the last 4 bytes zeroed.  Rather than adding a lot of
error-prone math to avoid overwriting SN twice, just set the whole thing
to spaces up front (it's only 20 bytes).

Fixes: 42de82a8b5 ("nvmet: don't report 0-bytes in serial number")
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-12 09:58:27 -06:00
Rodrigo R. Galvao
543c09c89f nvmet: Fix nvmet_execute_write_zeroes sector count
We have to increment the number of logical blocks to a 1's based value
in the native format prior to converting to 512b units.

Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com>
[changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-12 09:58:27 -06:00
Arnd Bergmann
6038aa532a nvme: target: fix buffer overflow
nvmet_execute_get_disc_log_page() passes a fixed-length string into
nvmet_format_discovery_entry(), which then does a longer memcpy() on
it, as pointed out by gcc-8:

In function 'nvmet_format_discovery_entry',
    inlined from 'nvmet_execute_get_disc_log_page' at drivers/nvme/target/discovery.c:126:4:
drivers/nvme/target/discovery.c:62:2: error: 'memcpy' forming offset [38, 223] is out of the bounds [0, 37] [-Werror=array-bounds]
  memcpy(e->subnqn, subsys_nqn, NVMF_NQN_SIZE);

Using strncpy() will make this well-defined, filling the rest of the
buffer with zeroes, under the assumption that the input is either
a NUL-terminated string, or a byte sequence containing no zeroes.
If the input is a string that is longer than NVMF_NQN_SIZE, we
continue to have no NUL-termination in the output.

Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-12 09:58:27 -06:00
Ming Lei
11d9ea6f2c nvme-loop: fix kernel oops in case of unhandled command
When nvmet_req_init() fails, __nvmet_req_complete() is called
to handle the target request via .queue_response(), so
nvme_loop_queue_response() shouldn't be called again for
handling the failure.

This patch fixes this case by the following way:

- move blk_mq_start_request() before nvmet_req_init(), so
nvme_loop_queue_response() may work well to complete this
host request

- don't call nvme_cleanup_cmd() which is done in nvme_loop_complete_rq()

- don't call nvme_loop_queue_response() which is done via
.queue_response()

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[trimmed changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-12 09:58:27 -06:00
Christoph Hellwig
e929f06d9e nvmet: constify struct nvmet_fabrics_ops
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
Christoph Hellwig
a5d1861229 nvmet: refactor configfs transport type handling
Have a common table of mappings from numerical transport ids to names, and
zero the transport specific area in common code in nvmet_addr_trtype_store.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
Max Gurtovoy
f871749a9f nvmet: move device_uuid configfs attr definition to suitable place
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
Max Gurtovoy
a3dd7d0022 nvmet-rdma: Don't flush system_wq by default during remove_one
The .remove_one function is called for any ib_device removal.
In case the removed device has no reference in our driver, there
is no need to flush the system work queue.

Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
Israel Rukshin
e1a2ee249b nvmet-rdma: Fix use after free in nvmet_rdma_cm_handler()
We free nvmet rdma queues while handling rdma_cm events.
In order to avoid this we destroy the qp and the queue after destroying
the cm_id which guarantees that all rdma_cm events are done.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
Israel Rukshin
be9bddeb0a nvmet-rdma: Remove unused queue state
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
James Smart
9d625f7792 nvmet_fc: prevent new io rqsts in possible isr completions
When a bio completion calls back into the transport for a
back-end io device, the request completion path can free
the transport io job structure allowing it to be reused for
other operations. The transport has a defer_rcv queue which
holds temporary cmd rcv ops while waitng for io job structures.
when the job frees, if there's a cmd waiting, it is picked up
and submitted for processing, which can call back out to the
bio path if it's a read.  Unfortunately, what is unknown is the
context of the original bio done call, and it may be in a state
(softirq) that is not compatible with submitting the new bio in
the same calling sequence. This is especially true when using
scsi back-end devices as scsi is in softirq when it makes the
done call.

Correct by scheduling the io to be started via workq rather
than calling the start new io path inline to the original bio
done path.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
Sagi Grimberg
4c65268588 nvmet: don't return "any" ip address in discovery log page
Its perfectly valid to assign a nvmet port to listen on "any"
IP address (traddr 0.0.0.0 for ipv4 address family) for IP based
transport ports. However, we must not return this address in
discovery log entries. Instead we need to return the address
where the request was accepted on (req->port address).

Since this is nvme transport specific, introduce an optional
.disc_traddr interface that is designed to check that a
port in question is bound to "any" IP address and if so, set
the traddr from the port where the request came from.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
Jens Axboe
468f098734 Merge branch 'for-jens' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Keith for 4.16-rc.

* 'for-jens' of git://git.infradead.org/nvme:
  nvmet: fix PSDT field check in command format
  nvme-multipath: fix sysfs dangerously created links
  nvme-pci: Fix nvme queue cleanup if IRQ setup fails
  nvmet-loop: use blk_rq_payload_bytes for sgl selection
  nvme-rdma: use blk_rq_payload_bytes instead of blk_rq_bytes
  nvme-fabrics: don't check for non-NULL module in nvmf_register_transport
2018-02-28 12:18:58 -07:00
Max Gurtovoy
bffd2b6167 nvmet: fix PSDT field check in command format
PSDT field section according to NVM_Express-1.3:
"This field specifies whether PRPs or SGLs are used for any data
transfer associated with the command. PRPs shall be used for all
Admin commands for NVMe over PCIe. SGLs shall be used for all Admin
and I/O commands for NVMe over Fabrics. This field shall be set to
01b for NVMe over Fabrics 1.0 implementations.

Suggested-by: Idan Burstein <idanb@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-02-28 10:10:22 -07:00
Christoph Hellwig
796b0b8d8d nvmet-loop: use blk_rq_payload_bytes for sgl selection
blk_rq_bytes does the wrong thing for special payloads like discards and
might cause the driver to not set up a SGL.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-02-22 01:45:34 -07:00
Israel Rukshin
7756f72ccd nvmet: Change return code of discard command if not supported
Execute discard command on block device that doesn't support it
should return success.
Returning internal error while using multi-path fails the path.

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2018-02-14 15:38:59 +02:00
Linus Torvalds
0a4b6e2f80 Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "This is the main pull request for block IO related changes for the
  4.16 kernel. Nothing major in this pull request, but a good amount of
  improvements and fixes all over the map. This contains:

   - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
     Paolo.

   - Support for SMR zones for deadline and mq-deadline from Damien and
     Christoph.

   - Set of fixes for bcache by way of Michael Lyle, including fixes
     from himself, Kent, Rui, Tang, and Coly.

   - Series from Matias for lightnvm with fixes from Hans Holmberg,
     Javier, and Matias. Mostly centered around pblk, and the removing
     rrpc 1.2 in preparation for supporting 2.0.

   - A couple of NVMe pull requests from Christoph. Nothing major in
     here, just fixes and cleanups, and support for command tracing from
     Johannes.

   - Support for blk-throttle for tracking reads and writes separately.
     From Joseph Qi. A few cleanups/fixes also for blk-throttle from
     Weiping.

   - Series from Mike Snitzer that enables dm to register its queue more
     logically, something that's alwways been problematic on dm since
     it's a stacked device.

   - Series from Ming cleaning up some of the bio accessor use, in
     preparation for supporting multipage bvecs.

   - Various fixes from Ming closing up holes around queue mapping and
     quiescing.

   - BSD partition fix from Richard Narron, fixing a problem where we
     can't mount newer (10/11) FreeBSD partitions.

   - Series from Tejun reworking blk-mq timeout handling. The previous
     scheme relied on atomic bits, but it had races where we would think
     a request had timed out if it to reused at the wrong time.

   - null_blk now supports faking timeouts, to enable us to better
     exercise and test that functionality separately. From me.

   - Kill the separate atomic poll bit in the request struct. After
     this, we don't use the atomic bits on blk-mq anymore at all. From
     me.

   - sgl_alloc/free helpers from Bart.

   - Heavily contended tag case scalability improvement from me.

   - Various little fixes and cleanups from Arnd, Bart, Corentin,
     Douglas, Eryu, Goldwyn, and myself"

* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
  block: remove smart1,2.h
  nvme: add tracepoint for nvme_complete_rq
  nvme: add tracepoint for nvme_setup_cmd
  nvme-pci: introduce RECONNECTING state to mark initializing procedure
  nvme-rdma: remove redundant boolean for inline_data
  nvme: don't free uuid pointer before printing it
  nvme-pci: Suspend queues after deleting them
  bsg: use pr_debug instead of hand crafted macros
  blk-mq-debugfs: don't allow write on attributes with seq_operations set
  nvme-pci: Fix queue double allocations
  block: Set BIO_TRACE_COMPLETION on new bio during split
  blk-throttle: use queue_is_rq_based
  block: Remove kblockd_schedule_delayed_work{,_on}()
  blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
  blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
  lib/scatterlist: Fix chaining support in sgl_alloc_order()
  blk-throttle: track read and write request individually
  block: add bdev_read_only() checks to common helpers
  block: fail op_is_write() requests to read-only partitions
  blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
  ...
2018-01-29 11:51:49 -08:00
Sagi Grimberg
423b4487fb nvmet: release a ns reference in nvmet_req_uninit if needed
nvmet_req_init looked up a namespace and took a reference on it (unless it
failed prior to that). If the request is uninitialized (in error cases) we
need to remove that reference in case it was taken, otherwise we leak
namespace reference when calling nvme_req_uninit.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-15 17:09:32 +01:00
Roy Shterman
b227c59b9b nvme: host delete_work and reset_work on separate workqueues
We need to ensure that delete_work will be hosted on a different
workqueue than all the works we flush or cancel from it.
Otherwise we may hit a circular dependency warning [1].

Also, given that delete_work flushes reset_work, host reset_work
on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
fix the flushing in the individual drivers to flush nvme_delete_wq
when draining queued deletes.

[1]:
[  178.491942] =============================================
[  178.492718] [ INFO: possible recursive locking detected ]
[  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
[  178.494382] ---------------------------------------------
[  178.495160] kworker/5:1/135 is trying to acquire lock:
[  178.495894]  (
[  178.496120] "nvme-wq"
[  178.496471] ){++++.+}
[  178.496599] , at:
[  178.496921] [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
[  178.497670]
               but task is already holding lock:
[  178.498499]  (
[  178.498724] "nvme-wq"
[  178.499074] ){++++.+}
[  178.499202] , at:
[  178.499520] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.500343]
               other info that might help us debug this:
[  178.501269]  Possible unsafe locking scenario:

[  178.502113]        CPU0
[  178.502472]        ----
[  178.502829]   lock(
[  178.503115] "nvme-wq"
[  178.503467] );
[  178.503716]   lock(
[  178.504001] "nvme-wq"
[  178.504353] );
[  178.504601]
                *** DEADLOCK ***

[  178.505441]  May be due to missing lock nesting notation

[  178.506453] 2 locks held by kworker/5:1/135:
[  178.507068]  #0:
[  178.507330]  (
[  178.507598] "nvme-wq"
[  178.507726] ){++++.+}
[  178.508079] , at:
[  178.508173] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.509004]  #1:
[  178.509265]  (
[  178.509532] (&ctrl->delete_work)
[  178.509795] ){+.+.+.}
[  178.510145] , at:
[  178.510239] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.511070]
               stack backtrace:
:
[  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
[  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
[  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
[  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
[  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
[  178.518443] Call Trace:
[  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
[  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
[  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
[  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
[  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
[  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
[  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
[  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
[  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
[  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
[  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
[  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
[  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
[  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
[  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
[  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
[  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
[  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
[  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
[  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
[  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
[  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
[  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
[  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
[  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40

Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-15 17:09:30 +01:00
Israel Rukshin
6b1943af3f nvmet: rearrange nvmet_ctrl_free()
Make it symmetric to nvmet_alloc_ctrl().

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-08 11:01:59 +01:00
Israel Rukshin
eca19dc1d8 nvmet: fix error flow in nvmet_alloc_ctrl()
Remove the allocated id on error.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-08 11:01:58 +01:00
Roy Shterman
0de5cd367c nvme-fabrics: protect against module unload during create_ctrl
NVMe transport driver module unload may (and usually does) trigger
iteration over the active controllers and delete them all (sometimes
under a mutex).  However, a controller can be created concurrently with
module unload which can lead to leakage of resources (most important char
device node leakage) in case the controller creation occured after the
unload delete and drain sequence.  To protect against this, we take a
module reference to guarantee that the nvme transport driver is not
unloaded while creating a controller.

Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-08 11:01:56 +01:00
James Smart
9ce1f2e12e nvmet-fc: cleanup nvmet add_port/remove_port
The current fc transport add_port routine validates that there is a
matching port to the target port config. It then takes a reference
on the targetport. The del_port removes the reference.

Unfortunately, if the LLDD undergoes a hw reset or driver unload and
wants to unreg the targetport, due to the reference, the targetport
effectively can't be removed. It requires the admin to remove the
port from the nvmet config first, which calls the del_port.
Note: it appears nvmetcli clear skips over the del_port call (I'm
not attempting to change that).

There's no real reason to take the reference. With FC, there is nothing
to enable or disable as the presence of the FC targetport implicitly
means its enabled, and removal of the targtport means its disabled.

Change add_port to simply validate and change remove_port to a noop.
No references are taken on the targetport.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-08 11:01:56 +01:00