Get parent info for format 2 images on every refresh (rather than
just during the initial probe). This will be needed to detect the
disappearance of the parent image in the event a mapped image
becomes unlayered (i.e., flattened). Avoid leaking the previous
parent spec on the second and subsequent times this information is
requested by dropping the previous one (if any) before updating it.
(Also, extract the pool id into a local variable before assigning
it into the parent spec.)
Switch to using a non-zero parent overlap value rather than the
existence of a parent (a non-null parent_spec pointer) to determine
whether to mark a request layered. It will soon be possible for
a layered image to become unlayered while a request is in flight.
This means that the layered flag for an image request indicates that
there was a non-zero parent overlap at the time the image request
was created. The parent overlap can change thereafter, which may
lead to special handling at request submission or completion time.
This and the next several patches are related to:
http://tracker.ceph.com/issues/3763
NOTE:
If an error occurs while refreshing the parent info (i.e.,
requesting it after initial probe), the old parent info will
persist. This is not really correct, and is a scenario that needs
to be addressed. For now we'll assert that the failure mode is
unlikely, but the issue has been documented in tracker issue 5040.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An rbd clone image that has an overlap with its parent of 0 is
effectively not a layered image at all. Detect this case and treat
such an image as non-layered. Issue a warning to be sure the user
knows what's going on.
This resolves:
http://tracker.ceph.com/issues/5028
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently, rbd_img_obj_parent_read_full() assumes the incoming
object request contains bio data. But if a layered image is part of
a multi-layer stack of images it will result in read requests of
page data to parent images.
This is handling the same kind of issue as was resolved by this
commit:
5b2ab72d rbd: support reading parent page data
This resolves:
http://tracker.ceph.com/issues/5027
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The code that reads object data from the parent for a copyup on
write request currently assumes that the size of that request is the
size of a "full" object from the original target image.
That is not necessarily the case. The parent overlap could reduce
the request size below that. To fix that assumption we need to
record the number of pages in the copyup_pages array, for both an
image request and an object request. Rename a local variable in
rbd_img_obj_parent_read_full_callback() to reflect we're recording
the length of the parent read request, not the size of the target
object.
This resolves:
http://tracker.ceph.com/issues/5038
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Pull NVMe driver update from Matthew Wilcox:
"Lots of exciting new features in the NVM Express driver this time,
including support for emulating SCSI commands, discard support and the
ability to submit per-sector metadata with I/Os.
It's still mostly bugfixes though!"
* git://git.infradead.org/users/willy/linux-nvme: (27 commits)
NVMe: Use user defined admin ioctl timeout
NVMe: Simplify Firmware Activate code slightly
NVMe: Only clear the enable bit when disabling controller
NVMe: Wait for device to acknowledge shutdown
NVMe: Schedule timeout for sync commands
NVMe: Meta-data support in NVME_IOCTL_SUBMIT_IO
NVMe: Device specific stripe size handling
NVMe: Split non-mergeable bio requests
NVMe: Remove dead code in nvme_dev_add
NVMe: Check for NULL memory in nvme_dev_add
NVMe: Fix error clean-up on nvme_alloc_queue
NVMe: Free admin queue on request_irq error
NVMe: Add scsi unmap to SG_IO
NVMe: queue usage fixes in nvme-scsi
NVMe: Set TASK_INTERRUPTIBLE before processing queues
NVMe: Add a character device for each nvme device
NVMe: Fix endian-related problems in user I/O submission path
NVMe: Fix I/O cancellation status on big-endian machines
NVMe: Fix sparse warnings in scsi emulation
NVMe: Don't fail initialisation unnecessarily
...
Get rid of rbd_img_request_get(), because it isn't used, and maybe
won't ever be needed.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Any changes to parent images are immaterial to any mapped clone.
So there is no need to have a watch event registered on header
objects except for the header object of an image that is mapped.
In fact, a watch request is a write operation, and we may only
have read access to a parent image.
We can't set up the watch request until we know the name of the
header object though. So pass a flag to rbd_dev_image_probe() to
indicate whether this probe is for a mapping or for a parent image.
Change the second parameter to rbd_dev_header_watch_sync() be
Boolean while we're at it.
This resolves:
http://tracker.ceph.com/issues/4941
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The rbd_dev->mapping field for a parent image is not meaningful.
Since rbd_image_probe() is used both for images being mapped and
their parents, it doesn't make sense to set that flag in that
function.
So move the setting of the mapping.read_only flag out of
rbd_dev_image_probe() and into rbd_add() instead.
This resolves:
http://tracker.ceph.com/issues/4940
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently, rbd_img_parent_read() assumes the incoming object request
contains bio data. But if a layered image is part of a multi-layer
stack of images it will result in read requests of page data to parent
images.
Fortunately, it's not hard to add support for page data.
This resolves:
http://tracker.ceph.com/issues/4939
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
In rbd_img_obj_parent_read_full_callback() there is an assertion
intended to verify the size of the image request for a full parent
read was the size of the original request's target object. But
assertion was looking at the parent image order rather than the
original one, and these values can differ.
Fix that.
This resolves:
http://tracker.ceph.com/issues/4938
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This rearranges rbd_dev_v2_refresh() so it works more like
rbd_dev_v1_header_info(). While format 1 images need to read the
whole header object to get any information, format 2 can collect
almost all information selectively. So the one-time initialization
will remain in a separate function--based on rbd_dev_v2_probe().
Rename rbd_dev_v2_refresh() to be rbd_dev_v2_header_info(), and have
it call rbd_dev_v2_header_onetime() if it's being called for the
first time for the given rbd device.
Rename rbd_dev_v2_probe() to be rbd_dev_v2_header_onetime() and
remove the image size and snapshot context calls it held in
common with the refresh function.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Get rid of the trivial wrapper functions rbd_dev_v1_refresh() and
rbd_dev_v1_probe(), substituting rbd_dev_v1_header_read() calls
in their place.
Rename rbd_dev_v1_header_read() to be rbd_dev_v1_header_info(), to
be more generic (it will better reflect what happens with format 2
images).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An rbd_dev structure's fields are all zero-filled for an initial
probe, so there's no need to explicitly zero the parent_spec
and parent_overlap fields in rbd_dev_v1_probe(). Removing these
assignments makes rbd_dev_v1_probe() *almost* trivial.
Move the dout() message that announces discovery of an image into
rbd_dev_image_probe(), generalize to support images in either format
and only show it if an image is fully discovered.
This highlights that are some unnecessary cleanups in the error
path for rbd_dev_v1_probe(), so they can be removed.
Now rbd_dev_v1_probe() *is* a trivial wrapper function.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Now that rbd_header_from_disk() only fills in one-time fields once,
we can extend it slightly so it releases the other fields before
replacing their values. This way there's no need to pass a
temporary buffer and then copy all the results in. Just use the rbd
device header structure in rbd_header_from_disk() so its values get
updated directly.
Note that this means we need to take the header semaphore at the
point we update things. So pass the rbd_dev rather than the address
of its header as its first argument to rbd_header_from_disk(), and
have it return an error code.
As a result, rbd_dev_v1_header_read() does all the work,
rbd_read_header() becomes unnecessary, and rbd_dev_v1_refresh()
becomes a very simple wrapper.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This rearranges rbd_header_from_disk so that it:
- allocates the snapshot context right away
- keeps results in local variables, not changing the passed-in
header until it's known we'll succeed
- does initialization of set-once fields in a header only if
they have not already been set
The last point is moot at the moment, because rbd_read_header()
(the only caller) always supplies a zero-filled header buffer.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The passed-in header structure is zeroed in rbd_header_from_disk().
Instead, have the caller do it. Note that there are two callers,
rbd_dev_v1_refresh() and rbd_dev_v1_probe(). The latter already has
a zeroed header structure so zeroing it isn't necessary there.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Defer setting the size and features fields of a mapped image until
after the Linux disk structure is set up. Set the capacity of the
disk after that.
Rearrange the definition of rbd_image_header, separating the fields
that are set only once from those that can be updated.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Pull block driver updates from Jens Axboe:
"It might look big in volume, but when categorized, not a lot of
drivers are touched. The pull request contains:
- mtip32xx fixes from Micron.
- A slew of drbd updates, this time in a nicer series.
- bcache, a flash/ssd caching framework from Kent.
- Fixes for cciss"
* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
bcache: Use bd_link_disk_holder()
bcache: Allocator cleanup/fixes
cciss: bug fix to prevent cciss from loading in kdump crash kernel
cciss: add cciss_allow_hpsa module parameter
drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
mtip32xx: Workaround for unaligned writes
bcache: Make sure blocksize isn't smaller than device blocksize
bcache: Fix merge_bvec_fn usage for when it modifies the bvm
bcache: Correctly check against BIO_MAX_PAGES
bcache: Hack around stuff that clones up to bi_max_vecs
bcache: Set ra_pages based on backing device's ra_pages
bcache: Take data offset from the bdev superblock.
mtip32xx: mtip32xx: Disable TRIM support
mtip32xx: fix a smatch warning
bcache: Disable broken btree fuzz tester
bcache: Fix a format string overflow
bcache: Fix a minor memory leak on device teardown
bcache: Documentation updates
bcache: Use WARN_ONCE() instead of __WARN()
bcache: Add missing #include <linux/prefetch.h>
...
Pull block core updates from Jens Axboe:
- Major bit is Kents prep work for immutable bio vecs.
- Stable candidate fix for a scheduling-while-atomic in the queue
bypass operation.
- Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
discard bios.
- Tejuns changes to convert the writeback thread pool to the generic
workqueue mechanism.
- Runtime PM framework, SCSI patches exists on top of these in James'
tree.
- A few random fixes.
* 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
relay: move remove_buf_file inside relay_close_buf
partitions/efi.c: replace useless kzalloc's by kmalloc's
fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
block: fix max discard sectors limit
blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
Documentation: cfq-iosched: update documentation help for cfq tunables
writeback: expose the bdi_wq workqueue
writeback: replace custom worker pool implementation with unbound workqueue
writeback: remove unused bdi_pending_list
aoe: Fix unitialized var usage
bio-integrity: Add explicit field for owner of bip_buf
block: Add an explicit bio flag for bios that own their bvec
block: Add bio_alloc_pages()
block: Convert some code to bio_for_each_segment_all()
block: Add bio_for_each_segment_all()
bounce: Refactor __blk_queue_bounce to not use bi_io_vec
raid1: use bio_copy_data()
pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
pktcdvd: use bio_copy_data()
block: Add bio_copy_data()
...
Add definitions for the three Firmware Activate actions, and change the
SCSI translation code to construct the command into a temporary variable
instead of translating the endianness back-and-forth.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Many of the bits in the Controller Configuration register may only be
modified when the Enable bit is clear. Clearing them at the same time
as the Enable bit might be OK, but let's play it safe and only touch the
Enable bit.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
A recent update to the specification makes it clear that the host
is expected to wait for the device to acknowledge the Enable bit
transitioning to 0 as well as waiting for the device to acknowledge a
transition to 1.
Reported-by: Khosrow Panah <Khosrow.Panah@idt.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Hold off setting the read-only flag in rbd_add() for an image being
mapped until we have successfully probed the image. At that point
we know whether it's a snapshot mapping or not, so we can set the
read-only flag in that one place rather than doing so (for
snapshots) in rbd_dev_mapping_set(). To do this, pass a flag to the
image probe routine indicating whether we want a read-only mapping.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This function is a duplicate of rbd_dev_mapping_clear(), and was
added by mistake.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently rbd_dev_mapping_set() looks up the snapshot id for the
snapshot whose name is found in the rbd device's spec structure.
That function gets called by rbd_dev_device_setup(), which is
called by rbd_add() *after* rbd_dev_image_probe(). If the
image probe succeeds, the rbd device's spec will already have
been updated to include names and ids for all fields.
Therefore there's no need to look up the snapshot id in
rbd_dev_mapping_set().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The presence of the LAYERING bit in an rbd image's feature mask does
not guarantee the image actually has a parent image. Currently that
bit is set only when a clone (i.e., image with a parent) is created,
but it is (currently) not cleared if that clone gets flattened back
into a "normal" image. A "parent_id" query will leave the
parent_spec for the image being mapped a null pointer, but will not
return an error.
Currently, whenever an image with the LAYERED feature gets mapped, a
warning about the use of layered images gets printed. But we don't
want to do this for a flattened image, so print the warning only
if we find there is a parent spec after the probe.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Since rbd_update_mapping_size() is now a trivial wrapper, just open
code it in its two callers.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When a mapped image changes size, we change the capacity recorded
for the Linux disk associated with it, in rbd_update_mapping_size().
That function is called in two places--the format 1 and format 2
refresh routines.
There is no need to set the capacity while holding the header
semaphore. Instead, do it in the common rbd_dev_refresh(), using
the logic that's already there to initiate disk revalidation.
Add handling in the request function, just in case a request
that exceeds the capacity of the device comes in (perhaps one
that was started before a refresh shrunk the device).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This commit:
d98df63e rbd: revalidate_disk upon rbd resize
instituted a call to revalidate_disk() to notify interested parties
that a mapped image has changed size. This works well, as long as
the the rbd device doesn't map a snapshot.
A snapshot will never change size. However, the base image the
snapshot is associated with can, and it can do so while the snapshot
is mapped.
The problem is that the test for the size is looking at the size of
the base image, not the size of the mapped snapshot. This patch
corrects that.
Update the warning message shown in the event of error, and move
it into the callers.
This resolves:
http://tracker.ceph.com/issues/4911
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When rbd_dev_v2_refresh() is called, the rbd device already has a
snapshot context associated with it. But that never gets freed,
the pointer just gets overwritten.
Fix this by dropping the rbd device's reference to the snapshot
context before overwriting the pointer.
Because ceph_put_snap_context() already handles for a null pointer
we don't need to check for that (for the probe case, where no
context has yet been assigned).
This resolves:
http://tracker.ceph.com/issues/4912
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Pull more vfs updates from Al Viro:
"A couple of fixes + getting rid of __blkdev_put() return value"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
proc: Use PDE attribute setting accessor functions
make blkdev_put() return void
block_device_operations->release() should return void
mtd_blktrans_ops->release() should return void
hfs: SMP race on directory close()
The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull Ceph changes from Alex Elder:
"This is a big pull.
Most of it is culmination of Alex's work to implement RBD image
layering, which is now complete (yay!).
There is also some work from Yan to fix i_mutex behavior surrounding
writes in cephfs, a sync write fix, a fix for RBD images that get
resized while they are mapped, and a few patches from me that resolve
annoying auth warnings and fix several bugs in the ceph auth code."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (254 commits)
rbd: fix image request leak on parent read
libceph: use slab cache for osd client requests
libceph: allocate ceph message data with a slab allocator
libceph: allocate ceph messages with a slab allocator
rbd: allocate image object names with a slab allocator
rbd: allocate object requests with a slab allocator
rbd: allocate name separate from obj_request
rbd: allocate image requests with a slab allocator
rbd: use binary search for snapshot lookup
rbd: clear EXISTS flag if mapped snapshot disappears
rbd: kill off the snapshot list
rbd: define rbd_snap_size() and rbd_snap_features()
rbd: use snap_id not index to look up snap info
rbd: look up snapshot name in names buffer
rbd: drop obj_request->version
rbd: drop rbd_obj_method_sync() version parameter
rbd: more version parameter removal
rbd: get rid of some version parameters
rbd: stop tracking header object version
rbd: snap names are pointer to constant data
...
I dived into lguest again, reworking the pagetable code so we can move
the switcher page: our fixmaps sometimes take more than 2MB now...
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRga7lAAoJENkgDmzRrbjx/yIQAKpqIBtxOJeYH3SY+Uoe7Cfp
toNYcpJEldvb0UcWN8M2cSZpHoxl1SUoq9djwcM29tcKa7EZAjHaGtb/Q1qMTDgv
+B3WAfiGU2pmXFxLAkbrlLNGnysy24JspqJQ5hcYV84EiBxQdZp+nCYgOphd+GMK
ww16vo9ya8jFjzt3GeRp/Heb3vEzV4Cp6BC3i0m8A3WNpEpbRb66pqXNk5o8ggJO
SxQOKSXmUM+0m+jKSul5xn3e2Ls2LOrZZ8/DIHA+gW66N4Zab7n2/j1Q9VRxb4lh
FqnR7KwgBX8OCh9IsBDqQYS7MohvMYge6eUdLtFrq84jvMleMEhrC8q9v2tucFUb
5t18CLwvyK7Gdg6UCKiZ7YSPcuURAILO16al9bh5IseeBDsuX+43VsvQoBmFn9k6
cLOVTZ6BlOmahK5PyRYFSvLa9Rxzr/05Mr7oYq9UgshD9io78dnqczFYIORF53rW
zD7C4HuTZfYJFfNd0wAJ0RfVXnf8QvDlMdo7zPC26DSXNWqj8OexCY0qqSWUB+2F
vcfJP6NkV4fZB8aawWIFUVwc64yqtt2uPVLa7ATZWqk16PgKrchGewmw3tiEwOgu
1l7xgffTRRUIJsqaCZoXdgw3yezcKRjuUBcOxL09lDAAhc+NxWNvzZBsKp66DwDk
yZQKn0OdXnuf0CeEOfFf
=1tYL
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio & lguest updates from Rusty Russell:
"Lots of virtio work which wasn't quite ready for last merge window.
Plus I dived into lguest again, reworking the pagetable code so we can
move the switcher page: our fixmaps sometimes take more than 2MB now..."
Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename.
Hopefully correctly resolved.
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits)
caif_virtio: Remove bouncing email addresses
lguest: improve code readability in lg_cpu_start.
virtio-net: fill only rx queues which are being used
lguest: map Switcher below fixmap.
lguest: cache last cpu we ran on.
lguest: map Switcher text whenever we allocate a new pagetable.
lguest: don't share Switcher PTE pages between guests.
lguest: expost switcher_pages array (as lg_switcher_pages).
lguest: extract shadow PTE walking / allocating.
lguest: make check_gpte et. al return bool.
lguest: assume Switcher text is a single page.
lguest: rename switcher_page to switcher_pages.
lguest: remove RESERVE_MEM constant.
lguest: check vaddr not pgd for Switcher protection.
lguest: prepare to make SWITCHER_ADDR a variable.
virtio: console: replace EMFILE with EBUSY for already-open port
virtio-scsi: reset virtqueue affinity when doing cpu hotplug
virtio-scsi: introduce multiqueue support
virtio-scsi: push vq lock/unlock into virtscsi_vq_done
virtio-scsi: pass struct virtio_scsi to virtqueue completion function
...
Schedule a timeout on sync commands in case the command times out and
the device is not being polled for timeouts. This prevents device removal
from hanging forever if the device has stopped responding.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
This adds support for namespaces with separate meta-data formats in the
submit io ioctl. The meta-data buffer has to be a contiguous, so such
a buffer is allocated and the mapped user pages are copied to/from this
buffer for write/read commands.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
We have an nvme device that has a concept of a stripe size. IO requests
that do not transfer data crossing a stripe boundary has greater
performance compared to IO that does cross it. This patch sets the
stripe size for the device if the device and vendor ids match one with
this feature and splits IO requests that cross the stripe boundary.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
It is possible a bio request can not be submitted as a single NVMe IO
command if the bio_vec is not mergeable with the NVMe PRP alignement
constraints. This condition was handled by submitting an IO for the
mergeable portion then submitting a follow on IO for the remaining data
after the previous IO completes. The remainder to be sent was tracked
by manipulating the bio->bi_idx and bio->bi_sector. This patch splits
the request as many times as necessary and submits the bios together.
Since submitting the bio may cause it to be requeued on split,
nvme_resubmit_bios had to be modified to remove the wait queue when
the bio list is empty prior to submitting the bio since a split would
have added the wait queue a second time, corrupting the wait queue head
task list.
There are a few other benefits from doing this: it fixes a potential
issue with the previous handling of a non-mergeable bio as the requeuing
method could would use an unlocked nvme_queue if the callback isn't
invoked on the queue's associated cpu; it will be possible to retry a
failed bio if desired at some later time since it does not manipulate
the original bio; the bio integrity extensions require the bio to be in
its original condition for the checks to work correctly if we implement
the end-to-end data protection in the future.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
There is no situation that could occur where we could error out of this
function and require cleaning up allocated namespaces.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The nvme_queue's depth is not set if we fail to allocate submission queue
entries, which was being used to determine how much coherent memory to
free on error. Use the depth variable instead.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Fixes a potential memory leak if requesting the admin queue irq fails.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Translates a scsi unmap request from SG_IO ioctl to NVMe
data-set-management deallocate.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Fixes nvme queue usages in scsi-to-nvme translation code to not get
a queue more often than it is being put, and not use the queue in an
unsafe way without it being locked.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
When a read for a layered image object finds the target object
doesn't exist, a read image request for the parent image is created
and submitted. When that completes, the callback routine was
not releasing that parent image request. Fix that.
The slab allocation stuff just added has greatly simplified the
search for the source of this memory leak.
This resolves:
http://tracker.ceph.com/issues/4803
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The names of objects used for image object requests are always fixed
size. So create a slab cache to manage them. Define a new function
rbd_segment_name_free() to match rbd_segment_name() (which is what
supplies the dynamically-allocated name buffer).
This is part of:
http://tracker.ceph.com/issues/3926
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Create a slab cache to manage rbd_obj_request allocation. We aren't
using a constructor, and we'll zero-fill object request structures
when they're allocated.
This is part of:
http://tracker.ceph.com/issues/3926
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The next patch will define a slab allocator for a object requests.
To use that we'll need to allocate the name of an object separate
from the request structure itself.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Create a slab cache to manage rbd_img_request allocation. Nothing
too fancy at this point--we'll still initialize everything at
allocation time (no constructor)
This is part of:
http://tracker.ceph.com/issues/3926
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Use bsearch(3) to make snapshot lookup by id more efficient. (There
could be thousands of snapshots, and conceivably many more.)
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This functionality inadvertently disappeared in the last patch.
Image snapshots can get removed at just about any time. In
particular it can disappear even if it is in use by an rbd
client as a mapped image.
The rbd client deals with such a disappearance by responding to new
requests with ENXIO. This is implemented by each rbd device
maintaining an EXISTS flag, which is normally set but cleared if a
snapshot disappears.
This patch (re-)implements the clearing of that flag.
Whenever mapped image header information is refreshed, if the
mapping is for a snapshot, verify the mapped snapshot is still
present in the updated snapshot context. If it is not, clear the
flag.
It is not necessary to check this in the initial probe, because the
probe will not succeed if the snapshot doesn't exist.
This resolves:
http://tracker.ceph.com/issues/4880
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
We no longer use the snapshot list for anything. When we need to
look up a snapshot name, id, size, or feature mask, we just do it
directly rather than relying on this list being updated with every
refresh. The main reason it existed was for the benefit of the
device/sysfs entries that previously were associated with snapshots.
So get rid of the snapshot list, and struct rbd_snap, and the
hundreds of lines of code that supported them.
This resolves:
http://tracker.ceph.com/issues/4868
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This patch defines a handful of new functions that will allow
us to get rid of the rbd device structure's list of snapshots.
Define rbd_snap_id_by_name() to look up a snapshot id given its
name. This is efficient for format 1 images but not for format 2.
Fortunately it only gets called at mapping time so it's not that
critical.
Use rbd_snap_id_by_name() to find out the id for a snapshot getting
mapped, and pass that id to new functions rbd_snap_size() and
rbd_snap_features() to look up information about a given snapshot's
size and feature mask given its snapshot id. All this gets done
in rbd_dev_mapping_set().
As a result, snap_by_name() is no longer needed, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
In order to align with what was needed for format 1 rbd images,
rbd_dev_v2_snap_info() was set up to take as argument an index into
the array of snapshot ids in a rbd device's snapshot context.
This switches that around, so we pass the snapshot id instead.
In doing this, rbd_snap_name() now returns a dynamically-allocated
string rather than a fixed one, so there's no need to make a
duplicate in its caller, rbd_dev_spec_update().
This means the following functions take a snapshot id where they
previously used an index value:
rbd_dev_snap_info()
rbd_dev_v1_snap_info()
rbd_dev_v2_snap_info()
A new function, rbd_dev_snap_index(), determines the snap index for
format 1 images and uses it to look up the name.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Rather than scanning the list of snapshot structures for it, scan
the snapshot context buffer containing snapshot names in order to
determine for a format 1 image the name associated with a given
snapshot id.
Pull out the part of rbd_dev_v1_snap_info() that does this scan into
a new function, _rbd_dev_v1_snap_name(). Have that function return
a dynamically-allocated copy of the name, and don't duplicate it in
rbd_dev_v1_snap_info().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Nothing ever uses the version field maintained in the object request
structure any more, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Only NULL is passed as the version argument to rbd_obj_method_sync(),
so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Continued from the last patch, more parameters that can go away
because we no longer have a need to track object versions.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Several functions in rbd have parameters meant to allow the version
of an object to be passed in or out. The purpose of those was to
allow the version of a header object to be maintained, but we no
longer do that. As a result, these parameters are never actually
needed or used, so get rid of them.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The rbd code takes care to maintain the version of the header
object. This was done in hopes of using it to detect a change in
the object between reading it and setting up a watch request to
be notified of changes.
The mechanism was never fully implemented, however. And we now
avoid the original problem by setting up the watch request before
ever reading the content of the header.
The osd doesn't interpret the object version supplied with a WATCH
osd op, nor does it use the version supplied with a NOTIFY_ACK op
(we can just supply 0 for both). There is therefore no need to
maintain the header's object version any more, so stop doing so.
We'll be able to simplify some more rbd code in the next few patches
as a result of this.
This resolves:
http://tracker.ceph.com/issues/3952
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Make explicit that snapshot names don't change by making functions
return and take parameters that that point to const qualified data.
This resolves:
http://tracker.ceph.com/issues/4867
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Whenever a header object event causes a mapped rbd image to refresh
its header information, revalidate_disk() is being called. This was
done in rbd_dev_refresh() outside the control mutex in order to
avoid a lock inversion. Although a an event like this *might*
indicate the image has changed size, most of the time it does not.
Record the image size before and after the refresh, and only
call revalidate_disk() if it changes.
This resolves:
http://tracker.ceph.com/issues/4867
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
A warning gets spewed for any image being probed, including parent
images. Set up a condition such that the warning message only gets
printed for the image being mapped, not any of its parents.
Also, I didn't like the way the warning ended up being so long.
Make it a terse warning instead. People experimenting with layering
will know what the message means.
This is part of:
http://tracker.ceph.com/issues/4867
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Now that we have a library routine to create snap contexts, use it.
This is part of:
http://tracker.ceph.com/issues/4857
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Stop setting up Linux devices during the image probe operation.
Instead, set up the devices as a separate step after the image
probe, in rbd_add().
A consequence of this is that only mapped images get devices
assigned to them, which is pretty sweet.
This resolves:
http://tracker.ceph.com/issues/4774
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently an rbd_device structure gets destroyed from the release
routine for the device embedded within it. Stop doing that, instead
calling rbd_dev_image_release() right after rbd_bus_del_dev()
wherever the latter is called.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Define a new function rbd_dev_unprobe() which undoes state changes
that occur from calling rbd_dev_v1_probe() or rbd_dev_v2_probe().
Note that this is a superset of rbd_header_free(), which is now
getting removed (it seems to have been used improperly anyway).
Flesh out rbd_dev_image_release() so it undoes exactly what
rbd_dev_image_probe() does.
This means that:
- rbd_dev_device_release() gets called when the last device
reference gets dropped;
- that undoes everything done by the rbd_dev_device_setup() call
at the end of rbd_dev_image_probe() (and nothing more), ending
by calling rbd_dev_image_release(); and
- rbd_dev_image_release() undoes everything else done by
rbd_dev_image_probe() (and this includes a call to
rbd_dev_unprobe().
This means the image and device portions of an rbd device are fairly
cleanly separated now, so error paths should be a little easier to
verify than they used to be.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Rename rbd_dev_probe_finish() to be rbd_dev_device_setup(). Its
purpose is to set up the Linux side of an rbd device mapping.
Rename rbd_dev_release() to be rbd_dev_device_release(), making
it more obvious it serves as the inverse of the setup function
(or it will).
Encapsulate some of what was done in rbd_dev_release() into a new
function rbd_dev_image_release(), which serves as the inverse of
setting up the ceph side of the mapped rbd image.
Define a new helper rbd_dev_clear_mapping() to simply zero out the
fields of a mapping structure--the inverse of rbd_dev_set_mapping().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Drop the module reference at the end of rbd_remove() for symmetry
with adding a reference at the top of rbd_add().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Move setting up the watch request for an image so it's done in
rbd_dev_image_probe() rather than rbd_dev_probe_finish(). Move
it all the way up to before doing the initial probe. This avoids
a potential race condition, in which we get (and use) the initial
snapshot context for an image, and it gets changed between that
time and the time we get the watch set up.
This resolves:
http://tracker.ceph.com/issues/3871
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When a format 2 image is refreshed, code is in place to verify that
the object order never changes from what it was originally. This
relies on the fact that the refresh will occur *after* an initial
load of information about the image.
An upcoming patch makes it possible for the refresh to occur first,
so we can no longer make this order check. The order really can't
ever change anyway--this was just a sanity check. So get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently, a watch on an rbd device header object gets torn down
when its final Linux device reference gets dropped. Instead, tear
it down when removing the device. If an error occurs cleaning up
the watch event when unmapping, abort the unmap request.
All images (including parents) still get watch requests set up, so
tear these down also, in rbd_dev_remove_parent(). For now, ignore
any errors that occur in this case.
Get rid of local variable "rc" in rbd_remove(); use "ret" instead
(they both somehow ended up defined in the function and only one is
needed).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Define a new function rbd_header_name(), which allocates and formats
the name of the header object for the rbd device.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Move a block of initialization related to the "ceph-side" of an rbd
image out of rbd_dev_probe_finish() and into rbd_dev_image_probe().
Add appropriate error handling to clean things up in the event any
of these new functions return an error.
We know that rbd_dev_snaps_update(), rbd_dev_spec_update(), and
rbd_dev_probe_parent() all clean up after themselves before they
return an error, so no special cleanup is required except when an
earlier call succeeds. Since rbd_dev_spec_update() only updates the
spec field (whose cleanup will be handled by dropping the last
reference to the spec) there is no cleanup action associatied with
that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Probe for a parent device earlier in rbd_dev_probe_finish(), before
starting to set up the Linux side of the rbd device.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When an error occurs while finishing probing a device it is assumed
that parent devices get cleaned up when deleting a device. They
don't. Add a call to clean them up. Note that this means the
parent spec will already be cleaned up so it doesn't have to be
in one of the rbd_add() error paths.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
In certain error paths, it is possible for an rbd device to have a
parent spec but no parent rbd_dev. In rbd_dev_remove_parent() use
the parent field rather than parent_spec in determining whether to
try to remove any parent devices. Use assertions to indicate that
any non-null parent pointer has parent_spec associated with it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The function __rbd_remove() is used in two spots, and it's fairly
simple. It combines cleanup of part of the ceph-side state as well
as cleaning up the Linux-side state. Just open code it in the two
callers and eliminate the function.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Set the mapping size and features earlier in rbd_dev_probe_finish().
Define rbd_dev_mapping_clear() as an inverse for setting those
fields, and use it both in error handling in rbd_dev_image_probe()
and in the final cleanup in rbd_dev_release(). Change the name
of rbd_dev_set_mapping() to of rbd_dev_mapping_set().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Encapsulate the code that removes an rbd device's parent images into
a new function, rbd_dev_remove_parent().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Encapsulate the code that probes for an rbd device's parent images
into a new function, rbd_dev_probe_parent().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Don't set the disk capacity until right before we announce the
device as available for use.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Hold off setting the EXISTS rbd device flag until just before we
announce the disk as available for use. There's no point in doing
so any earlier than that, and at that point the device truly is
fully set up and ready to use.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This just tweaks a few things in the routines that implement
rbd sysfs files.
All of the entries for an rbd device in /sys/bus/rbd/devices/<id>/
will represent information whose valid values are known by the time
they are accessible.
Right now we get the size of the mapped image by a call to
get_capacity(). There's no need to do this, because that will
return what we last set the capacity to, which is just the size
recorded for the mapping. So just show that value instead.
We also get this under protection of the header semaphore, in order
to provide a precisely correct value. This isn't really necessary;
these files are really informational only and it's not necessary to
be so careful.
Finally, print a special value in case the major device number is
not recorded. Right now that won't matter much but soon the parent
images won't have devices associated with them.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When a snapshot context update occurs, rbd_update_mapping_size() is
called to set the capacity of the disk to record the updated
size of the image in case it has changed.
There's a bug though. The mapping size is in units of *bytes*. The
code that updates the mapping size field is assigning a value that
has been scaled down to *sectors*.
Fix that. Also, check to see if the size has actually changed, and
don't bother updating things (specifically, calling set_capacity())
if it has not.
This resolves:
http://tracker.ceph.com/issues/4833
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Fairly straightforward refactoring of rbd_dev_probe_update_spec().
The name is changed to rbd_dev_spec_update().
Rearrange it so nothing gets assigned to the spec until all of the
names have been successfully acquired.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Rename rbd_dev_probe() to be rbd_dev_image_probe(). Its purpose
will eventually be to probe for the existence of a valid rbd image
for the rbd device--focusing only on the ceph side and not the Linux
device side of initialization.
For now the two "sides" are not fully separated, and this function
is still the entry point for initializing the full rbd device.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently, rbd_dev_destroy() does more than just the inverse of what
rbd_dev_create() does. Stop doing that, and move the two extra
things it does into the three call sites.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Encapsulate the creation of a snapshot context for rbd in a new
function rbd_snap_context_create(). Define rbd wrappers for getting
and dropping references to them once they're created.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Change some calls to WARN_ON() so they use rbd_warn() instead, so we
get consistent messaging. A few remain but they can probably just
go away eventually.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This commit added fetching if fancy striping parameters:
09186ddb rbd: get and check striping parameters
They are almost unused, but the two fields storing the information
really belonged in the rbd_image_header structure.
This patch moves them there.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Make the names and image id in an rbd_spec be pointers to constant
data. This required the use of a local variable to hold the
snapshot name in rbd_add_parse_args() to avoid a warning.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Set the rbd spec's snapshot id for an image getting mapped in
rbd_dev_probe_update_spec() rather than rbd_dev_set_mapping().
This is the more logical place for that to happen (even though
it means we might look up the snapshot by name twice).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
A function called snap_by_name() ought to just look up a snapshot by
name. It does that, but then it assigns some stuff to the rbd
device structure as well.
Change the function to do just the lookup, and have the caller do
the assignments that follow.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
If a format 2 image id is found for an image being mapped, but the
subsequent probe of the image fails, rbd_dev_probe() quits without
freeing the image id. Fix that.
Also drop a redundant hunk of code in rbd_dev_image_id().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently, rbd_dev_probe() assumes that any error returned by
rbd_dev_image_id() is most likely -ENOENT, and responds by
calling the format 1 probe routine, rbd_dev_v1_probe(). Then,
at the top of rbd_dev_v1_probe(), an empty string is allocated
for the image id.
This is sort of unbalanced. Fix this by having rbd_dev_image_id()
look for -ENOENT from its "get_id" method call. If that is seen,
have it allocate the empty string there rather than depending on
rbd_dev_v1_probe() to do it.
Given that this is effectively defining the format of the image,
set rbd_dev->image_format inside rbd_dev_image_id() rather than in
the format-specific probe routines.
Also drop a redundant hunk of code in rbd_dev_image_id().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
I found during some failure injection testing that the call to
rbd_free_disk() in the error path of rbd_dev_probe_finish() was
dropping an extra reference to the disk queue. The problem
occurred when put_disk tried to drop a reference to the disk's
queue. A call to blk_cleanup_queue() just prior to that will have
also dropped a reference to the queue.
The problem is that the reference dropped by put_disk() is assumed
to have been taken by add_disk(). Our code has error paths that can
occur after the disk and its queue are initialized, but before the
call to add_disk(), and in those paths we won't have that extra
reference.
The fix is easy though. In rbd_free_disk() we're already checking
the disk's GENHD_FL_UP flag. That flag is an indication that
add_disk() has been called, so just call blk_cleanup_queue()
conditional on that flag being set.
This resolves:
http://tracker.ceph.com/issues/4800
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Now that rbd_obj_method_sync() returns the number of bytes
returned by the method call, that value should be used by
callers to ensure we don't overrun the valid portion of the
buffer.
Fix the two spots that remained that weren't doing that,
rbd_dev_image_name() and rbd_dev_v2_snap_name().
Rearrange the error path slightly in rbd_dev_v2_snap_name().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When the snapshot context for an rbd device gets updated (or the
initial one is recorded) a a list of snapshot structures is created
to represent them, one entry per snapshot. Each entry includes a
dynamically-allocated copy of the snapshot name.
Currently the name is allocated in rbd_snap_create(), as a duplicate
of the passed-in name.
For format 1 images, the snapshot name provided is just a pointer to
an existing name. But for format 2 images, the passed-in name is
already dynamically allocated, and in the the process of duplicating
it here we are leaking the passed-in name.
Fix this by dynamically allocating the name for format 1 snapshots
also, and then stop allocating a duplicate in rbd_snap_create().
Change rbd_dev_v1_snap_info() so none of its parameters is
side-effected unless it's going to return success.
This is part of:
http://tracker.ceph.com/issues/4803
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>