Use the minor version ops cached in struct nfs_client instead of looking
them up again.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Replace it with a test for whether or not the sent a stateid in violation
of what we asked for.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fix the report:
net/sunrpc/clnt.c:2580:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
...instead of splitting the initialisation over init_lseg() and
pnfs_layout_process().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the server changed the layout stateid's "other" field, then
we should treat the old layout as being completely gone. In that
case, we want to clear the metadata such as scheduled layoutreturns.
Do this by calling pnfs_mark_layout_stateid_invalid().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When determining which layout segments to return, we do want
pnfs_mark_matching_lsegs_return to check that they match the layout
sequence id. This ensures that we don't waste time if the server
is replaying a layout recall that has already been satisfied.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
In cases where we need to send a layoutreturn in order to propagate
an error, we should not tie that to a specific layout stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When we return NFS_OK to the CB_LAYOUTRECALL, we are required to
send a layoutreturn that "completes" that layout recall request, using
the correct stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We want to evaluate in this order:
If the client holds no layout for this inode, then return
NFS4ERR_NOMATCHING_LAYOUT; it probably forgot the layout.
If the client finds the inode among the list of layouts, but the corresponding
stateid has not yet been initialised, then return NFS4ERR_DELAY to ask the
server to retry once the outstanding LAYOUTGET is complete.
If the current layout stateid's "other" field does not match the recalled
stateid, return NFS4ERR_BAD_STATEID.
If already processing a layout recall with a newer stateid, return
NFS4ERR_OLD_STATEID. This can only happens for servers that are
non-compliant with the NFSv4.1 protocol.
If already processing a layout recall with an older stateid, return
NFS4ERR_DELAY to ask the server to retry once the outstanding
LAYOUTRETURN is complete. Again, this is technically incompliant with
the NFSv4.1 protocol.
If the current layout sequence id is newer than the recalled stateid's
sequence id, return NFS4ERR_OLD_STATEID. This too implies protocol
non-compliance.
If the current layout sequence id is older than the recalled stateid's
sequence id+1, return NFS4ERR_DELAY.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Currently, pnfs_set_layout_stateid() will update the layout sequence
id barrier only if the stateid itself is newer than the current
layout stateid. However in a situation where multiple LAYOUTGET calls
and a LAYOUTRETURN raced, it is entirely possible for one of the
LAYOUTGET to set the current stateid to something newer than the
LAYOUTRETURN that needs to set the barrier.
The fix is to allow the "update_barrier" flag to force a check as to
whether or not the barrier needs to be updated.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the layout stateid is invalid, then pnfs_set_layout_stateid() must
always initialise it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Ensure that we don't carry over layoutreturn info from a previous
incarnation of this layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NFS doesn't expect requests with wb_bytes set to zero and may make
unexpected decisions about how to handle that request at the page IO layer.
Skip request creation if we won't have any wb_bytes in the request.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When bl_parse_deviceid() fails in bl_alloc_deviceid_node() on
blkdev_get_by_*() step we get an pnfs_block_dev struct that is
uninitialized except for bdev field which is set to whatever error
blkdev_get_by_*() returns. bl_free_device() then tries to call
blkdev_put() if bdev is not 0 resulting in a wrong pointer dereference.
Fixing this by setting bdev in struct pnfs_block_dev only if we didn't
get an error from blkdev_get_by_*().
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
All write callbacks are required to call nfs_writeback_update_inode() upon
success to ensure that file size changes are recorded, and the attribute
cache is invalidated.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes an unnecessary semicolon warning found by the kbuild robot.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXjpP3AAoJENfLVL+wpUDrREkP/0bf8RpFE4Gw296rpuhq4tq0
9+EB45ZP2xnNP07JkSa1/v+YLir4OqOyUY+Gk4fWByy3GXyLsAuz80rjrgTxRiwX
ivQa1GGJ2xsEbdDgb62thCvm0+A3xQDzVQa6i2KjN+RONixOmvp/LSQMGKOWydZ7
f3RActjU2GSM00Mekatv2q3i9ZMVrpA5d+J+F8mvDm3rsklQnpVwsgrFesudOK8i
k5dfw23FTWpv0wwzAKmudIiNry4JuPUtPD0gs1b3IaKnS/oCggb8bCjFy9LgMJVr
2INjuFPiDuJy5NnBIRIOwBVVQxbw5XzvwUIeuKw9usRvUvNn+nI+nWsG5S1xvU9M
Gyl/YsbeuFRxAMm/NZITVSgLKxHRPUG3ahZZLYbiELpbaCbsmiDs69F9LPju0nXZ
mTP0O4C8AwBsx9VONfn8yl8Y103VlYd8u3sZRYLULWC+jPISrMSiRijDeer9xilK
0RSp6QZmgE5bovq/Db794Z02soZ6S/UOKBMbpA2uW8IxHLRBds0pAPOpNsPI1Div
srpYa6HGVhHaBLFGiYDygYK3EtDVLZg2uS6FxnfaI9lfXNjSST5Niw5Kya6WgXsM
hpdQpdNtuiulrd6e1K+EUo7gCk4YnL2ApGLG1j48ht6WrI0pYYvPQRxEKgjLEMkn
KXEuY4rCOHJ63rPEsLet
=A9KO
-----END PGP SIGNATURE-----
Merge tag 'nfs-rdma-4.8-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma
NFS: NFSoRDMA Cleanup
Fixes an unnecessary semicolon warning found by the kbuild robot.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The current min/max resvport settings are independently limited
by the entire range of allowed ports, so max_resvport can be
set to a port lower than min_resvport.
Prevent inversion of min/max values when set through sysfs and
module parameter by setting the limits dependent on each other.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The current min/max resvport settings are independently limited
by the entire range of allowed ports, so max_resvport can be
set to a port lower than min_resvport.
Prevent inversion of min/max values when set through sysctl by
setting the limits dependent on each other.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The range calculation for choosing the random reserved port will panic
with divide-by-zero when min_resvport == max_resvport, a range of one
port, not zero.
Fix the reserved port range calculation by adding one to the difference.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Author: Frank Sorenson <sorenson@redhat.com>
Date: 2016-06-27 13:55:48 -0500
sunrpc: Fix bit count when setting hashtable size to power-of-two
The hashtable size is incorrectly calculated as the next higher
power-of-two when being set to a power-of-two. fls() returns the
bit number of the most significant set bit, with the least
significant bit being numbered '1'. For a power-of-two, fls()
will return a bit number which is one higher than the number of bits
required, leading to a hashtable which is twice the requested size.
In addition, the value of (1 << nbits) will always be at least num,
so the test will never be true.
Fix the hash table size calculation to correctly set hashtable
size, and eliminate the unnecessary check.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The last put of deviceid nodes for SCSI layouts may sleep, so we shouldn't
hold any spinlocks. Make sure we put them outside the bl_ext_lock.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag. A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred to be in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.
This can be reproduced as follows:
1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server. Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys
2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m
3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set. Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
5. Now do some I/O to the sec=sys mount. This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1
6. Writes for that user will now be permanently 4K, FILE_SYNC for that
user, regardless of which mount is being written to, until you reboot
the client. Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect. Grabbing a new kerberos ticket at this
point will have no effect either.
Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags. Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too. Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When older servers return RPC_AUTH_NULL, it means the
rpc creds will be ignored. In that case use the sec=
that was specified instead of setting sec=null
Fixes Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1112983
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We want to recover the open stateid if there is no layout stateid
and/or the stateid argument matches an open stateid.
Otherwise throw out the existing layout and recover from scratch, as
the layout stateid is bad.
Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Instead of giving up altogether and falling back to doing I/O
through the MDS, which may make the situation worse, wait for
2 lease periods for the callback to resolve itself, and then
try destroying the existing layout.
Only if this was an attempt at getting a first layout, do we
give up altogether, as the server is clearly crazy.
Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
They are not the same error, and need to be handled differently.
Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
The non-retry error path is currently broken and ends up releasing the
reference to the layout twice. It also can end up clearing the
NFS_LAYOUT_FIRST_LAYOUTGET flag twice, causing a race.
In addition, the retry path will fail to decrement the plh_outstanding
counter.
Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
We know that the attributes will need updating if there is still a
LAYOUTCOMMIT outstanding.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If there were less than 2 entries in the multipath list, then
xprt_iter_next_entry_multiple() would never advance beyond the
first entry, which is correct for round robin behaviour, but not
for the list iteration.
The end result would be infinite looping in rpc_clnt_iterate_for_each_xprt()
as we would never see the xprt == NULL condition fulfilled.
Reported-by: Oleg Drokin <green@linuxhacker.ru>
Fixes: 80b14d5e61 ("SUNRPC: Add a structure to track multiple transports")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Avoid a bad nfs server return an unaligned length of signature.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Instead of reusing the wwn-* names for multipath devices nodes RHEL and
Fedora introduce new dm-mpath-uuid-* nodes with a slightly different
naming scheme. Try these names first to ensure we always get a
multipath-capable device if it exists.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The current code works with the standard udev/systemd names, but we'll have
to add another method in the next patch. Refactor it into a separate helper
to make room for the new variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This was fixed for the original block layout code a while ago, but also
needs to be fixed for the SCSI layout path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We're not holding any locks, so both nfs_wb_all() and inode_dio_wait()
are unenforcible and have livelock potential. Just limit ourselves to
flushing out the data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>