linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-18 20:06:46 +07:00

Author	SHA1	Message	Date
Olga Kornievskaia	cb95deea0b	NFS OFFLOAD_CANCEL xdr Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-09 12:56:38 -04:00
Olga Kornievskaia	5178a125f6	NFS CB_OFFLOAD xdr Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-09 12:56:38 -04:00
NeilBrown	46483c2ea4	NFS: Use an appropriate work queue for direct-write completion When a direct-write completes, a work_struct is schedule to handle the completion. When NFS is being used for swap, the direct write might be a swap-out, so memory allocation can block until the write completes. The work queue currently used is not WQ_MEM_RECLAIM, so tasks can block waiting for memory - this leads to deadlock. So use nfsiod_workqueue instead. This will always have a running thread, and work items should never block waiting for memory. Signed-off-by: Neil Brown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 17:07:38 -04:00
Wei Yongjun	72bf75cfc0	NFSv4: Fix error handling in nfs4_sp4_select_mode() Error code is set in the error handling cases but never used. Fix it. Fixes: `937e3133cd` ("NFSv4.1: Ensure we clear the SP4_MACH_CRED flags in nfs4_sp4_select_mode()") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 16:50:31 -04:00
Gustavo A. R. Silva	10db5b7a2f	pnfs: Use true and false for boolean values Return statements in functions returning bool should use true or false instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 16:50:03 -04:00
Trond Myklebust	2230ca0d28	pnfs: pnfs_find_lseg() should not check NFS_LSEG_LAYOUTRETURN Layout segment validity is determined only by the NFS_LSEG_VALID flag. If it is set, the layout segment is finable. As it is, when the flexfiles driver sets NFS_LSEG_LAYOUTRETURN to indicate that we cannot discard the layout segment, but that it must be returned, then this can result in an unnecessary layoutget storm. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 16:50:03 -04:00
Gustavo A. R. Silva	01e03bdc74	NFS: Mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Warning level 2 was used: -Wimplicit-fallthrough=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 16:50:02 -04:00
Trond Myklebust	c8d07159c9	NFSv4: Mark the inode change attribute up to date in update_changeattr() When we update the change attribute, we should also clear the flag that says it is out of date. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 16:50:02 -04:00
Trond Myklebust	5636ec4eb6	NFSv4: Detect nlink changes on cross-directory renames too If the object being renamed from one directory to another is also a directory, then 'nlink' will change for both directories. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 16:50:02 -04:00
Trond Myklebust	3c591175d6	NFSv4: bump/drop the nlink count on the parent dir when we mkdir/rmdir Ensure that we always bump or drop the nlink count on the parent directory when we do a mkdir or a rmdir(). This needs to be done by hand as we don't have pre/post op attributes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 16:50:01 -04:00
Trond Myklebust	c16467dc03	pnfs: Fix handling of NFS4ERR_OLD_STATEID replies to layoutreturn If the server tells us that out layoutreturn raced with another layout update, then we must ensure that the new layout segments are not in use before we resend with an updated layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-08-08 16:50:01 -04:00
Trond Myklebust	6ea76bf513	NFSv4: Fix _nfs4_do_setlk() The patch to fix the case where a lock request was interrupted ended up changing default handling of errors such as NFS4ERR_DENIED and caused the client to immediately resend the lock request. Let's do a partial revert of that request so that the default is now to exit, but change the way we handle resends to take into account the fact that the user may have interrupted the request. Reported-by: Kenneth Johansson <ken@kenjo.org> Fixes: `a3cf9bca2a` ("NFSv4: Don't add a new lock on an interrupted wait..") Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-08-01 23:17:06 -04:00
Bill Baker	0f90be132c	NFSv4 client live hangs after live data migration recovery After a live data migration event at the NFS server, the client may send I/O requests to the wrong server, causing a live hang due to repeated recovery events. On the wire, this will appear as an I/O request failing with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly. NFS4ERR_BADSSESSION is returned because the session ID being used was issued by the other server and is not valid at the old server. The failure is caused by async worker threads having cached the transport (xprt) in the rpc_task structure. After the migration recovery completes, the task is redispatched and the task resends the request to the wrong server based on the old value still present in tk_xprt. The solution is to recompute the tk_xprt field of the rpc_task structure so that the request goes to the correct server. Signed-off-by: Bill Baker <bill.baker@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Fixes: `fb43d17210` ("SUNRPC: Use the multipath iterator to assign a ...") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-31 12:53:40 -04:00
Olga Kornievskaia	32cd3ee511	NFSv4.0 fix client reference leak in callback If there is an error during processing of a callback message, it leads to refrence leak on the client structure and eventually an unclean superblock. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-31 12:53:40 -04:00
Dan Carpenter	379ebf0796	NFS: silence a harmless uninitialized variable warning kstrtoul() can return -ERANGE so Smatch complains that "num" can be uninitialized. We check that it's within bounds so it's not a huge deal. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-31 12:53:40 -04:00
Dave Wysochanski	016583d703	sunrpc: Change rpc_print_iostats to rpc_clnt_show_stats and handle rpc_clnt clones The existing rpc_print_iostats has a few shortcomings. First, the naming is not consistent with other functions in the kernel that display stats. Second, it is really displaying stats for an rpc_clnt structure as it displays both xprt stats and per-op stats. Third, it does not handle rpc_clnt clones, which is important for the one in-kernel tree caller of this function, the NFS client's nfs_show_stats function. Fix all of the above by renaming the rpc_print_iostats to rpc_clnt_show_stats and looping through any rpc_clnt clones via cl_parent. Once this interface is fixed, this addresses a problem with NFSv4. Before this patch, the /proc/self/mountstats always showed incorrect counts for NFSv4 lease and session related opcodes such as SEQUENCE, RENEW, SETCLIENTID, CREATE_SESSION, etc. These counts were always 0 even though many ops would go over the wire. The reason for this is there are multiple rpc_clnt structures allocated for any given NFSv4 mount, and inside nfs_show_stats() we callled into rpc_print_iostats() which only handled one of them, nfs_server->client. Fix these counts by calling sunrpc's new rpc_clnt_show_stats() function, which handles cloned rpc_clnt structs and prints the stats together. Note that one side-effect of the above is that multiple mounts from the same NFS server will show identical counts in the above ops due to the fact the one rpc_clnt (representing the NFSv4 client state) is shared across mounts. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-31 12:53:35 -04:00
Dan Carpenter	0914bb965e	pnfs/blocklayout: off by one in bl_map_stripe() "dev->nr_children" is the number of children which were parsed successfully in bl_parse_stripe(). It could be all of them and then, in that case, it is equal to v->stripe.volumes_count. Either way, the > should be >= so that we don't go beyond the end of what we're supposed to. Fixes: `5c83746a0c` ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # 3.17+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-30 13:19:40 -04:00
Calum Mackay	23a88ade71	nfs: Referrals not inheriting proto setting from parent Commit `530ea42192` ("nfs: Referrals should use the same proto setting as their parent") encloses the fix with #ifdef CONFIG_SUNRPC_XPRT_RDMA. CONFIG_SUNRPC_XPRT_RDMA is a tristate option, so it should be tested with #if IS_ENABLED(). Fixes: `530ea42192` ("nfs: Referrals should use the same proto setting as their parent") Reported-by: Helen Chao <helen.chao@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Bill Baker <bill.baker@oracle.com> Signed-off-by: Calum Mackay <calum.mackay@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-30 13:19:40 -04:00
Jeff Layton	8b199e58d4	nfs: initiate returning delegation when reclaiming one that's been recalled When reclaiming a delegation via CLAIM_PREVIOUS open, the server can indicate that the delegation has been recalled since it was issued by setting the "recalled" flag in the delegation. Ensure that we respect the flag by initiating a delegation return when it is set. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-30 13:19:40 -04:00
Souptick Joarder	01a368441f	fs: nfs: Adding new return type vm_fault_t Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. see commit `1c8f422059` ("mm: change return type to vm_fault_t") for reference. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-30 13:19:40 -04:00
Chengguang Xu	12b289cfac	nfs: add error check in nfs_idmap_prepare_message() Even though the caller of nfs_idmap_prepare_message() checks return code in their side but it's better to add an error check for match_int() so that we can avoid unnecessary operations when bad int arg is detected. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-07-30 13:19:40 -04:00
Lance Shelton	a61246c961	Fix error code in nfs_lookup_verify_inode() Return -ESTALE to force a lookup when the file has no more links Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	3825827ebf	NFS: More excessive attribute revalidation in nfs_execute_ok() execute_ok() will only check the mode bits if the object is not a directory, so we don't need to revalidate the attributes in that case. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	cf8340277f	NFS: Fix excessive attribute revalidation in nfs_execute_ok() When nfs_update_inode() sets NFS_INO_INVALID_ACCESS it is a sign that we want to revalidate the access cache, not the inode attributes. In fact we only want to revalidate here if we see that the mode bits are invalid, so check for NFS_INO_INVALID_OTHER instead. Reported-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	7be7b3ca16	NFS: Ensure we immediately start writeback on rescheduled writes If the writes are being rescheduled due to a pNFS error, then we really want to immediately start a new flush. The O_DIRECT code already does this, so we only need to worry about buffered writes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	bd3d16a887	NFSv4.1: Fix a potential layoutget/layoutrecall deadlock If the client is sending a layoutget, but the server issues a callback to recall what it thinks may be an outstanding layout, then we may find an uninitialised layout attached to the inode due to the layoutget. In that case, it is appropriate to return NFS4ERR_NOMATCHING_LAYOUT rather than NFS4ERR_DELAY, as the latter can end up deadlocking. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	af9b6d7570	pNFS: Parse the results of layoutget on open even if permissions checks fail Even if the results of the permissions checks failed, we should parse the results of the layout on open call so that we can return the layout if required. Note that we also want to ignore the sequence counter for whether or not a layout recall occurred. If the recall pertained to our OPEN, then the callback will know, and will attempt to wait for us to finih processing anyway. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	b2b1ff3da6	NFS: Allow optimisation of lseek(fd, SEEK_CUR, 0) on directories There should be no need to grab the inode lock if we're only reading the file offset. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	411ae722d1	pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout() If the old layout was recalled, and we returned NFS4ERR_NOMATCHINGLAYOUT then we need to wait for all outstanding layoutget calls to complete before we can send a new one. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	056f9ad62e	pNFS/flexfiles: Ensure we always return a layout if it has layoutstats If a layout segment is carrying layoutstats or layout error information, then we always want to return it rather than using a forgetful model. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	f0b429819b	pNFS: Ignore non-recalled layouts in pnfs_layout_need_return() If a layout has been recalled, then we should fire off a layoutreturn as soon as all the layout segments that match the recall have been retired. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:25 -04:00
Trond Myklebust	00bcbe119f	pNFS: Don't update the stateid when replying NFS4ERR_DELAY to a layout recall RFC5661 doesn't state directly that the client should update the layout stateid if it returns NFS4ERR_NOMATCHING_LAYOUT in response to a recall, however it does state that this error will "cleanly indicate completion" on par with returning the layout. For this reason, we assume that the client should update the layout stateid. The Linux pNFS server definitely does expect this behaviour. However, if the client replies NFS4ERR_DELAY, then it is stating that the recall was not processed, so it would be very wrong to update the layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:24 -04:00
Trond Myklebust	e0b7d420f7	pNFS: Don't discard layout segments that are marked for return If there are layout segments that are marked for return, then we need to ensure that pnfs_mark_matching_lsegs_return() does not just silently discard them, but it should tell the caller that there is a layoutreturn scheduled. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-07-26 16:25:24 -04:00
Al Viro	44907d7900	get rid of 'opened' argument of ->atomic_open() - part 3 now it can be done... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:20 -04:00
Al Viro	b452a458ca	getting rid of 'opened' argument of ->atomic_open() - part 2 __gfs2_lookup(), gfs2_create_inode(), nfs_finish_open() and fuse_create_open() don't need 'opened' anymore. Get rid of that argument in those. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:20 -04:00
Al Viro	be12af3ef5	getting rid of 'opened' argument of ->atomic_open() - part 1 'opened' argument of finish_open() is unused. Kill it. Signed-off-by Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:19 -04:00
Al Viro	73a09dd943	introduce FMODE_CREATED and switch to it Parallel to FILE_CREATED, goes into ->f_mode instead of *opened. NFS is a bit of a wart here - it doesn't have file at the point where FILE_CREATED used to be set, so we need to propagate it there (for now). IMA is another one (here and everywhere)... Note that this needs do_dentry_open() to leave old bits in ->f_mode alone - we want it to preserve FMODE_CREATED if it had been already set (no other bit can be there). Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:18 -04:00
Al Viro	b0c6108ecf	nfs_instantiate(): prevent multiple aliases for directory inode Since NFS allows open-by-fhandle, we have to cope with the possibility of mkdir vs. open-by-guessed-handle races. A local filesystem could decide what the inumber of the new object will be and insert a locked inode with that inumber into icache _before_ the on-disk data structures begin to look good and unlock it only once it has a dentry alias, so that open-by-handle coming first would quietly fail and mkdir coming first would have open-by-handle grab its dentry. For NFS it's a non-starter - the icache key is server-supplied fhandle and we do not get that until the object has been fully created on server. We really have to deal with the possibility that open-by-handle gets the in-core inode and attaches a dentry to it before mkdir does. Solution: let nfs_mkdir() use d_splice_alias() to catch those. We can * get an error. Just return it to our caller. * get NULL - no preexisting dentry aliases, we'd just done what d_add() would've done. Success. * get a reference to preexisting alias. In that case the alias had been moved in place of nfs_mkdir() argument (and hashed there), while nfs_mkdir() argument is left unhashed negative. Which is just fine for ->mkdir() callers, all we need is to release the reference we'd got from d_splice_alias() and report success. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-06-28 15:28:48 -04:00
Linus Torvalds	27db64f65f	NFS client bugfixes for Linux 4.18 Hightlights include: Bugfixes: - Fix an rcu deadlock in nfs_delegation_find_inode() - Fix NFSv4 deadlocks due to not freeing the session slot in layoutget - Don't send layoutreturn if the layout is already invalid - Prevent duplicate XID allocation - flexfiles: Don't tie up all the rpciod threads in resends -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbK9avAAoJEA4mA3inWBJclq0P/1VCigyDlsbtdby3z2leV84k l0asrGjOQndljJ7I21awAgEo8KvXOd66cMv6YT3+UqEW18aNblH4/ngjyId6hPVb RZDX7tsG16ZHEqfe9f9irNZo90mdvuSC4ChJ/CbbPesaK9pblE1d76b/qUVr4FUX Gj7JPAC5ckoiZXPFfRWfc+o7JnGvs5wkEuDTy+ig6v7BRdL64hdPG3veRNpmLIAZ uS/NCyRpO+nFN/ukmvuoI2ZQ3qfHubHBD+rHxr1UKT/ad7dywLmL2UBaYQ0Tl3bq /iSQHutgJYj/80VaRTqdlLt/m4ebUZg+9BEZgM5MvqBWkXcpXND51zxExVJN4cGW BOytqjLz0gP1OGb8w+Oow58K8l4XyEgHe2CtZ6Yz8Vwof7nchkpv7RSX50hJFIcA YlikeDyDzfOmTT6ove5kF31WQSa3Bk6OMEei0of6hWU3UVHyEdr9az73pm/CLSHE /R7w0osU3B9tmQD4btQeJ2DxP+syQwhelOYodyVTwOlkmmGg7DSV7fehnGyH8t8f I4Yp8f0raiYGbwonYVE2+zDO140VRETEfTE4XQZnn41fZUfB74oIqk77JtgvGMk2 /+XFNCYBGadHdSBdxyJmhSjhoAWrhgChEIz1G12SiHrNvqIRY/uHhdCX1Ut5vlPf 5aqyn/yXm6rUH7aNh/Gd =tz0M -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Hightlights include: - fix an rcu deadlock in nfs_delegation_find_inode() - fix NFSv4 deadlocks due to not freeing the session slot in layoutget - don't send layoutreturn if the layout is already invalid - prevent duplicate XID allocation - flexfiles: Don't tie up all the rpciod threads in resends" * tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS/flexfiles: Process writeback resends from nfsiod context as well pNFS/flexfiles: Don't tie up all the rpciod threads in resends sunrpc: Prevent duplicate XID allocation pNFS: Don't send layoutreturn if the layout is already invalid pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception NFS: Fix an rcu deadlock in nfs_delegation_find_inode()	2018-06-22 06:21:34 +09:00
Trond Myklebust	7b0df92ac1	pNFS/flexfiles: Process writeback resends from nfsiod context as well Although the writeback resends are more robust than the reads, since they are not immediately rescheduled by the same thread, we are better off processing them in the same place as the reads. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-19 09:25:27 -04:00
Trond Myklebust	42f86b44a4	pNFS/flexfiles: Don't tie up all the rpciod threads in resends We do not want to have rpciod threads perform recursive calls into the RPC layer since that can deadlock. In particular, having to wait for a layoutget can be nasty... We want rather to defer scheduling those retries until we're in the rpc_release() callback, since that is called from the nfsiod workqueue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-19 09:25:27 -04:00
Trond Myklebust	c8bf707353	pNFS: Don't send layoutreturn if the layout is already invalid If the layout was invalidated due to a reboot, then don't try to send a layoutreturn for it. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-19 08:52:27 -04:00
Trond Myklebust	2dbf8dffbf	pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception Right now, we can call nfs_commit_inode() while holding the session slot, which could lead to NFSv4 deadlocks. Ensure we only keep the slot if the server returned a layout that we have to process. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-19 08:52:27 -04:00
Linus Torvalds	7a932516f5	vfs/y2038: inode timestamps conversion to timespec64 This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. There were no conflicts between this and the contents of linux-next until just before the merge window, when we saw multiple problems: - A minor conflict with my own y2038 fixes, which I could address by adding another patch on top here. - One semantic conflict with late changes to the NFS tree. I addressed this by merging Deepa's original branch on top of the changes that now got merged into mainline and making sure the merge commit includes the necessary changes as produced by coccinelle. - A trivial conflict against the removal of staging/lustre. - Multiple conflicts against the VFS changes in the overlayfs tree. These are still part of linux-next, but apparently this is no longer intended for 4.18 [1], so I am ignoring that part. As Deepa writes: The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions. Thomas Gleixner adds: I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window. [1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1 g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+ Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/ WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy A3HkjIBrKW5AgQDxfgvm =CZX2 -----END PGP SIGNATURE----- Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()	2018-06-15 07:31:07 +09:00
Anna Schumaker	d5681f59ee	NFS: Fix an rcu deadlock in nfs_delegation_find_inode() I was able to reproduce this pretty regularily using xfstests generic/013 on NFS v4.0. Reported-by: Ross Zwisler <Ross.Zwisler@linux.intel.com> Fixes: `6c34265502` (NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab()) Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-14 14:05:38 -04:00
Arnd Bergmann	15eefe2a99	Merge branch 'vfs_timespec64' of https://github.com/deepa-hub/vfs into vfs-timespec64 Pull the timespec64 conversion from Deepa Dinamani: "The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The flag patch applies cleanly. I've not seen the timestamps update logic change often. The series applies cleanly on 4.17-rc6 and linux-next tip (top commit: next-20180517). I'm not sure how to merge this kind of a series with a flag patch. We are targeting 4.18 for this. Let me know if you have other suggestions. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. I've tried to keep the conversions with the script simple, to aid in the reviews. I've kept all the internal filesystem data structures and function signatures the same. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions." I've pulled it into a branch based on top of the NFS changes that are now in mainline, so I could resolve the non-obvious conflict between the two while merging. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2018-06-14 14:54:00 +02:00
Linus Torvalds	b08fc5277a	- Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees) -----BEGIN PGP SIGNATURE----- Comment: Kees Cook <kees@outflux.net> iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7 oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2 zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8 tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3 BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3 LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh AtfvEeaPHaOyD8/h2Q== =zUUp -----END PGP SIGNATURE----- Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more overflow updates from Kees Cook: "The rest of the overflow changes for v4.18-rc1. This includes the explicit overflow fixes from Silvio, further struct_size() conversions from Matthew, and a bug fix from Dan. But the bulk of it is the treewide conversions to use either the 2-factor argument allocators (e.g. kmalloc(a * b, ...) into kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a * b) into vmalloc(array_size(a, b)). Coccinelle was fighting me on several fronts, so I've done a bunch of manual whitespace updates in the patches as well. Summary: - Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees)" * tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits) treewide: Use array_size in f2fs_kvzalloc() treewide: Use array_size() in f2fs_kzalloc() treewide: Use array_size() in f2fs_kmalloc() treewide: Use array_size() in sock_kmalloc() treewide: Use array_size() in kvzalloc_node() treewide: Use array_size() in vzalloc_node() treewide: Use array_size() in vzalloc() treewide: Use array_size() in vmalloc() treewide: devm_kzalloc() -> devm_kcalloc() treewide: devm_kmalloc() -> devm_kmalloc_array() treewide: kvzalloc() -> kvcalloc() treewide: kvmalloc() -> kvmalloc_array() treewide: kzalloc_node() -> kcalloc_node() treewide: kzalloc() -> kcalloc() treewide: kmalloc() -> kmalloc_array() mm: Introduce kvcalloc() video: uvesafb: Fix integer overflow in allocation UBIFS: Fix potential integer overflow in allocation leds: Use struct_size() in allocation Convert intel uncore to struct_size ...	2018-06-12 18:28:00 -07:00
Kees Cook	6396bb2215	treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kzalloc( - sizeof(char) * COUNT + COUNT , ...) \| kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) \| kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) \| kzalloc(sizeof(TYPE) * C2, ...) \| kzalloc(C1 * C2 * C3, ...) \| kzalloc(C1 * C2, ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) \| - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) \| - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Linus Torvalds	0725d4e1b8	NFS client updates for Linux 4.18 Highlights include: Stable fixes: - Fix a 1-byte stack overflow in nfs_idmap_read_and_verify_message - Fix a hang due to incorrect error returns in rpcrdma_convert_iovs() - Revert an incorrect change to the NFSv4.1 callback channel - Fix a bug in the NFSv4.1 sequence error handling Features and optimisations: - Support for piggybacking a LAYOUTGET operation to the OPEN compound - RDMA performance enhancements to deal with transport congestion - Add proper SPDX tags for NetApp-contributed RDMA source - Do not request delegated file attributes (size+change) from the server - Optimise away a GETATTR in the lookup revalidate code when doing NFSv4 OPEN - Optimise away unnecessary lookups for rename targets - Misc performance improvements when freeing NFSv4 delegations Bugfixes and cleanups: - Try to fail quickly if proto=rdma - Clean up RDMA receive trace points - Fix sillyrename to return the delegation when appropriate - Misc attribute revalidation fixes - Immediately clear the pNFS layout on a file when the server returns ESTALE - Return NFS4ERR_DELAY when delegation/layout recalls fail due to igrab() - Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbH8gIAAoJEA4mA3inWBJcpzYQAJYY3ykt9oLQgm/2b/D/weDe 6890M9W5nIeuZq5soWSpYsZTxqIFbGV4laG/eCTW1gUN1TitSZsoOp7kqhRHXOjq Rv3ZvjlZsP2qv2SnzsEmhJsynfyB46d19smSTJhgQ8dnXhaZv04Wsd4krLHx0z6p uUUis5Q1m+vL7HsFPp3iUareO/DFKeSkw2cQ2V5ksTIEiAzX7GC+Ex/KKWf82nrJ hm7+Nq7rLf1QHJkQvsc3fYCMR4gIzEwUu6F8RyxCoAVgD6O90Hx6NbxnINaHDD4N U0nRP5LwCyN9hbPWvwcH7Sn4ePDTos2yj2tFO5NP9btTLDVLFSGYZ2a74d9PRdAf 9jn6f6juSDwI7T6NXvkHzzkJG6Or9ABAUZo+yX5JoD6lmgOcPUJpLRy6fu7UxAuN a5OZ7d9edYpOi0Kys8sDSIlLlxZtFkvybOMVuI3dSHsI+c0g39w8oarpqT2wXWMs /ZtFz0FCreHhKkNtz7Z49z1UQHDv/XYM0WkcO+eaeK58RLIEE0pZHoMvPKP63lkI nbbgHvBRAu38Jtvvu65Hpb/VpBcqNGM5hjN1cfW/BOqAPKW23s4vWVj+/1silfW/ uw0MkNrDC9endoALp/YMCcMwPvEw9Awt9y4KjMgfVgSnKwXd0HaSZ2zE6aJU3Wry Fy2Tv0e0OH3z9Bi/LNuJ =YWSl -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fix a 1-byte stack overflow in nfs_idmap_read_and_verify_message - Fix a hang due to incorrect error returns in rpcrdma_convert_iovs() - Revert an incorrect change to the NFSv4.1 callback channel - Fix a bug in the NFSv4.1 sequence error handling Features and optimisations: - Support for piggybacking a LAYOUTGET operation to the OPEN compound - RDMA performance enhancements to deal with transport congestion - Add proper SPDX tags for NetApp-contributed RDMA source - Do not request delegated file attributes (size+change) from the server - Optimise away a GETATTR in the lookup revalidate code when doing NFSv4 OPEN - Optimise away unnecessary lookups for rename targets - Misc performance improvements when freeing NFSv4 delegations Bugfixes and cleanups: - Try to fail quickly if proto=rdma - Clean up RDMA receive trace points - Fix sillyrename to return the delegation when appropriate - Misc attribute revalidation fixes - Immediately clear the pNFS layout on a file when the server returns ESTALE - Return NFS4ERR_DELAY when delegation/layout recalls fail due to igrab() - Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY" * tag 'nfs-for-4.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (80 commits) skip LAYOUTRETURN if layout is invalid NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY NFSv4: Fix a typo in nfs41_sequence_process NFSv4: Revert commit `5f83d86cf5` ("NFSv4.x: Fix wraparound issues..") NFSv4: Return NFS4ERR_DELAY when a layout recall fails due to igrab() NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab() NFSv4.0: Remove transport protocol name from non-UCS client ID NFSv4.0: Remove cl_ipaddr from non-UCS client ID NFSv4: Fix a compiler warning when CONFIG_NFS_V4_1 is undefined NFS: Filter cache invalidation when holding a delegation NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes() NFS: Improve caching while holding a delegation NFS: Fix attribute revalidation NFS: fix up nfs_setattr_update_inode NFSv4: Ensure the inode is clean when we set a delegation NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access NFSv4: Don't ask for delegated attributes when adding a hard link NFSv4: Don't ask for delegated attributes when revalidating the inode NFS: Pass the inode down to the getattr() callback NFSv4: Don't request size+change attribute if they are delegated to us ...	2018-06-12 10:09:03 -07:00
Olga Kornievskaia	93b7f7ad20	skip LAYOUTRETURN if layout is invalid Currently, when IO to DS fails, client returns the layout and retries against the MDS. However, then on umounting (inode eviction) it returns the layout again. This is because pnfs_return_layout() was changed in commit `d78471d32b` ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR") to always set NFS_LAYOUT_RETURN_REQUESTED so even if we returned the layout, it will be returned again. Instead, let's also check if we have already marked the layout invalid. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-12 08:48:04 -04:00
Trond Myklebust	f9312a5410	NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY If the server returns NFS4ERR_SEQ_FALSE_RETRY or NFS4ERR_RETRY_UNCACHED_REP, then it thinks we're trying to replay an existing request. If so, then let's just bump the sequence ID and retry the operation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-09 19:10:31 -04:00
Trond Myklebust	995891006c	NFSv4: Fix a typo in nfs41_sequence_process We want to compare the slot_id to the highest slot number advertised by the server. Fixes: `3be0f80b5f` ("NFSv4.1: Fix up replays of interrupted requests") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-09 12:59:55 -04:00
Trond Myklebust	fc40724fc6	NFSv4: Revert commit `5f83d86cf5` ("NFSv4.x: Fix wraparound issues..") The correct behaviour for NFSv4 sequence IDs is to wrap around to the value 0 after 0xffffffff. See https://tools.ietf.org/html/rfc5661#section-2.10.6.1 Fixes: `5f83d86cf5` ("NFSv4.x: Fix wraparound issues when validing...") Cc: stable@vger.kernel.org # 4.6+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-09 12:56:30 -04:00
Trond Myklebust	ce5624f7e6	NFSv4: Return NFS4ERR_DELAY when a layout recall fails due to igrab() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-08 16:36:10 -04:00
Trond Myklebust	6c34265502	NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab() If the attempt to recall the delegation fails because the inode is in the process of being evicted from cache, then use NFS4ERR_DELAY to ask the server to retry later. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-08 16:36:10 -04:00
Chuck Lever	025bb9f872	NFSv4.0: Remove transport protocol name from non-UCS client ID Commit `69dd716c5f` ("NFSv4: Add socket proto argument to setclientid") (2007) added the transport protocol name to the client ID string, but the patch description doesn't explain why this was necessary. At that time, the only transport protocol name that would have been used is "tcp" (for both IPv4 and IPv6), resulting in no additional distinctiveness of the client ID string. Since there is one client instance, the server should recognize it's state whether the client is connecting via TCP or RDMA. Same client, same lease. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-06 11:45:52 -04:00
Chuck Lever	848a4eb2e3	NFSv4.0: Remove cl_ipaddr from non-UCS client ID It is possible for two distinct clients to have the same cl_ipaddr: - if the client admin disables callback with clientaddr=0.0.0.0 on more than one client - if two clients behind separate NATs use the same private subnet number - if the client admin specifies the same address via clientaddr= mount option (pointing the server at the same NAT box, for example) Because of the way the Linux NFSv4.0 client constructs its client ID string by default, such clients could interfere with each others' lease state when mounting the same server: scnprintf(str, len, "Linux NFSv4.0 %s/%s %s", clp->cl_ipaddr, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR), rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_PROTO)); cl_ipaddr is set to the value of the clientaddr= mount option. Two clients whose addresses are 192.168.3.77 that mount the same server (whose public IP address is, say, 3.4.5.6) would both generate the same client ID string when sending a SETCLIENTID: Linux NFSv4.0 192.168.3.77/3.4.5.6 tcp and thus the server would not be able to distinguish the clients' leases. If both clients are using AUTH_SYS when sending SETCLIENTID then the server could possibly permit the two clients to interfere with or purge each others' leases. To better ensure that Linux's NFSv4.0 client ID strings are distinct in these cases, remove cl_ipaddr from the client ID string and replace it with something more likely to be unique. Note that the replacement looks a lot like the uniform client ID string. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-06 11:45:44 -04:00
Deepa Dinamani	95582b0083	vfs: change inode times to use struct timespec64 struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct $ iattr \\| inode \\| kstat $ { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec t, + struct timespec64 t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec t + struct timespec64 t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode node1; local idexpression struct inode node2; local idexpression struct iattr attr1; local idexpression struct iattr attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; \| - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) \| - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) \| - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) \| - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) \| ts = current_time(e) \| fn_update_time(..., &ts,...) \| inode_node->i_xtime = ts \| node1->i_xtime = ts \| ts = inode_node->i_xtime \| <+... attr1->ia_xtime ...+> = ts \| ts = attr1->ia_xtime \| ts.tv_sec \| ts.tv_nsec \| btrfs_set_stack_timespec_sec(..., ts.tv_sec) \| btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) \| - ts = timespec64_to_timespec( + ts = ... -) \| - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) \| - ts = E3 + ts = timespec_to_timespec64(E3) \| - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) \| fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) \| - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) \| - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) \| - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) \| node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) \| - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) \| - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) \| - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode node; struct iattr attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); \| fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); \| - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode node; struct iattr attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); \| + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode node; struct iattr attr; struct kstat stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); \| + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); \| + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); \| + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); \| + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode node; struct inode node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr attrp; struct iattr attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \\| attrp->ia_xtime2 \\| attr.ia_xtime2 \) = node->i_xtime1 ; \| node->i_xtime2 = $ node2->i_xtime1 \\| timespec64_trunc(...) $; \| node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = $ts \\| current_time(...) $; \| node->i_xtime1 = node->i_xtime3 = $ts \\| current_time(...) $; \| stat->xtime = node2->i_xtime1; \| stat1.xtime = node2->i_xtime1; \| ( node->i_xtime2 \\| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; \| ( attrp->ia_xtime1 \\| attr.ia_xtime1 \) = attrp2->ia_xtime2; \| - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); \| - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); \| node->i_xtime1 = current_time(...); \| node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); \| node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); \| - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>	2018-06-05 16:57:31 -07:00
Trond Myklebust	977294c719	NFSv4: Fix a compiler warning when CONFIG_NFS_V4_1 is undefined Fix a compiler warning: fs/nfs/nfs4proc.c:910:13: warning: 'nfs4_layoutget_release' defined but not used [-Wunused-function] static void nfs4_layoutget_release(void *calldata) ^~~~~~~~~~~~~~~~~~~~~~ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-05 10:38:39 -04:00
Trond Myklebust	3f0b3cf46e	NFS: Filter cache invalidation when holding a delegation If the client holds a delegation, then ensure we filter out attempts to invalidate the size, owner, group owner, or mode unless we made the change, in which case, check that NFS_INO_REVAL_FORCED is set by the caller. Always filter out attempts to invalidate the change attribute and size, since we are authoritative for those. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 15:03:58 -04:00
Trond Myklebust	4ebe83af20	NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes() If we hold a delegation, we should not need to call nfs_check_inode_attributes() since we already know which attributes are valid, and which ones may still need revalidation. The state of the NFS_INO_REVAL_FORCED flag is therefore irrelevant. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 15:03:58 -04:00
Trond Myklebust	c80d17c55d	NFS: Improve caching while holding a delegation Make sure that the client completely ignores change attribute and size changes on the server when it holds a delegation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 15:03:58 -04:00
Trond Myklebust	0b467264d0	NFS: Fix attribute revalidation Don't mark attributes as invalid just because they have changed. Instead, for the purposes of adjusting the attribute cache timeout, keep a separate variable that tracks whether or not a change occurred. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 15:03:58 -04:00
Trond Myklebust	6a97d02dfe	NFS: fix up nfs_setattr_update_inode Always try to set the attributes, even if we don't have a valid struct nfs_fattr. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 15:03:58 -04:00
Trond Myklebust	97c2c17af9	NFSv4: Ensure the inode is clean when we set a delegation If there are attributes that are still invalid when we set a delegation, then we need to set the NFS_INO_REVAL_FORCED flag. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 15:03:58 -04:00
Trond Myklebust	7c6726546c	NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access If we hold a delegation, we don't need to care about whether or not the inode attributes are up to date. We know we can cache the results of this call regardless. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 15:03:20 -04:00
Trond Myklebust	2f28dc385a	NFSv4: Don't ask for delegated attributes when adding a hard link Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 12:07:07 -04:00
Trond Myklebust	771734f291	NFSv4: Don't ask for delegated attributes when revalidating the inode Again, when revalidating the inode, we don't need to ask for attributes for which we are authoritative. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 12:07:07 -04:00
Trond Myklebust	a841b54dbd	NFS: Pass the inode down to the getattr() callback Allow the getattr() callback to check things like whether or not we hold a delegation so that it can adjust the attributes that it is asking for. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 12:07:07 -04:00
Trond Myklebust	30846df06f	NFSv4: Don't request size+change attribute if they are delegated to us When we hold a delegation, we should not need to request attributes such as the file size or the change attribute. For some servers, avoiding asking for these unneeded attributes can improve the overall system performance. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-06-04 12:07:07 -04:00
Trond Myklebust	ae55e59da0	pnfs: Don't release the sequence slot until we've processed layoutget on open If the server recalls the layout that was just handed out, we risk hitting a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we release the sequence slot after processing the LAYOUTGET operation that was sent as part of the OPEN compound. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:12 -04:00
Trond Myklebust	32f1c28f3d	pnfs: Don't call commit on failed layoutget-on-open If the layoutget on open call failed, we can't really commit the inode, so don't bother calling it. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:12 -04:00
Trond Myklebust	64294b08f9	pNFS: Don't send LAYOUTGET on OPEN for read, if we already have cached data If we're only opening the file for reading, and the file is empty and/or we already have cached data, then heuristically optimise away the LAYOUTGET. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:12 -04:00
Trond Myklebust	8dc96566c0	NFSv4/pnfs: Don't switch off layoutget-on-open for transient errors Ensure that we only switch off the LAYOUTGET operation in the OPEN compound when the server is truly broken, and/or it is complaining that the compound is too large. Currently, we end up turning off the functionality permanently, even for transient errors such as EACCES or ENOSPC. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Trond Myklebust	d49e0d5b99	NFSv4/pnfs: Ensure pnfs_parse_lgopen() won't try to parse uninitialised data We need to ensure that pnfs_parse_lgopen() doesn't try to parse a struct nfs4_layoutget_res that was not filled by a successful call to decode_layoutget(). This can happen if we performed a cached open, or if either the OP_ACCESS or OP_GETATTR operations preceding the OP_LAYOUTGET in the compound returned an error. By initialising the 'status' field to NFS4ERR_DELAY, we ensure that pnfs_parse_lgopen() won't try to interpret the structure. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	30ae2412e9	pnfs: Fix manipulation of NFS_LAYOUT_FIRST_LAYOUTGET The flag was not always being cleared after LAYOUTGET on OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	c49b5209f9	pnfs: Add barrier to prevent lgopen using LAYOUTGET during recall Since the LAYOUTGET on OPEN can be sent without prior inode information, existing methods to prevent LAYOUTGET from being sent while processing CB_LAYOUTRECALL don't work. Track if a recall occurred while LAYOUTGET was being sent, and if so ignore the results. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	6e01260cee	pnfs: Stop attempting LAYOUTGET on OPEN on failure Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	78746a384c	pnfs: Add LAYOUTGET to OPEN of an existing file Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Trond Myklebust	29a8bfe52d	pNFS: Refactor nfs4_layoutget_release() Move the actual freeing of the struct nfs4_layoutget into fs/nfs/pnfs.c where it can be reused by the layoutget on open code. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	2409a976a2	pnfs: Add LAYOUTGET to OPEN of a new file This triggers when have no pre-existing inode to attach to. The preexisting case is saved for later. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	5e36e2a941	pnfs: Change pnfs_alloc_init_layoutget_args call signature Don't send in a layout, instead use the (possibly NULL) inode. This is needed for LAYOUTGET attached to an OPEN where the inode is not yet set. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	1b146fcff7	pnfs: Move nfs4_opendata into nfs4_fs.h It will be needed now by the pnfs code. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	56f487f8c8	pnfs: Add conditional encode/decode of LAYOUTGET within OPEN compound Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	dacb452db8	pnfs: move allocations out of nfs4_proc_layoutget They work better in the new alloc_init function. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	587f03deb6	pnfs: refactor send_layoutget Pull out the alloc/init part for eventual reuse by OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	f86c3ac502	pnfs: Add layout driver flag PNFS_LAYOUTGET_ON_OPEN Driver can set flag to allow LAYOUTGET to be sent with OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	3b65a30df9	NFS4: move ctx into nfs4_run_open_task Preparing to add conditional LAYOUTGET to OPEN rpc, the LAYOUTGET will need the ctx info. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:11 -04:00
Fred Isaman	808ba32abe	pnfs: Store return value of decode_layoutget for later processing This will be needed to seperate return value of OPEN and LAYOUTGET when they are combined into a single RPC. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:10 -04:00
Fred Isaman	34ec9aac7d	pnfs: Remove redundant assignment from nfs4_proc_layoutget(). nfs_init_sequence() will clear this for us. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:03:10 -04:00
Benjamin Coddington	a3cf9bca2a	NFSv4: Don't add a new lock on an interrupted wait for LOCK If the wait for a LOCK operation is interrupted, and then the file is closed, the locks cleanup code will assume that no new locks will be added to the inode after it has completed. We already have a mechanism to detect if there was signal, so let's use that to avoid recreating the local lock once the RPC completes. Also skip re-sending the LOCK operation for the various error cases if we were signaled. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> [Trond: Fix inverted test of locks_lock_inode_wait()] Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Trond Myklebust	cf61eb2686	NFSv4: Always clear the pNFS layout when handling ESTALE If we get an ESTALE error in response to an RPC call operating on the file on the MDS, we should immediately cancel the layout for that file. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Dave Wysochanski	d68894800e	NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_message In nfs_idmap_read_and_verify_message there is an incorrect sprintf '%d' that converts the __u32 'im_id' from struct idmap_msg to 'id_str', which is a stack char array variable of length NFS_UINT_MAXLEN == 11. If a uid or gid value is > 2147483647 = 0x7fffffff, the conversion overflows into a negative value, for example: crash> p (unsigned) (0x80000000) $1 = 2147483648 crash> p (signed) (0x80000000) $2 = -2147483648 The '-' sign is written to the buffer and this causes a 1 byte overflow when the NULL byte is written, which corrupts kernel stack memory. If CONFIG_CC_STACKPROTECTOR_STRONG is set we see a stack-protector panic: [11558053.616565] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa05b8a8c [11558053.639063] CPU: 6 PID: 9423 Comm: rpc.idmapd Tainted: G W ------------ T 3.10.0-514.el7.x86_64 #1 [11558053.641990] Hardware name: Red Hat OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014 [11558053.644462] ffffffff818c7bc0 00000000b1f3aec1 ffff880de0f9bd48 ffffffff81685eac [11558053.646430] ffff880de0f9bdc8 ffffffff8167f2b3 ffffffff00000010 ffff880de0f9bdd8 [11558053.648313] ffff880de0f9bd78 00000000b1f3aec1 ffffffff811dcb03 ffffffffa05b8a8c [11558053.650107] Call Trace: [11558053.651347] [<ffffffff81685eac>] dump_stack+0x19/0x1b [11558053.653013] [<ffffffff8167f2b3>] panic+0xe3/0x1f2 [11558053.666240] [<ffffffff811dcb03>] ? kfree+0x103/0x140 [11558053.682589] [<ffffffffa05b8a8c>] ? idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.689710] [<ffffffff810855db>] __stack_chk_fail+0x1b/0x30 [11558053.691619] [<ffffffffa05b8a8c>] idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.693867] [<ffffffffa00209d6>] rpc_pipe_write+0x56/0x70 [sunrpc] [11558053.695763] [<ffffffff811fe12d>] vfs_write+0xbd/0x1e0 [11558053.702236] [<ffffffff810acccc>] ? task_work_run+0xac/0xe0 [11558053.704215] [<ffffffff811fec4f>] SyS_write+0x7f/0xe0 [11558053.709674] [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b Fix this by calling the internally defined nfs_map_numeric_to_string() function which properly uses '%u' to convert this __u32. For consistency, also replace the one other place where snprintf is called. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Reported-by: Stephen Johnston <sjohnsto@redhat.com> Fixes: `cf4ab538f1` ("NFSv4: Fix the string length returned by the idmapper") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Trond Myklebust	d554168f87	NFS: Fix up nfs_post_op_update_inode() to force ctime updates We do not want to ignore ctime updates that originate from functions such as link(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Trond Myklebust	472f761e11	NFS: Ensure we revalidate the inode correctly after setacl Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Trond Myklebust	59a707b0d4	NFS: Ensure we revalidate the inode correctly after remove or rename We may need to revalidate the change attribute, ctime and the nlinks count. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Trond Myklebust	821a868a23	NFS: Set the force revalidate flag if the inode is not completely initialised Ensure that a delegation doesn't cause us to skip initialising the inode if it was incomplete when we exited nfs_fhget() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Trond Myklebust	3cb3fd6da4	NFS: Fix up sillyrename() Ensure that we register the fact that the inode ctime has changed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Trond Myklebust	ed7e9ad090	NFSv4: Fix sillyrename to return the delegation when appropriate Ensure that we pass down the inode of the file being deleted so that we can return any delegation being held. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Trond Myklebust	991eedb137	NFSv4: Only pass the delegation to setattr if we're sending a truncate Even then it isn't really necessary. The reason why we may not want to pass in a stateid in other cases is that we cannot use the delegation credential. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Anna Schumaker	2f261020b6	NFS: Merge nfs41_free_stateid() with _nfs41_free_stateid() Having these exist as two functions doesn't seem to add anything useful, and I think merging them together makes this easier to follow. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Anna Schumaker	fba83f3411	NFS: Pass "privileged" value to nfs4_init_sequence() We currently have a separate function just to set this, but I think it makes more sense to set it at the same time as the other values in nfs4_init_sequence() Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Anna Schumaker	e9ae1ee2b2	NFS: Move call to nfs4_state_protect() to nfs4_commit_setup() Rather than doing this in the generic NFS client code. Let's put this with the other v4 stuff so it's all in one place. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
Anna Schumaker	fb91fb0ee7	NFS: Move call to nfs4_state_protect_write() to nfs4_write_setup() This doesn't really need to be in the generic NFS client code, and I think it makes more sense to keep the v4 code in one place. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:16 -04:00
NeilBrown	e04bbf6b1b	NFS: Avoid quadratic search when freeing delegations. There are three places that walk all delegation for an nfs_client and restart whenever they find something interesting - potentially resulting in a quadratic search: If there are 10,000 uninteresting delegations followed by 10,000 interesting one, then the code skips over 100,000,000 delegations, which can take a noticeable amount of time. Of these nfs_delegation_reap_unclaimed() and nfs_reap_expired_delegations() are only called during unusual events: a server reboots or reports expired delegations, probably due to a network partition. Optimizing these is not particularly important. The third, nfs_client_return_marked_delegations(), is called periodically via nfs_expire_unreferenced_delegations(). It could cause periodic problems on a busy server. New delegations are added to the end of the list, so if there are 10,000 open files with delegations, and 10,000 more recently opened files that received delegations but are now closed, then nfs_client_return_marked_delegations() can take seconds to skip over the 10,000 open files 10,000 times. That is a waste of time. The avoid this waste a place-holder (an inode) is kept when locks are dropped, so that the place can usually be found again after taking rcu_readlock(). This place holder ensure that we find the right starting point in the list of nfs_servers, and makes is probable that we find the right starting point in the list of delegations. We might need to occasionally restart at the head of that list. It might be possible that the place_holder inode could lose its delegation separately, and then get a new one using the same (freed and then reallocated) 'struct nfs_delegation'. Were this to happen, the new delegation would be at the end of the list and we would miss returning some other delegations. This would have the effect of unnecessarily delaying the return of some unused delegations until the next time this function is called - typically 90 seconds later. As this is not a correctness issue and is vanishingly unlikely to happen, it does not seem worth addressing. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 15:02:14 -04:00
NeilBrown	3ca951b618	NFS: use cond_resched() when restarting walk of delegation list. In three places we walk the list of delegations for an nfs_client until an interesting one is found, then we act of that delegation and restart the walk. New delegations are added to the end of a list and the interesting delegations are usually old, so in many case we won't repeat a long walk over and over again, but it is possible - particularly if the first server in the list has a large number of uninteresting delegations. In each cache the work done on interesting delegations will often complete without sleeping, so this could loop many times without giving up the CPU. So add a cond_resched() at an appropriate point to avoid hogging the CPU for too long. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 14:59:19 -04:00
NeilBrown	f389349142	NFS: slight optimization for walking list for delegations There are 3 places where we walk the list of delegations for an nfs_client. In each case there are two nested loops, one for nfs_servers and one for nfs_delegations. When we find an interesting delegation we try to get an active reference to the server. If that fails, it is pointless to continue to look at the other delegation for the server as we will never be able to get an active reference. So instead of continuing in the inner loop, break out and continue in the outer loop. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-31 14:59:19 -04:00
Trond Myklebust	9f6d44d418	NFS: Optimise away lookups for rename targets We can optimise away any lookup for a rename target, unless we're being asked to revalidate a dentry that might be in use. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-28 13:29:19 -04:00
Trond Myklebust	73dd684a4d	NFS: If the VFS sets LOOKUP_REVAL then force a lookup of the dentry If nfs_lookup_revalidate() is called with LOOKUP_REVAL because a previous path lookup failed, then we ought to force a full lookup of the component name. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-28 13:29:19 -04:00
Trond Myklebust	479219218f	NFS: Optimise away the close-to-open GETATTR when we have NFSv4 OPEN NFSv4 should not need to perform an extra close-to-open GETATTR as part of the process of looking up a regular file, since the OPEN call will do that for us. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2018-05-28 13:29:19 -04:00
Deepa Dinamani	0a2dfbecb3	fs: nfs: get rid of memcpys for inode times Subsequent patches in the series convert inode timestamps to use struct timespec64 instead of struct timespec as part of solving the y2038 problem. This will lead to type mismatch for memcpys. Use regular assignments instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: trond.myklebust@primarydata.com	2018-05-25 15:31:13 -07:00
Christoph Hellwig	c350637227	proc: introduce proc_create_net{,_data} Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-16 07:24:30 +02:00
Linus Torvalds	a1bf4c7da6	NFS client updates for Linux 4.17 Stable bugfixes: - xprtrdma: Fix corner cases when handling device removal # v4.12+ - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+ Features: - New sunrpc tracepoint for RPC pings - Finer grained NFSv4 attribute checking - Don't unnecessarily return NFS v4 delegations Other bugfixes and cleanups: - Several other small NFSoRDMA cleanups - Improvements to the sunrpc RTT measurements - A few sunrpc tracepoint cleanups - Various fixes for NFS v4 lock notifications - Various sunrpc and NFS v4 XDR encoding cleanups - Switch to the ida_simple API - Fix NFSv4.1 exclusive create - Forget acl cache after setattr operation - Don't advance the nfs_entry readdir cookie if xdr decoding fails -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlrNG1IACgkQ18tUv7Cl QOvotw//fQoUgQ/AOJGlZo/4ws2mGJN3dfwwKM8xYOnHaxppOYubZRHwvswK8d22 +XR/Q6IVbUxI3mJluv1L0d9CJT06s3c9CO90McIJbk4CWihGP19bNIY4JiPlzrbv 4FDiyOvMBej2UXbHX5EzKj0srxyBoEVf3iUAIa6DaHi3c6EIUo6fP3d2eRNJStqd WMyZs+nqr2W9biyClxntT7l/Sk+o+4I7M3Oo9pjjS+PiePYdaMrL5T1kPeHaJshF GMGXkbvVdqpDRiXX84R9+2/nuSiA15eEnaR94UNvs84oLR3qob3ZhxhudqFdSPrX RS6E7m34gY/EaQm/wbB26PZm+3jHd4Pqm5SKLbyFfoCmG6oMwBvXNRJZas1DFaHM CMOECvfAr6kixVLkAN0MNQ2Ku/FuJ52OLP1dRLmxsblocnhEPujc6RSz6Ju/v3a0 adbpmJMA2IoSGgXMu3g1VGnjHfMj7ZmjtpigXVvlcUqQGCL7t4ngh23cpeTQeJ76 bMwSHUQu18NbmtJjBTE+PIm7mdCrpQD7ZuOPWpK62zxLYUnnv7nm75m84DrDru7d XAmrCmdUJNrVWQs6BAtCXgO4PZ6xNGLosb0xTQXTAQYftc+DRJ9SW/VGc0Mp1L9m 0G0iz++b8cy4Pih5UCDJcCkpjCIvHLcn72zn1kbufWqG3xr2koc= =IlWo -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Fix corner cases when handling device removal # v4.12+ - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+ Features: - New sunrpc tracepoint for RPC pings - Finer grained NFSv4 attribute checking - Don't unnecessarily return NFS v4 delegations Other bugfixes and cleanups: - Several other small NFSoRDMA cleanups - Improvements to the sunrpc RTT measurements - A few sunrpc tracepoint cleanups - Various fixes for NFS v4 lock notifications - Various sunrpc and NFS v4 XDR encoding cleanups - Switch to the ida_simple API - Fix NFSv4.1 exclusive create - Forget acl cache after setattr operation - Don't advance the nfs_entry readdir cookie if xdr decoding fails" * tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits) NFS: advance nfs_entry cookie only after decoding completes successfully NFSv3/acl: forget acl cache after setattr NFSv4.1: Fix exclusive create NFSv4: Declare the size up to date after it was set. nfs: Use ida_simple API NFSv4: Fix the nfs_inode_set_delegation() arguments NFSv4: Clean up CB_GETATTR encoding NFSv4: Don't ask for attributes when ACCESS is protected by a delegation NFSv4: Add a helper to encode/decode struct timespec NFSv4: Clean up encode_attrs NFSv4; Clean up XDR encoding of type bitmap4 NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group SUNRPC: Add a helper for encoding opaque data inline SUNRPC: Add helpers for decoding opaque and string types NFSv4: Ignore change attribute invalidations if we hold a delegation NFS: More fine grained attribute tracking NFS: Don't force unnecessary cache invalidation in nfs_update_inode() NFS: Don't redirty the attribute cache in nfs_wcc_update_inode() NFS: Don't force a revalidation of all attributes if change is missing NFS: Convert NFS_INO_INVALID flags to unsigned long ...	2018-04-12 12:55:50 -07:00
Frank Sorenson	98de9ce6f6	NFS: advance nfs_entry cookie only after decoding completes successfully In nfs[34]_decode_dirent, the cookie is advanced as soon as it is read, but decoding may still fail later in the function, returning an error. Because the cookie has been advanced, the failing entry is not re-requested from the server, resulting in a missing directory entry. In addition, nfs v3 and v4 read the cookie at different locations in the xdr_stream, so the behavior of the two can be inconsistent. Fix these by reading the cookie into a temporary variable, and only advancing the cookie once the entire entry has been decoded from the xdr_stream successfully. Signed-off-by: Frank Sorenson <sorenson@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
chendt	dbc898ae10	NFSv3/acl: forget acl cache after setattr Sync of ACL with std permissions fail，We need to forget the ACL cache after setattr. Reproduction: #!/bin/bash touch testfile cat <<EOF >testfile #!/bin/bash echo "Test was executed" EOF chmod u=rwx testfile chmod g=rw- testfile chmod o=r-- testfile chacl u::r--,g::rwx,o:rw- testfile chmod u+w testfile ls -l testfile chacl -l testfile Output： -rw-rwxrw- 1 root root 0 Mar 28 05:29 testfile testfile [u::r--,g::rwx,o::rw-] Signed-off-by: chendt.fnst <chendt.fnst@cn.fujitsu.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Kinglong Mee <Kinglong Mee> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	609339c123	NFSv4.1: Fix exclusive create When we use EXCLUSIVE4_1 mode, the server returns an attribute mask where all the bits indicate which attributes were set, and where the verifier was stored. In order to figure out which attribute we have to resend, we need to clear out the attributes that are set in exclcreat_bitmask. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [Anna: Fixed typo NFS4_CREATE_EXCLUSIVE4 -> NFS4_CREATE_EXCLUSIVE] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	f6cdfa6dd6	NFSv4: Declare the size up to date after it was set. When we've changed the file size, then ensure we declare it to be up to date in the inode attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Matthew Wilcox	aae5730e2d	nfs: Use ida_simple API Allocate the owner_id when we allocate the state and free it when we free the state. That lets us get rid of a gnarly ida_pre_get() / ida_get_new() loop. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	35156bfff3	NFSv4: Fix the nfs_inode_set_delegation() arguments Neither nfs_inode_set_delegation() nor nfs_inode_reclaim_delegation() are generic code. They have no business delving into NFSv4 OPEN xdr structures, so let's replace the "struct nfs_openres" parameter. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	8b06494624	NFSv4: Clean up CB_GETATTR encoding Replace the open coded bitmap implementation with a generic one. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	8bcbe7d98c	NFSv4: Don't ask for attributes when ACCESS is protected by a delegation If we hold a delegation, then the results of the ACCESS call are protected anyway. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	36b3743fef	NFSv4: Add a helper to encode/decode struct timespec Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	40a3426c75	NFSv4: Clean up encode_attrs Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	37c88763de	NFSv4; Clean up XDR encoding of type bitmap4 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	e8d8aa46be	NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	d943f2dd8d	NFSv4: Ignore change attribute invalidations if we hold a delegation Don't bother even recording an invalid change attribute if we hold a delegation since we already know the state of our attribute cache. We can rely on the fact that we will pick up a copy from the server when we return the delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	16e1437517	NFS: More fine grained attribute tracking Currently, if the NFS_INO_INVALID_ATTR flag is set, for instance by a call to nfs_post_op_update_inode_locked(), then it will not be cleared until all the attributes have been revalidated. This means, for instance, that NFSv4 writes will always force a full attribute revalidation. Track the ctime, mtime, size and change attribute separately from the other attributes so that we can have nfs_post_op_update_inode_locked() set them correctly, and later have the cache consistency bitmask be able to clear them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	cac88f942d	NFS: Don't force unnecessary cache invalidation in nfs_update_inode() If we managed to revalidate all the attributes, then there is no reason to mark them as invalid again. We do, however want to ensure that we set nfsi->attrtimeo correctly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	783b194c6e	NFS: Don't redirty the attribute cache in nfs_wcc_update_inode() If we received weak cache consistency data from the server, then those attributes are up to date, and there is no reason to mark them as dirty in the attribute cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	8619ddd07b	NFS: Don't force a revalidation of all attributes if change is missing Even if the change attribute is missing, it is still OK to mark the other attributes as being up to date. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	c01d36457d	NFSv4: Don't return the delegation when not needed by NFSv4.x (x>0) Starting with NFSv4.1, the server is able to deduce the client id from the SEQUENCE op which means it can always figure out whether or not the client is holding a delegation on a file that is being changed. For that reason, RFC5661 does not require a delegation to be unconditionally recalled on operations such as SETATTR, RENAME, or REMOVE. Note that for now, we continue to return READ delegations since that is still expected by the Linux knfsd server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	c135cb39a9	NFS: Remove the unused return_delegation() callback Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	199366f017	NFS: Move the delegation return down into _nfs4_do_setattr() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	977fcc2b0b	NFS: Add a delegation return into nfs4_proc_unlink_setup() Ensure that when we do finally delete the file, then we return the delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	f2c2c552f1	NFS: Move delegation recall into the NFSv4 callback for rename_setup() Move the delegation recall out of the generic code, and into the NFSv4 specific callback. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	912678dbc5	NFS: Move the delegation return down into nfs4_proc_remove() Move the delegation return out of generic code and down into the NFSv4 specific unlink code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	9f76827287	NFS: Move the delegation return down into nfs4_proc_link() Move the delegation return out of generic code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Trond Myklebust	f50862423f	NFSv4: Fix nfs4_return_incompatible_delegation The 'fmode' argument can take an FMODE_EXEC value, which we want to filter out before comparing to the delegation type. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Jeff Layton	571745935b	nfs4: wake any lock waiters on successful RECLAIM_COMPLETE If we have a RECLAIM_COMPLETE with a populated cl_lock_waitq, then that implies that a reconnect has occurred. Since we can't expect a CB_NOTIFY_LOCK callback at that point, just wake up the entire queue so that all the tasks can re-poll for their locks. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Jeff Layton	5656610325	nfs4: don't compare clientid in nfs4_wake_lock_waiter The task is expected to sleep for a while here, and it's possible that a new EXCHANGE_ID has occurred in the interim, and we were assigned a new clientid. Since this is a per-client list, there isn't a lot of value in vetting the clientid on the incoming request. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Jeff Layton	41a7462018	nfs4: always reset notified flag to false before repolling for lock We may get a notification and lose the race to another client. Ensure that we wait again for a notification in that case. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2018-04-10 16:06:22 -04:00
Linus Torvalds	62f8e6c5dc	fscache development -----BEGIN PGP SIGNATURE----- iQIVAwUAWsdxrvu3V2unywtrAQJVmQ/9Fv8d/Ecdwv5nxVBmN7uA8lOYcHEbZWmd FhFQE8qYLjKMo9Fy4tPkBbu1l6CVnetaTRE5qwixACJAftrdjABKJAazGR3Uxief 0jMSWScrV1XCeRErPcczHcx52Hefl8f1DQdA3zpoF0ewz7CjyxMxkl67bsYJbNKE T4ebCu5IJk+5PPwwMM3REKjQbunSXXnzgCLUI2cc0Yf76CTVpx6p+NpxV+2wq0p7 vym83F68qACAEzNH+oozN7IwqjkWyYOnTtCLiMsh4iq30jP6ohtLom6RcRp7QUxM Z9hxgG3NptypuVBO1jKxaQ6XZGgAasYmppOmJ/SoALv2PKsAbxi372lTR4ikceKq H4oNTbs5tVmyvu3qFwtLN+vX+GdfaoSUnUG8vTvnCB3tHHtYj7q5QeFE0HaX4QSq oLANkCOZU8TJsT30pxsCNYiqc5HK9kaLjUQId9K+xq7mM/IuhtNtBQ+ZpqAh5IxB 4bXKYLdeJ1myZrkYTa6gcTqeFax3djCBJ3UvjTnuqRZAaQg079WkG84Kdq1ZjDRp IQpKQnPX9JGhjW1zqLK1Ay8h+HFPgWR5BBVOaLwImr1mH+ccG0iNIeDjrOc8h6J5 e60XM/x2dIYxpXyFYAkldbAI24aRg1FNzfniG4rSAPecf3SwWrxg/qK7uujLbJHM fKNA80yifHo= =ukqs -----END PGP SIGNATURE----- Merge tag 'fscache-next-20180406' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache updates from David Howells: "Three patches that fix some of AFS's usage of fscache: (1) Need to invalidate the cache if a foreign data change is detected on the server. (2) Move the vnode ID uniquifier (equivalent to i_generation) from the auxiliary data to the index key to prevent a race between file delete and a subsequent file create seeing the same index key. (3) Need to retire cookies that correspond to files that we think got deleted on the server. Four patches to fix some things in fscache and cachefiles: (4) Fix a couple of checker warnings. (5) Correctly indicate to the end-of-operation callback whether an operation completed or was cancelled. (6) Add a check for multiple cookie relinquishment. (7) Fix a path through the asynchronous write that doesn't wake up a waiter for a page if the cache decides not to write that page, but discards it instead. A couple of patches to add tracepoints to fscache and cachefiles: (8) Add tracepoints for cookie operators, object state machine execution, cachefiles object management and cachefiles VFS operations. (9) Add tracepoints for fscache operation management and page wrangling. And then three development patches: (10) Attach the index key and auxiliary data to the cookie, pass this information through various fscache-netfs API functions and get rid of the callbacks to the netfs to get it. This means that the cache can get at this information, even if the netfs goes away. It also means that the cache can be lazy in updating the coherency data. (11) Pass the object data size through various fscache-netfs API rather than calling back to the netfs for it, and store the value in the object. This makes it easier to correctly resize the object, as the size is updated on writes to the cache, rather than calling back out to the netfs. (12) Maintain a catalogue of allocated cookies. This makes it possible to catch cookie collision up front rather than down in the bowels of the cache being run from a service thread from the object state machine. This will also make it possible in the future to reconnect to a cookie that's not gone dead yet because it's waiting for finalisation of the storage and also make it possible to bring cookies online if the cache is added after the cookie has been obtained" * tag 'fscache-next-20180406' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: Maintain a catalogue of allocated cookies fscache: Pass object size in rather than calling back for it fscache: Attach the index key and aux data to the cookie fscache: Add more tracepoints fscache: Add tracepoints fscache: Fix hanging wait on page discarded by writeback fscache: Detect multiple relinquishment of a cookie fscache: Pass the correct cancelled indications to fscache_op_complete() fscache, cachefiles: Fix checker warnings afs: Be more aggressive in retiring cached vnodes afs: Use the vnode ID uniquifier in the cache key not the aux data afs: Invalidate cache on server data change	2018-04-07 09:08:24 -07:00
Linus Torvalds	9022ca6b11	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff, including Christoph's I_DIRTY patches" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: move I_DIRTY_INODE to fs.h ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC \| I_DIRTY_DATASYNC) call ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC \| I_DIRTY_DATASYNC) call gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC \| I_DIRTY_DATASYNC) calls fs: fold open_check_o_direct into do_dentry_open vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents vfs: make sure struct filename->iname is word-aligned get rid of pointless includes of fs_struct.h [poll] annotate SAA6588_CMD_POLL users	2018-04-06 11:07:08 -07:00
David Howells	ee1235a9a0	fscache: Pass object size in rather than calling back for it Pass the object size in to fscache_acquire_cookie() and fscache_write_page() rather than the netfs providing a callback by which it can be received. This makes it easier to update the size of the object when a new page is written that extends the object. The current object size is also passed by fscache to the check_aux function, obviating the need to store it in the aux data. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anna Schumaker <anna.schumaker@netapp.com> Tested-by: Steve Dickson <steved@redhat.com>	2018-04-06 14:05:14 +01:00
David Howells	402cb8dda9	fscache: Attach the index key and aux data to the cookie Attach copies of the index key and auxiliary data to the fscache cookie so that: (1) The callbacks to the netfs for this stuff can be eliminated. This can simplify things in the cache as the information is still available, even after the cache has relinquished the cookie. (2) Simplifies the locking requirements of accessing the information as we don't have to worry about the netfs object going away on us. (3) The cache can do lazy updating of the coherency information on disk. As long as the cache is flushed before reboot/poweroff, there's no need to update the coherency info on disk every time it changes. (4) Cookies can be hashed or put in a tree as the index key is easily available. This allows: (a) Checks for duplicate cookies can be made at the top fscache layer rather than down in the bowels of the cache backend. (b) Caching can be added to a netfs object that has a cookie if the cache is brought online after the netfs object is allocated. A certain amount of space is made in the cookie for inline copies of the data, but if it won't fit there, extra memory will be allocated for it. The downside of this is that live cache operation requires more memory. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anna Schumaker <anna.schumaker@netapp.com> Tested-by: Steve Dickson <steved@redhat.com>	2018-04-04 13:41:28 +01:00
Linus Torvalds	ce6eba3dba	Merge branch 'sched-wait-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull wait_var_event updates from Ingo Molnar: "This introduces the new wait_var_event() API, which is a more flexible waiting primitive than wait_on_atomic_t(). All wait_on_atomic_t() users are migrated over to the new API and wait_on_atomic_t() is removed. The migration fixes one bug and should result in no functional changes for the other usecases" * 'sched-wait-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/wait: Improve __var_waitqueue() code generation sched/wait: Remove the wait_on_atomic_t() API sched/wait, arch/mips: Fix and convert wait_on_atomic_t() usage to the new wait_var_event() API sched/wait, fs/ocfs2: Convert wait_on_atomic_t() usage to the new wait_var_event() API sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API sched/wait, fs/fscache: Convert wait_on_atomic_t() usage to the new wait_var_event() API sched/wait, fs/btrfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API sched/wait, fs/afs: Convert wait_on_atomic_t() usage to the new wait_var_event() API sched/wait, drivers/media: Convert wait_on_atomic_t() usage to the new wait_var_event() API sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API sched/wait: Introduce wait_var_event()	2018-04-02 16:50:39 -07:00
Peter Zijlstra	723c921e7d	sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 08:23:21 +01:00
Linus Torvalds	df09348f78	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: - backport-friendly part of lock_parent() race fix - a fix for an assumption in the heurisic used by path_connected() that is not true on NFS - livelock fixes for d_alloc_parallel() * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Teach path_connected to handle nfs filesystems with multiple roots. fs: dcache: Use READ_ONCE when accessing i_dir_seq fs: dcache: Avoid livelock between d_alloc_parallel and __d_add lock_parent() needs to recheck if dentry got __dentry_kill'ed under it	2018-03-15 18:57:14 -07:00
Eric W. Biederman	95dd77580c	fs: Teach path_connected to handle nfs filesystems with multiple roots. On nfsv2 and nfsv3 the nfs server can export subsets of the same filesystem and report the same filesystem identifier, so that the nfs client can know they are the same filesystem. The subsets can be from disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no way to find the common root of all directory trees exported form the server with the same filesystem identifier. The practical result is that in struct super s_root for nfs s_root is not necessarily the root of the filesystem. The nfs mount code sets s_root to the root of the first subset of the nfs filesystem that the kernel mounts. This effects the dcache invalidation code in generic_shutdown_super currently called shrunk_dcache_for_umount and that code for years has gone through an additional list of dentries that might be dentry trees that need to be freed to accomodate nfs. When I wrote path_connected I did not realize nfs was so special, and it's hueristic for avoiding calling is_subdir can fail. The practical case where this fails is when there is a move of a directory from the subtree exposed by one nfs mount to the subtree exposed by another nfs mount. This move can happen either locally or remotely. With the remote case requiring that the move directory be cached before the move and that after the move someone walks the path to where the move directory now exists and in so doing causes the already cached directory to be moved in the dcache through the magic of d_splice_alias. If someone whose working directory is in the move directory or a subdirectory and now starts calling .. from the initial mount of nfs (where s_root == mnt_root), then path_connected as a heuristic will not bother with the is_subdir check. As s_root really is not the root of the nfs filesystem this heuristic is wrong, and the path may actually not be connected and path_connected can fail. The is_subdir function might be cheap enough that we can call it unconditionally. Verifying that will take some benchmarking and the result may not be the same on all kernels this fix needs to be backported to. So I am avoiding that for now. Filesystems with snapshots such as nilfs and btrfs do something similar. But as the directory tree of the snapshots are disjoint from one another and from the main directory tree rename won't move things between them and this problem will not occur. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: `397d425dc2` ("vfs: Test for and handle paths that are unreachable from their mnt_root") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-03-15 18:48:38 -04:00
Trond Myklebust	c4f24df942	NFS: Fix unstable write completion We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to ensure that all outstanding COMMIT requests to the inode in question are complete. Currently we may exit early from both nfs_commit_inode() and nfs_write_inode() even if there are COMMIT requests in flight, or unstable writes on the commit list. In order to get the right semantics w.r.t. sync_inode(), we don't need to have nfs_commit_inode() reset the inode dirty flags when called from nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that nfs_write_inode() leaves them in the right state if there are outstanding commits, or stable pages. Reported-by: Scott Mayhew <smayhew@redhat.com> Fixes: `dc4fd9ab01` ("nfs: don't wait on commit in nfs_commit_inode()...") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2018-03-08 12:56:32 -05:00

1 2 3 4 5 ...

5263 Commits