linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-23 09:09:36 +07:00

Author	SHA1	Message	Date
Jeff Layton	a1d617d8f1	nfs: allow blocking locks to be awoken by lock callbacks Add a waitqueue head to the client structure. Have clients set a wait on that queue prior to requesting a lock from the server. If the lock is blocked, then we can use that to wait for wakeups. Note that we do need to do this "manually" since we need to set the wait on the waitqueue prior to requesting the lock, but requesting a lock can involve activities that can block. However, only do that for NFSv4.1 locks, either by compiling out all of the waitqueue handling when CONFIG_NFS_V4_1 is disabled, or skipping all of it at runtime if we're dealing with v4.0, or v4.1 servers that don't send lock callbacks. Note too that even when we expect to get a lock callback, RFC5661 section 20.11.4 is pretty clear that we still need to poll for them, so we do still sleep on a timeout. We do however always poll at the longest interval in that case. Signed-off-by: Jeff Layton <jlayton@redhat.com> [Anna: nfs4_retry_setlk() "status" should default to -ERESTARTSYS] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-22 15:54:27 -04:00
Jeff Layton	d2f3a7f918	nfs: move nfs4 lock retry attempt loop to a separate function This also consolidates the waiting logic into a single function, instead of having it spread across two like it is now. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-22 13:56:04 -04:00
Jeff Layton	1ea67dbd98	nfs: move nfs4_set_lock_state call into caller We need to have this info set up before adding the waiter to the waitqueue, so move this out of the _nfs4_proc_setlk and into the caller. That's more efficient anyway since we don't need to do this more than once if we end up waiting on the lock. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-22 13:56:04 -04:00
Jeff Layton	db783688d4	nfs: add handling for CB_NOTIFY_LOCK in client For now, the callback doesn't do anything. Support for that will be added in later patches. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-22 13:56:04 -04:00
Jeff Layton	a8ce377a5d	nfs: track whether server sets MAY_NOTIFY_LOCK flag We want to handle the two cases differently, such that we poll more aggressively when we don't expect a callback. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-22 13:56:04 -04:00
Jeff Layton	66f570ab73	nfs: use safe, interruptible sleeps when waiting to retry LOCK We actually want to use TASK_INTERRUPTIBLE sleeps when we're in the process of polling for a NFSv4 lock. If there is a signal pending when the task wakes up, then we'll be returning an error anyway. So, we might as well wake up immediately for non-fatal signals as well. That allows us to return to userland more quickly in that case, but won't change the error that userland sees. Also, there is no need to use the *_unsafe sleep variants here, as no vfs-layer locks should be held at this point. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-22 13:56:04 -04:00
Jeff Layton	75575ddf29	nfs: eliminate pointless and confusing do_vfs_lock wrappers Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-22 13:56:04 -04:00
Jeff Layton	b60475c940	nfs: the length argument to read_buf should be unsigned Since it gets passed through to xdr_inline_decode, we might as well have read_buf expect what it expects -- a size_t. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-22 13:56:04 -04:00
Chao Yu	f844cd0d76	nfs: cover ->migratepage with CONFIG_MIGRATION It will be more clean to use CONFIG_MIGRATION to cover nfs' private .migratepage in nfs_file_aops like we do in other part of nfs operations. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-20 09:29:39 -04:00
Jeff Layton	ca440c383a	pnfs: add a new mechanism to select a layout driver according to an ordered list Currently, the layout driver selection code always chooses the first one from the list. That's not really ideal however, as the server can send the list of layout types in any order that it likes. It's up to the client to select the best one for its needs. This patch adds an ordered list of preferred driver types and has the selection code sort the list of available layout drivers according to it. Any unrecognized layout type is sorted to the end of the list. For now, the order of preference is hardcoded, but it should be possible to make this configurable in the future. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:11:13 -04:00
Andy Adamson	04fa2c6bb5	NFS pnfs data server multipath session trunking Try all multipath addresses for a data server. The first address that successfully connects and creates a session is the DS mount address. All subsequent addresses are tested for session trunking and added as aliases. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:37 -04:00
Andy Adamson	ad0849a7ef	NFS test session trunking with exchange id Use an async exchange id call to test for session trunking To conform with RFC 5661 section 18.35.4, the Non-Update on Existing Clientid case, save the exchange id verifier in cl_confirm and use it for the session trunking exhange id test. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Andy Adamson	04ea1b3e6d	NFS add xprt switch addrs test to match client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Andy Adamson	ba84db96aa	NFS detect session trunking Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Andy Adamson	e7b7cbf662	NFS refactor nfs4_check_serverowner_major_id For session trunking, to compare nfs41_exchange_id_res with existing nfs_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Andy Adamson	8e548edb40	NFS refactor nfs4_match_clientids For session trunking, to compare nfs41_exchange_id_res with exiting nfs_client. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Andy Adamson	8d89bd70bc	NFS setup async exchange_id Testing an rpc_xprt for session trunking should not delay application progress over already established transports. Setup exchange_id to be able to be an async call to test an rpc_xprt for session trunking use. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Trond Myklebust	5405fc44c3	NFSv4.x: Add kernel parameter to control the callback server Add support for the kernel parameter nfs.callback_nr_threads to set the number of threads that will be assigned to the callback channel. Add support for the kernel parameter nfs.nfs.max_session_cb_slots to set the maximum size of the callback channel slot table. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Trond Myklebust	bb6aeba736	NFSv4.x: Switch to using svc_set_num_threads() to manage the callback threads This will allow us to bump the number of callback threads at will. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Trond Myklebust	3b01c11ee8	NFSv4.x: Fix up the global tracking of the callback server Ensure that the nfs_callback_info[] array correctly tracks the struct svc_serv. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Trond Myklebust	d002526886	SUNRPC: Initialise struct svc_serv backchannel fields during __svc_create() Clean up. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Trond Myklebust	f4b52bb084	NFSv4.x: Set up struct svc_serv_ops for the callback channel In order to manage the threads using svc_set_num_threads, we need to fill in a few extra fields. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:36 -04:00
Jeff Layton	3132e49ece	pnfs: track multiple layout types in fsinfo structure Current NFSv4.1/pNFS client assumes that MDS supports only one layout type. While it's true for most existing servers, nevertheless, this can be change in the near future. For now, this patch just plumbs in the ability to track a list of layouts in the fsinfo structure. The existing behavior of the client is preserved, by having it just select the first entry in the list. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-09-19 13:08:35 -04:00
Trond Myklebust	b519d408ea	NFSv4.1: Fix the CREATE_SESSION slot number accounting Ensure that we conform to the algorithm described in RFC5661, section 18.36.4 for when to bump the sequence id. In essence we do it for all cases except when the RPC call timed out, or in case of the server returning NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org	2016-09-11 14:56:44 -04:00
Trond Myklebust	334a8f3711	pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETs If there are outstanding LAYOUTGET rpc calls, then we want to ensure that we keep the layout stateid around so we that don't inadvertently pick up an old/misordered sequence id. The race is as follows: Client Server ====== ====== LAYOUTGET(seqid) LAYOUTGET(seqid) return LAYOUTGET(seqid+1) return LAYOUTGET(seqid+2) process LAYOUTGET(seqid+2) forget layout process LAYOUTGET(seqid+1) If it forgets the layout stateid before processing seqid+1, then the client will not check the layout->plh_barrier, and so will set the stateid with seqid+1. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-04 12:59:00 -04:00
Trond Myklebust	52ec7be2e2	pNFS: Clear out all layout segments if the server unsets lrp->res.lrs_present If the server fails to set lrp->res.lrs_present in the LAYOUTRETURN reply, then that means it believes the client holds no more layout state for that file, and that the layout stateid is now invalid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-03 12:10:38 -04:00
Trond Myklebust	2a59a04116	pNFS: Fix pnfs_set_layout_stateid() to clear NFS_LAYOUT_INVALID_STID If the layout was marked as invalid, we want to ensure to initialise the layout header fields correctly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-03 12:10:37 -04:00
Trond Myklebust	bf0291dd22	pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised According to RFC5661, the client is responsible for serialising LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case where we send both in parallel. Client Server ====== ====== LAYOUTGET(seqid=X) LAYOUTRETURN(seqid=X) LAYOUTGET return seqid=X+1 LAYOUTRETURN return seqid=X+2 Process LAYOUTRETURN Forget layout stateid Process LAYOUTGET Set seqid=X+1 The client processes the layoutget/layoutreturn in the wrong order, and since the result of the layoutreturn was to clear the only existing layout segment, the client forgets the layout stateid. When the LAYOUTGET comes in, it is treated as having a completely new stateid, and so the client sets the wrong sequence id... Fix is to check if there are outstanding LAYOUTGET requests before we send the LAYOUTRETURN (note that LAYOUGET will already wait if it sees an outstanding LAYOUTRETURN). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-03 12:10:37 -04:00
Trond Myklebust	c49edecd51	NFS: Fix error reporting in nfs_file_write() When doing O_DSYNC writes, the actual write errors are reported through generic_write_sync(), so we must test the result. Reported-by: J. R. Okajima <hooanon05g@gmail.com> Fixes: `18290650b1` ("NFS: Move buffered I/O locking into nfs_file_write()") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-09-03 12:10:36 -04:00
Trond Myklebust	98b0f80c23	NFSv4.x: Fix a refcount leak in nfs_callback_up_net On error, the callers expect us to return without bumping nn->cb_users[]. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v3.7+	2016-08-30 09:26:57 -04:00
Benjamin Coddington	52442f9b11	NFS4: Avoid migration loops If a server returns itself as a location while migrating, the client may end up getting stuck attempting to migrate twice to the same server. Catch this by checking if the nfs_client found is the same as the existing client. For the other two callers to nfs4_set_client, the nfs_client will always be ERR_PTR(-EINVAL). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-30 09:26:32 -04:00
Trond Myklebust	3dc147359e	pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails If the attempt to connect to a DS fails inside ff_layout_pg_init_read or ff_layout_pg_init_write, then we currently end up clearing the layout segment carried by the struct nfs_pageio_descriptor, causing an Oops when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist. The fix is to ensure we return the layout and then retry. Fixes: `446ca21953` ("pNFS/flexfiles: When initing reads or writes, we...") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-29 15:21:16 -04:00
Trond Myklebust	d138027a82	NFSv4.1: Remove obsolete and incorrrect assignment in nfs4_callback_sequence Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-28 14:23:27 -04:00
Trond Myklebust	2e80dbe7ac	NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN Defer freeing the slot until after we have processed the results from OPEN and LAYOUTGET. This means that the server can rely on the mechanism in RFC5661 Section 2.10.6.3 to ensure that replies to an OPEN or LAYOUTGET/RETURN RPC call don't race with the callbacks that apply to them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-28 14:23:27 -04:00
Trond Myklebust	07e8dcbda7	NFSv4.1: Defer bumping the slot sequence number until we free the slot For operations like OPEN or LAYOUTGET, which return recallable state (i.e. delegations and layouts) we want to enable the mechanism for resolving recall races in RFC5661 Section 2.10.6.3. To do so, we will want to defer bumping the slot's sequence number until we have finished processing the RPC results. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-28 14:23:26 -04:00
Trond Myklebust	045d2a6d07	NFSv4.1: Delay callback processing when there are referring triples If CB_SEQUENCE tells us that the processing of this request depends on the completion of one or more referring triples (see RFC 5661 Section 2.10.6.3), delay the callback processing until after the RPC requests being referred to have completed. If we end up delaying for more than 1/2 second, then fall back to returning NFS4ERR_DELAY in reply to the callback. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-28 14:23:26 -04:00
Trond Myklebust	e09c978aae	NFSv4.1: Fix Oopsable condition in server callback races The slot table hasn't been an array since v3.7. Ensure that we use nfs4_lookup_slot() to access the slot correctly. Fixes: `87dda67e73` ("NFSv4.1: Allow SEQUENCE to resize the slot table...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v3.8+	2016-08-28 14:23:22 -04:00
Benjamin Coddington	41963c10c4	pnfs/blocklayout: update last_write_offset atomically with extents Block/SCSI layout write completion may add committable extents to the extent tree before updating the layout's last-written byte under the inode lock. If a sync happens before this value is updated, then prepare_layoutcommit may find and encode these extents which would produce a LAYOUTCOMMIT request whose encoded extents are larger than the request's loca_length. Fix this by using a last-written byte value that is updated atomically with the extent tree so that commitable extents always match. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-23 11:41:38 -04:00
Trond Myklebust	b88fa69eaa	pNFS: The client must not do I/O to the DS if it's lease has expired Ensure that the client conforms to the normative behaviour described in RFC5661 Section 12.7.2: "If a client believes its lease has expired, it MUST NOT send I/O to the storage device until it has validated its lease." So ensure that we wait for the lease to be validated before using the layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v3.20+	2016-08-23 11:27:01 -04:00
Trond Myklebust	9a0fe86745	pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls We normally want to update the stateid and then retry, Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-19 16:27:31 -04:00
Trond Myklebust	15d03055cf	pNFS/flexfiles: Set reasonable default retrans values for the data channel Prior to this patch, the retrans value was set at 5, meaning that we could see a maximum retransmission timeout value of more than 6 minutes. That's a tad high for NFSv3 where the protocol does allow the server to drop requests at any time. Since this is a data channel, let's just set retrans to 0, and the default timeout to 60s. The user can continue to adjust these defaults using the dataserver_retrans and dataserver_timeo module parameters. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-16 11:16:19 -04:00
Trond Myklebust	a956beda19	NFS: Allow the mount option retrans=0 We should allow retrans=0 as just meaning that every timeout is a major timeout, and that there is no increment in the timeout value. For instance, this means that we would allow TCP users to specify a flat timeout value of 60s, by specifying "timeo=600,retrans=0" in their mount option string. Siged-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-16 11:00:06 -04:00
Trond Myklebust	1c8d477a77	pNFS/flexfiles: Fix layoutstat periodic reporting Putting the periodicity timer in the mirror instances is causing non-scalable reporting behaviour and missed reporting intervals. When you recall layouts and/or implement client side mirroring, it leads to consecutive reports with only a few ms between RPC calls. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: `d0379a5d06` ("pNFS/flexfiles: Support server-supplied...")	2016-08-14 23:01:10 -04:00
Linus Torvalds	9909170065	NFS client bugfixes for Linux 4.8 Highlights include: - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user needs multiple different security services (e.g. krb5i and krb5p). - Stable patch to fix a regression introduced by the use of SO_REUSEPORT, and that prevented the use of multiple different NFS versions to the same server. - TCP socket reconnection timer fixes. - Patch from Neil to disable the use of IPv6 temporary addresses. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXrh03AAoJEGcL54qWCgDyp4EQALwZpmYCxWJE5xSHW95Fs124 HYM8g4LznOfs3/ohInb1ja2FaQqUy0XEk3pSjNKfyYgjuwB4qJSOpnAqoIKxJFGB h4582leYZOZYMMCGslS2I4zcElBYO1WjnKNyb7MpZjCHmN0AdFfIcOXd2K7eL9hM /poImcs5KfMGIEJqmKqMUxmJ3RjxpK3LySQAes/Y5odOiHC4SGJdGUmSeuPGTbQd YjFWVHRFU6kVAzPd2Jl46Sgy6SpDaVz82HodXCSY+8lklmIkbIsVqJs0VWo3WkfL r5WLQ3PzZvloQ7o/E9tZGiB/LEi7roa51hYsG4sleN6Kap5vwyWg0QIKjqyJdFxB JmFanlCMfae3zNz4cusvgu1okvMnNqO4uRXJIAKfk64k775N9ebY7TXAZUK4/UbY 4nxCHcxygamP/k/8HYFpc4964tMaimIs9JUdojad5a3dzffwXcgEC/0HPUih9R+i DO/cbVtWeDkmQPLrUqFfOAbmQdyAjELrv48d5BVIst49uuCULU2LlDlVLiAvaZvq s2YNmr7lkHowvgaH4ShL89wuyyD14Xu5/f49oFBFNKEQay9YthQ8s3XmdZBG7Zl0 oyA1XJjWEq3p8nvPGIqFD26w75ppUbAWLTHsyoU0YfEYrZJrF9jPxowI7WlHgfVo Io79x1sbgTrckjG+osAf =UHph -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user needs multiple different security services (e.g. krb5i and krb5p). - Stable patch to fix a regression introduced by the use of SO_REUSEPORT, and that prevented the use of multiple different NFS versions to the same server. - TCP socket reconnection timer fixes. - Patch from Neil to disable the use of IPv6 temporary addresses" * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Cap the transport reconnection timer at 1/2 lease period NFSv4: Cleanup the setting of the nfs4 lease period SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout SUNRPC: Fix reconnection timeouts NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED SUNRPC: disable the use of IPv6 temporary addresses. SUNRPC: allow for upcalls for same uid but different gss service SUNRPC: Fix up socket autodisconnect SUNRPC: Handle EADDRNOTAVAIL on connection failures	2016-08-12 12:32:24 -07:00
Linus Torvalds	835c92d43b	Merge branch 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull qstr constification updates from Al Viro: "Fairly self-contained bunch - surprising lot of places passes struct qstr * as an argument when const struct qstr * would suffice; it complicates analysis for no good reason. I'd prefer to feed that separately from the assorted fixes (those are in #for-linus and with somewhat trickier topology)" * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: qstr: constify instances in adfs qstr: constify instances in lustre qstr: constify instances in f2fs qstr: constify instances in ext2 qstr: constify instances in vfat qstr: constify instances in procfs qstr: constify instances in fuse qstr constify instances in fs/dcache.c qstr: constify instances in nfs qstr: constify instances in ocfs2 qstr: constify instances in autofs4 qstr: constify instances in hfs qstr: constify instances in hfsplus qstr: constify instances in logfs qstr: constify dentry_init_security	2016-08-06 09:49:02 -04:00
Trond Myklebust	8d480326c3	NFSv4: Cap the transport reconnection timer at 1/2 lease period We don't want to miss a lease period renewal due to the TCP connection failing to reconnect in a timely fashion. To ensure this doesn't happen, cap the reconnection timer so that we retry the connection attempt at least every 1/2 lease period. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-05 19:22:22 -04:00
Trond Myklebust	fb10fb67ad	NFSv4: Cleanup the setting of the nfs4 lease period Make a helper function nfs4_set_lease_period() and have nfs41_setup_state_renewal() and nfs4_do_fsinfo() use it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-05 19:13:08 -04:00
Trond Myklebust	206b3bb574	NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED We should handle those errors in the same way we handle the other stateid errors: by invalidating the faulty layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-08-05 12:18:10 -04:00
Linus Torvalds	7f155c7026	NFS client updates for Linux 4.8 Highlights include: Stable bugfixes: - nfs: don't create zero-length requests - Several LAYOUTGET bugfixes Features: - Several performance related features - More aggressive caching when we can rely on close-to-open cache consistency - Remove serialisation of O_DIRECT reads and writes - Optimise several code paths to not flush to disk unnecessarily. However allow for the idiosyncracies of pNFS for those layout types that need to issue a LAYOUTCOMMIT before the metadata can be updated on the server. - SUNRPC updates to the client data receive path - pNFS/SCSI support RH/Fedora dm-mpath device nodes - pNFS files/flexfiles can now use unprivileged ports when the generic NFS mount options allow it. Bugfixes: - Don't use RDMA direct data placement together with data integrity or privacy security flavours - Remove the RDMA ALLPHYSICAL memory registration mode as it has potential security holes. - Several layout recall fixes to improve NFSv4.1 protocol compliance. - Fix an Oops in the pNFS files and flexfiles connection setup to the DS - Allow retry of operations that used a returned delegation stateid - Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding - Fix writeback races in nfs4_copy_range() and nfs42_proc_deallocate() -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXnSq8AAoJEGcL54qWCgDyn8cP/RCHLekUCq7Klh+NAnEsvuBi C7w9YpVHaC83/8Q0tR6LyFShSBJBWi/clWwO0IEomkNK/MuO77v4iyPujtEyqowK 0+eWFh/e8CsTf7mNGoi0avrHAZDB3deSuOQeYbwnNWHmd7qKVkB6tHus8LQjk852 eqwYmZ4kVr+eaCO6MttCCxJHf6datPnsbe0stiC9MpxmCzsdpZmFptfauidsFX+p 0U1IHi/ABN6zIFoc4R0iXXbaDb8ErxGw32SWIb8cnnWwdlSD8I0+Jqxs4opp23LY lAm9E0vtDJ49bJBllYl0dUmizdhJ3+NefK4aqPh5H5h3Csub+MLIsuQv/+r2AOhH qLBi5kThpspPhGHZ40VDmfV825+csUPTc8WkDaNLvb4f4UGIPakK/KBrBtxiqn+P 0etvYiWBuoBaqRVQpstawnyDdnBK0IMF/3LAULo+ozo7iTkpaZmOALYgPcBUYw2f d6pxZGeNN0GwWfjDmoUDGC07OpO/CSN5WouArgKsp5+VhjzPxjyaZLCnUhzHzXiM RV1oBytEs/iw2BLXX809noM9mqHYkdgSVmrZ9OvvDMslcLHaslpq6eaJKZSWqV2J fAws6rbcZdTFSnbAWr0OSxct6w6BijEjc3/uk+wWRtw9nkOhFqtlxI3y7k4odpW9 wVcEmRNkxfA0LlYNXWuL =WNyE -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - nfs: don't create zero-length requests - several LAYOUTGET bugfixes Features: - several performance related features - more aggressive caching when we can rely on close-to-open cache consistency - remove serialisation of O_DIRECT reads and writes - optimise several code paths to not flush to disk unnecessarily. However allow for the idiosyncracies of pNFS for those layout types that need to issue a LAYOUTCOMMIT before the metadata can be updated on the server. - SUNRPC updates to the client data receive path - pNFS/SCSI support RH/Fedora dm-mpath device nodes - pNFS files/flexfiles can now use unprivileged ports when the generic NFS mount options allow it. Bugfixes: - Don't use RDMA direct data placement together with data integrity or privacy security flavours - Remove the RDMA ALLPHYSICAL memory registration mode as it has potential security holes. - Several layout recall fixes to improve NFSv4.1 protocol compliance. - Fix an Oops in the pNFS files and flexfiles connection setup to the DS - Allow retry of operations that used a returned delegation stateid - Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding - Fix writeback races in nfs4_copy_range() and nfs42_proc_deallocate()" * tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (104 commits) pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding NFSv4: Clean up lookup of SECINFO_NO_NAME NFSv4.2: Fix warning "variable ‘stateids’ set but not used" NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’" SUNRPC: Fix a compiler warning in fs/nfs/clnt.c pNFS: Remove redundant smp_mb() from pnfs_init_lseg() pNFS: Cleanup - do layout segment initialisation in one place pNFS: Remove redundant stateid invalidation pNFS: Remove redundant pnfs_mark_layout_returned_if_empty() pNFS: Clear the layout metadata if the server changed the layout stateid pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid() NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id pNFS: Do not set plh_return_seq for non-callback related layoutreturns pNFS: Ensure layoutreturn acts as a completion for layout callbacks pNFS: Fix CB_LAYOUTRECALL stateid verification pNFS: Always update the layout barrier seqid on LAYOUTGET pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set pNFS: Clear the layout return tracking on layout reinitialisation pNFS: LAYOUTRETURN should only update the stateid if the layout is valid nfs: don't create zero-length requests ...	2016-07-30 16:33:25 -07:00
Linus Torvalds	c624c86615	This is mostly clean ups and small fixes. Some of the more visible changes are: . The function pid code uses the event pid filtering logic . [ku]probe events have access to current->comm . trace_printk now has sample code . PCI devices now trace physical addresses . stack tracing has less unnessary functions traced -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXl+d2AAoJEKKk/i67LK/83QEH/RDJ0mcfFVsuEeOnZZrZXABm 4Rxk4FE5UAD+TSrVycwwzcbQab1iPK63mMdYvIBvaOiIC6/OJaEVM7jzZxnNGqmr pj0H8bxwOr58pe5pfnP92ow5qTLLzsXraWNl5sRXhSSHON7CXpGVzkErB58GmMYd 8p6d9ziifQjo8X2O6XC9rGAvYLY5kEkVvyfuE1hI7muNTeOjyOT4EqpkNzxdBk+I QkGZGsk3Xhc8II9nu8FPWkaD26TatGJoZtZmVWHOzfsb3HNzG4RXla+WVOQ5u1HV noVyB1CJHhkO5CEBPdYIqwBWPQU4B9HfG4gVcUpDDVRxfzMpnEcKi1uwe+uDjfs= =XFcv -----END PGP SIGNATURE----- Merge tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This is mostly clean ups and small fixes. Some of the more visible changes are: - The function pid code uses the event pid filtering logic - [ku]probe events have access to current->comm - trace_printk now has sample code - PCI devices now trace physical addresses - stack tracing has less unnessary functions traced" * tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: printk, tracing: Avoiding unneeded blank lines tracing: Use __get_str() when manipulating strings tracing, RAS: Cleanup on __get_str() usage tracing: Use outer () on __get_str() definition ftrace: Reduce size of function graph entries tracing: Have HIST_TRIGGERS select TRACING tracing: Using for_each_set_bit() to simplify trace_pid_write() ftrace: Move toplevel init out of ftrace_init_tracefs() tracing/function_graph: Fix filters for function_graph threshold tracing: Skip more functions when doing stack tracing of events tracing: Expose CPU physical addresses (resource values) for PCI devices tracing: Show the preempt count of when the event was called tracing: Add trace_printk sample code tracing: Choose static tp_printk buffer by explicit nesting count tracing: expose current->comm to [ku]probe events ftrace: Have set_ftrace_pid use the bitmap like events do tracing: Move pid_list write processing into its own function tracing: Move the pid_list seq_file functions to be global tracing: Move filtered_pid helper functions into trace.c tracing: Make the pid filtering helper functions global	2016-07-28 18:20:09 -07:00
Linus Torvalds	1c88e19b0f	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "The rest of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits) mm, compaction: simplify contended compaction handling mm, compaction: introduce direct compaction priority mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations mm, page_alloc: make THP-specific decisions more generic mm, page_alloc: restructure direct compaction handling in slowpath mm, page_alloc: don't retry initial attempt in slowpath mm, page_alloc: set alloc_flags only once in slowpath lib/stackdepot.c: use __GFP_NOWARN for stack allocations mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB mm, kasan: account for object redzone in SLUB's nearest_obj() mm: fix use-after-free if memory allocation failed in vma_adjust() zsmalloc: Delete an unnecessary check before the function call "iput" mm/memblock.c: fix index adjustment error in __next_mem_range_rev() mem-hotplug: alloc new page from a nearest neighbor node when mem-offline mm: optimize copy_page_to/from_iter_iovec mm: add cond_resched() to generic_swapfile_activate() Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode mm: hwpoison: remove incorrect comments make __section_nr() more efficient ...	2016-07-28 16:36:48 -07:00
Mel Gorman	11fb998986	mm: move most file-based accounting to the node There are now a number of accounting oddities such as mapped file pages being accounted for on the node while the total number of file pages are accounted on the zone. This can be coped with to some extent but it's confusing so this patch moves the relevant file-based accounted. Due to throttling logic in the page allocator for reliable OOM detection, it is still necessary to track dirty and writeback pages on a per-zone basis. [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting] Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Linus Torvalds	6784725ab0	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted cleanups and fixes. Probably the most interesting part long-term is ->d_init() - that will have a bunch of followups in (at least) ceph and lustre, but we'll need to sort the barrier-related rules before it can get used for really non-trivial stuff. Another fun thing is the merge of ->d_iput() callers (dentry_iput() and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all except the one in __d_lookup_lru())" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) fs/dcache.c: avoid soft-lockup in dput() vfs: new d_init method vfs: Update lookup_dcache() comment bdev: get rid of ->bd_inodes Remove last traces of ->sync_page new helper: d_same_name() dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends() vfs: clean up documentation vfs: document ->d_real() vfs: merge .d_select_inode() into .d_real() unify dentry_iput() and dentry_unlink_inode() binfmt_misc: ->s_root is not going anywhere drop redundant ->owner initializations ufs: get rid of redundant checks orangefs: constify inode_operations missed comment updates from ->direct_IO() prototype change file_inode(f)->i_mapping is f->f_mapping trim fsnotify hooks a bit 9p: new helper - v9fs_parent_fid() debugfs: ->d_parent is never NULL or negative ...	2016-07-28 12:59:05 -07:00
Linus Torvalds	554828ee0d	Merge branch 'salted-string-hash' This changes the vfs dentry hashing to mix in the parent pointer at the _beginning_ of the hash, rather than at the end. That actually improves both the hash and the code generation, because we can move more of the computation to the "static" part of the dcache setup, and do less at lookup runtime. It turns out that a lot of other hash users also really wanted to mix in a base pointer as a 'salt' for the hash, and so the slightly extended interface ends up working well for other cases too. Users that want a string hash that is purely about the string pass in a 'salt' pointer of NULL. * merge branch 'salted-string-hash': fs/dcache.c: Save one 32-bit multiply in dcache lookup vfs: make the string hashes salt the hash	2016-07-28 12:26:31 -07:00
Benjamin Coddington	944171cbf4	pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes, and in that case NFS_INO_INVALID_ATTR is never set on the second pass through nfs_update_inode(). The existing check to skip the clearing of NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this case (see commit `10b7e9ad44`: "pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding"). We know that if a LAYOUTCOMMIT is outstanding then attributes will need upating, so always set NFS_INO_INVALID_ATTR. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-28 14:49:08 -04:00
Linus Torvalds	d05d7f4079	Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...	2016-07-26 15:03:07 -07:00
Trond Myklebust	698c937b0d	NFSv4: Clean up lookup of SECINFO_NO_NAME Use the minor version ops cached in struct nfs_client instead of looking them up again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-26 10:59:23 -04:00
Trond Myklebust	6fdf339b0c	NFSv4.2: Fix warning "variable ‘stateids’ set but not used" Replace it with a test for whether or not the sent a stateid in violation of what we asked for. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 17:36:06 -04:00
Trond Myklebust	139978239b	NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’" Make it static Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 17:35:56 -04:00
Trond Myklebust	1592c4d62a	Merge branch 'nfs-rdma'	2016-07-24 17:09:02 -04:00
Trond Myklebust	668f455dac	Merge branch 'pnfs'	2016-07-24 17:08:59 -04:00
Trond Myklebust	362745268c	Merge branch 'writeback'	2016-07-24 17:08:31 -04:00
Trond Myklebust	01d7b29f0e	pNFS: Remove redundant smp_mb() from pnfs_init_lseg() It's not visible yet, and won't be until after we grab the inode->i_lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:43 -04:00
Trond Myklebust	119cef97a4	pNFS: Cleanup - do layout segment initialisation in one place ...instead of splitting the initialisation over init_lseg() and pnfs_layout_process(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:42 -04:00
Trond Myklebust	28c1acffea	pNFS: Remove redundant stateid invalidation The layout stateid will be invalidated once it holds no more layout segments anyway. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:42 -04:00
Trond Myklebust	f71dfe8fc9	pNFS: Remove redundant pnfs_mark_layout_returned_if_empty() That's already being taken care of in pnfs_layout_remove_lseg(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:41 -04:00
Trond Myklebust	d9b61708fe	pNFS: Clear the layout metadata if the server changed the layout stateid If the server changed the layout stateid's "other" field, then we should treat the old layout as being completely gone. In that case, we want to clear the metadata such as scheduled layoutreturns. Do this by calling pnfs_mark_layout_stateid_invalid(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:41 -04:00
Trond Myklebust	5f46be049b	pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid() Ensure nfs42_layoutstat_done() layoutget don't open code layout stateid invalidation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:40 -04:00
Trond Myklebust	e036f46453	NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id When determining which layout segments to return, we do want pnfs_mark_matching_lsegs_return to check that they match the layout sequence id. This ensures that we don't waste time if the server is replaying a layout recall that has already been satisfied. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:40 -04:00
Trond Myklebust	2d6cf5ab0b	pNFS: Do not set plh_return_seq for non-callback related layoutreturns In cases where we need to send a layoutreturn in order to propagate an error, we should not tie that to a specific layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:39 -04:00
Trond Myklebust	e5fd1904b8	pNFS: Ensure layoutreturn acts as a completion for layout callbacks When we return NFS_OK to the CB_LAYOUTRECALL, we are required to send a layoutreturn that "completes" that layout recall request, using the correct stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:39 -04:00
Trond Myklebust	793b7fe558	pNFS: Fix CB_LAYOUTRECALL stateid verification We want to evaluate in this order: If the client holds no layout for this inode, then return NFS4ERR_NOMATCHING_LAYOUT; it probably forgot the layout. If the client finds the inode among the list of layouts, but the corresponding stateid has not yet been initialised, then return NFS4ERR_DELAY to ask the server to retry once the outstanding LAYOUTGET is complete. If the current layout stateid's "other" field does not match the recalled stateid, return NFS4ERR_BAD_STATEID. If already processing a layout recall with a newer stateid, return NFS4ERR_OLD_STATEID. This can only happens for servers that are non-compliant with the NFSv4.1 protocol. If already processing a layout recall with an older stateid, return NFS4ERR_DELAY to ask the server to retry once the outstanding LAYOUTRETURN is complete. Again, this is technically incompliant with the NFSv4.1 protocol. If the current layout sequence id is newer than the recalled stateid's sequence id, return NFS4ERR_OLD_STATEID. This too implies protocol non-compliance. If the current layout sequence id is older than the recalled stateid's sequence id+1, return NFS4ERR_DELAY. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:38 -04:00
Trond Myklebust	ecebb80bf3	pNFS: Always update the layout barrier seqid on LAYOUTGET Currently, pnfs_set_layout_stateid() will update the layout sequence id barrier only if the stateid itself is newer than the current layout stateid. However in a situation where multiple LAYOUTGET calls and a LAYOUTRETURN raced, it is entirely possible for one of the LAYOUTGET to set the current stateid to something newer than the LAYOUTRETURN that needs to set the barrier. The fix is to allow the "update_barrier" flag to force a check as to whether or not the barrier needs to be updated. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:38 -04:00
Trond Myklebust	13bede18de	pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set If the layout stateid is invalid, then pnfs_set_layout_stateid() must always initialise it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 16:16:25 -04:00
Trond Myklebust	8e0acf9046	pNFS: Clear the layout return tracking on layout reinitialisation Ensure that we don't carry over layoutreturn info from a previous incarnation of this layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 12:51:49 -04:00
Trond Myklebust	45fcc7bca7	pNFS: LAYOUTRETURN should only update the stateid if the layout is valid If the layout was completely returned, then ignore the returned layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-24 12:51:49 -04:00
Trond Myklebust	dc05973b28	Merge commit 'e7bdea7750eb' Needed in order to work on top of pNFS changes in Linus' upstream kernel.	2016-07-24 12:51:10 -04:00
Benjamin Coddington	149a4fddd0	nfs: don't create zero-length requests NFS doesn't expect requests with wb_bytes set to zero and may make unexpected decisions about how to handle that request at the page IO layer. Skip request creation if we won't have any wb_bytes in the request. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Weston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-22 15:15:16 -04:00
Artem Savkov	297fae4d0b	Fix NULL pointer dereference in bl_free_device(). When bl_parse_deviceid() fails in bl_alloc_deviceid_node() on blkdev_get_by_() step we get an pnfs_block_dev struct that is uninitialized except for bdev field which is set to whatever error blkdev_get_by_() returns. bl_free_device() then tries to call blkdev_put() if bdev is not 0 resulting in a wrong pointer dereference. Fixing this by setting bdev in struct pnfs_block_dev only if we didn't get an error from blkdev_get_by_*(). Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-22 15:14:21 -04:00
Trond Myklebust	e033fb51eb	pNFS/files: filelayout_write_done_cb must call nfs_writeback_update_inode() All write callbacks are required to call nfs_writeback_update_inode() upon success to ensure that file size changes are recorded, and the attribute cache is invalidated. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-21 09:46:42 -04:00
Al Viro	beffb8feb6	qstr: constify instances in nfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-07-20 23:30:06 -04:00
Tigran Mkrtchyan	b224f7cb63	nfs4: flexfiles: respect noresvport when establishing connections to DSes Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-19 16:23:25 -04:00
Tigran Mkrtchyan	3fc75f1208	nfs4: clnt: respect noresvport when establishing connections to DSes result: $ mount -o vers=4.1 dcache-lab007:/ /pnfs $ cp /etc/profile /pnfs tcp 0 0 131.169.185.68:1005 131.169.191.141:32049 ESTABLISHED tcp 0 0 131.169.185.68:751 131.169.191.144:2049 ESTABLISHED $ $ mount -o vers=4.1,noresvport dcache-lab007:/ /pnfs $ cp /etc/profile /pnfs tcp 0 0 131.169.185.68:34894 131.169.191.141:32049 ESTABLISHED tcp 0 0 131.169.185.68:35722 131.169.191.144:2049 ESTABLISHED $ Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-19 16:23:25 -04:00
Benjamin Coddington	d9c0ce0e45	pnfs/blocklayout: put deviceid node after releasing bl_ext_lock The last put of deviceid nodes for SCSI layouts may sleep, so we shouldn't hold any spinlocks. Make sure we put them outside the bl_ext_lock. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-19 16:23:24 -04:00
Scott Mayhew	ce52914eb7	sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags A generic_cred can be used to look up a unx_cred or a gss_cred, so it's not really safe to use the the generic_cred->acred->ac_flags to store the NO_CRKEY_TIMEOUT flag. A lookup for a unx_cred triggered while the KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated with the auth_cred to be in a state where they're perpetually doing 4K NFS_FILE_SYNC writes. This can be reproduced as follows: 1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys. They do not need to be the same export, nor do they even need to be from the same NFS server. Also, v3 is fine. $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5 $ sudo mount -o v3,sec=sys server2:/export /mnt/sys 2. As the normal user, before accessing the kerberized mount, kinit with a short lifetime (but not so short that renewing the ticket would leave you within the 4-minute window again by the time the original ticket expires), e.g. $ kinit -l 10m -r 60m 3. Do some I/O to the kerberized mount and verify that the writes are wsize, UNSTABLE: $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1 4. Wait until you're within 4 minutes of key expiry, then do some more I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets set. Verify that the writes are 4K, FILE_SYNC: $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1 5. Now do some I/O to the sec=sys mount. This will cause RPC_CRED_NO_CRKEY_TIMEOUT to be set: $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1 6. Writes for that user will now be permanently 4K, FILE_SYNC for that user, regardless of which mount is being written to, until you reboot the client. Renewing the kerberos ticket (assuming it hasn't already expired) will have no effect. Grabbing a new kerberos ticket at this point will have no effect either. Move the flag to the auth->au_flags field (which is currently unused) and rename it slightly to reflect that it's no longer associated with the auth_cred->ac_flags. Add the rpc_auth to the arg list of rpcauth_cred_key_to_expire and check the au_flags there too. Finally, add the inode to the arg list of nfs_ctx_key_to_expire so we can determine the rpc_auth to pass to rpcauth_cred_key_to_expire. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-19 16:23:24 -04:00
Steve Dickson	e68fd7c807	mount: use sec= that was specified on the command line When older servers return RPC_AUTH_NULL, it means the rpc creds will be ignored. In that case use the sec= that was specified instead of setting sec=null Fixes Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1112983 Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-19 16:23:23 -04:00
Trond Myklebust	f7db0b2838	pNFS: Fix LAYOUTGET handling of NFS4ERR_BAD_STATEID and NFS4ERR_EXPIRED We want to recover the open stateid if there is no layout stateid and/or the stateid argument matches an open stateid. Otherwise throw out the existing layout and recover from scratch, as the layout stateid is bad. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2016-07-19 16:23:23 -04:00
Trond Myklebust	66b53f3258	pNFS: Handle NFS4ERR_RECALLCONFLICT correctly in LAYOUTGET Instead of giving up altogether and falling back to doing I/O through the MDS, which may make the situation worse, wait for 2 lease periods for the callback to resolve itself, and then try destroying the existing layout. Only if this was an attempt at getting a first layout, do we give up altogether, as the server is clearly crazy. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2016-07-19 16:23:22 -04:00
Trond Myklebust	e85d7ee420	pNFS: Separate handling of NFS4ERR_LAYOUTTRYLATER and RECALLCONFLICT They are not the same error, and need to be handled differently. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2016-07-19 16:23:22 -04:00
Trond Myklebust	56b38a1f7c	pNFS: Fix post-layoutget error handling in pnfs_update_layout() The non-retry error path is currently broken and ends up releasing the reference to the layout twice. It also can end up clearing the NFS_LAYOUT_FIRST_LAYOUTGET flag twice, causing a race. In addition, the retry path will fail to decrement the plh_outstanding counter. Fixes: `183d9e7b11` ("pnfs: rework LAYOUTGET retry handling") Cc: stable@vger.kernel.org # 4.7 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2016-07-19 16:22:47 -04:00
Trond Myklebust	10b7e9ad44	pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding We know that the attributes will need updating if there is still a LAYOUTCOMMIT outstanding. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-18 00:51:01 -04:00
Daniel Bristot de Oliveira	752d596b39	tracing: Use __get_str() when manipulating strings Use __get_str(str) rather than __get_dynamic_array(str) when deadling with strings. It is just a code cleanup, no changes on tracepoint ABI. Link: http://lkml.kernel.org/r/ea260df91817411cca2a1f3db2abd88860094788.1467407618.git.bristot@redhat.com Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-nfs@vger.kernel.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2016-07-15 15:52:20 -04:00
Kinglong Mee	c77efc1e78	nfs/blocklayout: Check max uuids and devices before decoding Avoid nfs return uuids/devices larger than maximum. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-15 15:51:11 -04:00
Kinglong Mee	ecc2b88c4a	nfs/blocklayout: Make sure calculate signature length aligned Avoid a bad nfs server return an unaligned length of signature. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-15 15:51:11 -04:00
Christoph Hellwig	11487ddbdb	nfs/blocklayout: support RH/Fedora dm-mpath device nodes Instead of reusing the wwn-* names for multipath devices nodes RHEL and Fedora introduce new dm-mpath-uuid-* nodes with a slightly different naming scheme. Try these names first to ensure we always get a multipath-capable device if it exists. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-15 15:46:08 -04:00
Christoph Hellwig	d702d41ed4	nfs/blocklayout: refactor open-by-wwn The current code works with the standard udev/systemd names, but we'll have to add another method in the next patch. Refactor it into a separate helper to make room for the new variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-15 15:46:08 -04:00
Christoph Hellwig	0173ca0544	nfs/blocklayout: use proper fmode for opening block devices This was fixed for the original block layout code a while ago, but also needs to be fixed for the SCSI layout path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-15 15:46:08 -04:00
Trond Myklebust	8b7d9d09b2	NFSv4: Revert "Truncating file opens should also sync O_DIRECT writes" We're not holding any locks, so both nfs_wb_all() and inode_dio_wait() are unenforcible and have livelock potential. Just limit ourselves to flushing out the data. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-14 12:42:40 -04:00
Chuck Lever	a4e187d83d	NFS: Don't drop CB requests with invalid principals Before commit `778be232a2` ("NFS do not find client in NFSv4 pg_authenticate"), the Linux callback server replied with RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB request. Let's restore that behavior so the server has a chance to do something useful about it, and provide a warning that helps admins correct the problem. Fixes: `778be232a2` ("NFS do not find client in NFSv4 ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2016-07-11 15:50:43 -04:00
Trond Myklebust	9a773e7c8d	NFS nfs_vm_page_mkwrite: Don't freeze me, Bro... Prevent filesystem freezes while handling the write page fault. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-07-05 19:11:08 -04:00

1 2 3 4 5 ...

4493 Commits