linux_dsm_epyc7002/fs/nfs
Peter Staubach 38c73044f5 NFS: read-modify-write page updating
Hi.

I have a proposal for possibly resolving this issue.

I believe that this situation occurs due to the way that the
Linux NFS client handles writes which modify partial pages.

The Linux NFS client handles partial page modifications by
allocating a page from the page cache, copying the data from
the user level into the page, and then keeping track of the
offset and length of the modified portions of the page.  The
page is not marked as up to date because there are portions
of the page which do not contain valid file contents.
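
The bookkeeping described above can be sketched in user space
as follows.  This is a minimal model and not the kernel code:
struct model_page, model_partial_write() and MODEL_PAGE_SIZE
are hypothetical names, and the model assumes a freshly
allocated page with a single modified region.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for a page cache page; not struct page. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool uptodate;			/* true only if every byte is valid */
	unsigned int dirty_offset;	/* start of the modified region */
	unsigned int dirty_len;		/* length of it, 0 if nothing modified */
};

/* Copy user data into the page and remember which bytes were modified. */
static void model_partial_write(struct model_page *pg, unsigned int offset,
				const void *buf, unsigned int len)
{
	assert(offset <= MODEL_PAGE_SIZE && len <= MODEL_PAGE_SIZE - offset);

	memcpy(pg->data + offset, buf, len);
	pg->dirty_offset = offset;
	pg->dirty_len = len;

	/* Only a write that covers the whole page makes all of it valid. */
	if (offset == 0 && len == MODEL_PAGE_SIZE)
		pg->uptodate = true;
}
```

After a 3 byte write at offset 100, the model page records the
modified range but stays not up to date, which is the state
that forces the extra work on a later read.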

When a read call comes in for a portion of the page, the
contents of the page must be read in from the server.
However, since the page may already contain some modified
data, that modified data must be written to the server
before the file contents can be read back in from the server.
And, since the writing and reading cannot be done atomically,
the data must be written and committed to stable storage on
the server for safety purposes.  This means either a
FILE_SYNC WRITE or an UNSTABLE WRITE followed by a COMMIT.
This has been discussed at length previously.
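
The sequence of requests this implies for a read can be
sketched as below.  This is an illustrative user space model
only: enum model_rpc and model_read_ops() are hypothetical
names, and the choice between the two write strategies is
passed in as a flag rather than decided the way the client
really decides it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the NFS operations named in the text. */
enum model_rpc {
	MODEL_WRITE_FILE_SYNC,
	MODEL_WRITE_UNSTABLE,
	MODEL_COMMIT,
	MODEL_READ
};

#define MODEL_MAX_OPS 3

/*
 * Fill ops[] with the requests needed to satisfy a read of one page
 * and return how many there are.
 */
static size_t model_read_ops(bool uptodate, bool has_dirty_data,
			     bool use_file_sync,
			     enum model_rpc ops[MODEL_MAX_OPS])
{
	size_t n = 0;

	assert(ops != NULL);

	if (uptodate)
		return 0;	/* served from the page cache */

	if (has_dirty_data) {
		/* Modified bytes must reach stable storage first. */
		if (use_file_sync) {
			ops[n++] = MODEL_WRITE_FILE_SYNC;
		} else {
			ops[n++] = MODEL_WRITE_UNSTABLE;
			ops[n++] = MODEL_COMMIT;
		}
	}

	ops[n++] = MODEL_READ;	/* then the page can be read back in */
	return n;
}
```

In this model a read of a partially modified page costs two or
three requests, while a read of an up to date page costs none.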

This algorithm could be described as modify-write-read.  It
is most efficient when the application only updates pages
and does not read them.

My proposed solution is to add a heuristic to decide whether
to do this modify-write-read algorithm or switch to a read-
modify-write algorithm when initially allocating the page
in the write system call path.  The heuristic uses the modes
that the file was opened with, the offset in the page to
read from, and the size of the region to read.

If the file was opened for reading in addition to writing
and the page would not be filled completely with data from
the user level, then read in the old contents of the page
and mark it as Uptodate before copying in the new data.  If
the page would be completely filled with data from the user
level, then there would be no reason to read in the old
contents because they would just be copied over.
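
The decision described above can be sketched as a small
predicate.  This is a user space approximation under stated
assumptions, not the function from the patch: the name
model_want_read_modify_write(), its parameters and
MODEL_PAGE_SIZE are hypothetical, and the check that the page
is not already up to date is an assumption added for the model.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/*
 * Return true if the old page contents should be read in before the
 * new data is copied (read-modify-write), false to keep the
 * modify-write-read behaviour.
 */
static bool model_want_read_modify_write(bool opened_for_read,
					 bool page_uptodate,
					 unsigned int offset,
					 unsigned int len)
{
	assert(offset <= MODEL_PAGE_SIZE && len <= MODEL_PAGE_SIZE - offset);

	if (!opened_for_read)
		return false;	/* write-only open: reads are not expected */
	if (page_uptodate)
		return false;	/* assumption: nothing left to read in */

	/* A write covering the whole page would overwrite whatever was read. */
	return !(offset == 0 && len == MODEL_PAGE_SIZE);
}
```

With a file opened for reading and writing, a 200 byte write
at offset 100 selects read-modify-write in this model, while a
write of the full page does not.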

This would optimize for applications which randomly access
and update portions of files.  The linkage editor for the
C compiler is an example of such a thing.

I tested the attached patch by using rpmbuild to build the
current Fedora rawhide kernel.  The kernel without the
patch generated about 269,500 WRITE requests.  The modified
kernel containing the patch generated about 261,000 WRITE
requests.  Thus, about 8,500 fewer WRITE requests (roughly 3%)
were generated.  I suspect that many of these additional
WRITE requests were probably FILE_SYNC requests to WRITE
a single page, but I didn't test this theory.

The difference between this patch and the previous one was
to remove the unneeded PageDirty() test.  I then retested to
ensure that the resulting system continued to behave as
desired.

	Thanx...

		ps

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-10 08:54:16 -04:00
callback_proc.c nfs41: Backchannel: CB_SEQUENCE validation 2009-06-17 14:11:43 -07:00
callback_xdr.c nfs41: Backchannel: update cb_sequence args and results 2009-06-17 14:11:40 -07:00
callback.c NFSv4: Clean up the nfs.callback_tcpport option 2009-08-09 15:06:19 -04:00
callback.h nfs41: Backchannel: update cb_sequence args and results 2009-06-17 14:11:40 -07:00
client.c NFSv4: Add 'server capability' flags for NFSv4 recommended attributes 2009-08-09 15:06:19 -04:00
delegation.c headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
delegation.h NFSv4: Convert delegation->type field to fmode_t 2008-12-23 15:21:53 -05:00
dir.c NFSv4: Fix a problem whereby a buggy server can oops the kernel 2009-07-21 19:22:38 -04:00
direct.c nfs41 commit sequence setup done support 2009-06-17 10:46:50 -07:00
file.c NFS: read-modify-write page updating 2009-08-10 08:54:16 -04:00
fscache-index.c NFS: Add read context retention for FS-Cache to call back with 2009-04-03 16:42:44 +01:00
fscache.c NFS: Store pages from an NFS inode into a local cache 2009-04-03 16:42:45 +01:00
fscache.h NFS: Display local caching state 2009-04-03 16:42:47 +01:00
getroot.c headers: mnt_namespace.h redux 2009-07-08 09:31:56 -07:00
idmap.c nfs: fix sparse warnings 2008-02-20 16:15:44 -05:00
inode.c NFSv4: Add 'server capability' flags for NFSv4 recommended attributes 2009-08-09 15:06:19 -04:00
internal.h NFS: Add a ->migratepage() aop for NFS 2009-08-10 08:54:13 -04:00
iostat.h remove put_cpu_no_resched() 2009-06-16 19:47:48 -07:00
Kconfig Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd 2009-06-22 12:55:50 -07:00
Makefile NFS: Define and create server-level objects 2009-04-03 16:42:42 +01:00
mount_clnt.c nfs: Keep index within mnt_errtbl[] 2009-08-09 15:06:19 -04:00
namespace.c NFS: Fix nfs_path() to always return a '/' at the beginning of the path 2009-06-22 21:28:25 -07:00
nfs2xdr.c NFS: Fix the type of struct nfs_fattr->mode 2009-03-11 14:10:26 -04:00
nfs3acl.c nfs: remove unnecessary NFS_INO_INVALID_ACL checks 2009-06-17 18:02:14 -07:00
nfs3proc.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-04-02 21:09:10 -07:00
nfs3xdr.c NFS: Fix the XDR iovec calculation in nfs3_xdr_setaclargs 2009-04-21 07:46:49 -07:00
nfs4_fs.h NFSv4: Fix an NFSv4 mount regression 2009-07-21 16:48:07 -04:00
nfs4namespace.c NFS: Fix misparsing of nfsv4 fs_locations attribute (take 2) 2009-03-10 20:33:17 -04:00
nfs4proc.c NFSv4: Add 'server capability' flags for NFSv4 recommended attributes 2009-08-09 15:06:19 -04:00
nfs4renewd.c nfs41: introduce get_state_renewal_cred 2009-06-17 12:25:11 -07:00
nfs4state.c NFSv4: Fix an Oops in nfs4_free_lock_state 2009-07-21 16:47:46 -04:00
nfs4xdr.c NFSv4: Don't do idmapper upcalls for asynchronous RPC calls 2009-08-09 15:06:19 -04:00
nfsroot.c NFS: Update MNT and MNT3 reply decoding functions 2009-06-17 18:02:13 -07:00
pagelist.c NFS: Throttle page dirtying while we're flushing to disk 2009-03-11 14:10:30 -04:00
proc.c NFS: Optimise NFS close() 2009-03-19 15:35:50 -04:00
read.c headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
super.c NFS: Correct the NFS mount path when following a referral 2009-06-22 21:28:25 -07:00
symlink.c nfs: remove unnecessary NFS_NEED_* defines 2008-04-23 16:13:37 -04:00
sysctl.c [PATCH] nfs: fix congestion control 2007-03-16 19:25:05 -07:00
unlink.c nfs41: use rpc prepare call state for session reset 2009-06-17 12:25:07 -07:00
write.c NFS: Add a ->migratepage() aop for NFS 2009-08-10 08:54:13 -04:00