Commit Graph

965 Commits

Author SHA1 Message Date
Trond Myklebust
b5418383ef NFS: do_setlk(): don't flush caches when we have a delegation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:50 -04:00
Trond Myklebust
7e5f614660 NFS: Revert commit 44dd151d
Revert commit 44dd151d "NFS: Don't mark a written page as uptodate until it
is on disk". While it is true that the write may fail, that is always the
case. There is no reason why we should treat data on pages that are not
already marked as PG_uptodate as being special. The only thing we gain is a
noticeable slowdown when re-reading these pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:46 -04:00
Trond Myklebust
efc91ed019 NFS: Optimise append writes with holes
If a file is being extended, and we're creating a hole, we might as well
declare the entire page to be up to date.

This patch significantly improves the write performance for sparse files
in the case where lseek(SEEK_END) is used to append several non-contiguous
writes at intervals of < PAGE_SIZE.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:45 -04:00
Trond Myklebust
2116271a34 NFS: Add correct bounds checking to NFSv2 locks
NFSv2 file locking currently fails the Connectathon tests, because the
calls to the VFS locking code do not return an EINVAL error if the
struct file_lock overflows the 32-bit boundaries.

The problem is due to the fact that we occasionally call helpers from
fs/locks.c in order to avoid RPC calls to the server when we know that a
local process holds the lock. These helpers are, of course, always
64-bit enabled, so EINVAL is not returned in cases when it would if
the call had gone to the NLM code.

For consistency, we therefore add support for a bounds-checking helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:40 -04:00
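
The kind of check described above can be sketched in plain C. This is a minimal userland sketch, not the helper the patch adds: `struct sketch_lock`, `SKETCH_OFFSET_MAX` and `sketch_lock_check_bounds()` are hypothetical names, and treating the sentinel end value as "lock to end of file" is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "lock to end of file" sentinel. */
#define SKETCH_OFFSET_MAX INT64_MAX

/* Minimal stand-in for the two struct file_lock fields that matter here. */
struct sketch_lock {
    int64_t start;
    int64_t end;   /* inclusive, or SKETCH_OFFSET_MAX */
};

/* Return 0 if the range fits in signed 32-bit offsets, -EINVAL otherwise. */
static int sketch_lock_check_bounds(const struct sketch_lock *fl)
{
    assert(fl != NULL);
    if (fl->start < 0 || fl->start > INT32_MAX)
        return -EINVAL;
    if (fl->end != SKETCH_OFFSET_MAX &&
        (fl->end > INT32_MAX || fl->end < fl->start))
        return -EINVAL;
    return 0;
}
```

Running a check like this before the 64-bit local helpers are called would let the local path reject the same ranges that a 32-bit protocol path would.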
Trond Myklebust
f3d47a3a6a NFS: Fix a preemption count leak in nfs_update_request
The commit 2785259631 (nfs: use GFP_NOFS
preloads for radix-tree insertion) appears to have introduced a bug:
We only want to call radix_tree_preload() once after creating a request.
Calling it on every loop iteration after the request has been created will
cause preemption count leaks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Nick Piggin <npiggin@suse.de>
2008-07-09 12:08:39 -04:00
Trond Myklebust
0b4aae7aad NFS: Reduce the stack usage in NFSv3 create operations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:38 -04:00
Trond Myklebust
57dc9a5747 NFS: Reduce the stack usage in NFSv4 create operations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:37 -04:00
Trond Myklebust
2aac05a919 NFS: Fix readdir cache invalidation
invalidate_inode_pages2_range() takes page offset arguments, not byte
ranges.

Another thought is that individual pages might perhaps get evicted by VM
pressure, in which case we might perhaps want to re-read not only the
evicted page, but all subsequent pages too (in case the server returns
more/less data per page so that the alignment of the next entry
changes). We should therefore remove the condition that we only do this on
page->index==0.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-08 15:22:40 -04:00
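
The byte-versus-page distinction in the first sentence can be illustrated with a small conversion routine. This is a userland sketch under assumptions: 4 KiB pages, and the names `SKETCH_PAGE_SHIFT`, `struct sketch_page_range` and `sketch_bytes_to_pages()` are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page geometry for the sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

struct sketch_page_range {
    uint64_t first;   /* index of the first page touched */
    uint64_t last;    /* index of the last page touched (inclusive) */
};

/* Convert an inclusive byte range [start, end] into inclusive page indices. */
static struct sketch_page_range sketch_bytes_to_pages(uint64_t start, uint64_t end)
{
    struct sketch_page_range r;

    assert(start <= end);
    r.first = start >> SKETCH_PAGE_SHIFT;
    r.last = end >> SKETCH_PAGE_SHIFT;
    return r;
}
```

Passing the raw byte offsets where page indices are expected would address pages far beyond the intended ones, which is the class of mistake the commit message describes.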
Andi Kleen
9465efc9e9 Remove BKL from remote_llseek v2
- Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
failures in all users)
- Change all users to either use generic_file_llseek_unlocked directly or
take the BKL around. I changed the file systems that don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
take the BKL, but explicitly in their own source now.

I moved them all over in a single patch to avoid unbisectable sections.

Open problem: 32bit kernels can corrupt fpos because its modification
is not atomic, but they can do that anyway because there are other paths that
modify it without BKL.

Do we need a special lock for the pos/f_version = 0 checks?

Trond says the NFS BKL is likely not needed, but keep it for now
until his full audit.

v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
    and factor duplicated code (suggested by hch)

Cc: Trond.Myklebust@netapp.com
Cc: swhiteho@redhat.com
Cc: sfrench@samba.org
Cc: vandrove@vc.cvut.cz

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-02 15:06:27 -06:00
Trond Myklebust
03fa9e84e5 NFS: nfs_updatepage(): don't mark page as dirty if an error occurred
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23 17:09:07 -04:00
Trond Myklebust
b7e2445737 NFS: Fix filehandle size comparisons in the mount code
Fix a sign issue in xdr_decode_fhstatus3()
Fix incorrect comparison in nfs_validate_mount_data()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23 17:09:06 -04:00
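
The message does not say what the sign issue was. As an illustration only, here is one common shape of such a bug, with hypothetical names throughout: a length stored in a signed type passes an upper-bound check when it is negative, while the unsigned form rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FHSIZE 64   /* NFSv3 file handles are at most 64 bytes */

/* Buggy shape: a signed length checked only against the upper bound. */
static bool sketch_len_ok_signed(int32_t len)
{
    return len <= SKETCH_FHSIZE;   /* a negative length slips through */
}

/* Safer shape: keep the on-the-wire length unsigned while validating it.
 * Returns 0 and stores the length, or -1 if it is out of range. */
static int sketch_decode_fh_len(uint32_t wire_len, uint32_t *out)
{
    assert(out != NULL);
    if (wire_len > SKETCH_FHSIZE)
        return -1;
    *out = wire_len;
    return 0;
}
```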
Trond Myklebust
33852a1f2b NFS: Reduce the NFS mount code stack usage.
This appears to fix the Oops reported in
  http://bugzilla.kernel.org/show_bug.cgi?id=10826

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23 17:09:05 -04:00
Adrian Bunk
1d2e88e73e nfs: make nfs4_drop_state_owner() static
nfs4_drop_state_owner() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:31 -07:00
Jan Blunck
31f31db1a1 nfs: path_{get,put}() cleanups
Here are some more places where path_{get,put}() can be used instead of
a dput()/mntput() pair.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:30 -07:00
Harvey Harrison
3110ff8048 nfs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:29 -07:00
Eric Paris
46c8ac7425 nfs/lsm: make NFSv4 set LSM mount options
NFSv3 get_sb operations call into the LSM layer to set security options passed
from userspace.  NFSv4 hooks were not originally added since it was reasonably
late in the merge window and NFSv3 was the only thing that had regressed (v4
has never supported any LSM options)

This patch makes NFSv4 call into the LSM to set security options rather than
just blindly dropping them with no notice to the user as happens today.  This
patch was tested in a simple NFSv4 environment with the context= option and
appeared to work as expected.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:27 -07:00
Trond Myklebust
3a6258e1fb NFSv4: Check the return value of decode_compound_hdr_arg()
If decode_compound_hdr_arg() returns a resource error, then we cannot
proceed to process the callback. Return a 'GARBAGE_ARGS' rpc-level error to
the caller instead.
If, however, the minor version field is incorrect, then we need to
propagate the resulting NFS4ERR_MINOR_VERS_MISMATCH error back as the
compound status field (setting the nops field to 0).

Finally, if encode_compound_hdr_res() returns an error, we need to return
an RPC_SYSTEM_ERR to the caller.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:26 -07:00
Fred Isaman
38def50fab nfs: fix race in nfs_dirty_request
When called from nfs_flush_incompatible, the req is not locked, so
req->wb_page might be set to NULL before it is used by PageWriteback.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:23 -07:00
Trond Myklebust
b0b539739f NFS: Ensure that 'noac' and/or 'actimeo=0' turn off attribute caching
Both the 'noac' and 'actimeo=0' mount options should ensure that attributes
are not cached, however a bug in nfs_attribute_timeout() means that
currently, the attributes may in fact get cached for up to one jiffy. This
has been seen to cause corruption in some applications.

The reason for the bug is that the time_in_range() test returns 'true' as
long as the current time lies between nfsi->read_cache_jiffies and
nfsi->read_cache_jiffies + nfsi->attrtimeo. In other words, if jiffies
equals nfsi->read_cache_jiffies, then we still cache the attribute data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:21 -07:00
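
The boundary case described above can be reproduced in userland. This is a sketch under assumptions: the `sketch_*` helpers imitate wrap-safe jiffies comparisons but are not the kernel macros, and using a half-open interval as the remedy is an illustration rather than a quote of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a is at or after b", in the style of jiffies comparisons. */
static bool sketch_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Closed interval [lo, hi]: still true when t == lo == hi. */
static bool sketch_in_range_closed(unsigned long t, unsigned long lo, unsigned long hi)
{
    return sketch_time_after_eq(t, lo) && sketch_time_after_eq(hi, t);
}

/* Half-open interval [lo, hi): empty when hi == lo. */
static bool sketch_in_range_open(unsigned long t, unsigned long lo, unsigned long hi)
{
    return sketch_time_after_eq(t, lo) && !sketch_time_after_eq(t, hi);
}

/* Cached attributes have expired once "now" falls outside the caching window. */
static bool sketch_attr_expired(unsigned long now, unsigned long read_cache,
                                unsigned long attrtimeo, bool half_open)
{
    unsigned long end = read_cache + attrtimeo;

    return half_open ? !sketch_in_range_open(now, read_cache, end)
                     : !sketch_in_range_closed(now, read_cache, end);
}
```

With `attrtimeo` set to 0 and `now` equal to `read_cache`, the closed test reports the attributes as still valid, while the half-open test reports them as expired.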
Miklos Szeredi
fa799759f9 mm: bdi: expose the BDI object in sysfs for NFS
Register NFS' backing_dev_info under sysfs with the name "nfs-MAJOR:MINOR"

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:49 -07:00
Denis V. Lunev
34b37235c6 nfs: use proc_create to setup de->proc_fops
Use proc_create() to make sure that ->proc_fops is set up before gluing PDE to
main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Alexey Dobriyan
36a5aeb878 proc: remove proc_root_fs
Use creation by full path instead: "fs/foo".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Al Viro
42faad9965 [PATCH] restore sane ->umount_begin() API
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-25 09:23:25 -04:00
Linus Torvalds
563307b2fa Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (80 commits)
  SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the request
  make nfs_automount_list static
  NFS: remove duplicate flags assignment from nfs_validate_mount_data
  NFS - fix potential NULL pointer dereference v2
  SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use
  SUNRPC: Fix a race in gss_refresh_upcall()
  SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests
  SUNRPC: Remove the unused export of xprt_force_disconnect
  SUNRPC: remove XS_SENDMSG_RETRY
  SUNRPC: Protect creds against early garbage collection
  NFSv4: Attempt to use machine credentials in SETCLIENTID calls
  NFSv4: Reintroduce machine creds
  NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
  nfs: fix printout of multiword bitfields
  nfs: return negative error value from nfs{,4}_stat_to_errno
  NLM/lockd: Ensure client locking calls use correct credentials
  NFS: Remove the buggy lock-if-signalled case from do_setlk()
  NLM/lockd: Fix a race when cancelling a blocking lock
  NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call
  NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel
  ...
2008-04-24 11:46:16 -07:00
Trond Myklebust
233607dbbc Merge branch 'devel'
2008-04-24 14:01:02 -04:00
Jeff Layton
06e02d66fa NFS: don't let nfs_callback_svc exit on unexpected svc_recv errors (try #2)
When svc_recv returns an unexpected error, nfs_callback_svc will print a
warning and exit. This is problematic for several reasons. In particular,
it will cause the reference counts for the thread to be wrong, and no
new thread will be started until all nfs4 mounts are unmounted.

Rather than exiting on error from svc_recv, have the thread do a 1s
sleep and then retry the loop. This is unlikely to cause any harm, and
if the error turns out to be something temporary then it may be able to
recover.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:42 -04:00
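
The retry behaviour described above can be modelled with a small loop. This is a userland sketch, not the kernel thread: the receive results are replayed from an array, the one-second sleep is replaced by a counter, and treating -EAGAIN and -EINTR as the "expected" errors is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simulated service: replays a fixed list of receive results. */
struct sketch_svc {
    const int *results;   /* 0 = request received, < 0 = error */
    size_t n;
    size_t pos;
    unsigned sleeps;      /* counts back-off sleeps instead of really sleeping */
};

/* Stand-in for a real one-second sleep. */
static void sketch_sleep_1s(struct sketch_svc *s)
{
    s->sleeps++;
}

/* Returns the number of requests handled before the stop condition is hit. */
static unsigned sketch_callback_loop(struct sketch_svc *s)
{
    unsigned handled = 0;

    assert(s != NULL);
    while (s->pos < s->n) {               /* stand-in for the thread's stop check */
        int err = s->results[s->pos++];

        if (err == -EAGAIN || err == -EINTR)
            continue;                     /* expected: just retry */
        if (err < 0) {
            sketch_sleep_1s(s);           /* unexpected: back off, then retry */
            continue;
        }
        handled++;                        /* a request arrived: process it */
    }
    return handled;
}
```

The point of the design is that an unexpected error no longer ends the loop, so the thread stays alive and its reference counts stay consistent.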
J. Bruce Fields
e1ba1ab76e nfsd: fix comment
Obvious comment nit.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:42 -04:00
Jeff Layton
a277e33cbe NFS: convert nfs4 callback thread to kthread API
There's a general push to convert kernel threads to use the (much
cleaner) kthread API. This patch converts the NFSv4 callback kernel
thread to the kthread API. In addition to being generally cleaner this
also removes the dependency on signals when shutting down the thread.

Note that this patch depends on the recent patches to svc_recv() to
make it check kthread_should_stop() periodically. Those patches are
in Bruce's tree at the moment and are slated for 2.6.26 along with
the lockd conversion, so this conversion is probably also appropriate
for 2.6.26.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:40 -04:00
J. Bruce Fields
065f30ec14 nfs: remove unnecessary NFS_NEED_* defines
Thanks to Robert Day for pointing out that these two defines are unused.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <trond@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>
2008-04-23 16:13:37 -04:00
Adrian Bunk
a3dab29353 make nfs_automount_list static
nfs_automount_list can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:55:29 -04:00
Jeff Layton
daa7da5fd3 NFS: remove duplicate flags assignment from nfs_validate_mount_data
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:55:25 -04:00
Cyrill Gorcunov
63649bd708 NFS - fix potential NULL pointer dereference v2
There is a possible NULL pointer dereference if kstr[n]dup fails.
So check the results for safety.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:55:22 -04:00
Trond Myklebust
a2b2bb8822 NFSv4: Attempt to use machine credentials in SETCLIENTID calls
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:59 -04:00
Trond Myklebust
7c67db3a8a NFSv4: Reintroduce machine creds
We need to try to ensure that we always use the same credentials whenever
we re-establish the clientid on the server. If not, the server won't
recognise that we're the same client, and so may not allow us to recover
state.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:56 -04:00
Trond Myklebust
78ea323be6 NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
With the recent change to generic creds, we can no longer use
cred->cr_ops->cr_name to distinguish between RPCSEC_GSS principals and
AUTH_SYS/AUTH_NULL identities. Replace it with the rpc_authops->au_name
instead...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:53 -04:00
Fred Isaman
4410924157 nfs: fix printout of multiword bitfields
Benny points out that zero-padding of multiword bitfields is necessary,
and that delimiting each word is nice to avoid endianness confusion.

bhalevy: without zero padding output can be ambiguous. Also,
since the printed array of two 32-bit unsigned integers is not a
64-bit number, delimiting the output with a semicolon makes more sense.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:50 -04:00
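
The ambiguity mentioned above is easy to demonstrate. In this sketch the function names are hypothetical and the ':' delimiter is the example's own choice. Two different word pairs print identically without padding, and distinctly with zero padding and a delimiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a two-word bitmap either padded and delimited, or naively. */
static void sketch_fmt_words(char *buf, size_t len, const uint32_t w[2], bool padded)
{
    assert(buf != NULL && len > 0);
    if (padded)
        snprintf(buf, len, "%08x:%08x", (unsigned)w[0], (unsigned)w[1]);
    else
        snprintf(buf, len, "%x%x", (unsigned)w[0], (unsigned)w[1]);
}

/* True if two different bitmaps produce the same printed text. */
static bool sketch_outputs_collide(const uint32_t a[2], const uint32_t b[2], bool padded)
{
    char sa[32], sb[32];

    sketch_fmt_words(sa, sizeof sa, a, padded);
    sketch_fmt_words(sb, sizeof sb, b, padded);
    return strcmp(sa, sb) == 0;
}
```

For example, the pairs {0x1, 0x23} and {0x12, 0x3} both print as "123" in the naive format, but as "00000001:00000023" and "00000012:00000003" in the padded one.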
Benny Halevy
856dff3d38 nfs: return negative error value from nfs{,4}_stat_to_errno
All use sites for nfs{,4}_stat_to_errno negate their return value.
It's more efficient to return a negative error from the stat_to_errno converters
rather than negating its return value everywhere. This also produces slightly
smaller code.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:47 -04:00
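
The idea can be sketched as a lookup table that already stores negated errno values, so callers can return the result directly. The table, its handful of entries and `sketch_stat_to_errno()` are illustrative only, and falling back to -EIO for unknown codes is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_stat_map {
    int stat;   /* NFS status code as seen on the wire */
    int err;    /* already-negated errno value */
};

/* A few NFS status codes (numeric values as defined by the NFS protocol). */
static const struct sketch_stat_map sketch_errtbl[] = {
    {  0, 0 },
    {  1, -EPERM },
    {  2, -ENOENT },
    {  5, -EIO },
    { 13, -EACCES },
    { 70, -ESTALE },
};

/* Map an NFS status to a negative errno; callers need no negation. */
static int sketch_stat_to_errno(int stat)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_errtbl) / sizeof(sketch_errtbl[0]); i++)
        if (sketch_errtbl[i].stat == stat)
            return sketch_errtbl[i].err;
    return -EIO;   /* unknown status: generic I/O error (assumption) */
}
```

A caller that previously wrote `return -convert(status);` would simply write `return sketch_stat_to_errno(status);`.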
Trond Myklebust
c4d7c402b7 NFS: Remove the buggy lock-if-signalled case from do_setlk()
Both NLM and NFSv4 should be able to clean up adequately in the case where
the user interrupts the RPC call...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:52 -04:00
Trond Myklebust
536ff0f809 NFSv4: Ensure we don't corrupt fl->fl_flags in nfs4_proc_unlck
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:33 -04:00
Trond Myklebust
c1d519312d NFSv4: Only increment the sequence id if the server saw it
It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
before the actual stateful operation has been executed (for instance in the
PUTFH call). There is no way to tell from the overall status result which
operations were executed from the COMPOUND.

The fix is to move incrementing of the sequence id into the XDR layer,
so that we do it as we process the results from the stateful operation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:15 -04:00
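
A rough model of moving the increment into result decoding follows. The operation names, the result struct and `sketch_decode_compound()` are hypothetical. The sketch also ignores the specific error codes for which the protocol leaves the sequence id unchanged.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_op { SKETCH_OP_PUTFH, SKETCH_OP_OPEN, SKETCH_OP_GETATTR };

/* One decoded per-operation result from a COMPOUND reply. */
struct sketch_result {
    enum sketch_op op;
    int status;   /* 0 on success, non-zero on failure */
};

/* Walk the results in order and return the updated sequence id.
 * The id is bumped only when a result for the stateful operation is
 * actually present, i.e. the server got as far as executing it. */
static unsigned sketch_decode_compound(const struct sketch_result *res,
                                       size_t n, unsigned seqid)
{
    size_t i;

    assert(res != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (res[i].op == SKETCH_OP_OPEN)
            seqid++;             /* the stateful op was seen by the server */
        if (res[i].status != 0)
            break;               /* processing stops at the first failure */
    }
    return seqid;
}
```

If PUTFH fails, no OPEN result is ever decoded, so the sequence id stays where it was.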
Trond Myklebust
35d05778e2 NFSv4: Remove bogus call to nfs4_drop_state_owner() in _nfs4_open_expired()
There should be no need to invalidate a perfectly good state owner just
because of a stale filehandle. Doing so can cause the state recovery code
to break, since nfs4_get_renew_cred() and nfs4_get_setclientid_cred() rely
on finding active state owners.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:12 -04:00
Trond Myklebust
dbae4c73f0 NFS: Ensure that rpc_run_task() errors are propagated back to the caller
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:08 -04:00
Trond Myklebust
c9d8f89d98 NFS: Ensure that the write code cleans up properly when rpc_run_task() fails
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:05 -04:00
Trond Myklebust
fdd1e74c89 NFS: Ensure that the read code cleans up properly when rpc_run_task() fails
In the case of readpage() we need to ensure that the pages get unlocked,
and that the error is flagged.

In the case of O_DIRECT, we need to ensure that the pages are all released.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:01 -04:00
Trond Myklebust
73e3302f60 NFS: Fix nfs_wb_page() to always exit with an error or a clean page
It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
the page is left in a dirty state.
For instance in the case where the server rebooted, and the COMMIT request
failed, then all the previously "clean" pages which were cached by the
server, but were not guaranteed to have been written out to disk,
have to be redirtied and resent to the server.
The fix is to have nfs_wb_page_priority() check that the page is clean
before it exits...

This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
nfs_create_request() when we're in the nfs_readpage() path.

Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at it. It
turns out that clear_page_dirty_for_io() has the exact same test.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:52:58 -04:00
Dave Hansen
2c463e9548 [PATCH] r/o bind mounts: check mnt instead of superblock directly
If we depend on the inodes for writeability, we will not catch the r/o mounts
when implemented.

This patch uses __mnt_want_write().  It does not guarantee that the mount
will stay writeable after the check.  But, this is OK for one of the checks
because it is just for a printk().

The other two are probably unnecessary and duplicate existing checks in the
VFS.  This won't make them better checks than before, but it will make them
detect r/o mounts.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:27 -04:00
Bryan Wu
240ee83118 fix bug - executing FDPIC ELF on NFS mount triggers BUG() at mm/nommu.c:862:/do_mmap_private()
NFS needs a NOMMU version mmap function to support uClinux on NOMMU machine
http://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=3992

Signed-off-by: Bryan Wu <cooloney@kernel.org>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08 21:06:56 -04:00
Jeff Layton
66d3aac041 NFS: initialize flags field in nfs_open_context
The nfs_open_context struct had a "flags" field added recently, but the
allocator isn't initializing it. It also looks like the allocator isn't
initializing the mode or list either, but they seem to be overwritten
by the caller, so that's less of an issue.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08 21:06:53 -04:00
Al Viro
c35038beca [PATCH] do shrink_submounts() for all fs types
... and take it out of ->umount_begin() instances.  Call with all locks
already taken (by do_umount()) and leave calling release_mounts() to
caller (it will do release_mounts() anyway, so we can just put into
the same list).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27 20:47:58 -04:00
Linus Torvalds
7d3628b230 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
  [NET] ifb: set separate lockdep classes for queue locks
  [IPV6] KCONFIG: Fix description about IPV6_TUNNEL.
  [TCP]: Fix shrinking windows with window scaling
  netpoll: zap_completion_queue: adjust skb->users counter
  bridge: use time_before() in br_fdb_cleanup()
  [TG3]: Fix build warning on sparc32.
  MAINTAINERS: bluez-devel is subscribers-only
  audit: netlink socket can be auto-bound to pid other than current->pid (v2)
  [NET]: Fix permissions of /proc/net
  [SCTP]: Fix a race between module load and protosw access
  [NETFILTER]: ipt_recent: sanity check hit count
  [NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup()
  [RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning
  [IPV4]: esp_output() misannotations
  [8021Q]: vlan_dev misannotations
  xfrm: ->eth_proto is __be16
  [IPV4]: ipv4_is_lbcast() misannotations
  [SUNRPC]: net/* NULL noise
  [SCTP]: fix misannotated __sctp_rcv_asconf_lookup()
  [PKT_SCHED]: annotate cls_u32
  ...
2008-03-21 07:57:45 -07:00
Chuck Lever
ecfc555a83 NFS: Always enable NFS direct I/O
Since O_DIRECT is a standard feature that is enabled in most distros,
eliminate the CONFIG_NFS_DIRECTIO build option, and change the
fs/nfs/Makefile to always build in the NFS direct I/O engine.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:34 -04:00
Chuck Lever
82d101d58a NFS: Show most mount options via nfs_show_options()
Display all mount options in /proc/mounts which may be needed to reconstruct
a previous mount.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:29 -04:00
Chuck Lever
3f8400d1f1 NFS: Save the values of the "mount*=" mount options
Save the value of the mountproto= mountport= mountvers= and mountaddr=
options so that these values can be displayed later via
nfs_show_options().

This preserves the intent of the original mount options, should the file
system need to be remounted based on what's displayed in /proc/mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:22 -04:00
Chuck Lever
f22d6d79fe NFS: Save the value of the "port=" mount option
During a remount based on the mount options displayed in /proc/mounts, we
want to preserve the original behavior of the mount request.  Let's save
the original setting of the "port=" mount option in the mount's nfs_server
structure.

This allows us to simplify the default behavior of port setting for NFSv4
mounts: by default, NFSv2/3 mounts first try an RPC bind to determine the
NFS server's port, unless the user specified the "port=" mount option.
Users can force the client to skip the RPC bind by explicitly specifying
"port=<value>".

NFSv4, by contrast, assumes the NFS server port is 2049 and skips the RPC
bind, unless the user specifies "port=".  Users can force an RPC bind for
NFSv4 by explicitly specifying "port=0".

I added a couple of extra comments to clarify this behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:19 -04:00
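
The port selection rules described above can be written as a small decision function. This is a sketch, not the mount code: `SKETCH_PORT_UNSET`, `struct sketch_port_plan` and `sketch_plan_port()` are hypothetical, and treating "port=0" as "use rpcbind" for NFSv2/3 as well is an assumption, since the message only states it for NFSv4.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NFS_PORT   2049
#define SKETCH_PORT_UNSET (-1)   /* hypothetical "port= not given" marker */

struct sketch_port_plan {
    bool use_rpcbind;   /* ask rpcbind for the server's port? */
    int port;           /* port to use when rpcbind is skipped */
};

static struct sketch_port_plan sketch_plan_port(int nfs_version, int port_opt)
{
    struct sketch_port_plan p;

    assert(nfs_version >= 2 && nfs_version <= 4);
    if (port_opt == SKETCH_PORT_UNSET) {
        /* Defaults: v2/v3 try rpcbind first, v4 assumes port 2049. */
        p.use_rpcbind = (nfs_version != 4);
        p.port = p.use_rpcbind ? 0 : SKETCH_NFS_PORT;
    } else if (port_opt == 0) {
        /* "port=0": force an rpcbind query. */
        p.use_rpcbind = true;
        p.port = 0;
    } else {
        /* "port=<value>": skip rpcbind and use the given port. */
        p.use_rpcbind = false;
        p.port = port_opt;
    }
    return p;
}
```

Keeping the user's original "port=" setting is what makes the "unset" case distinguishable from an explicit value when the options are displayed later.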
Chuck Lever
78fa701f34 NFS: Fix up data types of fields in nfs_parsed_mount_options
Clean up: make data types of fields in nfs_parsed_mount_options more
consistent with other uses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:16 -04:00
Chuck Lever
2d76743227 NFS: numeric mount parameters are unsigned
Clean up: use %u instead of %d when displaying NFS mount options.

Nit: Fix reporting of "namlen=" option in nfs_show_mount_stats.  The mount
option is called "namlen" without the "e".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:13 -04:00
Jeff Layton
7bda2cdf48 NFS: clean up short packet handling for NFSv4 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in decode_readdir().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:10 -04:00
Jeff Layton
643f81115b NFS: clean up short packet handling for NFSv3 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in nfs3_xdr_readdirres().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:06 -04:00
Jeff Layton
caa02bd540 NFS: clean up short packet handling for NFSv2 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in nfs_xdr_readdirres().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:03 -04:00
Fred Isaman
4af68bffac nfs: remove duplicate initializations of nfs_read_data field
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 17:59:59 -04:00
Fred
6d884e8fc8 nfs: nfs_redirty_request
Both flush functions have the same error handling routine.  Pull
it out as a function.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 17:59:56 -04:00
Trond Myklebust
c7c350e92a Merge branch 'hotfixes' into devel
2008-03-19 17:59:44 -04:00
Fred Isaman
f8512ad0da nfs: don't ignore return value from nfs_pageio_add_request
Ignoring the return value from nfs_pageio_add_request can cause deadlocks.

In read path:
  call nfs_pageio_add_request from readpage_async_filler
  assume at this point that there are requests already in desc, that
    can't be merged with the current request.
  so nfs_pageio_doio is fired up to clear out desc.
  assume something goes wrong in setting up the io, so desc->pg_error is set.
  This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original
    request.
  BUT, since return code is ignored, readpage_async_filler assumes it has
    been added, and does nothing further, leaving page locked.
  do_generic_mapping_read will eventually call lock_page, resulting in deadlock

In write path:
  page is marked dirty by generic_perform_write
  nfs_writepages is called
  call nfs_pageio_add_request from nfs_page_async_flush
  assume at this point that there are requests already in desc, that
    can't be merged with the current request.
  so nfs_pageio_doio is fired up to clear out desc.
  assume something goes wrong in setting up the io, so desc->pg_error is set.
  This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original
    request, yet marking the request as locked (PG_BUSY) and in writeback,
    clearing dirty marks.
  The next time a write is done to the page, deadlock will result as
    nfs_write_end calls nfs_update_request

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 17:59:02 -04:00
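
The read-path failure above comes down to a caller that must react when the add call returns 0. This is a simplified model with hypothetical types and names, not the NFS pageio code: the filler unlocks the page and reports the error instead of assuming the request was queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_page { bool locked; };
struct sketch_desc { int pg_error; int queued; };

/* Returns 1 if the request was queued, 0 if it was not (error pending). */
static int sketch_add_request(struct sketch_desc *desc, struct sketch_page *page)
{
    (void)page;
    if (desc->pg_error < 0)
        return 0;
    desc->queued++;
    return 1;
}

/* Filler that honours the return value instead of ignoring it. */
static int sketch_read_filler(struct sketch_desc *desc, struct sketch_page *page)
{
    assert(desc != NULL && page != NULL);
    page->locked = true;              /* the page arrives locked */
    if (!sketch_add_request(desc, page)) {
        page->locked = false;         /* unlock so a later lock attempt cannot hang */
        return desc->pg_error;        /* propagate the setup error */
    }
    return 0;                         /* I/O completion will unlock the page */
}
```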
David S. Miller
2f633928cb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
2008-03-17 23:44:31 -07:00
Al Viro
e6f1cebf71 [NET] endianness noise: INADDR_ANY
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17 22:44:53 -07:00
Fred Isaman
2f42b5d043 NFS: fix encode_fsinfo_maxsz
The previous value was not taking into account space for bitmap array size.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-14 13:47:17 -04:00
Trond Myklebust
98a8e32394 SUNRPC: Add a helper rpcauth_lookup_generic_cred()
The NFSv4 protocol allows clients to negotiate security protocols on the
fly in the case where an administrator on the server changes the export
settings and/or in the case where we may have a filesystem migration event.

Instead of having the NFS client code cache credentials that are tied to a
particular AUTH method it is therefore preferable to have a generic credential
that can be converted into whatever AUTH is in use by the RPC client when
the read/write/sillyrename/... is put on the wire.

We do this by means of the new "generic" credential, which basically just
caches the minimal information that is needed to look up an RPCSEC_GSS,
AUTH_SYS, or AUTH_NULL credential.
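
As a rough illustration of the idea above, the following userspace sketch keeps a flavor-independent credential and binds it to an AUTH flavor only when a request is about to be sent. All type and function names here (generic_cred, bound_cred, bind_cred, the FLAVOR_* constants) are hypothetical stand-ins for illustration and are not the kernel's structures or the helper added by this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flavors standing in for AUTH_NULL, AUTH_SYS and RPCSEC_GSS. */
enum auth_flavor { FLAVOR_NULL, FLAVOR_SYS, FLAVOR_GSS };

/* Hypothetical generic credential: caches only what a lookup needs. */
struct generic_cred {
    unsigned int uid;
    unsigned int gid;
};

/* Hypothetical credential tied to one particular AUTH flavor. */
struct bound_cred {
    enum auth_flavor flavor;
    unsigned int uid;
    unsigned int gid;
};

/*
 * Convert the generic credential into one for whatever flavor the
 * client is using at the moment the request goes on the wire.
 */
static struct bound_cred bind_cred(const struct generic_cred *gc,
                                   enum auth_flavor current_flavor)
{
    struct bound_cred bc;

    assert(gc != NULL);
    bc.flavor = current_flavor;
    bc.uid = gc->uid;
    bc.gid = gc->gid;
    return bc;
}
```

Because the caller holds only the generic credential, a change of security flavor between two requests needs no change to the cached object; only the result of bind_cred() differs.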

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-14 13:42:49 -04:00
Trond Myklebust
9446389ef6 Merge commit 'origin' into devel 2008-03-08 11:49:24 -05:00
Linus Torvalds
4c1aa6f8b9 Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
  NFS: Fix the fsid revalidation in nfs_update_inode()
  SUNRPC: Fix a nfs4 over rdma transport oops
  NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
2008-03-07 12:08:07 -08:00
Trond Myklebust
4e99a1ff34 NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
As long as the directory contents haven't changed, we should just let the
path walk proceed to cross the mountpoint. Apart from being an optimisation
in the case of 'nohide' mountpoint traversals, it also fixes an issue with
referrals: referral inodes don't have valid filehandles, so calling
nfs_revalidate_inode() on them is a bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:35:41 -05:00
Trond Myklebust
c37dcd334c NFS: Fix the fsid revalidation in nfs_update_inode()
When we detect that we've crossed a mountpoint on the remote server, we
must take care not to use that inode to revalidate the fsid on our
current superblock. To do so, we label the inode as a remote mountpoint,
and check for that in nfs_update_inode().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:35:37 -05:00
Trond Myklebust
af1b8c2ff7 NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
O_SYNC is stored in filp->f_flags.
Thanks to Al Viro for pointing out the bug.
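
A minimal userspace sketch of the distinction, assuming a hypothetical struct sketch_file that mirrors the two fields involved. The SKETCH_FMODE_* values and both helper functions are illustrative only and are not the code changed in fs/nfs/write.c.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access-mode bits, kept separate from the open(2) flags. */
#define SKETCH_FMODE_READ  0x1u
#define SKETCH_FMODE_WRITE 0x2u

/* Hypothetical mirror of the two fields that were confused. */
struct sketch_file {
    unsigned int f_flags;  /* open(2) flags such as O_SYNC or O_APPEND */
    unsigned int f_mode;   /* access mode bits only */
};

/* Correct: O_SYNC is an open flag, so it is tested against f_flags. */
static bool wants_sync_write(const struct sketch_file *filp)
{
    assert(filp != NULL);
    return (filp->f_flags & O_SYNC) != 0;
}

/* The confusion: this tests unrelated bits, since f_mode never holds O_SYNC. */
static bool wants_sync_write_buggy(const struct sketch_file *filp)
{
    assert(filp != NULL);
    return (filp->f_mode & O_SYNC) != 0;
}
```

With f_flags set to O_WRONLY | O_SYNC, only the first helper reliably reports a synchronous write; the second depends on whatever access-mode bits happen to overlap with O_SYNC.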

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:33:40 -05:00
Eric Paris
f9c3a38021 NFS: use new LSM interfaces to explicitly set mount options
NFS and SELinux worked together previously because SELinux had NFS
specific knowledge built in.  This design was approved by both groups
back in 2004 but the recent NFS changes to use nfs_parsed_mount_data and
the usage of nfs_clone_mount_data showed this to be a poor, fragile
solution.  This patch fixes the NFS functionality regression by making
use of the new LSM interfaces to allow an FS to explicitly set its own
mount options.

The explicit setting of mount options is done in the nfs get_sb
functions which are called before the generic vfs hooks try to set mount
options for filesystems which use text mount data.

This does not currently support NFSv4 as that functionality did not
exist in previous kernels and thus there is no regression.  I will be
adding the needed code, which I believe to be the exact same as the v3
code, in nfs4_get_sb for 2.6.26.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-03-06 08:40:59 +11:00
Trond Myklebust
cdd0972945 Merge branch 'cleanups' into next 2008-02-28 23:48:05 -08:00
Trond Myklebust
5e4424af9a SUNRPC: Remove now-redundant RCU-safe rpc_task free path
Now that we've tightened up the locking rules for RPC queue wakeups, we can
remove the RCU-safe kfree calls...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-28 23:26:28 -08:00
Trond Myklebust
f6a1cc8930 SUNRPC: Add a (empty for the moment) destructor for rpc_wait_queues
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-28 23:17:27 -08:00
Trond Myklebust
5d00837b90 SUNRPC: Run rpc timeout functions as callbacks instead of in softirqs
An audit of the current RPC timeout functions shows that they don't really
ever need to run in the softirq context. As long as the softirq is
able to signal that the wakeup is due to a timeout (which it can do by
setting task->tk_status to -ETIMEDOUT) then the callback functions can just
run as standard task->tk_callback functions (in the rpciod/process
context).

The only possible borderline case would be xprt_timer() for the case of
UDP, when the callback is used to reduce the size of the transport
congestion window. In testing, however, the effect of moving that update
to a callback would appear to be minor.
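
The split described above can be sketched in userspace as follows. The timer side only records that the wakeup is due to a timeout, and the real handling runs later as an ordinary callback. The struct and function names are hypothetical; only the tk_status and tk_callback field names and the -ETIMEDOUT convention come from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for an RPC task. */
struct sketch_task {
    int tk_status;
    void (*tk_callback)(struct sketch_task *);
    int timeouts_handled;
};

/* Timer (softirq-like) side: only signal why the task is being woken. */
static void timer_fired(struct sketch_task *task)
{
    assert(task != NULL);
    task->tk_status = -ETIMEDOUT;
}

/* Callback side: the actual timeout handling, run in worker context. */
static void on_wakeup(struct sketch_task *task)
{
    if (task->tk_status == -ETIMEDOUT)
        task->timeouts_handled++;
}

/* One step of a worker loop: run the pending callback exactly once. */
static void run_callback(struct sketch_task *task)
{
    void (*cb)(struct sketch_task *);

    assert(task != NULL);
    cb = task->tk_callback;
    task->tk_callback = NULL;
    if (cb)
        cb(task);
}
```

Nothing is handled at the moment timer_fired() runs; the work happens only when run_callback() is invoked by the worker.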

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:44 -08:00
Trond Myklebust
fda1393938 SUNRPC: Convert users of rpc_wake_up_task to use rpc_wake_up_queued_task
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:42 -08:00
Trond Myklebust
101070ca2f NFS: Ensure that the asynchronous RPC calls complete on nfsiod.
We want to ensure that rpc_call_ops that involve mntput() are run on nfsiod
rather than on rpciod, so that they don't deadlock when the resulting
umount calls rpc_shutdown_client(). Hence we specify that read, write and
commit calls must complete on nfsiod.
Ditto for NFSv4 open, lock, locku and close asynchronous calls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:37 -08:00
Trond Myklebust
5746006f1d NFS: Add an nfsiod workqueue
NFS post-rpciod cleanups often involve tasks that cannot be safely
performed within the rpciod context (due to deadlock concerns). We
therefore add a dedicated NFS workqueue that can perform tasks like
cleaning up state after an interrupted NFSv4 open() call, or calling
put_nfs_open_context() after an asynchronous read or write call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:36 -08:00
Trond Myklebust
383ba71938 NFS: Fix a deadlock with lazy umount
We can't allow rpc callback functions like task->tk_ops->rpc_call_prepare()
and task->tk_ops->rpc_call_done() to call mntput() in any way, since
that will cause a deadlock when the call to rpc_shutdown_client() attempts
to wait on 'task' to complete.

We can avoid the above deadlock by moving calls to mntput to
task->tk_ops->rpc_release() callback, since at that time the task will be
marked as completed, and so rpc_shutdown_client won't attempt to wait on
it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:33 -08:00
Trond Myklebust
4b5621f6b1 NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
O_SYNC is stored in filp->f_flags.
Thanks to Al Viro for pointing out the bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 15:56:29 -08:00
Pavel Emelyanov
5216a8e70e Wrap buffers used for rpc debug printks into RPC_IFDEBUG
Sorry for the noise, but here's the v3 of this compilation fix :)

Some places declare a char buf[...] on the stack in order to pass it
to dprintk() later. Since dprintk() sometimes (if CONFIG_SYSCTL=n)
becomes an empty do { } while (0) stub, these buffers are then unused
and cause gcc to warn about them.

Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
compile them out when not needed.
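
The pattern can be sketched in plain C as below. SKETCH_DEBUG, SKETCH_IFDEBUG, sketch_dprintk and handle_request are hypothetical names chosen for this illustration; the kernel's RPC_IFDEBUG and dprintk definitions are not reproduced here.

```c
#include <stdio.h>

/* Build with -DSKETCH_DEBUG to compile the debug paths in. */
#ifdef SKETCH_DEBUG
# define SKETCH_IFDEBUG(x) x
# define sketch_dprintk(...) printf(__VA_ARGS__)
#else
# define SKETCH_IFDEBUG(x)
# define sketch_dprintk(...) do { } while (0)
#endif

/* Hypothetical request handler that formats an address for debugging. */
static int handle_request(unsigned int addr)
{
    /* The buffer is declared only when debugging is compiled in. */
    SKETCH_IFDEBUG(char buf[32];)

    SKETCH_IFDEBUG(snprintf(buf, sizeof(buf), "%u.%u", addr >> 8, addr & 0xffu);)
    sketch_dprintk("request from %s\n", buf);

    return (int)(addr & 0xffu);
}
```

Without SKETCH_DEBUG, neither the declaration nor any use of buf survives preprocessing, so there is no unused buffer left for the compiler to complain about.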

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-21 18:42:29 -05:00
Harvey Harrison
90dc7d2796 nfs: fix sparse warnings
fs/nfs/nfs4state.c:788:34: warning: Using plain integer as NULL pointer
fs/nfs/delegation.c:52:34: warning: Using plain integer as NULL pointer
fs/nfs/idmap.c:312:12: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:257:6: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:270:6: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:281:6: warning: Using plain integer as NULL pointer
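
For readers unfamiliar with this class of warning, here is a small sketch of what is being flagged and the usual fix. The struct and function names are hypothetical and are not taken from the files listed above.

```c
#include <stddef.h>

/* Hypothetical object type used only for this illustration. */
struct sketch_state { int id; };

/* The flagged style: a plain integer 0 where a pointer is expected. */
static struct sketch_state *lookup_old(int found, struct sketch_state *s)
{
    if (!found)
        return 0;
    return s;
}

/* The preferred spelling: NULL makes the pointer intent explicit. */
static struct sketch_state *lookup_new(int found, struct sketch_state *s)
{
    if (!found)
        return NULL;
    return s;
}
```

Both functions behave identically at run time; the change only affects how clearly the source expresses a null pointer.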

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-20 16:15:44 -05:00
Jeff Layton
1227a74e2e NFS: flush signals before taking down callback thread
Now that the reference counting on the callback thread is working as
expected, it uncovers another problem.  Peter Staubach noticed while
testing that patch on an older kernel that he would occasionally see
this printk in rpc_register fire:

    "RPC: failed to contact portmap (errno -512)."

The NFSv4 callback thread is signaled by nfs_callback_down(), but never
flushes that signal. All of the shutdown processing is done with that
signal pending. This makes it fail the call to unregister the port with
the portmapper.

In actuality, this rpc_register call isn't necessary at all since the
port isn't actually registered with the portmapper anymore. Regardless,
there doesn't seem to be any reason to leave the signal pending while
the thread is being shut down and flushing it should generally silence
that printk.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-20 13:32:43 -05:00
Trond Myklebust
52833e897f Merge branch 'linus_origin' into hotfixes 2008-02-15 13:36:30 -05:00
Jan Blunck
1d957f9bf8 Introduce path_put()
* Add path_put() functions for releasing a reference to the dentry and
  vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()
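
A userspace sketch of the helper's intent, using hypothetical sketch_* types with simple counters instead of the real dentry and vfsmount reference counting. The order shown (dentry first, then vfsmount) is, to my understanding, the order used by the kernel helper, but the code below is only an illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; released_at records the order of release. */
struct sketch_dentry   { int refs; int released_at; };
struct sketch_vfsmount { int refs; int released_at; };

struct sketch_path {
    struct sketch_vfsmount *mnt;
    struct sketch_dentry *dentry;
};

/* Counts releases so that the order can be observed afterwards. */
static int release_clock;

static void sketch_dput(struct sketch_dentry *d)
{
    assert(d != NULL && d->refs > 0);
    d->refs--;
    d->released_at = ++release_clock;
}

static void sketch_mntput(struct sketch_vfsmount *m)
{
    assert(m != NULL && m->refs > 0);
    m->refs--;
    m->released_at = ++release_clock;
}

/* One helper so that every caller drops the pair in the same order. */
static void sketch_path_put(struct sketch_path *path)
{
    sketch_dput(path->dentry);
    sketch_mntput(path->mnt);
}
```

Centralizing the pair in one helper means a caller can no longer release the two references in the wrong order by accident.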

[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck
4ac9137858 Embed a struct path into struct nameidata instead of nd->{dentry,mnt}
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.

Together with the other patches of this series
- it enforces the correct order of getting/releasing the reference count on
  <dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
  struct path in every place where the stack can be traversed
- it reduces the overall code size:

without patch series:
   text    data     bss     dec     hex filename
5321639  858418  715768 6895825  6938d1 vmlinux

with patch series:
   text    data     bss     dec     hex filename
5320026  858418  715768 6894212  693284 vmlinux

This patch:

Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Nick Piggin
2785259631 nfs: use GFP_NOFS preloads for radix-tree insertion
NFS should use GFP_NOFS mode radix tree preloads rather than GFP_ATOMIC
allocations at radix-tree insertion-time.  This is important to reduce the
atomic memory requirement.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:09 -05:00
Olga Kornievskaia
8d042218b0 NFS: add missing spkm3 strings to mount option parser
This patch adds previously missing spkm3 string values that are needed
to parse mount options in the kernel.
2008-02-13 23:24:08 -05:00
Jeff Layton
25606656b1 NFS: remove error field from nfs_readdir_descriptor_t
The error field in nfs_readdir_descriptor_t is never used outside of the
function in which it is set. Remove the field and change the place that
does use it to use an existing local variable.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:07 -05:00
Dan Muntz
497799e7c0 NFS: missing spaces in KERN_WARNING
The warning message for a v4 server returning various bad sequence-ids is
missing spaces.

Signed-off-by: Dan Muntz <dmuntz@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:06 -05:00
Jeff Layton
8e60029f40 NFS: fix reference counting for NFSv4 callback thread
The reference counting for the NFSv4 callback thread stays artificially
high. When this thread comes down, it doesn't properly tear down the
svc_serv, causing a memory leak. In my testing on an older kernel on
x86_64, memory would leak out of the 8k kmalloc slab. So, we're leaking
at least a page of memory every time the thread comes down.

svc_create() creates the svc_serv with a sv_nrthreads count of 1, and
then svc_create_thread() increments that count. Whenever the callback
thread is started it has a sv_nrthreads count of 2. When coming down, it
calls svc_exit_thread() which decrements that count and if it hits 0, it
tears everything down. That never happens here since the count is always
at 2 when the thread exits.

The problem is that nfs_callback_up() should be calling svc_destroy() on
the svc_serv on both success and failure. This is how lockd_up_proto()
handles the reference counting, and doing that here fixes the leak.
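
The counting described above can be traced with a small simulation. The sketch_* names and the drop_creator_ref parameter are hypothetical; only the sv_nrthreads field name and the sequence of increments and decrements follow the description in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the service structure. */
struct sketch_serv {
    int sv_nrthreads;
    int torn_down;
};

/* Creation starts the count at 1, as described for svc_create(). */
static void sketch_create(struct sketch_serv *s)
{
    s->sv_nrthreads = 1;
    s->torn_down = 0;
}

/* Starting a thread takes another reference. */
static void sketch_create_thread(struct sketch_serv *s)
{
    s->sv_nrthreads++;
}

/* Dropping a reference tears the service down when the count hits 0. */
static void sketch_put(struct sketch_serv *s)
{
    assert(s != NULL && s->sv_nrthreads > 0);
    if (--s->sv_nrthreads == 0)
        s->torn_down = 1;
}

/* Bring the callback service up; optionally drop the creator's reference. */
static void sketch_callback_up(struct sketch_serv *s, int drop_creator_ref)
{
    sketch_create(s);
    sketch_create_thread(s);
    if (drop_creator_ref)
        sketch_put(s);  /* the fix: drop the creator's reference here */
}
```

Without the extra put, the thread's exit leaves the count at 1 and nothing is torn down; with it, the thread's exit drops the count to 0 and the teardown runs.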

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:04 -05:00
Trond Myklebust
5d47a35600 NFS: Fix a potential file corruption issue when writing
If the inode is flagged as having an invalid mapping, then we can't rely on
the PageUptodate() flag. Ensure that we don't use the "anti-fragmentation"
write optimisation in nfs_updatepage(), since that will cause NFS to write
out areas of the page that are no longer guaranteed to be up to date.

A potential corruption could occur in the following scenario:

client 1			client 2
===============			===============
				fd=open("f",O_CREAT|O_WRONLY,0644);
				write(fd,"fubar\n",6);	// cache last page
				close(fd);
fd=open("f",O_WRONLY|O_APPEND);
write(fd,"foo\n",4);
close(fd);

				fd=open("f",O_WRONLY|O_APPEND);
				write(fd,"bar\n",4);
				close(fd);
-----
The bug may lead to the file "f" reading 'fubar\n\0\0\0\nbar\n' because
client 2 does not update the cached page after re-opening the file for
write. Instead it keeps it marked as PageUptodate() until someone calls
invalidate_inode_pages2() (typically by calling read()).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-07 19:20:20 -05:00
David Howells
e231c2ee64 Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using:

perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security`
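
For context, the conversion can be illustrated with userspace equivalents of the three helpers. The sketch_* names, the 4095 limit used in sketch_is_err(), and the find_inode()/find_dentry() functions are assumptions made for this example rather than the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Encode a negative errno value in a pointer. */
static inline void *sketch_err_ptr(long error)
{
    assert(error < 0 && error >= -4095);
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* An error pointer lies in the topmost 4095 addresses (assumed limit). */
static inline int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-4095;
}

/* Pass an error pointer through unchanged, just with a different type. */
static inline void *sketch_err_cast(const void *ptr)
{
    return (void *)(uintptr_t)ptr;
}

struct sketch_inode  { int ino; };
struct sketch_dentry { int unused; };

static struct sketch_inode *find_inode(int ok)
{
    static struct sketch_inode inode;

    return ok ? &inode : sketch_err_ptr(-ENOENT);
}

static struct sketch_dentry *find_dentry(int ok)
{
    static struct sketch_dentry dentry;
    struct sketch_inode *inode = find_inode(ok);

    if (sketch_is_err(inode))
        return sketch_err_cast(inode);  /* was: err_ptr(ptr_err(inode)) */
    return &dentry;
}
```

The cast form avoids decoding the error to an integer and immediately re-encoding it, which is all the ERR_PTR(PTR_ERR(p)) idiom did.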

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:26 -08:00
Christoph Lameter
eebd2aa355 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user
Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

        Zeros two segments of the page. It takes the position where to
        start and end the zeroing which avoids length calculations and
        makes code clearer.

zero_user_segment(page, start, end)

        Same for a single segment.

zero_user(page, start, length)

        Length variant for the case where we know the length.
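
The relationship between the three functions can be sketched on an ordinary byte buffer. This is a userspace illustration under the assumption that end positions are exclusive; it omits the kmap and cache flushing that the real helpers deal with, and the sketch_* names and SKETCH_PAGE_SIZE are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Zero [start1, end1) and [start2, end2) within one page-sized buffer. */
static void sketch_zero_segments(unsigned char *page,
                                 unsigned start1, unsigned end1,
                                 unsigned start2, unsigned end2)
{
    assert(end1 <= SKETCH_PAGE_SIZE && end2 <= SKETCH_PAGE_SIZE);

    if (end1 > start1)
        memset(page + start1, 0, end1 - start1);
    if (end2 > start2)
        memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant: the second segment is empty. */
static void sketch_zero_segment(unsigned char *page,
                                unsigned start, unsigned end)
{
    sketch_zero_segments(page, start, end, 0, 0);
}

/* Length variant: converts (start, size) into (start, end). */
static void sketch_zero(unsigned char *page, unsigned start, unsigned size)
{
    sketch_zero_segment(page, start, start + size);
}
```

Taking start and end positions means a caller that wants to clear everything around a written range can pass the range's boundaries directly, with no length arithmetic at the call site.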

We remove the zero_user_page macro. Issues:

1. It's a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 means that a lot of code no longer has to deal
with the special casing for HIGHMEM. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
macro could not be an inline function. The zero_user_* functions
introduced here can be inline because that constant is not used when
these functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:13 -08:00
Tom Tucker
d7c9f1ed97 svc: Change services to use new svc_create_xprt service
Modify the various kernel RPC svcs to use the svc_create_xprt service.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Linus Torvalds
75659ca0c1 Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
  Remove commented-out code copied from NFS
  NFS: Switch from intr mount option to TASK_KILLABLE
  Add wait_for_completion_killable
  Add wait_event_killable
  Add schedule_timeout_killable
  Use mutex_lock_killable in vfs_readdir
  Add mutex_lock_killable
  Use lock_page_killable
  Add lock_page_killable
  Add fatal_signal_pending
  Add TASK_WAKEKILL
  exit: Use task_is_*
  signal: Use task_is_*
  sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
  ptrace: Use task_is_*
  power: Use task_is_*
  wait: Use TASK_NORMAL
  proc/base.c: Use task_is_*
  proc/array.c: Use TASK_REPORT
  perfmon: Use task_is_*
  ...

Fixed up conflicts in NFS/sunrpc manually.
2008-02-01 11:45:47 +11:00
Trond Myklebust
3fbd67ad61 NFSv4: Iterate through all nfs_clients when the server recalls a delegation
The same delegation may have been handed out to more than one nfs_client.
Ensure that if a recall occurs, we return all instances.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:12 -05:00
Trond Myklebust
57bfa89171 NFSv4: Deal more correctly with duplicate delegations
If a (broken?) server hands out two different delegations for the same
file, then we should return one of them.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:12 -05:00