If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.
After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.
Acked-by: Jeff Layton <jlayton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().
So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.
Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.
All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.
The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"
The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.
A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).
Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull vfs updates from Al Viro:
"This the bunch that sat in -next + lock_parent() fix. This is the
minimal set; there's more pending stuff.
In particular, I really hope to get acct.c fixes merged this cycle -
we need that to deal sanely with delayed-mntput stuff. In the next
pile, hopefully - that series is fairly short and localized
(kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
iov_iter work. Most of prereqs for ->splice_write with sane locking
order are there and Kent's dio rewrite would also fit nicely on top of
this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
lock_parent: don't step on stale ->d_parent of all-but-freed one
kill generic_file_splice_write()
ceph: switch to iter_file_splice_write()
shmem: switch to iter_file_splice_write()
nfs: switch to iter_splice_write_file()
fs/splice.c: remove unneeded exports
ocfs2: switch to iter_file_splice_write()
->splice_write() via ->write_iter()
bio_vec-backed iov_iter
optimize copy_page_{to,from}_iter()
bury generic_file_aio_{read,write}
lustre: get rid of messing with iovecs
ceph: switch to ->write_iter()
ceph_sync_direct_write: stop poking into iov_iter guts
ceph_sync_read: stop poking into iov_iter guts
new helper: copy_page_from_iter()
fuse: switch to ->write_iter()
btrfs: switch to ->write_iter()
ocfs2: switch to ->write_iter()
xfs: switch to ->write_iter()
...
The handling of the CIFS_INO_INVALID_MAPPING flag is racy. It's possible
for two tasks to attempt to revalidate the mapping at the same time. The
first sees that CIFS_INO_INVALID_MAPPING is set. It clears the flag and
then calls invalidate_inode_pages2 to start shooting down the pagecache.
While that's going on, another task checks the flag and sees that it's
clear. It then ends up trusting the pagecache to satisfy a read when it
shouldn't.
Fix this by adding a bitlock to ensure that the clearing of the flag is
atomic with respect to the actual cache invalidation. Also, move the
other existing users of cifs_invalidate_mapping to use a new
cifs_zap_mapping() function that just sets the INVALID_MAPPING bit and
then uses the standard codepath to handle the invalidation.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Steve French <smfrench@gmail.com>
In later patches, we'll need to have a bitlock, so go ahead and convert
these bools to use atomic bitops instead.
Also, clean up the initialization of the flags field. There's no need
to unset each bit individually just after it was zeroed on allocation.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Steve French <smfrench@gmail.com>
For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment. Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This issue was found by Coverity (CID 1202536)
This proposes a fix for a statement that creates dead code.
The "rc < 0" statement is within code that is run
with "rc > 0".
It seems like "err < 0" was meant to be used here.
This way, the error code is returned by the function.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <smfrench@gmail.com>
Coverity says:
*** CID 1202537: Dereference after null check (FORWARD_NULL)
/fs/cifs/file.c: 2873 in cifs_user_readv()
2867 cur_len = min_t(const size_t, len - total_read, cifs_sb->rsize);
2868 npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
2869
2870 /* allocate a readdata struct */
2871 rdata = cifs_readdata_alloc(npages,
2872 cifs_uncached_readv_complete);
>>> CID 1202537: Dereference after null check (FORWARD_NULL)
>>> Comparing "rdata" to null implies that "rdata" might be null.
2873 if (!rdata) {
2874 rc = -ENOMEM;
2875 goto error;
2876 }
2877
2878 rc = cifs_read_allocate_pages(rdata, npages);
...when we "goto error", rc will be non-zero, and then we end up trying
to do a kref_put on the rdata (which is NULL). Fix this by replacing
the "goto error" with a "break".
Reported-by: <scan-admin@coverity.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Problem reported in Red Hat bz 1040329 for strict writes where we cache
only when we hold oplock and write direct to the server when we don't.
When we receive an oplock break, we first change the oplock value for
the inode in cifsInodeInfo->oplock to indicate that we no longer hold
the oplock before we enqueue a task to flush changes to the backing
device. Once we have completed flushing the changes, we return the
oplock to the server.
There are 2 ways here where we can have data corruption
1) While we flush changes to the backing device as part of the oplock
break, we can have processes write to the file. These writes check for
the oplock, find none and attempt to write directly to the server.
These direct writes made while we are flushing from cache could be
overwritten by data being flushed from the cache causing data
corruption.
2) While a thread runs in cifs_strict_writev, the machine could receive
and process an oplock break after the thread has checked the oplock and
found that it allows us to cache and before we have made changes to the
cache. In that case, we end up with a dirty page in cache when we
shouldn't have any. This will be flushed later and will overwrite all
subsequent writes to the part of the file represented by this page.
Before making any writes to the server, we need to confirm that we are
not in the process of flushing data to the server and if we are, we
should wait until the process is complete before we attempt the write.
We should also wait for existing writes to complete before we process
an oplock break request which changes oplock values.
We add a version specific downgrade_oplock() operation to allow for
differences in the oplock values set for the different smb versions.
Cc: stable@vger.kernel.org
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":
fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast
Introduced by commit 7f25bba819 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.
Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.
This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...
O_APPEND handling there hadn't been completely fixed by Pavel's
patch; it checks the right value, but it's racy - we can't really
do that until i_mutex has been taken.
Fix by switching to __generic_file_aio_write() (open-coding
generic_file_aio_write(), actually) and pulling mutex_lock() above
inode_size_read().
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
filemap_map_pages() is generic implementation of ->map_pages() for
filesystems who uses page cache.
It should be safe to use filemap_map_pages() for ->map_pages() if
filesystem use filemap_fault() for ->fault().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... we are doing them on adjacent parts of file, so what happens is that
each subsequent call works to rebuild the iov_iter to exact state it
had been abandoned in by previous one. Just keep it through the entire
cifs_iovec_read(). And use copy_page_to_iter() instead of doing
kmap/copy_to_user/kunmap manually...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... by that point the request we'd just resent is in the
head of the list anyway. Just return to the beginning of
the loop body...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
and use generic_file_aio_write rather than __generic_file_aio_write
in cifs_writev.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull CIFS fixes from Steve French:
"Three cifs fixes, the most important fixing the problem with passing
bogus pointers with writev (CVE-2014-0069).
Two additional cifs fixes are still in review (including the fix for
an append problem which Al also discovered)"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix too big maxBuf size for SMB3 mounts
cifs: ensure that uncached writes handle unmapped areas correctly
[CIFS] Fix cifsacl mounts over smb2 to not call cifs
It's possible for userland to pass down an iovec via writev() that has a
bogus user pointer in it. If that happens and we're doing an uncached
write, then we can end up getting less bytes than we expect from the
call to iov_iter_copy_from_user. This is CVE-2014-0069
cifs_iovec_write isn't set up to handle that situation however. It'll
blindly keep chugging through the page array and not filling those pages
with anything useful. Worse yet, we'll later end up with a negative
number in wdata->tailsz, which will confuse the sending routines and
cause an oops at the very least.
Fix this by having the copy phase of cifs_iovec_write stop copying data
in this situation and send the last write as a short one. At the same
time, we want to avoid sending a zero-length write to the server, so
break out of the loop and set rc to -EFAULT if that happens. This also
allows us to handle the case where no address in the iovec is valid.
[Note: Marking this for stable on v3.4+ kernels, but kernels as old as
v2.6.38 may have a similar problem and may need similar fix]
Cc: <stable@vger.kernel.org> # v3.4+
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
When mounting with smb2/smb3 (e.g. vers=2.1) and cifsacl mount option,
it was trying to get the mode by querying the acl over the cifs
rather than smb2 protocol. This patch makes that protocol
independent and makes cifsacl smb2 mounts return a more intuitive
operation not supported error (until we add a worker function
for smb2_get_acl).
Note that a previous patch fixed getxattr/setxattr for the CIFSACL xattr
which would unconditionally call cifs_get_acl and cifs_set_acl (even when
mounted smb2). I made those protocol independent last week (new protocol
version operations "get_acl" and "set_acl" but did not add an
smb2_get_acl and smb2_set_acl yet so those now simply return EOPNOTSUPP
which at least is better than sending cifs requests on smb2 mount)
The previous patches did not fix the one remaining case though ie
mounting with "cifsacl" when getting mode from acl would unconditionally
end up calling "cifs_get_acl_from_fid" even for smb2 - so made that protocol
independent but to make that protocol independent had to make sure that the callers
were passing the protocol independent handle structure (cifs_fid) instead
of cifs specific _u16 network file handle (ie cifs_fid instead of cifs_fid->fid)
Now mount with smb2 and cifsacl mount options will return EOPNOTSUP (instead
of timing out) and a future patch will add smb2 operations (e.g. get_smb2_acl)
to enable this.
Signed-off-by: Steve French <smfrench@gmail.com>
Pull CIFS fixes from Steve French:
"Small fix from Jeff for writepages leak, and some fixes for ACLs and
xattrs when SMB2 enabled.
Am expecting another fix from Jeff and at least one more fix (for
mounting SMB2 with cifsacl) in the next week"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] clean up page array when uncached write send fails
cifs: use a flexarray in cifs_writedata
retrieving CIFS ACLs when mounted with SMB2 fails dropping session
Add protocol specific operation for CIFS xattrs
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
pos_before_write .. pos_before_write + written - 1
instead. Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().
All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().
The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In the event that a send fails in an uncached write, or we end up
needing to reissue it (-EAGAIN case), we'll kfree the wdata but
the pages currently leak.
Fix this by adding a new kref release routine for uncached writedata
that releases the pages, and have the uncached codepaths use that.
[original patch by Jeff modified to fix minor formatting problems]
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Rename CIFSSMBOpen to CIFS_open and make it take
cifs_open_parms structure as a parm.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Opens on current cifs/smb2/smb3 mounts with O_DIRECT flag fail
even when caching is disabled on the mount. This was
reported by those running SMB2 benchmarks who need to
be able to pass O_DIRECT on many of their open calls to
reduce caching effects, but would also be needed by other
applications.
When mounting with forcedirectio ("cache=none") cifs and smb2/smb3
do not go through the page cache and thus opens with O_DIRECT flag
should work (when posix extensions are negotiated we even are
able to send the flag to the server). This patch fixes that
in a simple way.
The 9P client has a similar situation (caching is often disabled)
and takes the same approach to O_DIRECT support ie works if caching
disabled, but if client caching enabled it fails with EINVAL.
A followon idea for a future patch as Pavel noted, could
be that files opened with O_DIRECT could cause us to change
inode->i_fop on the fly from
cifs_file_strict_ops
to
cifs_file_direct_ops
which would allow us to support this on non-forcedirectio mounts
(cache=strict and cache=loose) as well.
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
In cifs_readpages(), we may decide we don't want to read a page after all -
but the page may already have passed through fscache_read_or_alloc_pages() and
thus have marks and reservations set. Thus we have to call
fscache_readpages_cancel() or fscache_uncache_page() on the pages we're
returning to clear the marks.
NFS, AFS and 9P should be unaffected by this as they call read_cache_pages()
which does the cleanup for you.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
When reading a single page with cifs_readpage(), we make a call to
fscache_read_or_alloc_page() which once done, asynchronously calls
the completion function cifs_readpage_from_fscache_complete(). This
completion function unlocks the page once it has been populated from
cache. The module then attempts to unlock the page a second time in
cifs_readpage() which leads to warning messages.
In case of a successful call to fscache_read_or_alloc_page() we should skip
the second unlock_page() since this will be called by the
cifs_readpage_from_fscache_complete() once the page has been populated by
fscache.
With the modifications to cifs_readpage_worker(), we will need to re-grab the
page lock in cifs_write_begin().
The problem was first noticed when testing new fscache patches for cifs.
https://bugzilla.redhat.com/show_bug.cgi?id=1005737
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
We do not need to take a reference to the pagecache in
cifs_readpage_worker() since the calling function will have already
taken one before passing the pointer to the page as an argument to the
function.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
that force a client to purge cache pages when a server requests it.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
that prepare the code to handle different types of SMB2 leases.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Call generic_write_sync() from the deferred I/O completion handler if
O_DSYNC is set for a write request. Also make sure various callers
don't call generic_write_sync if the direct I/O code returns
-EIOCBQUEUED.
Based on an earlier patch from Jan Kara <jack@suse.cz> with updates from
Jeff Moyer <jmoyer@redhat.com> and Darrick J. Wong <darrick.wong@oracle.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In the cifs_reopen_file function, if the following statement is
asserted:
(tcon->unix_ext && cap_unix(tcon->ses) &&
(CIFS_UNIX_POSIX_PATH_OPS_CAP &
(tcon->fsUnixInfo.Capability)))
and we succeed to open with cifs_posix_open, the function jumps
to the label reopen_success and checks for oparms.reconnect
which is not initialized.
This issue has been reported by scan.coverity.com
Signed-off-by: Andi Shyti <andi@etezian.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
If we request reading or writing on a file that needs to be
reopened, it causes the deadlock: we are already holding rw
semaphore for reading and then we try to acquire it for writing
in cifs_relock_file. Fix this by acquiring the semaphore for
reading in cifs_relock_file due to we don't make any changes in
locks and don't need a write access.
CC: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
This is a follow-on patch for 8/8 patch from the durable handles
series. It fixes the problem when durable file handle timeout
expired on the server and reopen returns -ENOENT for such files.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
On reconnects, we need to reopen file and then obtain all byte-range
locks held by the client. SMB2 protocol provides feature to make
this process atomic by reconnecting to the same file handle
with all it's byte-range locks. This patch adds this capability
for SMB2 shares.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steven French <steven@steven-GA-970A-DS3.(none)>
Pull second set of VFS changes from Al Viro:
"Assorted f_pos race fixes, making do_splice_direct() safe to call with
i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
->d_hash/->d_compare calling conventions changes from Linus, misc
stuff all over the place."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
Document ->tmpfile()
ext4: ->tmpfile() support
vfs: export lseek_execute() to modules
lseek_execute() doesn't need an inode passed to it
block_dev: switch to fixed_size_llseek()
cpqphp_sysfs: switch to fixed_size_llseek()
tile-srom: switch to fixed_size_llseek()
proc_powerpc: switch to fixed_size_llseek()
ubi/cdev: switch to fixed_size_llseek()
pci/proc: switch to fixed_size_llseek()
isapnp: switch to fixed_size_llseek()
lpfc: switch to fixed_size_llseek()
locks: give the blocked_hash its own spinlock
locks: add a new "lm_owner_key" lock operation
locks: turn the blocked_list into a hashtable
locks: convert fl_link to a hlist_node
locks: avoid taking global lock if possible when waking up blocked waiters
locks: protect most of the file_lock handling with i_lock
locks: encapsulate the fl_link list handling
locks: make "added" in __posix_lock_file a bool
...
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.
->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.
Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.
For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the blocked_list is done without dropping the
lock in between.
On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.
With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.
Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
commit 66189be74 (CIFS: Fix VFS lock usage for oplocked files) exported
the locks_delete_block symbol. There's already an exported helper
function that provides this capability however, so make cifs use that
instead and turn locks_delete_block back into a static function.
Note that if fl->fl_next == NULL then this lock has already been through
locks_delete_block(), so we should be OK to ignore an ENOENT error here
and simply not retry the lock.
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.
Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).
This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.
We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.
Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
It's not obvious from reading the macro names that these macros
are for debugging. Convert the names to a single more typical
kernel style cifs_dbg macro.
cERROR(1, ...) -> cifs_dbg(VFS, ...)
cFYI(1, ...) -> cifs_dbg(FYI, ...)
cFYI(DBG2, ...) -> cifs_dbg(NOISY, ...)
Move the terminating format newline from the macro to the call site.
Add CONFIG_CIFS_DEBUG function cifs_vfs_err to emit the
"CIFS VFS: " prefix for VFS messages.
Size is reduced ~ 1% when CONFIG_CIFS_DEBUG is set (default y)
$ size fs/cifs/cifs.ko*
text data bss dec hex filename
265245 2525 132 267902 4167e fs/cifs/cifs.ko.new
268359 2525 132 271016 422a8 fs/cifs/cifs.ko.old
Other miscellaneous changes around these conversions:
o Miscellaneous typo fixes
o Add terminating \n's to almost all formats and remove them
from the macros to be more kernel style like. A few formats
previously had defective \n's
o Remove unnecessary OOM messages as kmalloc() calls dump_stack
o Coalesce formats to make grep easier,
added missing spaces when coalescing formats
o Use %s, __func__ instead of embedded function name
o Removed unnecessary "cifs: " prefixes
o Convert kzalloc with multiply to kcalloc
o Remove unused cifswarn macro
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
cifsFileInfo objects hold references to dentries and it is possible that
these will still be around in workqueues when VFS decides to kill super
block during unmount.
This results in panics like this one:
BUG: Dentry ffff88001f5e76c0{i=66b4a,n=1M-2} still in use (1) [unmount of cifs cifs]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:943!
[..]
Process umount (pid: 1781, threadinfo ffff88003d6e8000, task ffff880035eeaec0)
[..]
Call Trace:
[<ffffffff811b44f3>] shrink_dcache_for_umount+0x33/0x60
[<ffffffff8119f7fc>] generic_shutdown_super+0x2c/0xe0
[<ffffffff8119f946>] kill_anon_super+0x16/0x30
[<ffffffffa036623a>] cifs_kill_sb+0x1a/0x30 [cifs]
[<ffffffff8119fcc7>] deactivate_locked_super+0x57/0x80
[<ffffffff811a085e>] deactivate_super+0x4e/0x70
[<ffffffff811bb417>] mntput_no_expire+0xd7/0x130
[<ffffffff811bc30c>] sys_umount+0x9c/0x3c0
[<ffffffff81657c19>] system_call_fastpath+0x16/0x1b
Fix this by making each cifsFileInfo object hold a reference to cifs
super block, which implicitly keeps VFS super block around as well.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-and-Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Samba rejected libreoffice's attempt to open a file with illegal
O_EXCL (without O_CREAT). Mask this flag off (as the local
linux file system case does) for this case, so that we
don't have disable Unix Extensions unnecessarily due to
the Samba error (Samba server is also being fixed).
See https://bugzilla.samba.org/show_bug.cgi?id=9519
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhnacements to the user
namespace. reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.
I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support
enabled you will need to enable memory control groups.
There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.
The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.
XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before it they are ready."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...
Use INVALID_UID and INVALID_GID instead of NO_CHANGE_64 to indicate
the value should not be changed.
In cifs_fill_unix_set_info convert from kuids and kgids into uids and
gids that will fit in FILE_UNIX_BASIC_INFO.
Cc: Steve French <smfrench@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
If we have mandatory byte-range locks on a file we can't cache reads
because pagereading may have conflicts with these locks on the server.
That's why we should allow level2 oplocks for files without mandatory
locks only.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
If we have a read oplock and set a read lock in it, we can't write to the
locked area - so, filemap_fdatawrite may fail with a no information for a
userspace application even if we request a write to non-locked area. Fix
this by writing directly to the server and then breaking oplock level from
level2 to None.
Also remove CONFIG_CIFS_SMB2 ifdefs because it's suitable for both CIFS
and SMB2 protocols.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
that solution has data races and can end up two identical writes to the
server: when clientCanCacheAll value can be changed during the execution
of __generic_file_aio_write.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
If we have a read oplock and set a read lock in it, we can't write to the
locked area - so, filemap_fdatawrite may fail with a no information for a
userspace application even if we request a write to non-locked area. Fix
this by populating the page cache without marking affected pages dirty
after a successful write directly to the server.
Also remove CONFIG_CIFS_SMB2 ifdefs because it's suitable for both CIFS
and SMB2 protocols.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
by using cifs_invalidate_mapping rather than invalidate_remote_inode
in cifs_oplock_break - this invalidates all inode pages and resets
fscache cookies.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
We don't need to permit a write to the area locked with a read lock
by any process including the process that issues the write.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
If we netogiate mandatory locking style, have a read lock and try
to set a write lock we end up with a write lock in vfs cache and
no lock in cifs lock cache - that's wrong. Fix it by returning
from cifs_setlk immediately if a error occurs during setting a lock.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
that reacquires byte-range locks when a file is reopened.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Commit eddb079deb created a regression in the writepages codepath.
Previously, whenever it needed to check the size of the file, it did so
by consulting the inode->i_size field directly. With that patch, the
i_size was fetched once on entry into the writepages code and that value
was used henceforth.
If the file is changing size though (for instance, if someone is writing
to it or has truncated it), then that value is likely to be wrong. This
can lead to data corruption. Pages past the EOF at the time that the
writepages call was issued may be silently dropped and ignored because
cifs_writepages wrongly assumes that the file must have been truncated
in the interim.
Fix cifs_writepages to properly fetch the size from the inode->i_size
field instead to properly account for this possibility.
Original bug report is here:
https://bugzilla.kernel.org/show_bug.cgi?id=50991
Reported-and-Tested-by: Maxim Britov <ungifted01@gmail.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().
Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.
Now device drivers can implement this method and obtain nonlinear vma support.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FL_CLOSE is quite common when you close a file on which you hold a
lock. The spurious "Unknown lock flags" message in cFYI is
confusing in this case.
Reported-by: Alexander Bokovoy <abokovoy@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Now we walk though cifsFileInfo's list for every incoming lease
break and look for an equivalent there. That approach misses lease
breaks that come just after an open response - we don't have time
to populate new cifsFileInfo structure to the list. Fix this by
adding new list of pending opens and look for a lease there if we
didn't find it in the list of cifsFileInfo structures.
Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When we have a file opened with read oplock and we are writing a data
to this file, we need to store the data in the cache and then send to
the server to ensure that the next read operation will get a coherent
data.
Also mark it as CONFIG_CIFS_SMB2 because it's more suitable for SMB2
code but can fix some CIFS problems too (when server delays sending
an oplock break after a write request). We can drop this ifdefs
dependence in future.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Currently CIFS code accept read/write ops on mandatory locked area
when two processes use the same file descriptor - it's wrong.
Fix this by serializing io and brlock operations on the inode.
Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
and allow several processes to walk through the lock list and read
can_cache_brlcks value if they are not going to modify them.
Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Now we need to lock/unlock a spinlock while processing brlock ops
on the inode. Move brlocks of a fid to a separate list and attach
all such lists to the inode. This let us not hold a spinlock.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Replace the "marshal_iov" function with a "read_into_pages" function.
That function will copy the read data off the socket and into the
pages array, kmapping and reading pages one at a time.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
We'll need an array to put into a smb_rqst, so convert this into an array
instead of (ab)using the lru list_head.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Eventually, we're going to want to append a list of pages to
cifs_readdata instead of a list of kvecs. To prepare for that, turn
the kvec array allocation into a separate one and just keep a
pointer to it in the readdata.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
This is help us to extend the code for future protocols that can use
another fid mechanism (as SMB2 that has it divided into two parts:
persistent and violatile).
Also rename variables and refactor the code around the changes.
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
The readpages bug is a regression that was introduced in 6993f74a5.
This also fixes a couple of similar bugs in the uncached read and write
codepaths.
Also, prevent this sort of thing in the future by having cifsFileInfo_get
take the spinlock itself, and adding a _locked variant for use in places
that are already holding the lock. The _put code has always done that
so this makes for a less confusing interface.
Cc: <stable@vger.kernel.org> # 3.5.x
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Since both CIFS and SMB2 use ses->capabilities (server->capabilities)
field but flags are different we should make such checks protocol
independent.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
struct file_lock is pretty large, so we really don't want that on the
stack in a potentially long call chain. Reorganize the arguments to
CIFSSMBPosixLock to eliminate the need for that.
Eliminate the get_flag and simply use a non-NULL pLockInfo to indicate
that this is a "get" operation. In order to do that, need to add a new
loff_t argument for the start_offset.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Those macros add a newline on their own, so there's not any need to
embed one in the message itself.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
when cifs_reconnect sets maxBuf to 0 and we try to calculate a size
of memory we need to store locks.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
While traversing the linked list of open file handles, if the identfied
file handle is invalid, a reopen is attempted and if it fails, we
resume traversing where we stopped and cifs can oops while accessing
invalid next element, for list might have changed.
So mark the invalid file handle and attempt reopen if no
valid file handle is found in rest of the list.
If reopen fails, move the invalid file handle to the end of the list
and start traversing the list again from the begining.
Repeat this four times before giving up and returning an error if
file reopen keeps failing.
Cc: <stable@vger.kernel.org>
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
CIFS brlock cache can be used by several file handles if we have a
write-caching lease on the file that is supported by SMB2 protocol.
Prepate the code to handle this situation correctly by sorting brlocks
by a fid to easily push them in portions when lease break comes.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Convert cifs_iovec_read to use async I/O. This also raises the limit on
the rsize for uncached reads. We first allocate a set of pages to hold
the replies, then issue the reads in parallel and then collect the
replies and copy the results into the iovec.
A possible future optimization would be to kmap and inline the iovec
buffers and read the data directly from the socket into that. That would
require some rather complex conversion of the iovec into a kvec however.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
This isn't strictly necessary for the async readpages code, but the
uncached version will need to be able to collect the replies after
issuing the calls. Add a kref to cifs_readdata and use change the
code to take and put references appropriately.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cached and uncached reads will need to do different things here to
handle the difference when the pages are in pagecache and not. Abstract
out the function that marshals the page list into a kvec array.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
We'll need different completion routines for an uncached read. Allow
the caller to set the one he needs at allocation time. Also, move
most of these functions to file.c so we can make more of them static.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
In the recent update of the cifs_iovec_write code to use async writes,
the handling of the file position was broken. That patch added a local
"offset" variable to handle the offset, and then only updated the
original "*poffset" before exiting.
Unfortunately, it copied off the original offset from the beginning,
instead of doing so after generic_write_checks had been called. Fix
this by moving the initialization of "offset" after that in the
function.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We can deadlock if we have a write oplock and two processes
use the same file handle. In this case the first process can't
unlock its lock if the second process blocked on the lock in the
same time.
Fix it by using posix_lock_file rather than posix_lock_file_wait
under cinode->lock_mutex. If we request a blocking lock and
posix_lock_file indicates that there is another lock that prevents
us, wait untill that lock is released and restart our call.
Cc: stable@kernel.org
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We can deadlock if we have a write oplock and two processes
use the same file handle. In this case the first process can't
unlock its lock if another process blocked on the lock in the
same time.
Fix this by removing lock_mutex protection from waiting on a
blocked lock and protect only posix_lock_file call.
Cc: stable@kernel.org
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_update_eof has the potential to be racy if multiple threads are
trying to modify it at the same time. Protect modifications of the
server_eof value with the inode->i_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
We'll need to do something a bit different depending on the caller.
Abstract the code that marshals the page array into an iovec.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Use DIV_ROUND_UP. Also, PAGE_SIZE is more appropriate here since these
aren't pagecache pages.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
The gfp flags are currently set to __GPF_HIGHMEM, which doesn't allow
for any reclaim. Make this more resilient by or'ing that with
GFP_KERNEL. Also, get rid of the goto and unify the exit codepath.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Pavel Shilovsky <piastry@etersoft.ru>
We'll need a different set of write completion ops when not writing out
of the pagecache.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
We'll need this to handle rwpidforward option correctly when we use
async writes in the aio_write op.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Some servers sets this value less than 50 that was hardcoded and
we lost the connection if when we exceed this limit. Fix this by
respecting this value - not sending more than the server allows.
Cc: stable@kernel.org
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <stevef@smf-gateway.(none)>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Steve French <stevef@smf-gateway.(none)>
Reorganize the code to make the memory already allocated before
spinlock'ed loop.
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Reorder parms of cifs_lock_init, trivially simplify getlk code and
remove extra {} in cifs_lock_add_if.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Now we allocate a lock structure at first, then we request to the server
and save the lock if server returned OK though void function - it prevents
the situation when we locked a file on the server and then return -ENOMEM
from setlk.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
to prevent the mutex being unlocked twice if we interrupt a blocked lock.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
that reduces a traffic and increases a performance.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
that reduces a traffic and increases a performance.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
to handle all lock requests on the client in an exclusive oplock case.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
If we have an oplock and negotiate mandatory locking style we handle
all brlock requests on the client.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently we cap the rsize at a value that fits in CIFSMaxBufSize. That's
not needed any longer for readpages. Allow the use of larger values for
readpages. cifs_iovec_read and cifs_read however are still limited to the
CIFSMaxBufSize. Make sure they don't exceed that.
The patch also changes the rsize defaults. The default when unix
extensions are enabled is set to 1M for parity with the wsize, and there
is a hard cap of ~16M.
When unix extensions are not enabled, the default is set to 60k. According
to MS-CIFS, Windows servers can only send a max of 60k at a time, so
this is more efficient than requesting a larger size. If the user wishes
however, the max can be extended up to 128k - the length of the READ_RSP
header.
Really old servers however require a special hack to ensure that we don't
request too large a read.
Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Now that we have code in place to do asynchronous reads, convert
cifs_readpages to use it. The new cifs_readpages walks the page_list
that gets passed in, locks and adds the pages to the pagecache and
sets up cifs_readdata to handle the reads.
The rest is handled by the cifs_async_readv infrastructure.
Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Smatch complains that the cast to "int" in min_t() changes very large
values of current_read_size into negative values and so min_t()
could return the wrong value. I removed the const as well, as that
doesn't do anything here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
that let us do local lock checks before requesting to the server.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Split cifs_lock into several functions and let CIFSSMBLock get pid
as an argument.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Add mount options backupuid and backugid.
It allows an authenticated user to access files with the intent to back them
up including their ACLs, who may not have access permission but has
"Backup files and directories user right" on them (by virtue of being part
of the built-in group Backup Operators.
When mount options backupuid is specified, cifs client restricts the
use of backup intents to the user whose effective user id is specified
along with the mount option.
When mount options backupgid is specified, cifs client restricts the
use of backup intents to the users whose effective user id belongs to the
group id specified along with the mount option.
If an authenticated user is not part of the built-in group Backup Operators
at the server, access to such files is denied, even if allowed by the client.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Thus spake Jeff Layton:
"Making that a module parm would allow you to set that parameter at boot
time without needing to add special startup scripts. IMO, all of the
procfile "switches" under /proc/fs/cifs should be module parms
instead."
This patch doesn't alter the default behavior (Oplocks are enabled by
default).
To disable oplocks when loading the module, use
modprobe cifs enable_oplocks=0
(any of '0' or 'n' or 'N' conventions can be used).
To disable oplocks at runtime using the new interface, use
echo 0 > /sys/module/cifs/parameters/enable_oplocks
The older /proc/fs/cifs/OplockEnabled interface will be deprecated
after two releases. A subsequent patch will add an warning message
about this deprecation.
Changes since v2:
- make enable_oplocks a 'bool'
Changes since v1:
- eliminate the use of extra variable by renaming the old one to
enable_oplocks and make it an 'int' type.
Reported-by: Alexander Swen <alex@swen.nu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
server->maxBuf is the maximum SMB size (including header) that the
server can handle. CIFSMaxBufSize is the maximum amount of data (sans
header) that the client can handle. Currently maxBuf is being capped at
CIFSMaxBufSize + the max headers size, and the two values are used
somewhat interchangeably in the code.
This makes little sense as these two values are not related at all.
Separate them and make sure the code uses the right values in the right
places.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently, we take a sb->s_active reference and a cifsFileInfo reference
when an oplock break workqueue job is queued. This is unnecessary and
more complicated than it needs to be. Also as Al points out,
deactivate_super has non-trivial locking implications so it's best to
avoid that if we can.
Instead, just cancel any pending oplock breaks for this filehandle
synchronously in cifsFileInfo_put after taking it off the lists.
That should ensure that this job doesn't outlive the structures it
depends on.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
isofs: Remove global fs lock
jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
mm/truncate.c: fix build for CONFIG_BLOCK not enabled
fs:update the NOTE of the file_operations structure
Remove dead code in dget_parent()
AFS: Fix silly characters in a comment
switch d_add_ci() to d_splice_alias() in "found negative" case as well
simplify gfs2_lookup()
jfs_lookup(): don't bother with . or ..
get rid of useless dget_parent() in btrfs rename() and link()
get rid of useless dget_parent() in fs/btrfs/ioctl.c
fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
drivers: fix up various ->llseek() implementations
fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
Ext4: handle SEEK_HOLE/SEEK_DATA generically
Btrfs: implement our own ->llseek
fs: add SEEK_HOLE and SEEK_DATA flags
reiserfs: make reiserfs default to barrier=flush
...
Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add rwpidforward mount option that switches on a mode when we forward
pid of a process who opened a file to any read and write operation.
This can prevent applications like WINE from failing on read or write
operation on a previously locked file region from the same netfd from
another process if we use mandatory brlock style.
It is actual for WINE because during a run of WINE program two processes
work on the same netfd - share the same file struct between several VFS
fds:
1) WINE-server does open and lock;
2) WINE-application does read and write.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We need it to make them work with mandatory locking style because
we can fail in a situation like when kernel need to flush dirty pages
and there is a lock held by a process who opened file.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Have cifs_writepages issue asynchronous writes instead of waiting on
each write call to complete before issuing another. This also allows us
to return more quickly from writepages. It can just send out all of the
I/Os and not wait around for the replies.
In the WB_SYNC_ALL case, if the write completes with a retryable error,
then the completion workqueue job will resend the write.
This also changes the page locking semantics a little bit. Instead of
holding the page lock until the response is received, release it after
doing the send. This will reduce contention for the page lock and should
prevent processes that have the file mmap'ed from being blocked
unnecessarily.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Simplify many places when we call cifs_revalidate/invalidate to make
it do what it exactly needs.
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Recently introduced strictcache mode brought a new code that can be
efficiently used by directio part. That's let us add vectored operations
and break unnecessary cifs_user_read and cifs_user_write.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Add this let us drop filemap_write_and_wait from cifs_invalidate_mapping
and simplify the code to properly process invalidate logic.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This is more or less the same patch as before, but with some merge
conflicts fixed up.
If a process has a dirty page mapped into its page tables, then it has
the ability to change it while the client is trying to write the data
out to the server. If that happens after the signature has been
calculated then that signature will then be wrong, and the server will
likely reset the TCP connection.
This patch adds a page_mkwrite handler for CIFS that simply takes the
page lock. Because the page lock is held over the life of writepage and
writepages, this prevents the page from becoming writeable until
the write call has completed.
With this, we can also remove the "sign_zero_copy" module option and
always inline the pages when writing.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This flag currently only affects whether we allow "zero-copy" writes
with signing enabled. Typically we map pages in the pagecache directly
into the write request. If signing is enabled however and the contents
of the page change after the signature is calculated but before the
write is sent then the signature will be wrong. Servers typically
respond to this by closing down the socket.
Still, this can provide a performance benefit so the "Experimental" flag
was overloaded to allow this. That's really not a good place for this
option however since it's not clear what that flag does.
Move that flag instead to a new module parameter that better describes
its purpose. That's also better since it can be set at module insertion
time by configuring modprobe.d.
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_close doesn't check that the filp->private_data is non-NULL before
trying to put it. That can cause an oops in certain error conditions
that can occur on open or lookup before the private_data is set.
Reported-by: Ben Greear <greearb@candelatech.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Variable 'i' should be unsigned long as it's used in circle with num_pages,
and bytes_read/total_written should be ssize_t according to return value.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
New compiler warnings that I noticed when building a patchset based
on recent Fedora kernel:
fs/cifs/cifssmb.c: In function 'CIFSSMBSetFileSize':
fs/cifs/cifssmb.c:4813:8: warning: variable 'data_offset' set but not used
[-Wunused-but-set-variable]
fs/cifs/file.c: In function 'cifs_open':
fs/cifs/file.c:349:24: warning: variable 'pCifsInode' set but not used
[-Wunused-but-set-variable]
fs/cifs/file.c: In function 'cifs_partialpagewrite':
fs/cifs/file.c:1149:23: warning: variable 'cifs_sb' set but not used
[-Wunused-but-set-variable]
fs/cifs/file.c: In function 'cifs_iovec_write':
fs/cifs/file.c:1740:9: warning: passing argument 6 of 'CIFSSMBWrite2' from
incompatible pointer type [enabled by default]
fs/cifs/cifsproto.h:337:12: note: expected 'unsigned int *' but argument is
of type 'size_t *'
fs/cifs/readdir.c: In function 'cifs_readdir':
fs/cifs/readdir.c:767:23: warning: variable 'cifs_sb' set but not used
[-Wunused-but-set-variable]
fs/cifs/cifs_dfs_ref.c: In function 'cifs_dfs_d_automount':
fs/cifs/cifs_dfs_ref.c:342:2: warning: 'rc' may be used uninitialized in
this function [-Wuninitialized]
fs/cifs/cifs_dfs_ref.c:278:6: note: 'rc' was declared here
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
If we don't have Exclusive oplock we write a data to the server.
Also set invalidate_mapping flag on the inode if we wrote something
to the server. Add cifs_iovec_write to let the client write iovec
buffers through CIFSSMBWrite2.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Read from the cache if we have at least Level II oplock - otherwise
read from the server. Add cifs_user_readv to let the client read into
iovec buffers.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Invalidate inode mapping if we don't have at least Level II oplock.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Invalidate inode mapping if we don't have at least Level II oplock in
cifs_strict_fsync. Also remove filemap_write_and_wait call from cifs_fsync
because it is previously called from vfs_fsync_range. Add file operations'
structures for strict cache mode.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
On strict cache mode when we close the last file handle of the inode we
should set invalid_mapping flag on this inode to prevent data coherency
problem when we open it again but it has been modified on the server.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Since we don't time out individual requests anymore, remove the code
that we used to use for setting timeouts on different requests.
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
If CIFSSMBWrite2 returns -EAGAIN, then the error should be considered
temporary. CIFS should retry the write instead of setting an error on
the mapping and returning.
For WB_SYNC_ALL, just retry the write immediately. In the WB_SYNC_NONE
case, call redirty_page_for_writeback on all of the pages that didn't
get written out and then move on.
Also, fix up the handling of a short write with a successful return
code. MS-CIFS says that 0 bytes_written means ENOSPC or EFBIG. It
doesn't mention what a short, but non-zero write means, so for now
treat it as we would an -EAGAIN return.
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When we get oplock break notification we should set the appropriate
value of OplockLevel field in oplock break acknowledge according to
the oplock level held by the client in this time. As we only can have
level II oplock or no oplock in the case of oplock break, we should be
aware only about clientCanCacheRead field in cifsInodeInfo structure.
Also fix bug connected with wrong interpretation of OplockLevel field
during oplock break notification processing.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We call CIFSSMBUnixSetPathInfo in these functions, but we have a
filehandle since an open was just done. Switch these functions to
use CIFSSMBUnixSetFileInfo instead.
In practice, these codepaths are only used if posix opens are broken.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Make the code more general for use in posix and non-posix open.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Delete cifs_open_inode_helper and move non-posix open related things
to cifs_nt_open function.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Some of the code under CONFIG_CIFS_ACL is dependent upon code under
CONFIG_CIFS_EXPERIMENTAL, but the Kconfig options don't reflect that
dependency. Move more of the ACL code out from under
CONFIG_CIFS_EXPERIMENTAL and under CONFIG_CIFS_ACL.
Also move find_readable_file out from other any sort of Kconfig
option and make it a function normally compiled in.
Reported-and-Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cfile may very well be freed after the cifsFileInfo_put. Make sure we
have a valid pointer to the superblock for cifs_sb_deactive.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
All the callers already have a pointer to struct cifsInodeInfo. Use it.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch is based on Dan's original patch. His original description is
below:
Smatch complained about a couple checking for NULL after dereferencing
bugs. I'm not super familiar with the code so I did the conservative
thing and move the dereferences after the checks.
The dereferences in cifs_lock() and cifs_fsync() were added in
ba00ba64cf "cifs: make various routines use the cifsFileInfo->tcon
pointer". The dereference in find_writable_file() was added in
6508d904e6 "cifs: have find_readable/writable_file filter by fsuid".
The comments there say it's possible to trigger the NULL dereference
under stress.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Simplify many places when we need to set oplock level on an inode.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Stanse found that pSMBFile in cifs_ioctl and file->f_path.dentry in
cifs_user_write are dereferenced prior their test to NULL.
The alternative is not to dereference them before the tests. The patch is
to point out the problem, you have to decide.
While at it we cache the inode in cifs_user_write to a local variable
and use all over the function.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: Cleanup and thus reduce smb session structure and fields used during authentication
NTLM auth and sign - Use appropriate server challenge
cifs: add kfree() on error path
NTLM auth and sign - minor error corrections and cleanup
NTLM auth and sign - Use kernel crypto apis to calculate hashes and smb signatures
NTLM auth and sign - Define crypto hash functions and create and send keys needed for key exchange
cifs: cifs_convert_address() returns zero on error
NTLM auth and sign - Allocate session key/client response dynamically
cifs: update comments - [s/GlobalSMBSesLock/cifs_file_list_lock/g]
cifs: eliminate cifsInodeInfo->write_behind_rc (try #6)
[CIFS] Fix checkpatch warnings and bump cifs version number
cifs: wait for writeback to complete in cifs_flush
cifs: convert cifsFileInfo->count to non-atomic counter
We leak 256 bytes here on this error path.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This removes more dead code that was somehow missed by commit 0d99519efe
(writeback: remove unused nonblocking and congestion checks). There are
no behavior change except for the removal of two entries from one of the
ext4 tracing interface.
The nonblocking checks in ->writepages are no longer used because the
flusher now prefer to block on get_request_wait() than to skip inodes on
IO congestion. The latter will lead to more seeky IO.
The nonblocking checks in ->writepage are no longer used because it's
redundant with the WB_SYNC_NONE check.
We no long set ->nonblocking in VM page out and page migration, because
a) it's effectively redundant with WB_SYNC_NONE in current code
b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
that would skip some dirty inodes on congestion and page out others, which
is unfair in terms of LRU age.
Inspired by Christoph Hellwig. Thanks!
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: Steve French <sfrench@samba.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
GlobalSMBSesLock is now cifs_file_list_lock. Update comments to reflect this.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
write_behind_rc is redundant and just adds complexity to the code. What
we really want to do instead is to use mapping_set_error to reset the
flags on the mapping when we find a writeback error and can't report it
to userspace yet.
For cifs_flush and cifs_fsync, we shouldn't reset the flags since errors
returned there do get reported to userspace.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The f_op->flush operation is the last chance to return a writeback
related error when closing a file. Ensure that we don't miss reporting
any errors by waiting for writeback to complete in cifs_flush before
proceeding.
There's no reason to do this when the file isn't open for write
however.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: David Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The count for cifsFileInfo is currently an atomic, but that just adds
complexity for little value. We generally need to hold cifs_file_list_lock
to traverse the lists anyway so we might as well make this counter
non-atomic and simply use the cifs_file_list_lock to protect it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Now that it's feasible for a cifsFileInfo to outlive the filp under
which it was created, move the close processing into cifsFileInfo_put.
This means that the last user of the filehandle always does the actual
on the wire close call. This also allows us to get rid of the closePend
flag from cifsFileInfo. If we have an active reference to the file
then it's never going to have a close pending.
cifs_close is converted to simply put the filehandle.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
...and make it non-inlined in preparation for the move of most of
cifs_close to it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Convert this lock to a regular spinlock
A rwlock_t offers little value here. It's more expensive than a regular
spinlock unless you have a fairly large section of code that runs under
the read lock and can benefit from the concurrency.
Additionally, we need to ensure that the refcounting for files isn't
racy and to do that we need to lock areas that can increment it for
write. That means that the areas that can actually use a read_lock are
very few and relatively infrequently used.
While we're at it, change the name to something easier to type, and fix
a bug in find_writable_file. cifsFileInfo_put can sleep and shouldn't be
called while holding the lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
It's currently in dir.c which makes little sense...
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
All the remaining users of cifsFileInfo->pfile just use it to get
at the f_flags/f_mode. Now that we store that separately in the
cifsFileInfo, there's no need to consult the pfile at all from
a cifsFileInfo pointer.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Have cifs_write take a cifsFileInfo pointer instead of a filp. Since
cifsFileInfo holds references on the dentry, and that holds one to
the inode, we can eliminate some unneeded NULL pointer checks.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Add a f_flags field that holds the f_flags field from the filp. We'll
need this info in case the filp ever goes away before the cifsFileInfo
does. Have cifs_reopen_file use that value instead of filp->f_flags
too and have it take a cifsFileInfo arg instead of a filp.
While we're at it, get rid of some bogus cargo-cult NULL pointer
checks in that function and reduce the level of indentation.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
It already takes a file pointer. The inode associated with that had damn
well better be the same one we're passing in anyway. Thus, there's no
need for a separate argument here.
Also, get rid of the bogus check for a null pCifsInode pointer. The
CIFS_I macro uses container_of(), and that will virtually never return a
NULL pointer anyway.
Finally, move the setting of the canCache* flags outside of the lock.
Other places in the code don't hold that lock when setting it, so I
assume it's not really needed here either.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Eliminate the poor, misunderstood "oflags" option from cifs_new_fileinfo.
The callers mostly pass in the filp->f_flags here.
That's not correct however since we're checking that value for
the presence of FMODE_READ. Luckily that only affects how the f_list is
ordered. What it really wants here is the file->f_mode. Just use that
field from the filp to determine it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The way flags are passed and converted for cifs_posix_open is rather
non-sensical. Some callers call cifs_posix_convert_flags on the flags
before they pass them to cifs_posix_open, whereas some don't. Two flag
conversion steps is just confusing though.
Change the function instead to clearly expect input in f_flags format,
and fix the callers to pass that in. Then, have cifs_posix_open call
cifs_convert_posix_flags to do the conversion. Move cifs_posix_open to
file.c as well so we can keep cifs_convert_posix_flags as a static
function.
Fix it also to not ignore O_CREAT, O_EXCL and O_TRUNC, and instead have
cifs_reopen_file mask those bits off before calling cifs_posix_open.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs: eliminate cifs_posix_open_inode_helper
This function is redundant. The only thing it does is set the canCache
flags, but those get set in cifs_new_fileinfo anyway.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Filesystems aren't really supposed to do anything with a vfsmount. It's
considered a layering violation since vfsmounts are entirely managed at
the VFS layer.
CIFS currently keeps an active reference to a vfsmount in order to
prevent the superblock vanishing before an oplock break has completed.
What we really want to do instead is to keep sb->s_active high until the
oplock break has completed. This patch borrows the scheme that NFS uses
for handling sillyrenames.
An atomic_t is added to the cifs_sb_info. When it transitions from 0 to
1, an extra reference to the superblock is taken (by bumping the
s_active value). When it transitions from 1 to 0, that reference is
dropped and a the superblock teardown may proceed if there are no more
references to it.
Also, the vfsmount pointer is removed from cifsFileInfo and from
cifs_new_fileinfo, and some bogus forward declarations are removed from
cifsfs.h.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifsFileInfo is a bit problematic. It contains a reference back to the
struct file itself. This makes it difficult for a cifsFileInfo to exist
without a corresponding struct file.
It would be better instead of the cifsFileInfo just held info pertaining
to the open file on the server instead without any back refrences to the
struct file. This would allow it to exist after the filp to which it was
originally attached was closed.
Much of the use of the file pointer in this struct is to get at the
dentry. Begin divorcing the cifsFileInfo from the struct file by
keeping a reference to the dentry. Since the dentry will have a
reference to the inode, we can eliminate the "pInode" field too and
convert the igrab/iput to dget/dput.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>