Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems, like ext3 and ocfs2, can apparently drop taking the i_mutex
altogether. For correctness' sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This converts everybody to handle SEEK_HOLE/SEEK_DATA properly. In some cases
we just return -EINVAL, in others we do the normal generic thing, and in others
we're simply making sure that the proper due diligence is done. For example,
in NFS/CIFS we need to make sure the file size is updated properly for the
SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself
that is all we have to do. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
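As a rough illustration of "the normal generic thing" mentioned in the
SEEK_HOLE/SEEK_DATA commit above, here is a userspace model of how a
filesystem without hole tracking can answer both requests: the whole file is
treated as data, with a single virtual hole at end of file. The function name
and the negative-errno return convention are assumptions made for this sketch;
it is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only if the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical model: every byte below `size` is data and the only hole
 * starts at `size`. Returns the new offset or a negative errno.
 */
off_t model_seek_hole_data(off_t offset, off_t size, int whence)
{
    if (whence != SEEK_DATA && whence != SEEK_HOLE)
        return -EINVAL;
    /* Nothing to find at or beyond end of file. */
    if (offset < 0 || offset >= size)
        return -ENXIO;
    return whence == SEEK_DATA ? offset : size;
}
```

A filesystem that does track holes would replace the last line with a real
extent lookup; one that cannot support the operation at all would return
-EINVAL for both whence values.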
If a fuse dev connection is broken, wake up any
processes that are blocking, in a poll system call,
on one of the files in the now defunct filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Single threaded NTFS-3G could get stuck if a delayed RELEASE reply
triggered a DESTROY request via path_put().
Fix this by
a) making RELEASE requests synchronous, whenever possible, on fuseblk
filesystems
b) if not possible (triggered by an asynchronous read/write) then do
the path_put() in a separate thread with schedule_work().
Reported-by: Oliver Neukum <oneukum@suse.de>
Cc: stable@kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
In kernel ABI version 7.16 and later, a FUSE_IOCTL_RETRY reply from an
unrestricted IOCTL request shall return with an array of 'struct
fuse_ioctl_iovec' instead of 'struct iovec'. This fixes the ABI
ambiguity of 32bit vs. 64bit.
Reported-by: "ccmail111" <ccmail111@yahoo.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Tejun Heo <tj@kernel.org>
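To make the 32bit vs. 64bit point of the commit above concrete, the sketch
below re-declares a two-field, fixed-width structure for illustration and
converts it to the native 'struct iovec'. The helper wire_to_iovec() is a
hypothetical name, and the layout shown (two 64-bit fields) is what the commit
text implies rather than a copy of the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Same size on 32-bit and 64-bit builds: two fixed-width fields. */
struct fuse_ioctl_iovec {
    uint64_t base;
    uint64_t len;
};

_Static_assert(sizeof(struct fuse_ioctl_iovec) == 16,
               "wire format must not depend on pointer width");

/* Hypothetical helper: convert one wire entry into a native struct iovec. */
int wire_to_iovec(const struct fuse_ioctl_iovec *src, struct iovec *dst)
{
    /* Reject values that do not fit the native pointer/size types. */
    if (src->base > UINTPTR_MAX || src->len > SIZE_MAX)
        return -1;
    dst->iov_base = (void *)(uintptr_t)src->base;
    dst->iov_len = (size_t)src->len;
    return 0;
}
```

Because 'struct iovec' holds a pointer and a size_t, its size changes with the
ABI, while the fixed-width structure does not.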
Verify that the total length of the iovec returned in FUSE_IOCTL_RETRY
doesn't overflow iov_length().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Tejun Heo <tj@kernel.org>
CC: <stable@kernel.org> [2.6.31+]
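One way to perform the kind of check described in the commit above is to
compare each length against the remaining budget, so that the running sum can
never wrap. This is a userspace sketch; iov_total_checked() and the
caller-supplied limit `max` are assumptions, not the kernel's helper.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: sum the iovec lengths, failing instead of wrapping.
 * `max` is an assumed per-request limit chosen by the caller.
 * Returns 0 and stores the sum in *total, or -1 if the limit is exceeded.
 */
int iov_total_checked(const struct iovec *iov, size_t count, size_t max,
                      size_t *total)
{
    size_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        /* sum <= max always holds here, so max - sum cannot underflow. */
        if (iov[i].iov_len > max - sum)
            return -1;
        sum += iov[i].iov_len;
    }
    *total = sum;
    return 0;
}
```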
If a 32bit CUSE server is run on 64bit this results in EIO being
returned to the caller.
The reason is that FUSE_IOCTL_RETRY reply was defined to use 'struct
iovec', which is different on 32bit and 64bit archs.
Work around this by looking at the size of the reply to determine
which struct was used. This is only needed if CONFIG_COMPAT is
defined.
A more permanent fix for the interface will be to use the same struct
on both 32bit and 64bit.
Reported-by: "ccmail111" <ccmail111@yahoo.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Tejun Heo <tj@kernel.org>
CC: <stable@kernel.org> [2.6.31+]
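The size-based workaround in the commit above can be modelled roughly as
follows. The two structures and guess_iov_layout() are illustrative stand-ins
invented for this sketch; the real code works on the kernel's own compat
types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative models of the two iovec layouts a server may have sent. */
struct iovec32 { uint32_t iov_base; uint32_t iov_len; };  /* 32-bit ABI */
struct iovec64 { uint64_t iov_base; uint64_t iov_len; };  /* 64-bit ABI */

enum iov_layout { IOV_LAYOUT_INVALID, IOV_LAYOUT_32, IOV_LAYOUT_64 };

/*
 * Hypothetical helper: infer the layout from the payload size alone.
 * Assumes nr_iovs is small enough that the products cannot overflow.
 */
enum iov_layout guess_iov_layout(size_t reply_bytes, size_t nr_iovs)
{
    if (nr_iovs == 0)
        return IOV_LAYOUT_INVALID;
    if (reply_bytes == nr_iovs * sizeof(struct iovec64))
        return IOV_LAYOUT_64;
    if (reply_bytes == nr_iovs * sizeof(struct iovec32))
        return IOV_LAYOUT_32;
    return IOV_LAYOUT_INVALID;
}
```

Guessing from the size only works because the number of entries is known from
the reply header, which is also why a single fixed-width structure is the
cleaner long-term fix.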
The attribute cache for a file was not being cleared when a file is opened
with O_TRUNC.
If the filesystem's open operation truncates the file ("atomic_o_trunc"
feature flag is set) then the kernel should invalidate the cached st_mtime
and st_ctime attributes.
Also i_size should explicitly be set to zero as it is used sometimes
without refreshing the cache.
Signed-off-by: Ken Sumrall <ksumrall@android.com>
Cc: Anfei <anfei.zhou@gmail.com>
Cc: "Anand V. Avati" <avati@gluster.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sparse doesn't understand lock annotations of the form
__releases(&foo->lock). Change them to __releases(foo->lock). Same
for __acquires().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
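For readers unfamiliar with these annotations, the following userspace sketch
shows the form the commit above switches to. The macro definitions mimic the
usual pattern of expanding to nothing outside sparse; struct conn and the two
functions are hypothetical stand-ins, with a pthread mutex in place of a
kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>

/* Outside of sparse (__CHECKER__ unset) the annotations expand to nothing. */
#ifndef __acquires
# ifdef __CHECKER__
#  define __acquires(x) __attribute__((context(x, 0, 1)))
#  define __releases(x) __attribute__((context(x, 1, 0)))
# else
#  define __acquires(x)
#  define __releases(x)
# endif
#endif

/* Hypothetical stand-in for a connection object with a lock. */
struct conn {
    pthread_mutex_t lock;
    int pending;
};

/* Annotated with the lock expression itself, not its address. */
void conn_lock(struct conn *c) __acquires(c->lock)
{
    pthread_mutex_lock(&c->lock);
}

/* Called with c->lock held; drops it before returning. */
void conn_finish(struct conn *c) __releases(c->lock)
{
    c->pending--;
    pthread_mutex_unlock(&c->lock);
}
```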
Userspace filesystem can request data to be stored in the inode's
mapping. This request is synchronous and has no reply. If the write
to the fuse device returns an error then the store request was not
fully completed (but may have updated some pages).
If the stored data overflows the current file size, then the size is
extended, similarly to a write(2) on the filesystem.
Pages which have been completely stored are marked uptodate.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  mm: export generic_pipe_buf_*() to modules
  fuse: support splice() reading from fuse device
  fuse: allow splice to move pages
  mm: export remove_from_page_cache() to modules
  mm: export lru_cache_add_*() to modules
  fuse: support splice() writing to fuse device
  fuse: get page reference for readpages
  fuse: use get_user_pages_fast()
  fuse: remove unneeded variable
When splicing buffers to the fuse device with SPLICE_F_MOVE, try to
move pages from the pipe buffer into the page cache. This allows
populating the fuse filesystem's cache without ever touching the page
contents, i.e. zero copy read capability.
The following steps are performed when trying to move a page into the
page cache:
- buf->ops->confirm() to make sure the new page is uptodate
- buf->ops->steal() to try to remove the new page from its previous place
- remove_from_page_cache() on the old page
- add_to_page_cache_locked() on the new page
If any of the above steps fail (non-fatally) then the code falls back
to copying the page. In particular ->steal() will fail if there are
external references (other than the page cache and the pipe buffer) to
the page.
Also since the remove_from_page_cache() + add_to_page_cache_locked()
are non-atomic it is possible that the page cache is repopulated in
between the two and add_to_page_cache_locked() will fail. This could
be fixed by creating a new atomic replace_page_cache_page() function.
fuse_readpages_end() needed to be reworked so it works even if
page->mapping is NULL for some or all pages which can happen if the
add_to_page_cache_locked() failed.
A number of sanity checks were added to make sure the stolen pages
don't have weird flags set, etc... These could be moved into generic
splice/steal code.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acquire a page ref on pages in ->readpages() and release them when the
read has finished. Not acquiring a reference didn't seem to cause any
trouble since the page is locked and will not be kicked out of the
page cache during the read.
However the following patches will want to remove the page from the
cache so a separate ref is needed. Making the reference in req->pages
explicit also makes the code easier to understand.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Replace uses of get_user_pages() with get_user_pages_fast(). It looks
nicer and should be faster in most cases.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
"map" isn't needed any more after: 0bd87182d3 "fuse: fix kunmap in
fuse_ioctl_copy_user"
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
A cache alias problem occurs if changes made through a user shared mapping
are not flushed before copying: the user and kernel mappings may then land in
two different cache lines, and coherence cannot be guaranteed after
iov_iter_copy_from_user_atomic. So the right steps should be:
    flush_dcache_page(page);
    kmap_atomic(page);
    write to page;
    kunmap_atomic(page);
    flush_dcache_page(page);
More precisely, we might create two new APIs flush_dcache_user_page and
flush_dcache_kern_page to replace the two flush_dcache_page accordingly.
Here is a snippet tested on omap2430 with VIPT cache, and I think it is
not ARM-specific:
    int val = 0x11111111;
    fd = open("abc", O_RDWR);
    addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    *(addr+0) = 0x44444444;
    tmp = *(addr+0);
    *(addr+1) = 0x77777777;
    write(fd, &val, sizeof(int));
    close(fd);
The results are not always 0x11111111 0x77777777 at the beginning as expected. Sometimes we see 0x44444444 0x77777777.
Signed-off-by: Anfei <anfei.zhou@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <linux-arch@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Looks like another victim of the confusing kmap() vs kmap_atomic() API
differences.
Reported-by: Todor Gyumyushev <yodor1@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
fuse_direct_io() has a loop where requests are allocated in each
iteration. If allocation fails, the loop is exited and execution falls
through to an unconditional fuse_put_request() on that invalid pointer.
Signed-off-by: Anand V. Avati <avati@gluster.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@kernel.org
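The bug pattern fixed by the commit above, a release call reached with a
pointer from a failed allocation, can be modelled in userspace like this. All
names here (struct request, request_alloc(), direct_io_model()) are
hypothetical; the kernel uses error pointers rather than NULL, and in plain C
free(NULL) happens to be harmless, so the guard is shown only to mirror the
shape of the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct request { size_t len; };   /* hypothetical stand-in */

/* Hypothetical allocator: fails once `*budget` allocations are used up. */
struct request *request_alloc(int *budget)
{
    if (*budget <= 0)
        return NULL;
    (*budget)--;
    return calloc(1, sizeof(struct request));
}

/* Process `chunks` pieces, allocating a fresh request after each one. */
int direct_io_model(int chunks, int *budget)
{
    int err = 0;
    struct request *req = request_alloc(budget);

    if (!req)
        return -ENOMEM;
    while (chunks-- > 0) {
        /* ... send req ... */
        if (chunks > 0) {
            free(req);
            req = request_alloc(budget);
            if (!req) {
                err = -ENOMEM;
                break;
            }
        }
    }
    if (req)    /* only release a request that is still valid */
        free(req);
    return err;
}
```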
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use ff->fc and ff->nodeid instead of file->f_dentry->d_inode in the
fuse_file_poll() implementation.
This prepares this function for use by CUSE, where the inode is not
owned by a fuse filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Create a helper for sending an IOCTL request that doesn't use a struct
inode.
This prepares this function for use by CUSE, where the inode is not
owned by a fuse filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Make fuse_sync_release() a generic helper function that doesn't need a
struct inode pointer. This makes it suitable for use by CUSE.
Change return value of fuse_release_common() from int to void.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Move setting ff->fh, ff->nodeid and file->private_data outside
fuse_finish_open(). Add ->open_flags member to struct fuse_file.
This simplifies the argument passing to fuse_finish_open() and
fuse_release_fill(), and paves the way for creating an open helper
that doesn't need an inode pointer.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Use ff->fc and ff->nodeid instead of passing down the inode.
This prepares this function for use by CUSE, where the inode is not
owned by a fuse filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Add new members ->fc and ->nodeid to struct fuse_file. This will aid
in converting functions for use by CUSE, where the inode is not owned
by a fuse filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Move code operating on the inode out from fuse_direct_io().
This prepares this function for use by CUSE, where the inode is not
owned by a fuse filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Move out code from fuse_write_fill() which is not common to all
callers. Remove two function arguments which become unnecessary.
Also remove unnecessary memset(), the request is already initialized
to zero.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse_file_alloc() was structured in a weird way. The success path was
split between else block and code following the block. Restructure
the code such that it's easier to read and modify.
* Unindent success path of fuse_release_common() to ease future
changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
MAP_PRIVATE mmap could return stale data from the cache for
"direct_io" files. Fix this by flushing the cache on mmap.
Found with a slightly modified fsx-linux.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fix the following warning:
fs/fuse/file.c: In function 'fuse_direct_io':
fs/fuse/file.c:1002: warning: passing argument 3 of 'fuse_get_user_pages' from incompatible pointer type
This was introduced by commit f4975c67 "fuse: allow kernel to access
"direct_io" files".
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Allow MAP_PRIVATE mmaps of "direct_io" files. This is necessary for
execute support.
MAP_SHARED mappings require some sort of coherency between the
underlying file and the mapping. With "direct_io" it is difficult to
provide this, so for the moment just disallow shared (read-write and
read-only) mappings altogether.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Allow the kernel to read and write on "direct_io" files. This is
necessary for nfs export and execute support.
The implementation is simple: if an access from the kernel is
detected, don't perform get_user_pages(), just use the kernel address
provided by the requester to copy from/to the userspace filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags. There should be no functional change.
This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg. virtual_address to the
driver, which might be important in some special cases).
This is required for a subsequent fix, and will also make it easier to
merge page_mkwrite() with fault() in future.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This bug was found with smatch (http://repo.or.cz/w/smatch.git/). If
we return directly the inode->i_mutex lock doesn't get released.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
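The class of bug described in the commit above (an early return that skips
the unlock) is usually avoided with a single exit label. A minimal userspace
sketch, using a pthread mutex and a hypothetical struct inode_model in place
of the kernel's inode and i_mutex:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct inode_model {               /* hypothetical stand-in for an inode */
    pthread_mutex_t i_mutex;
    int bad;
};

int write_model(struct inode_model *inode)
{
    int err = 0;

    pthread_mutex_lock(&inode->i_mutex);
    if (inode->bad) {
        err = -EIO;
        goto out;   /* not "return -EIO": the lock is still held */
    }
    /* ... do the work under the lock ... */
out:
    pthread_mutex_unlock(&inode->i_mutex);
    return err;
}
```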
ff is set to NULL and then dereferenced on line 65. Compile tested only.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: clean up annotations of fc->lock
  fuse: fix sparse warning in ioctl
  fuse: update interface version
  fuse: add fuse_conn->release()
  fuse: separate out fuse_conn_init() from new_conn()
  fuse: add fuse_ prefix to several functions
  fuse: implement poll support
  fuse: implement unsolicited notification
  fuse: add file kernel handle
  fuse: implement ioctl support
  fuse: don't let fuse_req->end() put the base reference
  fuse: move FUSE_MINOR to miscdevice.h
  fuse: style fixes
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.
The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.
Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).
This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in its write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).
[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
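The idea in the commit above, turning a single "no filesystem re-entry" flag
into an allocation mask, can be sketched as below. The MODEL_* constants use
made-up bit values chosen for this example and write_begin_gfp() is a
hypothetical name; only the masking logic is the point.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real constants may differ. */
#define MODEL_GFP_IO        0x1u
#define MODEL_GFP_FS        0x2u
#define MODEL_GFP_KERNEL    (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_AOP_FLAG_NOFS 0x4u

/* Derive the allocation mask for a write_begin call from its AOP flags. */
unsigned int write_begin_gfp(unsigned int aop_flags)
{
    unsigned int gfp = MODEL_GFP_KERNEL;

    if (aop_flags & MODEL_AOP_FLAG_NOFS)
        gfp &= ~MODEL_GFP_FS;   /* must not re-enter the filesystem */
    return gfp;
}
```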
Makes the existing annotations match the more common one-per-line style
and adds a few missing annotations.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Add fuse_ prefix to request_send*() and get_root_inode() as some of
those functions will be exported for CUSE. With or without CUSE
export, having the function names scoped is a good idea for
debuggability.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Implement poll support. Polled files are indexed using kh in a RB
tree rooted at fuse_conn->polled_files.
Client should send FUSE_NOTIFY_POLL notification once after processing
FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set. Sending
notification unconditionally after the latest poll or every time file
content might have changed is inefficient but won't cause malfunction.
fuse_file_poll() can sleep and requires patches from the following
thread which allows f_op->poll() to sleep.
http://thread.gmane.org/gmane.linux.kernel/726176
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
The file handle, fuse_file->fh, is an opaque value supplied by the userland
FUSE server, and its uniqueness is not guaranteed. Add a file kernel handle,
fuse_file->kh, which is allocated by the kernel on file allocation and
guaranteed to be unique.
This will be used by poll to match a notification to the respective file,
but can be used for other purposes where a unique file handle is
necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
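A per-connection counter is one straightforward way to hand out such unique
handles. The sketch below is a userspace model under that assumption;
struct conn_model, struct file_model and file_model_alloc() are hypothetical
names, and a pthread mutex stands in for the connection lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct conn_model {                /* hypothetical stand-in for fuse_conn */
    pthread_mutex_t lock;
    uint64_t khctr;                /* last kernel handle handed out */
};

struct file_model {                /* hypothetical stand-in for fuse_file */
    uint64_t fh;                   /* opaque, chosen by the server */
    uint64_t kh;                   /* unique, chosen here */
};

struct file_model *file_model_alloc(struct conn_model *fc, uint64_t fh)
{
    struct file_model *ff = calloc(1, sizeof(*ff));

    if (!ff)
        return NULL;
    ff->fh = fh;
    pthread_mutex_lock(&fc->lock);
    ff->kh = ++fc->khctr;          /* unique as long as the counter does not wrap */
    pthread_mutex_unlock(&fc->lock);
    return ff;
}
```

Two files opened with the same server-supplied fh still get different kh
values, which is what lets a notification be matched to exactly one file.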
Generic ioctl support is tricky to implement because only the ioctl
implementation itself knows which memory regions need to be read
and/or written. To support this, fuse client can request retry of
ioctl specifying memory regions to read and write. Deep copying
(nested pointers) can be implemented by retrying multiple times
resolving one depth of dereference at a time.
For security and cleanliness considerations, ioctl implementation has
restricted mode where the kernel determines data transfer directions
and sizes using the _IOC_*() macros on the ioctl command. In this
mode, retry is not allowed.
For all FUSE servers, restricted mode is enforced. Unrestricted ioctl
will be used by CUSE.
Please read the comment on top of fs/fuse/file.c::fuse_file_do_ioctl()
for more information.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
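For the restricted mode described in the ioctl commit above, the direction
and size can be read straight out of the command number with the _IOC_*()
macros. The following is a userspace sketch for Linux;
restricted_ioctl_sizes() is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>

/* Hypothetical helper: derive transfer sizes from the command encoding. */
void restricted_ioctl_sizes(unsigned int cmd, size_t *in_size, size_t *out_size)
{
    *in_size = 0;
    *out_size = 0;
    /* _IOC_WRITE: the caller writes, so data flows into the handler. */
    if (_IOC_DIR(cmd) & _IOC_WRITE)
        *in_size = _IOC_SIZE(cmd);
    /* _IOC_READ: the caller reads, so data flows out of the handler. */
    if (_IOC_DIR(cmd) & _IOC_READ)
        *out_size = _IOC_SIZE(cmd);
}
```

A command built with _IOW() therefore yields only an input size, _IOR() only
an output size, and _IOWR() both. No retry is needed because nothing beyond
the encoded size is ever transferred.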