linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-21 01:20:38 +07:00

Author	SHA1	Message	Date
Miklos Szeredi	4a2abf99f9	fuse: add FUSE_WRITE_KILL_PRIV In the FOPEN_DIRECT_IO case the write path doesn't call file_remove_privs() and that means setuid bit is not cleared if unpriviliged user writes to a file with setuid bit set. pjdfstest chmod test 12.t tests this and fails. Fix this by adding a flag to the FUSE_WRITE message that requests clearing privileges on the given file. This needs This better than just calling fuse_remove_privs(), because the attributes may not be up to date, so in that case a write may miss clearing the privileges. Test case: $ passthrough_ll /mnt/pasthrough-mnt -o default_permissions,allow_other,cache=never $ mkdir /mnt/pasthrough-mnt/testdir $ cd /mnt/pasthrough-mnt/testdir $ prove -rv pjdfstests/tests/chmod/12.t Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Vivek Goyal <vgoyal@redhat.com>	2019-05-27 11:42:36 +02:00
Ian Abbott	6407f44aaf	fuse: Add ioctl flag for x32 compat ioctl Currently, a CUSE server running on a 64-bit kernel can tell when an ioctl request comes from a process running a 32-bit ABI, but cannot tell whether the requesting process is using legacy IA32 emulation or x32 ABI. In particular, the server does not know the size of the client process's `time_t` type. For 64-bit kernels, the `FUSE_IOCTL_COMPAT` and `FUSE_IOCTL_32BIT` flags are currently set in the ioctl input request (`struct fuse_ioctl_in` member `flags`) for a 32-bit requesting process. This patch defines a new flag `FUSE_IOCTL_COMPAT_X32` and sets it if the 32-bit requesting process is using the x32 ABI. This allows the server process to distinguish between requests coming from client processes using IA32 emulation or the x32 ABI and so infer the size of the client process's `time_t` type and any other IA32/x32 differences. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Alan Somers	7142fd1be3	fuse: fix changelog entry for protocol 7.9 Retroactively add changelog entry for the atime and mtime "now" flags. This was an oversight in commit `17637cbaba` ("fuse: improve utimes support"). Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Alan Somers	68065b8415	fuse: fix changelog entry for protocol 7.12 This was a mistake in the comment in commit `e0a43ddcc0` ("fuse: allow umask processing in userspace"). Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Alan Somers	154603fe3e	fuse: document fuse_fsync_in.fsync_flags The FUSE_FSYNC_DATASYNC flag was introduced by commit `b6aeadeda2` ("[PATCH] FUSE - file operations") as a magic number. No new values have been added to fsync_flags since. Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Kirill Smelkov	bbd84f3365	fuse: Add FOPEN_STREAM to use stream_open() Starting from commit `9c225f2655` ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit `10dce8af34` ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit `581d21a2d0` ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM \| FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Kirill Smelkov	ad2ba64dd4	fuse: allow filesystems to have precise control over data cache On networked filesystems file data can be changed externally. FUSE provides notification messages for filesystem to inform kernel that metadata or data region of a file needs to be invalidated in local page cache. That provides the basis for filesystem implementations to invalidate kernel cache explicitly based on observed filesystem-specific events. FUSE has also "automatic" invalidation mode() when the kernel automatically invalidates data cache of a file if it sees mtime change. It also automatically invalidates whole data cache of a file if it sees file size being changed. The automatic mode has corresponding capability - FUSE_AUTO_INVAL_DATA. However, due to probably historical reason, that capability controls only whether mtime change should be resulting in automatic invalidation or not. A change in file size always results in invalidating whole data cache of a file irregardless of whether FUSE_AUTO_INVAL_DATA was negotiated(+). The filesystem I write[1] represents data arrays stored in networked database as local files suitable for mmap. It is read-only filesystem - changes to data are committed externally via database interfaces and the filesystem only glues data into contiguous file streams suitable for mmap and traditional array processing. The files are big - starting from hundreds gigabytes and more. The files change regularly, and frequently by data being appended to their end. The size of files thus changes frequently. If a file was accessed locally and some part of its data got into page cache, we want that data to stay cached unless there is memory pressure, or unless corresponding part of the file was actually changed. However current FUSE behaviour - when it sees file size change - is to invalidate the whole file. The data cache of the file is thus completely lost even on small size change, and despite that the filesystem server is careful to accurately translate database changes into FUSE invalidation messages to kernel. Let's fix it: if a filesystem, through new FUSE_EXPLICIT_INVAL_DATA capability, indicates to kernel that it is fully responsible for data cache invalidation, then the kernel won't invalidate files data cache on size change and only truncate that cache to new size in case the size decreased. () see `72d0d248ca` "fuse: add FUSE_AUTO_INVAL_DATA init flag", `eed2179efe` "fuse: invalidate inode mapping if mtime changes" (+) in writeback mode the kernel does not invalidate data cache on file size change, but neither it allows the filesystem to set the size due to external event (see `8373200b12` "fuse: Trust kernel i_size only") [1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20 Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:06 +02:00
Chad Austin	d9a9ea94f7	fuse: support clients that don't implement 'opendir' Allow filesystems to return ENOSYS from opendir, preventing the kernel from sending opendir and releasedir messages in the future. This avoids userspace transitions when filesystems don't need to keep track of state per directory handle. A new capability flag, FUSE_NO_OPENDIR_SUPPORT, parallels FUSE_NO_OPEN_SUPPORT, indicating the new semantics for returning ENOSYS from opendir. Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:15 +01:00
Dan Schatzberg	5571f1e654	fuse: enable caching of symlinks FUSE file reads are cached in the page cache, but symlink reads are not. This patch enables FUSE READLINK operations to be cached which can improve performance of some FUSE workloads. In particular, I'm working on a FUSE filesystem for access to source code and discovered that about a 10% improvement to build times is achieved with this patch (there are a lot of symlinks in the source tree). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-15 15:43:07 +02:00
Constantine Shulyupin	5da784cce4	fuse: add max_pages to init_out Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to improve performance. Old RFC with detailed description of the problem and many fixes by Mitsuo Hayasaka (mitsuo.hayasaka.hu@hitachi.com): - https://lkml.org/lkml/2012/7/5/136 We've encountered performance degradation and fixed it on a big and complex virtual environment. Environment to reproduce degradation and improvement: 1. Add lag to user mode FUSE Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in passthrough_fh.c 2. patch UM fuse with configurable max_pages parameter. The patch will be provided latter. 3. run test script and perform test on tmpfs fuse_test() { cd /tmp mkdir -p fusemnt passthrough_fh -o max_pages=$1 /tmp/fusemnt grep fuse /proc/self/mounts dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \ count=1K bs=1M 2>&1 \| grep -v records rm fusemnt/tmp/tmp killall passthrough_fh } Test results: passthrough_fh /tmp/fusemnt fuse.passthrough_fh \ rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0 1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s passthrough_fh /tmp/fusemnt fuse.passthrough_fh \ rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0 1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s Obviously with bigger lag the difference between 'before' and 'after' will be more significant. Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136), observed improvement from 400-550 to 520-740. Signed-off-by: Constantine Shulyupin <const@MakeLinux.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:06 +02:00
Miklos Szeredi	6433b8998a	fuse: add FOPEN_CACHE_DIR Add flag returned by OPENDIR request to allow kernel to cache directory contents in page cache. The effect of FOPEN_CACHE_DIR is twofold: a) if not already cached, it writes entries into the cache b) if already cached, it allows reading entries from the cache The FOPEN_KEEP_CACHE has the same effect as on regular files: unless this flag is given the cache is cleared upon completion of open. So FOPEN_KEEP_CACHE and FOPEN_KEEP_CACHE flags should be used together to make use of the directory caching facility introduced in the following patches. The FUSE_AUTO_INVAL_DATA flag returned in INIT reply also has the same affect on the directory cache as it has on data cache for regular files. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:23 +02:00
Niels de Vos	88bc7d5097	fuse: add support for copy_file_range() There are several FUSE filesystems that can implement server-side copy or other efficient copy/duplication/clone methods. The copy_file_range() syscall is the standard interface that users have access to while not depending on external libraries that bypass FUSE. Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:22 +02:00
Szymon Lukasz	3b7008b226	fuse: return -ECONNABORTED on /dev/fuse read after abort Currently the userspace has no way of knowing whether the fuse connection ended because of umount or abort via sysfs. It makes it hard for filesystems to free the mountpoint after abort without worrying about removing some new mount. The patch fixes it by returning different errors when userspace reads from /dev/fuse (-ENODEV for umount and -ECONNABORTED for abort). Add a new capability flag FUSE_ABORT_ERROR. If set and the connection is gone because of sysfs abort, reading from the device will return -ECONNABORTED. Signed-off-by: Szymon Lukasz <noh4hss@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-03-20 17:11:44 +01:00
Greg Kroah-Hartman	e2be04c7f9	License cleanup: add SPDX license identifier to uapi header files with a license Many user space API headers have licensing information, which is either incomplete, badly formatted or just a shorthand for referring to the license under which the file is supposed to be. This makes it hard for compliance tools to determine the correct license. Update these files with an SPDX license identifier. The identifier was chosen based on the license information in the file. GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license identifier with the added 'WITH Linux-syscall-note' exception, which is the officially assigned exception identifier for the kernel syscall exception: NOTE! This copyright does not cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does not fall under the heading of "derived work". This exception makes it possible to include GPL headers into non GPL code, without confusing license compliance tools. Headers which have either explicit dual licensing or are just licensed under a non GPL license are updated with the corresponding SPDX identifier and the GPLv2 with syscall exception identifier. The format is: ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE) SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boiler plate text. The update does not remove existing license information as this has to be done on a case by case basis and the copyright holders might have to be consulted. This will happen in a separate step. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-02 11:20:11 +01:00
Seth Forshee	60bcc88ad1	fuse: Add posix ACL support Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with userspace. When it is set in the INIT response, ACL support will be enabled. ACL support also implies "default_permissions". When ACL support is enabled, the kernel will cache and have responsibility for enforcing ACLs. ACL xattrs will be passed to userspace, which is responsible for updating the ACLs in the filesystem, keeping the file mode in sync, and inheritance of default ACLs when new filesystem nodes are created. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-10-01 07:32:32 +02:00
Miklos Szeredi	5e940c1dd3	fuse: handle killpriv in userspace fs Only userspace filesystem can do the killing of suid/sgid without races. So introduce an INIT flag and negotiate support for this. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-10-01 07:32:32 +02:00
Miklos Szeredi	5c672ab3f0	fuse: serialize dirops by default Negotiate with userspace filesystems whether they support parallel readdir and lookup. Disable parallelism by default for fear of breaking fuse filesystems. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `9902af79c0` ("parallel lookups: actual switch to rwsem") Fixes: `d9b3dbdcfd` ("fuse: switch to ->iterate_shared()")	2016-06-30 13:10:49 +02:00
Ravishankar N	0b5da8db14	fuse: add support for SEEK_HOLE and SEEK_DATA in lseek A useful performance improvement for accessing virtual machine images via FUSE mount. See https://bugzilla.redhat.com/show_bug.cgi?id=1220173 for a use-case for glusterFS. Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>	2015-11-10 10:32:37 +01:00
Miklos Szeredi	00c570f4ba	fuse: device fd clone Allow an open fuse device to be "cloned". Userspace can create a clone by: newfd = open("/dev/fuse", O_RDWR) ioctl(newfd, FUSE_DEV_IOC_CLONE, &oldfd); At this point newfd will refer to the same fuse connection as oldfd. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>	2015-07-01 16:26:08 +02:00
Andrew Gallagher	d7afaec0b5	fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT Here some additional changes to set a capability flag so that clients can detect when it's appropriate to return -ENOSYS from open. This amends the following commit introduced in 3.14: `7678ac5061` fuse: support clients that don't implement 'open' However we can only add the flag to 3.15 and later since there was no protocol version update in 3.14. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.15+	2014-07-22 16:37:43 +02:00
Miklos Szeredi	1560c974dc	fuse: add renameat2 support Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-28 16:43:44 +02:00
Maxim Patlasov	ab9e13f7c7	fuse: allow ctime flushing to userspace The patch extends fuse_setattr_in, and extends the flush procedure (fuse_flush_times()) called on ->write_inode() to send the ctime as well as mtime. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-28 14:19:24 +02:00
Miklos Szeredi	e27c9d3877	fuse: fuse: add time_gran to INIT_OUT Allow userspace fs to specify time granularity. This is needed because with writeback_cache mode the kernel is responsible for generating mtime and ctime, but if the underlying filesystem doesn't support nanosecond granularity then the cache will contain a different value from the one stored on the filesystem resulting in a change of times after a cache flush. Make the default granularity 1s. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-28 14:19:23 +02:00
Pavel Emelyanov	4d99ff8f12	fuse: Turn writeback cache on Introduce a bit kernel and userspace exchange between each-other on the init stage and turn writeback on if the userspace want this and mount option 'allow_wbcache' is present (controlled by fusermount). Also add each writable file into per-inode write list and call the generic_file_aio_write to make use of the Linux page cache engine. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:50 +02:00
Miklos Szeredi	60b9df7a54	fuse: add flag to turn on async direct IO Without async DIO write requests to a single file were always serialized. With async DIO that's no longer the case. So don't turn on async DIO by default for fear of breaking backward compatibility. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-05-01 14:37:21 +02:00
Miklos Szeredi	4c82456eeb	fuse: fix type definitions in uapi header Commit `7e98d53086` (Synchronize fuse header with one used in library) added #ifdef __linux__ around defines if it is not set. The kernel build is self-contained and can be built on non-Linux toolchains. After the mentioned commit builds on non-Linux toolchains will try to include stdint.h and fail due to -nostdinc, and then fail with a bunch of undefined type errors. Fix by checking for __KERNEL__ instead of __linux__ and using the standard int types instead of the linux specific ones. Reported-by: Arve Hjønnevåg <arve@android.com> Reported-by: Colin Cross <ccross@android.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-04-17 12:30:40 +02:00
Eric Wong	634734b63a	fuse: allow control of adaptive readdirplus use For some filesystems (e.g. GlusterFS), the cost of performing a normal readdir and readdirplus are identical. Since adaptively using readdirplus has no benefit for those systems, give users/filesystems the option to control adaptive readdirplus use. v2 of this patch incorporates Miklos's suggestion to simplify the code, as well as improving consistency of macro names and documentation. Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-02-07 14:25:44 +01:00
Miklos Szeredi	7e98d53086	Synchronize fuse header with one used in library The library one has provisions for use in *BSD, add them to the kernel one too. They don't hurt and ease maintenance. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-02-07 11:58:12 +01:00
Enke Chen	0415d29102	fuse: send poll events commit `626cf23660` "poll: add poll_requested_events()..." enabled us to send the requested events to the filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-02-04 16:14:32 +01:00
Miklos Szeredi	23c153e541	fuse: bump version for READDIRPLUS Yeah, we have a capability flag for this as well, so this is not strictly necessary, but it doesn't hurt either. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-31 17:08:11 +01:00
Anand V. Avati	0b05b18381	fuse: implement NFS-like readdirplus support This patch implements readdirplus support in FUSE, similar to NFS. The payload returned in the readdirplus call contains 'fuse_entry_out' structure thereby providing all the necessary inputs for 'faking' a lookup() operation on the spot. If the dentry and inode already existed (for e.g. in a re-run of ls -l) then just the inode attributes timeout and dentry timeout are refreshed. With a simple client->network->server implementation of a FUSE based filesystem, the following performance observations were made: Test: Performing a filesystem crawl over 20,000 files with sh# time ls -lR /mnt Without readdirplus: Run 1: 18.1s Run 2: 16.0s Run 3: 16.2s With readdirplus: Run 1: 4.1s Run 2: 3.8s Run 3: 3.8s The performance improvement is significant as it avoided 20,000 upcalls calls (lookup). Cache consistency is no worse than what already is. Signed-off-by: Anand V. Avati <avati@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-01-24 16:21:25 +01:00
David Howells	607ca46e97	UAPI: (Scripted) Disintegrate include/linux Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-13 10:46:48 +01:00

32 Commits