fddb5d430a
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY and combined with O_RDWR or O_WRONLY
to ensure that old kernels return an error, since it's insecure to
silently ignore the flag[2]). All new security-related flags therefore
have a tough road to being added to openat(2).
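The O_TMPFILE workaround relies on how an old kernel reacts to a bit it does not understand: it drops the bit and acts on whatever is left. The minimal sketch below (the helper name is hypothetical) reproduces that reaction on any kernel by clearing the O_TMPFILE-specific bit by hand. What remains is O_DIRECTORY|O_RDWR, and opening a directory for writing fails with EISDIR instead of silently succeeding.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: simulate how a kernel that predates O_TMPFILE
 * would interpret open(dir, O_TMPFILE | O_RDWR). The unknown
 * "tmpfile" bit is ignored, leaving O_DIRECTORY | O_RDWR.
 * Returns the resulting errno, or 0 if the open unexpectedly succeeded.
 */
static int old_kernel_tmpfile_errno(const char *dir)
{
    /* O_TMPFILE is defined together with O_DIRECTORY; isolate the new bit. */
    int tmpfile_bit = O_TMPFILE & ~O_DIRECTORY;
    int fd = open(dir, (O_TMPFILE | O_RDWR) & ~tmpfile_bit);

    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno;
}
```

On a kernel that does understand O_TMPFILE, open(dir, O_TMPFILE | O_RDWR, 0600) instead creates an unnamed file in dir, provided the filesystem supports it.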
Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. With contemporary kernels this is
possible (thanks to [3], unknown flag bits are no longer reported by
fcntl(F_GETFL)), but older kernels echo unknown flag bits back through
fcntl(F_GETFL), so the same check gives false positives there. Giving a
clear -EINVAL at openat(2) time matches modern syscall designs and is
far more fool-proof.
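To make the F_GETFL approach concrete, here is a minimal sketch of such a probe (the function name and its parameters are illustrative, not an existing API). It opens a path with one extra flag and checks whether the kernel reports that flag back. On kernels that include [3], a missing bit is a reasonable hint that the flag is unsupported; on older kernels a reported bit proves nothing.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical probe: open `path` with `base_flags | extra_flag` and
 * report whether fcntl(F_GETFL) echoes `extra_flag` back.
 * Returns false if the open itself fails.
 */
static bool flag_echoed_by_f_getfl(const char *path, int base_flags,
                                   int extra_flag)
{
    int fd = open(path, base_flags | extra_flag);
    if (fd < 0)
        return false;

    int fl = fcntl(fd, F_GETFL);   /* current file status flags */
    close(fd);
    return fl >= 0 && (fl & extra_flag) == extra_flag;
}
```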
In addition, the newly-added LOOKUP_ flags that restrict path resolution
(which we would like to expose to userspace) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.
Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).
/* Syscall Prototype. */
/*
* open_how is an extensible structure (similar in interface to
* clone3(2) or sched_setattr(2)). The size parameter must be set to
* sizeof(struct open_how), to allow for future extensions. All future
* extensions will be appended to open_how, with their zero value
* acting as a no-op default.
*/
struct open_how { /* ... */ };
int openat2(int dfd, const char *pathname,
struct open_how *how, size_t size);
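The size argument is what makes the structure extensible. In the kernel this is handled by copy_struct_from_user(); the function below is a hypothetical userspace re-implementation of the same rules, written only to illustrate them. A caller passing a smaller (older) struct gets the missing fields zero-filled, a caller passing a larger (newer) struct is accepted only if every byte the kernel does not understand is zero (otherwise -E2BIG), and a size smaller than the first version is rejected with -EINVAL. struct open_how_v0 is a local stand-in for the initial 24-byte layout.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the initial struct open_how (three u64 fields). */
struct open_how_v0 {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

#define OPEN_HOW_SIZE_V0 sizeof(struct open_how_v0)

/*
 * Hypothetical model: copy a caller's `usize`-byte struct into the
 * "kernel's" `ksize`-byte struct. Returns 0 or a negative errno value.
 */
static int copy_extensible_struct(void *dst, size_t ksize,
                                  const void *src, size_t usize)
{
    const unsigned char *s = src;

    if (usize < OPEN_HOW_SIZE_V0)
        return -EINVAL;                /* older than the first version */

    if (usize > ksize) {
        /* Newer caller: every byte we don't understand must be zero. */
        for (size_t i = ksize; i < usize; i++)
            if (s[i] != 0)
                return -E2BIG;
        usize = ksize;
    }

    memset(dst, 0, ksize);             /* unset fields get the no-op default */
    memcpy(dst, src, usize);
    return 0;
}
```

Returning -E2BIG rather than silently ignoring the extra bytes is the point of the design: a newer program asking for a feature the running kernel lacks gets an error instead of having the request dropped.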
/* Description. */
The initial version of 'struct open_how' contains the following fields:
  flags
    Used to specify openat(2)-style flags. However, any unknown flag
    bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
    will result in -EINVAL. In addition, this field is 64 bits wide to
    allow for more O_ flags than currently permitted with openat(2).
  mode
    The file mode for O_CREAT or O_TMPFILE.
    Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
  resolve
    Restricts path resolution (in contrast to the O_* flags, these
    affect all path components). The current set of flags is as follows
    (at the moment, all of the RESOLVE_ flags are implemented as just
    passing the corresponding LOOKUP_ flag).
      RESOLVE_NO_XDEV       => LOOKUP_NO_XDEV
      RESOLVE_NO_SYMLINKS   => LOOKUP_NO_SYMLINKS
      RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
      RESOLVE_BENEATH       => LOOKUP_BENEATH
      RESOLVE_IN_ROOT       => LOOKUP_IN_ROOT
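Because the syscall is new, a C library may not provide a wrapper for it, so this sketch invokes it through syscall(2). It assumes SYS_openat2 is 437 when the headers lack it (the number used on x86-64, arm64 and other architectures sharing the common syscall table), mirrors the initial three-field layout locally in case <linux/openat2.h> is unavailable, and falls back to the RESOLVE_ values from that header only if they are not already defined. raw_openat2() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437            /* assumption: x86-64/arm64 number */
#endif

/* Fallback RESOLVE_ values, matching <linux/openat2.h>. */
#ifndef RESOLVE_NO_XDEV
#define RESOLVE_NO_XDEV       0x01
#define RESOLVE_NO_MAGICLINKS 0x02
#define RESOLVE_NO_SYMLINKS   0x04
#define RESOLVE_BENEATH       0x08
#define RESOLVE_IN_ROOT       0x10
#endif

/* Local mirror of the initial (24-byte) struct open_how. */
struct open_how_v0 {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

/* Hypothetical wrapper: returns a new fd, or -1 with errno set. */
static int raw_openat2(int dfd, const char *path, uint64_t flags,
                       uint64_t mode, uint64_t resolve)
{
    struct open_how_v0 how;

    memset(&how, 0, sizeof(how));  /* any field we don't set must be zero */
    how.flags = flags;
    how.mode = mode;
    how.resolve = resolve;
    return (int)syscall(SYS_openat2, dfd, path, &how, sizeof(how));
}
```

For example, raw_openat2(dirfd, "../etc", O_RDONLY | O_DIRECTORY, 0, RESOLVE_BENEATH) is expected to fail with EXDEV because ".." would step outside dirfd, whereas opening "." beneath the same directory succeeds.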
open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
that would never be used.
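The padding argument can be checked at compile time. This sketch mirrors the initial layout locally (struct open_how_v0 is a hypothetical name; real code would use struct open_how from <linux/openat2.h>). With three u64 members every field lands on an 8-byte boundary and the struct is 24 bytes on every ABI. Had mode been a u16, the compiler would insert 6 bytes of padding after it on x86-64 but only 2 on i386, where 64-bit integers are 4-byte aligned inside structs.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the initial struct open_how: only u64 members. */
struct open_how_v0 {
    uint64_t flags;
    uint64_t mode;     /* would fit in a u16, but that would add padding */
    uint64_t resolve;
};

/* No implicit padding: fields at 0, 8, 16 and a total size of 24 bytes. */
_Static_assert(offsetof(struct open_how_v0, flags) == 0, "flags at 0");
_Static_assert(offsetof(struct open_how_v0, mode) == 8, "mode at 8");
_Static_assert(offsetof(struct open_how_v0, resolve) == 16, "resolve at 16");
_Static_assert(sizeof(struct open_how_v0) == 24, "no padding");
```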
Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).
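The flags and mode rules described above, including the newly rejected O_PATH|O_TMPFILE combination, can be modelled in userspace. This is a hypothetical sketch, not the kernel's build_open_flags() in fs/open.c: DEMO_KNOWN_FLAGS is an illustrative stand-in for the kernel's set of valid O_* bits, and the list of flags allowed alongside O_PATH (O_DIRECTORY, O_NOFOLLOW, O_CLOEXEC) is an assumption about the kernel's behaviour.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/* Illustrative set of "known" flags; the kernel's real mask is larger. */
#define DEMO_KNOWN_FLAGS ((uint64_t)(O_ACCMODE | O_CREAT | O_EXCL | O_TRUNC | \
        O_APPEND | O_NONBLOCK | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC |      \
        O_PATH | O_TMPFILE))

/* Flags assumed to be compatible with O_PATH. */
#define DEMO_O_PATH_FLAGS ((uint64_t)(O_PATH | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC))

/* Returns 0 if (flags, mode) passes the modelled checks, else -EINVAL. */
static int check_open_how(uint64_t flags, uint64_t mode, uint64_t known_mask)
{
    /* The O_TMPFILE bit without the O_DIRECTORY part it is defined with. */
    const uint64_t tmpfile_bit = (uint64_t)(O_TMPFILE & ~O_DIRECTORY);

    if (flags & ~known_mask)
        return -EINVAL;        /* unknown bits, including bits above 31 */
    if ((flags & O_PATH) && (flags & ~DEMO_O_PATH_FLAGS))
        return -EINVAL;        /* e.g. O_PATH|O_RDWR, O_PATH|O_TMPFILE */
    if (!(flags & (O_CREAT | tmpfile_bit)) && mode != 0)
        return -EINVAL;        /* mode only makes sense when creating */
    return 0;
}
```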
After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.
/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.
In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).
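To illustrate what RESOLVE_IN_ROOT does to path resolution, this sketch opens "/../.." relative to a directory fd. With RESOLVE_IN_ROOT the leading "/" and every ".." are resolved as if that directory were the process root, so the result should be the directory itself rather than the real "/". The fallback syscall number, the local struct open_how_v0 mirror and both helper names are assumptions for illustration; nothing here reflects the libpathrs API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437            /* assumption: x86-64/arm64 number */
#endif
#ifndef RESOLVE_IN_ROOT
#define RESOLVE_IN_ROOT 0x10       /* value from <linux/openat2.h> */
#endif

/* Local mirror of the initial (24-byte) struct open_how. */
struct open_how_v0 {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

/* Hypothetical helper: open `path` as if `rootfd` were "/". */
static int open_in_root(int rootfd, const char *path, int flags)
{
    struct open_how_v0 how;

    memset(&how, 0, sizeof(how));
    how.flags = (uint64_t)flags;
    how.resolve = RESOLVE_IN_ROOT;
    return (int)syscall(SYS_openat2, rootfd, path, &how, sizeof(how));
}

/* Returns 1 if both descriptors refer to the same inode, else 0. */
static int same_inode(int a, int b)
{
    struct stat sa, sb;

    if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Any fd obtained this way still has to be used carefully with other syscalls, which is the reason wrapper libraries are useful.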
/* Future Work. */
Additional RESOLVE_ flags (such as one blocking auto-mounts during
resolution) have been suggested during the review period. These can
easily be implemented separately.
Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.
Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).
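Without something like CHECK_FIELDS, the guessing looks roughly like the following hypothetical probe: issue a harmless openat2() of "." with the candidate resolve bits and treat EINVAL as "not supported". It relies on the same assumed fallback syscall number and local struct mirror as any raw-syscall wrapper would, and it cannot tell an unknown bit apart from an invalid combination (RESOLVE_BENEATH|RESOLVE_IN_ROOT is also rejected with -EINVAL).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437            /* assumption: x86-64/arm64 number */
#endif

/* Local mirror of the initial (24-byte) struct open_how. */
struct open_how_v0 {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

/*
 * Hypothetical probe. Returns 1 if the running kernel accepts `resolve`
 * for an openat2() of ".", 0 if it rejects it with EINVAL, and -1 if the
 * probe failed for another reason (e.g. openat2() unavailable).
 */
static int resolve_flags_supported(uint64_t resolve)
{
    struct open_how_v0 how;

    memset(&how, 0, sizeof(how));
    how.flags = O_RDONLY | O_DIRECTORY | O_CLOEXEC;
    how.resolve = resolve;

    int fd = (int)syscall(SYS_openat2, AT_FDCWD, ".", &how, sizeof(how));
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == EINVAL ? 0 : -1;
}
```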
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb8 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_FCNTL_H
#define _UAPI_LINUX_FCNTL_H

#include <asm/fcntl.h>
#include <linux/openat2.h>

#define F_SETLEASE	(F_LINUX_SPECIFIC_BASE + 0)
#define F_GETLEASE	(F_LINUX_SPECIFIC_BASE + 1)

/*
 * Cancel a blocking posix lock; internal use only until we expose an
 * asynchronous lock api to userspace:
 */
#define F_CANCELLK	(F_LINUX_SPECIFIC_BASE + 5)

/* Create a file descriptor with FD_CLOEXEC set. */
#define F_DUPFD_CLOEXEC	(F_LINUX_SPECIFIC_BASE + 6)

/*
 * Request notifications on a directory.
 * See below for events that may be notified.
 */
#define F_NOTIFY	(F_LINUX_SPECIFIC_BASE+2)

/*
 * Set and get of pipe page size array
 */
#define F_SETPIPE_SZ	(F_LINUX_SPECIFIC_BASE + 7)
#define F_GETPIPE_SZ	(F_LINUX_SPECIFIC_BASE + 8)

/*
 * Set/Get seals
 */
#define F_ADD_SEALS	(F_LINUX_SPECIFIC_BASE + 9)
#define F_GET_SEALS	(F_LINUX_SPECIFIC_BASE + 10)

/*
 * Types of seals
 */
#define F_SEAL_SEAL	0x0001	/* prevent further seals from being set */
#define F_SEAL_SHRINK	0x0002	/* prevent file from shrinking */
#define F_SEAL_GROW	0x0004	/* prevent file from growing */
#define F_SEAL_WRITE	0x0008	/* prevent writes */
#define F_SEAL_FUTURE_WRITE	0x0010	/* prevent future writes while mapped */
/* (1U << 31) is reserved for signed error codes */

/*
 * Set/Get write life time hints. {GET,SET}_RW_HINT operate on the
 * underlying inode, while {GET,SET}_FILE_RW_HINT operate only on
 * the specific file.
 */
#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
#define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
#define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)

/*
 * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
 * used to clear any hints previously set.
 */
#define RWH_WRITE_LIFE_NOT_SET	0
#define RWH_WRITE_LIFE_NONE	1
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5

/*
 * The originally introduced spelling remains from the first
 * versions of the patch set that introduced the feature, see commit
 * v4.13-rc1~212^2~51.
 */
#define RWF_WRITE_LIFE_NOT_SET	RWH_WRITE_LIFE_NOT_SET

/*
 * Types of directory notifications that may be requested.
 */
#define DN_ACCESS	0x00000001	/* File accessed */
#define DN_MODIFY	0x00000002	/* File modified */
#define DN_CREATE	0x00000004	/* File created */
#define DN_DELETE	0x00000008	/* File removed */
#define DN_RENAME	0x00000010	/* File renamed */
#define DN_ATTRIB	0x00000020	/* File changed attributes */
#define DN_MULTISHOT	0x80000000	/* Don't remove notifier */

#define AT_FDCWD		-100	/* Special value used to indicate
					   openat should use the current
					   working directory. */
#define AT_SYMLINK_NOFOLLOW	0x100	/* Do not follow symbolic links. */
#define AT_REMOVEDIR		0x200	/* Remove directory instead of
					   unlinking file. */
#define AT_SYMLINK_FOLLOW	0x400	/* Follow symbolic links. */
#define AT_NO_AUTOMOUNT		0x800	/* Suppress terminal automount traversal */
#define AT_EMPTY_PATH		0x1000	/* Allow empty relative pathname */

#define AT_STATX_SYNC_TYPE	0x6000	/* Type of synchronisation required from statx() */
#define AT_STATX_SYNC_AS_STAT	0x0000	/* - Do whatever stat() does */
#define AT_STATX_FORCE_SYNC	0x2000	/* - Force the attributes to be sync'd with the server */
#define AT_STATX_DONT_SYNC	0x4000	/* - Don't sync attributes with the server */

#define AT_RECURSIVE		0x8000	/* Apply to the entire subtree */

#endif /* _UAPI_LINUX_FCNTL_H */
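The seal constants in the header above are used with fcntl(F_ADD_SEALS) and fcntl(F_GET_SEALS). A minimal sketch, assuming glibc's memfd_create() wrapper (glibc 2.27 or later), since seals are normally applied to memfd files created with MFD_ALLOW_SEALING: write some data, seal the file against shrinking, growing and writing, and finally lock the seal set itself with F_SEAL_SEAL. make_sealed_memfd() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd holding `len` bytes of `data`
 * whose size and contents can no longer change. Returns the fd or -1.
 */
static int make_sealed_memfd(const void *data, size_t len)
{
    int fd = memfd_create("sealed-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    if (write(fd, data, len) != (ssize_t)len ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After this, further write() or ftruncate() calls on the descriptor are expected to fail with EPERM, which makes such a file safe to hand to a process that must not see its contents change underneath it.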