License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 21:07:57 +07:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2017-02-09 00:51:36 +07:00
|
|
|
#ifndef _LINUX_SCHED_TASK_H
|
|
|
|
#define _LINUX_SCHED_TASK_H
|
|
|
|
|
2017-02-03 21:24:12 +07:00
|
|
|
/*
|
|
|
|
* Interface between the scheduler and various task lifetime (fork()/exit())
|
|
|
|
* functionality:
|
|
|
|
*/
|
|
|
|
|
2017-02-09 00:51:36 +07:00
|
|
|
#include <linux/sched.h>
|
fork: add clone3
This adds the clone3 system call.
As mentioned several times already (cf. [7], [8]) here's the promised
patchset for clone3().
We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
free flag from clone().
Independent of the CLONE_PIDFD patchset a time namespace has been discussed
at Linux Plumber Conference last year and has been sent out and reviewed
(cf. [5]). It is expected that it will go upstream in the not too distant
future. However, it relies on the addition of the CLONE_NEWTIME flag to
clone(). The only other good candidate - CLONE_DETACHED - is currently not
recyclable as we have identified at least two large or widely used
codebases that currently pass this flag (cf. [2], [3], and [4]). Given that
CLONE_PIDFD grabbed the last clone() flag the time namespace is effectively
blocked. clone3() has the advantage that it will unblock this patchset
again. In general, clone3() is extensible and allows for the implementation
of new features.
The idea is to keep clone3() very simple and close to the original clone(),
specifically, to keep on supporting old clone()-based workloads.
We know there have been various creative proposals how a new process
creation syscall or even api is supposed to look like. Some people even
going so far as to argue that the traditional fork()+exec() split should be
abandoned in favor of an in-kernel version of spawn(). Independent of
whether or not we personally think spawn() is a good idea this patchset has
and does not want to have anything to do with this.
One stance we take is that there's no real good alternative to
clone()+exec() and we need and want to support this model going forward;
independent of spawn().
The following requirements guided clone3():
- bump the number of available flags
- move arguments that are currently passed as separate arguments
in clone() into a dedicated struct clone_args
- choose a struct layout that is easy to handle on 32 and on 64 bit
- choose a struct layout that is extensible
- give new flags that currently need to abuse another flag's dedicated
return argument in clone() their own dedicated return argument
(e.g. CLONE_PIDFD)
- use a separate kernel internal struct kernel_clone_args that is
properly typed according to current kernel conventions in fork.c and is
different from the uapi struct clone_args
- port _do_fork() to use kernel_clone_args so that all process creation
syscalls such as fork(), vfork(), clone(), and clone3() behave identical
(Arnd suggested, that we can probably also port do_fork() itself in a
separate patchset.)
- ease of transition for userspace from clone() to clone3()
This very much means that we do *not* remove functionality that userspace
currently relies on as the latter is a good way of creating a syscall
that won't be adopted.
- do not try to be clever or complex: keep clone3() as dumb as possible
In accordance with Linus suggestions (cf. [11]), clone3() has the following
signature:
/* uapi */
struct clone_args {
__aligned_u64 flags;
__aligned_u64 pidfd;
__aligned_u64 child_tid;
__aligned_u64 parent_tid;
__aligned_u64 exit_signal;
__aligned_u64 stack;
__aligned_u64 stack_size;
__aligned_u64 tls;
};
/* kernel internal */
struct kernel_clone_args {
u64 flags;
int __user *pidfd;
int __user *child_tid;
int __user *parent_tid;
int exit_signal;
unsigned long stack;
unsigned long stack_size;
unsigned long tls;
};
long sys_clone3(struct clone_args __user *uargs, size_t size)
clone3() cleanly supports all of the supported flags from clone() and thus
all legacy workloads.
The advantage of sticking close to the old clone() is the low cost for
userspace to switch to this new api. Quite a lot of userspace apis (e.g.
pthreads) are based on the clone() syscall. With the new clone3() syscall
supporting all of the old workloads and opening up the ability to add new
features should make switching to it for userspace more appealing. In
essence, glibc can just write a simple wrapper to switch from clone() to
clone3().
There has been some interest in this patchset already. We have received a
patch from the CRIU corner for clone3() that would set the PID/TID of a
restored process without /proc/sys/kernel/ns_last_pid to eliminate a race.
/* User visible differences to legacy clone() */
- CLONE_DETACHED will cause EINVAL with clone3()
- CSIGNAL is deprecated
It is superseeded by a dedicated "exit_signal" argument in struct
clone_args freeing up space for additional flags.
This is based on a suggestion from Andrei and Linus (cf. [9] and [10])
/* References */
[1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be
[2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343
[3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233
[4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740
[5]: https://lore.kernel.org/lkml/20190425161416.26600-1-dima@arista.com/
[6]: https://lore.kernel.org/lkml/20190425161416.26600-2-dima@arista.com/
[7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ@mail.gmail.com/
[8]: https://lore.kernel.org/lkml/20190524102756.qjsjxukuq2f4t6bo@brauner.io/
[9]: https://lore.kernel.org/lkml/20190529222414.GA6492@gmail.com/
[10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw@mail.gmail.com/
[11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw@mail.gmail.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <adrian@lisas.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
2019-05-25 16:36:41 +07:00
|
|
|
#include <linux/uaccess.h>
|
2017-02-09 00:51:36 +07:00
|
|
|
|
2017-02-04 07:20:53 +07:00
|
|
|
struct task_struct;
|
2017-05-15 10:54:33 +07:00
|
|
|
struct rusage;
|
2017-02-04 07:20:53 +07:00
|
|
|
union thread_union;
|
|
|
|
|
fork: add clone3
This adds the clone3 system call.
As mentioned several times already (cf. [7], [8]) here's the promised
patchset for clone3().
We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
free flag from clone().
Independent of the CLONE_PIDFD patchset a time namespace has been discussed
at Linux Plumber Conference last year and has been sent out and reviewed
(cf. [5]). It is expected that it will go upstream in the not too distant
future. However, it relies on the addition of the CLONE_NEWTIME flag to
clone(). The only other good candidate - CLONE_DETACHED - is currently not
recyclable as we have identified at least two large or widely used
codebases that currently pass this flag (cf. [2], [3], and [4]). Given that
CLONE_PIDFD grabbed the last clone() flag the time namespace is effectively
blocked. clone3() has the advantage that it will unblock this patchset
again. In general, clone3() is extensible and allows for the implementation
of new features.
The idea is to keep clone3() very simple and close to the original clone(),
specifically, to keep on supporting old clone()-based workloads.
We know there have been various creative proposals how a new process
creation syscall or even api is supposed to look like. Some people even
going so far as to argue that the traditional fork()+exec() split should be
abandoned in favor of an in-kernel version of spawn(). Independent of
whether or not we personally think spawn() is a good idea this patchset has
and does not want to have anything to do with this.
One stance we take is that there's no real good alternative to
clone()+exec() and we need and want to support this model going forward;
independent of spawn().
The following requirements guided clone3():
- bump the number of available flags
- move arguments that are currently passed as separate arguments
in clone() into a dedicated struct clone_args
- choose a struct layout that is easy to handle on 32 and on 64 bit
- choose a struct layout that is extensible
- give new flags that currently need to abuse another flag's dedicated
return argument in clone() their own dedicated return argument
(e.g. CLONE_PIDFD)
- use a separate kernel internal struct kernel_clone_args that is
properly typed according to current kernel conventions in fork.c and is
different from the uapi struct clone_args
- port _do_fork() to use kernel_clone_args so that all process creation
syscalls such as fork(), vfork(), clone(), and clone3() behave identical
(Arnd suggested, that we can probably also port do_fork() itself in a
separate patchset.)
- ease of transition for userspace from clone() to clone3()
This very much means that we do *not* remove functionality that userspace
currently relies on as the latter is a good way of creating a syscall
that won't be adopted.
- do not try to be clever or complex: keep clone3() as dumb as possible
In accordance with Linus suggestions (cf. [11]), clone3() has the following
signature:
/* uapi */
struct clone_args {
__aligned_u64 flags;
__aligned_u64 pidfd;
__aligned_u64 child_tid;
__aligned_u64 parent_tid;
__aligned_u64 exit_signal;
__aligned_u64 stack;
__aligned_u64 stack_size;
__aligned_u64 tls;
};
/* kernel internal */
struct kernel_clone_args {
u64 flags;
int __user *pidfd;
int __user *child_tid;
int __user *parent_tid;
int exit_signal;
unsigned long stack;
unsigned long stack_size;
unsigned long tls;
};
long sys_clone3(struct clone_args __user *uargs, size_t size)
clone3() cleanly supports all of the supported flags from clone() and thus
all legacy workloads.
The advantage of sticking close to the old clone() is the low cost for
userspace to switch to this new api. Quite a lot of userspace apis (e.g.
pthreads) are based on the clone() syscall. With the new clone3() syscall
supporting all of the old workloads and opening up the ability to add new
features should make switching to it for userspace more appealing. In
essence, glibc can just write a simple wrapper to switch from clone() to
clone3().
There has been some interest in this patchset already. We have received a
patch from the CRIU corner for clone3() that would set the PID/TID of a
restored process without /proc/sys/kernel/ns_last_pid to eliminate a race.
/* User visible differences to legacy clone() */
- CLONE_DETACHED will cause EINVAL with clone3()
- CSIGNAL is deprecated
It is superseeded by a dedicated "exit_signal" argument in struct
clone_args freeing up space for additional flags.
This is based on a suggestion from Andrei and Linus (cf. [9] and [10])
/* References */
[1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be
[2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343
[3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233
[4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740
[5]: https://lore.kernel.org/lkml/20190425161416.26600-1-dima@arista.com/
[6]: https://lore.kernel.org/lkml/20190425161416.26600-2-dima@arista.com/
[7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ@mail.gmail.com/
[8]: https://lore.kernel.org/lkml/20190524102756.qjsjxukuq2f4t6bo@brauner.io/
[9]: https://lore.kernel.org/lkml/20190529222414.GA6492@gmail.com/
[10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw@mail.gmail.com/
[11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw@mail.gmail.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <adrian@lisas.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
2019-05-25 16:36:41 +07:00
|
|
|
/* All the bits taken by the old clone syscall. */
|
|
|
|
#define CLONE_LEGACY_FLAGS 0xffffffffULL
|
|
|
|
|
|
|
|
struct kernel_clone_args {
|
|
|
|
u64 flags;
|
|
|
|
int __user *pidfd;
|
|
|
|
int __user *child_tid;
|
|
|
|
int __user *parent_tid;
|
|
|
|
int exit_signal;
|
|
|
|
unsigned long stack;
|
|
|
|
unsigned long stack_size;
|
|
|
|
unsigned long tls;
|
|
|
|
};
|
|
|
|
|
2017-02-03 21:24:12 +07:00
|
|
|
/*
|
|
|
|
* This serializes "schedule()" and also protects
|
|
|
|
* the run-queue from deletions/modifications (but
|
|
|
|
* _adding_ to the beginning of the run-queue has
|
|
|
|
* a separate lock).
|
|
|
|
*/
|
|
|
|
extern rwlock_t tasklist_lock;
|
|
|
|
extern spinlock_t mmlist_lock;
|
|
|
|
|
2017-02-04 07:20:53 +07:00
|
|
|
extern union thread_union init_thread_union;
|
|
|
|
extern struct task_struct init_task;
|
|
|
|
|
2017-02-03 21:24:12 +07:00
|
|
|
#ifdef CONFIG_PROVE_RCU
|
|
|
|
extern int lockdep_tasklist_lock_is_held(void);
|
|
|
|
#endif /* #ifdef CONFIG_PROVE_RCU */
|
|
|
|
|
|
|
|
extern asmlinkage void schedule_tail(struct task_struct *prev);
|
|
|
|
extern void init_idle(struct task_struct *idle, int cpu);
|
|
|
|
|
|
|
|
extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
|
|
|
|
extern void sched_dead(struct task_struct *p);
|
|
|
|
|
|
|
|
void __noreturn do_task_dead(void);
|
|
|
|
|
|
|
|
extern void proc_caches_init(void);
|
|
|
|
|
2019-01-04 06:28:03 +07:00
|
|
|
extern void fork_init(void);
|
|
|
|
|
2017-02-03 21:24:12 +07:00
|
|
|
extern void release_task(struct task_struct * p);
|
|
|
|
|
|
|
|
#ifdef CONFIG_HAVE_COPY_THREAD_TLS
|
|
|
|
extern int copy_thread_tls(unsigned long, unsigned long, unsigned long,
|
|
|
|
struct task_struct *, unsigned long);
|
|
|
|
#else
|
|
|
|
extern int copy_thread(unsigned long, unsigned long, unsigned long,
|
|
|
|
struct task_struct *);
|
|
|
|
|
|
|
|
/* Architectures that haven't opted into copy_thread_tls get the tls argument
|
|
|
|
* via pt_regs, so ignore the tls argument passed via C. */
|
|
|
|
static inline int copy_thread_tls(
|
|
|
|
unsigned long clone_flags, unsigned long sp, unsigned long arg,
|
|
|
|
struct task_struct *p, unsigned long tls)
|
|
|
|
{
|
|
|
|
return copy_thread(clone_flags, sp, arg, p);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
extern void flush_thread(void);
|
|
|
|
|
|
|
|
#ifdef CONFIG_HAVE_EXIT_THREAD
|
|
|
|
extern void exit_thread(struct task_struct *tsk);
|
|
|
|
#else
|
|
|
|
static inline void exit_thread(struct task_struct *tsk)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
extern void do_group_exit(int);
|
|
|
|
|
2017-02-05 20:35:41 +07:00
|
|
|
extern void exit_files(struct task_struct *);
|
|
|
|
extern void exit_itimers(struct signal_struct *);
|
|
|
|
|
fork: add clone3
This adds the clone3 system call.
As mentioned several times already (cf. [7], [8]) here's the promised
patchset for clone3().
We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
free flag from clone().
Independent of the CLONE_PIDFD patchset a time namespace has been discussed
at Linux Plumber Conference last year and has been sent out and reviewed
(cf. [5]). It is expected that it will go upstream in the not too distant
future. However, it relies on the addition of the CLONE_NEWTIME flag to
clone(). The only other good candidate - CLONE_DETACHED - is currently not
recyclable as we have identified at least two large or widely used
codebases that currently pass this flag (cf. [2], [3], and [4]). Given that
CLONE_PIDFD grabbed the last clone() flag the time namespace is effectively
blocked. clone3() has the advantage that it will unblock this patchset
again. In general, clone3() is extensible and allows for the implementation
of new features.
The idea is to keep clone3() very simple and close to the original clone(),
specifically, to keep on supporting old clone()-based workloads.
We know there have been various creative proposals how a new process
creation syscall or even api is supposed to look like. Some people even
going so far as to argue that the traditional fork()+exec() split should be
abandoned in favor of an in-kernel version of spawn(). Independent of
whether or not we personally think spawn() is a good idea this patchset has
and does not want to have anything to do with this.
One stance we take is that there's no real good alternative to
clone()+exec() and we need and want to support this model going forward;
independent of spawn().
The following requirements guided clone3():
- bump the number of available flags
- move arguments that are currently passed as separate arguments
in clone() into a dedicated struct clone_args
- choose a struct layout that is easy to handle on 32 and on 64 bit
- choose a struct layout that is extensible
- give new flags that currently need to abuse another flag's dedicated
return argument in clone() their own dedicated return argument
(e.g. CLONE_PIDFD)
- use a separate kernel internal struct kernel_clone_args that is
properly typed according to current kernel conventions in fork.c and is
different from the uapi struct clone_args
- port _do_fork() to use kernel_clone_args so that all process creation
syscalls such as fork(), vfork(), clone(), and clone3() behave identical
(Arnd suggested, that we can probably also port do_fork() itself in a
separate patchset.)
- ease of transition for userspace from clone() to clone3()
This very much means that we do *not* remove functionality that userspace
currently relies on as the latter is a good way of creating a syscall
that won't be adopted.
- do not try to be clever or complex: keep clone3() as dumb as possible
In accordance with Linus suggestions (cf. [11]), clone3() has the following
signature:
/* uapi */
struct clone_args {
__aligned_u64 flags;
__aligned_u64 pidfd;
__aligned_u64 child_tid;
__aligned_u64 parent_tid;
__aligned_u64 exit_signal;
__aligned_u64 stack;
__aligned_u64 stack_size;
__aligned_u64 tls;
};
/* kernel internal */
struct kernel_clone_args {
u64 flags;
int __user *pidfd;
int __user *child_tid;
int __user *parent_tid;
int exit_signal;
unsigned long stack;
unsigned long stack_size;
unsigned long tls;
};
long sys_clone3(struct clone_args __user *uargs, size_t size)
clone3() cleanly supports all of the supported flags from clone() and thus
all legacy workloads.
The advantage of sticking close to the old clone() is the low cost for
userspace to switch to this new api. Quite a lot of userspace apis (e.g.
pthreads) are based on the clone() syscall. With the new clone3() syscall
supporting all of the old workloads and opening up the ability to add new
features should make switching to it for userspace more appealing. In
essence, glibc can just write a simple wrapper to switch from clone() to
clone3().
There has been some interest in this patchset already. We have received a
patch from the CRIU corner for clone3() that would set the PID/TID of a
restored process without /proc/sys/kernel/ns_last_pid to eliminate a race.
/* User visible differences to legacy clone() */
- CLONE_DETACHED will cause EINVAL with clone3()
- CSIGNAL is deprecated
It is superseeded by a dedicated "exit_signal" argument in struct
clone_args freeing up space for additional flags.
This is based on a suggestion from Andrei and Linus (cf. [9] and [10])
/* References */
[1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be
[2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343
[3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233
[4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740
[5]: https://lore.kernel.org/lkml/20190425161416.26600-1-dima@arista.com/
[6]: https://lore.kernel.org/lkml/20190425161416.26600-2-dima@arista.com/
[7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ@mail.gmail.com/
[8]: https://lore.kernel.org/lkml/20190524102756.qjsjxukuq2f4t6bo@brauner.io/
[9]: https://lore.kernel.org/lkml/20190529222414.GA6492@gmail.com/
[10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw@mail.gmail.com/
[11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw@mail.gmail.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <adrian@lisas.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
2019-05-25 16:36:41 +07:00
|
|
|
extern long _do_fork(struct kernel_clone_args *kargs);
|
2019-07-14 23:20:47 +07:00
|
|
|
extern bool legacy_clone_args_valid(const struct kernel_clone_args *kargs);
|
2017-02-03 21:24:12 +07:00
|
|
|
extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
|
|
|
|
struct task_struct *fork_idle(int);
|
2019-04-26 07:11:25 +07:00
|
|
|
struct mm_struct *copy_init_mm(void);
|
2017-02-03 21:24:12 +07:00
|
|
|
extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
|
2018-07-22 21:07:11 +07:00
|
|
|
extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
|
2017-02-03 21:24:12 +07:00
|
|
|
|
2017-02-05 21:30:50 +07:00
|
|
|
extern void free_task(struct task_struct *tsk);
|
|
|
|
|
2017-02-06 17:12:45 +07:00
|
|
|
/* sched_exec is called by processes performing an exec */
|
|
|
|
#ifdef CONFIG_SMP
|
|
|
|
extern void sched_exec(void);
|
|
|
|
#else
|
|
|
|
#define sched_exec() {}
|
|
|
|
#endif
|
|
|
|
|
2019-07-05 05:13:23 +07:00
|
|
|
static inline struct task_struct *get_task_struct(struct task_struct *t)
|
|
|
|
{
|
|
|
|
refcount_inc(&t->usage);
|
|
|
|
return t;
|
|
|
|
}
|
2017-02-05 21:30:50 +07:00
|
|
|
|
|
|
|
extern void __put_task_struct(struct task_struct *t);
|
|
|
|
|
|
|
|
static inline void put_task_struct(struct task_struct *t)
|
|
|
|
{
|
2019-01-18 19:27:29 +07:00
|
|
|
if (refcount_dec_and_test(&t->usage))
|
2017-02-05 21:30:50 +07:00
|
|
|
__put_task_struct(t);
|
|
|
|
}
|
|
|
|
|
2019-09-14 19:33:34 +07:00
|
|
|
void put_task_struct_rcu_user(struct task_struct *task);
|
2017-02-03 21:24:12 +07:00
|
|
|
|
|
|
|
#ifdef CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT
|
|
|
|
extern int arch_task_struct_size __read_mostly;
|
|
|
|
#else
|
|
|
|
# define arch_task_struct_size (sizeof(struct task_struct))
|
|
|
|
#endif
|
|
|
|
|
2017-08-17 03:00:58 +07:00
|
|
|
#ifndef CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST
|
|
|
|
/*
|
|
|
|
* If an architecture has not declared a thread_struct whitelist we
|
|
|
|
* must assume something there may need to be copied to userspace.
|
|
|
|
*/
|
|
|
|
static inline void arch_thread_struct_whitelist(unsigned long *offset,
|
|
|
|
unsigned long *size)
|
|
|
|
{
|
|
|
|
*offset = 0;
|
|
|
|
/* Handle dynamically sized thread_struct. */
|
|
|
|
*size = arch_task_struct_size - offsetof(struct task_struct, thread);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2017-02-03 21:24:12 +07:00
|
|
|
#ifdef CONFIG_VMAP_STACK
|
|
|
|
static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
|
|
|
|
{
|
|
|
|
return t->stack_vm_area;
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
|
|
|
|
{
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2017-02-06 16:57:33 +07:00
|
|
|
/*
|
|
|
|
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
|
|
|
|
* subscriptions and synchronises with wait4(). Also used in procfs. Also
|
|
|
|
* pins the final release of task.io_context. Also protects ->cpuset and
|
|
|
|
* ->cgroup.subsys[]. And ->vfork_done.
|
|
|
|
*
|
|
|
|
* Nests both inside and outside of read_lock(&tasklist_lock).
|
|
|
|
* It must not be nested with write_lock_irq(&tasklist_lock),
|
|
|
|
* neither inside nor outside.
|
|
|
|
*/
|
|
|
|
static inline void task_lock(struct task_struct *p)
|
|
|
|
{
|
|
|
|
spin_lock(&p->alloc_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void task_unlock(struct task_struct *p)
|
|
|
|
{
|
|
|
|
spin_unlock(&p->alloc_lock);
|
|
|
|
}
|
|
|
|
|
2017-02-09 00:51:36 +07:00
|
|
|
#endif /* _LINUX_SCHED_TASK_H */
|