License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner, Kate Stewart, and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX license identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files, created by Philippe Ombredanne. Philippe prepared the
base worksheet and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file-by-file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply:
- when neither scanner could find any license traces, the file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file, or if it had no licensing
in it (per the prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet by Kate, Philippe and Thomas to determine the SPDX license
identifiers to apply to the source files, with confirmation in some
cases by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is partly based on an older version of FOSSology, so
the two are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files, Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to
have copy/paste license identifier errors, and have been fixed to
reflect the correct identifier.
Additionally, Philippe spent 10 hours doing a detailed manual inspection
and review of the 12,461 files patched in the initial version earlier
this week, with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the .csv files and add the proper SPDX tag to each file, in the
format that the file expected. This script was further refined by Greg,
based on the output, to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally, Greg ran the script using the .csv files to
generate the patches.
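For illustration only (a sketch of the tagging convention, not part of
the change itself): header files get the tag as a C block comment on the
first line, while .c source files get it as a C++-style comment:

  /* SPDX-License-Identifier: GPL-2.0 */     (in .h and uapi header files)
  // SPDX-License-Identifier: GPL-2.0        (in .c source files)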
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright (C) 1994 Linus Torvalds
 *
 * Pentium III FXSR, SSE support
 * General FPU state handling cleanups
 *	Gareth Hughes <gareth@valinux.com>, May 2000
 * x86-64 work by Andi Kleen 2002
 */

#ifndef _ASM_X86_FPU_INTERNAL_H
#define _ASM_X86_FPU_INTERNAL_H

#include <linux/compat.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/mm.h>

#include <asm/user.h>
#include <asm/fpu/api.h>
#include <asm/fpu/xstate.h>
#include <asm/cpufeature.h>
#include <asm/trace/fpu.h>

/*
 * High level FPU state handling functions:
 */
extern void fpu__prepare_read(struct fpu *fpu);
extern void fpu__prepare_write(struct fpu *fpu);
extern void fpu__save(struct fpu *fpu);
extern int fpu__restore_sig(void __user *buf, int ia32_frame);
extern void fpu__drop(struct fpu *fpu);
extern int fpu__copy(struct task_struct *dst, struct task_struct *src);
extern void fpu__clear(struct fpu *fpu);
extern int fpu__exception_code(struct fpu *fpu, int trap_nr);
extern int dump_fpu(struct pt_regs *ptregs, struct user_i387_struct *fpstate);

/*
 * Boot time FPU initialization functions:
 */
extern void fpu__init_cpu(void);
extern void fpu__init_system_xstate(void);
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system(struct cpuinfo_x86 *c);
extern void fpu__init_check_bugs(void);
extern void fpu__resume_cpu(void);
extern u64 fpu__get_supported_xfeatures_mask(void);

/*
 * Debugging facility:
 */
#ifdef CONFIG_X86_DEBUG_FPU
# define WARN_ON_FPU(x)		WARN_ON_ONCE(x)
#else
# define WARN_ON_FPU(x)		({ (void)(x); 0; })
#endif
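
/*
 * Note: in the !CONFIG_X86_DEBUG_FPU case above, WARN_ON_FPU() still
 * evaluates its argument (for any side effects) but expands to a
 * constant 0, so the compiler can drop dependent error handling.
 */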

/*
 * FPU related CPU feature flag helper routines:
 */
static __always_inline __pure bool use_xsaveopt(void)
{
	return static_cpu_has(X86_FEATURE_XSAVEOPT);
}

static __always_inline __pure bool use_xsave(void)
{
	return static_cpu_has(X86_FEATURE_XSAVE);
}

static __always_inline __pure bool use_fxsr(void)
{
	return static_cpu_has(X86_FEATURE_FXSR);
}

/*
 * fpstate handling functions:
 */

extern union fpregs_state init_fpstate;

extern void fpstate_init(union fpregs_state *state);
#ifdef CONFIG_MATH_EMULATION
extern void fpstate_init_soft(struct swregs_state *soft);
#else
static inline void fpstate_init_soft(struct swregs_state *soft) {}
#endif

static inline void fpstate_init_xstate(struct xregs_state *xsave)
{
	/*
	 * XRSTORS requires these bits set in xcomp_bv, or it will
	 * trigger #GP:
	 */
	xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask;
}

static inline void fpstate_init_fxstate(struct fxregs_state *fx)
{
	fx->cwd = 0x37f;
	fx->mxcsr = MXCSR_DEFAULT;
}
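
/*
 * Note: 0x37f is the x87 default control word established by FNINIT:
 * all exceptions masked, 64-bit (extended) precision, round-to-nearest.
 */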

extern void fpstate_sanitize_xstate(struct fpu *fpu);

#define user_insn(insn, output, input...)				\
({									\
	int err;							\
									\
	might_fault();							\
									\
	asm volatile(ASM_STAC "\n"					\
		     "1:" #insn "\n\t"					\
		     "2: " ASM_CLAC "\n"				\
		     ".section .fixup,\"ax\"\n"				\
		     "3: movl $-1,%[err]\n"				\
		     "   jmp 2b\n"					\
		     ".previous\n"					\
		     _ASM_EXTABLE(1b, 3b)				\
		     : [err] "=r" (err), output				\
		     : "0"(0), input);					\
	err;								\
})
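
/*
 * How user_insn() works: ASM_STAC/ASM_CLAC open and close a user-space
 * access window (SMAP), and the .fixup/_ASM_EXTABLE pair turns a fault
 * on the instruction at label 1 into a jump to label 3, which sets err
 * to -1 and resumes at label 2 instead of oopsing.
 */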

#define kernel_insn_err(insn, output, input...)				\
({									\
	int err;							\
	asm volatile("1:" #insn "\n\t"					\
		     "2:\n"						\
		     ".section .fixup,\"ax\"\n"				\
		     "3: movl $-1,%[err]\n"				\
		     "   jmp 2b\n"					\
		     ".previous\n"					\
		     _ASM_EXTABLE(1b, 3b)				\
		     : [err] "=r" (err), output				\
		     : "0"(0), input);					\
	err;								\
})

#define kernel_insn(insn, output, input...)				\
	asm volatile("1:" #insn "\n\t"					\
		     "2:\n"						\
		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)	\
		     : output : input)

static inline int copy_fregs_to_user(struct fregs_state __user *fx)
{
	return user_insn(fnsave %[fx]; fwait, [fx] "=m" (*fx), "m" (*fx));
}

static inline int copy_fxregs_to_user(struct fxregs_state __user *fx)
{
	if (IS_ENABLED(CONFIG_X86_32))
		return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
	else
		return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
}

static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
{
	if (IS_ENABLED(CONFIG_X86_32))
		kernel_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
	else
		kernel_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}

static inline int copy_kernel_to_fxregs_err(struct fxregs_state *fx)
{
	if (IS_ENABLED(CONFIG_X86_32))
		return kernel_insn_err(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
	else
		return kernel_insn_err(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}

static inline int copy_user_to_fxregs(struct fxregs_state __user *fx)
{
	if (IS_ENABLED(CONFIG_X86_32))
		return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
	else
		return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}

static inline void copy_kernel_to_fregs(struct fregs_state *fx)
{
	kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}

static inline int copy_kernel_to_fregs_err(struct fregs_state *fx)
{
	return kernel_insn_err(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}

static inline int copy_user_to_fregs(struct fregs_state __user *fx)
{
	return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}

static inline void copy_fxregs_to_kernel(struct fpu *fpu)
{
	if (IS_ENABLED(CONFIG_X86_32))
		asm volatile("fxsave %[fx]" : [fx] "=m" (fpu->state.fxsave));
	else
		asm volatile("fxsaveq %[fx]" : [fx] "=m" (fpu->state.fxsave));
}

/* These macros all use (%edi)/(%rdi) as the single memory argument. */
#define XSAVE		".byte " REX_PREFIX "0x0f,0xae,0x27"
#define XSAVEOPT	".byte " REX_PREFIX "0x0f,0xae,0x37"
#define XSAVES		".byte " REX_PREFIX "0x0f,0xc7,0x2f"
#define XRSTOR		".byte " REX_PREFIX "0x0f,0xae,0x2f"
#define XRSTORS		".byte " REX_PREFIX "0x0f,0xc7,0x1f"

#define XSTATE_OP(op, st, lmask, hmask, err)				\
	asm volatile("1:" op "\n\t"					\
		     "xor %[err], %[err]\n"				\
		     "2:\n\t"						\
		     ".pushsection .fixup,\"ax\"\n\t"			\
		     "3: movl $-2,%[err]\n\t"				\
		     "jmp 2b\n\t"					\
		     ".popsection\n\t"					\
		     _ASM_EXTABLE(1b, 3b)				\
		     : [err] "=r" (err)					\
		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
		     : "memory")
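
/*
 * Illustration only (not part of this header): roughly the same save
 * operation can be driven from user space via the compiler's XSAVE
 * intrinsic. A minimal sketch, assuming a CPU/OS with XSAVE enabled
 * and compilation with -mxsave; the function name is hypothetical:
 *
 *	#include <immintrin.h>
 *	#include <stdlib.h>
 *	#include <string.h>
 *
 *	int save_xstate_demo(void)
 *	{
 *		void *buf = aligned_alloc(64, 4096);	// XSAVE needs 64-byte alignment
 *
 *		if (!buf)
 *			return -1;
 *		memset(buf, 0, 4096);		// zero the XSAVE header first
 *		_xsave(buf, ~0ULL);		// mask = -1: all enabled components
 *		free(buf);
 *		return 0;
 *	}
 */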

/*
 * If XSAVES is enabled, it replaces XSAVEOPT because it supports a compact
 * format and supervisor states in addition to modified optimization in
 * XSAVEOPT.
 *
 * Otherwise, if XSAVEOPT is enabled, XSAVEOPT replaces XSAVE because XSAVEOPT
 * supports modified optimization which is not supported by XSAVE.
 *
 * We use XSAVE as a fallback.
 *
 * The 661 label is defined in the ALTERNATIVE* macros as the address of the
 * original instruction which gets replaced. We need to use it here as the
 * address of the instruction where we might get an exception at.
 */
#define XSTATE_XSAVE(st, lmask, hmask, err)				\
	asm volatile(ALTERNATIVE_2(XSAVE,				\
				   XSAVEOPT, X86_FEATURE_XSAVEOPT,	\
				   XSAVES,   X86_FEATURE_XSAVES)	\
		     "\n"						\
		     "xor %[err], %[err]\n"				\
		     "3:\n"						\
		     ".pushsection .fixup,\"ax\"\n"			\
		     "4: movl $-2, %[err]\n"				\
		     "jmp 3b\n"						\
		     ".popsection\n"					\
		     _ASM_EXTABLE(661b, 4b)				\
		     : [err] "=r" (err)					\
		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
		     : "memory")

/*
 * Use XRSTORS to restore context if it is enabled. XRSTORS supports compact
 * XSAVE area format.
 */
#define XSTATE_XRESTORE(st, lmask, hmask)				\
	asm volatile(ALTERNATIVE(XRSTOR,				\
				 XRSTORS, X86_FEATURE_XSAVES)		\
		     "\n"						\
		     "3:\n"						\
		     _ASM_EXTABLE_HANDLE(661b, 3b, ex_handler_fprestore)\
		     :							\
		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
		     : "memory")
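
/*
 * Note: if XRSTOR(S) faults here, ex_handler_fprestore() reinitializes
 * the FPU registers from init_fpstate rather than killing the task --
 * see the "x86/fpu: Reinitialize FPU registers if restoring FPU state
 * fails" changelog for the rationale.
 */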

/*
 * This function is called only during boot time when x86 caps are not set
 * up and alternative can not be used yet.
 */
static inline void copy_xregs_to_kernel_booting(struct xregs_state *xstate)
{
	u64 mask = -1;
	u32 lmask = mask;
	u32 hmask = mask >> 32;
	int err;

	WARN_ON(system_state != SYSTEM_BOOTING);

	if (boot_cpu_has(X86_FEATURE_XSAVES))
		XSTATE_OP(XSAVES, xstate, lmask, hmask, err);
	else
		XSTATE_OP(XSAVE, xstate, lmask, hmask, err);

	/* We should never fault when copying to a kernel buffer: */
	WARN_ON_FPU(err);
}

/*
 * This function is called only during boot time when x86 caps are not set
 * up and alternative can not be used yet.
 */
static inline void copy_kernel_to_xregs_booting(struct xregs_state *xstate)
{
	u64 mask = -1;
	u32 lmask = mask;
	u32 hmask = mask >> 32;
	int err;

	WARN_ON(system_state != SYSTEM_BOOTING);

	if (boot_cpu_has(X86_FEATURE_XSAVES))
		XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
	else
		XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);

	/*
	 * We should never fault when copying from a kernel buffer, and the FPU
	 * state we set at boot time should be valid.
	 */
	WARN_ON_FPU(err);
}

/*
 * Save processor xstate to xsave area.
 */
static inline void copy_xregs_to_kernel(struct xregs_state *xstate)
{
	u64 mask = -1;
	u32 lmask = mask;
	u32 hmask = mask >> 32;
	int err;

	WARN_ON_FPU(!alternatives_patched);

	XSTATE_XSAVE(xstate, lmask, hmask, err);

	/* We should never fault when copying to a kernel buffer: */
	WARN_ON_FPU(err);
}

/*
 * Restore processor xstate from xsave area.
 */
static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
{
	u32 lmask = mask;
	u32 hmask = mask >> 32;

	XSTATE_XRESTORE(xstate, lmask, hmask);
}

/*
 * Save xstate to user space xsave area.
 *
 * We don't use modified optimization because xrstor/xrstors might track
 * a different application.
 *
 * We don't use the compacted format of the xsave area, for backward
 * compatibility with old applications which don't understand it.
 */
static inline int copy_xregs_to_user(struct xregs_state __user *buf)
{
	int err;

	/*
	 * Clear the xsave header first, so that reserved fields are
	 * initialized to zero.
	 */
	err = __clear_user(&buf->header, sizeof(buf->header));
	if (unlikely(err))
		return -EFAULT;

	stac();
	XSTATE_OP(XSAVE, buf, -1, -1, err);
	clac();

	return err;
}

/*
 * Restore xstate from user space xsave area.
 */
static inline int copy_user_to_xregs(struct xregs_state __user *buf, u64 mask)
{
	struct xregs_state *xstate = ((__force struct xregs_state *)buf);
	u32 lmask = mask;
	u32 hmask = mask >> 32;
	int err;

	stac();
	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
	clac();

	return err;
}

/*
 * Restore xstate from kernel space xsave area, return an error code instead
 * of an exception.
 */
static inline int copy_kernel_to_xregs_err(struct xregs_state *xstate, u64 mask)
{
	u32 lmask = mask;
	u32 hmask = mask >> 32;
	int err;

	XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);

	return err;
}
2012-02-22 04:19:22 +07:00
|
|
|
/*
|
|
|
|
* These must be called with preempt disabled. Returns
|
x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate()
So fpu_save_init() is a historic name that got its name when the only
way the FPU state was FNSAVE, which cleared (well, destroyed) the FPU
state after saving it.
Nowadays the name is misleading, because ever since the introduction of
FXSAVE (and more modern FPU saving instructions) the 'we need to reload
the FPU state' part is only true if there's a pending FPU exception [*],
which is almost never the case.
So rename it to copy_fpregs_to_fpstate() to make it clear what's
happening. Also add a few comments about why we cannot keep registers
in certain cases.
Also clean up the control flow a bit, to make it more apparent when
we are dropping/keeping FP registers, and to optimize the common
case (of keeping fpregs) some more.
[*] Probably not true anymore, modern instructions always leave the FPU
state intact, even if exceptions are pending: because pending FP
exceptions are posted on the next FP instruction, not asynchronously.
They were truly asynchronous back in the IRQ13 case, and we had to
synchronize with them, but that code is not working anymore: we don't
have IRQ13 mapped in the IDT anymore.
But a cleanup patch is obviously not the place to change subtle behavior.
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-27 07:53:16 +07:00

x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions
We have the following ancient code in copy_fpregs_to_fpstate():
	if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) {
		asm volatile("fnclex");
		goto drop_fpregs;
	}
which clears pending FPU exceptions and then drops registers, which
causes the next FP instruction of the saved context to re-load the
saved FPU state, with all pending exceptions marked properly, and
will re-start the exception handling mechanism in the hardware.
Since FPU exceptions are always issued on instruction boundaries,
in particular on the next FP instruction following the exception
generating instruction, there's no fear of getting an FP exception
asynchronously.
They were truly asynchronous back in the IRQ13 days, when the FPU was
a weird and expensive co-processor that did its own processing, and we
had to synchronize with them, but that code is not working anymore:
we don't have IRQ13 mapped in the IDT anymore.
With the introduction of optimized XSAVE support there's a new
complication: if the xstate features bit indicates that a particular
state component is unused (in 'init state'), then the hardware does
not guarantee that the XSAVE (et al) instruction keeps the underlying
FPU state image in memory valid and current. In practice this means
that the hardware won't write it, and the exceptions flag in the
state might be an older version, with it still being set. This
meant that we had to check the xfeatures flag as well, adding
another memory load and branch to a critical hot path of the scheduler.
So optimize all this by removing both the old quirk and the new check,
and straight-line optimizing the most common cases with likely()
hints. Quite a bit of code gets removed this way:
	arch/x86/kernel/process_64.o:
	text	data	bss	dec	filename
	5484	8	0	5492	process_64.o.before
	5416	8	0	5424	process_64.o.after
Now there's also a chance that some weird behavior or erratum was
masked by our IRQ13 handling quirk (or that I misunderstood the
nature of the quirk), and that this change triggers some badness.
There's no real good way to protect against that possibility other
than keeping this change well isolated, well commented and well
bisectable. If you bisect a weird (or not so weird) breakage to
this commit then please let us know!
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-27 08:32:18 +07:00

/*
 * These must be called with preempt disabled. Returns
 * 'true' if the FPU state is still intact and we can
 * keep registers active.
 *
 * The legacy FNSAVE instruction cleared all FPU state
 * unconditionally, so registers are essentially destroyed.
 * Modern FPU state can be kept in registers, if there are
 * no pending FP exceptions.
 */
static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
{
	if (likely(use_xsave())) {
		copy_xregs_to_kernel(&fpu->state.xsave);

		/*
		 * AVX512 state is tracked here because its use is
		 * known to slow the max clock speed of the core.
		 */
		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
			fpu->avx512_timestamp = jiffies;
		return 1;
	}

	if (likely(use_fxsr())) {
		copy_fxregs_to_kernel(fpu);
		return 1;
	}

	/*
	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
	 * so we have to mark them inactive:
	 */
	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));

	return 0;
}
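
/*
 * Usage sketch (illustrative only, assumed helper name): how the return
 * value above is meant to be consumed. This mirrors switch_fpu_prepare()
 * further down: 1 means the in-register state survived the save, 0 means
 * FNSAVE destroyed it and a later reload is unavoidable.
 */
static inline void example_save_for_switch(struct fpu *fpu, int cpu)
{
	if (copy_fpregs_to_fpstate(fpu))
		fpu->last_cpu = cpu;	/* registers intact: remember owner CPU */
	else
		fpu->last_cpu = -1;	/* registers lost: force a reload later */
}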

static inline void __copy_kernel_to_fpregs(union fpregs_state *fpstate, u64 mask)
{
	if (use_xsave()) {
		copy_kernel_to_xregs(&fpstate->xsave, mask);
	} else {
		if (use_fxsr())
			copy_kernel_to_fxregs(&fpstate->fxsave);
		else
			copy_kernel_to_fregs(&fpstate->fsave);
	}
}

static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate)
{
	/*
	 * AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
	 * pending. Clear the x87 state here by setting it to fixed values.
	 * "m" is a random variable that should be in L1.
	 */
	if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
		asm volatile(
			"fnclex\n\t"
			"emms\n\t"
			"fildl %P[addr]"	/* set F?P to defined value */
			: : [addr] "m" (fpstate));
	}

	__copy_kernel_to_fpregs(fpstate, -1);
}

extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);

/*
 * FPU context switch related helper methods:
 */

DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);

/*
 * The in-register FPU state for an FPU context on a CPU is assumed to be
 * valid if the fpu->last_cpu matches the CPU, and the fpu_fpregs_owner_ctx
 * matches the FPU.
 *
 * If the FPU register state is valid, the kernel can skip restoring the
 * FPU state from memory.
 *
 * Any code that clobbers the FPU registers or updates the in-memory
 * FPU state for a task MUST let the rest of the kernel know that the
 * FPU registers are no longer valid for this task.
 *
 * Either one of these invalidation functions is enough. Invalidate
 * a resource you control: the CPU if you are using it for something else
 * (with preemption disabled), the FPU for the current task, or for a task
 * that is prevented from running by the current task.
 */
static inline void __cpu_invalidate_fpregs_state(void)
{
	__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}

static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
{
	fpu->last_cpu = -1;
}
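
/*
 * Usage sketch (illustrative only, assumed helper name): the invalidation
 * rule from the comment above, applied to a kernel-FPU section. Before
 * clobbering this CPU's FPU registers, save the user task's state and
 * mark the CPU as owning no task's fpregs. The real kernel_fpu_begin()
 * performs additional checks beyond this sketch.
 */
static inline void example_kernel_fpu_begin(void)
{
	preempt_disable();

	if (!(current->flags & PF_KTHREAD))
		copy_fpregs_to_fpstate(&current->thread.fpu);

	__cpu_invalidate_fpregs_state();
}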

x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing
to the context that is currently loaded. It never changed during the
lifetime of a task - it remained stable/constant.
After deferred loading of the FPU registers until return to userland was
implemented, the content of fpu_fpregs_owner_ctx may change during
preemption and must not be cached.
This went unnoticed for some time, but it has now surfaced, in
particular because gcc 9 caches that load in copy_fpstate_to_sigframe()
and reuses it in the retry loop:
  copy_fpstate_to_sigframe()
    load fpu_fpregs_owner_ctx and save on stack
    fpregs_lock()
    copy_fpregs_to_sigframe() /* failed */
    fpregs_unlock()
         *** PREEMPTION, another task uses FPU, changes fpu_fpregs_owner_ctx ***
    fault_in_pages_writeable() /* succeeds, retry */
    fpregs_lock()
      __fpregs_load_activate()
        fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
    copy_fpregs_to_sigframe() /* succeeds, random FPU content */
This is a comparison of the assembly produced by gcc 9, without vs with
this patch:
| # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size))
|   cmpq %rdx, %rax # tmp183, _4
|   jb .L190 #,
|-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|-#APP
|-# 512 "arch/x86/include/asm/fpu/internal.h" 1
|- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__
|-# 0 "" 2
|-#NO_APP
|- movq %rax, -88(%rbp) # pfo_ret__, %sfp
…
|-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|- movq -88(%rbp), %rcx # %sfp, pfo_ret__
|- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp
|+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#APP
|+# 512 "arch/x86/include/asm/fpu/internal.h" 1
|+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__
|+# 0 "" 2
|+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
|+#NO_APP
|+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp
Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching
of fpu_fpregs_owner_ctx across preemption points.
The Fixes: tag points to the commit where deferred FPU loading was
added. Since this commit, the compiler is no longer allowed to move the
load of fpu_fpregs_owner_ctx somewhere else / outside of the locked
section. A task preemption will change its value and stale content will
be observed.
[ bp: Massage. ]
Debugged-by: Austin Clements <austin@google.com>
Debugged-by: David Chase <drchase@golang.org>
Debugged-by: Ian Lance Taylor <ian@airs.com>
Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Austin Clements <austin@google.com>
Cc: Barret Rhoden <brho@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Chase <drchase@golang.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: ian@airs.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Bleecher Snyder <josharian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191128085306.hxfa2o3knqtu4wfn@linutronix.de
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663
2019-11-28 15:53:06 +07:00

static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
	return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
}
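
/*
 * Illustrative only (assumed helper name): the ownership check above is
 * meaningful only while preemption is disabled, e.g. under fpregs_lock().
 * Caching its result across a preemption point reintroduces exactly the
 * stale-read problem described in the commit message.
 */
static inline bool example_fpregs_valid_now(struct fpu *fpu)
{
	bool valid;

	fpregs_lock();
	valid = fpregs_state_valid(fpu, smp_processor_id());
	fpregs_unlock();

	return valid;	/* may be stale as soon as preemption is re-enabled */
}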

/*
 * These generally need preemption protection to work,
 * so try to avoid using them on their own:
 */
static inline void fpregs_deactivate(struct fpu *fpu)
{
	this_cpu_write(fpu_fpregs_owner_ctx, NULL);
	trace_x86_fpu_regs_deactivated(fpu);
}

static inline void fpregs_activate(struct fpu *fpu)
{
	this_cpu_write(fpu_fpregs_owner_ctx, fpu);
	trace_x86_fpu_regs_activated(fpu);
}

/*
 * Internal helper, do not use directly. Use switch_fpu_return() instead.
 */
static inline void __fpregs_load_activate(void)
{
	struct fpu *fpu = &current->thread.fpu;
	int cpu = smp_processor_id();

	if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
		return;

	if (!fpregs_state_valid(fpu, cpu)) {
		copy_kernel_to_fpregs(&fpu->state);
		fpregs_activate(fpu);
		fpu->last_cpu = cpu;
	}
	clear_thread_flag(TIF_NEED_FPU_LOAD);
}
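
/*
 * Sketch (illustrative, assumed reconstruction): the public wrapper named
 * in the comment above. The real switch_fpu_return() lives in
 * arch/x86/kernel/fpu/core.c and is essentially this helper guarded by
 * the FPU feature check.
 */
static inline void example_switch_fpu_return(void)
{
	if (!static_cpu_has(X86_FEATURE_FPU))
		return;

	__fpregs_load_activate();
}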

/*
 * FPU state switching for scheduling.
 *
 * This is a two-stage process:
 *
 *  - switch_fpu_prepare() saves the old state.
 *    This is done within the context of the old process.
 *
 *  - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
 *    will get loaded on return to userspace, or when the kernel needs it.
 *
 * If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
 * are saved in the current thread's FPU register state.
 *
 * If TIF_NEED_FPU_LOAD is set then the CPU's FPU registers may not
 * hold current()'s FPU registers. The registers must be loaded before
 * returning to userland or before otherwise using their content.
 *
 * The FPU context is only stored/restored for a user task;
 * PF_KTHREAD is used to distinguish between kernel and user threads.
 */
static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
{
	if (static_cpu_has(X86_FEATURE_FPU) && !(current->flags & PF_KTHREAD)) {
		if (!copy_fpregs_to_fpstate(old_fpu))
			old_fpu->last_cpu = -1;
		else
			old_fpu->last_cpu = cpu;

		/* But leave fpu_fpregs_owner_ctx! */
		trace_x86_fpu_regs_deactivated(old_fpu);
	}
}

/*
 * Misc helper functions:
 */

/*
 * Load PKRU from the FPU context if available. Delay loading of the
 * complete FPU state until the return to userland.
 */
static inline void switch_fpu_finish(struct fpu *new_fpu)
{
	u32 pkru_val = init_pkru_value;
	struct pkru_state *pk;

	if (!static_cpu_has(X86_FEATURE_FPU))
		return;

	set_thread_flag(TIF_NEED_FPU_LOAD);

	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
		return;

	/*
	 * PKRU state is switched eagerly because it needs to be valid before we
	 * return to userland, e.g. for a copy_to_user() operation.
	 */
	if (current->mm) {
		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
		if (pk)
			pkru_val = pk->pkru;
	}
	__write_pkru(pkru_val);
}
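
/*
 * Sketch (illustrative only, assumed helper name): how a context switch
 * ties the two stages together. The real call sites are __switch_to() in
 * arch/x86/kernel/process_32.c and process_64.c; everything but the FPU
 * handling is omitted here.
 */
static inline void example_switch_to_fpu(struct task_struct *prev,
					 struct task_struct *next, int cpu)
{
	switch_fpu_prepare(&prev->thread.fpu, cpu);

	/* ... stack, segment and TLS switching happens here ... */

	switch_fpu_finish(&next->thread.fpu);
}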

/*
 * MXCSR and XCR definitions:
 */

extern unsigned int mxcsr_feature_mask;

#define XCR_XFEATURE_ENABLED_MASK	0x00000000

static inline u64 xgetbv(u32 index)
{
	u32 eax, edx;

	asm volatile(".byte 0x0f,0x01,0xd0" /* xgetbv */
		     : "=a" (eax), "=d" (edx)
		     : "c" (index));
	return eax + ((u64)edx << 32);
}

static inline void xsetbv(u32 index, u64 value)
{
	u32 eax = value;
	u32 edx = value >> 32;

	asm volatile(".byte 0x0f,0x01,0xd1" /* xsetbv */
		     : : "a" (eax), "d" (edx), "c" (index));
}
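
/*
 * Usage sketch (illustrative only, assumed helper name): XCR index 0,
 * i.e. XCR_XFEATURE_ENABLED_MASK, selects XCR0, the mask of xfeatures
 * the OS has enabled via XSETBV.
 */
static inline u64 example_read_enabled_xfeatures(void)
{
	return xgetbv(XCR_XFEATURE_ENABLED_MASK);
}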

#endif /* _ASM_X86_FPU_INTERNAL_H */