linux_dsm_epyc7002/arch/x86/kernel
Ingo Molnar 7366ed771f x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)
So 6 years ago we made the FPU fpstate dynamically allocated:

  aa283f4927 ("x86, fpu: lazy allocation of FPU area - v5")
  61c4628b53 ("x86, fpu: split FPU state from task struct - v5")

In hindsight this was a mistake:

   - it complicated context allocation failure handling, such as:

		/* kthread execs. TODO: cleanup this horror. */
		if (WARN_ON(fpstate_alloc_init(fpu)))
			force_sig(SIGKILL, tsk);

   - it caused us to enable irqs in fpu__restore():

                local_irq_enable();
                /*
                 * does a slab alloc which can sleep
                 */
                if (fpstate_alloc_init(fpu)) {
                        /*
                         * ran out of memory!
                         */
                        do_group_exit(SIGKILL);
                        return;
                }
                local_irq_disable();

   - it (slightly) slowed down task creation/destruction by adding
     slab allocation/free pattens.

   - it made access to context contents (slightly) slower by adding
     one more pointer dereference.

The motivation for the dynamic allocation was two-fold:

   - reduce memory consumption by non-FPU tasks

   - allocate and handle only the necessary amount of context for
     various XSAVE processors that have varying hardware frame
     sizes.

These days, with glibc using SSE memcpy by default and GCC optimizing
for SSE/AVX by default, the scope of FPU using apps on an x86 system is
much larger than it was 6 years ago.

For example on a freshly installed Fedora 21 desktop system, with a
recent kernel, all non-kthread tasks have used the FPU shortly after
bootup.

Also, even modern embedded x86 CPUs try to support the latest vector
instruction set - so they'll too often use the larger xstate frame
sizes.

So remove the dynamic allocation complication by embedding the FPU
fpstate in task_struct again. This should make the FPU a lot more
accessible to all sorts of atomic contexts.

We could still optimize for the xstate frame size in the future,
by moving the state structure to the last element of task_struct,
and allocating only a part of that.

This change is kept minimal by still keeping the ctx_alloc()/free()
routines (that now do nothing substantial) - we'll remove them in
the following patches.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:49 +02:00
..
acpi Initial ACPI support for arm64: 2015-04-24 08:23:45 -07:00
apic This is the final removal (after several years!) of the obsolete cpus_* 2015-04-20 10:19:03 -07:00
cpu x86/fpu: Move various internal function prototypes to fpu/internal.h 2015-05-19 15:47:48 +02:00
fpu x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again) 2015-05-19 15:47:49 +02:00
kprobes Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-14 14:37:47 -07:00
.gitignore
alternative.c x86/alternatives: Guard NOPs optimization 2015-04-06 09:24:09 +02:00
amd_gart_64.c x86: enable DMA CMA with swiotlb 2014-06-04 16:53:57 -07:00
amd_nb.c x86, amd_nb: Add device IDs to NB tables for F15h M60h 2014-10-20 14:18:45 +02:00
apb_timer.c x86/platform: Remove unused function from apb_timer.c 2014-12-23 10:43:35 +01:00
aperture_64.c
apm_32.c cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic 2014-11-12 21:17:27 +01:00
asm-offsets_32.c x86/asm/entry/32: Document the 32-bit SYSENTER "emergency stack" better 2015-03-17 09:25:29 +01:00
asm-offsets_64.c x86/asm/entry/64/compat: Change the 32-bit sysenter code to use sp0 2015-03-06 08:32:58 +01:00
asm-offsets.c
audit_64.c x86: hook up execveat system call 2014-12-13 12:42:51 -08:00
bootflag.c
check.c
cpuid.c x86, cpuid: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:51 -07:00
crash_dump_32.c
crash_dump_64.c
crash.c x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
devicetree.c x86/mm: Use early_memunmap() instead of early_iounmap() 2015-02-24 15:58:06 +01:00
doublefault.c
dumpstack_32.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:23:34 -07:00
dumpstack_64.c x86/kernel: Use kstack_end() in dumpstack_64.c 2015-02-23 18:37:13 +01:00
dumpstack.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:23:34 -07:00
e820.c Merge branch 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-18 11:42:49 -04:00
early_printk.c TTY/Serial patches for 4.1-rc1 2015-04-21 09:33:10 -07:00
early-quirks.c drm/i915/skl: Add the additional graphics stolen sizes 2014-09-24 14:47:39 +02:00
entry_32.S x86/asm/entry/32: Tidy up JNZ instructions after TESTs 2015-04-11 10:40:02 +02:00
entry_64.S x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue 2015-04-26 17:57:38 -07:00
espfix_64.c x86, espfix: Remove stale ptemask 2014-11-11 17:57:46 +01:00
ftrace.c module: remove mod arg from module_free, rename module_memfree(). 2015-01-20 11:38:33 +10:30
head32.c x86: Store a per-cpu shadow copy of CR4 2015-02-04 12:10:42 +01:00
head64.c Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:19:10 -07:00
head_32.S x86/asm/boot: Use already defined KEEP_SEGMENTS macro in head_{32,64}.S 2015-02-19 10:05:04 +01:00
head_64.S x86/asm: Optimize unnecessarily wide TEST instructions 2015-03-07 11:12:43 +01:00
head.c
hpet.c kernel.h: remove ancient __FUNCTION__ hack 2015-02-12 18:54:13 -08:00
hw_breakpoint.c perf/x86: Remove get_hbp_len and replace with bp_len 2014-12-03 15:14:30 +01:00
i386_ksyms_32.c
i8237.c
i8253.c
i8259.c x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts 2014-10-28 12:01:08 +01:00
io_delay.c
ioport.c x86/asm/entry: Rename 'init_tss' to 'cpu_tss' 2015-03-06 08:32:58 +01:00
iosf_mbi.c x86/platform/intel/iosf: Add debugfs config option for IOSF 2014-09-19 13:08:43 +02:00
irq_32.c x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
irq_64.c x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
irq_work.c x86: Tell irq work about self IPI support 2014-09-13 18:38:29 +02:00
irq.c x86: fix up obsolete cpu function usage. 2015-03-05 15:25:05 +10:30
irqinit.c x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout 2015-04-08 09:02:13 +02:00
jump_label.c
kdebugfs.c
kexec-bzimage64.c kexec-bzimage64: fix sparse warnings 2014-10-14 02:18:21 +02:00
kgdb.c Linux 4.0-rc7 2015-04-08 09:01:54 +02:00
ksysfs.c
kvm.c watchdog: introduce the hardlockup_detector_disable() function 2015-04-14 16:48:59 -07:00
kvmclock.c x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit 2014-12-10 12:49:39 +01:00
ldt.c
livepatch.c livepatch: kernel: add support for live patching 2014-12-22 15:40:49 +01:00
machine_kexec_32.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
machine_kexec_64.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
Makefile x86/fpu: Move i387.c and xsave.c to arch/x86/kernel/fpu/ 2015-05-19 15:47:15 +02:00
mcount_64.S ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter 2014-12-01 14:08:58 -05:00
mmconf-fam10h_64.c
module.c x86/mm/KASLR: Propagate KASLR status to kernel proper 2015-04-03 15:26:15 +02:00
mpparse.c x86, apic: Remove mps_oem_check callback 2014-07-31 08:05:42 -07:00
msr.c x86, msr: Use seek definitions instead of hard-coded values 2014-10-17 13:40:55 -07:00
nmi_selftest.c
nmi.c
paravirt_patch_32.c
paravirt_patch_64.c x86_64/entry/xen: Do not invoke espfix64 on Xen 2014-07-28 15:25:40 -07:00
paravirt-spinlocks.c
paravirt.c x86: expose number of page table levels on Kconfig level 2015-04-14 16:49:02 -07:00
pci-calgary_64.c
pci-dma.c arch/x86/kernel/pci-dma.c: fix dma_generic_alloc_coherent() when CONFIG_DMA_CMA is enabled 2014-06-04 16:53:57 -07:00
pci-iommu_table.c
pci-nommu.c
pci-swiotlb.c x86: enable DMA CMA with swiotlb 2014-06-04 16:53:57 -07:00
pcspeaker.c
perf_regs.c perf/x86/64: Report regs_user->ax too in get_regs_user() 2015-04-11 13:08:53 +02:00
pmc_atom.c x86: pmc_atom: Expose contents of PSS 2015-01-20 12:50:14 +01:00
pmem.c x86/mm: Add support for the non-standard protected e820 type 2015-04-01 17:02:43 +02:00
probe_roms.c
process_32.c x86/fpu: Rename fpu-internal.h to fpu/internal.h 2015-05-19 15:47:31 +02:00
process_64.c x86/fpu: Rename fpu-internal.h to fpu/internal.h 2015-05-19 15:47:31 +02:00
process.c x86/fpu: Rename fpu-internal.h to fpu/internal.h 2015-05-19 15:47:31 +02:00
ptrace.c x86/fpu: Rename regset FPU register accessors 2015-05-19 15:47:35 +02:00
pvclock.c x86: pvclock: Really remove the sched notifier for cross-cpu migrations 2015-04-27 15:49:30 +02:00
quirks.c x86: HPET force enable for e6xx based systems 2014-09-15 17:53:35 -07:00
reboot_fixups_32.c
reboot.c x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk 2015-04-01 14:08:09 +02:00
relocate_kernel_32.S x86/asm: Optimize unnecessarily wide TEST instructions 2015-03-07 11:12:43 +01:00
relocate_kernel_64.S x86/asm: Replace "MOVQ $imm, %reg" with MOVL 2015-04-01 13:17:39 +02:00
resource.c x86: don't exclude low BIOS area when allocating address space for non-PCI cards 2014-07-16 12:29:36 -06:00
rtc.c kernel.h: remove ancient __FUNCTION__ hack 2015-02-12 18:54:13 -08:00
setup_percpu.c x86: Convert a few more per-CPU items to read-mostly ones 2014-11-04 20:13:28 +01:00
setup.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:31:32 -07:00
signal.c x86/fpu: Rename fpu-internal.h to fpu/internal.h 2015-05-19 15:47:31 +02:00
smp.c
smpboot.c x86/fpu: Rename fpu-internal.h to fpu/internal.h 2015-05-19 15:47:31 +02:00
stacktrace.c
step.c
sys_x86_64.c x86/mm: Improve AMD Bulldozer ASLR workaround 2015-03-31 10:01:17 +02:00
syscall_32.c x86/compat: Merge native and compat 32-bit syscall tables 2015-03-04 06:16:21 +01:00
syscall_64.c
sysfb_efi.c
sysfb_simplefb.c x86/simplefb: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:52 -07:00
sysfb.c x86/sysfb: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:52 -07:00
tboot.c
tce_64.c
test_nx.c
test_rodata.c treewide: Fix typo in printk messages 2015-03-06 23:05:39 +01:00
time.c x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
tls.c x86, tls: Interpret an all-zero struct user_desc as "no segment" 2015-01-22 21:45:07 +01:00
tls.h
topology.c
trace_clock.c
tracepoint.c
traps.c x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again) 2015-05-19 15:47:49 +02:00
tsc_msr.c
tsc_sync.c
tsc.c x86/tsc: Change Fast TSC calibration failed from error to info 2015-01-23 10:53:52 +01:00
uprobes.c x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
verify_cpu.S
vm86_32.c x86/asm/entry: Rename 'init_tss' to 'cpu_tss' 2015-03-06 08:32:58 +01:00
vmlinux.lds.S x86-64: Use RIP-relative addressing for most per-CPU accesses 2014-11-04 20:43:14 +01:00
vsmp_64.c x86/apic/vsmp: Make is_vsmp_box() static 2014-08-01 15:09:45 -07:00
vsyscall_64.c x86_64/vsyscall: Restore orig_ax after vsyscall seccomp 2014-11-10 10:46:35 +01:00
vsyscall_emu_64.S
vsyscall_gtod.c time: Rename timekeeper::tkr to timekeeper::tkr_mono 2015-03-27 09:45:06 +01:00
vsyscall_trace.h
x86_init.c Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()" 2014-11-11 15:14:30 -07:00
x8664_ksyms_64.c x86_64: kasan: add interceptors for memset/memmove/memcpy functions 2015-02-13 21:21:41 -08:00