linux_dsm_epyc7002/arch/x86/kernel
Thomas Gleixner 6bca69ada4 x86/kvm: Sanitize kvm_async_pf_task_wait()
While working on the entry consolidation I stumbled over the KVM async page
fault handler and kvm_async_pf_task_wait() in particular. It took me a
while to realize that the randomly sprinkled around rcu_irq_enter()/exit()
invocations are just cargo cult programming. Several patches "fixed" RCU
splats by curing the symptoms without noticing that the code is flawed 
from a design perspective.

The main problem is that this async injection is not based on a proper
handshake mechanism and only respects the minimal requirement, i.e. the
guest is not in a state where it has interrupts disabled.

Aside of that the actual code is a convoluted one fits it all swiss army
knife. It is invoked from different places with different RCU constraints:

  1) Host side:

     vcpu_enter_guest()
       kvm_x86_ops->handle_exit()
         kvm_handle_page_fault()
           kvm_async_pf_task_wait()

     The invocation happens from fully preemptible context.

  2) Guest side:

     The async page fault interrupted:

         a) user space

	 b) preemptible kernel code which is not in a RCU read side
	    critical section

     	 c) non-preemtible kernel code or a RCU read side critical section
	    or kernel code with CONFIG_PREEMPTION=n which allows not to
	    differentiate between #2b and #2c.

RCU is watching for:

  #1  The vCPU exited and current is definitely not the idle task

  #2a The #PF entry code on the guest went through enter_from_user_mode()
      which reactivates RCU

  #2b There is no preemptible, interrupts enabled code in the kernel
      which can run with RCU looking away. (The idle task is always
      non preemptible).

I.e. all schedulable states (#1, #2a, #2b) do not need any of this RCU
voodoo at all.

In #2c RCU is eventually not watching, but as that state cannot schedule
anyway there is no point to worry about it so it has to invoke
rcu_irq_enter() before running that code. This can be optimized, but this
will be done as an extra step in course of the entry code consolidation
work.

So the proper solution for this is to:

  - Split kvm_async_pf_task_wait() into schedule and halt based waiting
    interfaces which share the enqueueing code.

  - Add comments (condensed form of this changelog) to spare others the
    time waste and pain of reverse engineering all of this with the help of
    uncomprehensible changelogs and code history.

  - Invoke kvm_async_pf_task_wait_schedule() from kvm_handle_page_fault(),
    user mode and schedulable kernel side async page faults (#1, #2a, #2b)

  - Invoke kvm_async_pf_task_wait_halt() for the non schedulable kernel
    case (#2c).

    For this case also remove the rcu_irq_exit()/enter() pair around the
    halt as it is just a pointless exercise:

       - vCPUs can VMEXIT at any random point and can be scheduled out for
         an arbitrary amount of time by the host and this is not any
         different except that it voluntary triggers the exit via halt.

       - The interrupted context could have RCU watching already. So the
	 rcu_irq_exit() before the halt is not gaining anything aside of
	 confusing the reader. Claiming that this might prevent RCU stalls
	 is just an illusion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.262701431@linutronix.de
2020-05-19 15:53:58 +02:00
..
acpi ACPI, x86/boot: make acpi_nobgrt static 2020-04-08 14:32:03 +02:00
apic x86/apic: Move TSC deadline timer debug printk 2020-05-01 19:15:41 +02:00
cpu A set of fixes for x86 and objtool: 2020-04-19 11:58:32 -07:00
fpu x86/pkeys: Add check for pkey "overflow" 2020-02-24 20:25:21 +01:00
kprobes x86/optprobe: Fix OPTPROBE vs UACCESS 2020-03-20 13:06:22 +01:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
alternative.c x86/alternatives: Mark text_poke_loc_init() static 2020-03-25 12:42:35 +01:00
amd_gart_64.c x86/mm: thread pgprot_t through init_memory_mapping() 2020-04-10 15:36:21 -07:00
amd_nb.c x86/amd_nb, char/amd64-agp: Use amd_nb_num() accessor 2020-03-17 10:25:58 +01:00
apb_timer.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
aperture_64.c x86/gart: Exclude GART aperture from kcore 2019-03-23 12:11:49 +01:00
apm_32.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 2019-05-24 17:39:02 +02:00
asm-offsets_32.c x86 entry code updates: 2020-03-30 19:14:28 -07:00
asm-offsets_64.c x86/entry: Move max syscall number calculation to syscallhdr.sh 2020-03-21 16:03:21 +01:00
asm-offsets.c efi/x86: Avoid using code32_start 2020-03-08 09:58:17 +01:00
audit_64.c
bootflag.c
check.c x86/headers: Fix -Wmissing-prototypes warning 2018-11-23 07:59:59 +01:00
cpuid.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 142 2019-05-30 11:25:17 -07:00
crash_core_32.c x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y 2019-12-23 12:58:41 +01:00
crash_core_64.c x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y 2019-12-23 12:58:41 +01:00
crash_dump_32.c
crash_dump_64.c fs/core/vmcore: Move sev_active() reference to x86 arch code 2019-08-09 22:52:10 +10:00
crash.c x86/crash: Use resource_size() 2020-01-09 14:40:03 +01:00
devicetree.c x86/headers: Fix -Wmissing-prototypes warning 2018-11-23 07:59:59 +01:00
doublefault_32.c x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit 2019-11-26 22:00:04 +01:00
dumpstack_32.c x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area 2019-11-26 21:53:34 +01:00
dumpstack_64.c x86/unwind: Prevent false warnings for non-current tasks 2020-04-25 12:22:28 +02:00
dumpstack.c x86/kasan: Print original address on #GP 2019-12-31 13:15:38 +01:00
e820.c ACPI updates for 5.5-rc1 2019-11-26 19:25:25 -08:00
early_printk.c efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation 2019-02-04 08:27:30 +01:00
early-quirks.c x86/intel: Disable HPET on Intel Ice Lake platforms 2019-11-29 12:17:58 +01:00
ebda.c
eisa.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 243 2019-06-19 17:09:07 +02:00
espfix_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
ftrace_32.S x86/ftrace: Get rid of function_hook 2019-10-25 10:52:22 +02:00
ftrace_64.S New tracing features: 2019-11-27 11:42:01 -08:00
ftrace.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-01-28 09:44:15 -08:00
head32.c x86/boot: Mostly revert commit ae7e1238e6 ("Add ACPI RSDP address to setup_header") 2018-11-20 09:43:10 +01:00
head64.c x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area 2019-10-11 18:38:15 +02:00
head_32.S x86/boot: Remove KEEP_SEGMENTS support 2020-02-22 23:37:37 +01:00
head_64.S x86/asm/64: Change all ENTRY+END to SYM_CODE_* 2019-10-18 11:58:26 +02:00
hpet.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
hw_breakpoint.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
i8237.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
i8253.c x86/timer: Skip PIT initialization on modern chipsets 2019-06-29 11:35:35 +02:00
i8259.c
idt.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 11:22:57 -07:00
ima_arch.c EFI updates for v5.7: 2020-02-26 15:21:22 +01:00
io_delay.c x86/io_delay: Define IO_DELAY macros in C instead of Kconfig 2019-05-24 08:46:06 +02:00
ioport.c x86/iopl: Include prototype header for ksys_ioperm() 2020-02-17 16:36:53 +01:00
irq_32.c x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code 2019-08-19 23:19:06 +02:00
irq_64.c x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code 2019-08-19 23:19:06 +02:00
irq_work.c
irq.c x86/irq: Remove useless return value from do_IRQ() 2020-02-27 14:48:40 +01:00
irqflags.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
irqinit.c x86: Replace setup_irq() by request_irq() 2020-03-21 15:15:47 +01:00
itmt.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
jailhouse.c x86/jailhouse: Only enable platform UARTs if available 2019-10-10 15:43:59 +02:00
jump_label.c x86/jump_label: Move 'inline' keyword placement 2020-03-27 11:05:41 +01:00
kdebugfs.c x86/boot: Introduce setup_indirect 2019-11-12 16:21:15 +01:00
kexec-bzimage64.c efi/x86: Make fw_vendor, config_table and runtime sysfs nodes x86 specific 2020-02-23 21:59:42 +01:00
kgdb.c x86/apic: Provide and use helper for send_IPI_allbutself() 2019-07-25 16:12:00 +02:00
ksysfs.c x86/boot: Introduce setup_indirect 2019-11-12 16:21:15 +01:00
kvm.c x86/kvm: Sanitize kvm_async_pf_task_wait() 2020-05-19 15:53:58 +02:00
kvmclock.c x86/vdso: Use generic VDSO clock mode storage 2020-02-17 14:40:23 +01:00
ldt.c x86: Remove unneeded includes 2020-03-21 16:03:25 +01:00
livepatch.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
machine_kexec_32.c x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y 2019-12-23 12:58:41 +01:00
machine_kexec_64.c x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y 2019-12-23 12:58:41 +01:00
Makefile Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity 2020-04-02 14:49:46 -07:00
mmconf-fam10h_64.c
module.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
mpparse.c x86/boot: Fix memory leak in default_get_smp_config() 2019-07-16 23:13:48 +02:00
msr.c x86/msr: Restrict MSR access when the kernel is locked down 2019-08-19 21:54:16 -07:00
nmi_selftest.c
nmi.c x86: Fix a handful of typos 2020-02-16 20:58:06 +01:00
paravirt_patch.c x86/paravirt: Standardize 'insn_buff' variable names 2019-04-29 16:05:49 +02:00
paravirt-spinlocks.c
paravirt.c x86/ioperm: Add new paravirt function update_io_bitmap() 2020-02-29 12:43:09 +01:00
pci-dma.c dma-mapping updates for 5.5-rc1 2019-11-28 11:16:43 -08:00
pci-iommu_table.c
pci-swiotlb.c dma-mapping: fix filename references 2019-09-03 08:36:30 +02:00
pcspeaker.c
perf_regs.c perf/x86/regs: Check reserved bits 2019-06-24 19:19:24 +02:00
platform-quirks.c
pmem.c
probe_roms.c
process_32.c x86: Remove unneeded includes 2020-03-21 16:03:25 +01:00
process_64.c x86: Remove unneeded includes 2020-03-21 16:03:25 +01:00
process.c Support for "split lock" detection: 2020-03-30 19:35:52 -07:00
process.h x86: Use the correct SPDX License Identifier in headers 2019-10-01 20:31:35 +02:00
ptrace.c x86/ptrace: Document FSBASE and GSBASE ABI oddities 2019-11-26 22:00:12 +01:00
pvclock.c x86/vdso: Use generic VDSO clock mode storage 2020-02-17 14:40:23 +01:00
quirks.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
reboot_fixups_32.c
reboot.c x86: Fix a handful of typos 2020-02-16 20:58:06 +01:00
relocate_kernel_32.S x86/asm: Annotate relocate_kernel_{32,64}.c 2019-10-18 09:53:19 +02:00
relocate_kernel_64.S x86/kexec: Make relocate_kernel_64.S objtool clean 2020-03-25 18:28:28 +01:00
resource.c
rtc.c
setup_percpu.c x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
setup.c mm: hugetlb: optionally allocate gigantic hugepages using cma 2020-04-10 15:36:21 -07:00
signal_compat.c
signal.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 11:04:05 -07:00
smp.c x86/smp: Move smp_function_call implementations into IPI code 2019-07-25 16:12:01 +02:00
smpboot.c x86, sched: Move check for CPU type to caller function 2020-04-22 23:10:13 +02:00
stacktrace.c x86 user stack frame reads: switch to explicit __get_user() 2020-02-15 17:26:26 -05:00
step.c
sys_ia32.c x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments 2020-03-21 16:03:24 +01:00
sys_x86_64.c x86: Remove unneeded includes 2020-03-21 16:03:25 +01:00
sysfb_efi.c x86/sysfb_efi: Add quirks for some devices with swapped width and height 2019-07-22 10:47:11 +02:00
sysfb_simplefb.c x86/sysfb: Fix check for bad VRAM size 2020-01-20 10:57:53 +01:00
sysfb.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tboot.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
time.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 11:04:05 -07:00
tls.c x86/tls: Fix possible spectre-v1 in do_get_thread_area() 2019-06-27 23:48:04 +02:00
tls.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193 2019-05-30 11:29:21 -07:00
topology.c x86/smp: Replace cpu_up/down() with add/remove_cpu() 2020-03-25 12:59:35 +01:00
trace_clock.c
tracepoint.c x86/kernel: Fix more -Wmissing-prototypes warnings 2018-12-08 12:24:35 +01:00
traps.c x86/kvm: Handle async page faults directly through do_page_fault() 2020-05-19 15:53:57 +02:00
tsc_msr.c x86 timer updates: 2020-03-30 19:55:39 -07:00
tsc_sync.c x86: Fix a handful of typos 2020-02-16 20:58:06 +01:00
tsc.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 11:04:05 -07:00
umip.c x86/umip: Make umip_insns static 2020-04-15 11:13:12 +02:00
unwind_frame.c x86/unwind: Prevent false warnings for non-current tasks 2020-04-25 12:22:28 +02:00
unwind_guess.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
unwind_orc.c x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES 2020-05-03 13:23:28 +02:00
uprobes.c x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments 2019-10-27 09:00:28 +01:00
verify_cpu.S x86/asm: Annotate local pseudo-functions 2019-10-18 10:04:04 +02:00
vm86_32.c x86: switch save_v86_state() to unsafe_put_user() 2020-03-18 20:36:01 -04:00
vmlinux.lds.S Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-31 10:51:12 -07:00
vsmp_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 346 2019-06-05 17:37:08 +02:00
x86_init.c x86/kvm: Handle async page faults directly through do_page_fault() 2020-05-19 15:53:57 +02:00