Go to file
Thomas Gleixner 6bca69ada4 x86/kvm: Sanitize kvm_async_pf_task_wait()
While working on the entry consolidation I stumbled over the KVM async page
fault handler and kvm_async_pf_task_wait() in particular. It took me a
while to realize that the randomly sprinkled around rcu_irq_enter()/exit()
invocations are just cargo cult programming. Several patches "fixed" RCU
splats by curing the symptoms without noticing that the code is flawed 
from a design perspective.

The main problem is that this async injection is not based on a proper
handshake mechanism and only respects the minimal requirement, i.e. the
guest is not in a state where it has interrupts disabled.

Aside of that the actual code is a convoluted one fits it all swiss army
knife. It is invoked from different places with different RCU constraints:

  1) Host side:

     vcpu_enter_guest()
       kvm_x86_ops->handle_exit()
         kvm_handle_page_fault()
           kvm_async_pf_task_wait()

     The invocation happens from fully preemptible context.

  2) Guest side:

     The async page fault interrupted:

         a) user space

	 b) preemptible kernel code which is not in a RCU read side
	    critical section

     	 c) non-preemtible kernel code or a RCU read side critical section
	    or kernel code with CONFIG_PREEMPTION=n which allows not to
	    differentiate between #2b and #2c.

RCU is watching for:

  #1  The vCPU exited and current is definitely not the idle task

  #2a The #PF entry code on the guest went through enter_from_user_mode()
      which reactivates RCU

  #2b There is no preemptible, interrupts enabled code in the kernel
      which can run with RCU looking away. (The idle task is always
      non preemptible).

I.e. all schedulable states (#1, #2a, #2b) do not need any of this RCU
voodoo at all.

In #2c RCU is eventually not watching, but as that state cannot schedule
anyway there is no point to worry about it so it has to invoke
rcu_irq_enter() before running that code. This can be optimized, but this
will be done as an extra step in course of the entry code consolidation
work.

So the proper solution for this is to:

  - Split kvm_async_pf_task_wait() into schedule and halt based waiting
    interfaces which share the enqueueing code.

  - Add comments (condensed form of this changelog) to spare others the
    time waste and pain of reverse engineering all of this with the help of
    uncomprehensible changelogs and code history.

  - Invoke kvm_async_pf_task_wait_schedule() from kvm_handle_page_fault(),
    user mode and schedulable kernel side async page faults (#1, #2a, #2b)

  - Invoke kvm_async_pf_task_wait_halt() for the non schedulable kernel
    case (#2c).

    For this case also remove the rcu_irq_exit()/enter() pair around the
    halt as it is just a pointless exercise:

       - vCPUs can VMEXIT at any random point and can be scheduled out for
         an arbitrary amount of time by the host and this is not any
         different except that it voluntary triggers the exit via halt.

       - The interrupted context could have RCU watching already. So the
	 rcu_irq_exit() before the halt is not gaining anything aside of
	 confusing the reader. Claiming that this might prevent RCU stalls
	 is just an illusion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.262701431@linutronix.de
2020-05-19 15:53:58 +02:00
arch x86/kvm: Sanitize kvm_async_pf_task_wait() 2020-05-19 15:53:58 +02:00
block bdi: use bdi_dev_name() to get device name 2020-05-09 16:07:39 -06:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto gcc-10: avoid shadowing standard library 'free()' in crypto 2020-05-09 15:58:04 -07:00
Documentation Bugfixes, mostly for ARM and AMD, and more documentation. 2020-05-07 09:50:59 -07:00
drivers IOMMU Fixes for Linux v5.7-rc4 2020-05-10 11:26:23 -07:00
fs block-5.7-2020-05-09 2020-05-10 11:16:07 -07:00
include context_tracking: Make guest_enter/exit() .noinstr ready 2020-05-19 15:47:21 +02:00
init gcc-10: mark more functions __init to avoid section mismatch warnings 2020-05-09 17:50:03 -07:00
ipc ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() 2020-05-07 19:27:20 -07:00
kernel lockdep: Prepare for noinstr sections 2020-05-19 15:47:21 +02:00
lib lockdep: Prepare for noinstr sections 2020-05-19 15:47:21 +02:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm block-5.7-2020-05-09 2020-05-10 11:16:07 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-06 20:53:22 -07:00
samples tracing: Fix doc mistakes in trace sample 2020-05-07 13:32:57 -04:00
scripts vmlinux.lds.h: Create section for protection against instrumentation 2020-05-19 15:47:20 +02:00
security selinux/stable-5.7 PR 20200430 2020-04-30 16:35:45 -07:00
sound sound fixes for 5.7-rc4 2020-05-01 11:05:28 -07:00
tools A set of fixes for x86: 2020-05-10 11:59:53 -07:00
usr kbuild: fix comment about missing include guard detection 2020-04-11 12:09:48 +09:00
virt KVM: arm64: Fix 32bit PC wrap-around 2020-05-01 09:51:08 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2020-04-18 13:49:33 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
.mailmap mailmap: Add Sedat Dilek (replacement for expired email address) 2020-04-11 09:28:34 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Hand MIPS over to Thomas 2020-02-24 22:43:18 -08:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS Fixes for an endianness handling bug that prevented mounts on 2020-05-08 10:27:00 -07:00
Makefile Linux 5.7-rc5 2020-05-10 15:16:58 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.