linux_dsm_epyc7002/Documentation
Sean Christopherson eca6be566d KVM: doc: Document the life cycle of a VM and its resources
The series to add memcg accounting to KVM allocations[1] states:

  There are many KVM kernel memory allocations which are tied to the
  life of the VM process and should be charged to the VM process's
  cgroup.

While it is correct to account KVM kernel allocations to the cgroup of
the process that created the VM, it's technically incorrect to state
that the KVM kernel memory allocations are tied to the life of the VM
process.  This is because the VM itself, i.e. struct kvm, is not tied to
the life of the process which created it, rather it is tied to the life
of its associated file descriptor.  In other words, kvm_destroy_vm() is
not invoked until fput() decrements its associated file's refcount to
zero.  A simple example is to fork() in Qemu and have the child sleep
indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
descriptor *and* the rogue child is killed.

The allocations are guaranteed to be *accounted* to the process which
created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
the VM "alive" but can't do anything useful with its reference.

Note that because 'struct kvm' also holds a reference to the mm_struct
of its owner, the above behavior also applies to userspace allocations.

Given that mucking with a VM's file descriptor can lead to subtle and
undesirable behavior, e.g. memcg charges persisting after a VM is shut
down, explicitly document a VM's lifecycle and its impact on the VM's
resources.

Alternatively, KVM could aggressively free resources when the creating
process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
isn't guaranteed to be available, and freeing resources when the creator
exits is likely to be error prone and fragile as KVM would need to
ensure that it only freed resources that are truly out of reach. In
practice, the existing behavior shouldn't be problematic as a properly
configured system will prevent a child process from being moved out of
the appropriate cgroup hierarchy, i.e. prevent hiding the process from
the OOM killer, and will prevent an unprivileged user from being able to
to hold a reference to struct kvm via another method, e.g. debugfs.

[1]https://patchwork.kernel.org/patch/10806707/

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-15 19:24:33 +01:00
..
ABI Documentation/ABI: Correct mlxreg-io KernelVersion for 5.0 2019-01-26 11:30:26 -08:00
accelerators ocxl: Document new OCXL IOCTLs 2018-06-03 20:40:33 +10:00
accounting psi: cgroup support 2018-10-26 16:26:32 -07:00
acpi ACPI: property: graph: Update graph documentation to use generic references 2018-07-23 12:44:52 +02:00
admin-guide iommu/vt-d: Leave scalable mode default off 2019-01-30 17:23:58 +01:00
aoe
arm Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
arm64 arm64 festive updates for 4.21 2018-12-25 17:41:56 -08:00
auxdisplay Doc: misc-devices: move lcd-panel-cgram.txt to auxdisplay/ 2018-04-12 16:08:02 +02:00
backlight
block block: doc: add slice_idle_us to bfq documentation 2019-01-09 07:38:48 -07:00
blockdev zram: idle writeback fixes and cleanup 2019-01-08 17:15:10 -08:00
bpf bpf, doc: update design qa to reflect kern_version requirement 2019-01-07 15:52:00 -08:00
bus-devices
cdrom Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
cgroup-v1 Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2018-10-25 17:15:46 -07:00
cma
connector
console Documentation: corrections to console/console.txt 2018-08-10 16:09:40 -06:00
core-api XArray: Honour reserved entries in xa_insert 2019-01-06 22:12:58 -05:00
cpu-freq Documentation: cpu-freq: Frequencies aren't always sorted 2018-11-07 13:29:04 +01:00
cpuidle Documentation: admin-guide: PM: Add cpuidle document 2018-12-03 10:03:36 +01:00
crypto crypto: skcipher - remove remnants of internal IV generators 2018-12-23 11:52:45 +08:00
dev-tools A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
device-mapper Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
devicetree A single fix for building DT bindings in-tree. 2019-02-02 10:34:32 -08:00
doc-guide doc:process: add links where missing 2018-12-06 10:21:19 -07:00
driver-api pci-v4.21-changes 2019-01-05 17:57:34 -08:00
driver-model Documentation: driver core: remove use of BUS_ATTR 2019-01-08 15:17:45 +01:00
early-userspace Correct gen_init_cpio tool's documentation 2018-11-25 12:25:53 -07:00
EDID Docs/EDID: Calculate CRC while building the code 2018-11-06 07:36:22 -07:00
extcon
fault-injection Documentation: nvme: Documentation for nvme fault injection 2018-03-26 08:53:43 -06:00
fb fbdev: fbmem: convert CONFIG_FB_LOGO_CENTER into a cmd line option 2019-01-16 17:42:35 +01:00
features Documentation/features: Add csky kernel features 2019-01-07 22:22:16 +08:00
filesystems Documentation: driver core: remove use of BUS_ATTR 2019-01-08 15:17:45 +01:00
firmware_class
fmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
fpga docs: fpga: add a document for FPGA Device Feature List (DFL) Framework Overview 2018-07-15 13:55:44 +02:00
gpio Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
gpu A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
hid HID: doc: fix wrong data structure reference for UHID_OUTPUT 2018-12-18 14:55:22 +01:00
hwmon hwmon: Introduce SENSOR_DEVICE_ATTR_{RO, RW, WO} and variants 2018-12-16 15:13:22 -08:00
i2c i2c: add i2c bus driver for NVIDIA GPU 2018-11-09 17:46:43 +01:00
ia64
ide Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
iio
infiniband
input Input: add REL_WHEEL_HI_RES and REL_HWHEEL_HI_RES 2018-12-07 16:27:11 +01:00
ioctl seccomp: add a return code to trap to userspace 2018-12-11 16:28:41 -08:00
isdn Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
kbuild kbuild: generate asm-generic wrappers if mandatory headers are missing 2019-01-06 09:46:51 +09:00
kdump
kernel-hacking doc:it_IT: translation for kernel-hacking 2018-07-26 16:21:09 -06:00
laptops platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
leds Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
lightnvm
livepatch livepatch: Remove not longer valid limitations from the documentation 2018-05-24 15:37:57 +02:00
locking This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
m68k Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
maintainer docs: Fix more broken references 2018-06-15 18:11:26 -03:00
md
media A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
memory-devices
mic
mips Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
misc-devices pci_endpoint_test: Add 2 ioctl commands 2018-07-19 11:46:57 +01:00
mmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
mtd Documentation: mtd: remove stale pxa3xx NAND controller documentation 2018-09-04 23:37:38 +02:00
namespaces
netlabel Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
networking doc: net: fix bad references to network drivers 2019-01-18 11:03:31 -08:00
nfc
nios2
nvdimm libnvdimm/security: Add documentation for nvdimm security support 2018-12-21 12:44:41 -08:00
nvmem Documentation: nvmem: document cell tables and lookup entries 2018-09-28 15:14:54 +02:00
openrisc
parisc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
PCI pci-v4.20-changes 2018-10-25 06:50:48 -07:00
pcmcia pcmcia: remove long deprecated pcmcia_request_exclusive_irq() function 2018-08-18 12:30:42 -07:00
perf Documentation: perf: Add documentation for ThunderX2 PMU uncore driver 2018-12-06 12:29:47 +00:00
phy
platform
power Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
powerpc powerpc/fadump: Reservationless firmware assisted dump 2018-12-21 11:32:49 +11:00
pps
process docs: fix Co-Developed-by docs 2019-01-04 13:13:48 -08:00
pti
ptp ptp: Fix documentation to match code. 2018-03-26 12:13:21 -04:00
rapidio
RCU doc: Fix "struction" typo in RCU memory-ordering documentation 2018-11-12 08:56:25 -08:00
riscv perf: riscv: Add Document for Future Porting Guide 2018-06-04 14:02:11 -07:00
s390 Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
scheduler This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
scsi SCSI misc on 20181224 2018-12-28 14:48:06 -08:00
security Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-01-02 09:43:14 -08:00
serial Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
sh sh: remove board_time_init() callback 2018-12-18 16:13:04 +01:00
sound Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
sparc sparc64: Add support for ADI (Application Data Integrity) 2018-03-18 07:38:48 -07:00
sphinx Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
sphinx-static docs: improve readability for people with poorer eyesight 2018-10-07 09:16:50 -06:00
spi Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
sysctl fs/dcache: Track & report number of negative dentries 2019-01-30 11:02:11 -08:00
target
thermal Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
timers Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
trace Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-sleep' 2019-01-11 10:09:51 +01:00
translations doc🇮🇹 add some process/* translations 2018-12-06 10:11:40 -07:00
usb Documentation/usb: Fix typo 2018-11-26 16:56:34 +01:00
userspace-api Merge branch 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-01-02 09:48:13 -08:00
virtual KVM: doc: Document the life cycle of a VM and its resources 2019-03-15 19:24:33 +01:00
vm A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
w1 Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
watchdog watchdog: docs: kernel-api: don't reference removed functions 2018-12-24 13:15:06 +01:00
wimax
x86 x86/resctrl: Avoid confusion over the new X86_RESCTRL config 2019-02-02 10:34:52 +01:00
xilinx Documentation: xilinx: Add documentation for eemi APIs 2018-10-09 13:26:05 +02:00
xtensa
.gitignore
atomic_bitops.txt
atomic_t.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
Changes
clearing-warn-once.txt
CodingStyle
conf.py This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt Documentation: remove stale firmware API reference 2018-05-14 16:44:41 +02:00
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt dma-mapping: deprecate dma_zalloc_coherent 2018-12-20 08:14:09 +01:00
DMA-attributes.txt
DMA-ISA-LPC.txt
docutils.conf
dontdiff
efi-stub.txt efi_stub: update documentation on dtb= parameter 2018-09-09 14:46:44 -06:00
eisa.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcc-plugins.txt
highuid.txt
hw_random.txt
hwspinlock.txt
index.rst docs: tidy up TOCs and refs to license-rules.rst 2018-08-31 16:50:50 -06:00
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt
iostats.txt block: Track DISCARD statistics and output them in stat and diskstat 2018-07-18 08:44:22 -06:00
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
irqflags-tracing.txt
isa.txt
isapnp.txt
kernel-per-CPU-kthreads.txt doc: Update removal of RCU-bh/sched update machinery 2018-08-30 10:59:48 -07:00
kobject.txt kref/kobject: Improve documentation 2018-12-06 13:57:03 +01:00
kprobes.txt kprobes/Documentation: Fix various typos 2018-06-22 11:10:55 +02:00
kref.txt
ldm.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lsm.txt
lzo.txt
mailbox.txt
Makefile kbuild: Add support for DT binding schema checks 2018-12-13 09:41:32 -06:00
memory-barriers.txt Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
men-chameleon-bus.txt
nommu-mmap.txt Documentation: nommu-map: Fix duplicate word typo 2018-06-26 09:01:27 -06:00
ntb.txt
numastat.txt
padata.txt
parport-lowlevel.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pnp.txt
preempt-locking.txt Documentation: preempt-locking: Use better example 2018-10-12 11:35:47 -06:00
pwm.txt
rbtree.txt
remoteproc.txt
rfkill.txt rfkill: Fix several typos in documentation 2018-06-15 13:36:08 +02:00
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
sgi-ioc4.txt
siphash.txt
SM501.txt
smsc_ece1099.txt
speculation.txt
static-keys.txt Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
SubmittingPatches
svga.txt
switchtec.txt NTB: switchtec_ntb: Update switchtec documentation with prerequisites for NTB 2018-10-11 11:28:53 -05:00
sync_file.txt
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
vfio-mediated-device.txt vfio/mdev: Check globally for duplicate devices 2018-06-08 10:24:27 -06:00
vfio.txt vfio: fix documentation 2018-05-08 09:16:41 -06:00
video-output.txt
xillybus.txt
xz.txt
zorro.txt