linux_dsm_epyc7002/Documentation
Lai Jiangshan 6b6ff4d1f3 KVM: X86: MMU: Use the correct inherited permissions to get shadow page
commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream.

When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions.  Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different.  KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.

For example, here is a shared pagetable:

   pgd[]   pud[]        pmd[]            virtual address pointers
                     /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
        /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
   pgd-|           (shared pmd[] as above)
        \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                     \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)

  pud1 and pud2 point to the same pmd table, so:
  - ptr1 and ptr3 points to the same page.
  - ptr2 and ptr4 points to the same page.

(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)

- First, the guest reads from ptr1 first and KVM prepares a shadow
  page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
  "u--" comes from the effective permissions of pgd, pud1 and
  pmd1, which are stored in pt->access.  "u--" is used also to get
  the pagetable for pud1, instead of "uw-".

- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
  The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
  even though the pud1 pmd (because of the incorrect argument to
  kvm_mmu_get_page in the previous step) has role.access="u--".

- Then the guest reads from ptr3.  The hypervisor reuses pud1's
  shadow pmd for pud2, because both use "u--" for their permissions.
  Thus, the shadow pmd already includes entries for both pmd1 and pmd2.

- At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
  because pud1's shadow page structures included an "uw-" page even though
  its role.access was "u--".

Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.

In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.

The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.

The problem had existed long before the commit 41074d07c7 ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit.  So there is no fixes tag here.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
Cc: stable@vger.kernel.org
Fixes: cea0f0e7ea ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 12:01:40 +02:00
..
ABI drivers/base/memory: don't store phys_device in memory blocks 2021-03-17 17:06:25 +01:00
accounting
admin-guide drivers/base/memory: don't store phys_device in memory blocks 2021-03-17 17:06:25 +01:00
arm ARM: 9012/1: move device tree mapping out of linear region 2021-05-19 10:13:18 +02:00
arm64 arm64: Add workaround for Arm Cortex-A77 erratum 1508412 2020-10-29 12:56:01 +00:00
block block-5.10-2020-10-24 2020-10-24 12:46:42 -07:00
bpf bpf: Migrate from patchwork.ozlabs.org to patchwork.kernel.org. 2020-10-11 22:05:47 +02:00
cdrom
core-api dma-mapping: document dma_{alloc,free}_pages 2020-10-23 12:07:46 +02:00
cpu-freq
crypto
dev-tools linux-kselftest-kunit-fixes-5.10-rc5 2020-11-18 11:57:55 -08:00
devicetree dt-bindings: serial: 8250: Remove duplicated compatible strings 2021-05-19 10:13:19 +02:00
doc-guide docs: kerneldoc.py: add support for kerneldoc -nosymbol 2020-10-15 07:49:38 +02:00
driver-api firmware: xilinx: Remove zynqmp_pm_get_eemi_ops() in IS_REACHABLE(CONFIG_ZYNQMP_FIRMWARE) 2021-05-14 09:50:05 +02:00
fault-injection A handful of late-arriving documentation fixes. 2020-10-23 17:13:53 -07:00
fb drm fixes (round two) for 5.10-rc1 2020-10-23 13:56:34 -07:00
features s390 updates for the 5.10 merge window 2020-10-16 12:36:38 -07:00
filesystems seq_file: document how per-entry resources are managed. 2021-03-04 11:38:37 +01:00
firmware_class
firmware-guide Documentation: ACPI: fix spelling mistakes 2020-11-10 18:48:56 +01:00
fpga
gpu drm: Use USB controller's DMA mask when importing dmabufs 2021-03-17 17:06:19 +01:00
hid
hwmon docs: hwmon: mp2975.rst: address some html build warnings 2020-10-28 11:26:10 -06:00
i2c Documentation: i2c: add testunit docs to index 2020-10-05 22:57:45 +02:00
ia64
ide
iio
infiniband
input
isdn
kbuild kbuild: remove unused OBJSIZE 2020-11-02 11:31:00 +09:00
kernel-hacking
leds docs: leds: index.rst: add a missing file 2020-11-02 13:45:37 +01:00
litmus-tests
livepatch
locking Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g 2020-12-30 11:54:11 +01:00
m68k
maintainer
mhi
mips dt: Remove booting-without-of.rst 2020-10-13 13:33:16 -05:00
misc-devices Documentation: remove mic/index from misc-devices/index.rst 2020-11-04 11:38:32 +01:00
netlabel
networking docs: networking: drop special stable handling 2021-03-17 17:06:13 +01:00
nios2
nvdimm
openrisc
parisc
PCI Documentation: better locations for sysfs-pci, sysfs-tagging 2020-10-09 09:33:23 -06:00
pcmcia
power PCI/PM: Rename pci_dev.d3_delay to d3hot_delay 2020-09-29 14:21:50 -05:00
powerpc powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls 2021-05-26 12:06:53 +02:00
process docs: networking: drop special stable handling 2021-03-17 17:06:13 +01:00
RCU Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2020-10-09 08:21:56 +02:00
riscv
s390
scheduler
scsi scsi: libsas: Introduce a _gfp() variant of event notifiers 2021-03-25 09:04:11 +01:00
security watch_queue: Drop references to /dev/watch_queue 2021-03-04 11:37:59 +01:00
sh dt: Remove booting-without-of.rst 2020-10-13 13:33:16 -05:00
sound ALSA: doc: Fix reference to mixart.rst 2021-01-19 18:27:17 +01:00
sparc
sphinx tweewide: Fix most Shebang lines 2021-05-22 11:40:55 +02:00
sphinx-static
spi
staging
target tweewide: Fix most Shebang lines 2021-05-22 11:40:55 +02:00
timers
trace tweewide: Fix most Shebang lines 2021-05-22 11:40:55 +02:00
translations net: switch to the kernel.org patchwork instance 2020-11-11 17:12:00 -08:00
usb
userspace-api Documentation: seccomp: Fix user notification documentation 2021-06-03 09:00:31 +02:00
virt KVM: X86: MMU: Use the correct inherited permissions to get shadow page 2021-06-16 12:01:40 +02:00
vm A handful of late-arriving documentation fixes. 2020-10-23 17:13:53 -07:00
w1 docs: w1: w1_therm: Fix broken xref, mistakes, clarify text 2020-10-08 09:47:15 +02:00
watchdog
x86 x86/CPU/AMD: Save AMD NodeId as cpu_die_id 2020-12-30 11:54:29 +01:00
xtensa xtensa: fix TLBTEMP area placement 2020-11-16 02:13:15 -08:00
.gitignore
asm-annotations.rst x86/entry: Emit a symbol for register restoring thunk 2021-02-03 23:28:40 +01:00
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py This pull contains a series of warning fixes from Mauro; once applied, the 2020-11-03 13:14:14 -08:00
COPYING-logo
docutils.conf
dontdiff kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
index.rst
Kconfig docs: Kconfig/Makefile: add a check for broken ABI files 2020-10-30 13:08:07 +01:00
logo.gif
Makefile A small number of fixes, plus a build tweak to respect the desire for 2020-11-03 09:57:30 -08:00
memory-barriers.txt
SubmittingPatches
watch_queue.rst