Commit Graph

589 Commits

Author SHA1 Message Date
Linus Torvalds
15303ba5d1 KVM changes for 4.16
ARM:
 - Include icache invalidation optimizations, improving VM startup time
 
 - Support for forwarded level-triggered interrupts, improving
   performance for timers and passthrough platform devices
 
 - A small fix for power-management notifiers, and some cosmetic changes
 
 PPC:
 - Add MMIO emulation for vector loads and stores
 
 - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
   requiring the complex thread synchronization of older CPU versions
 
 - Improve the handling of escalation interrupts with the XIVE interrupt
   controller
 
 - Support decrement register migration
 
 - Various cleanups and bugfixes.
 
 s390:
 - Cornelia Huck passed maintainership to Janosch Frank
 
 - Exitless interrupts for emulated devices
 
 - Cleanup of cpuflag handling
 
 - kvm_stat counter improvements
 
 - VSIE improvements
 
 - mm cleanup
 
 x86:
 - Hypervisor part of SEV
 
 - UMIP, RDPID, and MSR_SMI_COUNT emulation
 
 - Paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
 
 - Allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512
   features
 
 - Show vcpu id in its anonymous inode name
 
 - Many fixes and cleanups
 
 - Per-VCPU MSR bitmaps (already merged through x86/pti branch)
 
 - Stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJafvMtAAoJEED/6hsPKofo6YcH/Rzf2RmshrWaC3q82yfIV0Qz
 Z8N8yJHSaSdc3Jo6cmiVj0zelwAxdQcyjwlT7vxt5SL2yML+/Q0st9Hc3EgGGXPm
 Il99eJEl+2MYpZgYZqV8ff3mHS5s5Jms+7BITAeh6Rgt+DyNbykEAvzt+MCHK9cP
 xtsIZQlvRF7HIrpOlaRzOPp3sK2/MDZJ1RBE7wYItK3CUAmsHim/LVYKzZkRTij3
 /9b4LP1yMMbziG+Yxt1o682EwJB5YIat6fmDG9uFeEVI5rWWN7WFubqs8gCjYy/p
 FX+BjpOdgTRnX+1m9GIj0Jlc/HKMXryDfSZS07Zy4FbGEwSiI5SfKECub4mDhuE=
 =C/uD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "ARM:

   - icache invalidation optimizations, improving VM startup time

   - support for forwarded level-triggered interrupts, improving
     performance for timers and passthrough platform devices

   - a small fix for power-management notifiers, and some cosmetic
     changes

  PPC:

   - add MMIO emulation for vector loads and stores

   - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
     requiring the complex thread synchronization of older CPU versions

   - improve the handling of escalation interrupts with the XIVE
     interrupt controller

   - support decrement register migration

   - various cleanups and bugfixes.

  s390:

   - Cornelia Huck passed maintainership to Janosch Frank

   - exitless interrupts for emulated devices

   - cleanup of cpuflag handling

   - kvm_stat counter improvements

   - VSIE improvements

   - mm cleanup

  x86:

   - hypervisor part of SEV

   - UMIP, RDPID, and MSR_SMI_COUNT emulation

   - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit

   - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
     AVX512 features

   - show vcpu id in its anonymous inode name

   - many fixes and cleanups

   - per-VCPU MSR bitmaps (already merged through x86/pti branch)

   - stable KVM clock when nesting on Hyper-V (merged through
     x86/hyperv)"

* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
  KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
  KVM: PPC: Book3S HV: Branch inside feature section
  KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
  KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
  KVM: PPC: Book3S PR: Fix broken select due to misspelling
  KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
  KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
  KVM: PPC: Book3S HV: Drop locks before reading guest memory
  kvm: x86: remove efer_reload entry in kvm_vcpu_stat
  KVM: x86: AMD Processor Topology Information
  x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
  kvm: embed vcpu id to dentry of vcpu anon inode
  kvm: Map PFN-type memory regions as writable (if possible)
  x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
  KVM: arm/arm64: Fixup userspace irqchip static key optimization
  KVM: arm/arm64: Fix userspace_irqchip_in_use counting
  KVM: arm/arm64: Fix incorrect timer_is_pending logic
  MAINTAINERS: update KVM/s390 maintainers
  MAINTAINERS: add Halil as additional vfio-ccw maintainer
  MAINTAINERS: add David as a reviewer for KVM/s390
  ...
2018-02-10 13:16:35 -08:00
Linus Torvalds
c013632192 2nd set of arm64 updates for 4.16:
Spectre v1 mitigation:
 - back-end version of array_index_mask_nospec()
 - masking of the syscall number to restrict speculation through the
   syscall table
 - masking of __user pointers prior to deference in uaccess routines
 
 Spectre v2 mitigation update:
 - using the new firmware SMC calling convention specification update
 - removing the current PSCI GET_VERSION firmware call mitigation as
   vendors are deploying new SMCCC-capable firmware
 - additional branch predictor hardening for synchronous exceptions and
   interrupts while in user mode
 
 Meltdown v3 mitigation update for Cavium Thunder X: unaffected but
 hardware erratum gets in the way. The kernel now starts with the page
 tables mapped as global and switches to non-global if kpti needs to be
 enabled.
 
 Other:
 - Theoretical trylock bug fixed
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlp8lqcACgkQa9axLQDI
 XvH2lxAAnsYqthpGQ11MtDJB+/UiBAFkg9QWPDkwrBDvNhgpll+J0VQuCN1QJ2GX
 qQ8rkv8uV+y4Fqr8hORGJy5At+0aI63ZCJ72RGkZTzJAtbFbFGIDHP7RhAEIGJBS
 Lk9kDZ7k39wLEx30UXIFYTTVzyHar397TdI7vkTcngiTzZ8MdFATfN/hiKO906q3
 14pYnU9Um4aHUdcJ+FocL3dxvdgniuuMBWoNiYXyOCZXjmbQOnDNU2UrICroV8lS
 mB+IHNEhX1Gl35QzNBtC0ET+aySfHBMJmM5oln+uVUljIGx6En1WLj6mrHYcx8U2
 rIBm5qO/X/4iuzYPGkxwQtpjq3wPYxsSUnMdKJrsUZqAfy2QeIhFx6XUtJsZPB2J
 /lgls5xSXMOS7oiOQtmVjcDLBURDmYXGwljXR4n4jLm4CT1V9qSLcKHu1gdFU9Mq
 VuMUdPOnQub1vqKndi154IoYDTo21jAib2ktbcxpJfSJnDYoit4Gtnv7eWY+M3Pd
 Toaxi8htM2HSRwbvslHYGW8ZcVpI79Jit+ti7CsFg7m9Lvgs0zxcnNui4uPYDymT
 jh2JYxuirIJbX9aGGhnmkNhq9REaeZJg9LA2JM8S77FCHN3bnlSdaG6wy899J6EI
 lK4anCuPQKKKhUia/dc1MeKwrmmC18EfPyGUkOzywg/jGwGCmZM=
 =Y0TT
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull more arm64 updates from Catalin Marinas:
 "As I mentioned in the last pull request, there's a second batch of
  security updates for arm64 with mitigations for Spectre/v1 and an
  improved one for Spectre/v2 (via a newly defined firmware interface
  API).

  Spectre v1 mitigation:

   - back-end version of array_index_mask_nospec()

   - masking of the syscall number to restrict speculation through the
     syscall table

   - masking of __user pointers prior to deference in uaccess routines

  Spectre v2 mitigation update:

   - using the new firmware SMC calling convention specification update

   - removing the current PSCI GET_VERSION firmware call mitigation as
     vendors are deploying new SMCCC-capable firmware

   - additional branch predictor hardening for synchronous exceptions
     and interrupts while in user mode

  Meltdown v3 mitigation update:

    - Cavium Thunder X is unaffected but a hardware erratum gets in the
      way. The kernel now starts with the page tables mapped as global
      and switches to non-global if kpti needs to be enabled.

  Other:

   - Theoretical trylock bug fixed"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits)
  arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
  arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
  arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
  arm/arm64: smccc: Make function identifiers an unsigned quantity
  firmware/psci: Expose SMCCC version through psci_ops
  firmware/psci: Expose PSCI conduit
  arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
  arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
  arm/arm64: KVM: Turn kvm_psci_version into a static inline
  arm/arm64: KVM: Advertise SMCCC v1.1
  arm/arm64: KVM: Implement PSCI 1.0 support
  arm/arm64: KVM: Add smccc accessors to PSCI code
  arm/arm64: KVM: Add PSCI_VERSION helper
  arm/arm64: KVM: Consolidate the PSCI include files
  arm64: KVM: Increment PC after handling an SMC trap
  arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
  arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
  arm64: entry: Apply BP hardening for suspicious interrupts from EL0
  arm64: entry: Apply BP hardening for high-priority synchronous exceptions
  arm64: futex: Mask __user pointers prior to dereference
  ...
2018-02-08 10:44:25 -08:00
Marc Zyngier
6167ec5c91 arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
A new feature of SMCCC 1.1 is that it offers firmware-based CPU
workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides
BP hardening for CVE-2017-5715.

If the host has some mitigation for this issue, report that
we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the
host workaround on every guest exit.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:54:05 +00:00
Marc Zyngier
a4097b3511 arm/arm64: KVM: Turn kvm_psci_version into a static inline
We're about to need kvm_psci_version in HYP too. So let's turn it
into a static inline, and pass the kvm structure as a second
parameter (so that HYP can do a kern_hyp_va on it).

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:54:03 +00:00
Marc Zyngier
09e6be12ef arm/arm64: KVM: Advertise SMCCC v1.1
The new SMC Calling Convention (v1.1) allows for a reduced overhead
when calling into the firmware, and provides a new feature discovery
mechanism.

Make it visible to KVM guests.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:54:01 +00:00
Marc Zyngier
58e0b2239a arm/arm64: KVM: Implement PSCI 1.0 support
PSCI 1.0 can be trivially implemented by providing the FEATURES
call on top of PSCI 0.2 and returning 1.0 as the PSCI version.

We happily ignore everything else, as they are either optional or
are clarifications that do not require any additional change.

PSCI 1.0 is now the default until we decide to add a userspace
selection API.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:59 +00:00
Marc Zyngier
84684fecd7 arm/arm64: KVM: Add smccc accessors to PSCI code
Instead of open coding the accesses to the various registers,
let's add explicit SMCCC accessors.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:57 +00:00
Marc Zyngier
d0a144f12a arm/arm64: KVM: Add PSCI_VERSION helper
As we're about to trigger a PSCI version explosion, it doesn't
hurt to introduce a PSCI_VERSION helper that is going to be
used everywhere.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:56 +00:00
Marc Zyngier
1a2fb94e6a arm/arm64: KVM: Consolidate the PSCI include files
As we're about to update the PSCI support, and because I'm lazy,
let's move the PSCI include file to include/kvm so that both
ARM architectures can find it.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:54 +00:00
Radim Krčmář
7bf14c28ee Merge branch 'x86/hyperv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Topic branch for stable KVM clockource under Hyper-V.

Thanks to Christoffer Dall for resolving the ARM conflict.
2018-02-01 15:04:17 +01:00
Radim Krčmář
e53175395d KVM/ARM Changes for v4.16
The changes for this version include icache invalidation optimizations
 (improving VM startup time), support for forwarded level-triggered
 interrupts (improved performance for timers and passthrough platform
 devices), a small fix for power-management notifiers, and some cosmetic
 changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJacYnLAAoJEEtpOizt6ddyhHUH/1f/AHC4t6sNJJ4LAbWAjuve
 77scB7vsVVpZqHUeA1i8d0vrWJQeqg8CEQ+iP/OVLC+bWVX0yeBtrt/pMJA8sXrV
 Jbo5kQu3NyrRUAew83rcvoqsVVf67BB/NohL7C7sQDvNp2bg2cgzxhpgNJUuUXQC
 WcEOhqstWo6NYJ7xYz5f+utzYQRO0YfnIzoTsoaNgDHSw/V37Ny9O0tYqTQGNYUm
 zZ+cRo3nFRFywbmHhIHvXkxmS0lGdACQWTzyd+qDsgiPJ463vRT6Fc035SSuqX9x
 MmS87cBdt1IK9yi0Firqhuy6CGgHZmnagHizE0arMv72Pcv/ucrkCDRqLQDhSMY=
 =bZLm
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm

KVM/ARM Changes for v4.16

The changes for this version include icache invalidation optimizations
(improving VM startup time), support for forwarded level-triggered
interrupts (improved performance for timers and passthrough platform
devices), a small fix for power-management notifiers, and some cosmetic
changes.
2018-01-31 13:34:41 +01:00
Christoffer Dall
cd15d2050c KVM: arm/arm64: Fixup userspace irqchip static key optimization
When I introduced a static key to avoid work in the critical path for
userspace irqchips which is very rarely used, I accidentally messed up
my logic and used && where I should have used ||, because the point was
to short-circuit the evaluation in case userspace irqchips weren't even
in use.

This fixes an issue when running in-kernel irqchip VMs alongside
userspace irqchip VMs.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Fixes: c44c232ee2d3 ("KVM: arm/arm64: Avoid work when userspace iqchips are not used")
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-31 10:10:49 +01:00
Christoffer Dall
f1d7231ced KVM: arm/arm64: Fix userspace_irqchip_in_use counting
We were not decrementing the static key count in the right location.
kvm_arch_vcpu_destroy() is only called to clean up after a failed
VCPU create attempt, whereas kvm_arch_vcpu_free() is called on teardown
of the VM as well.  Move the static key decrement call to
kvm_arch_vcpu_free().

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-31 10:10:38 +01:00
Christoffer Dall
13e59ece5b KVM: arm/arm64: Fix incorrect timer_is_pending logic
After the recently introduced support for level-triggered mapped
interrupt, I accidentally left the VCPU thread busily going back and
forward between the guest and the hypervisor whenever the guest was
blocking, because I would always incorrectly report that a timer
interrupt was pending.

This is because the timer->irq.level field is not valid for mapped
interrupts, where we offload the level state to the hardware, and as a
result this field is always true.

Luckily the problem can be relatively easily solved by not checking the
cached signal state of either timer in kvm_timer_should_fire() but
instead compute the timer state on the fly, which we do already if the
cached signal state wasn't high.  In fact, the only reason for checking
the cached signal state was a tiny optimization which would only be
potentially faster when the polling loop detects a pending timer
interrupt, which is quite unlikely.

Instead of duplicating the logic from kvm_arch_timer_handler(), we
enlighten kvm_timer_should_fire() to report something valid when the
timer state is loaded onto the hardware.  We can then call this from
kvm_arch_timer_handler() as well and avoid the call to
__timer_snapshot_state() in kvm_arch_timer_get_input_level().

Reported-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Tomasz Nowicki <tn@semihalf.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-31 10:10:17 +01:00
Linus Torvalds
0aebc6a440 arm64 updates for 4.16:
- Security mitigations:
   - variant 2: invalidating the branch predictor with a call to secure firmware
   - variant 3: implementing KPTI for arm64
 
 - 52-bit physical address support for arm64 (ARMv8.2)
 
 - arm64 support for RAS (firmware first only) and SDEI (software
   delegated exception interface; allows firmware to inject a RAS error
   into the OS)
 
 - Perf support for the ARM DynamIQ Shared Unit PMU
 
 - CPUID and HWCAP bits updated for new floating point multiplication
   instructions in ARMv8.4
 
 - Removing some virtual memory layout printks during boot
 
 - Fix initial page table creation to cope with larger than 32M kernel
   images when 16K pages are enabled
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlpwxDMACgkQa9axLQDI
 XvF55BAAniMpxPXnYNfv6l7/4O8eKo1lJIaG1wbej4JRZ/rT3K4Z3OBXW1dKHO8d
 /PTbVmZ90IqIGROkoDrE+6xyjjn9yK3uuW4ytN2zQkBa8VFaHAnHlX+zKQcuwy9f
 yxwiHk+C7vK5JR7mpXTazjRknsUv1MPtlTt7DQrSdq0KRDJVDNFC+grmbew2rz0X
 cjQDqZqgzuFyrKxdiQVjDmc3zH9NsNBhDo0hlGHf2jK6bGJsAPtI8M2JcLrK8ITG
 Ye/dD7BJp1mWD8ff0BPaMxu24qfAMNLH8f2dpTa986/H78irVz7i/t5HG0/1+5Jh
 EE4OFRTKZ59Qgyo1zWcaJvdp8YjiaX/L4PWJg8CxM5OhP9dIac9ydcFQfWzpKpUs
 xyZfmK6XliGFReAkVOOf5tEqFUDhMtsqhzPYmbmU1lp61wmSYIZ8CTenpWWCJSRO
 NOGyG1X2uFBvP69+iPNlfTGz1r7tg1URY5iO8fUEIhY8LrgyORkiqw4OvPEgnMXP
 Ngy+dXhyvnps2AAWbSX0O4puRlTgEYLT5KaMLzH/+gWsXATT0rzUCD/aOwUQq/Y7
 SWXZHkb3jpmOZZnzZsLL2MNzEIPCFBwSUE9fSv4dA9d/N6tUmlmZALJjHkfzCDpj
 +mPsSmAMTj72kUYzm0b5GCtOu/iQ2kDWOZjOM1m4+v/B+f7JoEE=
 =iEjP
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The main theme of this pull request is security covering variants 2
  and 3 for arm64. I expect to send additional patches next week
  covering an improved firmware interface (requires firmware changes)
  for variant 2 and way for KPTI to be disabled on unaffected CPUs
  (Cavium's ThunderX doesn't work properly with KPTI enabled because of
  a hardware erratum).

  Summary:

   - Security mitigations:
      - variant 2: invalidate the branch predictor with a call to
        secure firmware
      - variant 3: implement KPTI for arm64

   - 52-bit physical address support for arm64 (ARMv8.2)

   - arm64 support for RAS (firmware first only) and SDEI (software
     delegated exception interface; allows firmware to inject a RAS
     error into the OS)

   - perf support for the ARM DynamIQ Shared Unit PMU

   - CPUID and HWCAP bits updated for new floating point multiplication
     instructions in ARMv8.4

   - remove some virtual memory layout printks during boot

   - fix initial page table creation to cope with larger than 32M kernel
     images when 16K pages are enabled"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (104 commits)
  arm64: Fix TTBR + PAN + 52-bit PA logic in cpu_do_switch_mm
  arm64: Turn on KPTI only on CPUs that need it
  arm64: Branch predictor hardening for Cavium ThunderX2
  arm64: Run enable method for errata work arounds on late CPUs
  arm64: Move BP hardening to check_and_switch_context
  arm64: mm: ignore memory above supported physical address size
  arm64: kpti: Fix the interaction between ASID switching and software PAN
  KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA
  KVM: arm64: Handle RAS SErrors from EL2 on guest exit
  KVM: arm64: Handle RAS SErrors from EL1 on guest exit
  KVM: arm64: Save ESR_EL2 on guest SError
  KVM: arm64: Save/Restore guest DISR_EL1
  KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.
  KVM: arm/arm64: mask/unmask daif around VHE guests
  arm64: kernel: Prepare for a DISR user
  arm64: Unconditionally enable IESB on exception entry/return for firmware-first
  arm64: kernel: Survive corrected RAS errors notified by SError
  arm64: cpufeature: Detect CPU RAS Extentions
  arm64: sysreg: Move to use definitions for all the SCTLR bits
  arm64: cpufeature: __this_cpu_has_cap() shouldn't stop early
  ...
2018-01-30 13:57:43 -08:00
James Morse
58d6b15e9d KVM: arm/arm64: Handle CPU_PM_ENTER_FAILED
cpu_pm_enter() calls the pm notifier chain with CPU_PM_ENTER, then if
there is a failure: CPU_PM_ENTER_FAILED.

When KVM receives CPU_PM_ENTER it calls cpu_hyp_reset() which will
return us to the hyp-stub. If we subsequently get a CPU_PM_ENTER_FAILED,
KVM does nothing, leaving the CPU running with the hyp-stub, at odds
with kvm_arm_hardware_enabled.

Add CPU_PM_ENTER_FAILED as a fallthrough for CPU_PM_EXIT, this reloads
KVM based on kvm_arm_hardware_enabled. This is safe even if CPU_PM_ENTER
never gets as far as KVM, as cpu_hyp_reinit() calls cpu_hyp_reset()
to make sure the hyp-stub is loaded before reloading KVM.

Fixes: 67f6919766 ("arm64: kvm: allows kvm cpu hotplug")
Cc: <stable@vger.kernel.org> # v4.7+
CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-23 16:47:15 +01:00
Radim Krčmář
f44efa5aea KVM/ARM Fixes for v4.15, Round 3 (v2)
Three more fixes for v4.15 fixing incorrect huge page mappings on systems using
 the contigious hint for hugetlbfs; supporting an alternative GICv4 init
 sequence; and correctly implementing the ARM SMCC for HVC and SMC handling.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaXi9yAAoJEEtpOizt6ddymb4H/R6Q7uPSNY31d/wcMHg8qYS7
 foDW76r7mKliRVmCJoq9oqLqC7BLpQszfZ8dFjPSfdLA4xVMsuZ3GG3S7jlghiuN
 9+rZK+ZZX8g5uQNsqVITC3WrXmozBj+VEs/uH2Z1pu0g+siPTp7J2iv5+A5tvM3A
 NCySqgEjefQyy7Zs2r7TuvM+E3p9MY7jZih9E2o8mn2TQipVKrcnHRN3IjNNtI4u
 C17x70OQ1ZY7bwnmPnuPPqnX3H1fQ6+UgwtfDCu3KP7DAFVjqAz03X6wbf1nCLAB
 zzKok/SnIFWpr56JUSOzMpHWG8sOFscdVXxW97a2Ova0ur0rHW2iPiucTb8jOjQ=
 =gJL6
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-fixes-for-v4.15-3-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm

KVM/ARM Fixes for v4.15, Round 3 (v2)

Three more fixes for v4.15 fixing incorrect huge page mappings on systems using
the contigious hint for hugetlbfs; supporting an alternative GICv4 init
sequence; and correctly implementing the ARM SMCC for HVC and SMC handling.
2018-01-17 14:59:27 +01:00
James Morse
3368bd8097 KVM: arm64: Handle RAS SErrors from EL1 on guest exit
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.

There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.

For SError that interrupt a guest and are routed to EL2 the existing
behaviour is to inject an impdef SError into the guest.

Add code to handle RAS SError based on the ESR. For uncontained and
uncategorized errors arm64_is_fatal_ras_serror() will panic(), these
errors compromise the host too. All other error types are contained:
For the fatal errors the vCPU can't make progress, so we inject a virtual
SError. We ignore contained errors where we can make progress as if
we're lucky, we may not hit them again.

If only some of the CPUs support RAS the guest will see the cpufeature
sanitised version of the id registers, but we may still take RAS SError
on this CPU. Move the SError handling out of handle_exit() into a new
handler that runs before we can be preempted. This allows us to use
this_cpu_has_cap(), via arm64_is_ras_serror().

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 15:09:13 +00:00
James Morse
4f5abad9e8 KVM: arm/arm64: mask/unmask daif around VHE guests
Non-VHE systems take an exception to EL2 in order to world-switch into the
guest. When returning from the guest KVM implicitly restores the DAIF
flags when it returns to the kernel at EL1.

With VHE none of this exception-level jumping happens, so KVMs
world-switch code is exposed to the host kernel's DAIF values, and KVM
spills the guest-exit DAIF values back into the host kernel.
On entry to a guest we have Debug and SError exceptions unmasked, KVM
has switched VBAR but isn't prepared to handle these. On guest exit
Debug exceptions are left disabled once we return to the host and will
stay this way until we enter user space.

Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
happen after the hosts VBAR value has been synchronised by the isb in
__vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
setting KVMs VBAR value, but is kept here for symmetry.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 15:08:24 +00:00
Kristina Martsenko
98732d1b18 KVM: arm/arm64: fix HYP ID map extension to 52 bits
Commit fa2a8445b1 incorrectly masks the index of the HYP ID map pgd
entry, causing a non-VHE kernel to hang during boot. This happens when
VA_BITS=48 and the ID map text is in 52-bit physical memory. In this
case we don't need an extra table level but need more entries in the
top-level table, so we need to map into hyp_pgd and need to use
__kvm_idmap_ptrs_per_pgd to mask in the extra bits. However,
__create_hyp_mappings currently masks by PTRS_PER_PGD instead.

Fix it so that we always use __kvm_idmap_ptrs_per_pgd for the HYP ID
map. This ensures that we use the larger mask for the top-level ID map
table when it has more entries. In all other cases, PTRS_PER_PGD is used
as normal.

Fixes: fa2a8445b1 ("arm64: allow ID map to be extended to 52 bits")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-15 18:20:26 +00:00
James Morse
36989e7fd3 KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation
kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init()
used to store the host EL1 registers when KVM switches to a guest.

Make it easier for ASM to generate pointers into this per-cpu memory
by making it a static allocation.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-13 10:44:14 +00:00
Christoffer Dall
f8f85dc00b KVM: arm64: Fix GICv4 init when called from vgic_its_create
Commit 3d1ad640f8 ("KVM: arm/arm64: Fix GICv4 ITS initialization
issues") moved the vgic_supports_direct_msis() check in vgic_v4_init().
However when vgic_v4_init is called from vgic_its_create(), the has_its
field is not yet set. Hence vgic_supports_direct_msis returns false and
vgic_v4_init does nothing.

The gic/its init sequence is a bit messy, so let's be specific about the
prerequisite checks in the various call paths instead of relying on a
common wrapper.

Fixes: 3d1ad640f8 ("KVM: arm/arm64: Fix GICv4 ITS initialization issues")
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-12 11:40:21 +01:00
Punit Agrawal
c507babf10 KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
KVM only supports PMD hugepages at stage 2 but doesn't actually check
that the provided hugepage memory pagesize is PMD_SIZE before populating
stage 2 entries.

In cases where the backing hugepage size is smaller than PMD_SIZE (such
as when using contiguous hugepages), KVM can end up creating stage 2
mappings that extend beyond the supplied memory.

Fix this by checking for the pagesize of userspace vma before creating
PMD hugepage at stage 2.

Fixes: 66b3923a1a ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <stable@vger.kernel.org> # v4.5+
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-11 15:25:57 +01:00
Marc Zyngier
6840bdd73d arm64: KVM: Use per-CPU vector when BP hardening is enabled
Now that we have per-CPU vectors, let's plug then in the KVM/arm64 code.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-08 18:46:56 +00:00
Marc Zyngier
17ab9d57de KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance operartions
The vcpu parameter isn't used for anything, and gets in the way of
further cleanups. Let's get rid of it.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-08 15:20:46 +01:00
Marc Zyngier
7a3796d2ef KVM: arm/arm64: Preserve Exec permission across R/W permission faults
So far, we loose the Exec property whenever we take permission
faults, as we always reconstruct the PTE/PMD from scratch. This
can be counter productive as we can end-up with the following
fault sequence:

	X -> RO -> ROX -> RW -> RWX

Instead, we can lookup the existing PTE/PMD and clear the XN bit in the
new entry if it was already cleared in the old one, leadig to a much
nicer fault sequence:

	X -> ROX -> RWX

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-08 15:20:46 +01:00
Marc Zyngier
a9c0e12ebe KVM: arm/arm64: Only clean the dcache on translation fault
The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-08 15:20:45 +01:00
Marc Zyngier
d0e22b4ac3 KVM: arm/arm64: Limit icache invalidation to prefetch aborts
We've so far eagerly invalidated the icache, no matter how
the page was faulted in (data or prefetch abort).

But we can easily track execution by setting the XN bits
in the S2 page tables, get the prefetch abort at HYP and
perform the icache invalidation at that time only.

As for most VMs, the instruction working set is pretty
small compared to the data set, this is likely to save
some traffic (specially as the invalidation is broadcast).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-08 15:20:45 +01:00
Marc Zyngier
a15f693935 KVM: arm/arm64: Split dcache/icache flushing
As we're about to introduce opportunistic invalidation of the icache,
let's split dcache and icache flushing.

Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-08 15:20:43 +01:00
Marc Zyngier
d68119864e KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
kvm_hyp.h has an odd dependency on kvm_mmu.h, which makes the
opposite inclusion impossible. Let's start with breaking that
useless dependency.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-08 15:20:43 +01:00
Christoffer Dall
61bbe38027 KVM: arm/arm64: Avoid work when userspace iqchips are not used
We currently check if the VM has a userspace irqchip in several places
along the critical path, and if so, we do some work which is only
required for having an irqchip in userspace.  This is unfortunate, as we
could avoid doing any work entirely, if we didn't have to support
irqchip in userspace.

Realizing the userspace irqchip on ARM is mostly a developer or hobby
feature, and is unlikely to be used in servers or other scenarios where
performance is a priority, we can use a refcounted static key to only
check the irqchip configuration when we have at least one VM that uses
an irqchip in userspace.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:46 +01:00
Christoffer Dall
4c60e360d6 KVM: arm/arm64: Provide a get_input_level for the arch timer
The VGIC can now support the life-cycle of mapped level-triggered
interrupts, and we no longer have to read back the timer state on every
exit from the VM if we had an asserted timer interrupt signal, because
the VGIC already knows if we hit the unlikely case where the guest
disables the timer without ACKing the virtual timer interrupt.

This means we rework a bit of the code to factor out the functionality
to snapshot the timer state from vtimer_save_state(), and we can reuse
this functionality in the sync path when we have an irqchip in
userspace, and also to support our implementation of the
get_input_level() function for the timer.

This change also means that we can no longer rely on the timer's view of
the interrupt line to set the active state, because we no longer
maintain this state for mapped interrupts when exiting from the guest.
Instead, we only set the active state if the virtual interrupt is
active, and otherwise we simply let the timer fire again and raise the
virtual interrupt from the ISR.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:46 +01:00
Christoffer Dall
df635c5b18 KVM: arm/arm64: Support VGIC dist pend/active changes for mapped IRQs
For mapped IRQs (with the HW bit set in the LR) we have to follow some
rules of the architecture.  One of these rules is that VM must not be
allowed to deactivate a virtual interrupt with the HW bit set unless the
physical interrupt is also active.

This works fine when injecting mapped interrupts, because we leave it up
to the injector to either set EOImode==1 or manually set the active
state of the physical interrupt.

However, the guest can set virtual interrupt to be pending or active by
writing to the virtual distributor, which could lead to deactivating a
virtual interrupt with the HW bit set without the physical interrupt
being active.

We could set the physical interrupt to active whenever we are about to
enter the VM with a HW interrupt either pending or active, but that
would be really slow, especially on GICv2.  So we take the long way
around and do the hard work when needed, which is expected to be
extremely rare.

When the VM sets the pending state for a HW interrupt on the virtual
distributor we set the active state on the physical distributor, because
the virtual interrupt can become active and then the guest can
deactivate it.

When the VM clears the pending state we also clear it on the physical
side, because the injector might otherwise raise the interrupt.  We also
clear the physical active state when the virtual interrupt is not
active, since otherwise a SPEND/CPEND sequence from the guest would
prevent signaling of future interrupts.

Changing the state of mapped interrupts from userspace is not supported,
and it's expected that userspace unmaps devices from VFIO before
attempting to set the interrupt state, because the interrupt state is
driven by hardware.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:46 +01:00
Christoffer Dall
b6909a659f KVM: arm/arm64: Support a vgic interrupt line level sample function
The GIC sometimes need to sample the physical line of a mapped
interrupt.  As we know this to be notoriously slow, provide a callback
function for devices (such as the timer) which can do this much faster
than talking to the distributor, for example by comparing a few
in-memory values.  Fall back to the good old method of poking the
physical GIC if no callback is provided.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:46 +01:00
Christoffer Dall
e40cc57bac KVM: arm/arm64: vgic: Support level-triggered mapped interrupts
Level-triggered mapped IRQs are special because we only observe rising
edges as input to the VGIC, and we don't set the EOI flag and therefore
are not told when the level goes down, so that we can re-queue a new
interrupt when the level goes up.

One way to solve this problem is to side-step the logic of the VGIC and
special case the validation in the injection path, but it has the
unfortunate drawback of having to peak into the physical GIC state
whenever we want to know if the interrupt is pending on the virtual
distributor.

Instead, we can maintain the current semantics of a level triggered
interrupt by sort of treating it as an edge-triggered interrupt,
following from the fact that we only observe an asserting edge.  This
requires us to be a bit careful when populating the LRs and when folding
the state back in though:

 * We lower the line level when populating the LR, so that when
   subsequently observing an asserting edge, the VGIC will do the right
   thing.

 * If the guest never acked the interrupt while running (for example if
   it had masked interrupts at the CPU level while running), we have
   to preserve the pending state of the LR and move it back to the
   line_level field of the struct irq when folding LR state.

   If the guest never acked the interrupt while running, but changed the
   device state and lowered the line (again with interrupts masked) then
   we need to observe this change in the line_level.

   Both of the above situations are solved by sampling the physical line
   and set the line level when folding the LR back.

 * Finally, if the guest never acked the interrupt while running and
   sampling the line reveals that the device state has changed and the
   line has been lowered, we must clear the physical active state, since
   we will otherwise never be told when the interrupt becomes asserted
   again.

This has the added benefit of making the timer optimization patches
(https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
bit simpler, because the timer code doesn't have to clear the active
state on the sync anymore.  It also potentially improves the performance
of the timer implementation because the GIC knows the state or the LR
and only needs to clear the
active state when the pending bit in the LR is still set, where the
timer has to always clear it when returning from running the guest with
an injected timer interrupt.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:46 +01:00
Christoffer Dall
70450a9fbe KVM: arm/arm64: Don't cache the timer IRQ level
The timer logic was designed after a strict idea of modeling an
interrupt line level in software, meaning that only transitions in the
level need to be reported to the VGIC.  This works well for the timer,
because the arch timer code is in complete control of the device and can
track the transitions of the line.

However, as we are about to support using the HW bit in the VGIC not
just for the timer, but also for VFIO which cannot track transitions of
the interrupt line, we have to decide on an interface between the GIC
and other subsystems for level triggered mapped interrupts, which both
the timer and VFIO can use.

VFIO only sees an asserting transition of the physical interrupt line,
and tells the VGIC when that happens.  That means that part of the
interrupt flow is offloaded to the hardware.

To use the same interface for VFIO devices and the timer, we therefore
have to change the timer (we cannot change VFIO because it doesn't know
the details of the device it is assigning to a VM).

Luckily, changing the timer is simple, we just need to stop 'caching'
the line level, but instead let the VGIC know the state of the timer
every time there is a potential change in the line level, and when the
line level should be asserted from the timer ISR.  The VGIC can ignore
extra notifications using its validate mechanism.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:46 +01:00
Christoffer Dall
6c1b7521f4 KVM: arm/arm64: Factor out functionality to get vgic mmio requester_vcpu
We are about to distinguish between userspace accesses and mmio traps
for a number of the mmio handlers.  When the requester vcpu is NULL, it
means we are handling a userspace access.

Factor out the functionality to get the request vcpu into its own
function, mostly so we have a common place to document the semantics of
the return value.

Also take the chance to move the functionality outside of holding a
spinlock and instead explicitly disable and enable preemption.  This
supports PREEMPT_RT kernels as well.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:45 +01:00
Christoffer Dall
5a24575032 KVM: arm/arm64: Remove redundant preemptible checks
The __this_cpu_read() and __this_cpu_write() functions already implement
checks for the required preemption levels when using
CONFIG_DEBUG_PREEMPT which gives you nice error messages and such.
Therefore there is no need to explicitly check this using a BUG_ON() in
the code (which we don't do for other uses of per cpu variables either).

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:45 +01:00
Vasyl Gomonovych
4404b336cf KVM: arm: Use PTR_ERR_OR_ZERO()
Fix ptr_ret.cocci warnings:
virt/kvm/arm/vgic/vgic-its.c:971:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02 10:05:45 +01:00
Kristina Martsenko
fa2a8445b1 arm64: allow ID map to be extended to 52 bits
Currently, when using VA_BITS < 48, if the ID map text happens to be
placed in physical memory above VA_BITS, we increase the VA size (up to
48) and create a new table level, in order to map in the ID map text.
This is okay because the system always supports 48 bits of VA.

This patch extends the code such that if the system supports 52 bits of
VA, and the ID map text is placed that high up, then we increase the VA
size accordingly, up to 52.

One difference from the current implementation is that so far the
condition of VA_BITS < 48 has meant that the top level table is always
"full", with the maximum number of entries, and an extra table level is
always needed. Now, when VA_BITS = 48 (and using 64k pages), the top
level table is not full, and we simply need to increase the number of
entries in it, instead of creating a new table level.

Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: reduce arguments to __create_hyp_mappings()]
[catalin.marinas@arm.com: reworked/renamed __cpu_uses_extended_idmap_level()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-12-22 17:37:33 +00:00
Kristina Martsenko
529c4b05a3 arm64: handle 52-bit addresses in TTBR
The top 4 bits of a 52-bit physical address are positioned at bits 2..5
in the TTBR registers. Introduce a couple of macros to move the bits
there, and change all TTBR writers to use them.

Leave TTBR0 PAN code unchanged, to avoid complicating it. A system with
52-bit PA will have PAN anyway (because it's ARMv8.1 or later), and a
system without 52-bit PA can only use up to 48-bit PAs. A later patch in
this series will add a kconfig dependency to ensure PAN is configured.

In addition, when using 52-bit PA there is a special alignment
requirement on the top-level table. We don't currently have any VA_BITS
configuration that would violate the requirement, but one could be added
in the future, so add a compile-time BUG_ON to check for it.

Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: added TTBR_BADD_MASK_52 comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-12-22 17:35:21 +00:00
Paolo Bonzini
43aabca38a KVM/ARM Fixes for v4.15, Round 2
Fixes:
  - A bug in our handling of SPE state for non-vhe systems
  - A bug that causes hyp unmapping to go off limits and crash the system on
    shutdown
  - Three timer fixes that were introduced as part of the timer optimizations
    for v4.15
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaN5C1AAoJEEtpOizt6ddyodkH/jN1lquFVdYJBlEO6NXiumEk
 GBH6x6CmuGyiUL3J0ffx5U51x0NN2jE89TpH5d1dsnQg77CCjTCxHtQ9suHne3n1
 5/r0BzHZhaCbnbY0f7+E4EL0UOTpiAwUIqin1ufLPjs4XywcFyiLa7xiWkQkDmyr
 WXKOdppTc4j/FUyqb1fQBmYY8pENR5jjfgdaeZ6C6o7e6aksXgrPWqXhV/6OSRLd
 MOcxA06QfwTWy+MT1x4yo1hzCTjOEvvQXT2Va09moiNxT7hVWWvO/kwJVQL+YpWW
 di7t4CLCvGYUsxM5t8fHHV7X+dfd2nvpJA46TWggPye7yMYkTYXFQu1LHwPIdDU=
 =c5Kt
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-fixes-for-v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Fixes for v4.15, Round 2

Fixes:
 - A bug in our handling of SPE state for non-vhe systems
 - A bug that causes hyp unmapping to go off limits and crash the system on
   shutdown
 - Three timer fixes that were introduced as part of the timer optimizations
   for v4.15
2017-12-18 12:57:43 +01:00
Wanpeng Li
e39d200fa5 KVM: Fix stack-out-of-bounds read in write_mmio
Reported by syzkaller:

  BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
  Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298

  CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
  Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
  Call Trace:
   dump_stack+0xab/0xe1
   print_address_description+0x6b/0x290
   kasan_report+0x28a/0x370
   write_mmio+0x11e/0x270 [kvm]
   emulator_read_write_onepage+0x311/0x600 [kvm]
   emulator_read_write+0xef/0x240 [kvm]
   emulator_fix_hypercall+0x105/0x150 [kvm]
   em_hypercall+0x2b/0x80 [kvm]
   x86_emulate_insn+0x2b1/0x1640 [kvm]
   x86_emulate_instruction+0x39a/0xb90 [kvm]
   handle_exception+0x1b4/0x4d0 [kvm_intel]
   vcpu_enter_guest+0x15a0/0x2640 [kvm]
   kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
   kvm_vcpu_ioctl+0x479/0x880 [kvm]
   do_vfs_ioctl+0x142/0x9a0
   SyS_ioctl+0x74/0x80
   entry_SYSCALL_64_fastpath+0x23/0x9a

The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1(vmcall)
to the guest memory, however, write_mmio tracepoint always prints 8 bytes
through *(u64 *)val since kvm splits the mmio access into 8 bytes. This
leaks 5 bytes from the kernel stack (CVE-2017-17741).  This patch fixes
it by just accessing the bytes which we operate on.

Before patch:

syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f

After patch:

syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-18 12:57:01 +01:00
Christoffer Dall
0eb7c33cad KVM: arm/arm64: Fix timer enable flow
When enabling the timer on the first run, we fail to ever restore the
state and mark it as loaded.  That means, that in the initial entry to
the VCPU ioctl, unless we exit to userspace for some reason such as a
pending signal, if the guest programs a timer and blocks, we will wait
forever, because we never read back the hardware state (the loaded flag
is not set), and so we think the timer is disabled, and we never
schedule a background soft timer.

The end result?  The VCPU blocks forever, and the only solution is to
kill the thread.

Fixes: 4a2c4da125 ("arm/arm64: KVM: Load the timer state when enabling the timer")
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:24 +01:00
Christoffer Dall
36e5cfd410 KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state
The recent timer rework was assuming that once the timer was disabled,
we should no longer see any interrupts from the timer.  This assumption
turns out to not be true, and instead we have to handle the case when
the timer ISR runs even after the timer has been disabled.

This requires a couple of changes:

First, we should never overwrite the cached guest state of the timer
control register when the ISR runs, because KVM may have disabled its
timers when doing vcpu_put(), even though the guest still had the timer
enabled.

Second, we shouldn't assume that the timer is actually firing just
because we see an interrupt, but we should check the actual state of the
timer in the timer control register to understand if the hardware timer
is really firing or not.

We also add an ISB to vtimer_save_state() to ensure the timer is
actually disabled once we enable interrupts, which should clarify the
intention of the implementation, and reduce the risk of unwanted
interrupts.

Fixes: b103cc3f10 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Reported-by: Jia He <hejianet@gmail.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:24 +01:00
Marc Zyngier
f384dcfe4d KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC
If we don't have a usable GIC, do not try to set the vcpu affinity
as this is guaranteed to fail.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:23 +01:00
Marc Zyngier
7839c672e5 KVM: arm/arm64: Fix HYP unmapping going off limits
When we unmap the HYP memory, we try to be clever and unmap one
PGD at a time. If we start with a non-PGD aligned address and try
to unmap a whole PGD, things go horribly wrong in unmap_hyp_range
(addr and end can never match, and it all goes really badly as we
keep incrementing pgd and parse random memory as page tables...).

The obvious fix is to let unmap_hyp_range do what it does best,
which is to iterate over a range.

The size of the linear mapping, which begins at PAGE_OFFSET, can be
easily calculated by subtracting PAGE_OFFSET form high_memory, because
high_memory is defined as the linear map address of the last byte of
DRAM, plus one.

The size of the vmalloc region is given trivially by VMALLOC_END -
VMALLOC_START.

Cc: stable@vger.kernel.org
Reported-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-18 10:53:23 +01:00
Christoffer Dall
9b062471e5 KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl
Move the calls to vcpu_load() and vcpu_put() in to the architecture
specific implementations of kvm_arch_vcpu_ioctl() which dispatches
further architecture-specific ioctls on to other functions.

Some architectures support asynchronous vcpu ioctls which cannot call
vcpu_load() or take the vcpu->mutex, because that would prevent
concurrent execution with a running VCPU, which is the intended purpose
of these ioctls, for example because they inject interrupts.

We repeat the separate checks for these specifics in the architecture
code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and
calling vcpu_load for these ioctls.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:26:58 +01:00
Christoffer Dall
e83dff5edf KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_mpstate
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_mpstate().

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:26:54 +01:00
Christoffer Dall
fd2325612c KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_mpstate
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_mpstate().

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14 09:26:54 +01:00