Commit Graph

529 Commits

Author SHA1 Message Date
Marc Zyngier
386627d825 ARM: KVM: Gracefully handle hyp-stubs being restored from under our feet
Should kvm_reboot() be invoked while guest is running, an IPI
wil be issued, forcing the guest to exit and HYP being reset to
the stubs. We will then try to reenter the guest, only to get
an error (HVC_STUB_ERR).

This patch allows this case to be gracefully handled by exiting
the run loop.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:49:31 -07:00
Marc Zyngier
4d5f9c14fb ARM: KVM: Implement HVC_SOFT_RESTART in the init code
Another missing stub hypercall is HVC_SOFT_RESTART. It turns out
that it is pretty easy to implement in terms of HVC_RESET_VECTORS
(since it needs to turn the MMU off).

Tested-by: Keerthy <j-keerthy@ti.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:49:30 -07:00
Marc Zyngier
a92ce8f6ab ARM: KVM: Convert __cpu_reset_hyp_mode to using __hyp_reset_vectors
We are now able to use the hyp stub to reset HYP mode. Time to
kiss __kvm_hyp_reset goodbye, and use __hyp_reset_vectors.

Tested-by: Keerthy <j-keerthy@ti.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:49:30 -07:00
Marc Zyngier
6bebcecb6c ARM: KVM: Allow the main HYP code to use the init hyp stub implementation
We now have a full hyp-stub implementation in the KVM init code,
but the main KVM code only supports HVC_GET_VECTORS, which is not
enough.

Instead of reinventing the wheel, let's reuse the init implementation
by branching to the idmap page when called with a hyp-stub hypercall.

Tested-by: Keerthy <j-keerthy@ti.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:49:29 -07:00
Marc Zyngier
5d224aa7d4 ARM: KVM: Implement HVC_GET_VECTORS in the init code
Now that we have an infrastructure to handle hypercalls in the KVM
init code, let's implement HVC_GET_VECTORS there.

Tested-by: Keerthy <j-keerthy@ti.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:49:29 -07:00
Marc Zyngier
bc845e4fbb ARM: KVM: Implement HVC_RESET_VECTORS stub hypercall in the init code
In order to restore HYP mode to its original condition, KVM currently
implements __kvm_hyp_reset(). As we're moving towards a hyp-stub
defined API, it becomes necessary to implement HVC_RESET_VECTORS.

This patch adds the HVC_RESET_VECTORS hypercall to the KVM init
code, which so far lacked any form of hypercall support.

Tested-by: Keerthy <j-keerthy@ti.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:49:28 -07:00
Marc Zyngier
467f97b72b ARM: KVM: Convert KVM to use HVC_GET_VECTORS
The conversion of the HYP stub ABI to something similar to arm64
left the KVM code broken, as it doesn't know about the new
stub numbering. Let's move the various #defines to virt.h, and
let KVM use HVC_GET_VECTORS.

Tested-by: Keerthy <j-keerthy@ti.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:49:24 -07:00
Marc Zyngier
9d0d4d34d9 arm: KVM: Treat CP15 accessors returning false as successful
Instead of considering that a CP15 accessor has failed when
returning false, let's consider that it is *always* successful
(after all, we won't stand for an incomplete emulation).

The return value now simply indicates whether we should skip
the instruction (because it has now been emulated), or if we
should leave the PC alone if the emulation has injected an
exception.

Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-04-09 07:49:17 -07:00
Marc Zyngier
b1d4cb6983 arm: KVM: Make unexpected register accesses inject an undef
Reads from write-only system registers are generally confined to
EL1 and not propagated to EL2 (that's what the architecture
mantates). In order to be sure that we have a sane behaviour
even in the unlikely event that we have a broken system, we still
handle it in KVM. Same goes for write to RO registers.

In that case, let's inject an undef into the guest.

Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-04-09 07:49:16 -07:00
Christoffer Dall
328e566479 KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put
We don't have to save/restore the VMCR on every entry to/from the guest,
since on GICv2 we can access the control interface from EL1 and on VHE
systems with GICv3 we can access the control interface from KVM running
in EL2.

GICv3 systems without VHE becomes the rare case, which has to
save/restore the register on each round trip.

Note that userspace accesses may see out-of-date values if the VCPU is
running while accessing the VGIC state via the KVM device API, but this
is already the case and it is up to userspace to quiesce the CPUs before
reading the CPU registers from the GIC for an up-to-date view.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:45:22 -07:00
Suzuki K Poulose
056aad67f8 kvm: arm/arm64: Rework gpa callback handlers
In order to perform an operation on a gpa range, we currently iterate
over each page in a user memory slot for the given range. This is
inefficient while dealing with a big range (e.g, a VMA), especially
while unmaping a range. At present, with stage2 unmap on a range with
a hugepage backed region, we clear the PMD when we unmap the first
page in the loop. The remaining iterations simply traverse the page table
down to the PMD level only to see that nothing is in there.

This patch reworks the code to invoke the callback handlers on the
biggest range possible within the memory slot to to reduce the number of
times the handler is called.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09 07:42:50 -07:00
Linu Cherian
7af4df8579 KVM: arm/arm64: Enable KVM_CAP_NR_MEMSLOTS on arm/arm64
Return KVM_USER_MEM_SLOTS for userspace capability query on
NR_MEMSLOTS.

Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Linu Cherian <linu.cherian@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-03-09 09:13:39 +00:00
Mark Rutland
f050fe7a91 arm: KVM: Survive unknown traps from guests
Currently we BUG() if we see a HSR.EC value we don't recognise. As
configurable disables/enables are added to the architecture (controlled
by RES1/RES0 bits respectively), with associated synchronous exceptions,
it may be possible for a guest to trigger exceptions with classes that
we don't recognise.

While we can't service these exceptions in a manner useful to the guest,
we can avoid bringing down the host. Per ARM DDI 0406C.c, all currently
unallocated HSR EC encodings are reserved, and per ARM DDI
0487A.k_iss10775, page G6-4395, EC values within the range 0x00 - 0x2c
are reserved for future use with synchronous exceptions, and EC values
within the range 0x2d - 0x3f may be used for either synchronous or
asynchronous exceptions.

The patch makes KVM handle any unknown EC by injecting an UNDEFINED
exception into the guest, with a corresponding (ratelimited) warning in
the host dmesg. We could later improve on this with with a new (opt-in)
exit to the host userspace.

Cc: Dave Martin <dave.martin@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-03-07 14:50:45 +00:00
Paolo Bonzini
460df4c1fc KVM: race-free exit from KVM_RUN without POSIX signals
The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
to a dummy signal handler; by blocking the signal outside KVM_RUN and
unblocking it inside, this possible race is closed:

          VCPU thread                     service thread
   --------------------------------------------------------------
        check flag
                                          set flag
                                          raise signal
        (signal handler does nothing)
        KVM_RUN

However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
remote NUMA node, because it is on the node of a thread's creator.
Taking this lock can be very expensive if there are many userspace
exits (as is the case for SMP Windows VMs without Hyper-V reference
time counter).

As an alternative, we can put the flag directly in kvm_run so that
KVM can see it:

          VCPU thread                     service thread
   --------------------------------------------------------------
                                          raise signal
        signal handler
          set run->immediate_exit
        KVM_RUN
          check run->immediate_exit

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-17 12:27:37 +01:00
Jintack Lim
fb280e9757 KVM: arm/arm64: Set a background timer to the earliest timer expiration
When scheduling a background timer, consider both of the virtual and
physical timer and pick the earliest expiration time.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-08 15:13:35 +00:00
Jintack Lim
a91d18551e KVM: arm/arm64: Initialize the emulated EL1 physical timer
Initialize the emulated EL1 physical timer with the default irq number.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-08 15:13:34 +00:00
Jintack Lim
9171fa2e09 KVM: arm/arm64: Decouple kvm timer functions from virtual timer
Now that we have a separate structure for timer context, make functions
generic so that they can work with any timer context, not just the
virtual timer context.  This does not change the virtual timer
functionality.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-08 15:13:33 +00:00
Jintack Lim
90de943a43 KVM: arm/arm64: Move cntvoff to each timer context
Make cntvoff per each timer context. This is helpful to abstract kvm
timer functions to work with timer context without considering timer
types (e.g. physical timer or virtual timer).

This also would pave the way for ever doing adjustments of the cntvoff
on a per-CPU basis if that should ever make sense.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-08 15:13:33 +00:00
Marc Zyngier
9d93dc1c96 arm/arm64: KVM: Get rid of KVM_MEMSLOT_INCOHERENT
KVM_MEMSLOT_INCOHERENT is not used anymore, as we've killed its
only use in the arm/arm64 MMU code. Let's remove the last artifacts.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-30 13:47:38 +00:00
Marc Zyngier
13b7756cec arm/arm64: KVM: Stop propagating cacheability status of a faulted page
Now that we unconditionally flush newly mapped pages to the PoC,
there is no need to care about the "uncached" status of individual
pages - they must all be visible all the way down.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-30 13:47:38 +00:00
Vijaya Kumar K
d017d7b0bd KVM: arm/arm64: vgic: Implement VGICv3 CPU interface access
VGICv3 CPU interface registers are accessed using
KVM_DEV_ARM_VGIC_CPU_SYSREGS ioctl. These registers are accessed
as 64-bit. The cpu MPIDR value is passed along with register id.
It is used to identify the cpu for registers access.

The VM that supports SEIs expect it on destination machine to handle
guest aborts and hence checked for ICC_CTLR_EL1.SEIS compatibility.
Similarly, VM that supports Affinity Level 3 that is required for AArch64
mode, is required to be supported on destination machine. Hence checked
for ICC_CTLR_EL1.A3V compatibility.

The arch/arm64/kvm/vgic-sys-reg-v3.c handles read and write of VGIC
CPU registers for AArch64.

For AArch32 mode, arch/arm/kvm/vgic-v3-coproc.c file is created but
APIs are not implemented.

Updated arch/arm/include/uapi/asm/kvm.h with new definitions
required to compile for AArch32.

The version of VGIC v3 specification is defined here
Documentation/virtual/kvm/devices/arm-vgic-v3.txt

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-30 13:47:25 +00:00
Christoffer Dall
10f92c4c53 KVM: arm/arm64: vgic: Add debugfs vgic-state file
Add a file to debugfs to read the in-kernel state of the vgic.  We don't
do any locking of the entire VGIC state while traversing all the IRQs,
so if the VM is running the user/developer may not see a quiesced state,
but should take care to pause the VM using facilities in user space for
that purpose.

We also don't support LPIs yet, but they can be added easily if needed.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-01-25 13:50:03 +01:00
Jintack Lim
488f94d721 KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems
Current KVM world switch code is unintentionally setting wrong bits to
CNTHCTL_EL2 when E2H == 1, which may allow guest OS to access physical
timer.  Bit positions of CNTHCTL_EL2 are changing depending on
HCR_EL2.E2H bit.  EL1PCEN and EL1PCTEN are 1st and 0th bits when E2H is
not set, but they are 11th and 10th bits respectively when E2H is set.

In fact, on VHE we only need to set those bits once, not for every world
switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
1, which makes those bits have no effect for the host kernel execution.
So we just set those bits once for guests, and that's it.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13 11:19:25 +00:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Vladimir Murzin
2988509dd8 ARM: KVM: Support vGICv3 ITS
This patch allows to build and use vGICv3 ITS in 32-bit mode.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-14 10:32:54 +00:00
Marc Zyngier
94d0e5980d arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU
Architecturally, TLBs are private to the (physical) CPU they're
associated with. But when multiple vcpus from the same VM are
being multiplexed on the same CPU, the TLBs are not private
to the vcpus (and are actually shared across the VMID).

Let's consider the following scenario:

- vcpu-0 maps PA to VA
- vcpu-1 maps PA' to VA

If run on the same physical CPU, vcpu-1 can hit TLB entries generated
by vcpu-0 accesses, and access the wrong physical page.

The solution to this is to keep a per-VM map of which vcpu ran last
on each given physical CPU, and invalidate local TLBs when switching
to a different vcpu from the same VM.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-04 17:56:28 +00:00
Marc Zyngier
c8ea0395ff arm/arm64: KVM: Map the BSS at HYP
When used with a compiler that doesn't implement "asm goto"
(such as the AArch64 port of GCC 4.8), jump labels generate a
memory access to find out about the value of the key (instead
of just patching the code). The key itself is likely to be
stored in the BSS.

This is perfectly fine, except that we don't map the BSS at HYP,
leading to an exploding kernel at the first access. The obvious
fix is simply to map the BSS there (which should have been done
a long while ago, but hey...).

Reported-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-10-21 17:26:24 +01:00
Linus Torvalds
6218590bcb KVM updates for v4.9-rc1
All architectures:
   Move `make kvmconfig` stubs from x86;  use 64 bits for debugfs stats.
 
 ARM:
   Important fixes for not using an in-kernel irqchip; handle SError
   exceptions and present them to guests if appropriate; proxying of GICV
   access at EL2 if guest mappings are unsafe; GICv3 on AArch32 on ARMv8;
   preparations for GICv3 save/restore, including ABI docs; cleanups and
   a bit of optimizations.
 
 MIPS:
   A couple of fixes in preparation for supporting MIPS EVA host kernels;
   MIPS SMP host & TLB invalidation fixes.
 
 PPC:
   Fix the bug which caused guests to falsely report lockups; other minor
   fixes; a small optimization.
 
 s390:
   Lazy enablement of runtime instrumentation; up to 255 CPUs for nested
   guests; rework of machine check deliver; cleanups and fixes.
 
 x86:
   IOMMU part of AMD's AVIC for vmexit-less interrupt delivery; Hyper-V
   TSC page; per-vcpu tsc_offset in debugfs; accelerated INS/OUTS in
   nVMX; cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJX9iDrAAoJEED/6hsPKofoOPoIAIUlgojkb9l2l1XVDgsXdgQL
 sRVhYSVv7/c8sk9vFImrD5ElOPZd+CEAIqFOu45+NM3cNi7gxip9yftUVs7wI5aC
 eDZRWm1E4trDZLe54ZM9ThcqZzZZiELVGMfR1+ZndUycybwyWzafpXYsYyaXp3BW
 hyHM3qVkoWO3dxBWFwHIoO/AUJrWYkRHEByKyvlC6KPxSdBPSa5c1AQwMCoE0Mo4
 K/xUj4gBn9eMelNhg4Oqu/uh49/q+dtdoP2C+sVM8bSdquD+PmIeOhPFIcuGbGFI
 B+oRpUhIuntN39gz8wInJ4/GRSeTuR2faNPxMn4E1i1u4LiuJvipcsOjPfe0a18=
 =fZRB
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "All architectures:
   - move `make kvmconfig` stubs from x86
   - use 64 bits for debugfs stats

  ARM:
   - Important fixes for not using an in-kernel irqchip
   - handle SError exceptions and present them to guests if appropriate
   - proxying of GICV access at EL2 if guest mappings are unsafe
   - GICv3 on AArch32 on ARMv8
   - preparations for GICv3 save/restore, including ABI docs
   - cleanups and a bit of optimizations

  MIPS:
   - A couple of fixes in preparation for supporting MIPS EVA host
     kernels
   - MIPS SMP host & TLB invalidation fixes

  PPC:
   - Fix the bug which caused guests to falsely report lockups
   - other minor fixes
   - a small optimization

  s390:
   - Lazy enablement of runtime instrumentation
   - up to 255 CPUs for nested guests
   - rework of machine check deliver
   - cleanups and fixes

  x86:
   - IOMMU part of AMD's AVIC for vmexit-less interrupt delivery
   - Hyper-V TSC page
   - per-vcpu tsc_offset in debugfs
   - accelerated INS/OUTS in nVMX
   - cleanups and fixes"

* tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits)
  KVM: MIPS: Drop dubious EntryHi optimisation
  KVM: MIPS: Invalidate TLB by regenerating ASIDs
  KVM: MIPS: Split kernel/user ASID regeneration
  KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
  KVM: arm/arm64: vgic: Don't flush/sync without a working vgic
  KVM: arm64: Require in-kernel irqchip for PMU support
  KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
  KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL
  KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie
  KVM: PPC: BookE: Fix a sanity check
  KVM: PPC: Book3S HV: Take out virtual core piggybacking code
  KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread
  ARM: gic-v3: Work around definition of gic_write_bpr1
  KVM: nVMX: Fix the NMI IDT-vectoring handling
  KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive
  KVM: nVMX: Fix reload apic access page warning
  kvmconfig: add virtio-gpu to config fragment
  config: move x86 kvm_guest.config to a common location
  arm64: KVM: Remove duplicating init code for setting VMID
  ARM: KVM: Support vgic-v3
  ...
2016-10-06 10:49:01 -07:00
Radim Krčmář
45ca877ad0 KVM/ARM Changes for v4.9
- Various cleanups and removal of redundant code
  - Two important fixes for not using an in-kernel irqchip
  - A bit of optimizations
  - Handle SError exceptions and present them to guests if appropriate
  - Proxying of GICV access at EL2 if guest mappings are unsafe
  - GICv3 on AArch32 on ARMv8
  - Preparations for GICv3 save/restore, including ABI docs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJX6rKQAAoJEEtpOizt6ddy8i4H/0bfB1EVukggoL/FfGeds/dg
 p2FG0oOsggcSBwK7VXUUvVllO7ioUssRCqqkn1e0/bCLtQrN4ex4PqJ3618EHFz/
 pLP72hf8Zl33rP3OVtPaDcxzjjKKdf+xGbBIv3AE7x7O5rFZg4lWHeWjy4yuhFv2
 Jm+8ul7JCxCMse08Xc90riou4i/jWjyoLadHbAoeX3tR+dVcZyOUZSlgAPI1bS/P
 rOQi/zkl3bT2R3kh28QuEFTrJ9BVTnmw25BRW8DNr6+CWmR9bpM6y7AGzOwrZ3FZ
 F1MbsPpN3ogcjvPg2QTYuOoqrwz8NLLHw5pR5YNj84VppjSpSsAhKU7Ug5Uhsr0=
 =1z/L
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next

KVM/ARM Changes for v4.9

 - Various cleanups and removal of redundant code
 - Two important fixes for not using an in-kernel irqchip
 - A bit of optimizations
 - Handle SError exceptions and present them to guests if appropriate
 - Proxying of GICV access at EL2 if guest mappings are unsafe
 - GICv3 on AArch32 on ARMv8
 - Preparations for GICv3 save/restore, including ABI docs
2016-09-29 16:01:51 +02:00
Vladimir Murzin
6134993789 arm64: KVM: Remove duplicating init code for setting VMID
By now both VHE and non-VHE initialisation sequences query supported
VMID size. Lets keep only single instance of this code under
init_common_resources().

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22 13:30:46 +02:00
Vladimir Murzin
acda5430be ARM: KVM: Support vgic-v3
This patch allows to build and use vgic-v3 in 32-bit mode.

Unfortunately, it can not be split in several steps without extra
stubs to keep patches independent and bisectable.  For instance,
virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling
access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre
to be already defined.

It is how support has been done:

* handle SGI requests from the guest

* report configured SRE on access to GICv3 cpu interface from the guest

* required vgic-v3 macros are provided via uapi.h

* static keys are used to select GIC backend

* to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with
  the static inlines

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22 13:22:21 +02:00
Luiz Capitulino
235539b48a kvm: add stubs for arch specific debugfs support
Two stubs are added:

 o kvm_arch_has_vcpu_debugfs(): must return true if the arch
   supports creating debugfs entries in the vcpu debugfs dir
   (which will be implemented by the next commit)

 o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
   entries in the vcpu debugfs dir

For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-16 16:57:47 +02:00
Suzuki K Poulose
293f293637 kvm-arm: Unmap shadow pagetables properly
On arm/arm64, we depend on the kvm_unmap_hva* callbacks (via
mmu_notifiers::invalidate_*) to unmap the stage2 pagetables when
the userspace buffer gets unmapped. However, when the Hypervisor
process exits without explicit unmap of the guest buffers, the only
notifier we get is kvm_arch_flush_shadow_all() (via mmu_notifier::release
) which does nothing on arm. Later this causes us to access pages that
were already released [via exit_mmap() -> unmap_vmas()] when we actually
get to unmap the stage2 pagetable [via kvm_arch_destroy_vm() ->
kvm_free_stage2_pgd()]. This triggers crashes with CONFIG_DEBUG_PAGEALLOC,
which unmaps any free'd pages from the linear map.

 [  757.644120] Unable to handle kernel paging request at virtual address
  ffff800661e00000
 [  757.652046] pgd = ffff20000b1a2000
 [  757.655471] [ffff800661e00000] *pgd=00000047fffe3003, *pud=00000047fcd8c003,
  *pmd=00000047fcc7c003, *pte=00e8004661e00712
 [  757.666492] Internal error: Oops: 96000147 [#3] PREEMPT SMP
 [  757.672041] Modules linked in:
 [  757.675100] CPU: 7 PID: 3630 Comm: qemu-system-aar Tainted: G      D
 4.8.0-rc1 #3
 [  757.683240] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board,
  BIOS 3.06.15 Aug 19 2016
 [  757.692938] task: ffff80069cdd3580 task.stack: ffff8006adb7c000
 [  757.698840] PC is at __flush_dcache_area+0x1c/0x40
 [  757.703613] LR is at kvm_flush_dcache_pmd+0x60/0x70
 [  757.708469] pc : [<ffff20000809dbdc>] lr : [<ffff2000080b4a70>] pstate: 20000145
 ...
 [  758.357249] [<ffff20000809dbdc>] __flush_dcache_area+0x1c/0x40
 [  758.363059] [<ffff2000080b6748>] unmap_stage2_range+0x458/0x5f0
 [  758.368954] [<ffff2000080b708c>] kvm_free_stage2_pgd+0x34/0x60
 [  758.374761] [<ffff2000080b2280>] kvm_arch_destroy_vm+0x20/0x68
 [  758.380570] [<ffff2000080aa330>] kvm_put_kvm+0x210/0x358
 [  758.385860] [<ffff2000080aa524>] kvm_vm_release+0x2c/0x40
 [  758.391239] [<ffff2000082ad234>] __fput+0x114/0x2e8
 [  758.396096] [<ffff2000082ad46c>] ____fput+0xc/0x18
 [  758.400869] [<ffff200008104658>] task_work_run+0x108/0x138
 [  758.406332] [<ffff2000080dc8ec>] do_exit+0x48c/0x10e8
 [  758.411363] [<ffff2000080dd5fc>] do_group_exit+0x6c/0x130
 [  758.416739] [<ffff2000080ed924>] get_signal+0x284/0xa18
 [  758.421943] [<ffff20000808a098>] do_signal+0x158/0x860
 [  758.427060] [<ffff20000808aad4>] do_notify_resume+0x6c/0x88
 [  758.432608] [<ffff200008083624>] work_pending+0x10/0x14
 [  758.437812] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20)

This patch fixes the issue by moving the kvm_free_stage2_pgd() to
kvm_arch_flush_shadow_all().

Cc: <stable@vger.kernel.org> # 3.9+
Tested-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Reported-by: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-09 12:40:30 +02:00
Marc Zyngier
21977a4c57 arm/arm64: KVM: Remove external abort test from MMIO handling
As we know handle external aborts pretty early, we can get rid of
its handling in the MMIO code (which was a bit odd to begin with...).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
4055710bac arm/arm64: KVM: Inject virtual abort when guest exits on external abort
If we spot a data abort bearing the ESR_EL2.EA bit set, we know that
this is an external abort, and that should be punished by the injection
of an abort.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
e656a1f91e arm: KVM: Drop unreachable HYP abort handlers
Both data and prefetch aborts occuring in HYP lead to a well
deserved panic. Let's get rid of these silly handlers.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
5d7bcf7d64 arm: KVM: Inject a Virtual Abort if it was pending
If we have caught an Abort whilst exiting, we've tagged the
exit code with the pending information. In that case, let's
re-inject the error into the guest, after having adjusted
the PC if required.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
c39798f471 arm: KVM: Handle async aborts delivered while at HYP
Just like for arm64, we can handle asynchronous aborts being
delivered at HYP while being caused by the guest. We use
the exact same method to catch such an abort, and soldier on.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
cc325cfc37 arm: KVM: Add HYP async abort handler
If we've exited the guest because it has triggered an asynchronous
abort, a possible course of action is to let it know it screwed up
by giving it a Virtual Abort to chew on.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
bfb78b5c98 arm: KVM: Add Virtual Abort injection helper
Now that we're able to context switch the HCR.VA bit, let's
introduce a helper that injects an Abort into a vcpu.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
1f7e378d12 arm: KVM: Preserve pending Virtual Abort in world switch
The HCR.VA bit is used to signal an Abort to a guest, and has
the peculiar feature of getting cleared when the guest has taken
the abort (this is the only bit that behaves as such in this register).

This means that if we signal such an abort, we must leave it in
the guest context until it disappears from HCR, and at which point
it must be cleared from the context.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
3aedd5c49e arm: KVM: Use common AArch32 conditional execution code
Add the bit of glue and const-ification that is required to use
the code inherited from the arm64 port, and move over to it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
427d7cacf9 arm64: KVM: Move the AArch32 conditional execution to common code
It would make some sense to share the conditional execution code
between 32 and 64bit. In order to achieve this, let's move that
code to virt/kvm/arm/aarch32.c. While we're at it, drop a
superfluous BUG_ON() that wasn't that useful.

Following patches will migrate the 32bit port to that code base.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Christoffer Dall
cf0ba18a44 KVM: arm/arm64: Get rid of exported aliases to static functions
When rewriting the assembly code to C code, it was useful to have
exported aliases or static functions so that we could keep the existing
common C code unmodified and at the same time rewrite arm64 from
assembly to C code, and later do the arm part.

Now when both are done, we really don't need this level of indirection
anymore, and it's time to save a few lines and brain cells.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Mark Rutland
dcadda146f arm/kvm: excise redundant cache maintenance
When modifying Stage-2 page tables, we perform cache maintenance to
account for non-coherent page table walks. However, this is unnecessary,
as page table walks are guaranteed to be coherent in the presence of the
virtualization extensions.

Per ARM DDI 0406C.c, section B1.7 ("The Virtualization Extensions"), the
virtualization extensions mandate the multiprocessing extensions.

Per ARM DDI 0406C.c, section B3.10.1 ("General TLB maintenance
requirements"), as described in the sub-section titled "TLB maintenance
operations and the memory order model", this maintenance is not required
in the presence of the multiprocessing extensions.

Hence, we need not perform this cache maintenance when modifying Stage-2
entries.

This patch removes the logic for performing the redundant maintenance.
To ensure visibility and ordering of updates, a dsb(ishst) that was
otherwise implicit in the maintenance is folded into kvm_set_pmd() and
kvm_set_pte().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08 12:53:00 +02:00
Marc Zyngier
d2896d4b55 arm: KVM: Fix idmap overlap detection when the kernel is idmap'ed
We're trying hard to detect when the HYP idmap overlaps with the
HYP va, as it makes the teardown of a cpu dangerous. But there is
one case where an overlap is completely safe, which is when the
whole of the kernel is idmap'ed, which is likely to happen on 32bit
when RAM is at 0x8000000 and we're using a 2G/2G VA split.

In that case, we can proceed safely.

Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-06 13:09:31 +02:00
Paolo Bonzini
2eeb321fd2 KVM/ARM Fixes for v4.8-rc3
This tag contains the following fixes on top of v4.8-rc1:
  - ITS init issues
  - ITS error handling issues
  - ITS IRQ leakage fix
  - Plug a couple of ITS race conditions
  - An erratum workaround for timers
  - Some removal of misleading use of errors and comments
  - A fix for GICv3 on 32-bit guests
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXtLicAAoJEEtpOizt6ddyEC4H/16IngntN6Gz1WPwtBBelgyj
 ZfU970uzGOyDtDOeOX1NT+gJpkDvUMhsNlngWnMrMwqqqPVKdE4XBShPiW2v53E7
 JquDTd2kKl+OO9e9XnkLw9yUcARmJFKIjHdISlg+E78t2kcNHn+XB2jrfTLKQVl8
 tk1ztDALb4LXSGYPZQ/uHTYp9U0qei+2SbbQufRcdQ3ggyxLDwPP2aO25amctzEP
 0Y42tlnNoZj7yBBp0X9BWRrHF2AZuOp+qBJnpFiQdsgLL6G1P3DcU/t9+KDjVBVr
 LYKN8jId2r5eyGGg8aKb4I3trevayToWhDw/jzarrTNAovB1cp8G5J7ozfmeS3g=
 =4PCW
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-v4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Fixes for v4.8-rc3

This tag contains the following fixes on top of v4.8-rc1:
 - ITS init issues
 - ITS error handling issues
 - ITS IRQ leakage fix
 - Plug a couple of ITS race conditions
 - An erratum workaround for timers
 - Some removal of misleading use of errors and comments
 - A fix for GICv3 on 32-bit guests
2016-08-18 12:19:19 +02:00
Christoffer Dall
9ac7159546 KVM: arm/arm64: Change misleading use of is_error_pfn
When converting a gfn to a pfn, we call gfn_to_pfn_prot, which returns
various kinds of error values.  It turns out that is_error_pfn() only
returns true when the gfn was found in a memory slot and could somehow
not be used, but it does not return true if the gfn does not belong to
any memory slot.

Change use to is_error_noslot_pfn() which covers both cases.

Note: Since we already check for kvm_is_error_hva(hva) explicitly in the
caller of this function while holding the kvm->srcu lock protecting the
memory slots, this should never be a problem, but nevertheless this
change is warranted as it shows the intention of the code.

Reported-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-08-17 11:38:03 +02:00
Christoffer Dall
a28ebea2ad KVM: Protect device ops->create and list_add with kvm->lock
KVM devices were manipulating list data structures without any form of
synchronization, and some implementations of the create operations also
suffered from a lack of synchronization.

Now when we've split the xics create operation into create and init, we
can hold the kvm->lock mutex while calling the create operation and when
manipulating the devices list.

The error path in the generic code gets slightly ugly because we have to
take the mutex again and delete the device from the list, but holding
the mutex during anon_inode_getfd or releasing/locking the mutex in the
common non-error path seemed wrong.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-12 12:01:27 +02:00
Paolo Bonzini
6f49b2f341 KVM/ARM Changes for v4.8 - Take 2
Includes GSI routing support to go along with the new VGIC and a small fix that
 has been cooking in -next for a while.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXoydqAAoJEEtpOizt6ddyM3oH/1A4VeG/J9q4fBPXqY2tVWXs
 c3P7UgNcrEgUNs/F9ykQY/lb31deecUzaBt1OyTf+RlsNbihq3dQdYcBhxtUODw/
 Faok582ya3UFgLW+IRHcID0EbkVOpIzMhOStYsnU/Dz7HG1JL9HdPzwkid7iu9LT
 fI6yrrBnJFjdWAAQ4BkcEKBENRsY8NTs7jX5vnFA92MkUBby7BmariPDD3FtrB+f
 Ob9B7CxM30pNqsN7OA/QvFOHMJHxf3s1TBKwmPHe5TLIfSzV1YxcEGiMc0lWqF4v
 BT8ZeMGCtjDw94tND1DskfQQRPaMqPmGuRTrAW/IuE2n92bFtbqIqs7Cbw0fzLE=
 =Vm6Q
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.8-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.8 - Take 2

Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.
2016-08-04 13:59:56 +02:00
Linus Torvalds
221bb8a46e - ARM: GICv3 ITS emulation and various fixes. Removal of the old
VGIC implementation.
 
 - s390: support for trapping software breakpoints, nested virtualization
 (vSIE), the STHYI opcode, initial extensions for CPU model support.
 
 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups,
 preliminary to this and the upcoming support for hardware virtualization
 extensions.
 
 - x86: support for execute-only mappings in nested EPT; reduced vmexit
 latency for TSC deadline timer (by about 30%) on Intel hosts; support for
 more than 255 vCPUs.
 
 - PPC: bugfixes.
 
 The ugly bit is the conflicts.  A couple of them are simple conflicts due
 to 4.7 fixes, but most of them are with other trees. There was definitely
 too much reliance on Acked-by here.  Some conflicts are for KVM patches
 where _I_ gave my Acked-by, but the worst are for this pull request's
 patches that touch files outside arch/*/kvm.  KVM submaintainers should
 probably learn to synchronize better with arch maintainers, with the
 latter providing topic branches whenever possible instead of Acked-by.
 This is what we do with arch/x86.  And I should learn to refuse pull
 requests when linux-next sends scary signals, even if that means that
 submaintainers have to rebase their branches.
 
 Anyhow, here's the list:
 
 - arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT was removed
 by the nvdimm tree.  This tree adds handle_preemption_timer and
 EXIT_REASON_PREEMPTION_TIMER at the same place.  In general all mentions
 of pcommit have to go.
 
 There is also a conflict between a stable fix and this patch, where the
 stable fix removed the vmx_create_pml_buffer function and its call.
 
 - virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree.
 This tree adds kvm_io_bus_get_dev at the same place.
 
 - virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the
 file was completely removed for 4.8.
 
 - include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault;
 this is a change that should have gone in through the irqchip tree and
 pulled by kvm-arm.  I think I would have rejected this kvm-arm pull
 request.  The KVM version is the right one, except that it lacks
 GITS_BASER_PAGES_SHIFT.
 
 - arch/powerpc: what a mess.  For the idle_book3s.S conflict, the KVM
 tree is the right one; everything else is trivial.  In this case I am
 not quite sure what went wrong.  The commit that is causing the mess
 (fd7bacbca4, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit
 path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/
 and arch/powerpc/kvm/.  It's large, but at 396 insertions/5 deletions
 I guessed that it wasn't really possible to split it and that the 5
 deletions wouldn't conflict.  That wasn't the case.
 
 - arch/s390: also messy.  First is hypfs_diag.c where the KVM tree
 moved some code and the s390 tree patched it.  You have to reapply the
 relevant part of commits 6c22c98637, plus all of e030c1125e, to
 arch/s390/kernel/diag.c.  Or pick the linux-next conflict
 resolution from http://marc.info/?l=kvm&m=146717549531603&w=2.
 Second, there is a conflict in gmap.c between a stable fix and 4.8.
 The KVM version here is the correct one.
 
 I have pushed my resolution at refs/heads/merge-20160802 (commit
 3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXoGm7AAoJEL/70l94x66DugQIAIj703ePAFepB/fCrKHkZZia
 SGrsBdvAtNsOhr7FQ5qvvjLxiv/cv7CymeuJivX8H+4kuUHUllDzey+RPHYHD9X7
 U6n1PdCH9F15a3IXc8tDjlDdOMNIKJixYuq1UyNZMU6NFwl00+TZf9JF8A2US65b
 x/41W98ilL6nNBAsoDVmCLtPNWAqQ3lajaZELGfcqRQ9ZGKcAYOaLFXHv2YHf2XC
 qIDMf+slBGSQ66UoATnYV2gAopNlWbZ7n0vO6tE2KyvhHZ1m399aBX1+k8la/0JI
 69r+Tz7ZHUSFtmlmyByi5IAB87myy2WQHyAPwj+4vwJkDGPcl0TrupzbG7+T05Y=
 =42ti
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:

 - ARM: GICv3 ITS emulation and various fixes.  Removal of the
   old VGIC implementation.

 - s390: support for trapping software breakpoints, nested
   virtualization (vSIE), the STHYI opcode, initial extensions
   for CPU model support.

 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
   of cleanups, preliminary to this and the upcoming support for
   hardware virtualization extensions.

 - x86: support for execute-only mappings in nested EPT; reduced
   vmexit latency for TSC deadline timer (by about 30%) on Intel
   hosts; support for more than 255 vCPUs.

 - PPC: bugfixes.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
  KVM: PPC: Introduce KVM_CAP_PPC_HTM
  MIPS: Select HAVE_KVM for MIPS64_R{2,6}
  MIPS: KVM: Reset CP0_PageMask during host TLB flush
  MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
  MIPS: KVM: Sign extend MFC0/RDHWR results
  MIPS: KVM: Fix 64-bit big endian dynamic translation
  MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
  MIPS: KVM: Use 64-bit CP0_EBase when appropriate
  MIPS: KVM: Set CP0_Status.KX on MIPS64
  MIPS: KVM: Make entry code MIPS64 friendly
  MIPS: KVM: Use kmap instead of CKSEG0ADDR()
  MIPS: KVM: Use virt_to_phys() to get commpage PFN
  MIPS: Fix definition of KSEGX() for 64-bit
  KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
  kvm: x86: nVMX: maintain internal copy of current VMCS
  KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
  KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
  KVM: arm64: vgic-its: Simplify MAPI error handling
  KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
  KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
  ...
2016-08-02 16:11:27 -04:00
Radim Krčmář
912902ce78 KVM/ARM changes for Linux 4.8
- GICv3 ITS emulation
 - Simpler idmap management that fixes potential TLB conflicts
 - Honor the kernel protection in HYP mode
 - Removal of the old vgic implementation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXkk6wAAoJECPQ0LrRPXpDkIQP/iJ2yXTxrfbJoyaVq1vuMn3R
 UFhVwNXP8OEjQrmp5lvMBazB1MRBkNDzlVXL1fSb+ijKmbIELOqHhO6ijrkK4zmc
 0Ie0x5Bt4gIFPTZyZORVpy1eU/0YFGWERAfsAjYdMCeKwHjaUCRSrZBXF2YsFTfo
 Hh/ILvHa8TjUXWsQXvtZCL6AAnkDKBsbDWqsq5zspuT+PA8umI+dGLIiULXBpc4t
 S2TCDxOU1JgsAn+Y0XVbPXV9id+bs5LRd6nNH/RmipIVqWmukSrScXOjg/po/l2S
 laO4tHmyEeN6ecnCxWttpjacNwyTDNh5n3lL1ceBnBZFqn1k/7NjqV3fQzJxGd1T
 1U6edE9+EuS9uXWF5XcEuAD660EiMs4FLVSjPgqYQtto3gOHilmuWL9eeeOOgCem
 Lknnu/7G8h36PaQuLnEXWXQb7jeS2rTuC0RqxCG62gD9UWEJTckRz5pRh/e6gz7n
 ZVXMrwGiVZ3zR78qE6i2j5CZ6A0BMAK3nZ85AI3kmgKg0CfVY28uPOj8llAOaYm+
 0XVdfRj7ed75eu3GobjHUyZ0fQ40jovmH2vy3mupBm5XBUHgH/j6X510KJ1UTLWI
 C2EO9KogbjoVeu60mQi4bKGSPi8/wdgYqVft/Qzl5D5iFvQ7Ia+TQNMArCQazBID
 Ihe1E09NGrHjV3Yw/GWV
 =2Del
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next

KVM/ARM changes for Linux 4.8

- GICv3 ITS emulation
- Simpler idmap management that fixes potential TLB conflicts
- Honor the kernel protection in HYP mode
- Removal of the old vgic implementation
2016-07-22 20:27:26 +02:00
Eric Auger
180ae7b118 KVM: arm/arm64: Enable irqchip routing
This patch adds compilation and link against irqchip.

Main motivation behind using irqchip code is to enable MSI
routing code. In the future irqchip routing may also be useful
when targeting multiple irqchips.

Routing standard callbacks now are implemented in vgic-irqfd:
- kvm_set_routing_entry
- kvm_set_irq
- kvm_set_msi

They only are supported with new_vgic code.

Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.

So from now on IRQCHIP routing is enabled and a routing table entry
must exist for irqfd injection to succeed for a given SPI. This patch
builds a default flat irqchip routing table (gsi=irqchip.pin) covering
all the VGIC SPI indexes. This routing table is overwritten by the
first first user-space call to KVM_SET_GSI_ROUTING ioctl.

MSI routing setup is not yet allowed.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-22 18:52:01 +01:00
Andre Przywara
1085fdc68c KVM: arm64: vgic-its: Introduce new KVM ITS device
Introduce a new KVM device that represents an ARM Interrupt Translation
Service (ITS) controller. Since there can be multiple of this per guest,
we can't piggy back on the existing GICv3 distributor device, but create
a new type of KVM device.
On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data
structure and store the pointer in the kvm_device data.
Upon an explicit init ioctl from userland (after having setup the MMIO
address) we register the handlers with the kvm_io_bus framework.
Any reference to an ITS thus has to go via this interface.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18 18:14:35 +01:00
Andre Przywara
b46f01ce4d KVM: arm/arm64: Extend arch CAP checks to allow per-VM capabilities
KVM capabilities can be a per-VM property, though ARM/ARM64 currently
does not pass on the VM pointer to the architecture specific
capability handlers.
Add a "struct kvm*" parameter to those function to later allow proper
per-VM capability reporting.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18 18:10:31 +01:00
Marc Zyngier
6c41a413fd arm/arm64: Get rid of KERN_TO_HYP
We have both KERN_TO_HYP and kern_hyp_va, which do the exact same
thing. Let's standardize on the latter.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
eac378a9ce arm/arm64: KVM: Check that IDMAP doesn't intersect with VA range
This is more of a safety measure than anything else: If we end-up
with an idmap page that intersect with the range picked for the
the HYP VA space, abort the KVM setup, as it is unsafe to go
further.

I cannot imagine it happening on 64bit (we have a mechanism to
work around it), but could potentially occur on a 32bit system with
the kernel loaded high enough in memory so that in conflicts with
the kernel VA.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
e537ecd7ef arm: KVM: Allow hyp teardown
So far, KVM was getting in the way of kexec on 32bit (and the arm64
kexec hackers couldn't be bothered to fix it on 32bit...).

With simpler page tables, tearing KVM down becomes very easy, so
let's just do it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
cd602a37e8 arm: KVM: Simplify HYP init
Just like for arm64, we can now make the HYP setup a lot simpler,
and we can now initialise it in one go (instead of the two
phases we currently have).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
26781f9ce1 arm/arm64: KVM: Kill free_boot_hyp_pgd
There is no way to free the boot PGD, because it doesn't exist
anymore as a standalone entity.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
12fda8123d arm/arm64: KVM: Drop boot_pgd
Since we now only have one set of page tables, the concept of
boot_pgd is useless and can be removed. We still keep it as
an element of the "extended idmap" thing.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
0535a3e2b2 arm/arm64: KVM: Always have merged page tables
We're in a position where we can now always have "merged" page
tables, where both the runtime mapping and the idmap coexist.

This results in some code being removed, but there is more to come.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
1df3e2347a arm/arm64: KVM: Export __hyp_text_start/end symbols
Declare the __hyp_text_start/end symbols in asm/virt.h so that
they can be reused without having to declare them locally.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:41:27 +02:00
Marc Zyngier
50926d82fa KVM: arm/arm64: The GIC is dead, long live the GIC
I don't think any single piece of the KVM/ARM code ever generated
as much hatred as the GIC emulation.

It was written by someone who had zero experience in modeling
hardware (me), was riddled with design flaws, should have been
scrapped and rewritten from scratch long before having a remote
chance of reaching mainline, and yet we supported it for a good
three years. No need to mention the names of those who suffered,
the git log is singing their praises.

Thankfully, we now have a much more maintainable implementation,
and we can safely put the grumpy old GIC to rest.

Fellow hackers, please raise your glass in memory of the GIC:

	The GIC is dead, long live the GIC!

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-03 23:09:37 +02:00
Paolo Bonzini
6edaa5307f KVM: remove kvm_guest_enter/exit wrappers
Use the functions from context_tracking.h directly.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:21 +02:00
Marc Zyngier
5900270550 arm/arm64: KVM: Map the HYP text as read-only
There should be no reason for mapping the HYP text read/write.

As such, let's have a new set of flags (PAGE_HYP_EXEC) that allows
execution, but makes the page as read-only, and update the two call
sites that deal with mapping code.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29 14:01:34 +02:00
Marc Zyngier
74a6b8885f arm/arm64: KVM: Enforce HYP read-only mapping of the kernel's rodata section
In order to be able to use C code in HYP, we're now mapping the kernel's
rodata in HYP. It works absolutely fine, except that we're mapping it RWX,
which is not what it should be.

Add a new HYP_PAGE_RO protection, and pass it as the protection flags
when mapping the rodata section.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29 13:59:14 +02:00
Marc Zyngier
c8dddecdeb arm/arm64: KVM: Add a protection parameter to create_hyp_mappings
Currently, create_hyp_mappings applies a "one size fits all" page
protection (PAGE_HYP). As we're heading towards separate protections
for different sections, let's make this protection a parameter, and
let the callers pass their prefered protection (PAGE_HYP for everyone
for the time being).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29 13:59:14 +02:00
James Morse
591d215afc KVM: arm/arm64: Stop leaking vcpu pid references
kvm provides kvm_vcpu_uninit(), which amongst other things, releases the
last reference to the struct pid of the task that was last running the vcpu.

On arm64 built with CONFIG_DEBUG_KMEMLEAK, starting a guest with kvmtool,
then killing it with SIGKILL results (after some considerable time) in:
> cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff80007d5ea080 (size 128):
>  comm "lkvm", pid 2025, jiffies 4294942645 (age 1107.776s)
>  hex dump (first 32 bytes):
>    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>  backtrace:
>    [<ffff8000001b30ec>] create_object+0xfc/0x278
>    [<ffff80000071da34>] kmemleak_alloc+0x34/0x70
>    [<ffff80000019fa2c>] kmem_cache_alloc+0x16c/0x1d8
>    [<ffff8000000d0474>] alloc_pid+0x34/0x4d0
>    [<ffff8000000b5674>] copy_process.isra.6+0x79c/0x1338
>    [<ffff8000000b633c>] _do_fork+0x74/0x320
>    [<ffff8000000b66b0>] SyS_clone+0x18/0x20
>    [<ffff800000085cb0>] el0_svc_naked+0x24/0x28
>    [<ffffffffffffffff>] 0xffffffffffffffff

On x86 kvm_vcpu_uninit() is called on the path from kvm_arch_destroy_vm(),
on arm no equivalent call is made. Add the call to kvm_arch_vcpu_free().

Signed-off-by: James Morse <james.morse@arm.com>
Fixes: 749cf76c5a ("KVM: ARM: Initial skeleton to compile KVM support")
Cc: <stable@vger.kernel.org> # 3.10+
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-27 13:08:10 +02:00
Andrea Gelmini
6a727b0b3f KVM: ARM: Fix typos
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-14 11:16:26 +02:00
Linus Torvalds
e28e909c36 - move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat
(kvm_stat had nothing to do with QEMU in the first place -- the tool
    only interprets debugfs)
 - expose per-vm statistics in debugfs and support them in kvm_stat
   (KVM always collected per-vm statistics, but they were summarised into
    global statistics)
 
 x86:
  - fix dynamic APICv (VMX was improperly configured and a guest could
    access host's APIC MSRs, CVE-2016-4440)
  - minor fixes
 
 ARM changes from Christoffer Dall:
  "This set of changes include the new vgic, which is a reimplementation
   of our horribly broken legacy vgic implementation.  The two
   implementations will live side-by-side (with the new being the
   configured default) for one kernel release and then we'll remove the
   legacy one.
 
   Also fixes a non-critical issue with virtual abort injection to
   guests."
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCAAGBQJXRz0KAAoJEED/6hsPKofosiMIAIHmRI+9I6VMNmQe5vrZKz9/
 vt89QGxDJrFQwhEuZovenLEDaY6rMIJNguyvIbPhNuXNHIIPWbe6cO6OPwByqkdo
 WI/IIqcAJN/Bpwt4/Y2977A5RwDOwWLkaDs0LrZCEKPCgeh9GWQf+EfyxkDJClhG
 uIgbSAU+t+7b05K3c6NbiQT/qCzDTCdl6In6PI/DFSRRkXDaTcopjjp1PmMUSSsR
 AM8LGhEzMer+hGKOH7H5TIbN+HFzAPjBuDGcoZt0/w9IpmmS5OMd3ZrZ320cohz8
 zZQooRcFrT0ulAe+TilckmRMJdMZ69fyw3nzfqgAKEx+3PaqjKSY/tiEgqqDJHY=
 =EEBK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second batch of KVM updates from Radim Krčmář:
 "General:

   - move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat
     had nothing to do with QEMU in the first place -- the tool only
     interprets debugfs)

   - expose per-vm statistics in debugfs and support them in kvm_stat
     (KVM always collected per-vm statistics, but they were summarised
     into global statistics)

  x86:

   - fix dynamic APICv (VMX was improperly configured and a guest could
     access host's APIC MSRs, CVE-2016-4440)

   - minor fixes

  ARM changes from Christoffer Dall:

   - new vgic reimplementation of our horribly broken legacy vgic
     implementation.  The two implementations will live side-by-side
     (with the new being the configured default) for one kernel release
     and then we'll remove the legacy one.

   - fix for a non-critical issue with virtual abort injection to guests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits)
  tools: kvm_stat: Add comments
  tools: kvm_stat: Introduce pid monitoring
  KVM: Create debugfs dir and stat files for each VM
  MAINTAINERS: Add kvm tools
  tools: kvm_stat: Powerpc related fixes
  tools: Add kvm_stat man page
  tools: Add kvm_stat vm monitor script
  kvm:vmx: more complete state update on APICv on/off
  KVM: SVM: Add more SVM_EXIT_REASONS
  KVM: Unify traced vector format
  svm: bitwise vs logical op typo
  KVM: arm/arm64: vgic-new: Synchronize changes to active state
  KVM: arm/arm64: vgic-new: enable build
  KVM: arm/arm64: vgic-new: implement mapped IRQ handling
  KVM: arm/arm64: vgic-new: Wire up irqfd injection
  KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable
  KVM: arm/arm64: vgic-new: vgic_init: implement map_resources
  KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init
  KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create
  KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init
  ...
2016-05-27 13:41:54 -07:00
Christoffer Dall
35a2d58588 KVM: arm/arm64: vgic-new: Synchronize changes to active state
When modifying the active state of an interrupt via the MMIO interface,
we should ensure that the write has the intended effect.

If a guest sets an interrupt to active, but that interrupt is already
flushed into a list register on a running VCPU, then that VCPU will
write the active state back into the struct vgic_irq upon returning from
the guest and syncing its state.  This is a non-benign race, because the
guest can observe that an interrupt is not active, and it can have a
reasonable expectations that other VCPUs will not ack any IRQs, and then
set the state to active, and expect it to stay that way.  Currently we
are not honoring this case.

Thefore, change both the SACTIVE and CACTIVE mmio handlers to stop the
world, change the irq state, potentially queue the irq if we're setting
it to active, and then continue.

We take this chance to slightly optimize these functions by not stopping
the world when touching private interrupts where there is inherently no
possible race.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20 16:26:38 +02:00
Andre Przywara
efffe55af5 KVM: arm/arm64: vgic-new: enable build
Now that the new VGIC implementation has reached feature parity with
the old one, add the new files to the build system and add a Kconfig
option to switch between the two versions.
We set the default to the new version to get maximum test coverage,
in case people experience problems they can switch back to the old
behaviour if needed.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20 15:40:09 +02:00
Christoffer Dall
b13216cf60 KVM: arm/arm64: Provide functionality to pause and resume a guest
For some rare corner cases in our VGIC emulation later we have to stop
the guest to make sure the VGIC state is consistent.
Provide the necessary framework to pause and resume a guest.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20 15:39:43 +02:00
Christoffer Dall
d5a5a0eff3 KVM: arm/arm64: Export mmio_read/write_bus
Rename mmio_{read,write}_bus to kvm_mmio_{read,write}_bus and export
them out of mmio.c.
This will be needed later for the new VGIC implementation.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20 15:39:42 +02:00
Christoffer Dall
83091db981 KVM: arm/arm64: Fix MMIO emulation data handling
When the kernel was handling a guest MMIO read access internally, we
need to copy the emulation result into the run->mmio structure in order
for the kvm_handle_mmio_return() function to pick it up and inject the
	result back into the guest.

Currently the only user of kvm_io_bus for ARM is the VGIC, which did
this copying itself, so this was not causing issues so far.

But with the upcoming new vgic implementation we need this done
properly.

Update the kvm_handle_mmio_return description and cleanup the code to
only perform a single copying when needed.

Code and commit message inspired by Andre Przywara.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20 15:39:42 +02:00
Christoffer Dall
41a54482c0 KVM: arm/arm64: Move timer IRQ map to latest possible time
We are about to modify the VGIC to allocate all data structures
dynamically and store mapped IRQ information on a per-IRQ struct, which
is indeed allocated dynamically at init time.

Therefore, we cannot record the mapped IRQ info from the timer at timer
reset time like it's done now, because VCPU reset happens before timer
init.

A possible later time to do this is on the first run of a per VCPU, it
just requires us to move the enable state to be a per-VCPU state and do
the lookup of the physical IRQ number when we are about to run the VCPU.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20 15:39:41 +02:00
Linus Torvalds
7beaa24ba4 Small release overall.
- x86: miscellaneous fixes, AVIC support (local APIC virtualization,
 AMD version)
 
 - s390: polling for interrupts after a VCPU goes to halted state is
 now enabled for s390; use hardware provided information about facility
 bits that do not need any hypervisor activity, and other fixes for
 cpu models and facilities; improve perf output; floating interrupt
 controller improvements.
 
 - MIPS: miscellaneous fixes
 
 - PPC: bugfixes only
 
 - ARM: 16K page size support, generic firmware probing layer for
 timer and GIC
 
 Christoffer Dall (KVM-ARM maintainer) says:
 "There are a few changes in this pull request touching things outside
  KVM, but they should all carry the necessary acks and it made the
  merge process much easier to do it this way."
 
 though actually the irqchip maintainers' acks didn't make it into the
 patches.  Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
 later acked at http://mid.gmane.org/573351D1.4060303@arm.com
 "more formally and for documentation purposes".
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXPJjyAAoJEL/70l94x66DhioH/j4fwQ0FmfPSM9PArzaFHQdx
 LNE3tU4+bobbsy1BJr4DiAaOUQn3DAgwUvGLWXdeLiOXtoWXBiFHKaxlqEsCA6iQ
 xcTH1TgfxsVoqGQ6bT9X/2GCx70heYpcWG3f+zqBy7ZfFmQykLAC/HwOr52VQL8f
 hUFi3YmTHcnorp0n5Xg+9r3+RBS4D/kTbtdn6+KCLnPJ0RcgNkI3/NcafTemoofw
 Tkv8+YYFNvKV13qlIfVqxMa0GwWI3pP6YaNKhaS5XO8Pu16HuuF1JthJsUBDzwBa
 RInp8R9MoXgsBYhLpz3jc9vWG7G9yDl5LehsD9KOUGOaFYJ7sQN+QZOusa6jFgA=
 =llO5
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Small release overall.

  x86:
   - miscellaneous fixes
   - AVIC support (local APIC virtualization, AMD version)

  s390:
   - polling for interrupts after a VCPU goes to halted state is now
     enabled for s390
   - use hardware provided information about facility bits that do not
     need any hypervisor activity, and other fixes for cpu models and
     facilities
   - improve perf output
   - floating interrupt controller improvements.

  MIPS:
   - miscellaneous fixes

  PPC:
   - bugfixes only

  ARM:
   - 16K page size support
   - generic firmware probing layer for timer and GIC

  Christoffer Dall (KVM-ARM maintainer) says:
    "There are a few changes in this pull request touching things
     outside KVM, but they should all carry the necessary acks and it
     made the merge process much easier to do it this way."

  though actually the irqchip maintainers' acks didn't make it into the
  patches.  Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
  later acked at http://mid.gmane.org/573351D1.4060303@arm.com ('more
  formally and for documentation purposes')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (82 commits)
  KVM: MTRR: remove MSR 0x2f8
  KVM: x86: make hwapic_isr_update and hwapic_irr_update look the same
  svm: Manage vcpu load/unload when enable AVIC
  svm: Do not intercept CR8 when enable AVIC
  svm: Do not expose x2APIC when enable AVIC
  KVM: x86: Introducing kvm_x86_ops.apicv_post_state_restore
  svm: Add VMEXIT handlers for AVIC
  svm: Add interrupt injection via AVIC
  KVM: x86: Detect and Initialize AVIC support
  svm: Introduce new AVIC VMCB registers
  KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
  KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
  KVM: x86: Introducing kvm_x86_ops VM init/destroy hooks
  KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_reg
  KVM: x86: Misc LAPIC changes to expose helper functions
  KVM: shrink halt polling even more for invalid wakeups
  KVM: s390: set halt polling to 80 microseconds
  KVM: halt_polling: provide a way to qualify wakeups during poll
  KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts
  kvm: Conditionally register IRQ bypass consumer
  ...
2016-05-19 11:27:09 -07:00
Linus Torvalds
be092017b6 arm64 updates for 4.7:
- virt_to_page/page_address optimisations
 
 - Support for NUMA systems described using device-tree
 
 - Support for hibernate/suspend-to-disk
 
 - Proper support for maxcpus= command line parameter
 
 - Detection and graceful handling of AArch64-only CPUs
 
 - Miscellaneous cleanups and non-critical fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJXNbgkAAoJELescNyEwWM0PtcIAK11xaOMmSqXz8fcTeNLw4dS
 taaPWhjCYus8EhJyvTetfwk74+qVApdvKXKNKgODJXQEjeQx2brdUfbQZb31DTGT
 798UYCAyEYCWkXspqi+/dpZEgUGPYH7uGOu2eDd19+PhTeX/EQSRX3fC9k0BNhvh
 PN9pOgRcKAlIExZ6QYmT0g56VLtbCfFShN41mQ8HdpShl6pPJuhQ+kDDzudmRjuD
 11/oYuOaVTnwbPuXn+sjOrWvMkfINHI70BAQnnBs0v+5c45mzpqEMsy0dYo2Pl2m
 ar5lUFVIZggQkiqcOzqBzEgF+4gNw4LUu1DgK6cNKNMtL6k8E9zeOZMWeSVr0lg=
 =bT5E
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:

 - virt_to_page/page_address optimisations

 - support for NUMA systems described using device-tree

 - support for hibernate/suspend-to-disk

 - proper support for maxcpus= command line parameter

 - detection and graceful handling of AArch64-only CPUs

 - miscellaneous cleanups and non-critical fixes

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
  arm64: do not enforce strict 16 byte alignment to stack pointer
  arm64: kernel: Fix incorrect brk randomization
  arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str
  arm64: secondary_start_kernel: Remove unnecessary barrier
  arm64: Ensure pmd_present() returns false after pmd_mknotpresent()
  arm64: Replace hard-coded values in the pmd/pud_bad() macros
  arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
  arm64: Fix typo in the pmdp_huge_get_and_clear() definition
  arm64: mm: remove unnecessary EXPORT_SYMBOL_GPL
  arm64: always use STRICT_MM_TYPECHECKS
  arm64: kvm: Fix kvm teardown for systems using the extended idmap
  arm64: kaslr: increase randomization granularity
  arm64: kconfig: drop CONFIG_RTC_LIB dependency
  arm64: make ARCH_SUPPORTS_DEBUG_PAGEALLOC depend on !HIBERNATION
  arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  arm64: kernel: Add support for hibernate/suspend-to-disk
  PM / Hibernate: Call flush_icache_range() on pages restored in-place
  arm64: Add new asm macro copy_page
  arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
  arm64: kernel: Include _AC definition in page.h
  ...
2016-05-16 17:17:24 -07:00
Catalin Marinas
0648505324 kvm: arm64: Enable hardware updates of the Access Flag for Stage 2 page tables
The ARMv8.1 architecture extensions introduce support for hardware
updates of the access and dirty information in page table entries. With
VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the
PTE_AF bit cleared in the stage 2 page table, instead of raising an
Access Flag fault to EL2 the CPU sets the actual page table entry bit
(10). To ensure that kernel modifications to the page table do not
inadvertently revert a bit set by hardware updates, certain Stage 2
software pte/pmd operations must be performed atomically.

The main user of the AF bit is the kvm_age_hva() mechanism. The
kvm_age_hva_handler() function performs a "test and clear young" action
on the pte/pmd. This needs to be atomic in respect of automatic hardware
updates of the AF bit. Since the AF bit is in the same position for both
Stage 1 and Stage 2, the patch reuses the existing
ptep_test_and_clear_young() functionality if
__HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the
existing pte_young/pte_mkold mechanism is preserved.

The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have
to perform atomic modifications in order to avoid a race with updates of
the AF bit. The arm64 implementation has been re-written using
exclusives.

Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer
argument and modify the pte/pmd in place. However, these functions are
only used on local variables rather than actual page table entries, so
it makes more sense to follow the pte_mkwrite() approach for stage 1
attributes. The change to kvm_s2pte_mkwrite() makes it clear that these
functions do not modify the actual page table entries.

The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit
explicitly) do not need to be modified since hardware updates of the
dirty status are not supported by KVM, so there is no possibility of
losing such information.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-09 22:23:08 +02:00
Andrea Arcangeli
127393fbe5 mm: thp: kvm: fix memory corruption in KVM with THP enabled
After the THP refcounting change, obtaining a compound pages from
get_user_pages() no longer allows us to assume the entire compound page
is immediately mappable from a secondary MMU.

A secondary MMU doesn't want to call get_user_pages() more than once for
each compound page, in order to know if it can map the whole compound
page.  So a secondary MMU needs to know from a single get_user_pages()
invocation when it can map immediately the entire compound page to avoid
a flood of unnecessary secondary MMU faults and spurious
atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
users).

Ideally instead of the page->_mapcount < 1 check, get_user_pages()
should return the granularity of the "page" mapping in the "mm" passed
to get_user_pages().  However it's non trivial change to pass the "pmd"
status belonging to the "mm" walked by get_user_pages up the stack (up
to the caller of get_user_pages).  So the fix just checks if there is
not a single pte mapping on the page returned by get_user_pages, and in
turn if the caller can assume that the whole compound page is mapped in
the current "mm" (in a pmd_trans_huge()).  In such case the entire
compound page is safe to map into the secondary MMU without additional
get_user_pages() calls on the surrounding tail/head pages.  In addition
of being faster, not having to run other get_user_pages() calls also
reduces the memory footprint of the secondary MMU fault in case the pmd
split happened as result of memory pressure.

Without this fix after a MADV_DONTNEED (like invoked by QEMU during
postcopy live migration or balloning) or after generic swapping (with a
failure in split_huge_page() that would only result in pmd splitting and
not a physical page split), KVM would map the whole compound page into
the shadow pagetables, despite regular faults or userfaults (like
UFFDIO_COPY) may map regular pages into the primary MMU as result of the
pte faults, leading to the guest mode and userland mode going out of
sync and not working on the same memory at all times.

Any other secondary MMU notifier manager (KVM is just one of the many
MMU notifier users) will need the same information if it doesn't want to
run a flood of get_user_pages_fast and it can support multiple
granularity in the secondary MMU mappings, so I think it is justified to
be exposed not just to KVM.

The other option would be to move transparent_hugepage_adjust to
mm/huge_memory.c but that currently has all kind of KVM data structures
in it, so it's definitely not a cut-and-paste work, so I couldn't do a
fix as cleaner as this one for 4.6.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Li, Liang Z" <liang.z.li@intel.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05 17:38:53 -07:00
Marc Zyngier
d4b9e0790a arm/arm64: KVM: Enforce Break-Before-Make on Stage-2 page tables
The ARM architecture mandates that when changing a page table entry
from a valid entry to another valid entry, an invalid entry is first
written, TLB invalidated, and only then the new entry being written.

The current code doesn't respect this, directly writing the new
entry and only then invalidating TLBs. Let's fix it up.

Cc: <stable@vger.kernel.org>
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-04-29 13:46:14 +02:00
AKASHI Takahiro
67f6919766 arm64: kvm: allows kvm cpu hotplug
The current kvm implementation on arm64 does cpu-specific initialization
at system boot, and has no way to gracefully shutdown a core in terms of
kvm. This prevents kexec from rebooting the system at EL2.

This patch adds a cpu tear-down function and also puts an existing cpu-init
code into a separate function, kvm_arch_hardware_disable() and
kvm_arch_hardware_enable() respectively.
We don't need the arm64 specific cpu hotplug hook any more.

Since this patch modifies common code between arm and arm64, one stub
definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
compilation errors.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[Rebase, added separate VHE init/exit path, changed resets use of
 kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
 added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed
 guest-enter after teardown handling]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-28 12:05:46 +01:00
Suzuki K Poulose
9163ee23e7 kvm-arm: Cleanup stage2 pgd handling
Now that we don't have any fake page table levels for arm64,
cleanup the common code to get rid of the dead code.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21 14:58:23 +02:00
Suzuki K Poulose
8684e701df kvm-arm: Cleanup kvm_* wrappers
Now that we have switched to explicit page table routines,
get rid of the obsolete kvm_* wrappers.

Also, kvm_tlb_flush_vmid_by_ipa is now called only on stage2
page tables, hence get rid of the redundant check.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21 14:58:20 +02:00
Suzuki K Poulose
7a1c831ee8 kvm-arm: Add stage2 page table modifiers
Now that the hyp page table is handled by different set of
routines, rename the original shared routines to stage2 handlers.
Also make explicit use of the stage2 page table helpers.

unmap_range has been merged to existing unmap_stage2_range.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21 14:58:18 +02:00
Suzuki K Poulose
64f3249792 kvm-arm: Add explicit hyp page table modifiers
We have common routines to modify hyp and stage2 page tables
based on the 'kvm' parameter. For a smoother transition to
using separate routines for each, duplicate the routines
and modify the copy to work on hyp.

Marks the forked routines with _hyp_ and gets rid of the
kvm parameter which is no longer needed and is NULL for hyp.
Also, gets rid of calls to kvm_tlb_flush_by_vmid_ipa() calls
from the hyp versions. Uses explicit host page table accessors
instead of the kvm_* page table helpers.

Suggested-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21 14:58:16 +02:00
Suzuki K Poulose
70fd190685 kvm-arm: Use explicit stage2 helper routines
We have stage2 page table helpers for both arm and arm64. Switch to
the stage2 helpers for routines that only deal with stage2 page table.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21 14:58:07 +02:00
Suzuki K Poulose
77b5665141 kvm-arm: Remove kvm_pud_huge()
Get rid of kvm_pud_huge() which falls back to pud_huge. Use
pud_huge instead.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21 14:57:15 +02:00
Suzuki K Poulose
bbb3b6b350 kvm-arm: Replace kvm_pmd_huge with pmd_thp_or_huge
Both arm and arm64 now provides a helper, pmd_thp_or_huge()
to check if the given pmd represents a huge page. Use that
instead of our own custom check.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21 14:57:05 +02:00
Suzuki K Poulose
120f0779c3 kvm arm: Move fake PGD handling to arch specific files
Rearrange the code for fake pgd handling, which is applicable
only for arm64. This will later be removed once we introduce
the stage2 page table walker macros.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21 14:56:44 +02:00
Sudeep Holla
06a71a24ba arm64: KVM: unregister notifiers in hyp mode teardown path
Commit 1e947bad0b ("arm64: KVM: Skip HYP setup when already running
in HYP") re-organized the hyp init code and ended up leaving the CPU
hotplug and PM notifier even if hyp mode initialization fails.

Since KVM is not yet supported with ACPI, the above mentioned commit
breaks CPU hotplug in ACPI boot.

This patch fixes teardown_hyp_mode to properly unregister both CPU
hotplug and PM notifiers in the teardown path.

Fixes: 1e947bad0b ("arm64: KVM: Skip HYP setup when already running in HYP")
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-04-06 13:47:52 +02:00
James Morse
5f5560b1c5 arm64: KVM: Register CPU notifiers when the kernel runs at HYP
When the kernel is running at EL2, it doesn't need init_hyp_mode() to
configure page tables for HYP. This function also registers the CPU
hotplug and lower power notifiers that cause HYP to be re-initialised
after the CPU has been reset.

To avoid losing the register state that controls stage2 translation, move
the registering of these notifiers into init_subsystems(), and add a
is_kernel_in_hyp_mode() path to each callback.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 1e947bad0b ("arm64: KVM: Skip HYP setup when already running in HYP")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-31 10:27:28 +02:00
Eric Auger
898f949fb7 KVM: arm/arm64: disable preemption when calling smp_call_function_many
Preemption must be disabled when calling smp_call_function_many

Reported-by: bartosz.wawrzyniak@tieto.com
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21 10:45:22 +01:00
Linus Torvalds
588ab3f9af arm64 updates for 4.6:
- Initial page table creation reworked to avoid breaking large block
   mappings (huge pages) into smaller ones. The ARM architecture requires
   break-before-make in such cases to avoid TLB conflicts but that's not
   always possible on live page tables
 
 - Kernel virtual memory layout: the kernel image is no longer linked to
   the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of
   the vmalloc space, allowing the kernel to be loaded (nearly) anywhere
   in physical RAM
 
 - Kernel ASLR: position independent kernel Image and modules being
   randomly mapped in the vmalloc space with the randomness is provided
   by UEFI (efi_get_random_bytes() patches merged via the arm64 tree,
   acked by Matt Fleming)
 
 - Implement relative exception tables for arm64, required by KASLR
   (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but
   actual x86 conversion to deferred to 4.7 because of the merge
   dependencies)
 
 - Support for the User Access Override feature of ARMv8.2: this allows
   uaccess functions (get_user etc.) to be implemented using LDTR/STTR
   instructions. Such instructions, when run by the kernel, perform
   unprivileged accesses adding an extra level of protection. The
   set_fs() macro is used to "upgrade" such instruction to privileged
   accesses via the UAO bit
 
 - Half-precision floating point support (part of ARMv8.2)
 
 - Optimisations for CPUs with or without a hardware prefetcher (using
   run-time code patching)
 
 - copy_page performance improvement to deal with 128 bytes at a time
 
 - Sanity checks on the CPU capabilities (via CPUID) to prevent
   incompatible secondary CPUs from being brought up (e.g. weird
   big.LITTLE configurations)
 
 - valid_user_regs() reworked for better sanity check of the sigcontext
   information (restored pstate information)
 
 - ACPI parking protocol implementation
 
 - CONFIG_DEBUG_RODATA enabled by default
 
 - VDSO code marked as read-only
 
 - DEBUG_PAGEALLOC support
 
 - ARCH_HAS_UBSAN_SANITIZE_ALL enabled
 
 - Erratum workaround Cavium ThunderX SoC
 
 - set_pte_at() fix for PROT_NONE mappings
 
 - Code clean-ups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+
 RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC
 hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv
 50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3
 DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x
 YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY
 OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk
 EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7
 3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN
 dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r
 xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM
 LepccTgykiUBqW5TRzPz
 =/oS+
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Here are the main arm64 updates for 4.6.  There are some relatively
  intrusive changes to support KASLR, the reworking of the kernel
  virtual memory layout and initial page table creation.

  Summary:

   - Initial page table creation reworked to avoid breaking large block
     mappings (huge pages) into smaller ones.  The ARM architecture
     requires break-before-make in such cases to avoid TLB conflicts but
     that's not always possible on live page tables

   - Kernel virtual memory layout: the kernel image is no longer linked
     to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
     of the vmalloc space, allowing the kernel to be loaded (nearly)
     anywhere in physical RAM

   - Kernel ASLR: position independent kernel Image and modules being
     randomly mapped in the vmalloc space with the randomness is
     provided by UEFI (efi_get_random_bytes() patches merged via the
     arm64 tree, acked by Matt Fleming)

   - Implement relative exception tables for arm64, required by KASLR
     (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
     but actual x86 conversion to deferred to 4.7 because of the merge
     dependencies)

   - Support for the User Access Override feature of ARMv8.2: this
     allows uaccess functions (get_user etc.) to be implemented using
     LDTR/STTR instructions.  Such instructions, when run by the kernel,
     perform unprivileged accesses adding an extra level of protection.
     The set_fs() macro is used to "upgrade" such instruction to
     privileged accesses via the UAO bit

   - Half-precision floating point support (part of ARMv8.2)

   - Optimisations for CPUs with or without a hardware prefetcher (using
     run-time code patching)

   - copy_page performance improvement to deal with 128 bytes at a time

   - Sanity checks on the CPU capabilities (via CPUID) to prevent
     incompatible secondary CPUs from being brought up (e.g.  weird
     big.LITTLE configurations)

   - valid_user_regs() reworked for better sanity check of the
     sigcontext information (restored pstate information)

   - ACPI parking protocol implementation

   - CONFIG_DEBUG_RODATA enabled by default

   - VDSO code marked as read-only

   - DEBUG_PAGEALLOC support

   - ARCH_HAS_UBSAN_SANITIZE_ALL enabled

   - Erratum workaround Cavium ThunderX SoC

   - set_pte_at() fix for PROT_NONE mappings

   - Code clean-ups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
  arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
  arm64: kasan: Use actual memory node when populating the kernel image shadow
  arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
  arm64: Fix misspellings in comments.
  arm64: efi: add missing frame pointer assignment
  arm64: make mrs_s prefixing implicit in read_cpuid
  arm64: enable CONFIG_DEBUG_RODATA by default
  arm64: Rework valid_user_regs
  arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
  arm64: KVM: Move kvm_call_hyp back to its original localtion
  arm64: mm: treat memstart_addr as a signed quantity
  arm64: mm: list kernel sections in order
  arm64: lse: deal with clobbered IP registers after branch via PLT
  arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
  arm64: kconfig: add submenu for 8.2 architectural features
  arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
  arm64: Add support for Half precision floating point
  arm64: Remove fixmap include fragility
  arm64: Add workaround for Cavium erratum 27456
  arm64: mm: Mark .rodata as RO
  ...
2016-03-17 20:03:47 -07:00
Linus Torvalds
10dc374766 One of the largest releases for KVM... Hardly any generic improvement,
but lots of architecture-specific changes.
 
 * ARM:
 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
 - PMU support for guests
 - 32bit world switch rewritten in C
 - various optimizations to the vgic save/restore code.
 
 * PPC:
 - enabled KVM-VFIO integration ("VFIO device")
 - optimizations to speed up IPIs between vcpus
 - in-kernel handling of IOMMU hypercalls
 - support for dynamic DMA windows (DDW).
 
 * s390:
 - provide the floating point registers via sync regs;
 - separated instruction vs. data accesses
 - dirty log improvements for huge guests
 - bugfixes and documentation improvements.
 
 * x86:
 - Hyper-V VMBus hypercall userspace exit
 - alternative implementation of lowest-priority interrupts using vector
 hashing (for better VT-d posted interrupt support)
 - fixed guest debugging with nested virtualizations
 - improved interrupt tracking in the in-kernel IOAPIC
 - generic infrastructure for tracking writes to guest memory---currently
 its only use is to speedup the legacy shadow paging (pre-EPT) case, but
 in the future it will be used for virtual GPUs as well
 - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJW5r3BAAoJEL/70l94x66D2pMH/jTSWWwdTUJMctrDjPVzKzG0
 yOzHW5vSLFoFlwEOY2VpslnXzn5TUVmCAfrdmFNmQcSw6hGb3K/xA/ZX/KLwWhyb
 oZpr123ycahga+3q/ht/dFUBCCyWeIVMdsLSFwpobEBzPL0pMgc9joLgdUC6UpWX
 tmN0LoCAeS7spC4TTiTTpw3gZ/L+aB0B6CXhOMjldb9q/2CsgaGyoVvKA199nk9o
 Ngu7ImDt7l/x1VJX4/6E/17VHuwqAdUrrnbqerB/2oJ5ixsZsHMGzxQ3sHCmvyJx
 WG5L00ubB1oAJAs9fBg58Y/MdiWX99XqFhdEfxq4foZEiQuCyxygVvq3JwZTxII=
 =OUZZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "One of the largest releases for KVM...  Hardly any generic
  changes, but lots of architecture-specific updates.

  ARM:
   - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
   - PMU support for guests
   - 32bit world switch rewritten in C
   - various optimizations to the vgic save/restore code.

  PPC:
   - enabled KVM-VFIO integration ("VFIO device")
   - optimizations to speed up IPIs between vcpus
   - in-kernel handling of IOMMU hypercalls
   - support for dynamic DMA windows (DDW).

  s390:
   - provide the floating point registers via sync regs;
   - separated instruction vs.  data accesses
   - dirty log improvements for huge guests
   - bugfixes and documentation improvements.

  x86:
   - Hyper-V VMBus hypercall userspace exit
   - alternative implementation of lowest-priority interrupts using
     vector hashing (for better VT-d posted interrupt support)
   - fixed guest debugging with nested virtualizations
   - improved interrupt tracking in the in-kernel IOAPIC
   - generic infrastructure for tracking writes to guest
     memory - currently its only use is to speedup the legacy shadow
     paging (pre-EPT) case, but in the future it will be used for
     virtual GPUs as well
   - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
  KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
  KVM: x86: disable MPX if host did not enable MPX XSAVE features
  arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
  arm64: KVM: vgic-v3: Reset LRs at boot time
  arm64: KVM: vgic-v3: Do not save an LR known to be empty
  arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
  arm64: KVM: vgic-v3: Avoid accessing ICH registers
  KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
  KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
  KVM: arm/arm64: vgic-v2: Reset LRs at boot time
  KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
  KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
  KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
  KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
  KVM: s390: allocate only one DMA page per VM
  KVM: s390: enable STFLE interpretation only if enabled for the guest
  KVM: s390: wake up when the VCPU cpu timer expires
  KVM: s390: step the VCPU timer while in enabled wait
  KVM: s390: protect VCPU cpu timer with a seqcount
  KVM: s390: step VCPU cpu timer during kvm_run ioctl
  ...
2016-03-16 09:55:35 -07:00
Linus Torvalds
d4e796152a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Make schedstats a runtime tunable (disabled by default) and
     optimize it via static keys.

     As most distributions enable CONFIG_SCHEDSTATS=y due to its
     instrumentation value, this is a nice performance enhancement.
     (Mel Gorman)

   - Implement 'simple waitqueues' (swait): these are just pure
     waitqueues without any of the more complex features of full-blown
     waitqueues (callbacks, wake flags, wake keys, etc.).  Simple
     waitqueues have less memory overhead and are faster.

     Use simple waitqueues in the RCU code (in 4 different places) and
     for handling KVM vCPU wakeups.

     (Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker,
     Marcelo Tosatti)

   - sched/numa enhancements (Rik van Riel)

   - NOHZ performance enhancements (Rik van Riel)

   - Various sched/deadline enhancements (Steven Rostedt)

   - Various fixes (Peter Zijlstra)

   - ... and a number of other fixes, cleanups and smaller enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  sched/cputime: Fix steal_account_process_tick() to always return jiffies
  sched/deadline: Remove dl_new from struct sched_dl_entity
  Revert "kbuild: Add option to turn incompatible pointer check into error"
  sched/deadline: Remove superfluous call to switched_to_dl()
  sched/debug: Fix preempt_disable_ip recording for preempt_disable()
  sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
  time, acct: Drop irq save & restore from __acct_update_integrals()
  acct, time: Change indentation in __acct_update_integrals()
  sched, time: Remove non-power-of-two divides from __acct_update_integrals()
  sched/rt: Kick RT bandwidth timer immediately on start up
  sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
  sched/debug: Move sched_domain_sysctl to debug.c
  sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
  sched/rt: Fix PI handling vs. sched_setscheduler()
  sched/core: Remove duplicated sched_group_set_shares() prototype
  sched/fair: Consolidate nohz CPU load update code
  sched/fair: Avoid using decay_load_missed() with a negative value
  sched/deadline: Always calculate end of period on sched_yield()
  sched/cgroup: Fix cgroup entity load tracking tear-down
  rcu: Use simple wait queues where possible in rcutree
  ...
2016-03-14 19:14:06 -07:00
Marc Zyngier
9b4a300443 KVM: arm/arm64: timer: Add active state caching
Programming the active state in the (re)distributor can be an
expensive operation so it makes some sense to try and reduce
the number of accesses as much as possible. So far, we
program the active state on each VM entry, but there is some
opportunity to do less.

An obvious solution is to cache the active state in memory,
and only program it in the HW when conditions change. But
because the HW can also change things under our feet (the active
state can transition from 1 to 0 when the guest does an EOI),
some precautions have to be taken, which amount to only caching
an "inactive" state, and always programing it otherwise.

With this in place, we observe a reduction of around 700 cycles
on a 2GHz GICv2 platform for a NULL hypercall.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29 18:34:22 +00:00
Marc Zyngier
d06a5440a0 ARM: KVM: Switch the CP reg search to be a binary search
Doing a linear search is a bit silly when we can do a binary search.
Not that we trap that so many things that it has become a burden yet,
but it makes sense to align it with the arm64 code.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29 18:34:22 +00:00
Marc Zyngier
f1d67d4ac7 ARM: KVM: Rename struct coproc_reg::is_64 to is_64bit
As we're going to play some tricks on the struct coproc_reg,
make sure its 64bit indicator field matches that of coproc_params.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29 18:34:22 +00:00