Switching from the guest to host is another place
where the speculative accesses can be exploited.
Flush the branch predictor when entering KVM.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When the 'regs' field was added to struct kvm_vcpu_arch, the code
was changed to use several of the fields inside regs (e.g., gpr, lr,
etc.) but not the ccr field, because the ccr field in struct pt_regs
is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is
only 32 bits. This changes the code to use the regs.ccr field
instead of cr, and changes the assembly code on 64-bit platforms to
use 64-bit loads and stores instead of 32-bit ones.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch extends the use of illegal instruction as software
breakpoint instruction across the ppc platform. Patch extends
booke program interrupt code to support software breakpoint.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[agraf: Fix bookehv]
Signed-off-by: Alexander Graf <agraf@suse.de>
Add AltiVec support in KVM for Book3e. FPU support gracefully reuse host
infrastructure so follow the same approach for AltiVec.
Book3e specification defines shared interrupt numbers for SPE and AltiVec
units. Still SPE is present in e200/e500v2 cores while AltiVec is present in
e6500 core. So we can currently decide at compile-time which of the SPE or
AltiVec units to support exclusively by using CONFIG_SPE_POSSIBLE and
CONFIG_PPC_E500MC defines. As Alexander Graf suggested, keep SPE and AltiVec
exception handlers distinct to improve code readability.
Guests have the privilege to enable AltiVec, so we always need to support
AltiVec in KVM and implicitly in host to reflect interrupts and to save/restore
the unit context. KVM will be loaded on cores with AltiVec unit only if
CONFIG_ALTIVEC is defined. Use this define to guard KVM AltiVec logic.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The 440 target hasn't been properly functioning for a few releases and
before I was the only one who fixes a very serious bug that indicates to
me that nobody used it before either.
Furthermore KVM on 440 is slow to the extent of unusable.
We don't have to carry along completely unused code. Remove 440 and give
us one less thing to worry about.
Signed-off-by: Alexander Graf <agraf@suse.de>
SPRN_SPRG is used by debug interrupt handler, so this is required for
debug support.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On book3e, KVM uses load external pid (lwepx) dedicated instruction to read
guest last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
and LRAT), generated by loading a guest address, needs to be handled by KVM.
These exceptions are generated in a substituted guest translation context
(EPLC[EGS] = 1) from host context (MSR[GS] = 0).
Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1),
doing minimal checks on the fast path to avoid host performance degradation.
lwepx exceptions originate from host state (MSR[GS] = 0) which implies
additional checks in DO_KVM macro (beside the current MSR[GS] = 1) by looking
at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context
Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obvious
too intrusive for the host.
Read guest last instruction from kvmppc_load_last_inst() by searching for the
physical address and kmap it. This address the TODO for TLB eviction and
execute-but-not-read entries, and allow us to get rid of lwepx until we are
able to handle failures.
A simple stress benchmark shows a 1% sys performance degradation compared with
previous approach (lwepx without failure handling):
time for i in `seq 1 10000`; do /bin/echo > /dev/null; done
real 0m 8.85s
user 0m 4.34s
sys 0m 4.48s
vs
real 0m 8.84s
user 0m 4.36s
sys 0m 4.44s
A solution to use lwepx and to handle its exceptions in KVM would be to temporary
highjack the interrupt vector from host. This imposes additional synchronizations
for cores like FSL e6500 that shares host IVOR registers between hardware threads.
This optimized solution can be later developed on top of this patch.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The commit 1d628af7 "add load inst fixup" made an attempt to handle
failures generated by reading the guest current instruction. The fixup
code that was added works by chance hiding the real issue.
Load external pid (lwepx) instruction, used by KVM to read guest
instructions, is executed in a subsituted guest translation context
(EPLC[EGS] = 1). In consequence lwepx's TLB error and data storage
interrupts need to be handled by KVM, even though these interrupts
are generated from host context (MSR[GS] = 0) where lwepx is executed.
Currently, KVM hooks only interrupts generated from guest context
(MSR[GS] = 1), doing minimal checks on the fast path to avoid host
performance degradation. As a result, the host kernel handles lwepx
faults searching the faulting guest data address (loaded in DEAR) in
its own Logical Partition ID (LPID) 0 context. In case a host translation
is found the execution returns to the lwepx instruction instead of the
fixup, the host ending up in an infinite loop.
Revert the commit "add load inst fixup". lwepx issue will be addressed
in a subsequent patch without needing fixup code.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
While bolted handlers (including e6500) do not need to deal with a TLB
miss recursively causing another TLB miss, nested TLB misses can still
happen with crit/mc/debug exceptions -- so we still need to honor
SPRG_TLB_EXFRAME.
We don't need to spend time modifying it in the TLB miss fastpath,
though -- the special level exception will handle that.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: kvm-ppc@vger.kernel.org
Previously SPRG3 was marked for use by both VDSO and critical
interrupts (though critical interrupts were not fully implemented).
In commit 8b64a9dfb0 ("powerpc/booke64:
Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
made an attempt to resolve this conflict by restoring the VDSO value
early in the critical interrupt, but this has some issues:
- It's incompatible with EXCEPTION_COMMON which restores r13 from the
by-then-overwritten scratch (this cost me some debugging time).
- It forces critical exceptions to be a special case handled
differently from even machine check and debug level exceptions.
- It didn't occur to me that it was possible to make this work at all
(by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
I made (most of) this patch. :-)
It might be worth investigating using a load rather than SPRG on return
from all exceptions (except TLB misses where the scratch never leaves
the SPRG) -- it could save a few cycles. Until then, let's stick with
SPRG for all exceptions.
Since we cannot use SPRG4-7 for scratch without corrupting the state of
a KVM guest, move VDSO to SPRG7 on book3e. Since neither SPRG4-7 nor
critical interrupts exist on book3s, SPRG3 is still used for VDSO
there.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: kvm-ppc@vger.kernel.org
two s390 guest features that need some handling in the host,
and all the PPC changes. The PPC changes include support for
little-endian guests and enablement for new POWER8 features.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJS6UF5AAoJEBvWZb6bTYby55kP/AgTJnyu7avN653/2aSHkjkx
KgYSMYhZPIFoY5LyZuNetXaoXFRvCykux1VYSZ6V6s35h2PZ+hdJNbHGjFYKPGTq
FQ92xQVNuWCAPxmFCjDNuDV/0BauG5y08/Orh/jpjz+GAfH43LruUQGbtXUuyJ8u
vf+yTHniU5gguqsAmodqjHUgbf+GoPJ1j7hmRoWwt8IWm7Ns3v/IK4l0p6G0h26a
RjE6aK+Tm208Yr5hD/dRAqeTbBNt3c4xub+QPsKoiEMaZBSuAOiux7D3Kx+If1gp
WsmqEQxoymihVtkZhUFO9ONLJepvmG2QwJVVyMSUW9iqxX9rraXsvVyVMwcQAhog
JuOAYxBftH07xu6Fs4eym5KvCFghM+EaJvxxt+kgnvdD4htK1+eK5trntc2zygSi
/qGiIrkqjXpkskW8kujLayF0eAU3CrZvFWveEPBfFgYiOGX/2wzJCtSm/bt9Jo0M
v60qgNFK3LNqAyeEfnm9VtlwGr6ZgsAB6DHNPX4fM5s2IBjL+qloXk/e/+aVKkW0
I3yeRdy/ExhLAab6w81JtMeR7G3YS0UNuAEVvcoxzNb5wIBY8qnpfUzTKyMxQR94
64EVpxWEYO1s55eCCyMujWrSvc+YAwhJcWHGKgC4K7mxxLD3FVyQXX6YZvgRozMX
HjQju+DToj9CskyrFlRL
=yd0Z
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"Second batch of KVM updates. Some minor x86 fixes, two s390 guest
features that need some handling in the host, and all the PPC changes.
The PPC changes include support for little-endian guests and
enablement for new POWER8 features"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits)
x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101
x86, kvm: cache the base of the KVM cpuid leaves
kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef
KVM: PPC: Book3S PR: Cope with doorbell interrupts
KVM: PPC: Book3S HV: Add software abort codes for transactional memory
KVM: PPC: Book3S HV: Add new state for transactional memory
powerpc/Kconfig: Make TM select VSX and VMX
KVM: PPC: Book3S HV: Basic little-endian guest support
KVM: PPC: Book3S HV: Add support for DABRX register on POWER7
KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells
KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap
KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
KVM: PPC: Book3S HV: Add handler for HV facility unavailable
KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8
KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers
KVM: PPC: Book3S HV: Don't set DABR on POWER8
kvm/ppc: IRQ disabling cleanup
...
Rather than calling hard_irq_disable() when we're back in C code
we can just call RECONCILE_IRQ_STATE to soft disable IRQs while
we're already in hard disabled state.
This should be functionally equivalent to the code before, but
cleaner and faster.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[agraf: fix comment, commit message]
Signed-off-by: Alexander Graf <agraf@suse.de>
LRAT (Logical to Real Address Translation) present in MMU v2 provides hardware
translation from a logical page number (LPN) to a real page number (RPN) when
tlbwe is executed by a guest or when a page table translation occurs from a
guest virtual address.
Add LRAT error exception handler to Booke3E 64-bit kernel and the basic KVM
handler to avoid build breakage. This is a prerequisite for KVM LRAT support
that will follow.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Add interrupt handling support for 64-bit bookehv hosts. Unify 32 and 64 bit
implementations using a common stack layout and a common execution flow starting
from kvm_handler_common macro. Update documentation for 64-bit input register
values. This patch only address the bolted TLB miss exception handlers version.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
GET_VCPU define will not be implemented for 64-bit for performance reasons
so get rid of it also on 32-bit.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
After merging the register type check patches from Ben's tree, the
hv enabled booke implementation ceased to compile.
This patch fixes things up so everyone's happy again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQDRDNAAoJEI7yEDeUysxlkl8P/3C2AHx2webOU8sVzhfU6ONZ
ZoGevwBjyZIeJEmiWVpFTTEew1l0PXtpyOocXGNUXIddVnhXTQOKr/Scj4uFbmx8
ROqgK8NSX9+xOGrBPCoN7SlJkmp+m6uYtwYkl2SGnsEVLWMKkc7J7oqmszCcTQvN
UXMf7G47/Ul2NUSBdv4Yvizhl4kpvWxluiweDw3E/hIQKN0uyP7CY58qcAztw8nG
csZBAnnuPFwIAWxHXW3eBBv4UP138HbNDqJ/dujjocM6GnOxmXJmcZ6b57gh+Y64
3+w9IR4qrRWnsErb/I8inKLJ1Jdcf7yV2FmxYqR4pIXay2Yzo1BsvFd6EB+JavUv
pJpixrFiDDFoQyXlh4tGpsjpqdXNMLqyG4YpqzSZ46C8naVv9gKE7SXqlXnjyDlb
Llx3hb9Fop8O5ykYEGHi+gIISAK5eETiQl4yw9RUBDpxydH4qJtqGIbLiDy8y9wi
Xyi8PBlNl+biJFsK805lxURqTp/SJTC3+Zb7A7CzYEQm5xZw3W/CKZx1ZYBfpaa/
pWaP6tB7JwgLIVXi4HQayLWqMVwH0soZIn9yazpOEFv6qO8d5QH5RAxAW2VXE3n5
JDlrajar/lGIdiBVWfwTJLb86gv3QDZtIWoR9mZuLKeKWE/6PRLe7HQpG1pJovsm
2AsN5bS0BWq+aqPpZHa5
=pECD
-----END PGP SIGNATURE-----
Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Avi Kivity:
"Highlights include
- full big real mode emulation on pre-Westmere Intel hosts (can be
disabled with emulate_invalid_guest_state=0)
- relatively small ppc and s390 updates
- PCID/INVPCID support in guests
- EOI avoidance; 3.6 guests should perform better on 3.6 hosts on
interrupt intensive workloads)
- Lockless write faults during live migration
- EPT accessed/dirty bits support for new Intel processors"
Fix up conflicts in:
- Documentation/virtual/kvm/api.txt:
Stupid subchapter numbering, added next to each other.
- arch/powerpc/kvm/booke_interrupts.S:
PPC asm changes clashing with the KVM fixes
- arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:
Duplicated commits through the kvm tree and the s390 tree, with
subsequent edits in the KVM tree.
* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
KVM: fix race with level interrupts
x86, hyper: fix build with !CONFIG_KVM_GUEST
Revert "apic: fix kvm build on UP without IOAPIC"
KVM guest: switch to apic_set_eoi_write, apic_write
apic: add apic_set_eoi_write for PV use
KVM: VMX: Implement PCID/INVPCID for guests with EPT
KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
KVM: PPC: Critical interrupt emulation support
KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
KVM: PPC: bookehv64: Add support for std/ld emulation.
booke: Added crit/mc exception handler for e500v2
booke/bookehv: Add host crit-watchdog exception support
KVM: MMU: document mmu-lock and fast page fault
KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
KVM: MMU: trace fast page fault
KVM: MMU: fast path of handling guest page fault
KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
KVM: MMU: fold tlb flush judgement into mmu_spte_update
...
ESR register is required by Data Storage Interrupt handling code.
Add the specific flag to the interrupt handler.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Merge the defines of VCPU_GPR from different places.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anything that uses a constructed instruction (ie. from ppc-opcode.h),
need to use the new R0 macro, as %r0 is not going to work.
Also convert usages of macros where we are just determining an offset
(usually for a load/store), like:
std r14,STK_REG(r14)(r1)
Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since
it's just calculating an offset.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Guest r8 register is held in the scratch register and stored correctly,
so remove the instruction that clobbers it. Guest r13 was missing from vcpu,
store it there.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Interrupt code used PPC_LL/PPC_STL macros to load/store some of u32 fields
which led to memory overflow on 64-bit. Use lwz/stw instead.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
For Guest accessible SPRGs 4-7, save/restore must be handled differently for 64bit and
non-64 bit case. Use the PPC_STD/PPC_LD macros for saving/restoring to/from these registers.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Introduced PPC_STD/PPC_LD macros for saving/restoring guest registers to/from their 64 bit copies.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
No instruction which can change Condition Register (CR) should be executed after
Guest CR is loaded. So the guest CR is restored after the Exit Timing in
lightweight_exit executes cmpw, which can clobber CR.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls.
We didn't respect that for any architecture until Paul spotted it in his
patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
There was some unused code in the exit code path that must have been
a leftover from earlier iterations. While it did no harm, it's superfluous
and thus should be removed.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to make sure that no MAS updates happen automatically while we
have the guest MAS registers loaded. So move the disabling code a bit
higher up so that it covers the full time we have guest values in MAS
registers.
The race this patch fixes should never occur, but it makes the code a
bit more logical to do it this way around.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The SET_VCPU macro is a leftover from times when the vcpu struct wasn't
stored in the thread on vcpu_load/put. It's not needed anymore. Remove it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead if doing
#ifndef CONFIG_64BIT
...
#else
...
#endif
we should rather do
#ifdef CONFIG_64BIT
...
#else
...
#endif
which is a lot easier to read. Change the bookehv implementation to
stick with this rule.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When using exit timing stats, we clobber r9 in the NEED_EMU case,
so better move that part down a few lines and fix it that way.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
There's always a chance we're unable to read a guest instruction. The guest
could have its TLB mapped execute-, but not readable, something odd happens
and our TLB gets flushed. So it's a good idea to be prepared for that case
and have a fallback that allows us to fix things up in that case.
Add fixup code that keeps guest code from potentially crashing our host kernel.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
If we hit any exception whatsoever in the restore path and r1/r2 aren't the
host registers, we don't get a working oops. So it's always a good idea to
restore them as early as possible.
This time, it actually has practical reasons to do so too, since we need to
have the host page fault handler fix up our guest instruction read code. And
for that to work we need r1/r2 restored.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Chips such as e500mc that implement category E.HV in Power ISA 2.06
provide hardware virtualization features, including a new MSR mode for
guest state. The guest OS can perform many operations without trapping
into the hypervisor, including transitions to and from guest userspace.
Since we can use SRR1[GS] to reliably tell whether an exception came from
guest state, instead of messing around with IVPR, we use DO_KVM similarly
to book3s.
Current issues include:
- Machine checks from guest state are not routed to the host handler.
- The guest can cause a host oops by executing an emulated instruction
in a page that lacks read permission. Existing e500/4xx support has
the same problem.
Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>,
Varun Sethi <Varun.Sethi@freescale.com>, and
Liu Yu <yu.liu@freescale.com>.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: remove pt_regs usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>