Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm

* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits)
  KVM: kill file->f_count abuse in kvm
  KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
  KVM: SVM: remove selective CR0 comment
  KVM: SVM: remove now obsolete FIXME comment
  KVM: SVM: disable CR8 intercept when tpr is not masking interrupts
  KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled
  KVM: export kvm_lapic_set_tpr() to modules
  KVM: SVM: sync TPR value to V_TPR field in the VMCB
  KVM: ppc: PowerPC 440 KVM implementation
  KVM: Add MAINTAINERS entry for PowerPC KVM
  KVM: ppc: Add DCR access information to struct kvm_run
  ppc: Export tlb_44x_hwater for KVM
  KVM: Rename debugfs_dir to kvm_debugfs_dir
  KVM: x86 emulator: fix lea to really get the effective address
  KVM: x86 emulator: fix smsw and lmsw with a memory operand
  KVM: x86 emulator: initialize src.val and dst.val for register operands
  KVM: SVM: force a new asid when initializing the vmcb
  KVM: fix kvm_vcpu_kick vs __vcpu_run race
  KVM: add ioctls to save/store mpstate
  KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*
  ...
Linus Torvalds 2008-04-27 10:13:52 -07:00
commit 42cadc8600
119 changed files with 23727 additions and 642 deletions

@ -0,0 +1,82 @@
Currently, the kvm module is in the EXPERIMENTAL stage on IA64. This means that
its interfaces are not yet stable, so please do not run critical applications
in a virtual machine. We will try our best to make it more robust in future
versions!
Guide: How to boot up guests on kvm/ia64
This guide describes how to enable kvm support on IA-64 systems.
1. Get the kvm source from git.kernel.org.
Userspace source:
git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git
Kernel Source:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/xiantao/kvm-ia64.git
2. Compile the source code.
2.1 Compile userspace code:
(1)cd ./kvm-userspace
(2)./configure
(3)cd kernel
(4)make sync LINUX=$kernel_dir (kernel_dir is the directory of the kernel source.)
(5)cd ..
(6)make qemu
(7)cd qemu; make install
2.2 Compile kernel source code:
(1) cd ./$kernel_dir
(2) make menuconfig
(3) Enter the virtualization option, and choose kvm.
(4) make
(5) Once (4) is done, make modules_install
(6) Make an initrd, and reboot the host machine with the new kernel.
(7) Once (6) is done, cd $kernel_dir/arch/ia64/kvm
(8) insmod kvm.ko; insmod kvm-intel.ko
Note: For step 2, please make sure that the host page size == TARGET_PAGE_SIZE of qemu; otherwise, it may fail.
3. Get the guest firmware, named Flash.fd, and put it in the right place:
(1) If you have the guest firmware (binary) released by Intel Corp for Xen, use it directly.
(2) If you have no firmware at hand, please download its source from
hg clone http://xenbits.xensource.com/ext/efi-vfirmware.hg
You can find the firmware binary in the directory efi-vfirmware.hg/binaries.
(3) Rename the firmware you have to Flash.fd, and copy it to /usr/local/share/qemu
4. Boot up Linux or Windows guests:
4.1 Create or install an image for the guest to boot. If you have Xen experience, it should be easy.
4.2 Boot up guests using the following command.
/usr/local/bin/qemu-system-ia64 -smp xx -m 512 -hda $your_image
(xx is the number of virtual processors for the guest; the maximum value is currently 4)
5. Known possible issue on some platforms with old firmware.
If you encounter strange host crashes, try to solve them in either of the following ways:
(1): Upgrade your firmware to the latest version.
(2): Apply the patch below to the kernel source.
diff --git a/arch/ia64/kernel/pal.S b/arch/ia64/kernel/pal.S
index 0b53344..f02b0f7 100644
--- a/arch/ia64/kernel/pal.S
+++ b/arch/ia64/kernel/pal.S
@@ -84,7 +84,8 @@ GLOBAL_ENTRY(ia64_pal_call_static)
mov ar.pfs = loc1
mov rp = loc0
;;
- srlz.d // seralize restoration of psr.l
+ srlz.i // seralize restoration of psr.l
+ ;;
br.ret.sptk.many b0
END(ia64_pal_call_static)
6. Bug report:
If you find any issues when using kvm/ia64, please post the bug info to the kvm-ia64-devel mailing list.
https://lists.sourceforge.net/lists/listinfo/kvm-ia64-devel/
Thanks for your interest! Let's work together to make kvm/ia64 stronger and stronger!
Xiantao Zhang <xiantao.zhang@intel.com>
2008.3.10

@ -183,6 +183,8 @@ Code Seq# Include File Comments
0xAC 00-1F linux/raw.h
0xAD 00 Netfilter device in development:
<mailto:rusty@rustcorp.com.au>
0xAE all linux/kvm.h Kernel-based Virtual Machine
<mailto:kvm-devel@lists.sourceforge.net>
0xB0 all RATIO devices in development:
<mailto:vgo@ratio.de>
0xB1 00-1F PPPoX <mailto:mostrows@styx.uwaterloo.ca>

@ -0,0 +1,41 @@
Hollis Blanchard <hollisb@us.ibm.com>
15 Apr 2008
Various notes on the implementation of KVM for PowerPC 440:
To enforce isolation, host userspace, guest kernel, and guest userspace all
run at user privilege level. Only the host kernel runs in supervisor mode.
Executing privileged instructions in the guest traps into KVM (in the host
kernel), where we decode and emulate them. Through this technique, unmodified
440 Linux kernels can be run (slowly) as guests. Future performance work will
focus on reducing the overhead and frequency of these traps.
The usual code flow starts with userspace invoking a "run" ioctl, which
causes KVM to switch into guest context. We use IVPR to hijack the host
interrupt vectors while running the guest, which allows us to direct all
interrupts to kvmppc_handle_interrupt(). At this point, we could either
- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or
- let the host interrupt handler run (e.g. when the decrementer fires), or
- return to host userspace (e.g. when the guest performs device MMIO)
Address spaces: We take advantage of the fact that Linux doesn't use the AS=1
address space (in host or guest), which gives us virtual address space to use
for guest mappings. While the guest is running, the host kernel remains mapped
in AS=0, but the guest can only use AS=1 mappings.
TLB entries: The TLB entries covering the host linear mapping remain
present while running the guest. This reduces the overhead of lightweight
exits, which are handled by KVM running in the host kernel. We keep three
copies of the TLB:
- guest TLB: contents of the TLB as the guest sees it
- shadow TLB: the TLB that is actually in hardware while guest is running
- host TLB: to restore TLB state when context switching guest -> host
When a TLB miss occurs because a mapping was not present in the shadow TLB,
but was present in the guest TLB, KVM handles the fault without invoking the
guest. Large guest pages are backed by multiple 4KB shadow pages through this
mechanism.
IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network
and block IO, so those drivers must be enabled in the guest. It's possible
that some qemu device emulation (e.g. e1000 or rtl8139) may also work with
little effort.
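
The shadow TLB miss path described under "TLB entries" can be sketched in plain
C. This is only an illustration under stated assumptions: the struct layout,
the function names, and the enum values are hypothetical and are not taken from
the KVM sources. The translation from guest physical to host physical addresses
is left out, and reflecting the miss to the guest when the guest TLB has no
matching entry is an assumption about the path the notes do not describe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SHADOW_PAGE_SIZE 4096u	/* shadow pages are always 4KB */

/* Hypothetical, simplified TLB entry; real 440 entries carry more state. */
struct sketch_tlb_entry {
	bool valid;
	uint32_t vaddr;	/* start of the mapping, aligned to size */
	uint32_t size;	/* mapping size in bytes */
	uint32_t paddr;	/* start of the backing memory */
};

enum sketch_miss_action {
	SKETCH_FILL_SHADOW,	/* handled by KVM, guest not invoked */
	SKETCH_REFLECT_TO_GUEST,	/* assumed: guest handles its own miss */
};

/* Find the guest TLB entry covering vaddr, or NULL if there is none. */
static const struct sketch_tlb_entry *
sketch_guest_tlb_find(const struct sketch_tlb_entry *gtlb, size_t n,
		      uint32_t vaddr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Unsigned subtraction also rejects vaddr < gtlb[i].vaddr. */
		if (gtlb[i].valid && vaddr - gtlb[i].vaddr < gtlb[i].size)
			return &gtlb[i];
	}
	return NULL;
}

/*
 * On a shadow TLB miss, look the address up in the guest TLB. If the guest
 * has a mapping, build one 4KB shadow entry for the faulting page, so a
 * large guest page ends up backed by several 4KB shadow pages over time.
 */
enum sketch_miss_action
sketch_handle_shadow_miss(const struct sketch_tlb_entry *gtlb, size_t n,
			  uint32_t vaddr, struct sketch_tlb_entry *shadow)
{
	const struct sketch_tlb_entry *g = sketch_guest_tlb_find(gtlb, n, vaddr);
	uint32_t page;

	if (!g)
		return SKETCH_REFLECT_TO_GUEST;

	page = vaddr & ~(SKETCH_SHADOW_PAGE_SIZE - 1);
	shadow->valid = true;
	shadow->vaddr = page;
	shadow->size = SKETCH_SHADOW_PAGE_SIZE;
	shadow->paddr = g->paddr + (page - g->vaddr);
	return SKETCH_FILL_SHADOW;
}
```

For a 1MB guest mapping, each distinct 4KB page touched by the guest would
produce one call that returns SKETCH_FILL_SHADOW with a different shadow entry.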

125
Documentation/s390/kvm.txt Normal file

@ -0,0 +1,125 @@
*** BIG FAT WARNING ***
The kvm module is currently in EXPERIMENTAL state for s390. This means that
the interface to the module is not yet considered to remain stable. Thus, be
prepared that we keep breaking your userspace application and guest
compatibility over and over again until we feel happy with the result. Make sure
your guest kernel, your host kernel, and your userspace launcher are in a
consistent state.
This documentation describes the ioctl calls to /dev/kvm, to the resulting
kvm-vm file descriptors, and to the kvm-vcpu file descriptors that are unique
to s390 or differ from x86.
1. ioctl calls to /dev/kvm
KVM does support the following ioctls on s390 that are common with other
architectures and do behave the same:
KVM_GET_API_VERSION
KVM_CREATE_VM (*) see note
KVM_CHECK_EXTENSION
KVM_GET_VCPU_MMAP_SIZE
Notes:
* KVM_CREATE_VM may fail on s390, if the calling process has multiple
threads and has not called KVM_S390_ENABLE_SIE before.
In addition, on s390 the following architecture specific ioctls are supported:
ioctl: KVM_S390_ENABLE_SIE
args: none
see also: include/linux/kvm.h
This call causes the kernel to switch on PGSTE in the user page table. This
operation is needed in order to run a virtual machine, and it requires the
calling process to be single-threaded. Note that the first call to KVM_CREATE_VM
will implicitly try to switch on PGSTE if the user process has not called
KVM_S390_ENABLE_SIE before. User processes that want to launch multiple threads
before creating a virtual machine have to call KVM_S390_ENABLE_SIE, or will
observe an error calling KVM_CREATE_VM. Switching on PGSTE is a one-time
operation, is not reversible, and will persist over the entire lifetime of
the calling process. It does not have any user-visible effect other than a small
performance penalty.
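
The call order described above could look roughly like the following userspace
sketch. The function name s390_create_vm and the choice to close the /dev/kvm
descriptor once the VM exists are assumptions made for illustration; only the
two ioctl names come from this document. It assumes a <linux/kvm.h> header that
defines KVM_S390_ENABLE_SIE.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Open the kvm device, switch on PGSTE explicitly, then create a VM.
 * Returns a kvm-vm file descriptor, or -1 on any failure.
 * Must be called while the process is still single-threaded.
 */
int s390_create_vm(const char *kvm_path)
{
	int kvm_fd, vm_fd;

	kvm_fd = open(kvm_path, O_RDWR);
	if (kvm_fd < 0)
		return -1;

	/* One-time, irreversible switch for the calling process. */
	if (ioctl(kvm_fd, KVM_S390_ENABLE_SIE, 0) < 0) {
		close(kvm_fd);
		return -1;
	}

	/*
	 * PGSTE is already on, so this call no longer depends on the
	 * process being single-threaded.
	 */
	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
	close(kvm_fd);
	return vm_fd;
}
```

A launcher that starts worker threads early would split this in two: issue
KVM_S390_ENABLE_SIE at startup, create its threads, and call KVM_CREATE_VM
later.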
2. ioctl calls to the kvm-vm file descriptor
KVM does support the following ioctls on s390 that are common with other
architectures and do behave the same:
KVM_CREATE_VCPU
KVM_SET_USER_MEMORY_REGION (*) see note
KVM_GET_DIRTY_LOG (**) see note
Notes:
* kvm allows exactly one memory slot on s390, which has to start
at guest absolute address zero and at a user address that is aligned on any
page boundary. This hardware "limitation" allows us to have a few unique
optimizations. The memory slot does not actually have to be filled
with memory; it may contain sparse holes. Combined with different
user memory layouts, this still allows a lot of flexibility when
doing the guest memory setup.
** KVM_GET_DIRTY_LOG doesn't work properly yet. The user will receive an empty
log. This ioctl call is only needed for guest migration, and we intend to
implement this one in the future.
In addition, on s390 the following architecture specific ioctls for the kvm-vm
file descriptor are supported:
ioctl: KVM_S390_INTERRUPT
args: struct kvm_s390_interrupt *
see also: include/linux/kvm.h
This ioctl is used to submit a floating interrupt for a virtual machine.
Floating interrupts may be delivered to any virtual cpu in the configuration.
Only some interrupt types defined in include/linux/kvm.h make sense when
submitted as floating interrupts. The following interrupts are not considered
to be useful as floating interrupts, and a call to inject them will result in
-EINVAL error code: program interrupts and interprocessor signals. Valid
floating interrupts are:
KVM_S390_INT_VIRTIO
KVM_S390_INT_SERVICE
3. ioctl calls to the kvm-vcpu file descriptor
KVM does support the following ioctls on s390 that are common with other
architectures and do behave the same:
KVM_RUN
KVM_GET_REGS
KVM_SET_REGS
KVM_GET_SREGS
KVM_SET_SREGS
KVM_GET_FPU
KVM_SET_FPU
In addition, on s390 the following architecture specific ioctls for the
kvm-vcpu file descriptor are supported:
ioctl: KVM_S390_INTERRUPT
args: struct kvm_s390_interrupt *
see also: include/linux/kvm.h
This ioctl is used to submit an interrupt for a specific virtual cpu.
Only some interrupt types defined in include/linux/kvm.h make sense when
submitted for a specific cpu. The following interrupts are not considered
to be useful, and a call to inject them will result in -EINVAL error code:
service processor calls and virtio interrupts. Valid interrupt types are:
KVM_S390_PROGRAM_INT
KVM_S390_SIGP_STOP
KVM_S390_RESTART
KVM_S390_SIGP_SET_PREFIX
KVM_S390_INT_EMERGENCY
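
The two KVM_S390_INTERRUPT descriptions (kvm-vm and kvm-vcpu file descriptor)
split the interrupt types into two disjoint sets. The following sketch models
that rule only; the enum names are illustrative stand-ins and do not carry the
values defined in include/linux/kvm.h, and the function is not part of any
kernel or userspace API.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the KVM_S390_* interrupt types. */
enum sketch_int_type {
	SKETCH_INT_VIRTIO,
	SKETCH_INT_SERVICE,
	SKETCH_PROGRAM_INT,
	SKETCH_SIGP_STOP,
	SKETCH_RESTART,
	SKETCH_SIGP_SET_PREFIX,
	SKETCH_INT_EMERGENCY,
};

enum sketch_fd_kind {
	SKETCH_FD_VM,	/* floating interrupts, any virtual cpu */
	SKETCH_FD_VCPU,	/* interrupts for one specific virtual cpu */
};

/* Returns true if the documented rule accepts the type, false for -EINVAL. */
bool sketch_interrupt_accepted(enum sketch_fd_kind fd,
			       enum sketch_int_type type)
{
	switch (type) {
	case SKETCH_INT_VIRTIO:
	case SKETCH_INT_SERVICE:
		return fd == SKETCH_FD_VM;
	case SKETCH_PROGRAM_INT:
	case SKETCH_SIGP_STOP:
	case SKETCH_RESTART:
	case SKETCH_SIGP_SET_PREFIX:
	case SKETCH_INT_EMERGENCY:
		return fd == SKETCH_FD_VCPU;
	}
	return false;
}
```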
ioctl: KVM_S390_STORE_STATUS
args: unsigned long
see also: include/linux/kvm.h
This ioctl stores the state of the cpu at the guest real address given as
argument, unless one of the following values defined in include/linux/kvm.h
is given as argument:
KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in
absolute lowcore as defined by the principles of operation
KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in
its prefix page just like the dump tool that comes with zipl. This is useful
to create a system dump for use with lkcdutils or crash.
ioctl: KVM_S390_SET_INITIAL_PSW
args: struct kvm_s390_psw *
see also: include/linux/kvm.h
This ioctl can be used to set the processor status word (psw) of a stopped cpu
prior to running it with KVM_RUN. Note that this call is not required to modify
the psw during sie intercepts that fall back to userspace because struct kvm_run
does contain the psw, and this value is evaluated during reentry of KVM_RUN
after the intercept exit was recognized.
ioctl: KVM_S390_INITIAL_RESET
args: none
see also: include/linux/kvm.h
This ioctl can be used to perform an initial cpu reset as defined by the
principles of operation. The target cpu has to be in stopped state.

@ -2329,6 +2329,13 @@ L: kvm-devel@lists.sourceforge.net
W: kvm.sourceforge.net
S: Supported
KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
P: Hollis Blanchard
M: hollisb@us.ibm.com
L: kvm-ppc-devel@lists.sourceforge.net
W: kvm.sourceforge.net
S: Supported
KERNEL VIRTUAL MACHINE For Itanium(KVM/IA64)
P: Anthony Xu
M: anthony.xu@intel.com
@ -2338,6 +2345,16 @@ L: kvm-ia64-devel@lists.sourceforge.net
W: kvm.sourceforge.net
S: Supported
KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
P: Carsten Otte
M: cotte@de.ibm.com
P: Christian Borntraeger
M: borntraeger@de.ibm.com
M: linux390@de.ibm.com
L: linux-s390@vger.kernel.org
W: http://www.ibm.com/developerworks/linux/linux390/
S: Supported
KEXEC
P: Eric Biederman
M: ebiederm@xmission.com

@ -19,6 +19,7 @@ config IA64
select HAVE_OPROFILE
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_KVM
default y
help
The Itanium Processor Family is Intel's 64-bit successor to
@ -589,6 +590,8 @@ config MSPEC
source "fs/Kconfig"
source "arch/ia64/kvm/Kconfig"
source "lib/Kconfig"
#

@ -57,6 +57,7 @@ core-$(CONFIG_IA64_GENERIC) += arch/ia64/dig/
core-$(CONFIG_IA64_HP_ZX1) += arch/ia64/dig/
core-$(CONFIG_IA64_HP_ZX1_SWIOTLB) += arch/ia64/dig/
core-$(CONFIG_IA64_SGI_SN2) += arch/ia64/sn/
core-$(CONFIG_KVM) += arch/ia64/kvm/
drivers-$(CONFIG_PCI) += arch/ia64/pci/
drivers-$(CONFIG_IA64_HP_SIM) += arch/ia64/hp/sim/

49
arch/ia64/kvm/Kconfig Normal file

@ -0,0 +1,49 @@
#
# KVM configuration
#
config HAVE_KVM
bool
menuconfig VIRTUALIZATION
bool "Virtualization"
depends on HAVE_KVM || IA64
default y
---help---
Say Y here to get to see options for using your Linux host to run other
operating systems inside virtual machines (guests).
This option alone does not add any kernel code.
If you say N, all options in this submenu will be skipped and disabled.
if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && EXPERIMENTAL
select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent
processor equipped with virtualization extensions. You will also
need to select one or more of the processor modules below.
This module provides access to the hardware capabilities through
a character device node named /dev/kvm.
To compile this as a module, choose M here: the module
will be called kvm.
If unsure, say N.
config KVM_INTEL
tristate "KVM for Intel Itanium 2 processors support"
depends on KVM && m
---help---
Provides support for KVM on Itanium 2 processors equipped with the VT
extensions.
config KVM_TRACE
bool
endif # VIRTUALIZATION

61
arch/ia64/kvm/Makefile Normal file

@ -0,0 +1,61 @@
#This Make file is to generate asm-offsets.h and build source.
#
#Generate asm-offsets.h for vmm module build
offsets-file := asm-offsets.h
always := $(offsets-file)
targets := $(offsets-file)
targets += arch/ia64/kvm/asm-offsets.s
clean-files := $(addprefix $(objtree)/,$(targets) $(obj)/memcpy.S $(obj)/memset.S)
# Default sed regexp - multiline due to syntax constraints
define sed-y
"/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}"
endef
quiet_cmd_offsets = GEN $@
define cmd_offsets
(set -e; \
echo "#ifndef __ASM_KVM_OFFSETS_H__"; \
echo "#define __ASM_KVM_OFFSETS_H__"; \
echo "/*"; \
echo " * DO NOT MODIFY."; \
echo " *"; \
echo " * This file was generated by Makefile"; \
echo " *"; \
echo " */"; \
echo ""; \
sed -ne $(sed-y) $<; \
echo ""; \
echo "#endif" ) > $@
endef
# We use internal rules to avoid the "is up to date" message from make
arch/ia64/kvm/asm-offsets.s: arch/ia64/kvm/asm-offsets.c
$(call if_changed_dep,cc_s_c)
$(obj)/$(offsets-file): arch/ia64/kvm/asm-offsets.s
$(call cmd,offsets)
#
# Makefile for Kernel-based Virtual Machine module
#
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/ia64/kvm/
$(addprefix $(objtree)/,$(obj)/memcpy.S $(obj)/memset.S):
$(shell ln -snf ../lib/memcpy.S $(src)/memcpy.S)
$(shell ln -snf ../lib/memset.S $(src)/memset.S)
common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o)
kvm-objs := $(common-objs) kvm-ia64.o kvm_fw.o
obj-$(CONFIG_KVM) += kvm.o
FORCE : $(obj)/$(offsets-file)
EXTRA_CFLAGS_vcpu.o += -mfixed-range=f2-f5,f12-f127
kvm-intel-objs = vmm.o vmm_ivt.o trampoline.o vcpu.o optvfault.o mmio.o \
vtlb.o process.o
#Add link memcpy and memset to avoid possible structure assignment error
kvm-intel-objs += memset.o memcpy.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o

251
arch/ia64/kvm/asm-offsets.c Normal file

@ -0,0 +1,251 @@
/*
* asm-offsets.c Generate definitions needed by assembly language modules.
* This code generates raw asm output which is post-processed
* to extract and format the required data.
*
* Anthony Xu <anthony.xu@intel.com>
* Xiantao Zhang <xiantao.zhang@intel.com>
* Copyright (c) 2007 Intel Corporation KVM support.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
#include <linux/autoconf.h>
#include <linux/kvm_host.h>
#include "vcpu.h"
#define task_struct kvm_vcpu
#define DEFINE(sym, val) \
asm volatile("\n->" #sym " (%0) " #val : : "i" (val))
#define BLANK() asm volatile("\n->" : :)
#define OFFSET(_sym, _str, _mem) \
DEFINE(_sym, offsetof(_str, _mem));
void foo(void)
{
DEFINE(VMM_TASK_SIZE, sizeof(struct kvm_vcpu));
DEFINE(VMM_PT_REGS_SIZE, sizeof(struct kvm_pt_regs));
BLANK();
DEFINE(VMM_VCPU_META_RR0_OFFSET,
offsetof(struct kvm_vcpu, arch.metaphysical_rr0));
DEFINE(VMM_VCPU_META_SAVED_RR0_OFFSET,
offsetof(struct kvm_vcpu,
arch.metaphysical_saved_rr0));
DEFINE(VMM_VCPU_VRR0_OFFSET,
offsetof(struct kvm_vcpu, arch.vrr[0]));
DEFINE(VMM_VPD_IRR0_OFFSET,
offsetof(struct vpd, irr[0]));
DEFINE(VMM_VCPU_ITC_CHECK_OFFSET,
offsetof(struct kvm_vcpu, arch.itc_check));
DEFINE(VMM_VCPU_IRQ_CHECK_OFFSET,
offsetof(struct kvm_vcpu, arch.irq_check));
DEFINE(VMM_VPD_VHPI_OFFSET,
offsetof(struct vpd, vhpi));
DEFINE(VMM_VCPU_VSA_BASE_OFFSET,
offsetof(struct kvm_vcpu, arch.vsa_base));
DEFINE(VMM_VCPU_VPD_OFFSET,
offsetof(struct kvm_vcpu, arch.vpd));
DEFINE(VMM_VCPU_IRQ_CHECK,
offsetof(struct kvm_vcpu, arch.irq_check));
DEFINE(VMM_VCPU_TIMER_PENDING,
offsetof(struct kvm_vcpu, arch.timer_pending));
DEFINE(VMM_VCPU_META_SAVED_RR0_OFFSET,
offsetof(struct kvm_vcpu, arch.metaphysical_saved_rr0));
DEFINE(VMM_VCPU_MODE_FLAGS_OFFSET,
offsetof(struct kvm_vcpu, arch.mode_flags));
DEFINE(VMM_VCPU_ITC_OFS_OFFSET,
offsetof(struct kvm_vcpu, arch.itc_offset));
DEFINE(VMM_VCPU_LAST_ITC_OFFSET,
offsetof(struct kvm_vcpu, arch.last_itc));
DEFINE(VMM_VCPU_SAVED_GP_OFFSET,
offsetof(struct kvm_vcpu, arch.saved_gp));
BLANK();
DEFINE(VMM_PT_REGS_B6_OFFSET,
offsetof(struct kvm_pt_regs, b6));
DEFINE(VMM_PT_REGS_B7_OFFSET,
offsetof(struct kvm_pt_regs, b7));
DEFINE(VMM_PT_REGS_AR_CSD_OFFSET,
offsetof(struct kvm_pt_regs, ar_csd));
DEFINE(VMM_PT_REGS_AR_SSD_OFFSET,
offsetof(struct kvm_pt_regs, ar_ssd));
DEFINE(VMM_PT_REGS_R8_OFFSET,
offsetof(struct kvm_pt_regs, r8));
DEFINE(VMM_PT_REGS_R9_OFFSET,
offsetof(struct kvm_pt_regs, r9));
DEFINE(VMM_PT_REGS_R10_OFFSET,
offsetof(struct kvm_pt_regs, r10));
DEFINE(VMM_PT_REGS_R11_OFFSET,
offsetof(struct kvm_pt_regs, r11));
DEFINE(VMM_PT_REGS_CR_IPSR_OFFSET,
offsetof(struct kvm_pt_regs, cr_ipsr));
DEFINE(VMM_PT_REGS_CR_IIP_OFFSET,
offsetof(struct kvm_pt_regs, cr_iip));
DEFINE(VMM_PT_REGS_CR_IFS_OFFSET,
offsetof(struct kvm_pt_regs, cr_ifs));
DEFINE(VMM_PT_REGS_AR_UNAT_OFFSET,
offsetof(struct kvm_pt_regs, ar_unat));
DEFINE(VMM_PT_REGS_AR_PFS_OFFSET,
offsetof(struct kvm_pt_regs, ar_pfs));
DEFINE(VMM_PT_REGS_AR_RSC_OFFSET,
offsetof(struct kvm_pt_regs, ar_rsc));
DEFINE(VMM_PT_REGS_AR_RNAT_OFFSET,
offsetof(struct kvm_pt_regs, ar_rnat));
DEFINE(VMM_PT_REGS_AR_BSPSTORE_OFFSET,
offsetof(struct kvm_pt_regs, ar_bspstore));
DEFINE(VMM_PT_REGS_PR_OFFSET,
offsetof(struct kvm_pt_regs, pr));
DEFINE(VMM_PT_REGS_B0_OFFSET,
offsetof(struct kvm_pt_regs, b0));
DEFINE(VMM_PT_REGS_LOADRS_OFFSET,
offsetof(struct kvm_pt_regs, loadrs));
DEFINE(VMM_PT_REGS_R1_OFFSET,
offsetof(struct kvm_pt_regs, r1));
DEFINE(VMM_PT_REGS_R12_OFFSET,
offsetof(struct kvm_pt_regs, r12));
DEFINE(VMM_PT_REGS_R13_OFFSET,
offsetof(struct kvm_pt_regs, r13));
DEFINE(VMM_PT_REGS_AR_FPSR_OFFSET,
offsetof(struct kvm_pt_regs, ar_fpsr));
DEFINE(VMM_PT_REGS_R15_OFFSET,
offsetof(struct kvm_pt_regs, r15));
DEFINE(VMM_PT_REGS_R14_OFFSET,
offsetof(struct kvm_pt_regs, r14));
DEFINE(VMM_PT_REGS_R2_OFFSET,
offsetof(struct kvm_pt_regs, r2));
DEFINE(VMM_PT_REGS_R3_OFFSET,
offsetof(struct kvm_pt_regs, r3));
DEFINE(VMM_PT_REGS_R16_OFFSET,
offsetof(struct kvm_pt_regs, r16));
DEFINE(VMM_PT_REGS_R17_OFFSET,
offsetof(struct kvm_pt_regs, r17));
DEFINE(VMM_PT_REGS_R18_OFFSET,
offsetof(struct kvm_pt_regs, r18));
DEFINE(VMM_PT_REGS_R19_OFFSET,
offsetof(struct kvm_pt_regs, r19));
DEFINE(VMM_PT_REGS_R20_OFFSET,
offsetof(struct kvm_pt_regs, r20));
DEFINE(VMM_PT_REGS_R21_OFFSET,
offsetof(struct kvm_pt_regs, r21));
DEFINE(VMM_PT_REGS_R22_OFFSET,
offsetof(struct kvm_pt_regs, r22));
DEFINE(VMM_PT_REGS_R23_OFFSET,
offsetof(struct kvm_pt_regs, r23));
DEFINE(VMM_PT_REGS_R24_OFFSET,
offsetof(struct kvm_pt_regs, r24));
DEFINE(VMM_PT_REGS_R25_OFFSET,
offsetof(struct kvm_pt_regs, r25));
DEFINE(VMM_PT_REGS_R26_OFFSET,
offsetof(struct kvm_pt_regs, r26));
DEFINE(VMM_PT_REGS_R27_OFFSET,
offsetof(struct kvm_pt_regs, r27));
DEFINE(VMM_PT_REGS_R28_OFFSET,
offsetof(struct kvm_pt_regs, r28));
DEFINE(VMM_PT_REGS_R29_OFFSET,
offsetof(struct kvm_pt_regs, r29));
DEFINE(VMM_PT_REGS_R30_OFFSET,
offsetof(struct kvm_pt_regs, r30));
DEFINE(VMM_PT_REGS_R31_OFFSET,
offsetof(struct kvm_pt_regs, r31));
DEFINE(VMM_PT_REGS_AR_CCV_OFFSET,
offsetof(struct kvm_pt_regs, ar_ccv));
DEFINE(VMM_PT_REGS_F6_OFFSET,
offsetof(struct kvm_pt_regs, f6));
DEFINE(VMM_PT_REGS_F7_OFFSET,
offsetof(struct kvm_pt_regs, f7));
DEFINE(VMM_PT_REGS_F8_OFFSET,
offsetof(struct kvm_pt_regs, f8));
DEFINE(VMM_PT_REGS_F9_OFFSET,
offsetof(struct kvm_pt_regs, f9));
DEFINE(VMM_PT_REGS_F10_OFFSET,
offsetof(struct kvm_pt_regs, f10));
DEFINE(VMM_PT_REGS_F11_OFFSET,
offsetof(struct kvm_pt_regs, f11));
DEFINE(VMM_PT_REGS_R4_OFFSET,
offsetof(struct kvm_pt_regs, r4));
DEFINE(VMM_PT_REGS_R5_OFFSET,
offsetof(struct kvm_pt_regs, r5));
DEFINE(VMM_PT_REGS_R6_OFFSET,
offsetof(struct kvm_pt_regs, r6));
DEFINE(VMM_PT_REGS_R7_OFFSET,
offsetof(struct kvm_pt_regs, r7));
DEFINE(VMM_PT_REGS_EML_UNAT_OFFSET,
offsetof(struct kvm_pt_regs, eml_unat));
DEFINE(VMM_VCPU_IIPA_OFFSET,
offsetof(struct kvm_vcpu, arch.cr_iipa));
DEFINE(VMM_VCPU_OPCODE_OFFSET,
offsetof(struct kvm_vcpu, arch.opcode));
DEFINE(VMM_VCPU_CAUSE_OFFSET, offsetof(struct kvm_vcpu, arch.cause));
DEFINE(VMM_VCPU_ISR_OFFSET,
offsetof(struct kvm_vcpu, arch.cr_isr));
DEFINE(VMM_PT_REGS_R16_SLOT,
(((offsetof(struct kvm_pt_regs, r16)
- sizeof(struct kvm_pt_regs)) >> 3) & 0x3f));
DEFINE(VMM_VCPU_MODE_FLAGS_OFFSET,
offsetof(struct kvm_vcpu, arch.mode_flags));
DEFINE(VMM_VCPU_GP_OFFSET, offsetof(struct kvm_vcpu, arch.__gp));
BLANK();
DEFINE(VMM_VPD_BASE_OFFSET, offsetof(struct kvm_vcpu, arch.vpd));
DEFINE(VMM_VPD_VIFS_OFFSET, offsetof(struct vpd, ifs));
DEFINE(VMM_VLSAPIC_INSVC_BASE_OFFSET,
offsetof(struct kvm_vcpu, arch.insvc[0]));
DEFINE(VMM_VPD_VPTA_OFFSET, offsetof(struct vpd, pta));
DEFINE(VMM_VPD_VPSR_OFFSET, offsetof(struct vpd, vpsr));
DEFINE(VMM_CTX_R4_OFFSET, offsetof(union context, gr[4]));
DEFINE(VMM_CTX_R5_OFFSET, offsetof(union context, gr[5]));
DEFINE(VMM_CTX_R12_OFFSET, offsetof(union context, gr[12]));
DEFINE(VMM_CTX_R13_OFFSET, offsetof(union context, gr[13]));
DEFINE(VMM_CTX_KR0_OFFSET, offsetof(union context, ar[0]));
DEFINE(VMM_CTX_KR1_OFFSET, offsetof(union context, ar[1]));
DEFINE(VMM_CTX_B0_OFFSET, offsetof(union context, br[0]));
DEFINE(VMM_CTX_B1_OFFSET, offsetof(union context, br[1]));
DEFINE(VMM_CTX_B2_OFFSET, offsetof(union context, br[2]));
DEFINE(VMM_CTX_RR0_OFFSET, offsetof(union context, rr[0]));
DEFINE(VMM_CTX_RSC_OFFSET, offsetof(union context, ar[16]));
DEFINE(VMM_CTX_BSPSTORE_OFFSET, offsetof(union context, ar[18]));
DEFINE(VMM_CTX_RNAT_OFFSET, offsetof(union context, ar[19]));
DEFINE(VMM_CTX_FCR_OFFSET, offsetof(union context, ar[21]));
DEFINE(VMM_CTX_EFLAG_OFFSET, offsetof(union context, ar[24]));
DEFINE(VMM_CTX_CFLG_OFFSET, offsetof(union context, ar[27]));
DEFINE(VMM_CTX_FSR_OFFSET, offsetof(union context, ar[28]));
DEFINE(VMM_CTX_FIR_OFFSET, offsetof(union context, ar[29]));
DEFINE(VMM_CTX_FDR_OFFSET, offsetof(union context, ar[30]));
DEFINE(VMM_CTX_UNAT_OFFSET, offsetof(union context, ar[36]));
DEFINE(VMM_CTX_FPSR_OFFSET, offsetof(union context, ar[40]));
DEFINE(VMM_CTX_PFS_OFFSET, offsetof(union context, ar[64]));
DEFINE(VMM_CTX_LC_OFFSET, offsetof(union context, ar[65]));
DEFINE(VMM_CTX_DCR_OFFSET, offsetof(union context, cr[0]));
DEFINE(VMM_CTX_IVA_OFFSET, offsetof(union context, cr[2]));
DEFINE(VMM_CTX_PTA_OFFSET, offsetof(union context, cr[8]));
DEFINE(VMM_CTX_IBR0_OFFSET, offsetof(union context, ibr[0]));
DEFINE(VMM_CTX_DBR0_OFFSET, offsetof(union context, dbr[0]));
DEFINE(VMM_CTX_F2_OFFSET, offsetof(union context, fr[2]));
DEFINE(VMM_CTX_F3_OFFSET, offsetof(union context, fr[3]));
DEFINE(VMM_CTX_F32_OFFSET, offsetof(union context, fr[32]));
DEFINE(VMM_CTX_F33_OFFSET, offsetof(union context, fr[33]));
DEFINE(VMM_CTX_PKR0_OFFSET, offsetof(union context, pkr[0]));
DEFINE(VMM_CTX_PSR_OFFSET, offsetof(union context, psr));
BLANK();
}
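
The DEFINE() macro above makes the compiler emit marker lines of the form
"->SYMBOL (value) expression" into asm-offsets.s, and the sed rule in
arch/ia64/kvm/Makefile rewrites each of them into a #define for asm-offsets.h.
The following standalone C function mimics that rewrite so the transformation
is easier to follow. It is a sketch, not part of the build: the function name
is hypothetical, and it only covers the line shapes produced by DEFINE() and
BLANK().

```c
#include <stdio.h>
#include <string.h>

/*
 * Rewrite one "->SYM VALUE COMMENT" marker line into
 * "#define SYM VALUE /" "* COMMENT *" "/" form, dropping an optional '$'
 * or '#' prefix in front of VALUE. A bare "->" (from BLANK()) becomes an
 * empty line. Returns 1 if the line was a marker line, 0 otherwise.
 */
int offsets_line_to_define(const char *line, char *out, size_t outsz)
{
	const char *sym, *val, *comment;
	size_t symlen, vallen;

	if (strncmp(line, "->", 2) != 0)
		return 0;	/* sed only prints lines starting with "->" */

	sym = line + 2;
	symlen = strcspn(sym, " ");
	if (sym[symlen] != ' ') {
		/* No value field: just strip the marker. */
		snprintf(out, outsz, "%s", sym);
		return 1;
	}

	val = sym + symlen + 1;
	val += strspn(val, "$#");	/* skip immediate-operand prefixes */
	vallen = strcspn(val, " ");
	if (val[vallen] != ' ') {
		/* No comment field: just strip the marker. */
		snprintf(out, outsz, "%s", sym);
		return 1;
	}

	comment = val + vallen + 1;
	snprintf(out, outsz, "#define %.*s %.*s /* %s */",
		 (int)symlen, sym, (int)vallen, val, comment);
	return 1;
}
```

Fed the line "->VMM_TASK_SIZE (1234) sizeof(struct kvm_vcpu)", where 1234 is a
made-up value, the function produces the same kind of line an assembly source
would then pick up from asm-offsets.h.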

1806
arch/ia64/kvm/kvm-ia64.c Normal file

File diff suppressed because it is too large

500
arch/ia64/kvm/kvm_fw.c Normal file

@ -0,0 +1,500 @@
/*
* PAL/SAL call delegation
*
* Copyright (c) 2004 Li Susie <susie.li@intel.com>
* Copyright (c) 2005 Yu Ke <ke.yu@intel.com>
* Copyright (c) 2007 Xiantao Zhang <xiantao.zhang@intel.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*/
#include <linux/kvm_host.h>
#include <linux/smp.h>
#include "vti.h"
#include "misc.h"
#include <asm/pal.h>
#include <asm/sal.h>
#include <asm/tlb.h>
/*
* Handy macros to make sure that the PAL return values start out
* as something meaningful.
*/
#define INIT_PAL_STATUS_UNIMPLEMENTED(x) \
{ \
x.status = PAL_STATUS_UNIMPLEMENTED; \
x.v0 = 0; \
x.v1 = 0; \
x.v2 = 0; \
}
#define INIT_PAL_STATUS_SUCCESS(x) \
{ \
x.status = PAL_STATUS_SUCCESS; \
x.v0 = 0; \
x.v1 = 0; \
x.v2 = 0; \
}
static void kvm_get_pal_call_data(struct kvm_vcpu *vcpu,
u64 *gr28, u64 *gr29, u64 *gr30, u64 *gr31) {
struct exit_ctl_data *p;
if (vcpu) {
p = &vcpu->arch.exit_data;
if (p->exit_reason == EXIT_REASON_PAL_CALL) {
*gr28 = p->u.pal_data.gr28;
*gr29 = p->u.pal_data.gr29;
*gr30 = p->u.pal_data.gr30;
*gr31 = p->u.pal_data.gr31;
return ;
}
}
printk(KERN_DEBUG"Failed to get vcpu pal data!!!\n");
}
static void set_pal_result(struct kvm_vcpu *vcpu,
struct ia64_pal_retval result) {
struct exit_ctl_data *p;
p = kvm_get_exit_data(vcpu);
if (p && p->exit_reason == EXIT_REASON_PAL_CALL) {
p->u.pal_data.ret = result;
return ;
}
INIT_PAL_STATUS_UNIMPLEMENTED(p->u.pal_data.ret);
}
static void set_sal_result(struct kvm_vcpu *vcpu,
struct sal_ret_values result) {
struct exit_ctl_data *p;
p = kvm_get_exit_data(vcpu);
if (p && p->exit_reason == EXIT_REASON_SAL_CALL) {
p->u.sal_data.ret = result;
return ;
}
printk(KERN_WARNING"Failed to set sal result!!\n");
}
struct cache_flush_args {
u64 cache_type;
u64 operation;
u64 progress;
long status;
};
cpumask_t cpu_cache_coherent_map;
static void remote_pal_cache_flush(void *data)
{
struct cache_flush_args *args = data;
long status;
u64 progress = args->progress;
status = ia64_pal_cache_flush(args->cache_type, args->operation,
&progress, NULL);
if (status != 0)
args->status = status;
}
static struct ia64_pal_retval pal_cache_flush(struct kvm_vcpu *vcpu)
{
u64 gr28, gr29, gr30, gr31;
struct ia64_pal_retval result = {0, 0, 0, 0};
struct cache_flush_args args = {0, 0, 0, 0};
long psr;
gr28 = gr29 = gr30 = gr31 = 0;
kvm_get_pal_call_data(vcpu, &gr28, &gr29, &gr30, &gr31);
if (gr31 != 0)
printk(KERN_ERR"vcpu:%p called cache_flush error!\n", vcpu);
/* Always call Host Pal in int=1 */
gr30 &= ~PAL_CACHE_FLUSH_CHK_INTRS;
args.cache_type = gr29;
args.operation = gr30;
smp_call_function(remote_pal_cache_flush,
(void *)&args, 1, 1);
if (args.status != 0)
printk(KERN_ERR"pal_cache_flush error!,"
"status:0x%lx\n", args.status);
/*
* Call Host PAL cache flush
* Clear psr.ic when call PAL_CACHE_FLUSH
*/
local_irq_save(psr);
result.status = ia64_pal_cache_flush(gr29, gr30, &result.v1,
&result.v0);
local_irq_restore(psr);
if (result.status != 0)
printk(KERN_ERR"vcpu:%p crashed due to cache_flush err:%ld"
"in1:%lx,in2:%lx\n",
vcpu, result.status, gr29, gr30);
#if 0
if (gr29 == PAL_CACHE_TYPE_COHERENT) {
cpus_setall(vcpu->arch.cache_coherent_map);
cpu_clear(vcpu->cpu, vcpu->arch.cache_coherent_map);
cpus_setall(cpu_cache_coherent_map);
cpu_clear(vcpu->cpu, cpu_cache_coherent_map);
}
#endif
return result;
}
struct ia64_pal_retval pal_cache_summary(struct kvm_vcpu *vcpu)
{
struct ia64_pal_retval result;
PAL_CALL(result, PAL_CACHE_SUMMARY, 0, 0, 0);
return result;
}
static struct ia64_pal_retval pal_freq_base(struct kvm_vcpu *vcpu)
{
struct ia64_pal_retval result;
PAL_CALL(result, PAL_FREQ_BASE, 0, 0, 0);
/*
* PAL_FREQ_BASE may not be implemented in some platforms,
* call SAL instead.
*/
if (result.v0 == 0) {
result.status = ia64_sal_freq_base(SAL_FREQ_BASE_PLATFORM,
&result.v0,
&result.v1);
result.v2 = 0;
}
return result;
}
static struct ia64_pal_retval pal_freq_ratios(struct kvm_vcpu *vcpu)
{
struct ia64_pal_retval result;
PAL_CALL(result, PAL_FREQ_RATIOS, 0, 0, 0);
return result;
}
static struct ia64_pal_retval pal_logical_to_physica(struct kvm_vcpu *vcpu)
{
struct ia64_pal_retval result;
INIT_PAL_STATUS_UNIMPLEMENTED(result);
return result;
}
static struct ia64_pal_retval pal_platform_addr(struct kvm_vcpu *vcpu)
{
struct ia64_pal_retval result;
INIT_PAL_STATUS_SUCCESS(result);
return result;
}
static struct ia64_pal_retval pal_proc_get_features(struct kvm_vcpu *vcpu)
{
struct ia64_pal_retval result = {0, 0, 0, 0};
long in0, in1, in2, in3;
kvm_get_pal_call_data(vcpu, &in0, &in1, &in2, &in3);
result.status = ia64_pal_proc_get_features(&result.v0, &result.v1,
&result.v2, in2);
return result;
}
static struct ia64_pal_retval pal_cache_info(struct kvm_vcpu *vcpu)
{
pal_cache_config_info_t ci;
long status;
unsigned long in0, in1, in2, in3, r9, r10;
kvm_get_pal_call_data(vcpu, &in0, &in1, &in2, &in3);
status = ia64_pal_cache_config_info(in1, in2, &ci);
r9 = ci.pcci_info_1.pcci1_data;
r10 = ci.pcci_info_2.pcci2_data;
return ((struct ia64_pal_retval){status, r9, r10, 0});
}
#define GUEST_IMPL_VA_MSB 59
#define GUEST_RID_BITS 18
static struct ia64_pal_retval pal_vm_summary(struct kvm_vcpu *vcpu)
{
pal_vm_info_1_u_t vminfo1;
pal_vm_info_2_u_t vminfo2;
struct ia64_pal_retval result;
PAL_CALL(result, PAL_VM_SUMMARY, 0, 0, 0);
if (!result.status) {
vminfo1.pvi1_val = result.v0;
vminfo1.pal_vm_info_1_s.max_itr_entry = 8;
vminfo1.pal_vm_info_1_s.max_dtr_entry = 8;
result.v0 = vminfo1.pvi1_val;
vminfo2.pal_vm_info_2_s.impl_va_msb = GUEST_IMPL_VA_MSB;
vminfo2.pal_vm_info_2_s.rid_size = GUEST_RID_BITS;
result.v1 = vminfo2.pvi2_val;
}
return result;
}
static struct ia64_pal_retval pal_vm_info(struct kvm_vcpu *vcpu)
{
struct ia64_pal_retval result;
INIT_PAL_STATUS_UNIMPLEMENTED(result);
return result;
}
static u64 kvm_get_pal_call_index(struct kvm_vcpu *vcpu)
{
u64 index = 0;
struct exit_ctl_data *p;
p = kvm_get_exit_data(vcpu);
if (p && (p->exit_reason == EXIT_REASON_PAL_CALL))
index = p->u.pal_data.gr28;
return index;
}
int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
u64 gr28;
struct ia64_pal_retval result;
int ret = 1;
gr28 = kvm_get_pal_call_index(vcpu);
/*printk("pal_call index:%lx\n",gr28);*/
switch (gr28) {
case PAL_CACHE_FLUSH:
result = pal_cache_flush(vcpu);
break;
case PAL_CACHE_SUMMARY:
result = pal_cache_summary(vcpu);
break;
case PAL_HALT_LIGHT:
{
vcpu->arch.timer_pending = 1;
INIT_PAL_STATUS_SUCCESS(result);
if (kvm_highest_pending_irq(vcpu) == -1)
ret = kvm_emulate_halt(vcpu);
}
break;
case PAL_FREQ_RATIOS:
result = pal_freq_ratios(vcpu);
break;
case PAL_FREQ_BASE:
result = pal_freq_base(vcpu);
break;
case PAL_LOGICAL_TO_PHYSICAL :
result = pal_logical_to_physica(vcpu);
break;
case PAL_VM_SUMMARY :
result = pal_vm_summary(vcpu);
break;
case PAL_VM_INFO :
result = pal_vm_info(vcpu);
break;
case PAL_PLATFORM_ADDR :
result = pal_platform_addr(vcpu);
break;
case PAL_CACHE_INFO:
result = pal_cache_info(vcpu);
break;
case PAL_PTCE_INFO:
INIT_PAL_STATUS_SUCCESS(result);
result.v1 = (1L << 32) | 1L;
break;
case PAL_VM_PAGE_SIZE:
result.status = ia64_pal_vm_page_size(&result.v0,
&result.v1);
break;
case PAL_RSE_INFO:
result.status = ia64_pal_rse_info(&result.v0,
(pal_hints_u_t *)&result.v1);
break;
case PAL_PROC_GET_FEATURES:
result = pal_proc_get_features(vcpu);
break;
case PAL_DEBUG_INFO:
result.status = ia64_pal_debug_info(&result.v0,
&result.v1);
break;
case PAL_VERSION:
result.status = ia64_pal_version(
(pal_version_u_t *)&result.v0,
(pal_version_u_t *)&result.v1);
break;
case PAL_FIXED_ADDR:
result.status = PAL_STATUS_SUCCESS;
result.v0 = vcpu->vcpu_id;
break;
default:
INIT_PAL_STATUS_UNIMPLEMENTED(result);
printk(KERN_WARNING"kvm: Unsupported pal call,"
" index:0x%lx\n", gr28);
}
set_pal_result(vcpu, result);
return ret;
}
static struct sal_ret_values sal_emulator(struct kvm *kvm,
long index, unsigned long in1,
unsigned long in2, unsigned long in3,
unsigned long in4, unsigned long in5,
unsigned long in6, unsigned long in7)
{
unsigned long r9 = 0;
unsigned long r10 = 0;
long r11 = 0;
long status;
status = 0;
switch (index) {
case SAL_FREQ_BASE:
status = ia64_sal_freq_base(in1, &r9, &r10);
break;
case SAL_PCI_CONFIG_READ:
printk(KERN_WARNING"kvm: Not allowed to call here!"
" SAL_PCI_CONFIG_READ\n");
break;
case SAL_PCI_CONFIG_WRITE:
printk(KERN_WARNING"kvm: Not allowed to call here!"
" SAL_PCI_CONFIG_WRITE\n");
break;
case SAL_SET_VECTORS:
if (in1 == SAL_VECTOR_OS_BOOT_RENDEZ) {
if (in4 != 0 || in5 != 0 || in6 != 0 || in7 != 0) {
status = -2;
} else {
kvm->arch.rdv_sal_data.boot_ip = in2;
kvm->arch.rdv_sal_data.boot_gp = in3;
}
printk("Rendezvous called! iip:%lx\n\n", in2);
} else
printk(KERN_WARNING "kvm: CALLED SAL_SET_VECTORS %lu. "
"ignored...\n", in1);
break;
case SAL_GET_STATE_INFO:
/* No more info. */
status = -5;
r9 = 0;
break;
case SAL_GET_STATE_INFO_SIZE:
/* Return a dummy size. */
status = 0;
r9 = 128;
break;
case SAL_CLEAR_STATE_INFO:
/* Noop. */
break;
case SAL_MC_RENDEZ:
printk(KERN_WARNING
"kvm: called SAL_MC_RENDEZ. ignored...\n");
break;
case SAL_MC_SET_PARAMS:
printk(KERN_WARNING
"kvm: called SAL_MC_SET_PARAMS. ignored!\n");
break;
case SAL_CACHE_FLUSH:
if (1) {
/*
 * Flush using SAL. This method is faster, but it has
 * a side effect on other vcpus running on this cpu.
 */
status = ia64_sal_cache_flush(in1);
} else {
/*
 * A method without that side effect may still need
 * to be implemented.
 */
status = 0;
}
break;
case SAL_CACHE_INIT:
printk(KERN_WARNING
"kvm: called SAL_CACHE_INIT. ignored...\n");
break;
case SAL_UPDATE_PAL:
printk(KERN_WARNING
"kvm: CALLED SAL_UPDATE_PAL. ignored...\n");
break;
default:
printk(KERN_WARNING"kvm: called SAL_CALL with unknown index."
" index:%ld\n", index);
status = -1;
break;
}
return ((struct sal_ret_values) {status, r9, r10, r11});
}
static void kvm_get_sal_call_data(struct kvm_vcpu *vcpu, u64 *in0, u64 *in1,
u64 *in2, u64 *in3, u64 *in4, u64 *in5, u64 *in6, u64 *in7)
{
struct exit_ctl_data *p;
p = kvm_get_exit_data(vcpu);
if (p) {
if (p->exit_reason == EXIT_REASON_SAL_CALL) {
*in0 = p->u.sal_data.in0;
*in1 = p->u.sal_data.in1;
*in2 = p->u.sal_data.in2;
*in3 = p->u.sal_data.in3;
*in4 = p->u.sal_data.in4;
*in5 = p->u.sal_data.in5;
*in6 = p->u.sal_data.in6;
*in7 = p->u.sal_data.in7;
return ;
}
}
*in0 = 0;
}
void kvm_sal_emul(struct kvm_vcpu *vcpu)
{
struct sal_ret_values result;
u64 index, in1, in2, in3, in4, in5, in6, in7;
kvm_get_sal_call_data(vcpu, &index, &in1, &in2,
&in3, &in4, &in5, &in6, &in7);
result = sal_emulator(vcpu->kvm, index, in1, in2, in3,
in4, in5, in6, in7);
set_sal_result(vcpu, result);
}
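
The SAL_SET_VECTORS case in sal_emulator() above can be illustrated in isolation. The following is a minimal sketch, not the kernel code: every `demo_`/`DEMO_` name is hypothetical, and the vector-type constant is an assumed stand-in for the value the kernel's SAL header provides. It mirrors only the visible logic: a boot-rendezvous request is rejected with status -2 when any of the trailing arguments is non-zero, otherwise the entry point and global pointer are recorded; other vector types are ignored with status 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SAL_VECTOR_OS_BOOT_RENDEZ (assumed value). */
#define DEMO_SAL_VECTOR_OS_BOOT_RENDEZ 2

/* Hypothetical mirror of the two fields kept in kvm->arch.rdv_sal_data. */
struct demo_rdv_data {
	uint64_t boot_ip;
	uint64_t boot_gp;
};

/*
 * Sketch of the SAL_SET_VECTORS handling: extra[] stands for in4..in7.
 * Returns 0 on success or when the request is ignored, -2 on bad arguments.
 */
long demo_set_vectors(struct demo_rdv_data *rdv, uint64_t type,
		      uint64_t ip, uint64_t gp, const uint64_t extra[4])
{
	assert(rdv && extra);

	if (type != DEMO_SAL_VECTOR_OS_BOOT_RENDEZ)
		return 0;	/* other vector types are ignored */

	if (extra[0] || extra[1] || extra[2] || extra[3])
		return -2;	/* invalid argument, nothing is stored */

	rdv->boot_ip = ip;
	rdv->boot_gp = gp;
	return 0;
}
```

Keeping the validation ahead of the stores means a rejected call leaves a previously registered rendezvous entry untouched, which is what the original switch case does as well.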


@ -0,0 +1,273 @@
/*
* kvm_minstate.h: min save macros
* Copyright (c) 2007, Intel Corporation.
*
* Xuefei Xu (Anthony Xu) (Anthony.xu@intel.com)
* Xiantao Zhang (xiantao.zhang@intel.com)
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
#include <asm/asmmacro.h>
#include <asm/types.h>
#include <asm/kregs.h>
#include "asm-offsets.h"
#define KVM_MINSTATE_START_SAVE_MIN \
mov ar.rsc = 0;/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */\
;; \
mov.m r28 = ar.rnat; \
addl r22 = VMM_RBS_OFFSET,r1; /* compute base of RBS */ \
;; \
lfetch.fault.excl.nt1 [r22]; \
addl r1 = IA64_STK_OFFSET-VMM_PT_REGS_SIZE,r1; /* compute base of memory stack */ \
mov r23 = ar.bspstore; /* save ar.bspstore */ \
;; \
mov ar.bspstore = r22; /* switch to kernel RBS */\
;; \
mov r18 = ar.bsp; \
mov ar.rsc = 0x3; /* set eager mode, pl 0, little-endian, loadrs=0 */
#define KVM_MINSTATE_END_SAVE_MIN \
bsw.1; /* switch back to bank 1 (must be last in insn group) */\
;;
#define PAL_VSA_SYNC_READ \
/* begin to call pal vps sync_read */ \
add r25 = VMM_VPD_BASE_OFFSET, r21; \
adds r20 = VMM_VCPU_VSA_BASE_OFFSET, r21; /* entry point */ \
;; \
ld8 r25 = [r25]; /* read vpd base */ \
ld8 r20 = [r20]; \
;; \
add r20 = PAL_VPS_SYNC_READ,r20; \
;; \
{ .mii; \
nop 0x0; \
mov r24 = ip; \
mov b0 = r20; \
;; \
}; \
{ .mmb; \
add r24 = 0x20, r24; \
nop 0x0; \
br.cond.sptk b0; /* call the service */ \
;; \
};
#define KVM_MINSTATE_GET_CURRENT(reg) mov reg=r21
/*
* KVM_DO_SAVE_MIN switches to the kernel stacks (if necessary) and saves
* the minimum state necessary that allows us to turn psr.ic back
* on.
*
* Assumed state upon entry:
* psr.ic: off
* r31: contains saved predicates (pr)
*
* Upon exit, the state is as follows:
* psr.ic: off
* r2 = points to &pt_regs.r16
* r8 = contents of ar.ccv
* r9 = contents of ar.csd
* r10 = contents of ar.ssd
* r11 = FPSR_DEFAULT
* r12 = kernel sp (kernel virtual address)
* r13 = points to current task_struct (kernel virtual address)
* p15 = TRUE if psr.i is set in cr.ipsr
* predicate registers (other than p2, p3, and p15), b6, r3, r14, r15:
* preserved
*
* Note that psr.ic is NOT turned on by this macro. This is so that
* we can pass interruption state as arguments to a handler.
*/
#define PT(f) (VMM_PT_REGS_##f##_OFFSET)
#define KVM_DO_SAVE_MIN(COVER,SAVE_IFS,EXTRA) \
KVM_MINSTATE_GET_CURRENT(r16); /* M (or M;;I) */ \
mov r27 = ar.rsc; /* M */ \
mov r20 = r1; /* A */ \
mov r25 = ar.unat; /* M */ \
mov r29 = cr.ipsr; /* M */ \
mov r26 = ar.pfs; /* I */ \
mov r18 = cr.isr; \
COVER; /* B;; (or nothing) */ \
;; \
tbit.z p0,p15 = r29,IA64_PSR_I_BIT; \
mov r1 = r16; \
/* mov r21=r16; */ \
/* switch from user to kernel RBS: */ \
;; \
invala; /* M */ \
SAVE_IFS; \
;; \
KVM_MINSTATE_START_SAVE_MIN \
adds r17 = 2*L1_CACHE_BYTES,r1;/* cache-line size */ \
adds r16 = PT(CR_IPSR),r1; \
;; \
lfetch.fault.excl.nt1 [r17],L1_CACHE_BYTES; \
st8 [r16] = r29; /* save cr.ipsr */ \
;; \
lfetch.fault.excl.nt1 [r17]; \
tbit.nz p15,p0 = r29,IA64_PSR_I_BIT; \
mov r29 = b0 \
;; \
adds r16 = PT(R8),r1; /* initialize first base pointer */\
adds r17 = PT(R9),r1; /* initialize second base pointer */\
;; \
.mem.offset 0,0; st8.spill [r16] = r8,16; \
.mem.offset 8,0; st8.spill [r17] = r9,16; \
;; \
.mem.offset 0,0; st8.spill [r16] = r10,24; \
.mem.offset 8,0; st8.spill [r17] = r11,24; \
;; \
mov r9 = cr.iip; /* M */ \
mov r10 = ar.fpsr; /* M */ \
;; \
st8 [r16] = r9,16; /* save cr.iip */ \
st8 [r17] = r30,16; /* save cr.ifs */ \
sub r18 = r18,r22; /* r18=RSE.ndirty*8 */ \
;; \
st8 [r16] = r25,16; /* save ar.unat */ \
st8 [r17] = r26,16; /* save ar.pfs */ \
shl r18 = r18,16; /* compute ar.rsc value used for "loadrs" */\
;; \
st8 [r16] = r27,16; /* save ar.rsc */ \
st8 [r17] = r28,16; /* save ar.rnat */ \
;; /* avoid RAW on r16 & r17 */ \
st8 [r16] = r23,16; /* save ar.bspstore */ \
st8 [r17] = r31,16; /* save predicates */ \
;; \
st8 [r16] = r29,16; /* save b0 */ \
st8 [r17] = r18,16; /* save ar.rsc value for "loadrs" */\
;; \
.mem.offset 0,0; st8.spill [r16] = r20,16;/* save original r1 */ \
.mem.offset 8,0; st8.spill [r17] = r12,16; \
adds r12 = -16,r1; /* switch to kernel memory stack */ \
;; \
.mem.offset 0,0; st8.spill [r16] = r13,16; \
.mem.offset 8,0; st8.spill [r17] = r10,16; /* save ar.fpsr */\
mov r13 = r21; /* establish `current' */ \
;; \
.mem.offset 0,0; st8.spill [r16] = r15,16; \
.mem.offset 8,0; st8.spill [r17] = r14,16; \
;; \
.mem.offset 0,0; st8.spill [r16] = r2,16; \
.mem.offset 8,0; st8.spill [r17] = r3,16; \
adds r2 = VMM_PT_REGS_R16_OFFSET,r1; \
;; \
adds r16 = VMM_VCPU_IIPA_OFFSET,r13; \
adds r17 = VMM_VCPU_ISR_OFFSET,r13; \
mov r26 = cr.iipa; \
mov r27 = cr.isr; \
;; \
st8 [r16] = r26; \
st8 [r17] = r27; \
;; \
EXTRA; \
mov r8 = ar.ccv; \
mov r9 = ar.csd; \
mov r10 = ar.ssd; \
movl r11 = FPSR_DEFAULT; /* L-unit */ \
adds r17 = VMM_VCPU_GP_OFFSET,r13; \
;; \
ld8 r1 = [r17];/* establish kernel global pointer */ \
;; \
PAL_VSA_SYNC_READ \
KVM_MINSTATE_END_SAVE_MIN
/*
* KVM_SAVE_REST saves the remainder of pt_regs (with psr.ic on).
*
* Assumed state upon entry:
* psr.ic: on
* r2: points to &pt_regs.r16
* r3: points to &pt_regs.r17
* r8: contents of ar.ccv
* r9: contents of ar.csd
* r10: contents of ar.ssd
* r11: FPSR_DEFAULT
*
* Registers r14 and r15 are guaranteed not to be touched by KVM_SAVE_REST.
*/
#define KVM_SAVE_REST \
.mem.offset 0,0; st8.spill [r2] = r16,16; \
.mem.offset 8,0; st8.spill [r3] = r17,16; \
;; \
.mem.offset 0,0; st8.spill [r2] = r18,16; \
.mem.offset 8,0; st8.spill [r3] = r19,16; \
;; \
.mem.offset 0,0; st8.spill [r2] = r20,16; \
.mem.offset 8,0; st8.spill [r3] = r21,16; \
mov r18=b6; \
;; \
.mem.offset 0,0; st8.spill [r2] = r22,16; \
.mem.offset 8,0; st8.spill [r3] = r23,16; \
mov r19 = b7; \
;; \
.mem.offset 0,0; st8.spill [r2] = r24,16; \
.mem.offset 8,0; st8.spill [r3] = r25,16; \
;; \
.mem.offset 0,0; st8.spill [r2] = r26,16; \
.mem.offset 8,0; st8.spill [r3] = r27,16; \
;; \
.mem.offset 0,0; st8.spill [r2] = r28,16; \
.mem.offset 8,0; st8.spill [r3] = r29,16; \
;; \
.mem.offset 0,0; st8.spill [r2] = r30,16; \
.mem.offset 8,0; st8.spill [r3] = r31,32; \
;; \
mov ar.fpsr = r11; \
st8 [r2] = r8,8; \
adds r24 = PT(B6)-PT(F7),r3; \
adds r25 = PT(B7)-PT(F7),r3; \
;; \
st8 [r24] = r18,16; /* b6 */ \
st8 [r25] = r19,16; /* b7 */ \
adds r2 = PT(R4)-PT(F6),r2; \
adds r3 = PT(R5)-PT(F7),r3; \
;; \
st8 [r24] = r9; /* ar.csd */ \
st8 [r25] = r10; /* ar.ssd */ \
;; \
mov r18 = ar.unat; \
adds r19 = PT(EML_UNAT)-PT(R4),r2; \
;; \
st8 [r19] = r18; /* eml_unat */
#define KVM_SAVE_EXTRA \
.mem.offset 0,0; st8.spill [r2] = r4,16; \
.mem.offset 8,0; st8.spill [r3] = r5,16; \
;; \
.mem.offset 0,0; st8.spill [r2] = r6,16; \
.mem.offset 8,0; st8.spill [r3] = r7; \
;; \
mov r26 = ar.unat; \
;; \
st8 [r2] = r26;/* eml_unat */
#define KVM_SAVE_MIN_WITH_COVER KVM_DO_SAVE_MIN(cover, mov r30 = cr.ifs,)
#define KVM_SAVE_MIN_WITH_COVER_R19 KVM_DO_SAVE_MIN(cover, mov r30 = cr.ifs, mov r15 = r19)
#define KVM_SAVE_MIN KVM_DO_SAVE_MIN( , mov r30 = r0, )
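
The "loadrs" value that KVM_DO_SAVE_MIN stores into pt_regs is derived in two steps: `sub r18 = r18,r22` yields the number of dirty bytes on the register backing store, and `shl r18 = r18,16` moves that count into the position of the loadrs field of ar.rsc. A minimal C sketch of the same arithmetic follows; the function name is hypothetical, and the statement that loadrs starts at bit 16 is taken from the shift in the macro rather than from any definition in this file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the computation done on r18 in KVM_DO_SAVE_MIN.
 * bsp      : value read from ar.bsp after switching to the VMM RBS
 * rbs_base : base of the VMM register backing store (r22 in the macro)
 * Returns the dirty byte count shifted into the ar.rsc "loadrs" position.
 */
uint64_t demo_rsc_for_loadrs(uint64_t bsp, uint64_t rbs_base)
{
	uint64_t dirty_bytes;

	assert(bsp >= rbs_base);
	dirty_bytes = bsp - rbs_base;	/* RSE.ndirty * 8 */
	return dirty_bytes << 16;	/* place into the loadrs field */
}
```

For example, eight dirty registers (64 bytes) give `0x40 << 16`, that is `0x400000`.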

25
arch/ia64/kvm/lapic.h Normal file

@ -0,0 +1,25 @@
#ifndef __KVM_IA64_LAPIC_H
#define __KVM_IA64_LAPIC_H
#include <linux/kvm_host.h>
/*
* vlsapic
*/
struct kvm_lapic{
struct kvm_vcpu *vcpu;
uint64_t insvc[4];
uint64_t vhpi;
uint8_t xtp;
uint8_t pal_init_pending;
uint8_t pad[2];
};
int kvm_create_lapic(struct kvm_vcpu *vcpu);
void kvm_free_lapic(struct kvm_vcpu *vcpu);
int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
int kvm_apic_set_irq(struct kvm_vcpu *vcpu, u8 vec, u8 trig);
#endif

93
arch/ia64/kvm/misc.h Normal file

@ -0,0 +1,93 @@
#ifndef __KVM_IA64_MISC_H
#define __KVM_IA64_MISC_H
#include <linux/kvm_host.h>
/*
* misc.h
* Copyright (C) 2007, Intel Corporation.
* Xiantao Zhang (xiantao.zhang@intel.com)
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
/*
 * Return the base address of the p2m table on the host side.
 */
static inline uint64_t *kvm_host_get_pmt(struct kvm *kvm)
{
return (uint64_t *)(kvm->arch.vm_base + KVM_P2M_OFS);
}
static inline void kvm_set_pmt_entry(struct kvm *kvm, gfn_t gfn,
u64 paddr, u64 mem_flags)
{
uint64_t *pmt_base = kvm_host_get_pmt(kvm);
unsigned long pte;
pte = PAGE_ALIGN(paddr) | mem_flags;
pmt_base[gfn] = pte;
}
/*Function for translating host address to guest address*/
static inline void *to_guest(struct kvm *kvm, void *addr)
{
return (void *)((unsigned long)(addr) - kvm->arch.vm_base +
KVM_VM_DATA_BASE);
}
/*Function for translating guest address to host address*/
static inline void *to_host(struct kvm *kvm, void *addr)
{
return (void *)((unsigned long)addr - KVM_VM_DATA_BASE
+ kvm->arch.vm_base);
}
/* Get host context of the vcpu */
static inline union context *kvm_get_host_context(struct kvm_vcpu *vcpu)
{
union context *ctx = &vcpu->arch.host;
return to_guest(vcpu->kvm, ctx);
}
/* Get guest context of the vcpu */
static inline union context *kvm_get_guest_context(struct kvm_vcpu *vcpu)
{
union context *ctx = &vcpu->arch.guest;
return to_guest(vcpu->kvm, ctx);
}
/* Get the exit data that the guest VMM (gvmm) filled in for this vcpu */
static inline struct exit_ctl_data *kvm_get_exit_data(struct kvm_vcpu *vcpu)
{
return &vcpu->arch.exit_data;
}
/*kvm get vcpu ioreq for kvm module!*/
static inline struct kvm_mmio_req *kvm_get_vcpu_ioreq(struct kvm_vcpu *vcpu)
{
struct exit_ctl_data *p_ctl_data;
if (vcpu) {
p_ctl_data = kvm_get_exit_data(vcpu);
if (p_ctl_data->exit_reason == EXIT_REASON_MMIO_INSTRUCTION)
return &p_ctl_data->u.ioreq;
}
return NULL;
}
#endif
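
The to_guest() and to_host() helpers in misc.h are inverse linear translations between the host mapping of the VM data area and its fixed address inside the VMM. The sketch below restates them on plain integers so the round trip can be checked outside the kernel. All names are hypothetical and the base constant is an arbitrary placeholder, not the real KVM_VM_DATA_BASE.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for KVM_VM_DATA_BASE; the real value is defined elsewhere. */
#define DEMO_VM_DATA_BASE 0xd000000000000000ULL

/* Hypothetical stand-in for the one field of kvm->arch used here. */
struct demo_vm {
	uint64_t vm_base;	/* host address where the VM data area is mapped */
};

/* Host address -> address as seen inside the VMM (mirrors to_guest()). */
uint64_t demo_to_guest(const struct demo_vm *vm, uint64_t host_addr)
{
	assert(vm);
	return host_addr - vm->vm_base + DEMO_VM_DATA_BASE;
}

/* VMM address -> host address (mirrors to_host()). */
uint64_t demo_to_host(const struct demo_vm *vm, uint64_t guest_addr)
{
	assert(vm);
	return guest_addr - DEMO_VM_DATA_BASE + vm->vm_base;
}
```

Because both directions only add and subtract the same two bases, an offset within the data area is preserved and the two functions undo each other for any address.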

341
arch/ia64/kvm/mmio.c Normal file

@ -0,0 +1,341 @@
/*
* mmio.c: MMIO emulation components.
* Copyright (c) 2004, Intel Corporation.
* Yaozu Dong (Eddie Dong) (Eddie.dong@intel.com)
* Kun Tian (Kevin Tian) (Kevin.tian@intel.com)
*
* Copyright (c) 2007 Intel Corporation KVM support.
* Xuefei Xu (Anthony Xu) (anthony.xu@intel.com)
* Xiantao Zhang (xiantao.zhang@intel.com)
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
#include <linux/kvm_host.h>
#include "vcpu.h"
static void vlsapic_write_xtp(struct kvm_vcpu *v, uint8_t val)
{
VLSAPIC_XTP(v) = val;
}
/*
* LSAPIC OFFSET
*/
#define PIB_LOW_HALF(ofst) (!((ofst) & (1 << 20)))
#define PIB_OFST_INTA 0x1E0000
#define PIB_OFST_XTP 0x1E0008
/*
* execute write IPI op.
*/
static void vlsapic_write_ipi(struct kvm_vcpu *vcpu,
uint64_t addr, uint64_t data)
{
struct exit_ctl_data *p = &current_vcpu->arch.exit_data;
unsigned long psr;
local_irq_save(psr);
p->exit_reason = EXIT_REASON_IPI;
p->u.ipi_data.addr.val = addr;
p->u.ipi_data.data.val = data;
vmm_transition(current_vcpu);
local_irq_restore(psr);
}
void lsapic_write(struct kvm_vcpu *v, unsigned long addr,
unsigned long length, unsigned long val)
{
addr &= (PIB_SIZE - 1);
switch (addr) {
case PIB_OFST_INTA:
/*panic_domain(NULL, "Undefined write on PIB INTA\n");*/
panic_vm(v);
break;
case PIB_OFST_XTP:
if (length == 1) {
vlsapic_write_xtp(v, val);
} else {
/*panic_domain(NULL,
"Undefined write on PIB XTP\n");*/
panic_vm(v);
}
break;
default:
if (PIB_LOW_HALF(addr)) {
/*lower half */
if (length != 8)
/*panic_domain(NULL,
"Can't LHF write with size %ld!\n",
length);*/
panic_vm(v);
else
vlsapic_write_ipi(v, addr, val);
} else {
/* upper half */
/* printk("IPI-UHF write %lx\n",addr); */
panic_vm(v);
}
break;
}
}
unsigned long lsapic_read(struct kvm_vcpu *v, unsigned long addr,
unsigned long length)
{
uint64_t result = 0;
addr &= (PIB_SIZE - 1);
switch (addr) {
case PIB_OFST_INTA:
if (length == 1) /* 1 byte load */
; /* There is no i8259, there is no INTA access*/
else
/*panic_domain(NULL,"Undefined read on PIB INTA\n"); */
panic_vm(v);
break;
case PIB_OFST_XTP:
if (length == 1) {
result = VLSAPIC_XTP(v);
/* printk("read xtp %lx\n", result); */
} else {
/*panic_domain(NULL,
"Undefined read on PIB XTP\n");*/
panic_vm(v);
}
break;
default:
panic_vm(v);
break;
}
return result;
}
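
The address decoding shared by lsapic_write() and lsapic_read() can be summarised as a small classifier. This is an illustrative sketch only: the `demo_`/`DEMO_` names are hypothetical, and the 2 MB block size is an assumption standing in for PIB_SIZE, which is defined outside this file. The INTA and XTP offsets and the bit-20 test for the lower half are taken from the definitions above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PIB_SIZE      0x200000ULL	/* assumed stand-in for PIB_SIZE */
#define DEMO_PIB_OFST_INTA 0x1E0000ULL
#define DEMO_PIB_OFST_XTP  0x1E0008ULL

enum demo_pib_target {
	DEMO_PIB_INTA,		/* interrupt acknowledge byte */
	DEMO_PIB_XTP,		/* external task priority byte */
	DEMO_PIB_IPI,		/* lower half: IPI delivery area */
	DEMO_PIB_RESERVED	/* rest of the upper half */
};

/* Classify an access by its offset inside the processor interrupt block. */
enum demo_pib_target demo_pib_classify(uint64_t addr)
{
	uint64_t ofst = addr & (DEMO_PIB_SIZE - 1);

	if (ofst == DEMO_PIB_OFST_INTA)
		return DEMO_PIB_INTA;
	if (ofst == DEMO_PIB_OFST_XTP)
		return DEMO_PIB_XTP;
	if (!(ofst & (1ULL << 20)))
		return DEMO_PIB_IPI;
	return DEMO_PIB_RESERVED;
}
```

In the code above, only 8-byte writes to the IPI area and 1-byte accesses to XTP are emulated; the other combinations end in panic_vm().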
static void mmio_access(struct kvm_vcpu *vcpu, u64 src_pa, u64 *dest,
u16 s, int ma, int dir)
{
unsigned long iot;
struct exit_ctl_data *p = &vcpu->arch.exit_data;
unsigned long psr;
iot = __gpfn_is_io(src_pa >> PAGE_SHIFT);
local_irq_save(psr);
/* Intercept accesses to the PIB range */
if (iot == GPFN_PIB) {
if (!dir)
lsapic_write(vcpu, src_pa, s, *dest);
else
*dest = lsapic_read(vcpu, src_pa, s);
goto out;
}
p->exit_reason = EXIT_REASON_MMIO_INSTRUCTION;
p->u.ioreq.addr = src_pa;
p->u.ioreq.size = s;
p->u.ioreq.dir = dir;
if (dir == IOREQ_WRITE)
p->u.ioreq.data = *dest;
p->u.ioreq.state = STATE_IOREQ_READY;
vmm_transition(vcpu);
if (p->u.ioreq.state == STATE_IORESP_READY) {
if (dir == IOREQ_READ)
*dest = p->u.ioreq.data;
} else
panic_vm(vcpu);
out:
local_irq_restore(psr);
return ;
}
/*
dir 1: read 0:write
inst_type 0:integer 1:floating point
*/
#define SL_INTEGER 0 /* store/load integer */
#define SL_FLOATING 1 /* store/load floating point */
void emulate_io_inst(struct kvm_vcpu *vcpu, u64 padr, u64 ma)
{
struct kvm_pt_regs *regs;
IA64_BUNDLE bundle;
int slot, dir = 0;
int inst_type = -1;
u16 size = 0;
u64 data, slot1a, slot1b, temp, update_reg;
s32 imm;
INST64 inst;
regs = vcpu_regs(vcpu);
if (fetch_code(vcpu, regs->cr_iip, &bundle)) {
/* if fetch code fail, return and try again */
return;
}
slot = ((struct ia64_psr *)&(regs->cr_ipsr))->ri;
if (!slot)
inst.inst = bundle.slot0;
else if (slot == 1) {
slot1a = bundle.slot1a;
slot1b = bundle.slot1b;
inst.inst = slot1a + (slot1b << 18);
} else if (slot == 2)
inst.inst = bundle.slot2;
/* Integer Load/Store */
if (inst.M1.major == 4 && inst.M1.m == 0 && inst.M1.x == 0) {
inst_type = SL_INTEGER;
size = (inst.M1.x6 & 0x3);
if ((inst.M1.x6 >> 2) > 0xb) {
/*write*/
dir = IOREQ_WRITE;
data = vcpu_get_gr(vcpu, inst.M4.r2);
} else if ((inst.M1.x6 >> 2) < 0xb) {
/*read*/
dir = IOREQ_READ;
}
} else if (inst.M2.major == 4 && inst.M2.m == 1 && inst.M2.x == 0) {
/* Integer Load + Reg update */
inst_type = SL_INTEGER;
dir = IOREQ_READ;
size = (inst.M2.x6 & 0x3);
temp = vcpu_get_gr(vcpu, inst.M2.r3);
update_reg = vcpu_get_gr(vcpu, inst.M2.r2);
temp += update_reg;
vcpu_set_gr(vcpu, inst.M2.r3, temp, 0);
} else if (inst.M3.major == 5) {
/*Integer Load/Store + Imm update*/
inst_type = SL_INTEGER;
size = (inst.M3.x6&0x3);
if ((inst.M5.x6 >> 2) > 0xb) {
/*write*/
dir = IOREQ_WRITE;
data = vcpu_get_gr(vcpu, inst.M5.r2);
temp = vcpu_get_gr(vcpu, inst.M5.r3);
imm = (inst.M5.s << 31) | (inst.M5.i << 30) |
(inst.M5.imm7 << 23);
temp += imm >> 23;
vcpu_set_gr(vcpu, inst.M5.r3, temp, 0);
} else if ((inst.M3.x6 >> 2) < 0xb) {
/*read*/
dir = IOREQ_READ;
temp = vcpu_get_gr(vcpu, inst.M3.r3);
imm = (inst.M3.s << 31) | (inst.M3.i << 30) |
(inst.M3.imm7 << 23);
temp += imm >> 23;
vcpu_set_gr(vcpu, inst.M3.r3, temp, 0);
}
} else if (inst.M9.major == 6 && inst.M9.x6 == 0x3B
&& inst.M9.m == 0 && inst.M9.x == 0) {
/* Floating-point spill*/
struct ia64_fpreg v;
inst_type = SL_FLOATING;
dir = IOREQ_WRITE;
vcpu_get_fpreg(vcpu, inst.M9.f2, &v);
/* Write high word. FIXME: this is a kludge! */
v.u.bits[1] &= 0x3ffff;
mmio_access(vcpu, padr + 8, &v.u.bits[1], 8, ma, IOREQ_WRITE);
data = v.u.bits[0];
size = 3;
} else if (inst.M10.major == 7 && inst.M10.x6 == 0x3B) {
/* Floating-point spill + Imm update */
struct ia64_fpreg v;
inst_type = SL_FLOATING;
dir = IOREQ_WRITE;
vcpu_get_fpreg(vcpu, inst.M10.f2, &v);
temp = vcpu_get_gr(vcpu, inst.M10.r3);
imm = (inst.M10.s << 31) | (inst.M10.i << 30) |
(inst.M10.imm7 << 23);
temp += imm >> 23;
vcpu_set_gr(vcpu, inst.M10.r3, temp, 0);
/* Write high word. FIXME: this is a kludge! */
v.u.bits[1] &= 0x3ffff;
mmio_access(vcpu, padr + 8, &v.u.bits[1], 8, ma, IOREQ_WRITE);
data = v.u.bits[0];
size = 3;
} else if (inst.M10.major == 7 && inst.M10.x6 == 0x31) {
/* Floating-point stf8 + Imm update */
struct ia64_fpreg v;
inst_type = SL_FLOATING;
dir = IOREQ_WRITE;
size = 3;
vcpu_get_fpreg(vcpu, inst.M10.f2, &v);
data = v.u.bits[0]; /* Significand. */
temp = vcpu_get_gr(vcpu, inst.M10.r3);
imm = (inst.M10.s << 31) | (inst.M10.i << 30) |
(inst.M10.imm7 << 23);
temp += imm >> 23;
vcpu_set_gr(vcpu, inst.M10.r3, temp, 0);
} else if (inst.M15.major == 7 && inst.M15.x6 >= 0x2c
&& inst.M15.x6 <= 0x2f) {
temp = vcpu_get_gr(vcpu, inst.M15.r3);
imm = (inst.M15.s << 31) | (inst.M15.i << 30) |
(inst.M15.imm7 << 23);
temp += imm >> 23;
vcpu_set_gr(vcpu, inst.M15.r3, temp, 0);
vcpu_increment_iip(vcpu);
return;
} else if (inst.M12.major == 6 && inst.M12.m == 1
&& inst.M12.x == 1 && inst.M12.x6 == 1) {
/* Floating-point Load Pair + Imm ldfp8 M12*/
struct ia64_fpreg v;
inst_type = SL_FLOATING;
dir = IOREQ_READ;
size = 8; /*ldfd*/
mmio_access(vcpu, padr, &data, size, ma, dir);
v.u.bits[0] = data;
v.u.bits[1] = 0x1003E;
vcpu_set_fpreg(vcpu, inst.M12.f1, &v);
padr += 8;
mmio_access(vcpu, padr, &data, size, ma, dir);
v.u.bits[0] = data;
v.u.bits[1] = 0x1003E;
vcpu_set_fpreg(vcpu, inst.M12.f2, &v);
padr += 8;
vcpu_set_gr(vcpu, inst.M12.r3, padr, 0);
vcpu_increment_iip(vcpu);
return;
} else {
inst_type = -1;
panic_vm(vcpu);
}
size = 1 << size;
if (dir == IOREQ_WRITE) {
mmio_access(vcpu, padr, &data, size, ma, dir);
} else {
mmio_access(vcpu, padr, &data, size, ma, dir);
if (inst_type == SL_INTEGER)
vcpu_set_gr(vcpu, inst.M1.r1, data, 0);
else
panic_vm(vcpu);
}
vcpu_increment_iip(vcpu);
}
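
Two decoding steps in emulate_io_inst() are easy to misread: the reassembly of slot 1 from its two bundle fields, and the 9-bit signed immediate that the code builds with `(s << 31) | (i << 30) | (imm7 << 23)` followed by an arithmetic shift right by 23. The sketch below reproduces both on plain integers. The function names are hypothetical, and the sign extension uses an XOR-and-subtract form rather than the shift pair, so that it does not depend on how the compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Slot 1 of a bundle is split into a low part (slot1a, 18 bits) and a
 * high part (slot1b, 23 bits); joining them gives the 41-bit instruction.
 */
uint64_t demo_join_slot1(uint64_t slot1a, uint64_t slot1b)
{
	assert(slot1a < (1ULL << 18) && slot1b < (1ULL << 23));
	return slot1a + (slot1b << 18);
}

/*
 * Decode the signed 9-bit immediate of the "immediate update" forms:
 * s is the sign bit, i is bit 7 and imm7 holds bits 6..0.
 */
int32_t demo_decode_imm9(uint32_t s, uint32_t i, uint32_t imm7)
{
	uint32_t raw;

	assert(s <= 1 && i <= 1 && imm7 <= 0x7f);
	raw = (s << 8) | (i << 7) | imm7;
	/* Flip the sign bit, then subtract its weight to sign-extend. */
	return (int32_t)(raw ^ 0x100u) - 0x100;
}
```

The decoded immediate is what gets added to the base register r3 after the access, which is why the emulator updates the register even though the memory operation itself is forwarded elsewhere.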

918
arch/ia64/kvm/optvfault.S Normal file

@ -0,0 +1,918 @@
/*
* arch/ia64/kvm/optvfault.S
* optimize virtualization fault handler
*
* Copyright (C) 2006 Intel Co
* Xuefei Xu (Anthony Xu) <anthony.xu@intel.com>
*/
#include <asm/asmmacro.h>
#include <asm/processor.h>
#include "vti.h"
#include "asm-offsets.h"
#define ACCE_MOV_FROM_AR
#define ACCE_MOV_FROM_RR
#define ACCE_MOV_TO_RR
#define ACCE_RSM
#define ACCE_SSM
#define ACCE_MOV_TO_PSR
#define ACCE_THASH
//mov r1=ar3
GLOBAL_ENTRY(kvm_asm_mov_from_ar)
#ifndef ACCE_MOV_FROM_AR
br.many kvm_virtualization_fault_back
#endif
add r18=VMM_VCPU_ITC_OFS_OFFSET, r21
add r16=VMM_VCPU_LAST_ITC_OFFSET,r21
extr.u r17=r25,6,7
;;
ld8 r18=[r18]
mov r19=ar.itc
mov r24=b0
;;
add r19=r19,r18
addl r20=@gprel(asm_mov_to_reg),gp
;;
st8 [r16] = r19
adds r30=kvm_resume_to_guest-asm_mov_to_reg,r20
shladd r17=r17,4,r20
;;
mov b0=r17
br.sptk.few b0
;;
END(kvm_asm_mov_from_ar)
// mov r1=rr[r3]
GLOBAL_ENTRY(kvm_asm_mov_from_rr)
#ifndef ACCE_MOV_FROM_RR
br.many kvm_virtualization_fault_back
#endif
extr.u r16=r25,20,7
extr.u r17=r25,6,7
addl r20=@gprel(asm_mov_from_reg),gp
;;
adds r30=kvm_asm_mov_from_rr_back_1-asm_mov_from_reg,r20
shladd r16=r16,4,r20
mov r24=b0
;;
add r27=VMM_VCPU_VRR0_OFFSET,r21
mov b0=r16
br.many b0
;;
kvm_asm_mov_from_rr_back_1:
adds r30=kvm_resume_to_guest-asm_mov_from_reg,r20
adds r22=asm_mov_to_reg-asm_mov_from_reg,r20
shr.u r26=r19,61
;;
shladd r17=r17,4,r22
shladd r27=r26,3,r27
;;
ld8 r19=[r27]
mov b0=r17
br.many b0
END(kvm_asm_mov_from_rr)
// mov rr[r3]=r2
GLOBAL_ENTRY(kvm_asm_mov_to_rr)
#ifndef ACCE_MOV_TO_RR
br.many kvm_virtualization_fault_back
#endif
extr.u r16=r25,20,7
extr.u r17=r25,13,7
addl r20=@gprel(asm_mov_from_reg),gp
;;
adds r30=kvm_asm_mov_to_rr_back_1-asm_mov_from_reg,r20
shladd r16=r16,4,r20
mov r22=b0
;;
add r27=VMM_VCPU_VRR0_OFFSET,r21
mov b0=r16
br.many b0
;;
kvm_asm_mov_to_rr_back_1:
adds r30=kvm_asm_mov_to_rr_back_2-asm_mov_from_reg,r20
shr.u r23=r19,61
shladd r17=r17,4,r20
;;
//if rr6, go back
cmp.eq p6,p0=6,r23
mov b0=r22
(p6) br.cond.dpnt.many kvm_virtualization_fault_back
;;
mov r28=r19
mov b0=r17
br.many b0
kvm_asm_mov_to_rr_back_2:
adds r30=kvm_resume_to_guest-asm_mov_from_reg,r20
shladd r27=r23,3,r27
;; // vrr.rid<<4 |0xe
st8 [r27]=r19
mov b0=r30
;;
extr.u r16=r19,8,26
extr.u r18 =r19,2,6
mov r17 =0xe
;;
shladd r16 = r16, 4, r17
extr.u r19 =r19,0,8
;;
shl r16 = r16,8
;;
add r19 = r19, r16
;; //set ve 1
dep r19=-1,r19,0,1
cmp.lt p6,p0=14,r18
;;
(p6) mov r18=14
;;
(p6) dep r19=r18,r19,2,6
;;
cmp.eq p6,p0=0,r23
;;
cmp.eq.or p6,p0=4,r23
;;
adds r16=VMM_VCPU_MODE_FLAGS_OFFSET,r21
(p6) adds r17=VMM_VCPU_META_SAVED_RR0_OFFSET,r21
;;
ld4 r16=[r16]
cmp.eq p7,p0=r0,r0
(p6) shladd r17=r23,1,r17
;;
(p6) st8 [r17]=r19
(p6) tbit.nz p6,p7=r16,0
;;
(p7) mov rr[r28]=r19
mov r24=r22
br.many b0
END(kvm_asm_mov_to_rr)
//rsm
GLOBAL_ENTRY(kvm_asm_rsm)
#ifndef ACCE_RSM
br.many kvm_virtualization_fault_back
#endif
add r16=VMM_VPD_BASE_OFFSET,r21
extr.u r26=r25,6,21
extr.u r27=r25,31,2
;;
ld8 r16=[r16]
extr.u r28=r25,36,1
dep r26=r27,r26,21,2
;;
add r17=VPD_VPSR_START_OFFSET,r16
add r22=VMM_VCPU_MODE_FLAGS_OFFSET,r21
//r26 is imm24
dep r26=r28,r26,23,1
;;
ld8 r18=[r17]
movl r28=IA64_PSR_IC+IA64_PSR_I+IA64_PSR_DT+IA64_PSR_SI
ld4 r23=[r22]
sub r27=-1,r26
mov r24=b0
;;
mov r20=cr.ipsr
or r28=r27,r28
and r19=r18,r27
;;
st8 [r17]=r19
and r20=r20,r28
/* Commented out due to lack of lazy-FP algorithm support
adds r27=IA64_VCPU_FP_PSR_OFFSET,r21
;;
ld8 r27=[r27]
;;
tbit.nz p8,p0= r27,IA64_PSR_DFH_BIT
;;
(p8) dep r20=-1,r20,IA64_PSR_DFH_BIT,1
*/
;;
mov cr.ipsr=r20
tbit.nz p6,p0=r23,0
;;
tbit.z.or p6,p0=r26,IA64_PSR_DT_BIT
(p6) br.dptk kvm_resume_to_guest
;;
add r26=VMM_VCPU_META_RR0_OFFSET,r21
add r27=VMM_VCPU_META_RR0_OFFSET+8,r21
dep r23=-1,r23,0,1
;;
ld8 r26=[r26]
ld8 r27=[r27]
st4 [r22]=r23
dep.z r28=4,61,3
;;
mov rr[r0]=r26
;;
mov rr[r28]=r27
;;
srlz.d
br.many kvm_resume_to_guest
END(kvm_asm_rsm)
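
The first instructions of kvm_asm_rsm (and kvm_asm_ssm) rebuild the 24-bit immediate from three fields of the faulting opcode held in r25, then rsm clears those bits in the virtual PSR. A minimal C sketch of that bit manipulation follows. The names are hypothetical; the field positions are read directly from the `extr.u` and `dep` instructions above.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned bit-field extract, like the IA-64 "extr.u" instruction. */
uint64_t demo_extr_u(uint64_t value, unsigned pos, unsigned len)
{
	assert(len > 0 && len < 64 && pos + len <= 64);
	return (value >> pos) & ((1ULL << len) - 1);
}

/* Rebuild imm24 from opcode bits 6..26, 31..32 and 36. */
uint32_t demo_rsm_imm24(uint64_t opcode)
{
	uint64_t imm = demo_extr_u(opcode, 6, 21);	/* imm24 bits 0..20 */

	imm |= demo_extr_u(opcode, 31, 2) << 21;	/* imm24 bits 21..22 */
	imm |= demo_extr_u(opcode, 36, 1) << 23;	/* imm24 bit 23 */
	return (uint32_t)imm;
}

/* rsm clears the selected bits: vpsr & ~imm24 ("sub r27=-1,r26", "and"). */
uint64_t demo_apply_rsm(uint64_t vpsr, uint64_t opcode)
{
	return vpsr & ~(uint64_t)demo_rsm_imm24(opcode);
}
```

The ssm path uses the same immediate but ORs it into the virtual PSR instead of masking it out.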
//ssm
GLOBAL_ENTRY(kvm_asm_ssm)
#ifndef ACCE_SSM
br.many kvm_virtualization_fault_back
#endif
add r16=VMM_VPD_BASE_OFFSET,r21
extr.u r26=r25,6,21
extr.u r27=r25,31,2
;;
ld8 r16=[r16]
extr.u r28=r25,36,1
dep r26=r27,r26,21,2
;; //r26 is imm24
add r27=VPD_VPSR_START_OFFSET,r16
dep r26=r28,r26,23,1
;; //r19 vpsr
ld8 r29=[r27]
mov r24=b0
;;
add r22=VMM_VCPU_MODE_FLAGS_OFFSET,r21
mov r20=cr.ipsr
or r19=r29,r26
;;
ld4 r23=[r22]
st8 [r27]=r19
or r20=r20,r26
;;
mov cr.ipsr=r20
movl r28=IA64_PSR_DT+IA64_PSR_RT+IA64_PSR_IT
;;
and r19=r28,r19
tbit.z p6,p0=r23,0
;;
cmp.ne.or p6,p0=r28,r19
(p6) br.dptk kvm_asm_ssm_1
;;
add r26=VMM_VCPU_META_SAVED_RR0_OFFSET,r21
add r27=VMM_VCPU_META_SAVED_RR0_OFFSET+8,r21
dep r23=0,r23,0,1
;;
ld8 r26=[r26]
ld8 r27=[r27]
st4 [r22]=r23
dep.z r28=4,61,3
;;
mov rr[r0]=r26
;;
mov rr[r28]=r27
;;
srlz.d
;;
kvm_asm_ssm_1:
tbit.nz p6,p0=r29,IA64_PSR_I_BIT
;;
tbit.z.or p6,p0=r19,IA64_PSR_I_BIT
(p6) br.dptk kvm_resume_to_guest
;;
add r29=VPD_VTPR_START_OFFSET,r16
add r30=VPD_VHPI_START_OFFSET,r16
;;
ld8 r29=[r29]
ld8 r30=[r30]
;;
extr.u r17=r29,4,4
extr.u r18=r29,16,1
;;
dep r17=r18,r17,4,1
;;
cmp.gt p6,p0=r30,r17
(p6) br.dpnt.few kvm_asm_dispatch_vexirq
br.many kvm_resume_to_guest
END(kvm_asm_ssm)
//mov psr.l=r2
GLOBAL_ENTRY(kvm_asm_mov_to_psr)
#ifndef ACCE_MOV_TO_PSR
br.many kvm_virtualization_fault_back
#endif
add r16=VMM_VPD_BASE_OFFSET,r21
extr.u r26=r25,13,7 //r2
;;
ld8 r16=[r16]
addl r20=@gprel(asm_mov_from_reg),gp
;;
adds r30=kvm_asm_mov_to_psr_back-asm_mov_from_reg,r20
shladd r26=r26,4,r20
mov r24=b0
;;
add r27=VPD_VPSR_START_OFFSET,r16
mov b0=r26
br.many b0
;;
kvm_asm_mov_to_psr_back:
ld8 r17=[r27]
add r22=VMM_VCPU_MODE_FLAGS_OFFSET,r21
dep r19=0,r19,32,32
;;
ld4 r23=[r22]
dep r18=0,r17,0,32
;;
add r30=r18,r19
movl r28=IA64_PSR_DT+IA64_PSR_RT+IA64_PSR_IT
;;
st8 [r27]=r30
and r27=r28,r30
and r29=r28,r17
;;
cmp.eq p5,p0=r29,r27
cmp.eq p6,p7=r28,r27
(p5) br.many kvm_asm_mov_to_psr_1
;;
//virtual to physical
(p7) add r26=VMM_VCPU_META_RR0_OFFSET,r21
(p7) add r27=VMM_VCPU_META_RR0_OFFSET+8,r21
(p7) dep r23=-1,r23,0,1
;;
//physical to virtual
(p6) add r26=VMM_VCPU_META_SAVED_RR0_OFFSET,r21
(p6) add r27=VMM_VCPU_META_SAVED_RR0_OFFSET+8,r21
(p6) dep r23=0,r23,0,1
;;
ld8 r26=[r26]
ld8 r27=[r27]
st4 [r22]=r23
dep.z r28=4,61,3
;;
mov rr[r0]=r26
;;
mov rr[r28]=r27
;;
srlz.d
;;
kvm_asm_mov_to_psr_1:
mov r20=cr.ipsr
movl r28=IA64_PSR_IC+IA64_PSR_I+IA64_PSR_DT+IA64_PSR_SI+IA64_PSR_RT
;;
or r19=r19,r28
dep r20=0,r20,0,32
;;
add r20=r19,r20
mov b0=r24
;;
/* Commented out due to lack of lazy-FP algorithm support
adds r27=IA64_VCPU_FP_PSR_OFFSET,r21
;;
ld8 r27=[r27]
;;
tbit.nz p8,p0=r27,IA64_PSR_DFH_BIT
;;
(p8) dep r20=-1,r20,IA64_PSR_DFH_BIT,1
;;
*/
mov cr.ipsr=r20
cmp.ne p6,p0=r0,r0
;;
tbit.nz.or p6,p0=r17,IA64_PSR_I_BIT
tbit.z.or p6,p0=r30,IA64_PSR_I_BIT
(p6) br.dpnt.few kvm_resume_to_guest
;;
add r29=VPD_VTPR_START_OFFSET,r16
add r30=VPD_VHPI_START_OFFSET,r16
;;
ld8 r29=[r29]
ld8 r30=[r30]
;;
extr.u r17=r29,4,4
extr.u r18=r29,16,1
;;
dep r17=r18,r17,4,1
;;
cmp.gt p6,p0=r30,r17
(p6) br.dpnt.few kvm_asm_dispatch_vexirq
br.many kvm_resume_to_guest
END(kvm_asm_mov_to_psr)
ENTRY(kvm_asm_dispatch_vexirq)
//increment iip
mov r16=cr.ipsr
;;
extr.u r17=r16,IA64_PSR_RI_BIT,2
tbit.nz p6,p7=r16,IA64_PSR_RI_BIT+1
;;
(p6) mov r18=cr.iip
(p6) mov r17=r0
(p7) add r17=1,r17
;;
(p6) add r18=0x10,r18
dep r16=r17,r16,IA64_PSR_RI_BIT,2
;;
(p6) mov cr.iip=r18
mov cr.ipsr=r16
mov r30 =1
br.many kvm_dispatch_vexirq
END(kvm_asm_dispatch_vexirq)
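
The "increment iip" sequence in kvm_asm_dispatch_vexirq steps the guest to the next instruction slot: psr.ri selects one of the three slots in a 16-byte bundle, and once the last slot has been executed the bundle address advances and ri wraps to zero. The following C sketch shows the same update under stated assumptions: the struct and function names are hypothetical, and the position of psr.ri (bits 41 and 42) is the architectural layout that IA64_PSR_RI_BIT refers to.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PSR_RI_BIT 41	/* assumed value of IA64_PSR_RI_BIT */

/* Hypothetical container for the two control registers being updated. */
struct demo_ip_state {
	uint64_t iip;	/* bundle address (cr.iip) */
	uint64_t ipsr;	/* saved PSR (cr.ipsr) */
};

/* Advance to the next instruction slot, as the assembly above does. */
void demo_increment_iip(struct demo_ip_state *st)
{
	uint64_t ri;

	assert(st);
	ri = (st->ipsr >> DEMO_PSR_RI_BIT) & 3;
	if (ri & 2) {		/* tbit.nz on RI_BIT+1: last slot reached */
		st->iip += 0x10;
		ri = 0;
	} else {
		ri += 1;
	}
	st->ipsr &= ~(3ULL << DEMO_PSR_RI_BIT);
	st->ipsr |= ri << DEMO_PSR_RI_BIT;
}
```

Calling it three times starting from slot 0 therefore moves iip forward by exactly one bundle and leaves all other PSR bits unchanged.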
// thash
// TODO: add support when pta.vf = 1
GLOBAL_ENTRY(kvm_asm_thash)
#ifndef ACCE_THASH
br.many kvm_virtualization_fault_back
#endif
extr.u r17=r25,20,7 // get r3 from opcode in r25
extr.u r18=r25,6,7 // get r1 from opcode in r25
addl r20=@gprel(asm_mov_from_reg),gp
;;
adds r30=kvm_asm_thash_back1-asm_mov_from_reg,r20
shladd r17=r17,4,r20 // get addr of MOVE_FROM_REG(r17)
adds r16=VMM_VPD_BASE_OFFSET,r21 // get vcpu.arch.privregs
;;
mov r24=b0
;;
ld8 r16=[r16] // get VPD addr
mov b0=r17
br.many b0 // r19 return value
;;
kvm_asm_thash_back1:
shr.u r23=r19,61 // get RR number
adds r25=VMM_VCPU_VRR0_OFFSET,r21 // get vcpu->arch.vrr[0]'s addr
adds r16=VMM_VPD_VPTA_OFFSET,r16 // get vpta
;;
shladd r27=r23,3,r25 // get vcpu->arch.vrr[r23]'s addr
ld8 r17=[r16] // get PTA
mov r26=1
;;
extr.u r29=r17,2,6 // get pta.size
ld8 r25=[r27] // get vcpu->arch.vrr[r23]'s value
;;
extr.u r25=r25,2,6 // get rr.ps
shl r22=r26,r29 // 1UL << pta.size
;;
shr.u r23=r19,r25 // vaddr >> rr.ps
adds r26=3,r29 // pta.size + 3
shl r27=r17,3 // pta << 3
;;
shl r23=r23,3 // (vaddr >> rr.ps) << 3
shr.u r27=r27,r26 // (pta << 3) >> (pta.size+3)
movl r16=7<<61
;;
adds r22=-1,r22 // (1UL << pta.size) - 1
shl r27=r27,r29 // ((pta<<3)>>(pta.size+3))<<pta.size
and r19=r19,r16 // vaddr & VRN_MASK
;;
and r22=r22,r23 // vhpt_offset
or r19=r19,r27 // (vadr&VRN_MASK)|(((pta<<3)>>(pta.size + 3))<<pta.size)
adds r26=asm_mov_to_reg-asm_mov_from_reg,r20
;;
or r19=r19,r22 // calc pval
shladd r17=r18,4,r26
adds r30=kvm_resume_to_guest-asm_mov_from_reg,r20
;;
mov b0=r17
br.many b0
END(kvm_asm_thash)
#define MOV_TO_REG0 \
{; \
nop.b 0x0; \
nop.b 0x0; \
nop.b 0x0; \
;; \
};
#define MOV_TO_REG(n) \
{; \
mov r##n##=r19; \
mov b0=r30; \
br.sptk.many b0; \
;; \
};
#define MOV_FROM_REG(n) \
{; \
mov r19=r##n##; \
mov b0=r30; \
br.sptk.many b0; \
;; \
};
#define MOV_TO_BANK0_REG(n) \
ENTRY_MIN_ALIGN(asm_mov_to_bank0_reg##n##); \
{; \
mov r26=r2; \
mov r2=r19; \
bsw.1; \
;; \
}; \
{; \
mov r##n##=r2; \
nop.b 0x0; \
bsw.0; \
;; \
}; \
{; \
mov r2=r26; \
mov b0=r30; \
br.sptk.many b0; \
;; \
}; \
END(asm_mov_to_bank0_reg##n##)
#define MOV_FROM_BANK0_REG(n) \
ENTRY_MIN_ALIGN(asm_mov_from_bank0_reg##n##); \
{; \
mov r26=r2; \
nop.b 0x0; \
bsw.1; \
;; \
}; \
{; \
mov r2=r##n##; \
nop.b 0x0; \
bsw.0; \
;; \
}; \
{; \
mov r19=r2; \
mov r2=r26; \
mov b0=r30; \
}; \
{; \
nop.b 0x0; \
nop.b 0x0; \
br.sptk.many b0; \
;; \
}; \
END(asm_mov_from_bank0_reg##n##)
#define JMP_TO_MOV_TO_BANK0_REG(n) \
{; \
nop.b 0x0; \
nop.b 0x0; \
br.sptk.many asm_mov_to_bank0_reg##n##; \
;; \
}
#define JMP_TO_MOV_FROM_BANK0_REG(n) \
{; \
nop.b 0x0; \
nop.b 0x0; \
br.sptk.many asm_mov_from_bank0_reg##n##; \
;; \
}
MOV_FROM_BANK0_REG(16)
MOV_FROM_BANK0_REG(17)
MOV_FROM_BANK0_REG(18)
MOV_FROM_BANK0_REG(19)
MOV_FROM_BANK0_REG(20)
MOV_FROM_BANK0_REG(21)
MOV_FROM_BANK0_REG(22)
MOV_FROM_BANK0_REG(23)
MOV_FROM_BANK0_REG(24)
MOV_FROM_BANK0_REG(25)
MOV_FROM_BANK0_REG(26)
MOV_FROM_BANK0_REG(27)
MOV_FROM_BANK0_REG(28)
MOV_FROM_BANK0_REG(29)
MOV_FROM_BANK0_REG(30)
MOV_FROM_BANK0_REG(31)
// mov from reg table
ENTRY(asm_mov_from_reg)
MOV_FROM_REG(0)
MOV_FROM_REG(1)
MOV_FROM_REG(2)
MOV_FROM_REG(3)
MOV_FROM_REG(4)
MOV_FROM_REG(5)
MOV_FROM_REG(6)
MOV_FROM_REG(7)
MOV_FROM_REG(8)
MOV_FROM_REG(9)
MOV_FROM_REG(10)
MOV_FROM_REG(11)
MOV_FROM_REG(12)
MOV_FROM_REG(13)
MOV_FROM_REG(14)
MOV_FROM_REG(15)
JMP_TO_MOV_FROM_BANK0_REG(16)
JMP_TO_MOV_FROM_BANK0_REG(17)
JMP_TO_MOV_FROM_BANK0_REG(18)
JMP_TO_MOV_FROM_BANK0_REG(19)
JMP_TO_MOV_FROM_BANK0_REG(20)
JMP_TO_MOV_FROM_BANK0_REG(21)
JMP_TO_MOV_FROM_BANK0_REG(22)
JMP_TO_MOV_FROM_BANK0_REG(23)
JMP_TO_MOV_FROM_BANK0_REG(24)
JMP_TO_MOV_FROM_BANK0_REG(25)
JMP_TO_MOV_FROM_BANK0_REG(26)
JMP_TO_MOV_FROM_BANK0_REG(27)
JMP_TO_MOV_FROM_BANK0_REG(28)
JMP_TO_MOV_FROM_BANK0_REG(29)
JMP_TO_MOV_FROM_BANK0_REG(30)
JMP_TO_MOV_FROM_BANK0_REG(31)
MOV_FROM_REG(32)
MOV_FROM_REG(33)
MOV_FROM_REG(34)
MOV_FROM_REG(35)
MOV_FROM_REG(36)
MOV_FROM_REG(37)
MOV_FROM_REG(38)
MOV_FROM_REG(39)
MOV_FROM_REG(40)
MOV_FROM_REG(41)
MOV_FROM_REG(42)
MOV_FROM_REG(43)
MOV_FROM_REG(44)
MOV_FROM_REG(45)
MOV_FROM_REG(46)
MOV_FROM_REG(47)
MOV_FROM_REG(48)
MOV_FROM_REG(49)
MOV_FROM_REG(50)
MOV_FROM_REG(51)
MOV_FROM_REG(52)
MOV_FROM_REG(53)
MOV_FROM_REG(54)
MOV_FROM_REG(55)
MOV_FROM_REG(56)
MOV_FROM_REG(57)
MOV_FROM_REG(58)
MOV_FROM_REG(59)
MOV_FROM_REG(60)
MOV_FROM_REG(61)
MOV_FROM_REG(62)
MOV_FROM_REG(63)
MOV_FROM_REG(64)
MOV_FROM_REG(65)
MOV_FROM_REG(66)
MOV_FROM_REG(67)
MOV_FROM_REG(68)
MOV_FROM_REG(69)
MOV_FROM_REG(70)
MOV_FROM_REG(71)
MOV_FROM_REG(72)
MOV_FROM_REG(73)
MOV_FROM_REG(74)
MOV_FROM_REG(75)
MOV_FROM_REG(76)
MOV_FROM_REG(77)
MOV_FROM_REG(78)
MOV_FROM_REG(79)
MOV_FROM_REG(80)
MOV_FROM_REG(81)
MOV_FROM_REG(82)
MOV_FROM_REG(83)
MOV_FROM_REG(84)
MOV_FROM_REG(85)
MOV_FROM_REG(86)
MOV_FROM_REG(87)
MOV_FROM_REG(88)
MOV_FROM_REG(89)
MOV_FROM_REG(90)
MOV_FROM_REG(91)
MOV_FROM_REG(92)
MOV_FROM_REG(93)
MOV_FROM_REG(94)
MOV_FROM_REG(95)
MOV_FROM_REG(96)
MOV_FROM_REG(97)
MOV_FROM_REG(98)
MOV_FROM_REG(99)
MOV_FROM_REG(100)
MOV_FROM_REG(101)
MOV_FROM_REG(102)
MOV_FROM_REG(103)
MOV_FROM_REG(104)
MOV_FROM_REG(105)
MOV_FROM_REG(106)
MOV_FROM_REG(107)
MOV_FROM_REG(108)
MOV_FROM_REG(109)
MOV_FROM_REG(110)
MOV_FROM_REG(111)
MOV_FROM_REG(112)
MOV_FROM_REG(113)
MOV_FROM_REG(114)
MOV_FROM_REG(115)
MOV_FROM_REG(116)
MOV_FROM_REG(117)
MOV_FROM_REG(118)
MOV_FROM_REG(119)
MOV_FROM_REG(120)
MOV_FROM_REG(121)
MOV_FROM_REG(122)
MOV_FROM_REG(123)
MOV_FROM_REG(124)
MOV_FROM_REG(125)
MOV_FROM_REG(126)
MOV_FROM_REG(127)
END(asm_mov_from_reg)
/* must be in bank 0
* parameter:
* r31: pr
* r24: b0
*/
ENTRY(kvm_resume_to_guest)
adds r16 = VMM_VCPU_SAVED_GP_OFFSET,r21
;;
ld8 r1 =[r16]
adds r20 = VMM_VCPU_VSA_BASE_OFFSET,r21
;;
mov r16=cr.ipsr
;;
ld8 r20 = [r20]
adds r19=VMM_VPD_BASE_OFFSET,r21
;;
ld8 r25=[r19]
extr.u r17=r16,IA64_PSR_RI_BIT,2
tbit.nz p6,p7=r16,IA64_PSR_RI_BIT+1
;;
(p6) mov r18=cr.iip
(p6) mov r17=r0
;;
(p6) add r18=0x10,r18
(p7) add r17=1,r17
;;
(p6) mov cr.iip=r18
dep r16=r17,r16,IA64_PSR_RI_BIT,2
;;
mov cr.ipsr=r16
adds r19= VPD_VPSR_START_OFFSET,r25
add r28=PAL_VPS_RESUME_NORMAL,r20
add r29=PAL_VPS_RESUME_HANDLER,r20
;;
ld8 r19=[r19]
mov b0=r29
cmp.ne p6,p7 = r0,r0
;;
tbit.z p6,p7 = r19,IA64_PSR_IC_BIT // p7=vpsr.ic
;;
(p6) ld8 r26=[r25]
(p7) mov b0=r28
mov pr=r31,-2
br.sptk.many b0 // call pal service
;;
END(kvm_resume_to_guest)
MOV_TO_BANK0_REG(16)
MOV_TO_BANK0_REG(17)
MOV_TO_BANK0_REG(18)
MOV_TO_BANK0_REG(19)
MOV_TO_BANK0_REG(20)
MOV_TO_BANK0_REG(21)
MOV_TO_BANK0_REG(22)
MOV_TO_BANK0_REG(23)
MOV_TO_BANK0_REG(24)
MOV_TO_BANK0_REG(25)
MOV_TO_BANK0_REG(26)
MOV_TO_BANK0_REG(27)
MOV_TO_BANK0_REG(28)
MOV_TO_BANK0_REG(29)
MOV_TO_BANK0_REG(30)
MOV_TO_BANK0_REG(31)
// mov to reg table
ENTRY(asm_mov_to_reg)
MOV_TO_REG0
MOV_TO_REG(1)
MOV_TO_REG(2)
MOV_TO_REG(3)
MOV_TO_REG(4)
MOV_TO_REG(5)
MOV_TO_REG(6)
MOV_TO_REG(7)
MOV_TO_REG(8)
MOV_TO_REG(9)
MOV_TO_REG(10)
MOV_TO_REG(11)
MOV_TO_REG(12)
MOV_TO_REG(13)
MOV_TO_REG(14)
MOV_TO_REG(15)
JMP_TO_MOV_TO_BANK0_REG(16)
JMP_TO_MOV_TO_BANK0_REG(17)
JMP_TO_MOV_TO_BANK0_REG(18)
JMP_TO_MOV_TO_BANK0_REG(19)
JMP_TO_MOV_TO_BANK0_REG(20)
JMP_TO_MOV_TO_BANK0_REG(21)
JMP_TO_MOV_TO_BANK0_REG(22)
JMP_TO_MOV_TO_BANK0_REG(23)
JMP_TO_MOV_TO_BANK0_REG(24)
JMP_TO_MOV_TO_BANK0_REG(25)
JMP_TO_MOV_TO_BANK0_REG(26)
JMP_TO_MOV_TO_BANK0_REG(27)
JMP_TO_MOV_TO_BANK0_REG(28)
JMP_TO_MOV_TO_BANK0_REG(29)
JMP_TO_MOV_TO_BANK0_REG(30)
JMP_TO_MOV_TO_BANK0_REG(31)
MOV_TO_REG(32)
MOV_TO_REG(33)
MOV_TO_REG(34)
MOV_TO_REG(35)
MOV_TO_REG(36)
MOV_TO_REG(37)
MOV_TO_REG(38)
MOV_TO_REG(39)
MOV_TO_REG(40)
MOV_TO_REG(41)
MOV_TO_REG(42)
MOV_TO_REG(43)
MOV_TO_REG(44)
MOV_TO_REG(45)
MOV_TO_REG(46)
MOV_TO_REG(47)
MOV_TO_REG(48)
MOV_TO_REG(49)
MOV_TO_REG(50)
MOV_TO_REG(51)
MOV_TO_REG(52)
MOV_TO_REG(53)
MOV_TO_REG(54)
MOV_TO_REG(55)
MOV_TO_REG(56)
MOV_TO_REG(57)
MOV_TO_REG(58)
MOV_TO_REG(59)
MOV_TO_REG(60)
MOV_TO_REG(61)
MOV_TO_REG(62)
MOV_TO_REG(63)
MOV_TO_REG(64)
MOV_TO_REG(65)
MOV_TO_REG(66)
MOV_TO_REG(67)
MOV_TO_REG(68)
MOV_TO_REG(69)
MOV_TO_REG(70)
MOV_TO_REG(71)
MOV_TO_REG(72)
MOV_TO_REG(73)
MOV_TO_REG(74)
MOV_TO_REG(75)
MOV_TO_REG(76)
MOV_TO_REG(77)
MOV_TO_REG(78)
MOV_TO_REG(79)
MOV_TO_REG(80)
MOV_TO_REG(81)
MOV_TO_REG(82)
MOV_TO_REG(83)
MOV_TO_REG(84)
MOV_TO_REG(85)
MOV_TO_REG(86)
MOV_TO_REG(87)
MOV_TO_REG(88)
MOV_TO_REG(89)
MOV_TO_REG(90)
MOV_TO_REG(91)
MOV_TO_REG(92)
MOV_TO_REG(93)
MOV_TO_REG(94)
MOV_TO_REG(95)
MOV_TO_REG(96)
MOV_TO_REG(97)
MOV_TO_REG(98)
MOV_TO_REG(99)
MOV_TO_REG(100)
MOV_TO_REG(101)
MOV_TO_REG(102)
MOV_TO_REG(103)
MOV_TO_REG(104)
MOV_TO_REG(105)
MOV_TO_REG(106)
MOV_TO_REG(107)
MOV_TO_REG(108)
MOV_TO_REG(109)
MOV_TO_REG(110)
MOV_TO_REG(111)
MOV_TO_REG(112)
MOV_TO_REG(113)
MOV_TO_REG(114)
MOV_TO_REG(115)
MOV_TO_REG(116)
MOV_TO_REG(117)
MOV_TO_REG(118)
MOV_TO_REG(119)
MOV_TO_REG(120)
MOV_TO_REG(121)
MOV_TO_REG(122)
MOV_TO_REG(123)
MOV_TO_REG(124)
MOV_TO_REG(125)
MOV_TO_REG(126)
MOV_TO_REG(127)
END(asm_mov_to_reg)
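
The address arithmetic spelled out in the comments of kvm_asm_thash above may be easier to follow in C. The following is a minimal sketch, not code from this commit: the name thash_short and the way the PTA and region register values are passed in are assumptions made for illustration. It mirrors only the short-format computation (the TODO about pta.vf = 1 still applies) and assumes pta.size + 3 is below 64 so that the shifts are well defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define VRN_MASK (7ULL << 61)   /* virtual region number, bits 61..63 */

/*
 * Hypothetical helper mirroring the kvm_asm_thash comments:
 *   pval = (vaddr & VRN_MASK)
 *        | (((pta << 3) >> (pta.size + 3)) << pta.size)
 *        | (((vaddr >> rr.ps) << 3) & ((1UL << pta.size) - 1))
 */
static uint64_t thash_short(uint64_t vaddr, uint64_t pta, uint64_t rr)
{
    unsigned pta_size = (pta >> 2) & 0x3f;  /* extr.u pta,2,6 */
    unsigned rr_ps = (rr >> 2) & 0x3f;      /* extr.u rr,2,6 */
    uint64_t offset, base;

    /* Assumption: keeps every shift count below 64. */
    assert(pta_size + 3 < 64);

    /* 8-byte entry index, wrapped to the table size. */
    offset = ((vaddr >> rr_ps) << 3) & ((1ULL << pta_size) - 1);
    /* Table base: drop the region bits and the low pta.size bits. */
    base = ((pta << 3) >> (pta_size + 3)) << pta_size;

    return (vaddr & VRN_MASK) | base | offset;
}
```

With illustrative inputs (region 1 address 0x2000000000123456, a 32KB table at 0x80000000, 16KB pages), the offset is 0x240 and the result keeps the region bits of the faulting address.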

970
arch/ia64/kvm/process.c Normal file

@ -0,0 +1,970 @@
/*
* process.c: handle interruption injection for guests.
* Copyright (c) 2005, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
* Shaofan Li (Susie Li) <susie.li@intel.com>
* Xiaoyan Feng (Fleming Feng) <fleming.feng@intel.com>
* Xuefei Xu (Anthony Xu) (Anthony.xu@intel.com)
* Xiantao Zhang (xiantao.zhang@intel.com)
*/
#include "vcpu.h"
#include <asm/pal.h>
#include <asm/sal.h>
#include <asm/fpswa.h>
#include <asm/kregs.h>
#include <asm/tlb.h>
fpswa_interface_t *vmm_fpswa_interface;
#define IA64_VHPT_TRANS_VECTOR 0x0000
#define IA64_INST_TLB_VECTOR 0x0400
#define IA64_DATA_TLB_VECTOR 0x0800
#define IA64_ALT_INST_TLB_VECTOR 0x0c00
#define IA64_ALT_DATA_TLB_VECTOR 0x1000
#define IA64_DATA_NESTED_TLB_VECTOR 0x1400
#define IA64_INST_KEY_MISS_VECTOR 0x1800
#define IA64_DATA_KEY_MISS_VECTOR 0x1c00
#define IA64_DIRTY_BIT_VECTOR 0x2000
#define IA64_INST_ACCESS_BIT_VECTOR 0x2400
#define IA64_DATA_ACCESS_BIT_VECTOR 0x2800
#define IA64_BREAK_VECTOR 0x2c00
#define IA64_EXTINT_VECTOR 0x3000
#define IA64_PAGE_NOT_PRESENT_VECTOR 0x5000
#define IA64_KEY_PERMISSION_VECTOR 0x5100
#define IA64_INST_ACCESS_RIGHTS_VECTOR 0x5200
#define IA64_DATA_ACCESS_RIGHTS_VECTOR 0x5300
#define IA64_GENEX_VECTOR 0x5400
#define IA64_DISABLED_FPREG_VECTOR 0x5500
#define IA64_NAT_CONSUMPTION_VECTOR 0x5600
#define IA64_SPECULATION_VECTOR 0x5700 /* UNUSED */
#define IA64_DEBUG_VECTOR 0x5900
#define IA64_UNALIGNED_REF_VECTOR 0x5a00
#define IA64_UNSUPPORTED_DATA_REF_VECTOR 0x5b00
#define IA64_FP_FAULT_VECTOR 0x5c00
#define IA64_FP_TRAP_VECTOR 0x5d00
#define IA64_LOWERPRIV_TRANSFER_TRAP_VECTOR 0x5e00
#define IA64_TAKEN_BRANCH_TRAP_VECTOR 0x5f00
#define IA64_SINGLE_STEP_TRAP_VECTOR 0x6000
/* SDM vol2 5.5 - IVA based interruption handling */
#define INITIAL_PSR_VALUE_AT_INTERRUPTION (IA64_PSR_UP | IA64_PSR_MFL |\
IA64_PSR_MFH | IA64_PSR_PK | IA64_PSR_DT | \
IA64_PSR_RT | IA64_PSR_MC|IA64_PSR_IT)
#define DOMN_PAL_REQUEST 0x110000
#define DOMN_SAL_REQUEST 0x110001
static u64 vec2off[68] = {0x0, 0x400, 0x800, 0xc00, 0x1000, 0x1400, 0x1800,
0x1c00, 0x2000, 0x2400, 0x2800, 0x2c00, 0x3000, 0x3400, 0x3800, 0x3c00,
0x4000, 0x4400, 0x4800, 0x4c00, 0x5000, 0x5100, 0x5200, 0x5300, 0x5400,
0x5500, 0x5600, 0x5700, 0x5800, 0x5900, 0x5a00, 0x5b00, 0x5c00, 0x5d00,
0x5e00, 0x5f00, 0x6000, 0x6100, 0x6200, 0x6300, 0x6400, 0x6500, 0x6600,
0x6700, 0x6800, 0x6900, 0x6a00, 0x6b00, 0x6c00, 0x6d00, 0x6e00, 0x6f00,
0x7000, 0x7100, 0x7200, 0x7300, 0x7400, 0x7500, 0x7600, 0x7700, 0x7800,
0x7900, 0x7a00, 0x7b00, 0x7c00, 0x7d00, 0x7e00, 0x7f00
};
static void collect_interruption(struct kvm_vcpu *vcpu)
{
u64 ipsr;
u64 vdcr;
u64 vifs;
unsigned long vpsr;
struct kvm_pt_regs *regs = vcpu_regs(vcpu);
vpsr = vcpu_get_psr(vcpu);
vcpu_bsw0(vcpu);
if (vpsr & IA64_PSR_IC) {
/* Sync mpsr id/da/dd/ss/ed bits to vipsr,
* since we still want these bits set in mpsr
* after the guest does an rfi
*/
ipsr = regs->cr_ipsr;
vpsr = vpsr | (ipsr & (IA64_PSR_ID | IA64_PSR_DA
| IA64_PSR_DD | IA64_PSR_SS
| IA64_PSR_ED));
vcpu_set_ipsr(vcpu, vpsr);
/* For traps, we currently do not advance IIP to the next
* instruction, because we assume the caller has already
* set up IIP correctly
*/
vcpu_set_iip(vcpu , regs->cr_iip);
/* set vifs.v to zero */
vifs = VCPU(vcpu, ifs);
vifs &= ~IA64_IFS_V;
vcpu_set_ifs(vcpu, vifs);
vcpu_set_iipa(vcpu, VMX(vcpu, cr_iipa));
}
vdcr = VCPU(vcpu, dcr);
/* Set guest psr
* up/mfl/mfh/pk/dt/rt/mc/it are kept unchanged
* be: set to the value of dcr.be
* pp: set to the value of dcr.pp
*/
vpsr &= INITIAL_PSR_VALUE_AT_INTERRUPTION;
vpsr |= (vdcr & IA64_DCR_BE);
/* VDCR pp bit position is different from VPSR pp bit */
if (vdcr & IA64_DCR_PP) {
vpsr |= IA64_PSR_PP;
} else {
vpsr &= ~IA64_PSR_PP;
}
vcpu_set_psr(vcpu, vpsr);
}
void inject_guest_interruption(struct kvm_vcpu *vcpu, u64 vec)
{
u64 viva;
struct kvm_pt_regs *regs;
union ia64_isr pt_isr;
regs = vcpu_regs(vcpu);
/* clear cr.isr.ir (incomplete register frame)*/
pt_isr.val = VMX(vcpu, cr_isr);
pt_isr.ir = 0;
VMX(vcpu, cr_isr) = pt_isr.val;
collect_interruption(vcpu);
viva = vcpu_get_iva(vcpu);
regs->cr_iip = viva + vec;
}
static u64 vcpu_get_itir_on_fault(struct kvm_vcpu *vcpu, u64 ifa)
{
union ia64_rr rr, rr1;
rr.val = vcpu_get_rr(vcpu, ifa);
rr1.val = 0;
rr1.ps = rr.ps;
rr1.rid = rr.rid;
return (rr1.val);
}
/*
* Set vIFA & vITIR & vIHA, when vPSR.ic =1
* Parameter:
* set_ifa: if true, set vIFA
* set_itir: if true, set vITIR
* set_iha: if true, set vIHA
*/
void set_ifa_itir_iha(struct kvm_vcpu *vcpu, u64 vadr,
int set_ifa, int set_itir, int set_iha)
{
long vpsr;
u64 value;
vpsr = VCPU(vcpu, vpsr);
/* Vol2, Table 8-1 */
if (vpsr & IA64_PSR_IC) {
if (set_ifa)
vcpu_set_ifa(vcpu, vadr);
if (set_itir) {
value = vcpu_get_itir_on_fault(vcpu, vadr);
vcpu_set_itir(vcpu, value);
}
if (set_iha) {
value = vcpu_thash(vcpu, vadr);
vcpu_set_iha(vcpu, value);
}
}
}
/*
* Data TLB Fault
* @ Data TLB vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void dtlb_fault(struct kvm_vcpu *vcpu, u64 vadr)
{
/* If vPSR.ic, IFA, ITIR, IHA */
set_ifa_itir_iha(vcpu, vadr, 1, 1, 1);
inject_guest_interruption(vcpu, IA64_DATA_TLB_VECTOR);
}
/*
* Instruction TLB Fault
* @ Instruction TLB vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void itlb_fault(struct kvm_vcpu *vcpu, u64 vadr)
{
/* If vPSR.ic, IFA, ITIR, IHA */
set_ifa_itir_iha(vcpu, vadr, 1, 1, 1);
inject_guest_interruption(vcpu, IA64_INST_TLB_VECTOR);
}
/*
* Data Nested TLB Fault
* @ Data Nested TLB Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void nested_dtlb(struct kvm_vcpu *vcpu)
{
inject_guest_interruption(vcpu, IA64_DATA_NESTED_TLB_VECTOR);
}
/*
* Alternate Data TLB Fault
* @ Alternate Data TLB vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void alt_dtlb(struct kvm_vcpu *vcpu, u64 vadr)
{
set_ifa_itir_iha(vcpu, vadr, 1, 1, 0);
inject_guest_interruption(vcpu, IA64_ALT_DATA_TLB_VECTOR);
}
/*
* Alternate Instruction TLB Fault
* @ Alternate Instruction TLB vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void alt_itlb(struct kvm_vcpu *vcpu, u64 vadr)
{
set_ifa_itir_iha(vcpu, vadr, 1, 1, 0);
inject_guest_interruption(vcpu, IA64_ALT_INST_TLB_VECTOR);
}
/* Deal with:
* VHPT Translation Vector
*/
static void _vhpt_fault(struct kvm_vcpu *vcpu, u64 vadr)
{
/* If vPSR.ic, IFA, ITIR, IHA*/
set_ifa_itir_iha(vcpu, vadr, 1, 1, 1);
inject_guest_interruption(vcpu, IA64_VHPT_TRANS_VECTOR);
}
/*
* VHPT Instruction Fault
* @ VHPT Translation vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void ivhpt_fault(struct kvm_vcpu *vcpu, u64 vadr)
{
_vhpt_fault(vcpu, vadr);
}
/*
* VHPT Data Fault
* @ VHPT Translation vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void dvhpt_fault(struct kvm_vcpu *vcpu, u64 vadr)
{
_vhpt_fault(vcpu, vadr);
}
/*
* Deal with:
* General Exception vector
*/
void _general_exception(struct kvm_vcpu *vcpu)
{
inject_guest_interruption(vcpu, IA64_GENEX_VECTOR);
}
/*
* Illegal Operation Fault
* @ General Exception Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void illegal_op(struct kvm_vcpu *vcpu)
{
_general_exception(vcpu);
}
/*
* Illegal Dependency Fault
* @ General Exception Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void illegal_dep(struct kvm_vcpu *vcpu)
{
_general_exception(vcpu);
}
/*
* Reserved Register/Field Fault
* @ General Exception Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void rsv_reg_field(struct kvm_vcpu *vcpu)
{
_general_exception(vcpu);
}
/*
* Privileged Operation Fault
* @ General Exception Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void privilege_op(struct kvm_vcpu *vcpu)
{
_general_exception(vcpu);
}
/*
* Unimplemented Data Address Fault
* @ General Exception Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void unimpl_daddr(struct kvm_vcpu *vcpu)
{
_general_exception(vcpu);
}
/*
* Privileged Register Fault
* @ General Exception Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void privilege_reg(struct kvm_vcpu *vcpu)
{
_general_exception(vcpu);
}
/* Deal with
* Nat consumption vector
* Parameter:
* vadr: optional; ignored if t == REGISTER
*/
static void _nat_consumption_fault(struct kvm_vcpu *vcpu, u64 vadr,
enum tlb_miss_type t)
{
/* If vPSR.ic && t == DATA/INST, IFA */
if (t == DATA || t == INSTRUCTION) {
/* IFA */
set_ifa_itir_iha(vcpu, vadr, 1, 0, 0);
}
inject_guest_interruption(vcpu, IA64_NAT_CONSUMPTION_VECTOR);
}
/*
* Instruction Nat Page Consumption Fault
* @ Nat Consumption Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void inat_page_consumption(struct kvm_vcpu *vcpu, u64 vadr)
{
_nat_consumption_fault(vcpu, vadr, INSTRUCTION);
}
/*
* Register Nat Consumption Fault
* @ Nat Consumption Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void rnat_consumption(struct kvm_vcpu *vcpu)
{
_nat_consumption_fault(vcpu, 0, REGISTER);
}
/*
* Data Nat Page Consumption Fault
* @ Nat Consumption Vector
* Refer to SDM Vol2 Table 5-6 & 8-1
*/
void dnat_page_consumption(struct kvm_vcpu *vcpu, u64 vadr)
{
_nat_consumption_fault(vcpu, vadr, DATA);
}
/* Deal with
* Page not present vector
*/
static void __page_not_present(struct kvm_vcpu *vcpu, u64 vadr)
{
/* If vPSR.ic, IFA, ITIR */
set_ifa_itir_iha(vcpu, vadr, 1, 1, 0);
inject_guest_interruption(vcpu, IA64_PAGE_NOT_PRESENT_VECTOR);
}
void data_page_not_present(struct kvm_vcpu *vcpu, u64 vadr)
{
__page_not_present(vcpu, vadr);
}
void inst_page_not_present(struct kvm_vcpu *vcpu, u64 vadr)
{
__page_not_present(vcpu, vadr);
}
/* Deal with
* Data access rights vector
*/
void data_access_rights(struct kvm_vcpu *vcpu, u64 vadr)
{
/* If vPSR.ic, IFA, ITIR */
set_ifa_itir_iha(vcpu, vadr, 1, 1, 0);
inject_guest_interruption(vcpu, IA64_DATA_ACCESS_RIGHTS_VECTOR);
}
fpswa_ret_t vmm_fp_emulate(int fp_fault, void *bundle, unsigned long *ipsr,
unsigned long *fpsr, unsigned long *isr, unsigned long *pr,
unsigned long *ifs, struct kvm_pt_regs *regs)
{
fp_state_t fp_state;
fpswa_ret_t ret;
struct kvm_vcpu *vcpu = current_vcpu;
uint64_t old_rr7 = ia64_get_rr(7UL<<61);
if (!vmm_fpswa_interface)
return (fpswa_ret_t) {-1, 0, 0, 0};
/*
* Just let the fpswa driver use the hardware fp registers.
* No fp register is valid in memory.
*/
memset(&fp_state, 0, sizeof(fp_state_t));
/*
* unsigned long (*EFI_FPSWA) (
* unsigned long trap_type,
* void *Bundle,
* unsigned long *pipsr,
* unsigned long *pfsr,
* unsigned long *pisr,
* unsigned long *ppreds,
* unsigned long *pifs,
* void *fp_state);
*/
/* Call the host fpswa interface directly to virtualize
* the guest fpswa request.
*/
ia64_set_rr(7UL << 61, vcpu->arch.host.rr[7]);
ia64_srlz_d();
ret = (*vmm_fpswa_interface->fpswa) (fp_fault, bundle,
ipsr, fpsr, isr, pr, ifs, &fp_state);
ia64_set_rr(7UL << 61, old_rr7);
ia64_srlz_d();
return ret;
}
/*
* Handle floating-point assist faults and traps for domain.
*/
unsigned long vmm_handle_fpu_swa(int fp_fault, struct kvm_pt_regs *regs,
unsigned long isr)
{
struct kvm_vcpu *v = current_vcpu;
IA64_BUNDLE bundle;
unsigned long fault_ip;
fpswa_ret_t ret;
fault_ip = regs->cr_iip;
/*
* When an FP trap occurs, the trapping instruction has already completed.
* If ipsr.ri == 0, the trapping instruction is in the previous
* bundle.
*/
if (!fp_fault && (ia64_psr(regs)->ri == 0))
fault_ip -= 16;
if (fetch_code(v, fault_ip, &bundle))
return -EAGAIN;
if (!bundle.i64[0] && !bundle.i64[1])
return -EACCES;
ret = vmm_fp_emulate(fp_fault, &bundle, &regs->cr_ipsr, &regs->ar_fpsr,
&isr, &regs->pr, &regs->cr_ifs, regs);
return ret.status;
}
void reflect_interruption(u64 ifa, u64 isr, u64 iim,
u64 vec, struct kvm_pt_regs *regs)
{
u64 vector;
int status;
struct kvm_vcpu *vcpu = current_vcpu;
u64 vpsr = VCPU(vcpu, vpsr);
vector = vec2off[vec];
if (!(vpsr & IA64_PSR_IC) && (vector != IA64_DATA_NESTED_TLB_VECTOR)) {
panic_vm(vcpu);
return;
}
switch (vec) {
case 32: /*IA64_FP_FAULT_VECTOR*/
status = vmm_handle_fpu_swa(1, regs, isr);
if (!status) {
vcpu_increment_iip(vcpu);
return;
} else if (-EAGAIN == status)
return;
break;
case 33: /*IA64_FP_TRAP_VECTOR*/
status = vmm_handle_fpu_swa(0, regs, isr);
if (!status)
return ;
else if (-EAGAIN == status) {
vcpu_decrement_iip(vcpu);
return ;
}
break;
}
VCPU(vcpu, isr) = isr;
VCPU(vcpu, iipa) = regs->cr_iip;
if (vector == IA64_BREAK_VECTOR || vector == IA64_SPECULATION_VECTOR)
VCPU(vcpu, iim) = iim;
else
set_ifa_itir_iha(vcpu, ifa, 1, 1, 1);
inject_guest_interruption(vcpu, vector);
}
static void set_pal_call_data(struct kvm_vcpu *vcpu)
{
struct exit_ctl_data *p = &vcpu->arch.exit_data;
/* FIXME: For both the static and stacked conventions, firmware
* has put the parameters in gr28-gr31 before the
* break to the vmm. */
p->u.pal_data.gr28 = vcpu_get_gr(vcpu, 28);
p->u.pal_data.gr29 = vcpu_get_gr(vcpu, 29);
p->u.pal_data.gr30 = vcpu_get_gr(vcpu, 30);
p->u.pal_data.gr31 = vcpu_get_gr(vcpu, 31);
p->exit_reason = EXIT_REASON_PAL_CALL;
}
static void set_pal_call_result(struct kvm_vcpu *vcpu)
{
struct exit_ctl_data *p = &vcpu->arch.exit_data;
if (p->exit_reason == EXIT_REASON_PAL_CALL) {
vcpu_set_gr(vcpu, 8, p->u.pal_data.ret.status, 0);
vcpu_set_gr(vcpu, 9, p->u.pal_data.ret.v0, 0);
vcpu_set_gr(vcpu, 10, p->u.pal_data.ret.v1, 0);
vcpu_set_gr(vcpu, 11, p->u.pal_data.ret.v2, 0);
} else
panic_vm(vcpu);
}
static void set_sal_call_data(struct kvm_vcpu *vcpu)
{
struct exit_ctl_data *p = &vcpu->arch.exit_data;
p->u.sal_data.in0 = vcpu_get_gr(vcpu, 32);
p->u.sal_data.in1 = vcpu_get_gr(vcpu, 33);
p->u.sal_data.in2 = vcpu_get_gr(vcpu, 34);
p->u.sal_data.in3 = vcpu_get_gr(vcpu, 35);
p->u.sal_data.in4 = vcpu_get_gr(vcpu, 36);
p->u.sal_data.in5 = vcpu_get_gr(vcpu, 37);
p->u.sal_data.in6 = vcpu_get_gr(vcpu, 38);
p->u.sal_data.in7 = vcpu_get_gr(vcpu, 39);
p->exit_reason = EXIT_REASON_SAL_CALL;
}
static void set_sal_call_result(struct kvm_vcpu *vcpu)
{
struct exit_ctl_data *p = &vcpu->arch.exit_data;
if (p->exit_reason == EXIT_REASON_SAL_CALL) {
vcpu_set_gr(vcpu, 8, p->u.sal_data.ret.r8, 0);
vcpu_set_gr(vcpu, 9, p->u.sal_data.ret.r9, 0);
vcpu_set_gr(vcpu, 10, p->u.sal_data.ret.r10, 0);
vcpu_set_gr(vcpu, 11, p->u.sal_data.ret.r11, 0);
} else
panic_vm(vcpu);
}
void kvm_ia64_handle_break(unsigned long ifa, struct kvm_pt_regs *regs,
unsigned long isr, unsigned long iim)
{
struct kvm_vcpu *v = current_vcpu;
if (ia64_psr(regs)->cpl == 0) {
/* Allow hypercalls only when cpl = 0. */
if (iim == DOMN_PAL_REQUEST) {
set_pal_call_data(v);
vmm_transition(v);
set_pal_call_result(v);
vcpu_increment_iip(v);
return;
} else if (iim == DOMN_SAL_REQUEST) {
set_sal_call_data(v);
vmm_transition(v);
set_sal_call_result(v);
vcpu_increment_iip(v);
return;
}
}
reflect_interruption(ifa, isr, iim, 11, regs);
}
void check_pending_irq(struct kvm_vcpu *vcpu)
{
int mask, h_pending, h_inservice;
u64 isr;
unsigned long vpsr;
struct kvm_pt_regs *regs = vcpu_regs(vcpu);
h_pending = highest_pending_irq(vcpu);
if (h_pending == NULL_VECTOR) {
update_vhpi(vcpu, NULL_VECTOR);
return;
}
h_inservice = highest_inservice_irq(vcpu);
vpsr = VCPU(vcpu, vpsr);
mask = irq_masked(vcpu, h_pending, h_inservice);
if ((vpsr & IA64_PSR_I) && IRQ_NO_MASKED == mask) {
isr = vpsr & IA64_PSR_RI;
update_vhpi(vcpu, h_pending);
reflect_interruption(0, isr, 0, 12, regs); /* EXT IRQ */
} else if (mask == IRQ_MASKED_BY_INSVC) {
if (VCPU(vcpu, vhpi))
update_vhpi(vcpu, NULL_VECTOR);
} else {
/* masked by vpsr.i or vtpr.*/
update_vhpi(vcpu, h_pending);
}
}
static void generate_exirq(struct kvm_vcpu *vcpu)
{
u64 vpsr;
uint64_t isr;
struct kvm_pt_regs *regs = vcpu_regs(vcpu);
vpsr = VCPU(vcpu, vpsr);
isr = vpsr & IA64_PSR_RI;
if (!(vpsr & IA64_PSR_IC))
panic_vm(vcpu);
reflect_interruption(0, isr, 0, 12, regs); /* EXT IRQ */
}
void vhpi_detection(struct kvm_vcpu *vcpu)
{
uint64_t threshold, vhpi;
union ia64_tpr vtpr;
struct ia64_psr vpsr;
vpsr = *(struct ia64_psr *)&VCPU(vcpu, vpsr);
vtpr.val = VCPU(vcpu, tpr);
threshold = ((!vpsr.i) << 5) | (vtpr.mmi << 4) | vtpr.mic;
vhpi = VCPU(vcpu, vhpi);
if (vhpi > threshold) {
/* interrupt activated */
generate_exirq(vcpu);
}
}
void leave_hypervisor_tail(void)
{
struct kvm_vcpu *v = current_vcpu;
if (VMX(v, timer_check)) {
VMX(v, timer_check) = 0;
if (VMX(v, itc_check)) {
if (vcpu_get_itc(v) > VCPU(v, itm)) {
if (!(VCPU(v, itv) & (1 << 16))) {
vcpu_pend_interrupt(v, VCPU(v, itv)
& 0xff);
VMX(v, itc_check) = 0;
} else {
v->arch.timer_pending = 1;
}
VMX(v, last_itc) = VCPU(v, itm) + 1;
}
}
}
rmb();
if (v->arch.irq_new_pending) {
v->arch.irq_new_pending = 0;
VMX(v, irq_check) = 0;
check_pending_irq(v);
return;
}
if (VMX(v, irq_check)) {
VMX(v, irq_check) = 0;
vhpi_detection(v);
}
}
static inline void handle_lds(struct kvm_pt_regs *regs)
{
regs->cr_ipsr |= IA64_PSR_ED;
}
void physical_tlb_miss(struct kvm_vcpu *vcpu, unsigned long vadr, int type)
{
unsigned long pte;
union ia64_rr rr;
rr.val = ia64_get_rr(vadr);
pte = vadr & _PAGE_PPN_MASK;
pte = pte | PHY_PAGE_WB;
thash_vhpt_insert(vcpu, pte, (u64)(rr.ps << 2), vadr, type);
return;
}
void kvm_page_fault(u64 vadr , u64 vec, struct kvm_pt_regs *regs)
{
unsigned long vpsr;
int type;
u64 vhpt_adr, gppa, pteval, rr, itir;
union ia64_isr misr;
union ia64_pta vpta;
struct thash_data *data;
struct kvm_vcpu *v = current_vcpu;
vpsr = VCPU(v, vpsr);
misr.val = VMX(v, cr_isr);
type = vec;
if (is_physical_mode(v) && (!(vadr << 1 >> 62))) {
if (vec == 2) {
if (__gpfn_is_io((vadr << 1) >> (PAGE_SHIFT + 1))) {
emulate_io_inst(v, ((vadr << 1) >> 1), 4);
return;
}
}
physical_tlb_miss(v, vadr, type);
return;
}
data = vtlb_lookup(v, vadr, type);
if (data != 0) {
if (type == D_TLB) {
gppa = (vadr & ((1UL << data->ps) - 1))
+ (data->ppn >> (data->ps - 12) << data->ps);
if (__gpfn_is_io(gppa >> PAGE_SHIFT)) {
if (data->pl >= ((regs->cr_ipsr >>
IA64_PSR_CPL0_BIT) & 3))
emulate_io_inst(v, gppa, data->ma);
else {
vcpu_set_isr(v, misr.val);
data_access_rights(v, vadr);
}
return ;
}
}
thash_vhpt_insert(v, data->page_flags, data->itir, vadr, type);
} else if (type == D_TLB) {
if (misr.sp) {
handle_lds(regs);
return;
}
rr = vcpu_get_rr(v, vadr);
itir = rr & (RR_RID_MASK | RR_PS_MASK);
if (!vhpt_enabled(v, vadr, misr.rs ? RSE_REF : DATA_REF)) {
if (vpsr & IA64_PSR_IC) {
vcpu_set_isr(v, misr.val);
alt_dtlb(v, vadr);
} else {
nested_dtlb(v);
}
return ;
}
vpta.val = vcpu_get_pta(v);
/* avoid recursively walking (short format) VHPT */
vhpt_adr = vcpu_thash(v, vadr);
if (!guest_vhpt_lookup(vhpt_adr, &pteval)) {
/* VHPT successfully read. */
if (!(pteval & _PAGE_P)) {
if (vpsr & IA64_PSR_IC) {
vcpu_set_isr(v, misr.val);
dtlb_fault(v, vadr);
} else {
nested_dtlb(v);
}
} else if ((pteval & _PAGE_MA_MASK) != _PAGE_MA_ST) {
thash_purge_and_insert(v, pteval, itir,
vadr, D_TLB);
} else if (vpsr & IA64_PSR_IC) {
vcpu_set_isr(v, misr.val);
dtlb_fault(v, vadr);
} else {
nested_dtlb(v);
}
} else {
/* Can't read VHPT. */
if (vpsr & IA64_PSR_IC) {
vcpu_set_isr(v, misr.val);
dvhpt_fault(v, vadr);
} else {
nested_dtlb(v);
}
}
} else if (type == I_TLB) {
if (!(vpsr & IA64_PSR_IC))
misr.ni = 1;
if (!vhpt_enabled(v, vadr, INST_REF)) {
vcpu_set_isr(v, misr.val);
alt_itlb(v, vadr);
return;
}
vpta.val = vcpu_get_pta(v);
vhpt_adr = vcpu_thash(v, vadr);
if (!guest_vhpt_lookup(vhpt_adr, &pteval)) {
/* VHPT successfully read. */
if (pteval & _PAGE_P) {
if ((pteval & _PAGE_MA_MASK) == _PAGE_MA_ST) {
vcpu_set_isr(v, misr.val);
itlb_fault(v, vadr);
return ;
}
rr = vcpu_get_rr(v, vadr);
itir = rr & (RR_RID_MASK | RR_PS_MASK);
thash_purge_and_insert(v, pteval, itir,
vadr, I_TLB);
} else {
vcpu_set_isr(v, misr.val);
inst_page_not_present(v, vadr);
}
} else {
vcpu_set_isr(v, misr.val);
ivhpt_fault(v, vadr);
}
}
}
void kvm_vexirq(struct kvm_vcpu *vcpu)
{
u64 vpsr, isr;
struct kvm_pt_regs *regs;
regs = vcpu_regs(vcpu);
vpsr = VCPU(vcpu, vpsr);
isr = vpsr & IA64_PSR_RI;
reflect_interruption(0, isr, 0, 12, regs); /*EXT IRQ*/
}
void kvm_ia64_handle_irq(struct kvm_vcpu *v)
{
struct exit_ctl_data *p = &v->arch.exit_data;
long psr;
local_irq_save(psr);
p->exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
vmm_transition(v);
local_irq_restore(psr);
VMX(v, timer_check) = 1;
}
static void ptc_ga_remote_func(struct kvm_vcpu *v, int pos)
{
u64 oldrid, moldrid, oldpsbits, vaddr;
struct kvm_ptc_g *p = &v->arch.ptc_g_data[pos];
vaddr = p->vaddr;
oldrid = VMX(v, vrr[0]);
VMX(v, vrr[0]) = p->rr;
oldpsbits = VMX(v, psbits[0]);
VMX(v, psbits[0]) = VMX(v, psbits[REGION_NUMBER(vaddr)]);
moldrid = ia64_get_rr(0x0);
ia64_set_rr(0x0, vrrtomrr(p->rr));
ia64_srlz_d();
vaddr = PAGEALIGN(vaddr, p->ps);
thash_purge_entries_remote(v, vaddr, p->ps);
VMX(v, vrr[0]) = oldrid;
VMX(v, psbits[0]) = oldpsbits;
ia64_set_rr(0x0, moldrid);
ia64_dv_serialize_data();
}
static void vcpu_do_resume(struct kvm_vcpu *vcpu)
{
/* Re-init VHPT and VTLB once on resume */
vcpu->arch.vhpt.num = VHPT_NUM_ENTRIES;
thash_init(&vcpu->arch.vhpt, VHPT_SHIFT);
vcpu->arch.vtlb.num = VTLB_NUM_ENTRIES;
thash_init(&vcpu->arch.vtlb, VTLB_SHIFT);
ia64_set_pta(vcpu->arch.vhpt.pta.val);
}
static void kvm_do_resume_op(struct kvm_vcpu *vcpu)
{
if (test_and_clear_bit(KVM_REQ_RESUME, &vcpu->requests)) {
vcpu_do_resume(vcpu);
return;
}
if (unlikely(test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))) {
thash_purge_all(vcpu);
return;
}
if (test_and_clear_bit(KVM_REQ_PTC_G, &vcpu->requests)) {
while (vcpu->arch.ptc_g_count > 0)
ptc_ga_remote_func(vcpu, --vcpu->arch.ptc_g_count);
}
}
void vmm_transition(struct kvm_vcpu *vcpu)
{
ia64_call_vsa(PAL_VPS_SAVE, (unsigned long)vcpu->arch.vpd,
0, 0, 0, 0, 0, 0);
vmm_trampoline(&vcpu->arch.guest, &vcpu->arch.host);
ia64_call_vsa(PAL_VPS_RESTORE, (unsigned long)vcpu->arch.vpd,
0, 0, 0, 0, 0, 0);
kvm_do_resume_op(vcpu);
}
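
vhpi_detection() above and the tail of kvm_asm_mov_to_psr apply the same test: a pending interrupt is delivered only when vhpi exceeds a threshold built from vpsr.i, vtpr.mmi and vtpr.mic. The following is a minimal standalone sketch of that comparison, not code from this commit: the names vhpi_threshold and vhpi_should_inject are hypothetical, and the fields are passed as plain integers instead of being read from the VPD.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the masking threshold the same way
 * vhpi_detection() does:
 *   threshold = ((!vpsr.i) << 5) | (vtpr.mmi << 4) | vtpr.mic
 */
static uint64_t vhpi_threshold(int psr_i, unsigned mmi, unsigned mic)
{
    /* mmi is a 1-bit field and mic a 4-bit field in the TPR. */
    assert(mmi <= 1 && mic <= 15);
    return ((uint64_t)(!psr_i) << 5) | ((uint64_t)mmi << 4) | mic;
}

/* Hypothetical helper: nonzero when the pending interrupt is not masked. */
static int vhpi_should_inject(uint64_t vhpi, int psr_i,
                              unsigned mmi, unsigned mic)
{
    return vhpi > vhpi_threshold(psr_i, mmi, mic);
}
```

With interrupts enabled (psr_i = 1), mmi = 0 and mic = 3, the threshold is 3, so a vhpi of 5 is injected and a vhpi of 3 is held back. With psr_i = 0 the threshold is at least 32.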

1038
arch/ia64/kvm/trampoline.S Normal file

File diff suppressed because it is too large

2163
arch/ia64/kvm/vcpu.c Normal file

File diff suppressed because it is too large

740
arch/ia64/kvm/vcpu.h Normal file

@ -0,0 +1,740 @@
/*
* vcpu.h: vcpu routines
* Copyright (c) 2005, Intel Corporation.
* Xuefei Xu (Anthony Xu) (Anthony.xu@intel.com)
* Yaozu Dong (Eddie Dong) (Eddie.dong@intel.com)
*
* Copyright (c) 2007, Intel Corporation.
* Xuefei Xu (Anthony Xu) (Anthony.xu@intel.com)
* Xiantao Zhang (xiantao.zhang@intel.com)
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
#ifndef __KVM_VCPU_H__
#define __KVM_VCPU_H__
#include <asm/types.h>
#include <asm/fpu.h>
#include <asm/processor.h>
#ifndef __ASSEMBLY__
#include "vti.h"
#include <linux/kvm_host.h>
#include <linux/spinlock.h>
typedef unsigned long IA64_INST;
typedef union U_IA64_BUNDLE {
unsigned long i64[2];
struct { unsigned long template:5, slot0:41, slot1a:18,
slot1b:23, slot2:41; };
/* NOTE: following doesn't work because bitfields can't cross natural
size boundaries
struct { unsigned long template:5, slot0:41, slot1:41, slot2:41; }; */
} IA64_BUNDLE;
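A minimal sketch of how the three 41-bit slots can be pulled out of the two 64-bit words without bitfields. Slot 1 straddles the word boundary, so it is rebuilt from the top 18 bits of the first word and the low 23 bits of the second, mirroring `slot1a`/`slot1b` above. The helper names (`bundle_template`, `bundle_slot`, `SLOT_MASK`) are illustrative assumptions and do not appear in the patch; the code also assumes the bitfields are allocated starting from the least significant bit.

```c
#include <stdint.h>

/* 41 one-bits; same value as MASK_41 in this header. */
#define SLOT_MASK ((UINT64_C(1) << 41) - 1)

/* Template field: bits 0..4 of the low word. */
static inline unsigned bundle_template(const uint64_t w[2])
{
    return (unsigned)(w[0] & 0x1f);
}

/* Return slot n (0, 1 or 2) of a bundle stored as two 64-bit words. */
static inline uint64_t bundle_slot(const uint64_t w[2], int n)
{
    switch (n) {
    case 0:
        /* bits 5..45 of word 0 */
        return (w[0] >> 5) & SLOT_MASK;
    case 1:
        /* bits 46..63 of word 0 (18 bits) + bits 0..22 of word 1 (23 bits) */
        return ((w[0] >> 46) | (w[1] << 18)) & SLOT_MASK;
    default:
        /* bits 23..63 of word 1 */
        return (w[1] >> 23) & SLOT_MASK;
    }
}
```

With the union above, `bundle_slot(b.i64, 1)` would be expected to equal `b.slot1a | (b.slot1b << 18)` under the same bit-ordering assumption.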
typedef union U_INST64_A5 {
IA64_INST inst;
struct { unsigned long qp:6, r1:7, imm7b:7, r3:2, imm5c:5,
imm9d:9, s:1, major:4; };
} INST64_A5;
typedef union U_INST64_B4 {
IA64_INST inst;
struct { unsigned long qp:6, btype:3, un3:3, p:1, b2:3, un11:11, x6:6,
wh:2, d:1, un1:1, major:4; };
} INST64_B4;
typedef union U_INST64_B8 {
IA64_INST inst;
struct { unsigned long qp:6, un21:21, x6:6, un4:4, major:4; };
} INST64_B8;
typedef union U_INST64_B9 {
IA64_INST inst;
struct { unsigned long qp:6, imm20:20, :1, x6:6, :3, i:1, major:4; };
} INST64_B9;
typedef union U_INST64_I19 {
IA64_INST inst;
struct { unsigned long qp:6, imm20:20, :1, x6:6, x3:3, i:1, major:4; };
} INST64_I19;
typedef union U_INST64_I26 {
IA64_INST inst;
struct { unsigned long qp:6, :7, r2:7, ar3:7, x6:6, x3:3, :1, major:4; };
} INST64_I26;
typedef union U_INST64_I27 {
IA64_INST inst;
struct { unsigned long qp:6, :7, imm:7, ar3:7, x6:6, x3:3, s:1, major:4; };
} INST64_I27;
typedef union U_INST64_I28 { /* not privileged (mov from AR) */
IA64_INST inst;
struct { unsigned long qp:6, r1:7, :7, ar3:7, x6:6, x3:3, :1, major:4; };
} INST64_I28;
typedef union U_INST64_M28 {
IA64_INST inst;
struct { unsigned long qp:6, :14, r3:7, x6:6, x3:3, :1, major:4; };
} INST64_M28;
typedef union U_INST64_M29 {
IA64_INST inst;
struct { unsigned long qp:6, :7, r2:7, ar3:7, x6:6, x3:3, :1, major:4; };
} INST64_M29;
typedef union U_INST64_M30 {
IA64_INST inst;
struct { unsigned long qp:6, :7, imm:7, ar3:7, x4:4, x2:2,
x3:3, s:1, major:4; };
} INST64_M30;
typedef union U_INST64_M31 {
IA64_INST inst;
struct { unsigned long qp:6, r1:7, :7, ar3:7, x6:6, x3:3, :1, major:4; };
} INST64_M31;
typedef union U_INST64_M32 {
IA64_INST inst;
struct { unsigned long qp:6, :7, r2:7, cr3:7, x6:6, x3:3, :1, major:4; };
} INST64_M32;
typedef union U_INST64_M33 {
IA64_INST inst;
struct { unsigned long qp:6, r1:7, :7, cr3:7, x6:6, x3:3, :1, major:4; };
} INST64_M33;
typedef union U_INST64_M35 {
IA64_INST inst;
struct { unsigned long qp:6, :7, r2:7, :7, x6:6, x3:3, :1, major:4; };
} INST64_M35;
typedef union U_INST64_M36 {
IA64_INST inst;
struct { unsigned long qp:6, r1:7, :14, x6:6, x3:3, :1, major:4; };
} INST64_M36;
typedef union U_INST64_M37 {
IA64_INST inst;
struct { unsigned long qp:6, imm20a:20, :1, x4:4, x2:2, x3:3,
i:1, major:4; };
} INST64_M37;
typedef union U_INST64_M41 {
IA64_INST inst;
struct { unsigned long qp:6, :7, r2:7, :7, x6:6, x3:3, :1, major:4; };
} INST64_M41;
typedef union U_INST64_M42 {
IA64_INST inst;
struct { unsigned long qp:6, :7, r2:7, r3:7, x6:6, x3:3, :1, major:4; };
} INST64_M42;
typedef union U_INST64_M43 {
IA64_INST inst;
struct { unsigned long qp:6, r1:7, :7, r3:7, x6:6, x3:3, :1, major:4; };
} INST64_M43;
typedef union U_INST64_M44 {
IA64_INST inst;
struct { unsigned long qp:6, imm:21, x4:4, i2:2, x3:3, i:1, major:4; };
} INST64_M44;
typedef union U_INST64_M45 {
IA64_INST inst;
struct { unsigned long qp:6, :7, r2:7, r3:7, x6:6, x3:3, :1, major:4; };
} INST64_M45;
typedef union U_INST64_M46 {
IA64_INST inst;
struct { unsigned long qp:6, r1:7, un7:7, r3:7, x6:6,
x3:3, un1:1, major:4; };
} INST64_M46;
typedef union U_INST64_M47 {
IA64_INST inst;
struct { unsigned long qp:6, un14:14, r3:7, x6:6, x3:3, un1:1, major:4; };
} INST64_M47;
typedef union U_INST64_M1{
IA64_INST inst;
struct { unsigned long qp:6, r1:7, un7:7, r3:7, x:1, hint:2,
x6:6, m:1, major:4; };
} INST64_M1;
typedef union U_INST64_M2{
IA64_INST inst;
struct { unsigned long qp:6, r1:7, r2:7, r3:7, x:1, hint:2,
x6:6, m:1, major:4; };
} INST64_M2;
typedef union U_INST64_M3{
IA64_INST inst;
struct { unsigned long qp:6, r1:7, imm7:7, r3:7, i:1, hint:2,
x6:6, s:1, major:4; };
} INST64_M3;
typedef union U_INST64_M4 {
IA64_INST inst;
struct { unsigned long qp:6, un7:7, r2:7, r3:7, x:1, hint:2,
x6:6, m:1, major:4; };
} INST64_M4;
typedef union U_INST64_M5 {
IA64_INST inst;
struct { unsigned long qp:6, imm7:7, r2:7, r3:7, i:1, hint:2,
x6:6, s:1, major:4; };
} INST64_M5;
typedef union U_INST64_M6 {
IA64_INST inst;
struct { unsigned long qp:6, f1:7, un7:7, r3:7, x:1, hint:2,
x6:6, m:1, major:4; };
} INST64_M6;
typedef union U_INST64_M9 {
IA64_INST inst;
struct { unsigned long qp:6, :7, f2:7, r3:7, x:1, hint:2,
x6:6, m:1, major:4; };
} INST64_M9;
typedef union U_INST64_M10 {
IA64_INST inst;
struct { unsigned long qp:6, imm7:7, f2:7, r3:7, i:1, hint:2,
x6:6, s:1, major:4; };
} INST64_M10;
typedef union U_INST64_M12 {
IA64_INST inst;
struct { unsigned long qp:6, f1:7, f2:7, r3:7, x:1, hint:2,
x6:6, m:1, major:4; };
} INST64_M12;
typedef union U_INST64_M15 {
IA64_INST inst;
struct { unsigned long qp:6, :7, imm7:7, r3:7, i:1, hint:2,
x6:6, s:1, major:4; };
} INST64_M15;
typedef union U_INST64 {
IA64_INST inst;
struct { unsigned long :37, major:4; } generic;
INST64_A5 A5; /* used in build_hypercall_bundle only */
INST64_B4 B4; /* used in build_hypercall_bundle only */
INST64_B8 B8; /* rfi, bsw.[01] */
INST64_B9 B9; /* break.b */
INST64_I19 I19; /* used in build_hypercall_bundle only */
INST64_I26 I26; /* mov register to ar (I unit) */
INST64_I27 I27; /* mov immediate to ar (I unit) */
INST64_I28 I28; /* mov from ar (I unit) */
INST64_M1 M1; /* ld integer */
INST64_M2 M2;
INST64_M3 M3;
INST64_M4 M4; /* st integer */
INST64_M5 M5;
INST64_M6 M6; /* ldfd floating point */
INST64_M9 M9; /* stfd floating point */
INST64_M10 M10; /* stfd floating point */
INST64_M12 M12; /* ldfd pair floating point */
INST64_M15 M15; /* lfetch + imm update */
INST64_M28 M28; /* purge translation cache entry */
INST64_M29 M29; /* mov register to ar (M unit) */
INST64_M30 M30; /* mov immediate to ar (M unit) */
INST64_M31 M31; /* mov from ar (M unit) */
INST64_M32 M32; /* mov reg to cr */
INST64_M33 M33; /* mov from cr */
INST64_M35 M35; /* mov to psr */
INST64_M36 M36; /* mov from psr */
INST64_M37 M37; /* break.m */
INST64_M41 M41; /* translation cache insert */
INST64_M42 M42; /* mov to indirect reg/translation reg insert*/
INST64_M43 M43; /* mov from indirect reg */
INST64_M44 M44; /* set/reset system mask */
INST64_M45 M45; /* translation purge */
INST64_M46 M46; /* translation access (tpa,tak) */
INST64_M47 M47; /* purge translation entry */
} INST64;
#define MASK_41 ((unsigned long)0x1ffffffffff)
/* Virtual address memory attributes encoding */
#define VA_MATTR_WB 0x0
#define VA_MATTR_UC 0x4
#define VA_MATTR_UCE 0x5
#define VA_MATTR_WC 0x6
#define VA_MATTR_NATPAGE 0x7
#define PMASK(size) (~((size) - 1))
#define PSIZE(size) (1UL<<(size))
#define CLEARLSB(ppn, nbits) (((ppn) >> (nbits)) << (nbits))
#define PAGEALIGN(va, ps) CLEARLSB(va, ps)
#define PAGE_FLAGS_RV_MASK (0x2|(0x3UL<<50)|(((1UL<<11)-1)<<53))
#define _PAGE_MA_ST (0x1 << 2) /* is reserved for software use */
#define ARCH_PAGE_SHIFT 12
#define INVALID_TI_TAG (1UL << 63)
#define VTLB_PTE_P_BIT 0
#define VTLB_PTE_IO_BIT 60
#define VTLB_PTE_IO (1UL<<VTLB_PTE_IO_BIT)
#define VTLB_PTE_P (1UL<<VTLB_PTE_P_BIT)
#define vcpu_quick_region_check(_tr_regions,_ifa) \
(_tr_regions & (1 << ((unsigned long)_ifa >> 61)))
#define vcpu_quick_region_set(_tr_regions,_ifa) \
do {_tr_regions |= (1 << ((unsigned long)_ifa >> 61)); } while (0)
static inline void vcpu_set_tr(struct thash_data *trp, u64 pte, u64 itir,
u64 va, u64 rid)
{
trp->page_flags = pte;
trp->itir = itir;
trp->vadr = va;
trp->rid = rid;
}
extern u64 kvm_lookup_mpa(u64 gpfn);
extern u64 kvm_gpa_to_mpa(u64 gpa);
/* Return I/O type if true */
#define __gpfn_is_io(gpfn) \
({ \
u64 pte, ret = 0; \
pte = kvm_lookup_mpa(gpfn); \
if (!(pte & GPFN_INV_MASK)) \
ret = pte & GPFN_IO_MASK; \
ret; \
})
#endif
#define IA64_NO_FAULT 0
#define IA64_FAULT 1
#define VMM_RBS_OFFSET ((VMM_TASK_SIZE + 15) & ~15)
#define SW_BAD 0 /* Bad mode transition */
#define SW_V2P 1 /* Physical emulation is activated */
#define SW_P2V 2 /* Exit physical mode emulation */
#define SW_SELF 3 /* No mode transition */
#define SW_NOP 4 /* Mode transition, but without action required */
#define GUEST_IN_PHY 0x1
#define GUEST_PHY_EMUL 0x2
#define current_vcpu ((struct kvm_vcpu *) ia64_getreg(_IA64_REG_TP))
#define VRN_SHIFT 61
#define VRN_MASK 0xe000000000000000
#define VRN0 0x0UL
#define VRN1 0x1UL
#define VRN2 0x2UL
#define VRN3 0x3UL
#define VRN4 0x4UL
#define VRN5 0x5UL
#define VRN6 0x6UL
#define VRN7 0x7UL
#define IRQ_NO_MASKED 0
#define IRQ_MASKED_BY_VTPR 1
#define IRQ_MASKED_BY_INSVC 2 /* masked by inservice IRQ */
#define PTA_BASE_SHIFT 15
#define IA64_PSR_VM_BIT 46
#define IA64_PSR_VM (__IA64_UL(1) << IA64_PSR_VM_BIT)
/* Interruption Function State */
#define IA64_IFS_V_BIT 63
#define IA64_IFS_V (__IA64_UL(1) << IA64_IFS_V_BIT)
#define PHY_PAGE_UC (_PAGE_A|_PAGE_D|_PAGE_P|_PAGE_MA_UC|_PAGE_AR_RWX)
#define PHY_PAGE_WB (_PAGE_A|_PAGE_D|_PAGE_P|_PAGE_MA_WB|_PAGE_AR_RWX)
#ifndef __ASSEMBLY__
#include <asm/gcc_intrin.h>
#define is_physical_mode(v) \
((v->arch.mode_flags) & GUEST_IN_PHY)
#define is_virtual_mode(v) \
(!is_physical_mode(v))
#define MODE_IND(psr) \
(((psr).it << 2) + ((psr).dt << 1) + (psr).rt)
#define _vmm_raw_spin_lock(x) \
do { \
__u32 *ia64_spinlock_ptr = (__u32 *) (x); \
__u64 ia64_spinlock_val; \
ia64_spinlock_val = ia64_cmpxchg4_acq(ia64_spinlock_ptr, 1, 0);\
if (unlikely(ia64_spinlock_val)) { \
do { \
while (*ia64_spinlock_ptr) \
ia64_barrier(); \
ia64_spinlock_val = \
ia64_cmpxchg4_acq(ia64_spinlock_ptr, 1, 0);\
} while (ia64_spinlock_val); \
} \
} while (0)
#define _vmm_raw_spin_unlock(x) \
do { barrier(); \
((spinlock_t *)x)->raw_lock.lock = 0; } \
while (0)
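The lock macro above is a test-and-test-and-set spin lock: one acquire compare-and-exchange, then, on contention, a read-only spin until the word looks free before retrying. A rough user-space equivalent using C11 atomics is sketched below. The type and function names (`sketch_lock_t`, `sketch_trylock`, `sketch_lock`, `sketch_unlock`) are assumptions made for illustration, and the release store in `sketch_unlock` is a swapped-in C11 idiom, not the `barrier()` plus plain store used by the macro.

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint locked;   /* 0 = free, 1 = held */
} sketch_lock_t;

/* One attempt: returns 1 if the lock was taken, 0 if it was already held. */
static inline int sketch_trylock(sketch_lock_t *l)
{
    unsigned int expected = 0;
    return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static inline void sketch_lock(sketch_lock_t *l)
{
    while (!sketch_trylock(l)) {
        /* Contended: spin on plain loads until the lock looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

static inline void sketch_unlock(sketch_lock_t *l)
{
    /* Release ordering publishes the critical section before the unlock. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Spinning on loads rather than on repeated compare-and-exchange keeps the cache line shared between waiters, which is the same reason the macro has an inner `while (*ia64_spinlock_ptr)` loop.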
void vmm_spin_lock(spinlock_t *lock);
void vmm_spin_unlock(spinlock_t *lock);
enum {
I_TLB = 1,
D_TLB = 2
};
union kvm_va {
struct {
unsigned long off : 60; /* intra-region offset */
unsigned long reg : 4; /* region number */
} f;
unsigned long l;
void *p;
};
#define __kvm_pa(x) ({union kvm_va _v; _v.l = (long) (x); \
_v.f.reg = 0; _v.l; })
#define __kvm_va(x) ({union kvm_va _v; _v.l = (long) (x); \
_v.f.reg = -1; _v.p; })
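Both macros only rewrite the 4-bit `reg` field at the top of the address: `__kvm_pa` clears it and `__kvm_va` sets it to all ones. A minimal sketch of the same arithmetic with explicit masks follows. It assumes `off` occupies the low 60 bits and `reg` the high 4 bits of a 64-bit word, as the union declares when bitfields are allocated from the least significant bit; `kvm_pa_sketch`, `kvm_va_sketch` and the mask names are hypothetical.

```c
#include <stdint.h>

#define KVM_VA_OFF_MASK ((UINT64_C(1) << 60) - 1)  /* low 60 bits: offset    */
#define KVM_VA_REG_ALL  (UINT64_C(0xf) << 60)      /* 4-bit reg field, all 1 */

/* Equivalent of __kvm_pa(): clear the reg field, keep the offset. */
static inline uint64_t kvm_pa_sketch(uint64_t addr)
{
    return addr & KVM_VA_OFF_MASK;
}

/* Equivalent of __kvm_va(): force the reg field to all ones. */
static inline uint64_t kvm_va_sketch(uint64_t addr)
{
    return addr | KVM_VA_REG_ALL;
}
```

For example, `kvm_va_sketch(0x1000)` yields `0xf000000000001000`, and passing that value to `kvm_pa_sketch` gives `0x1000` back.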
#define _REGION_ID(x) ({union ia64_rr _v; _v.val = (long)(x); \
_v.rid; })
#define _REGION_PAGE_SIZE(x) ({union ia64_rr _v; _v.val = (long)(x); \
_v.ps; })
#define _REGION_HW_WALKER(x) ({union ia64_rr _v; _v.val = (long)(x); \
_v.ve; })
enum vhpt_ref{ DATA_REF, NA_REF, INST_REF, RSE_REF };
enum tlb_miss_type { INSTRUCTION, DATA, REGISTER };
#define VCPU(_v, _x) ((_v)->arch.vpd->_x)
#define VMX(_v, _x) ((_v)->arch._x)
#define VLSAPIC_INSVC(vcpu, i) ((vcpu)->arch.insvc[i])
#define VLSAPIC_XTP(_v) VMX(_v, xtp)
static inline unsigned long itir_ps(unsigned long itir)
{
return ((itir >> 2) & 0x3f);
}
/**************************************************************************
VCPU control register access routines
**************************************************************************/
static inline u64 vcpu_get_itir(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, itir));
}
static inline void vcpu_set_itir(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, itir) = val;
}
static inline u64 vcpu_get_ifa(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, ifa));
}
static inline void vcpu_set_ifa(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, ifa) = val;
}
static inline u64 vcpu_get_iva(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, iva));
}
static inline u64 vcpu_get_pta(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, pta));
}
static inline u64 vcpu_get_lid(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, lid));
}
static inline u64 vcpu_get_tpr(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, tpr));
}
static inline u64 vcpu_get_eoi(struct kvm_vcpu *vcpu)
{
return (0UL); /*reads of eoi always return 0 */
}
static inline u64 vcpu_get_irr0(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, irr[0]));
}
static inline u64 vcpu_get_irr1(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, irr[1]));
}
static inline u64 vcpu_get_irr2(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, irr[2]));
}
static inline u64 vcpu_get_irr3(struct kvm_vcpu *vcpu)
{
return ((u64)VCPU(vcpu, irr[3]));
}
static inline void vcpu_set_dcr(struct kvm_vcpu *vcpu, u64 val)
{
ia64_setreg(_IA64_REG_CR_DCR, val);
}
static inline void vcpu_set_isr(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, isr) = val;
}
static inline void vcpu_set_lid(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, lid) = val;
}
static inline void vcpu_set_ipsr(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, ipsr) = val;
}
static inline void vcpu_set_iip(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, iip) = val;
}
static inline void vcpu_set_ifs(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, ifs) = val;
}
static inline void vcpu_set_iipa(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, iipa) = val;
}
static inline void vcpu_set_iha(struct kvm_vcpu *vcpu, u64 val)
{
VCPU(vcpu, iha) = val;
}
static inline u64 vcpu_get_rr(struct kvm_vcpu *vcpu, u64 reg)
{
return vcpu->arch.vrr[reg>>61];
}
/**************************************************************************
VCPU debug breakpoint register access routines
**************************************************************************/
static inline void vcpu_set_dbr(struct kvm_vcpu *vcpu, u64 reg, u64 val)
{
__ia64_set_dbr(reg, val);
}
static inline void vcpu_set_ibr(struct kvm_vcpu *vcpu, u64 reg, u64 val)
{
ia64_set_ibr(reg, val);
}
static inline u64 vcpu_get_dbr(struct kvm_vcpu *vcpu, u64 reg)
{
return ((u64)__ia64_get_dbr(reg));
}
static inline u64 vcpu_get_ibr(struct kvm_vcpu *vcpu, u64 reg)
{
return ((u64)ia64_get_ibr(reg));
}
/**************************************************************************
VCPU performance monitor register access routines
**************************************************************************/
static inline void vcpu_set_pmc(struct kvm_vcpu *vcpu, u64 reg, u64 val)
{
/* NOTE: Writes to unimplemented PMC registers are discarded */
ia64_set_pmc(reg, val);
}
static inline void vcpu_set_pmd(struct kvm_vcpu *vcpu, u64 reg, u64 val)
{
/* NOTE: Writes to unimplemented PMD registers are discarded */
ia64_set_pmd(reg, val);
}
static inline u64 vcpu_get_pmc(struct kvm_vcpu *vcpu, u64 reg)
{
/* NOTE: Reads from unimplemented PMC registers return zero */
return ((u64)ia64_get_pmc(reg));
}
static inline u64 vcpu_get_pmd(struct kvm_vcpu *vcpu, u64 reg)
{
/* NOTE: Reads from unimplemented PMD registers return zero */
return ((u64)ia64_get_pmd(reg));
}
static inline unsigned long vrrtomrr(unsigned long val)
{
union ia64_rr rr;
rr.val = val;
rr.rid = (rr.rid << 4) | 0xe;
if (rr.ps > PAGE_SHIFT)
rr.ps = PAGE_SHIFT;
rr.ve = 1;
return rr.val;
}
static inline int highest_bits(int *dat)
{
u32 bits, bitnum;
int i;
/* loop for all 256 bits */
for (i = 7; i >= 0 ; i--) {
bits = dat[i];
if (bits) {
bitnum = fls(bits);
return i * 32 + bitnum - 1;
}
}
return NULL_VECTOR;
}
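`highest_bits()` treats its argument as a 256-bit map stored in eight 32-bit words and scans from the top word down. A portable sketch of the same scan is given below, with a hand-written stand-in for the kernel's `fls()` (1-based index of the most significant set bit, 0 for a zero word). `sketch_fls`, `sketch_highest_bit` and the value of `SKETCH_NULL_VECTOR` are assumptions for illustration; the patch defines `NULL_VECTOR` elsewhere.

```c
#include <stdint.h>

#define SKETCH_NULL_VECTOR (-1)   /* assumed sentinel for "no bit set" */

/* 1-based position of the most significant set bit; 0 if x == 0. */
static inline int sketch_fls(uint32_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Highest set bit (0..255) in a 256-bit map, or SKETCH_NULL_VECTOR. */
static inline int sketch_highest_bit(const uint32_t words[8])
{
    int i;

    for (i = 7; i >= 0; i--) {
        if (words[i])
            return i * 32 + sketch_fls(words[i]) - 1;
    }
    return SKETCH_NULL_VECTOR;
}
```

Setting only bit 5 of `words[1]` corresponds to vector 37, which is what the scan returns in that case.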
/*
* Return nonzero if the pending irq has higher priority than the
* in-service one, or if an irq is pending and none is in service.
*/
static inline int is_higher_irq(int pending, int inservice)
{
return ((pending > inservice)
|| ((pending != NULL_VECTOR)
&& (inservice == NULL_VECTOR)));
}
static inline int is_higher_class(int pending, int mic)
{
return ((pending >> 4) > mic);
}
/*
* Return the highest-priority pending irq (0-255),
* or NULL_VECTOR when nothing is pending.
*/
static inline int highest_pending_irq(struct kvm_vcpu *vcpu)
{
if (VCPU(vcpu, irr[0]) & (1UL<<NMI_VECTOR))
return NMI_VECTOR;
if (VCPU(vcpu, irr[0]) & (1UL<<ExtINT_VECTOR))
return ExtINT_VECTOR;
return highest_bits((int *)&VCPU(vcpu, irr[0]));
}
static inline int highest_inservice_irq(struct kvm_vcpu *vcpu)
{
if (VMX(vcpu, insvc[0]) & (1UL<<NMI_VECTOR))
return NMI_VECTOR;
if (VMX(vcpu, insvc[0]) & (1UL<<ExtINT_VECTOR))
return ExtINT_VECTOR;
return highest_bits((int *)&(VMX(vcpu, insvc[0])));
}
extern void vcpu_get_fpreg(struct kvm_vcpu *vcpu, u64 reg,
struct ia64_fpreg *val);
extern void vcpu_set_fpreg(struct kvm_vcpu *vcpu, u64 reg,
struct ia64_fpreg *val);
extern u64 vcpu_get_gr(struct kvm_vcpu *vcpu, u64 reg);
extern void vcpu_set_gr(struct kvm_vcpu *vcpu, u64 reg, u64 val, int nat);
extern u64 vcpu_get_psr(struct kvm_vcpu *vcpu);
extern void vcpu_set_psr(struct kvm_vcpu *vcpu, u64 val);
extern u64 vcpu_thash(struct kvm_vcpu *vcpu, u64 vadr);
extern void vcpu_bsw0(struct kvm_vcpu *vcpu);
extern void thash_vhpt_insert(struct kvm_vcpu *v, u64 pte,
u64 itir, u64 va, int type);
extern struct thash_data *vhpt_lookup(u64 va);
extern u64 guest_vhpt_lookup(u64 iha, u64 *pte);
extern void thash_purge_entries(struct kvm_vcpu *v, u64 va, u64 ps);
extern void thash_purge_entries_remote(struct kvm_vcpu *v, u64 va, u64 ps);
extern u64 translate_phy_pte(u64 *pte, u64 itir, u64 va);
extern int thash_purge_and_insert(struct kvm_vcpu *v, u64 pte,
u64 itir, u64 ifa, int type);
extern void thash_purge_all(struct kvm_vcpu *v);
extern struct thash_data *vtlb_lookup(struct kvm_vcpu *v,
u64 va, int is_data);
extern int vtr_find_overlap(struct kvm_vcpu *vcpu, u64 va,
u64 ps, int is_data);
extern void vcpu_increment_iip(struct kvm_vcpu *v);
extern void vcpu_decrement_iip(struct kvm_vcpu *vcpu);
extern void vcpu_pend_interrupt(struct kvm_vcpu *vcpu, u8 vec);
extern void vcpu_unpend_interrupt(struct kvm_vcpu *vcpu, u8 vec);
extern void data_page_not_present(struct kvm_vcpu *vcpu, u64 vadr);
extern void dnat_page_consumption(struct kvm_vcpu *vcpu, u64 vadr);
extern void alt_dtlb(struct kvm_vcpu *vcpu, u64 vadr);
extern void nested_dtlb(struct kvm_vcpu *vcpu);
extern void dvhpt_fault(struct kvm_vcpu *vcpu, u64 vadr);
extern int vhpt_enabled(struct kvm_vcpu *vcpu, u64 vadr, enum vhpt_ref ref);
extern void update_vhpi(struct kvm_vcpu *vcpu, int vec);
extern int irq_masked(struct kvm_vcpu *vcpu, int h_pending, int h_inservice);
extern int fetch_code(struct kvm_vcpu *vcpu, u64 gip, IA64_BUNDLE *pbundle);
extern void emulate_io_inst(struct kvm_vcpu *vcpu, u64 padr, u64 ma);
extern void vmm_transition(struct kvm_vcpu *vcpu);
extern void vmm_trampoline(union context *from, union context *to);
extern int vmm_entry(void);
extern u64 vcpu_get_itc(struct kvm_vcpu *vcpu);
extern void vmm_reset_entry(void);
void kvm_init_vtlb(struct kvm_vcpu *v);
void kvm_init_vhpt(struct kvm_vcpu *v);
void thash_init(struct thash_cb *hcb, u64 sz);
void panic_vm(struct kvm_vcpu *v);
extern u64 ia64_call_vsa(u64 proc, u64 arg1, u64 arg2, u64 arg3,
u64 arg4, u64 arg5, u64 arg6, u64 arg7);
#endif
#endif /* __KVM_VCPU_H__ */

66
arch/ia64/kvm/vmm.c Normal file

@ -0,0 +1,66 @@
/*
* vmm.c: vmm module interface with kvm module
*
* Copyright (c) 2007, Intel Corporation.
*
* Xiantao Zhang (xiantao.zhang@intel.com)
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*/
#include <linux/module.h>
#include <asm/fpswa.h>
#include "vcpu.h"
MODULE_AUTHOR("Intel");
MODULE_LICENSE("GPL");
extern char kvm_ia64_ivt;
extern fpswa_interface_t *vmm_fpswa_interface;
struct kvm_vmm_info vmm_info = {
.module = THIS_MODULE,
.vmm_entry = vmm_entry,
.tramp_entry = vmm_trampoline,
.vmm_ivt = (unsigned long)&kvm_ia64_ivt,
};
static int __init kvm_vmm_init(void)
{
vmm_fpswa_interface = fpswa_interface;
/*Register vmm data to kvm side*/
return kvm_init(&vmm_info, 1024, THIS_MODULE);
}
static void __exit kvm_vmm_exit(void)
{
kvm_exit();
}
void vmm_spin_lock(spinlock_t *lock)
{
_vmm_raw_spin_lock(lock);
}
void vmm_spin_unlock(spinlock_t *lock)
{
_vmm_raw_spin_unlock(lock);
}
module_init(kvm_vmm_init)
module_exit(kvm_vmm_exit)

1424
arch/ia64/kvm/vmm_ivt.S Normal file

File diff suppressed because it is too large

290
arch/ia64/kvm/vti.h Normal file

@ -0,0 +1,290 @@
/*
* vti.h: prototype for general vt related interface
* Copyright (c) 2004, Intel Corporation.
*
* Xuefei Xu (Anthony Xu) (anthony.xu@intel.com)
* Fred Yang (fred.yang@intel.com)
* Kun Tian (Kevin Tian) (kevin.tian@intel.com)
*
* Copyright (c) 2007, Intel Corporation.
* Zhang xiantao <xiantao.zhang@intel.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*/
#ifndef _KVM_VT_I_H
#define _KVM_VT_I_H
#ifndef __ASSEMBLY__
#include <asm/page.h>
#include <linux/kvm_host.h>
/* define itr.i and itr.d in ia64_itr function */
#define ITR 0x01
#define DTR 0x02
#define IaDTR 0x03
#define IA64_TR_VMM 6 /*itr6, dtr6 : maps vmm code, vmbuffer*/
#define IA64_TR_VM_DATA 7 /*dtr7 : maps current vm data*/
#define RR6 (6UL<<61)
#define RR7 (7UL<<61)
/* config_options in pal_vp_init_env */
#define VP_INITIALIZE 1UL
#define VP_FR_PMC (1UL<<1)
#define VP_OPCODE (1UL<<8)
#define VP_CAUSE (1UL<<9)
#define VP_FW_ACC (1UL<<63)
/* init vp env with initializing vm_buffer */
#define VP_INIT_ENV_INITALIZE (VP_INITIALIZE | VP_FR_PMC |\
VP_OPCODE | VP_CAUSE | VP_FW_ACC)
/* init vp env without initializing vm_buffer */
#define VP_INIT_ENV (VP_FR_PMC | VP_OPCODE | VP_CAUSE | VP_FW_ACC)
#define PAL_VP_CREATE 265
/* Stacked Virt. Initializes a new VPD for the operation of
* a new virtual processor in the virtual environment.
*/
#define PAL_VP_ENV_INFO 266
/*Stacked Virt. Returns the parameters needed to enter a virtual environment.*/
#define PAL_VP_EXIT_ENV 267
/*Stacked Virt. Allows a logical processor to exit a virtual environment.*/
#define PAL_VP_INIT_ENV 268
/*Stacked Virt. Allows a logical processor to enter a virtual environment.*/
#define PAL_VP_REGISTER 269
/*Stacked Virt. Register a different host IVT for the virtual processor.*/
#define PAL_VP_RESUME 270
/* Renamed from PAL_VP_RESUME */
#define PAL_VP_RESTORE 270
/*Stacked Virt. Resumes virtual processor operation on the logical processor.*/
#define PAL_VP_SUSPEND 271
/* Renamed from PAL_VP_SUSPEND */
#define PAL_VP_SAVE 271
/* Stacked Virt. Suspends operation for the specified virtual processor on
* the logical processor.
*/
#define PAL_VP_TERMINATE 272
/* Stacked Virt. Terminates operation for the specified virtual processor.*/
union vac {
unsigned long value;
struct {
int a_int:1;
int a_from_int_cr:1;
int a_to_int_cr:1;
int a_from_psr:1;
int a_from_cpuid:1;
int a_cover:1;
int a_bsw:1;
long reserved:57;
};
};
union vdc {
unsigned long value;
struct {
int d_vmsw:1;
int d_extint:1;
int d_ibr_dbr:1;
int d_pmc:1;
int d_to_pmd:1;
int d_itm:1;
long reserved:58;
};
};
struct vpd {
union vac vac;
union vdc vdc;
unsigned long virt_env_vaddr;
unsigned long reserved1[29];
unsigned long vhpi;
unsigned long reserved2[95];
unsigned long vgr[16];
unsigned long vbgr[16];
unsigned long vnat;
unsigned long vbnat;
unsigned long vcpuid[5];
unsigned long reserved3[11];
unsigned long vpsr;
unsigned long vpr;
unsigned long reserved4[76];
union {
unsigned long vcr[128];
struct {
unsigned long dcr;
unsigned long itm;
unsigned long iva;
unsigned long rsv1[5];
unsigned long pta;
unsigned long rsv2[7];
unsigned long ipsr;
unsigned long isr;
unsigned long rsv3;
unsigned long iip;
unsigned long ifa;
unsigned long itir;
unsigned long iipa;
unsigned long ifs;
unsigned long iim;
unsigned long iha;
unsigned long rsv4[38];
unsigned long lid;
unsigned long ivr;
unsigned long tpr;
unsigned long eoi;
unsigned long irr[4];
unsigned long itv;
unsigned long pmv;
unsigned long cmcv;
unsigned long rsv5[5];
unsigned long lrr0;
unsigned long lrr1;
unsigned long rsv6[46];
};
};
unsigned long reserved5[128];
unsigned long reserved6[3456];
unsigned long vmm_avail[128];
unsigned long reserved7[4096];
};
#define PAL_PROC_VM_BIT (1UL << 40)
#define PAL_PROC_VMSW_BIT (1UL << 54)
static inline s64 ia64_pal_vp_env_info(u64 *buffer_size,
u64 *vp_env_info)
{
struct ia64_pal_retval iprv;
PAL_CALL_STK(iprv, PAL_VP_ENV_INFO, 0, 0, 0);
*buffer_size = iprv.v0;
*vp_env_info = iprv.v1;
return iprv.status;
}
static inline s64 ia64_pal_vp_exit_env(u64 iva)
{
struct ia64_pal_retval iprv;
PAL_CALL_STK(iprv, PAL_VP_EXIT_ENV, (u64)iva, 0, 0);
return iprv.status;
}
static inline s64 ia64_pal_vp_init_env(u64 config_options, u64 pbase_addr,
u64 vbase_addr, u64 *vsa_base)
{
struct ia64_pal_retval iprv;
PAL_CALL_STK(iprv, PAL_VP_INIT_ENV, config_options, pbase_addr,
vbase_addr);
*vsa_base = iprv.v0;
return iprv.status;
}
static inline s64 ia64_pal_vp_restore(u64 *vpd, u64 pal_proc_vector)
{
struct ia64_pal_retval iprv;
PAL_CALL_STK(iprv, PAL_VP_RESTORE, (u64)vpd, pal_proc_vector, 0);
return iprv.status;
}
static inline s64 ia64_pal_vp_save(u64 *vpd, u64 pal_proc_vector)
{
struct ia64_pal_retval iprv;
PAL_CALL_STK(iprv, PAL_VP_SAVE, (u64)vpd, pal_proc_vector, 0);
return iprv.status;
}
#endif
/*VPD field offset*/
#define VPD_VAC_START_OFFSET 0
#define VPD_VDC_START_OFFSET 8
#define VPD_VHPI_START_OFFSET 256
#define VPD_VGR_START_OFFSET 1024
#define VPD_VBGR_START_OFFSET 1152
#define VPD_VNAT_START_OFFSET 1280
#define VPD_VBNAT_START_OFFSET 1288
#define VPD_VCPUID_START_OFFSET 1296
#define VPD_VPSR_START_OFFSET 1424
#define VPD_VPR_START_OFFSET 1432
#define VPD_VRSE_CFLE_START_OFFSET 1440
#define VPD_VCR_START_OFFSET 2048
#define VPD_VTPR_START_OFFSET 2576
#define VPD_VRR_START_OFFSET 3072
#define VPD_VMM_VAIL_START_OFFSET 31744
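The byte offsets above have to agree with the layout of `struct vpd`, since assembly code uses the constants while C code uses the struct. One way to keep the two in step is a set of compile-time checks. The sketch below mirrors the struct with `uint64_t` members (assuming an 8-byte `unsigned long`, with the `vac`/`vdc` unions and the named control registers collapsed into plain words) and asserts the offsets. `struct vpd_sketch` is a hypothetical stand-in, not something the patch defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Flattened mirror of struct vpd; every member is one or more 8-byte words. */
struct vpd_sketch {
    uint64_t vac;
    uint64_t vdc;
    uint64_t virt_env_vaddr;
    uint64_t reserved1[29];
    uint64_t vhpi;
    uint64_t reserved2[95];
    uint64_t vgr[16];
    uint64_t vbgr[16];
    uint64_t vnat;
    uint64_t vbnat;
    uint64_t vcpuid[5];
    uint64_t reserved3[11];
    uint64_t vpsr;
    uint64_t vpr;
    uint64_t reserved4[76];
    uint64_t vcr[128];        /* tpr is control register index 66 */
    uint64_t reserved5[128];
    uint64_t reserved6[3456];
    uint64_t vmm_avail[128];
    uint64_t reserved7[4096];
};

#define VPD_CHECK(member, off) \
    _Static_assert(offsetof(struct vpd_sketch, member) == (off), \
                   "VPD offset mismatch: " #member)

VPD_CHECK(vac, 0);
VPD_CHECK(vdc, 8);
VPD_CHECK(vhpi, 256);
VPD_CHECK(vgr, 1024);
VPD_CHECK(vbgr, 1152);
VPD_CHECK(vnat, 1280);
VPD_CHECK(vbnat, 1288);
VPD_CHECK(vcpuid, 1296);
VPD_CHECK(vpsr, 1424);
VPD_CHECK(vpr, 1432);
VPD_CHECK(reserved4, 1440);   /* same value as VPD_VRSE_CFLE_START_OFFSET */
VPD_CHECK(vcr, 2048);
VPD_CHECK(reserved5, 3072);   /* same value as VPD_VRR_START_OFFSET */
VPD_CHECK(vmm_avail, 31744);

_Static_assert(offsetof(struct vpd_sketch, vcr) + 66 * sizeof(uint64_t) == 2576,
               "VPD offset mismatch: tpr");
_Static_assert(sizeof(struct vpd_sketch) == 65536, "VPD should span 64 KiB");
```

In this mirror, offsets 1440 and 3072 fall at the start of members that the struct only declares as reserved, so the sketch checks them against `reserved4` and `reserved5`.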
/*Virtualization faults*/
#define EVENT_MOV_TO_AR 1
#define EVENT_MOV_TO_AR_IMM 2
#define EVENT_MOV_FROM_AR 3
#define EVENT_MOV_TO_CR 4
#define EVENT_MOV_FROM_CR 5
#define EVENT_MOV_TO_PSR 6
#define EVENT_MOV_FROM_PSR 7
#define EVENT_ITC_D 8
#define EVENT_ITC_I 9
#define EVENT_MOV_TO_RR 10
#define EVENT_MOV_TO_DBR 11
#define EVENT_MOV_TO_IBR 12
#define EVENT_MOV_TO_PKR 13
#define EVENT_MOV_TO_PMC 14
#define EVENT_MOV_TO_PMD 15
#define EVENT_ITR_D 16
#define EVENT_ITR_I 17
#define EVENT_MOV_FROM_RR 18
#define EVENT_MOV_FROM_DBR 19
#define EVENT_MOV_FROM_IBR 20
#define EVENT_MOV_FROM_PKR 21
#define EVENT_MOV_FROM_PMC 22
#define EVENT_MOV_FROM_CPUID 23
#define EVENT_SSM 24
#define EVENT_RSM 25
#define EVENT_PTC_L 26
#define EVENT_PTC_G 27
#define EVENT_PTC_GA 28
#define EVENT_PTR_D 29
#define EVENT_PTR_I 30
#define EVENT_THASH 31
#define EVENT_TTAG 32
#define EVENT_TPA 33
#define EVENT_TAK 34
#define EVENT_PTC_E 35
#define EVENT_COVER 36
#define EVENT_RFI 37
#define EVENT_BSW_0 38
#define EVENT_BSW_1 39
#define EVENT_VMSW 40
/**PAL virtual services offsets */
#define PAL_VPS_RESUME_NORMAL 0x0000
#define PAL_VPS_RESUME_HANDLER 0x0400
#define PAL_VPS_SYNC_READ 0x0800
#define PAL_VPS_SYNC_WRITE 0x0c00
#define PAL_VPS_SET_PENDING_INTERRUPT 0x1000
#define PAL_VPS_THASH 0x1400
#define PAL_VPS_TTAG 0x1800
#define PAL_VPS_RESTORE 0x1c00
#define PAL_VPS_SAVE 0x2000
#endif /* _KVM_VT_I_H */

636
arch/ia64/kvm/vtlb.c Normal file

@ -0,0 +1,636 @@
/*
* vtlb.c: guest virtual tlb handling module.
* Copyright (c) 2004, Intel Corporation.
* Yaozu Dong (Eddie Dong) <Eddie.dong@intel.com>
* Xuefei Xu (Anthony Xu) <anthony.xu@intel.com>
*
* Copyright (c) 2007, Intel Corporation.
* Xuefei Xu (Anthony Xu) <anthony.xu@intel.com>
* Xiantao Zhang <xiantao.zhang@intel.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
#include "vcpu.h"
#include <linux/rwsem.h>
#include <asm/tlb.h>
/*
* Check to see if the address rid:va is translated by the TLB
*/
static int __is_tr_translated(struct thash_data *trp, u64 rid, u64 va)
{
return ((trp->p) && (trp->rid == rid)
&& ((va-trp->vadr) < PSIZE(trp->ps)));
}
/*
* Only for GUEST TR format.
*/
static int __is_tr_overlap(struct thash_data *trp, u64 rid, u64 sva, u64 eva)
{
u64 sa1, ea1;
if (!trp->p || trp->rid != rid)
return 0;
sa1 = trp->vadr;
ea1 = sa1 + PSIZE(trp->ps) - 1;
eva -= 1;
if ((sva > ea1) || (sa1 > eva))
return 0;
else
return 1;
}
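The overlap test in `__is_tr_overlap()` compares two closed intervals: the page mapped by the TR, `[vadr, vadr + 2^ps - 1]`, and the query range, whose exclusive end `eva` is first turned into an inclusive one. A minimal sketch of just that interval arithmetic follows, leaving out the present-bit and region-ID checks. The function name `range_overlaps_page` and its parameter names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the half-open range [sva, eva) intersects the page of size
 * 2^ps that starts at base. Assumes eva > sva, so eva - 1 cannot wrap.
 */
static inline bool range_overlaps_page(uint64_t base, unsigned ps,
                                       uint64_t sva, uint64_t eva)
{
    uint64_t page_last = base + (UINT64_C(1) << ps) - 1;  /* inclusive */
    uint64_t query_last = eva - 1;                        /* inclusive */

    /* Disjoint only if one interval starts after the other one ends. */
    return !(sva > page_last || base > query_last);
}
```

For a 16 KiB page at `0x4000` (`ps` = 14), the range `[0x7000, 0x8000)` overlaps, while `[0x8000, 0x9000)` and `[0x0, 0x4000)` do not.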
void machine_tlb_purge(u64 va, u64 ps)
{
ia64_ptcl(va, ps << 2);
}
void local_flush_tlb_all(void)
{
int i, j;
unsigned long flags, count0, count1;
unsigned long stride0, stride1, addr;
addr = current_vcpu->arch.ptce_base;
count0 = current_vcpu->arch.ptce_count[0];
count1 = current_vcpu->arch.ptce_count[1];
stride0 = current_vcpu->arch.ptce_stride[0];
stride1 = current_vcpu->arch.ptce_stride[1];
local_irq_save(flags);
for (i = 0; i < count0; ++i) {
for (j = 0; j < count1; ++j) {
ia64_ptce(addr);
addr += stride1;
}
addr += stride0;
}
local_irq_restore(flags);
ia64_srlz_i(); /* srlz.i implies srlz.d */
}
int vhpt_enabled(struct kvm_vcpu *vcpu, u64 vadr, enum vhpt_ref ref)
{
union ia64_rr vrr;
union ia64_pta vpta;
struct ia64_psr vpsr;
vpsr = *(struct ia64_psr *)&VCPU(vcpu, vpsr);
vrr.val = vcpu_get_rr(vcpu, vadr);
vpta.val = vcpu_get_pta(vcpu);
if (vrr.ve & vpta.ve) {
switch (ref) {
case DATA_REF:
case NA_REF:
return vpsr.dt;
case INST_REF:
return vpsr.dt && vpsr.it && vpsr.ic;
case RSE_REF:
return vpsr.dt && vpsr.rt;
}
}
return 0;
}
struct thash_data *vsa_thash(union ia64_pta vpta, u64 va, u64 vrr, u64 *tag)
{
u64 index, pfn, rid, pfn_bits;
pfn_bits = vpta.size - 5 - 8;
pfn = REGION_OFFSET(va) >> _REGION_PAGE_SIZE(vrr);
rid = _REGION_ID(vrr);
index = ((rid & 0xff) << pfn_bits)|(pfn & ((1UL << pfn_bits) - 1));
*tag = ((rid >> 8) & 0xffff) | ((pfn >> pfn_bits) << 16);
return (struct thash_data *)((vpta.base << PTA_BASE_SHIFT) +
(index << 5));
}
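`vsa_thash()` splits the hash into a table index and a tag. The table holds 2^(size - 5) entries of 32 bytes each; 8 index bits come from the low byte of the region ID and the remaining `size - 13` bits from the low bits of the page frame number, while the leftover RID and PFN bits form the tag. The sketch below reproduces that arithmetic with plain integers. It assumes the region offset is the low 61 bits of the address, and `struct thash_slot` and `thash_slot_sketch` are hypothetical names.

```c
#include <stdint.h>

struct thash_slot {
    uint64_t index;   /* entry number; byte offset is index << 5 */
    uint64_t tag;     /* value compared against the entry's etag  */
};

/*
 * table_size_log2: log2 of the table size in bytes (vpta.size)
 * page_shift:      log2 of the region's page size (rr.ps)
 * rid:             region ID
 * va:              virtual address being hashed
 */
static inline struct thash_slot thash_slot_sketch(unsigned table_size_log2,
                                                  unsigned page_shift,
                                                  uint64_t rid, uint64_t va)
{
    unsigned pfn_bits = table_size_log2 - 5 - 8;
    uint64_t off = va & ((UINT64_C(1) << 61) - 1);   /* strip region bits */
    uint64_t pfn = off >> page_shift;
    struct thash_slot s;

    s.index = ((rid & 0xff) << pfn_bits) |
              (pfn & ((UINT64_C(1) << pfn_bits) - 1));
    s.tag = ((rid >> 8) & 0xffff) | ((pfn >> pfn_bits) << 16);
    return s;
}
```

As a worked case, a 1 MiB table (`table_size_log2` = 20) gives `pfn_bits` = 7; with `rid` = `0x123456`, 16 KiB pages and a PFN of `0x12345`, the index is `(0x56 << 7) | 0x45` = `0x2b45` and the tag is `0x1234 | (0x246 << 16)` = `0x2461234`.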
struct thash_data *__vtr_lookup(struct kvm_vcpu *vcpu, u64 va, int type)
{
struct thash_data *trp;
int i;
u64 rid;
rid = vcpu_get_rr(vcpu, va);
rid = rid & RR_RID_MASK;
if (type == D_TLB) {
if (vcpu_quick_region_check(vcpu->arch.dtr_regions, va)) {
for (trp = (struct thash_data *)&vcpu->arch.dtrs, i = 0;
i < NDTRS; i++, trp++) {
if (__is_tr_translated(trp, rid, va))
return trp;
}
}
} else {
if (vcpu_quick_region_check(vcpu->arch.itr_regions, va)) {
for (trp = (struct thash_data *)&vcpu->arch.itrs, i = 0;
i < NITRS; i++, trp++) {
if (__is_tr_translated(trp, rid, va))
return trp;
}
}
}
return NULL;
}
static void vhpt_insert(u64 pte, u64 itir, u64 ifa, u64 gpte)
{
union ia64_rr rr;
struct thash_data *head;
unsigned long ps, gpaddr;
ps = itir_ps(itir);
gpaddr = ((gpte & _PAGE_PPN_MASK) >> ps << ps) |
(ifa & ((1UL << ps) - 1));
rr.val = ia64_get_rr(ifa);
head = (struct thash_data *)ia64_thash(ifa);
head->etag = INVALID_TI_TAG;
ia64_mf();
head->page_flags = pte & ~PAGE_FLAGS_RV_MASK;
head->itir = rr.ps << 2;
head->etag = ia64_ttag(ifa);
head->gpaddr = gpaddr;
}
void mark_pages_dirty(struct kvm_vcpu *v, u64 pte, u64 ps)
{
u64 i, dirty_pages = 1;
u64 base_gfn = (pte&_PAGE_PPN_MASK) >> PAGE_SHIFT;
spinlock_t *lock = __kvm_va(v->arch.dirty_log_lock_pa);
void *dirty_bitmap = (void *)v - (KVM_VCPU_OFS + v->vcpu_id * VCPU_SIZE)
+ KVM_MEM_DIRTY_LOG_OFS;
dirty_pages <<= ps <= PAGE_SHIFT ? 0 : ps - PAGE_SHIFT;
vmm_spin_lock(lock);
for (i = 0; i < dirty_pages; i++) {
/* avoid RMW */
if (!test_bit(base_gfn + i, dirty_bitmap))
set_bit(base_gfn + i, dirty_bitmap);
}
vmm_spin_unlock(lock);
}
void thash_vhpt_insert(struct kvm_vcpu *v, u64 pte, u64 itir, u64 va, int type)
{
u64 phy_pte, psr;
union ia64_rr mrr;
mrr.val = ia64_get_rr(va);
phy_pte = translate_phy_pte(&pte, itir, va);
if (itir_ps(itir) >= mrr.ps) {
vhpt_insert(phy_pte, itir, va, pte);
} else {
phy_pte &= ~PAGE_FLAGS_RV_MASK;
psr = ia64_clear_ic();
ia64_itc(type, va, phy_pte, itir_ps(itir));
ia64_set_psr(psr);
}
if (!(pte&VTLB_PTE_IO))
mark_pages_dirty(v, pte, itir_ps(itir));
}
/*
* vhpt lookup
*/
struct thash_data *vhpt_lookup(u64 va)
{
struct thash_data *head;
u64 tag;
head = (struct thash_data *)ia64_thash(va);
tag = ia64_ttag(va);
if (head->etag == tag)
return head;
return NULL;
}
u64 guest_vhpt_lookup(u64 iha, u64 *pte)
{
u64 ret;
struct thash_data *data;
data = __vtr_lookup(current_vcpu, iha, D_TLB);
if (data != NULL)
thash_vhpt_insert(current_vcpu, data->page_flags,
data->itir, iha, D_TLB);
asm volatile ("rsm psr.ic|psr.i;;"
"srlz.d;;"
"ld8.s r9=[%1];;"
"tnat.nz p6,p7=r9;;"
"(p6) mov %0=1;"
"(p6) mov r9=r0;"
"(p7) extr.u r9=r9,0,53;;"
"(p7) mov %0=r0;"
"(p7) st8 [%2]=r9;;"
"ssm psr.ic;;"
"srlz.d;;"
/* "ssm psr.i;;" FIXME: restore once interrupts are enabled in the VMM. */
: "=r"(ret) : "r"(iha), "r"(pte):"memory");
return ret;
}
/*
* purge software guest tlb
*/
static void vtlb_purge(struct kvm_vcpu *v, u64 va, u64 ps)
{
struct thash_data *cur;
u64 start, curadr, size, psbits, tag, rr_ps, num;
union ia64_rr vrr;
struct thash_cb *hcb = &v->arch.vtlb;
vrr.val = vcpu_get_rr(v, va);
psbits = VMX(v, psbits[(va >> 61)]);
start = va & ~((1UL << ps) - 1);
while (psbits) {
curadr = start;
rr_ps = __ffs(psbits);
psbits &= ~(1UL << rr_ps);
num = 1UL << ((ps < rr_ps) ? 0 : (ps - rr_ps));
size = PSIZE(rr_ps);
vrr.ps = rr_ps;
while (num) {
cur = vsa_thash(hcb->pta, curadr, vrr.val, &tag);
if (cur->etag == tag && cur->ps == rr_ps)
cur->etag = INVALID_TI_TAG;
curadr += size;
num--;
}
}
}
/*
* purge VHPT and machine TLB
*/
static void vhpt_purge(struct kvm_vcpu *v, u64 va, u64 ps)
{
struct thash_data *cur;
u64 start, size, tag, num;
union ia64_rr rr;
start = va & ~((1UL << ps) - 1);
rr.val = ia64_get_rr(va);
size = PSIZE(rr.ps);
num = 1UL << ((ps < rr.ps) ? 0 : (ps - rr.ps));
while (num) {
cur = (struct thash_data *)ia64_thash(start);
tag = ia64_ttag(start);
if (cur->etag == tag)
cur->etag = INVALID_TI_TAG;
start += size;
num--;
}
machine_tlb_purge(va, ps);
}
/*
* Insert an entry into hash TLB or VHPT.
* NOTES:
* 1: When inserting a VHPT entry into the thash, "va" must be an
* address covered by the inserted machine VHPT entry.
* 2: The entry is always in TLB format.
* 3: The caller needs to make sure the new entry does not overlap
* with any existing entry.
*/
void vtlb_insert(struct kvm_vcpu *v, u64 pte, u64 itir, u64 va)
{
struct thash_data *head;
union ia64_rr vrr;
u64 tag;
struct thash_cb *hcb = &v->arch.vtlb;
vrr.val = vcpu_get_rr(v, va);
vrr.ps = itir_ps(itir);
VMX(v, psbits[va >> 61]) |= (1UL << vrr.ps);
head = vsa_thash(hcb->pta, va, vrr.val, &tag);
head->page_flags = pte;
head->itir = itir;
head->etag = tag;
}
int vtr_find_overlap(struct kvm_vcpu *vcpu, u64 va, u64 ps, int type)
{
struct thash_data *trp;
int i;
u64 end, rid;
rid = vcpu_get_rr(vcpu, va);
rid = rid & RR_RID_MASK;
end = va + PSIZE(ps);
if (type == D_TLB) {
if (vcpu_quick_region_check(vcpu->arch.dtr_regions, va)) {
for (trp = (struct thash_data *)&vcpu->arch.dtrs, i = 0;
i < NDTRS; i++, trp++) {
if (__is_tr_overlap(trp, rid, va, end))
return i;
}
}
} else {
if (vcpu_quick_region_check(vcpu->arch.itr_regions, va)) {
for (trp = (struct thash_data *)&vcpu->arch.itrs, i = 0;
i < NITRS; i++, trp++) {
if (__is_tr_overlap(trp, rid, va, end))
return i;
}
}
}
return -1;
}
/*
* Purge entries in VTLB and VHPT
*/
void thash_purge_entries(struct kvm_vcpu *v, u64 va, u64 ps)
{
if (vcpu_quick_region_check(v->arch.tc_regions, va))
vtlb_purge(v, va, ps);
vhpt_purge(v, va, ps);
}
void thash_purge_entries_remote(struct kvm_vcpu *v, u64 va, u64 ps)
{
u64 old_va = va;
va = REGION_OFFSET(va);
if (vcpu_quick_region_check(v->arch.tc_regions, old_va))
vtlb_purge(v, va, ps);
vhpt_purge(v, va, ps);
}
u64 translate_phy_pte(u64 *pte, u64 itir, u64 va)
{
u64 ps, ps_mask, paddr, maddr;
union pte_flags phy_pte;
ps = itir_ps(itir);
ps_mask = ~((1UL << ps) - 1);
phy_pte.val = *pte;
paddr = *pte;
paddr = ((paddr & _PAGE_PPN_MASK) & ps_mask) | (va & ~ps_mask);
maddr = kvm_lookup_mpa(paddr >> PAGE_SHIFT);
if (maddr & GPFN_IO_MASK) {
*pte |= VTLB_PTE_IO;
return -1;
}
maddr = ((maddr & _PAGE_PPN_MASK) & PAGE_MASK) |
(paddr & ~PAGE_MASK);
phy_pte.ppn = maddr >> ARCH_PAGE_SHIFT;
return phy_pte.val;
}
/*
* Purge overlapping TCs and then insert the new entry to emulate itc ops.
* Note: only TC entries can be purged and inserted here.
* Returns 1 if the pte maps MMIO, 0 otherwise.
*/
int thash_purge_and_insert(struct kvm_vcpu *v, u64 pte, u64 itir,
u64 ifa, int type)
{
u64 ps;
u64 phy_pte;
union ia64_rr vrr, mrr;
int ret = 0;
ps = itir_ps(itir);
vrr.val = vcpu_get_rr(v, ifa);
mrr.val = ia64_get_rr(ifa);
phy_pte = translate_phy_pte(&pte, itir, ifa);
/* Ensure WB attribute if pte is related to a normal mem page,
* which is required by vga acceleration since qemu maps shared
* vram buffer with WB.
*/
if (!(pte & VTLB_PTE_IO) && ((pte & _PAGE_MA_MASK) != _PAGE_MA_NAT)) {
pte &= ~_PAGE_MA_MASK;
phy_pte &= ~_PAGE_MA_MASK;
}
if (pte & VTLB_PTE_IO)
ret = 1;
vtlb_purge(v, ifa, ps);
vhpt_purge(v, ifa, ps);
if (ps == mrr.ps) {
if (!(pte&VTLB_PTE_IO)) {
vhpt_insert(phy_pte, itir, ifa, pte);
} else {
vtlb_insert(v, pte, itir, ifa);
vcpu_quick_region_set(VMX(v, tc_regions), ifa);
}
} else if (ps > mrr.ps) {
vtlb_insert(v, pte, itir, ifa);
vcpu_quick_region_set(VMX(v, tc_regions), ifa);
if (!(pte&VTLB_PTE_IO))
vhpt_insert(phy_pte, itir, ifa, pte);
} else {
u64 psr;
phy_pte &= ~PAGE_FLAGS_RV_MASK;
psr = ia64_clear_ic();
ia64_itc(type, ifa, phy_pte, ps);
ia64_set_psr(psr);
}
if (!(pte&VTLB_PTE_IO))
mark_pages_dirty(v, pte, ps);
return ret;
}
/*
* Purge all TCs or VHPT entries including those in Hash table.
*/
void thash_purge_all(struct kvm_vcpu *v)
{
int i;
struct thash_data *head;
struct thash_cb *vtlb, *vhpt;
vtlb = &v->arch.vtlb;
vhpt = &v->arch.vhpt;
for (i = 0; i < 8; i++)
VMX(v, psbits[i]) = 0;
head = vtlb->hash;
for (i = 0; i < vtlb->num; i++) {
head->page_flags = 0;
head->etag = INVALID_TI_TAG;
head->itir = 0;
head->next = 0;
head++;
}
head = vhpt->hash;
for (i = 0; i < vhpt->num; i++) {
head->page_flags = 0;
head->etag = INVALID_TI_TAG;
head->itir = 0;
head->next = 0;
head++;
}
local_flush_tlb_all();
}
/*
* Look up the guest TRs and then the vTLB hash table to find an entry
* covering the address rid:va.
*
* INPUT:
* is_data: D_TLB for a data-side lookup, I_TLB for an instruction-side one.
*/
struct thash_data *vtlb_lookup(struct kvm_vcpu *v, u64 va, int is_data)
{
struct thash_data *cch;
u64 psbits, ps, tag;
union ia64_rr vrr;
struct thash_cb *hcb = &v->arch.vtlb;
cch = __vtr_lookup(v, va, is_data);
if (cch)
return cch;
if (vcpu_quick_region_check(v->arch.tc_regions, va) == 0)
return NULL;
psbits = VMX(v, psbits[(va >> 61)]);
vrr.val = vcpu_get_rr(v, va);
while (psbits) {
ps = __ffs(psbits);
psbits &= ~(1UL << ps);
vrr.ps = ps;
cch = vsa_thash(hcb->pta, va, vrr.val, &tag);
if (cch->etag == tag && cch->ps == ps)
return cch;
}
return NULL;
}
/*
* Initialize internal control data before service.
*/
void thash_init(struct thash_cb *hcb, u64 sz)
{
int i;
struct thash_data *head;
hcb->pta.val = (unsigned long)hcb->hash;
hcb->pta.vf = 1;
hcb->pta.ve = 1;
hcb->pta.size = sz;
head = hcb->hash;
for (i = 0; i < hcb->num; i++) {
head->page_flags = 0;
head->itir = 0;
head->etag = INVALID_TI_TAG;
head->next = 0;
head++;
}
}
u64 kvm_lookup_mpa(u64 gpfn)
{
u64 *base = (u64 *) KVM_P2M_BASE;
return *(base + gpfn);
}
u64 kvm_gpa_to_mpa(u64 gpa)
{
u64 pte = kvm_lookup_mpa(gpa >> PAGE_SHIFT);
return (pte >> PAGE_SHIFT << PAGE_SHIFT) | (gpa & ~PAGE_MASK);
}
/*
* Fetch guest bundle code.
* INPUT:
* gip: guest ip
* pbundle: used to return fetched bundle.
*/
int fetch_code(struct kvm_vcpu *vcpu, u64 gip, IA64_BUNDLE *pbundle)
{
u64 gpip = 0; /* guest physical IP */
u64 *vpa;
struct thash_data *tlb;
u64 maddr;
if (!(VCPU(vcpu, vpsr) & IA64_PSR_IT)) {
/* I-side physical mode */
gpip = gip;
} else {
tlb = vtlb_lookup(vcpu, gip, I_TLB);
if (tlb)
gpip = (tlb->ppn >> (tlb->ps - 12) << tlb->ps) |
(gip & (PSIZE(tlb->ps) - 1));
}
if (gpip) {
maddr = kvm_gpa_to_mpa(gpip);
} else {
tlb = vhpt_lookup(gip);
if (tlb == NULL) {
ia64_ptcl(gip, ARCH_PAGE_SHIFT << 2);
return IA64_FAULT;
}
maddr = (tlb->ppn >> (tlb->ps - 12) << tlb->ps)
| (gip & (PSIZE(tlb->ps) - 1));
}
vpa = (u64 *)__kvm_va(maddr);
pbundle->i64[0] = *vpa++;
pbundle->i64[1] = *vpa;
return IA64_NO_FAULT;
}
void kvm_init_vhpt(struct kvm_vcpu *v)
{
v->arch.vhpt.num = VHPT_NUM_ENTRIES;
thash_init(&v->arch.vhpt, VHPT_SHIFT);
ia64_set_pta(v->arch.vhpt.pta.val);
/* Enable VHPT here? */
}
void kvm_init_vtlb(struct kvm_vcpu *v)
{
v->arch.vtlb.num = VTLB_NUM_ENTRIES;
thash_init(&v->arch.vtlb, VTLB_SHIFT);
}
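
The hashing done by vsa_thash() above can be tried outside the kernel. The following is a minimal sketch, assuming the caller has already extracted the region offset, page shift and region ID (the kernel obtains them with REGION_OFFSET, _REGION_PAGE_SIZE and _REGION_ID). The names long_format_hash and struct hash_result are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical result pair; the kernel returns an entry pointer and
 * writes the tag through an out parameter instead. */
struct hash_result {
	uint64_t index;	/* slot number; vsa_thash() scales it by 32 (index << 5) */
	uint64_t tag;	/* value compared against thash_data.etag */
};

/*
 * Mirror of the arithmetic in vsa_thash():
 *   - the low 8 bits of the RID select the upper part of the index,
 *   - the low pfn_bits of the page frame number select the lower part,
 *   - everything left over (RID bits 8..23 and the high pfn bits) forms the tag.
 * pta_size is the log2 of the table size in bytes; 5 accounts for the
 * 32-byte entries and 8 for the RID bits folded into the index.
 */
static struct hash_result long_format_hash(uint64_t region_offset,
					   unsigned int page_shift,
					   uint64_t rid,
					   unsigned int pta_size)
{
	struct hash_result r;
	unsigned int pfn_bits = pta_size - 5 - 8;
	uint64_t pfn = region_offset >> page_shift;

	r.index = ((rid & 0xff) << pfn_bits) | (pfn & ((1ULL << pfn_bits) - 1));
	r.tag = ((rid >> 8) & 0xffff) | ((pfn >> pfn_bits) << 16);
	return r;
}
```

Because the page size enters the hash through page_shift, a lookup has to repeat the computation once per page size recorded in psbits, which is what vtlb_lookup() and vtlb_purge() do.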

View File

@ -803,3 +803,4 @@ config PPC_CLOCK
config PPC_LIB_RHEAP
bool
source "arch/powerpc/kvm/Kconfig"

View File

@ -151,6 +151,9 @@ config BOOTX_TEXT
config PPC_EARLY_DEBUG
bool "Early debugging (dangerous)"
# PPC_EARLY_DEBUG on 440 leaves AS=1 mappings above the TLB high water
# mark, which doesn't work with current 440 KVM.
depends on !KVM
help
Say Y to enable some early debugging facilities that may be available
for your processor/board combination. Those facilities are hacks

View File

@ -145,6 +145,7 @@ core-y += arch/powerpc/kernel/ \
arch/powerpc/platforms/
core-$(CONFIG_MATH_EMULATION) += arch/powerpc/math-emu/
core-$(CONFIG_XMON) += arch/powerpc/xmon/
core-$(CONFIG_KVM) += arch/powerpc/kvm/
drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/

View File

@ -23,6 +23,9 @@
#include <linux/mm.h>
#include <linux/suspend.h>
#include <linux/hrtimer.h>
#ifdef CONFIG_KVM
#include <linux/kvm_host.h>
#endif
#ifdef CONFIG_PPC64
#include <linux/time.h>
#include <linux/hardirq.h>
@ -324,5 +327,30 @@ int main(void)
DEFINE(PGD_TABLE_SIZE, PGD_TABLE_SIZE);
#ifdef CONFIG_KVM
DEFINE(TLBE_BYTES, sizeof(struct tlbe));
DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack));
DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid));
DEFINE(VCPU_HOST_TLB, offsetof(struct kvm_vcpu, arch.host_tlb));
DEFINE(VCPU_SHADOW_TLB, offsetof(struct kvm_vcpu, arch.shadow_tlb));
DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr));
DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer));
DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, arch.ctr));
DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.pc));
DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr));
DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4));
DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5));
DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6));
DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7));
DEFINE(VCPU_PID, offsetof(struct kvm_vcpu, arch.pid));
DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
#endif
return 0;
}

224
arch/powerpc/kvm/44x_tlb.c Normal file
View File

@ -0,0 +1,224 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <asm/mmu-44x.h>
#include <asm/kvm_ppc.h>
#include "44x_tlb.h"
#define PPC44x_TLB_USER_PERM_MASK (PPC44x_TLB_UX|PPC44x_TLB_UR|PPC44x_TLB_UW)
#define PPC44x_TLB_SUPER_PERM_MASK (PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW)
static unsigned int kvmppc_tlb_44x_pos;
static u32 kvmppc_44x_tlb_shadow_attrib(u32 attrib, int usermode)
{
/* Mask off reserved bits. */
attrib &= PPC44x_TLB_PERM_MASK|PPC44x_TLB_ATTR_MASK;
if (!usermode) {
/* Guest is in supervisor mode, so we need to translate guest
* supervisor permissions into user permissions. */
attrib &= ~PPC44x_TLB_USER_PERM_MASK;
attrib |= (attrib & PPC44x_TLB_SUPER_PERM_MASK) << 3;
}
/* Make sure host can always access this memory. */
attrib |= PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW;
return attrib;
}
/* Search the guest TLB for a matching entry. */
int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
unsigned int as)
{
int i;
/* XXX Replace loop with fancy data structures. */
for (i = 0; i < PPC44x_TLB_SIZE; i++) {
struct tlbe *tlbe = &vcpu->arch.guest_tlb[i];
unsigned int tid;
if (eaddr < get_tlb_eaddr(tlbe))
continue;
if (eaddr > get_tlb_end(tlbe))
continue;
tid = get_tlb_tid(tlbe);
if (tid && (tid != pid))
continue;
if (!get_tlb_v(tlbe))
continue;
if (get_tlb_ts(tlbe) != as)
continue;
return i;
}
return -1;
}
struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
{
unsigned int as = !!(vcpu->arch.msr & MSR_IS);
int index;
index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
if (index == -1)
return NULL;
return &vcpu->arch.guest_tlb[index];
}
struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
{
unsigned int as = !!(vcpu->arch.msr & MSR_DS);
int index;
index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
if (index == -1)
return NULL;
return &vcpu->arch.guest_tlb[index];
}
static int kvmppc_44x_tlbe_is_writable(struct tlbe *tlbe)
{
return tlbe->word2 & (PPC44x_TLB_SW|PPC44x_TLB_UW);
}
/* Must be called with mmap_sem locked for writing. */
static void kvmppc_44x_shadow_release(struct kvm_vcpu *vcpu,
unsigned int index)
{
struct tlbe *stlbe = &vcpu->arch.shadow_tlb[index];
struct page *page = vcpu->arch.shadow_pages[index];
kunmap(vcpu->arch.shadow_pages[index]);
if (get_tlb_v(stlbe)) {
if (kvmppc_44x_tlbe_is_writable(stlbe))
kvm_release_page_dirty(page);
else
kvm_release_page_clean(page);
}
}
/* Caller must ensure that the specified guest TLB entry is safe to insert into
* the shadow TLB. */
void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
u32 flags)
{
struct page *new_page;
struct tlbe *stlbe;
hpa_t hpaddr;
unsigned int victim;
/* Future optimization: don't overwrite the TLB entry containing the
* current PC (or stack?). */
victim = kvmppc_tlb_44x_pos++;
if (kvmppc_tlb_44x_pos > tlb_44x_hwater)
kvmppc_tlb_44x_pos = 0;
stlbe = &vcpu->arch.shadow_tlb[victim];
/* Get reference to new page. */
down_write(&current->mm->mmap_sem);
new_page = gfn_to_page(vcpu->kvm, gfn);
if (is_error_page(new_page)) {
printk(KERN_ERR "Couldn't get guest page!\n");
kvm_release_page_clean(new_page);
/* Don't leak mmap_sem on the error path. */
up_write(&current->mm->mmap_sem);
return;
}
hpaddr = page_to_phys(new_page);
/* Drop reference to old page. */
kvmppc_44x_shadow_release(vcpu, victim);
up_write(&current->mm->mmap_sem);
vcpu->arch.shadow_pages[victim] = new_page;
/* XXX Make sure (va, size) doesn't overlap any other
* entries. 440x6 user manual says the result would be
* "undefined." */
/* XXX what about AS? */
stlbe->tid = asid & 0xff;
/* Force TS=1 for all guest mappings. */
/* For now we hardcode 4KB mappings, but it will be important to
* use host large pages in the future. */
stlbe->word0 = (gvaddr & PAGE_MASK) | PPC44x_TLB_VALID | PPC44x_TLB_TS
| PPC44x_TLB_4K;
stlbe->word1 = (hpaddr & 0xfffffc00) | ((hpaddr >> 32) & 0xf);
stlbe->word2 = kvmppc_44x_tlb_shadow_attrib(flags,
vcpu->arch.msr & MSR_PR);
}
void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, u64 eaddr, u64 asid)
{
unsigned int pid = asid & 0xff;
int i;
/* XXX Replace loop with fancy data structures. */
down_write(&current->mm->mmap_sem);
for (i = 0; i <= tlb_44x_hwater; i++) {
struct tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
unsigned int tid;
if (!get_tlb_v(stlbe))
continue;
if (eaddr < get_tlb_eaddr(stlbe))
continue;
if (eaddr > get_tlb_end(stlbe))
continue;
tid = get_tlb_tid(stlbe);
if (tid && (tid != pid))
continue;
kvmppc_44x_shadow_release(vcpu, i);
stlbe->word0 = 0;
}
up_write(&current->mm->mmap_sem);
}
/* Invalidate all mappings, so that when they fault back in they will get the
* proper permission bits. */
void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
{
int i;
/* XXX Replace loop with fancy data structures. */
down_write(&current->mm->mmap_sem);
for (i = 0; i <= tlb_44x_hwater; i++) {
kvmppc_44x_shadow_release(vcpu, i);
vcpu->arch.shadow_tlb[i].word0 = 0;
}
up_write(&current->mm->mmap_sem);
}
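
The permission rewrite in kvmppc_44x_tlb_shadow_attrib() above can be checked in isolation. This is a minimal userspace sketch, not the kernel code itself: the bit values below are assumptions standing in for the PPC44x_TLB_* constants from asm/mmu-44x.h, and shadow_attrib is a hypothetical name.

```c
#include <stdint.h>

/* Assumed encodings of the 44x TLB word2 bits (see asm/mmu-44x.h). */
#define TLB_ATTR_MASK	0x0000ff80u
#define TLB_PERM_MASK	0x0000003fu
#define TLB_UX		0x00000020u
#define TLB_UW		0x00000010u
#define TLB_UR		0x00000008u
#define TLB_SX		0x00000004u
#define TLB_SW		0x00000002u
#define TLB_SR		0x00000001u

#define TLB_USER_PERM_MASK	(TLB_UX | TLB_UW | TLB_UR)
#define TLB_SUPER_PERM_MASK	(TLB_SX | TLB_SW | TLB_SR)

/*
 * The guest always runs in host user mode, so when the guest believes it
 * is in supervisor mode its supervisor permissions are copied into the
 * user permission bits (three positions to the left). The host's own
 * supervisor access is then forced on unconditionally.
 */
static uint32_t shadow_attrib(uint32_t attrib, int usermode)
{
	attrib &= TLB_PERM_MASK | TLB_ATTR_MASK;	/* drop reserved bits */
	if (!usermode) {
		attrib &= ~TLB_USER_PERM_MASK;
		attrib |= (attrib & TLB_SUPER_PERM_MASK) << 3;
	}
	attrib |= TLB_SX | TLB_SR | TLB_SW;
	return attrib;
}
```

Since the result depends on the guest's current privilege level, every shadow entry becomes stale when the guest toggles MSR[PR], which is why kvmppc_mmu_priv_switch() invalidates the whole shadow TLB.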

View File

@ -0,0 +1,91 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#ifndef __KVM_POWERPC_TLB_H__
#define __KVM_POWERPC_TLB_H__
#include <linux/kvm_host.h>
#include <asm/mmu-44x.h>
extern int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr,
unsigned int pid, unsigned int as);
extern struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr);
extern struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr);
/* TLB helper functions */
static inline unsigned int get_tlb_size(const struct tlbe *tlbe)
{
return (tlbe->word0 >> 4) & 0xf;
}
static inline gva_t get_tlb_eaddr(const struct tlbe *tlbe)
{
return tlbe->word0 & 0xfffffc00;
}
static inline gva_t get_tlb_bytes(const struct tlbe *tlbe)
{
unsigned int pgsize = get_tlb_size(tlbe);
return 1 << 10 << (pgsize << 1);
}
static inline gva_t get_tlb_end(const struct tlbe *tlbe)
{
return get_tlb_eaddr(tlbe) + get_tlb_bytes(tlbe) - 1;
}
static inline u64 get_tlb_raddr(const struct tlbe *tlbe)
{
u64 word1 = tlbe->word1;
return ((word1 & 0xf) << 32) | (word1 & 0xfffffc00);
}
static inline unsigned int get_tlb_tid(const struct tlbe *tlbe)
{
return tlbe->tid & 0xff;
}
static inline unsigned int get_tlb_ts(const struct tlbe *tlbe)
{
return (tlbe->word0 >> 8) & 0x1;
}
static inline unsigned int get_tlb_v(const struct tlbe *tlbe)
{
return (tlbe->word0 >> 9) & 0x1;
}
static inline unsigned int get_mmucr_stid(const struct kvm_vcpu *vcpu)
{
return vcpu->arch.mmucr & 0xff;
}
static inline unsigned int get_mmucr_sts(const struct kvm_vcpu *vcpu)
{
return (vcpu->arch.mmucr >> 16) & 0x1;
}
static inline gpa_t tlb_xlate(struct tlbe *tlbe, gva_t eaddr)
{
unsigned int pgmask = get_tlb_bytes(tlbe) - 1;
return get_tlb_raddr(tlbe) | (eaddr & pgmask);
}
#endif /* __KVM_POWERPC_TLB_H__ */
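
As an illustration of the helpers above, the sketch below decodes the page size field and performs the same effective-to-real translation as tlb_xlate(), using plain integers instead of struct tlbe. The function names are hypothetical, and the sample encodings 0x10 (4KB) and 0x70 (16MB) are assumed values of PPC44x_TLB_4K and PPC44x_TLB_16M.

```c
#include <stdint.h>

/* SIZE field lives in bits 4..7 of word0. */
static unsigned int tlb_size_field(uint32_t word0)
{
	return (word0 >> 4) & 0xf;
}

/* Page size in bytes: 1KB * 4^SIZE, computed as in get_tlb_bytes().
 * A 64-bit result is used here so that large SIZE values cannot overflow. */
static uint64_t tlb_bytes(uint32_t word0)
{
	return (uint64_t)1 << 10 << (tlb_size_field(word0) << 1);
}

/* Real address: the low 4 bits of word1 hold address bits 32..35 (ERPN),
 * the upper 22 bits hold the real page number; the offset within the
 * page is taken from the effective address. */
static uint64_t tlb_translate(uint32_t word0, uint32_t word1, uint32_t eaddr)
{
	uint64_t pgmask = tlb_bytes(word0) - 1;
	uint64_t raddr = (((uint64_t)word1 & 0xf) << 32) |
			 (word1 & 0xfffffc00u);

	return raddr | (eaddr & pgmask);
}
```

The translation ORs the page offset into the real page number, so it relies on the real page number being aligned to the page size, as it would be for an entry the guest programmed consistently.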

42
arch/powerpc/kvm/Kconfig Normal file
View File

@ -0,0 +1,42 @@
#
# KVM configuration
#
menuconfig VIRTUALIZATION
bool "Virtualization"
---help---
Say Y here to get to see options for using your Linux host to run
other operating systems inside virtual machines (guests).
This option alone does not add any kernel code.
If you say N, all options in this submenu will be skipped and
disabled.
if VIRTUALIZATION
config KVM
bool "Kernel-based Virtual Machine (KVM) support"
depends on 44x && EXPERIMENTAL
select PREEMPT_NOTIFIERS
select ANON_INODES
# We can only run on Book E hosts so far
select KVM_BOOKE_HOST
---help---
Support hosting virtualized guest machines. You will also
need to select one or more of the processor modules below.
This module provides access to the hardware capabilities through
a character device node named /dev/kvm.
If unsure, say N.
config KVM_BOOKE_HOST
bool "KVM host support for Book E PowerPC processors"
depends on KVM && 44x
---help---
Provides host support for KVM on Book E PowerPC processors. Currently
this works on 440 processors only.
source drivers/virtio/Kconfig
endif # VIRTUALIZATION

15
arch/powerpc/kvm/Makefile Normal file
View File

@ -0,0 +1,15 @@
#
# Makefile for Kernel-based Virtual Machine module
#
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/powerpc/kvm
common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
kvm-objs := $(common-objs) powerpc.o emulate.o booke_guest.o
obj-$(CONFIG_KVM) += kvm.o
AFLAGS_booke_interrupts.o := -I$(obj)
kvm-booke-host-objs := booke_host.o booke_interrupts.o 44x_tlb.o
obj-$(CONFIG_KVM_BOOKE_HOST) += kvm-booke-host.o

View File

@ -0,0 +1,615 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
* Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
*/
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
#include <linux/module.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>
#include <asm/cputable.h>
#include <asm/uaccess.h>
#include <asm/kvm_ppc.h>
#include "44x_tlb.h"
#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "exits", VCPU_STAT(sum_exits) },
{ "mmio", VCPU_STAT(mmio_exits) },
{ "dcr", VCPU_STAT(dcr_exits) },
{ "sig", VCPU_STAT(signal_exits) },
{ "light", VCPU_STAT(light_exits) },
{ "itlb_r", VCPU_STAT(itlb_real_miss_exits) },
{ "itlb_v", VCPU_STAT(itlb_virt_miss_exits) },
{ "dtlb_r", VCPU_STAT(dtlb_real_miss_exits) },
{ "dtlb_v", VCPU_STAT(dtlb_virt_miss_exits) },
{ "sysc", VCPU_STAT(syscall_exits) },
{ "isi", VCPU_STAT(isi_exits) },
{ "dsi", VCPU_STAT(dsi_exits) },
{ "inst_emu", VCPU_STAT(emulated_inst_exits) },
{ "dec", VCPU_STAT(dec_exits) },
{ "ext_intr", VCPU_STAT(ext_intr_exits) },
{ NULL }
};
static const u32 interrupt_msr_mask[16] = {
[BOOKE_INTERRUPT_CRITICAL] = MSR_ME,
[BOOKE_INTERRUPT_MACHINE_CHECK] = 0,
[BOOKE_INTERRUPT_DATA_STORAGE] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_INST_STORAGE] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_EXTERNAL] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_ALIGNMENT] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_PROGRAM] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_FP_UNAVAIL] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_SYSCALL] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_AP_UNAVAIL] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_DECREMENTER] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_FIT] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_WATCHDOG] = MSR_ME,
[BOOKE_INTERRUPT_DTLB_MISS] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_ITLB_MISS] = MSR_CE|MSR_ME|MSR_DE,
[BOOKE_INTERRUPT_DEBUG] = MSR_ME,
};
const unsigned char exception_priority[] = {
[BOOKE_INTERRUPT_DATA_STORAGE] = 0,
[BOOKE_INTERRUPT_INST_STORAGE] = 1,
[BOOKE_INTERRUPT_ALIGNMENT] = 2,
[BOOKE_INTERRUPT_PROGRAM] = 3,
[BOOKE_INTERRUPT_FP_UNAVAIL] = 4,
[BOOKE_INTERRUPT_SYSCALL] = 5,
[BOOKE_INTERRUPT_AP_UNAVAIL] = 6,
[BOOKE_INTERRUPT_DTLB_MISS] = 7,
[BOOKE_INTERRUPT_ITLB_MISS] = 8,
[BOOKE_INTERRUPT_MACHINE_CHECK] = 9,
[BOOKE_INTERRUPT_DEBUG] = 10,
[BOOKE_INTERRUPT_CRITICAL] = 11,
[BOOKE_INTERRUPT_WATCHDOG] = 12,
[BOOKE_INTERRUPT_EXTERNAL] = 13,
[BOOKE_INTERRUPT_FIT] = 14,
[BOOKE_INTERRUPT_DECREMENTER] = 15,
};
const unsigned char priority_exception[] = {
BOOKE_INTERRUPT_DATA_STORAGE,
BOOKE_INTERRUPT_INST_STORAGE,
BOOKE_INTERRUPT_ALIGNMENT,
BOOKE_INTERRUPT_PROGRAM,
BOOKE_INTERRUPT_FP_UNAVAIL,
BOOKE_INTERRUPT_SYSCALL,
BOOKE_INTERRUPT_AP_UNAVAIL,
BOOKE_INTERRUPT_DTLB_MISS,
BOOKE_INTERRUPT_ITLB_MISS,
BOOKE_INTERRUPT_MACHINE_CHECK,
BOOKE_INTERRUPT_DEBUG,
BOOKE_INTERRUPT_CRITICAL,
BOOKE_INTERRUPT_WATCHDOG,
BOOKE_INTERRUPT_EXTERNAL,
BOOKE_INTERRUPT_FIT,
BOOKE_INTERRUPT_DECREMENTER,
};
void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
{
struct tlbe *tlbe;
int i;
printk("vcpu %d TLB dump:\n", vcpu->vcpu_id);
printk("| %2s | %3s | %8s | %8s | %8s |\n",
"nr", "tid", "word0", "word1", "word2");
for (i = 0; i < PPC44x_TLB_SIZE; i++) {
tlbe = &vcpu->arch.guest_tlb[i];
if (tlbe->word0 & PPC44x_TLB_VALID)
printk(" G%2d | %02X | %08X | %08X | %08X |\n",
i, tlbe->tid, tlbe->word0, tlbe->word1,
tlbe->word2);
}
for (i = 0; i < PPC44x_TLB_SIZE; i++) {
tlbe = &vcpu->arch.shadow_tlb[i];
if (tlbe->word0 & PPC44x_TLB_VALID)
printk(" S%2d | %02X | %08X | %08X | %08X |\n",
i, tlbe->tid, tlbe->word0, tlbe->word1,
tlbe->word2);
}
}
/* TODO: use vcpu_printf() */
void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
{
int i;
printk("pc: %08x msr: %08x\n", vcpu->arch.pc, vcpu->arch.msr);
printk("lr: %08x ctr: %08x\n", vcpu->arch.lr, vcpu->arch.ctr);
printk("srr0: %08x srr1: %08x\n", vcpu->arch.srr0, vcpu->arch.srr1);
printk("exceptions: %08lx\n", vcpu->arch.pending_exceptions);
for (i = 0; i < 32; i += 4) {
printk("gpr%02d: %08x %08x %08x %08x\n", i,
vcpu->arch.gpr[i],
vcpu->arch.gpr[i+1],
vcpu->arch.gpr[i+2],
vcpu->arch.gpr[i+3]);
}
}
/* Check if we are ready to deliver the interrupt */
static int kvmppc_can_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt)
{
int r;
switch (interrupt) {
case BOOKE_INTERRUPT_CRITICAL:
r = vcpu->arch.msr & MSR_CE;
break;
case BOOKE_INTERRUPT_MACHINE_CHECK:
r = vcpu->arch.msr & MSR_ME;
break;
case BOOKE_INTERRUPT_EXTERNAL:
r = vcpu->arch.msr & MSR_EE;
break;
case BOOKE_INTERRUPT_DECREMENTER:
r = vcpu->arch.msr & MSR_EE;
break;
case BOOKE_INTERRUPT_FIT:
r = vcpu->arch.msr & MSR_EE;
break;
case BOOKE_INTERRUPT_WATCHDOG:
r = vcpu->arch.msr & MSR_CE;
break;
case BOOKE_INTERRUPT_DEBUG:
r = vcpu->arch.msr & MSR_DE;
break;
default:
r = 1;
}
return r;
}
static void kvmppc_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt)
{
switch (interrupt) {
case BOOKE_INTERRUPT_DECREMENTER:
vcpu->arch.tsr |= TSR_DIS;
break;
}
vcpu->arch.srr0 = vcpu->arch.pc;
vcpu->arch.srr1 = vcpu->arch.msr;
vcpu->arch.pc = vcpu->arch.ivpr | vcpu->arch.ivor[interrupt];
kvmppc_set_msr(vcpu, vcpu->arch.msr & interrupt_msr_mask[interrupt]);
}
/* Check pending exceptions and deliver one, if possible. */
void kvmppc_check_and_deliver_interrupts(struct kvm_vcpu *vcpu)
{
unsigned long *pending = &vcpu->arch.pending_exceptions;
unsigned int exception;
unsigned int priority;
priority = find_first_bit(pending, BITS_PER_BYTE * sizeof(*pending));
while (priority <= BOOKE_MAX_INTERRUPT) {
exception = priority_exception[priority];
if (kvmppc_can_deliver_interrupt(vcpu, exception)) {
kvmppc_clear_exception(vcpu, exception);
kvmppc_deliver_interrupt(vcpu, exception);
break;
}
priority = find_next_bit(pending,
BITS_PER_BYTE * sizeof(*pending),
priority + 1);
}
}
static int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu)
{
enum emulation_result er;
int r;
er = kvmppc_emulate_instruction(run, vcpu);
switch (er) {
case EMULATE_DONE:
/* Future optimization: only reload non-volatiles if they were
* actually modified. */
r = RESUME_GUEST_NV;
break;
case EMULATE_DO_MMIO:
run->exit_reason = KVM_EXIT_MMIO;
/* We must reload nonvolatiles because "update" load/store
* instructions modify register state. */
/* Future optimization: only reload non-volatiles if they were
* actually modified. */
r = RESUME_HOST_NV;
break;
case EMULATE_FAIL:
/* XXX Deliver Program interrupt to guest. */
printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__,
vcpu->arch.last_inst);
r = RESUME_HOST;
break;
default:
BUG();
}
return r;
}
/**
* kvmppc_handle_exit
*
* Return value is in the form (errcode<<2 | RESUME_FLAG_HOST | RESUME_FLAG_NV)
*/
int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int exit_nr)
{
enum emulation_result er;
int r = RESUME_HOST;
local_irq_enable();
run->exit_reason = KVM_EXIT_UNKNOWN;
run->ready_for_interrupt_injection = 1;
switch (exit_nr) {
case BOOKE_INTERRUPT_MACHINE_CHECK:
printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR));
kvmppc_dump_vcpu(vcpu);
r = RESUME_HOST;
break;
case BOOKE_INTERRUPT_EXTERNAL:
case BOOKE_INTERRUPT_DECREMENTER:
/* Since we switched IVPR back to the host's value, the host
* handled this interrupt the moment we enabled interrupts.
* Now we just offer it a chance to reschedule the guest. */
/* XXX At this point the TLB still holds our shadow TLB, so if
* we do reschedule the host will fault over it. Perhaps we
* should politely restore the host's entries to minimize
* misses before ceding control. */
if (need_resched())
cond_resched();
if (exit_nr == BOOKE_INTERRUPT_DECREMENTER)
vcpu->stat.dec_exits++;
else
vcpu->stat.ext_intr_exits++;
r = RESUME_GUEST;
break;
case BOOKE_INTERRUPT_PROGRAM:
if (vcpu->arch.msr & MSR_PR) {
/* Program traps generated by user-level software must be handled
* by the guest kernel. */
vcpu->arch.esr = vcpu->arch.fault_esr;
kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM);
r = RESUME_GUEST;
break;
}
er = kvmppc_emulate_instruction(run, vcpu);
switch (er) {
case EMULATE_DONE:
/* Future optimization: only reload non-volatiles if
* they were actually modified by emulation. */
vcpu->stat.emulated_inst_exits++;
r = RESUME_GUEST_NV;
break;
case EMULATE_DO_DCR:
run->exit_reason = KVM_EXIT_DCR;
r = RESUME_HOST;
break;
case EMULATE_FAIL:
/* XXX Deliver Program interrupt to guest. */
printk(KERN_CRIT "%s: emulation at %x failed (%08x)\n",
__func__, vcpu->arch.pc, vcpu->arch.last_inst);
/* For debugging, encode the failing instruction and
* report it to userspace. */
run->hw.hardware_exit_reason = ~0ULL << 32;
run->hw.hardware_exit_reason |= vcpu->arch.last_inst;
r = RESUME_HOST;
break;
default:
BUG();
}
break;
case BOOKE_INTERRUPT_DATA_STORAGE:
vcpu->arch.dear = vcpu->arch.fault_dear;
vcpu->arch.esr = vcpu->arch.fault_esr;
kvmppc_queue_exception(vcpu, exit_nr);
vcpu->stat.dsi_exits++;
r = RESUME_GUEST;
break;
case BOOKE_INTERRUPT_INST_STORAGE:
vcpu->arch.esr = vcpu->arch.fault_esr;
kvmppc_queue_exception(vcpu, exit_nr);
vcpu->stat.isi_exits++;
r = RESUME_GUEST;
break;
case BOOKE_INTERRUPT_SYSCALL:
kvmppc_queue_exception(vcpu, exit_nr);
vcpu->stat.syscall_exits++;
r = RESUME_GUEST;
break;
case BOOKE_INTERRUPT_DTLB_MISS: {
struct tlbe *gtlbe;
unsigned long eaddr = vcpu->arch.fault_dear;
gfn_t gfn;
/* Check the guest TLB. */
gtlbe = kvmppc_44x_dtlb_search(vcpu, eaddr);
if (!gtlbe) {
/* The guest didn't have a mapping for it. */
kvmppc_queue_exception(vcpu, exit_nr);
vcpu->arch.dear = vcpu->arch.fault_dear;
vcpu->arch.esr = vcpu->arch.fault_esr;
vcpu->stat.dtlb_real_miss_exits++;
r = RESUME_GUEST;
break;
}
vcpu->arch.paddr_accessed = tlb_xlate(gtlbe, eaddr);
gfn = vcpu->arch.paddr_accessed >> PAGE_SHIFT;
if (kvm_is_visible_gfn(vcpu->kvm, gfn)) {
/* The guest TLB had a mapping, but the shadow TLB
* didn't, and it is RAM. This could be because:
* a) the entry is mapping the host kernel, or
* b) the guest used a large mapping which we're faking
* Either way, we need to satisfy the fault without
* invoking the guest. */
kvmppc_mmu_map(vcpu, eaddr, gfn, gtlbe->tid,
gtlbe->word2);
vcpu->stat.dtlb_virt_miss_exits++;
r = RESUME_GUEST;
} else {
/* Guest has mapped and accessed a page which is not
* actually RAM. */
r = kvmppc_emulate_mmio(run, vcpu);
}
break;
}
case BOOKE_INTERRUPT_ITLB_MISS: {
struct tlbe *gtlbe;
unsigned long eaddr = vcpu->arch.pc;
gfn_t gfn;
r = RESUME_GUEST;
/* Check the guest TLB. */
gtlbe = kvmppc_44x_itlb_search(vcpu, eaddr);
if (!gtlbe) {
/* The guest didn't have a mapping for it. */
kvmppc_queue_exception(vcpu, exit_nr);
vcpu->stat.itlb_real_miss_exits++;
break;
}
vcpu->stat.itlb_virt_miss_exits++;
gfn = tlb_xlate(gtlbe, eaddr) >> PAGE_SHIFT;
if (kvm_is_visible_gfn(vcpu->kvm, gfn)) {
/* The guest TLB had a mapping, but the shadow TLB
* didn't. This could be because:
* a) the entry is mapping the host kernel, or
* b) the guest used a large mapping which we're faking
* Either way, we need to satisfy the fault without
* invoking the guest. */
kvmppc_mmu_map(vcpu, eaddr, gfn, gtlbe->tid,
gtlbe->word2);
} else {
/* Guest mapped and leaped at non-RAM! */
kvmppc_queue_exception(vcpu,
BOOKE_INTERRUPT_MACHINE_CHECK);
}
break;
}
default:
printk(KERN_EMERG "exit_nr %d\n", exit_nr);
BUG();
}
local_irq_disable();
kvmppc_check_and_deliver_interrupts(vcpu);
/* Do some exit accounting. */
vcpu->stat.sum_exits++;
if (!(r & RESUME_HOST)) {
/* To avoid clobbering exit_reason, only check for signals if
* we aren't already exiting to userspace for some other
* reason. */
if (signal_pending(current)) {
run->exit_reason = KVM_EXIT_INTR;
r = (-EINTR << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
vcpu->stat.signal_exits++;
} else {
vcpu->stat.light_exits++;
}
} else {
switch (run->exit_reason) {
case KVM_EXIT_MMIO:
vcpu->stat.mmio_exits++;
break;
case KVM_EXIT_DCR:
vcpu->stat.dcr_exits++;
break;
case KVM_EXIT_INTR:
vcpu->stat.signal_exits++;
break;
}
}
return r;
}
/* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
struct tlbe *tlbe = &vcpu->arch.guest_tlb[0];
tlbe->tid = 0;
tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID;
tlbe->word1 = 0;
tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR;
tlbe++;
tlbe->tid = 0;
tlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID;
tlbe->word1 = 0xef600000;
tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR
| PPC44x_TLB_I | PPC44x_TLB_G;
vcpu->arch.pc = 0;
vcpu->arch.msr = 0;
vcpu->arch.gpr[1] = (16<<20) - 8; /* -8 for the callee-save LR slot */
/* Eye-catching number so we know if the guest takes an interrupt
* before it's programmed its own IVPR. */
vcpu->arch.ivpr = 0x55550000;
/* Since the guest can directly access the timebase, it must know the
* real timebase frequency. Accordingly, it must see the state of
* CCR1[TCS]. */
vcpu->arch.ccr1 = mfspr(SPRN_CCR1);
return 0;
}
int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
int i;
regs->pc = vcpu->arch.pc;
regs->cr = vcpu->arch.cr;
regs->ctr = vcpu->arch.ctr;
regs->lr = vcpu->arch.lr;
regs->xer = vcpu->arch.xer;
regs->msr = vcpu->arch.msr;
regs->srr0 = vcpu->arch.srr0;
regs->srr1 = vcpu->arch.srr1;
regs->pid = vcpu->arch.pid;
regs->sprg0 = vcpu->arch.sprg0;
regs->sprg1 = vcpu->arch.sprg1;
regs->sprg2 = vcpu->arch.sprg2;
regs->sprg3 = vcpu->arch.sprg3;
regs->sprg4 = vcpu->arch.sprg4;
regs->sprg5 = vcpu->arch.sprg5;
regs->sprg6 = vcpu->arch.sprg6;
regs->sprg7 = vcpu->arch.sprg7;
for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
regs->gpr[i] = vcpu->arch.gpr[i];
return 0;
}
int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
int i;
vcpu->arch.pc = regs->pc;
vcpu->arch.cr = regs->cr;
vcpu->arch.ctr = regs->ctr;
vcpu->arch.lr = regs->lr;
vcpu->arch.xer = regs->xer;
vcpu->arch.msr = regs->msr;
vcpu->arch.srr0 = regs->srr0;
vcpu->arch.srr1 = regs->srr1;
vcpu->arch.sprg0 = regs->sprg0;
vcpu->arch.sprg1 = regs->sprg1;
vcpu->arch.sprg2 = regs->sprg2;
vcpu->arch.sprg3 = regs->sprg3;
vcpu->arch.sprg4 = regs->sprg4;
vcpu->arch.sprg5 = regs->sprg5;
vcpu->arch.sprg6 = regs->sprg6;
vcpu->arch.sprg7 = regs->sprg7;
for (i = 0; i < ARRAY_SIZE(vcpu->arch.gpr); i++)
vcpu->arch.gpr[i] = regs->gpr[i];
return 0;
}
int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
struct kvm_sregs *sregs)
{
return -ENOTSUPP;
}
int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
struct kvm_sregs *sregs)
{
return -ENOTSUPP;
}
int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
return -ENOTSUPP;
}
int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
return -ENOTSUPP;
}
/* 'linear_address' is actually an encoding of AS|PID|EADDR . */
int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr)
{
struct tlbe *gtlbe;
int index;
gva_t eaddr;
u8 pid;
u8 as;
eaddr = tr->linear_address;
pid = (tr->linear_address >> 32) & 0xff;
as = (tr->linear_address >> 40) & 0x1;
index = kvmppc_44x_tlb_index(vcpu, eaddr, pid, as);
if (index == -1) {
tr->valid = 0;
return 0;
}
gtlbe = &vcpu->arch.guest_tlb[index];
tr->physical_address = tlb_xlate(gtlbe, eaddr);
/* XXX what do "writeable" and "usermode" even mean? */
tr->valid = 1;
return 0;
}

@ -0,0 +1,83 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2008
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#include <linux/errno.h>
#include <linux/kvm_host.h>
#include <linux/module.h>
#include <asm/cacheflush.h>
#include <asm/kvm_ppc.h>
unsigned long kvmppc_booke_handlers;
static int kvmppc_booke_init(void)
{
unsigned long ivor[16];
unsigned long max_ivor = 0;
int i;
/* We install our own exception handlers by hijacking IVPR. IVPR must
* be 64KB-aligned (its low 16 bits are zero), so we need a 64KB
* allocation. */
kvmppc_booke_handlers = __get_free_pages(GFP_KERNEL | __GFP_ZERO,
VCPU_SIZE_ORDER);
if (!kvmppc_booke_handlers)
return -ENOMEM;
/* XXX make sure our handlers are smaller than Linux's */
/* Copy our interrupt handlers to match host IVORs. That way we don't
* have to swap the IVORs on every guest/host transition. */
ivor[0] = mfspr(SPRN_IVOR0);
ivor[1] = mfspr(SPRN_IVOR1);
ivor[2] = mfspr(SPRN_IVOR2);
ivor[3] = mfspr(SPRN_IVOR3);
ivor[4] = mfspr(SPRN_IVOR4);
ivor[5] = mfspr(SPRN_IVOR5);
ivor[6] = mfspr(SPRN_IVOR6);
ivor[7] = mfspr(SPRN_IVOR7);
ivor[8] = mfspr(SPRN_IVOR8);
ivor[9] = mfspr(SPRN_IVOR9);
ivor[10] = mfspr(SPRN_IVOR10);
ivor[11] = mfspr(SPRN_IVOR11);
ivor[12] = mfspr(SPRN_IVOR12);
ivor[13] = mfspr(SPRN_IVOR13);
ivor[14] = mfspr(SPRN_IVOR14);
ivor[15] = mfspr(SPRN_IVOR15);
for (i = 0; i < 16; i++) {
if (ivor[i] > max_ivor)
max_ivor = ivor[i];
memcpy((void *)kvmppc_booke_handlers + ivor[i],
kvmppc_handlers_start + i * kvmppc_handler_len,
kvmppc_handler_len);
}
flush_icache_range(kvmppc_booke_handlers,
kvmppc_booke_handlers + max_ivor + kvmppc_handler_len);
return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE);
}
static void __exit kvmppc_booke_exit(void)
{
free_pages(kvmppc_booke_handlers, VCPU_SIZE_ORDER);
kvm_exit();
}
module_init(kvmppc_booke_init)
module_exit(kvmppc_booke_exit)

@ -0,0 +1,436 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#include <asm/ppc_asm.h>
#include <asm/kvm_asm.h>
#include <asm/reg.h>
#include <asm/mmu-44x.h>
#include <asm/page.h>
#include <asm/asm-offsets.h>
#define KVMPPC_MSR_MASK (MSR_CE|MSR_EE|MSR_PR|MSR_DE|MSR_ME|MSR_IS|MSR_DS)
#define VCPU_GPR(n) (VCPU_GPRS + (n * 4))
/* The host stack layout: */
#define HOST_R1 0 /* Implied by stwu. */
#define HOST_CALLEE_LR 4
#define HOST_RUN 8
/* r2 is special: it holds 'current', and it is made nonvolatile in the
* kernel with the -ffixed-r2 gcc option. */
#define HOST_R2 12
#define HOST_NV_GPRS 16
#define HOST_NV_GPR(n) (HOST_NV_GPRS + ((n - 14) * 4))
#define HOST_MIN_STACK_SIZE (HOST_NV_GPR(31) + 4)
#define HOST_STACK_SIZE (((HOST_MIN_STACK_SIZE + 15) / 16) * 16) /* Align. */
#define HOST_STACK_LR (HOST_STACK_SIZE + 4) /* In caller stack frame. */
#define NEED_INST_MASK ((1<<BOOKE_INTERRUPT_PROGRAM) | \
(1<<BOOKE_INTERRUPT_DTLB_MISS))
#define NEED_DEAR_MASK ((1<<BOOKE_INTERRUPT_DATA_STORAGE) | \
(1<<BOOKE_INTERRUPT_DTLB_MISS))
#define NEED_ESR_MASK ((1<<BOOKE_INTERRUPT_DATA_STORAGE) | \
(1<<BOOKE_INTERRUPT_INST_STORAGE) | \
(1<<BOOKE_INTERRUPT_PROGRAM) | \
(1<<BOOKE_INTERRUPT_DTLB_MISS))
.macro KVM_HANDLER ivor_nr
_GLOBAL(kvmppc_handler_\ivor_nr)
/* Get pointer to vcpu and record exit number. */
mtspr SPRN_SPRG0, r4
mfspr r4, SPRN_SPRG1
stw r5, VCPU_GPR(r5)(r4)
stw r6, VCPU_GPR(r6)(r4)
mfctr r5
lis r6, kvmppc_resume_host@h
stw r5, VCPU_CTR(r4)
li r5, \ivor_nr
ori r6, r6, kvmppc_resume_host@l
mtctr r6
bctr
.endm
_GLOBAL(kvmppc_handlers_start)
KVM_HANDLER BOOKE_INTERRUPT_CRITICAL
KVM_HANDLER BOOKE_INTERRUPT_MACHINE_CHECK
KVM_HANDLER BOOKE_INTERRUPT_DATA_STORAGE
KVM_HANDLER BOOKE_INTERRUPT_INST_STORAGE
KVM_HANDLER BOOKE_INTERRUPT_EXTERNAL
KVM_HANDLER BOOKE_INTERRUPT_ALIGNMENT
KVM_HANDLER BOOKE_INTERRUPT_PROGRAM
KVM_HANDLER BOOKE_INTERRUPT_FP_UNAVAIL
KVM_HANDLER BOOKE_INTERRUPT_SYSCALL
KVM_HANDLER BOOKE_INTERRUPT_AP_UNAVAIL
KVM_HANDLER BOOKE_INTERRUPT_DECREMENTER
KVM_HANDLER BOOKE_INTERRUPT_FIT
KVM_HANDLER BOOKE_INTERRUPT_WATCHDOG
KVM_HANDLER BOOKE_INTERRUPT_DTLB_MISS
KVM_HANDLER BOOKE_INTERRUPT_ITLB_MISS
KVM_HANDLER BOOKE_INTERRUPT_DEBUG
_GLOBAL(kvmppc_handler_len)
.long kvmppc_handler_1 - kvmppc_handler_0
/* Registers:
* SPRG0: guest r4
* r4: vcpu pointer
* r5: KVM exit number
*/
_GLOBAL(kvmppc_resume_host)
stw r3, VCPU_GPR(r3)(r4)
mfcr r3
stw r3, VCPU_CR(r4)
stw r7, VCPU_GPR(r7)(r4)
stw r8, VCPU_GPR(r8)(r4)
stw r9, VCPU_GPR(r9)(r4)
li r6, 1
slw r6, r6, r5
/* Save the faulting instruction and all GPRs for emulation. */
andi. r7, r6, NEED_INST_MASK
beq ..skip_inst_copy
mfspr r9, SPRN_SRR0
mfmsr r8
ori r7, r8, MSR_DS
mtmsr r7
isync
lwz r9, 0(r9)
mtmsr r8
isync
stw r9, VCPU_LAST_INST(r4)
stw r15, VCPU_GPR(r15)(r4)
stw r16, VCPU_GPR(r16)(r4)
stw r17, VCPU_GPR(r17)(r4)
stw r18, VCPU_GPR(r18)(r4)
stw r19, VCPU_GPR(r19)(r4)
stw r20, VCPU_GPR(r20)(r4)
stw r21, VCPU_GPR(r21)(r4)
stw r22, VCPU_GPR(r22)(r4)
stw r23, VCPU_GPR(r23)(r4)
stw r24, VCPU_GPR(r24)(r4)
stw r25, VCPU_GPR(r25)(r4)
stw r26, VCPU_GPR(r26)(r4)
stw r27, VCPU_GPR(r27)(r4)
stw r28, VCPU_GPR(r28)(r4)
stw r29, VCPU_GPR(r29)(r4)
stw r30, VCPU_GPR(r30)(r4)
stw r31, VCPU_GPR(r31)(r4)
..skip_inst_copy:
/* Also grab DEAR and ESR before the host can clobber them. */
andi. r7, r6, NEED_DEAR_MASK
beq ..skip_dear
mfspr r9, SPRN_DEAR
stw r9, VCPU_FAULT_DEAR(r4)
..skip_dear:
andi. r7, r6, NEED_ESR_MASK
beq ..skip_esr
mfspr r9, SPRN_ESR
stw r9, VCPU_FAULT_ESR(r4)
..skip_esr:
/* Save remaining volatile guest register state to vcpu. */
stw r0, VCPU_GPR(r0)(r4)
stw r1, VCPU_GPR(r1)(r4)
stw r2, VCPU_GPR(r2)(r4)
stw r10, VCPU_GPR(r10)(r4)
stw r11, VCPU_GPR(r11)(r4)
stw r12, VCPU_GPR(r12)(r4)
stw r13, VCPU_GPR(r13)(r4)
stw r14, VCPU_GPR(r14)(r4) /* We need a NV GPR below. */
mflr r3
stw r3, VCPU_LR(r4)
mfxer r3
stw r3, VCPU_XER(r4)
mfspr r3, SPRN_SPRG0
stw r3, VCPU_GPR(r4)(r4)
mfspr r3, SPRN_SRR0
stw r3, VCPU_PC(r4)
/* Restore host stack pointer and PID before IVPR, since the host
* exception handlers use them. */
lwz r1, VCPU_HOST_STACK(r4)
lwz r3, VCPU_HOST_PID(r4)
mtspr SPRN_PID, r3
/* Restore host IVPR before re-enabling interrupts. We cheat and know
* that Linux IVPR is always 0xc0000000. */
lis r3, 0xc000
mtspr SPRN_IVPR, r3
/* Switch to kernel stack and jump to handler. */
LOAD_REG_ADDR(r3, kvmppc_handle_exit)
mtctr r3
lwz r3, HOST_RUN(r1)
lwz r2, HOST_R2(r1)
mr r14, r4 /* Save vcpu pointer. */
bctrl /* kvmppc_handle_exit() */
/* Restore vcpu pointer and the nonvolatiles we used. */
mr r4, r14
lwz r14, VCPU_GPR(r14)(r4)
/* Sometimes instruction emulation must restore complete GPR state. */
andi. r5, r3, RESUME_FLAG_NV
beq ..skip_nv_load
lwz r15, VCPU_GPR(r15)(r4)
lwz r16, VCPU_GPR(r16)(r4)
lwz r17, VCPU_GPR(r17)(r4)
lwz r18, VCPU_GPR(r18)(r4)
lwz r19, VCPU_GPR(r19)(r4)
lwz r20, VCPU_GPR(r20)(r4)
lwz r21, VCPU_GPR(r21)(r4)
lwz r22, VCPU_GPR(r22)(r4)
lwz r23, VCPU_GPR(r23)(r4)
lwz r24, VCPU_GPR(r24)(r4)
lwz r25, VCPU_GPR(r25)(r4)
lwz r26, VCPU_GPR(r26)(r4)
lwz r27, VCPU_GPR(r27)(r4)
lwz r28, VCPU_GPR(r28)(r4)
lwz r29, VCPU_GPR(r29)(r4)
lwz r30, VCPU_GPR(r30)(r4)
lwz r31, VCPU_GPR(r31)(r4)
..skip_nv_load:
/* Should we return to the guest? */
andi. r5, r3, RESUME_FLAG_HOST
beq lightweight_exit
srawi r3, r3, 2 /* Shift -ERR back down. */
heavyweight_exit:
/* Not returning to guest. */
/* We already saved guest volatile register state; now save the
* non-volatiles. */
stw r15, VCPU_GPR(r15)(r4)
stw r16, VCPU_GPR(r16)(r4)
stw r17, VCPU_GPR(r17)(r4)
stw r18, VCPU_GPR(r18)(r4)
stw r19, VCPU_GPR(r19)(r4)
stw r20, VCPU_GPR(r20)(r4)
stw r21, VCPU_GPR(r21)(r4)
stw r22, VCPU_GPR(r22)(r4)
stw r23, VCPU_GPR(r23)(r4)
stw r24, VCPU_GPR(r24)(r4)
stw r25, VCPU_GPR(r25)(r4)
stw r26, VCPU_GPR(r26)(r4)
stw r27, VCPU_GPR(r27)(r4)
stw r28, VCPU_GPR(r28)(r4)
stw r29, VCPU_GPR(r29)(r4)
stw r30, VCPU_GPR(r30)(r4)
stw r31, VCPU_GPR(r31)(r4)
/* Load host non-volatile register state from host stack. */
lwz r14, HOST_NV_GPR(r14)(r1)
lwz r15, HOST_NV_GPR(r15)(r1)
lwz r16, HOST_NV_GPR(r16)(r1)
lwz r17, HOST_NV_GPR(r17)(r1)
lwz r18, HOST_NV_GPR(r18)(r1)
lwz r19, HOST_NV_GPR(r19)(r1)
lwz r20, HOST_NV_GPR(r20)(r1)
lwz r21, HOST_NV_GPR(r21)(r1)
lwz r22, HOST_NV_GPR(r22)(r1)
lwz r23, HOST_NV_GPR(r23)(r1)
lwz r24, HOST_NV_GPR(r24)(r1)
lwz r25, HOST_NV_GPR(r25)(r1)
lwz r26, HOST_NV_GPR(r26)(r1)
lwz r27, HOST_NV_GPR(r27)(r1)
lwz r28, HOST_NV_GPR(r28)(r1)
lwz r29, HOST_NV_GPR(r29)(r1)
lwz r30, HOST_NV_GPR(r30)(r1)
lwz r31, HOST_NV_GPR(r31)(r1)
/* Return to kvm_vcpu_run(). */
lwz r4, HOST_STACK_LR(r1)
addi r1, r1, HOST_STACK_SIZE
mtlr r4
/* r3 still contains the return code from kvmppc_handle_exit(). */
blr
/* Registers:
* r3: kvm_run pointer
* r4: vcpu pointer
*/
_GLOBAL(__kvmppc_vcpu_run)
stwu r1, -HOST_STACK_SIZE(r1)
stw r1, VCPU_HOST_STACK(r4) /* Save stack pointer to vcpu. */
/* Save host state to stack. */
stw r3, HOST_RUN(r1)
mflr r3
stw r3, HOST_STACK_LR(r1)
/* Save host non-volatile register state to stack. */
stw r14, HOST_NV_GPR(r14)(r1)
stw r15, HOST_NV_GPR(r15)(r1)
stw r16, HOST_NV_GPR(r16)(r1)
stw r17, HOST_NV_GPR(r17)(r1)
stw r18, HOST_NV_GPR(r18)(r1)
stw r19, HOST_NV_GPR(r19)(r1)
stw r20, HOST_NV_GPR(r20)(r1)
stw r21, HOST_NV_GPR(r21)(r1)
stw r22, HOST_NV_GPR(r22)(r1)
stw r23, HOST_NV_GPR(r23)(r1)
stw r24, HOST_NV_GPR(r24)(r1)
stw r25, HOST_NV_GPR(r25)(r1)
stw r26, HOST_NV_GPR(r26)(r1)
stw r27, HOST_NV_GPR(r27)(r1)
stw r28, HOST_NV_GPR(r28)(r1)
stw r29, HOST_NV_GPR(r29)(r1)
stw r30, HOST_NV_GPR(r30)(r1)
stw r31, HOST_NV_GPR(r31)(r1)
/* Load guest non-volatiles. */
lwz r14, VCPU_GPR(r14)(r4)
lwz r15, VCPU_GPR(r15)(r4)
lwz r16, VCPU_GPR(r16)(r4)
lwz r17, VCPU_GPR(r17)(r4)
lwz r18, VCPU_GPR(r18)(r4)
lwz r19, VCPU_GPR(r19)(r4)
lwz r20, VCPU_GPR(r20)(r4)
lwz r21, VCPU_GPR(r21)(r4)
lwz r22, VCPU_GPR(r22)(r4)
lwz r23, VCPU_GPR(r23)(r4)
lwz r24, VCPU_GPR(r24)(r4)
lwz r25, VCPU_GPR(r25)(r4)
lwz r26, VCPU_GPR(r26)(r4)
lwz r27, VCPU_GPR(r27)(r4)
lwz r28, VCPU_GPR(r28)(r4)
lwz r29, VCPU_GPR(r29)(r4)
lwz r30, VCPU_GPR(r30)(r4)
lwz r31, VCPU_GPR(r31)(r4)
lightweight_exit:
stw r2, HOST_R2(r1)
mfspr r3, SPRN_PID
stw r3, VCPU_HOST_PID(r4)
lwz r3, VCPU_PID(r4)
mtspr SPRN_PID, r3
/* Prevent all TLB updates. */
mfmsr r5
lis r6, (MSR_EE|MSR_CE|MSR_ME|MSR_DE)@h
ori r6, r6, (MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
andc r6, r5, r6
mtmsr r6
/* Save the host's non-pinned TLB mappings, and load the guest mappings
* over them. Leave the host's "pinned" kernel mappings in place. */
/* XXX optimization: use generation count to avoid swapping unmodified
* entries. */
mfspr r10, SPRN_MMUCR /* Save host MMUCR. */
lis r8, tlb_44x_hwater@ha
lwz r8, tlb_44x_hwater@l(r8)
addi r3, r4, VCPU_HOST_TLB - 4
addi r9, r4, VCPU_SHADOW_TLB - 4
li r6, 0
1:
/* Save host entry. */
tlbre r7, r6, PPC44x_TLB_PAGEID
mfspr r5, SPRN_MMUCR
stwu r5, 4(r3)
stwu r7, 4(r3)
tlbre r7, r6, PPC44x_TLB_XLAT
stwu r7, 4(r3)
tlbre r7, r6, PPC44x_TLB_ATTRIB
stwu r7, 4(r3)
/* Load guest entry. */
lwzu r7, 4(r9)
mtspr SPRN_MMUCR, r7
lwzu r7, 4(r9)
tlbwe r7, r6, PPC44x_TLB_PAGEID
lwzu r7, 4(r9)
tlbwe r7, r6, PPC44x_TLB_XLAT
lwzu r7, 4(r9)
tlbwe r7, r6, PPC44x_TLB_ATTRIB
/* Increment index. */
addi r6, r6, 1
cmpw r6, r8
blt 1b
mtspr SPRN_MMUCR, r10 /* Restore host MMUCR. */
iccci 0, 0 /* XXX hack */
/* Load some guest volatiles. */
lwz r0, VCPU_GPR(r0)(r4)
lwz r2, VCPU_GPR(r2)(r4)
lwz r9, VCPU_GPR(r9)(r4)
lwz r10, VCPU_GPR(r10)(r4)
lwz r11, VCPU_GPR(r11)(r4)
lwz r12, VCPU_GPR(r12)(r4)
lwz r13, VCPU_GPR(r13)(r4)
lwz r3, VCPU_LR(r4)
mtlr r3
lwz r3, VCPU_XER(r4)
mtxer r3
/* Switch the IVPR. XXX If we take a TLB miss after this we're screwed,
* so how do we make sure vcpu won't fault? */
lis r8, kvmppc_booke_handlers@ha
lwz r8, kvmppc_booke_handlers@l(r8)
mtspr SPRN_IVPR, r8
/* Save vcpu pointer for the exception handlers. */
mtspr SPRN_SPRG1, r4
/* Can't switch the stack pointer until after IVPR is switched,
* because host interrupt handlers would get confused. */
lwz r1, VCPU_GPR(r1)(r4)
/* XXX handle USPRG0 */
/* Host interrupt handlers may have clobbered these guest-readable
* SPRGs, so we need to reload them here with the guest's values. */
lwz r3, VCPU_SPRG4(r4)
mtspr SPRN_SPRG4, r3
lwz r3, VCPU_SPRG5(r4)
mtspr SPRN_SPRG5, r3
lwz r3, VCPU_SPRG6(r4)
mtspr SPRN_SPRG6, r3
lwz r3, VCPU_SPRG7(r4)
mtspr SPRN_SPRG7, r3
/* Finish loading guest volatiles and jump to guest. */
lwz r3, VCPU_CTR(r4)
mtctr r3
lwz r3, VCPU_CR(r4)
mtcr r3
lwz r5, VCPU_GPR(r5)(r4)
lwz r6, VCPU_GPR(r6)(r4)
lwz r7, VCPU_GPR(r7)(r4)
lwz r8, VCPU_GPR(r8)(r4)
lwz r3, VCPU_PC(r4)
mtsrr0 r3
lwz r3, VCPU_MSR(r4)
oris r3, r3, KVMPPC_MSR_MASK@h
ori r3, r3, KVMPPC_MSR_MASK@l
mtsrr1 r3
lwz r3, VCPU_GPR(r3)(r4)
lwz r4, VCPU_GPR(r4)(r4)
rfi

arch/powerpc/kvm/emulate.c Normal file

@ -0,0 +1,760 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#include <linux/jiffies.h>
#include <linux/timer.h>
#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm_host.h>
#include <asm/dcr.h>
#include <asm/dcr-regs.h>
#include <asm/time.h>
#include <asm/byteorder.h>
#include <asm/kvm_ppc.h>
#include "44x_tlb.h"
/* Instruction decoding */
static inline unsigned int get_op(u32 inst)
{
return inst >> 26;
}
static inline unsigned int get_xop(u32 inst)
{
return (inst >> 1) & 0x3ff;
}
/* SPR and DCR numbers are encoded with their two 5-bit halves swapped. */
static inline unsigned int get_sprn(u32 inst)
{
return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
}
static inline unsigned int get_dcrn(u32 inst)
{
return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
}
static inline unsigned int get_rt(u32 inst)
{
return (inst >> 21) & 0x1f;
}
static inline unsigned int get_rs(u32 inst)
{
return (inst >> 21) & 0x1f;
}
static inline unsigned int get_ra(u32 inst)
{
return (inst >> 16) & 0x1f;
}
static inline unsigned int get_rb(u32 inst)
{
return (inst >> 11) & 0x1f;
}
static inline unsigned int get_rc(u32 inst)
{
return inst & 0x1;
}
static inline unsigned int get_ws(u32 inst)
{
return (inst >> 11) & 0x1f;
}
static inline unsigned int get_d(u32 inst)
{
return inst & 0xffff;
}
static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu,
const struct tlbe *tlbe)
{
gpa_t gpa;
if (!get_tlb_v(tlbe))
return 0;
/* Does it match current guest AS? */
/* XXX what about IS != DS? */
if (get_tlb_ts(tlbe) != !!(vcpu->arch.msr & MSR_IS))
return 0;
gpa = get_tlb_raddr(tlbe);
if (!gfn_to_memslot(vcpu->kvm, gpa >> PAGE_SHIFT))
/* Mapping is not for RAM. */
return 0;
return 1;
}
static int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u32 inst)
{
u64 eaddr;
u64 raddr;
u64 asid;
u32 flags;
struct tlbe *tlbe;
unsigned int ra;
unsigned int rs;
unsigned int ws;
unsigned int index;
ra = get_ra(inst);
rs = get_rs(inst);
ws = get_ws(inst);
index = vcpu->arch.gpr[ra];
if (index >= PPC44x_TLB_SIZE) {
printk("%s: index %d\n", __func__, index);
kvmppc_dump_vcpu(vcpu);
return EMULATE_FAIL;
}
tlbe = &vcpu->arch.guest_tlb[index];
/* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */
if (tlbe->word0 & PPC44x_TLB_VALID) {
eaddr = get_tlb_eaddr(tlbe);
asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid;
kvmppc_mmu_invalidate(vcpu, eaddr, asid);
}
switch (ws) {
case PPC44x_TLB_PAGEID:
tlbe->tid = vcpu->arch.mmucr & 0xff;
tlbe->word0 = vcpu->arch.gpr[rs];
break;
case PPC44x_TLB_XLAT:
tlbe->word1 = vcpu->arch.gpr[rs];
break;
case PPC44x_TLB_ATTRIB:
tlbe->word2 = vcpu->arch.gpr[rs];
break;
default:
return EMULATE_FAIL;
}
if (tlbe_is_host_safe(vcpu, tlbe)) {
eaddr = get_tlb_eaddr(tlbe);
raddr = get_tlb_raddr(tlbe);
asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid;
flags = tlbe->word2 & 0xffff;
/* Create a 4KB mapping on the host. If the guest wanted a
* large page, only the first 4KB is mapped here and the rest
* are mapped on the fly. */
kvmppc_mmu_map(vcpu, eaddr, raddr >> PAGE_SHIFT, asid, flags);
}
return EMULATE_DONE;
}
static void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.tcr & TCR_DIE) {
/* The decrementer ticks at the same rate as the timebase, so
* that's how we convert the guest DEC value to the number of
* host ticks. */
unsigned long nr_jiffies;
nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy;
mod_timer(&vcpu->arch.dec_timer,
get_jiffies_64() + nr_jiffies);
} else {
del_timer(&vcpu->arch.dec_timer);
}
}
static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu)
{
vcpu->arch.pc = vcpu->arch.srr0;
kvmppc_set_msr(vcpu, vcpu->arch.srr1);
}
/* XXX to do:
* lhax
* lhaux
* lswx
* lswi
* stswx
* stswi
* lha
* lhau
* lmw
* stmw
*
* XXX is_bigendian should depend on MMU mapping or MSR[LE]
*/
int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
{
u32 inst = vcpu->arch.last_inst;
u32 ea;
int ra;
int rb;
int rc;
int rs;
int rt;
int sprn;
int dcrn;
enum emulation_result emulated = EMULATE_DONE;
int advance = 1;
switch (get_op(inst)) {
case 3: /* trap */
printk("trap!\n");
kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM);
advance = 0;
break;
case 19:
switch (get_xop(inst)) {
case 50: /* rfi */
kvmppc_emul_rfi(vcpu);
advance = 0;
break;
default:
emulated = EMULATE_FAIL;
break;
}
break;
case 31:
switch (get_xop(inst)) {
case 83: /* mfmsr */
rt = get_rt(inst);
vcpu->arch.gpr[rt] = vcpu->arch.msr;
break;
case 87: /* lbzx */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
break;
case 131: /* wrtee */
rs = get_rs(inst);
vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE)
| (vcpu->arch.gpr[rs] & MSR_EE);
break;
case 146: /* mtmsr */
rs = get_rs(inst);
kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]);
break;
case 163: /* wrteei */
vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE)
| (inst & MSR_EE);
break;
case 215: /* stbx */
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu,
vcpu->arch.gpr[rs],
1, 1);
break;
case 247: /* stbux */
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
ea = vcpu->arch.gpr[rb];
if (ra)
ea += vcpu->arch.gpr[ra];
emulated = kvmppc_handle_store(run, vcpu,
vcpu->arch.gpr[rs],
1, 1);
vcpu->arch.gpr[ra] = ea;
break;
case 279: /* lhzx */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
break;
case 311: /* lhzux */
rt = get_rt(inst);
ra = get_ra(inst);
rb = get_rb(inst);
ea = vcpu->arch.gpr[rb];
if (ra)
ea += vcpu->arch.gpr[ra];
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
vcpu->arch.gpr[ra] = ea;
break;
case 323: /* mfdcr */
dcrn = get_dcrn(inst);
rt = get_rt(inst);
/* The guest may access CPR0 registers to determine the timebase
* frequency, and it must know the real host frequency because it
* can directly access the timebase registers.
*
* It would be possible to emulate those accesses in userspace,
* but userspace can really only figure out the end frequency.
* We could decompose that into the factors that compute it, but
* that's tricky math, and it's easier to just report the real
* CPR0 values.
*/
switch (dcrn) {
case DCRN_CPR0_CONFIG_ADDR:
vcpu->arch.gpr[rt] = vcpu->arch.cpr0_cfgaddr;
break;
case DCRN_CPR0_CONFIG_DATA:
local_irq_disable();
mtdcr(DCRN_CPR0_CONFIG_ADDR,
vcpu->arch.cpr0_cfgaddr);
vcpu->arch.gpr[rt] = mfdcr(DCRN_CPR0_CONFIG_DATA);
local_irq_enable();
break;
default:
run->dcr.dcrn = dcrn;
run->dcr.data = 0;
run->dcr.is_write = 0;
vcpu->arch.io_gpr = rt;
vcpu->arch.dcr_needed = 1;
emulated = EMULATE_DO_DCR;
}
break;
case 339: /* mfspr */
sprn = get_sprn(inst);
rt = get_rt(inst);
switch (sprn) {
case SPRN_SRR0:
vcpu->arch.gpr[rt] = vcpu->arch.srr0; break;
case SPRN_SRR1:
vcpu->arch.gpr[rt] = vcpu->arch.srr1; break;
case SPRN_MMUCR:
vcpu->arch.gpr[rt] = vcpu->arch.mmucr; break;
case SPRN_PID:
vcpu->arch.gpr[rt] = vcpu->arch.pid; break;
case SPRN_IVPR:
vcpu->arch.gpr[rt] = vcpu->arch.ivpr; break;
case SPRN_CCR0:
vcpu->arch.gpr[rt] = vcpu->arch.ccr0; break;
case SPRN_CCR1:
vcpu->arch.gpr[rt] = vcpu->arch.ccr1; break;
case SPRN_PVR:
vcpu->arch.gpr[rt] = vcpu->arch.pvr; break;
case SPRN_DEAR:
vcpu->arch.gpr[rt] = vcpu->arch.dear; break;
case SPRN_ESR:
vcpu->arch.gpr[rt] = vcpu->arch.esr; break;
case SPRN_DBCR0:
vcpu->arch.gpr[rt] = vcpu->arch.dbcr0; break;
case SPRN_DBCR1:
vcpu->arch.gpr[rt] = vcpu->arch.dbcr1; break;
/* Note: mftb and TBRL/TBWL are user-accessible, so
* the guest can always access the real TB anyways.
* In fact, we probably will never see these traps. */
case SPRN_TBWL:
vcpu->arch.gpr[rt] = mftbl(); break;
case SPRN_TBWU:
vcpu->arch.gpr[rt] = mftbu(); break;
case SPRN_SPRG0:
vcpu->arch.gpr[rt] = vcpu->arch.sprg0; break;
case SPRN_SPRG1:
vcpu->arch.gpr[rt] = vcpu->arch.sprg1; break;
case SPRN_SPRG2:
vcpu->arch.gpr[rt] = vcpu->arch.sprg2; break;
case SPRN_SPRG3:
vcpu->arch.gpr[rt] = vcpu->arch.sprg3; break;
/* Note: SPRG4-7 are user-readable, so we don't get
* a trap. */
case SPRN_IVOR0:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[0]; break;
case SPRN_IVOR1:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[1]; break;
case SPRN_IVOR2:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[2]; break;
case SPRN_IVOR3:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[3]; break;
case SPRN_IVOR4:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[4]; break;
case SPRN_IVOR5:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[5]; break;
case SPRN_IVOR6:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[6]; break;
case SPRN_IVOR7:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[7]; break;
case SPRN_IVOR8:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[8]; break;
case SPRN_IVOR9:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[9]; break;
case SPRN_IVOR10:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[10]; break;
case SPRN_IVOR11:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[11]; break;
case SPRN_IVOR12:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[12]; break;
case SPRN_IVOR13:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[13]; break;
case SPRN_IVOR14:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[14]; break;
case SPRN_IVOR15:
vcpu->arch.gpr[rt] = vcpu->arch.ivor[15]; break;
default:
printk("mfspr: unknown spr %x\n", sprn);
vcpu->arch.gpr[rt] = 0;
break;
}
break;
case 407: /* sthx */
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
emulated = kvmppc_handle_store(run, vcpu,
vcpu->arch.gpr[rs],
2, 1);
break;
case 439: /* sthux */
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
ea = vcpu->arch.gpr[rb];
if (ra)
ea += vcpu->arch.gpr[ra];
emulated = kvmppc_handle_store(run, vcpu,
vcpu->arch.gpr[rs],
2, 1);
vcpu->arch.gpr[ra] = ea;
break;
case 451: /* mtdcr */
dcrn = get_dcrn(inst);
rs = get_rs(inst);
/* emulate some access in kernel */
switch (dcrn) {
case DCRN_CPR0_CONFIG_ADDR:
vcpu->arch.cpr0_cfgaddr = vcpu->arch.gpr[rs];
break;
default:
run->dcr.dcrn = dcrn;
run->dcr.data = vcpu->arch.gpr[rs];
run->dcr.is_write = 1;
vcpu->arch.dcr_needed = 1;
emulated = EMULATE_DO_DCR;
}
break;
case 467: /* mtspr */
sprn = get_sprn(inst);
rs = get_rs(inst);
switch (sprn) {
case SPRN_SRR0:
vcpu->arch.srr0 = vcpu->arch.gpr[rs]; break;
case SPRN_SRR1:
vcpu->arch.srr1 = vcpu->arch.gpr[rs]; break;
case SPRN_MMUCR:
vcpu->arch.mmucr = vcpu->arch.gpr[rs]; break;
case SPRN_PID:
vcpu->arch.pid = vcpu->arch.gpr[rs]; break;
case SPRN_CCR0:
vcpu->arch.ccr0 = vcpu->arch.gpr[rs]; break;
case SPRN_CCR1:
vcpu->arch.ccr1 = vcpu->arch.gpr[rs]; break;
case SPRN_DEAR:
vcpu->arch.dear = vcpu->arch.gpr[rs]; break;
case SPRN_ESR:
vcpu->arch.esr = vcpu->arch.gpr[rs]; break;
case SPRN_DBCR0:
vcpu->arch.dbcr0 = vcpu->arch.gpr[rs]; break;
case SPRN_DBCR1:
vcpu->arch.dbcr1 = vcpu->arch.gpr[rs]; break;
/* XXX We need to context-switch the timebase for
* watchdog and FIT. */
case SPRN_TBWL: break;
case SPRN_TBWU: break;
case SPRN_DEC:
vcpu->arch.dec = vcpu->arch.gpr[rs];
kvmppc_emulate_dec(vcpu);
break;
case SPRN_TSR:
vcpu->arch.tsr &= ~vcpu->arch.gpr[rs]; break;
case SPRN_TCR:
vcpu->arch.tcr = vcpu->arch.gpr[rs];
kvmppc_emulate_dec(vcpu);
break;
case SPRN_SPRG0:
vcpu->arch.sprg0 = vcpu->arch.gpr[rs]; break;
case SPRN_SPRG1:
vcpu->arch.sprg1 = vcpu->arch.gpr[rs]; break;
case SPRN_SPRG2:
vcpu->arch.sprg2 = vcpu->arch.gpr[rs]; break;
case SPRN_SPRG3:
vcpu->arch.sprg3 = vcpu->arch.gpr[rs]; break;
/* Note: SPRG4-7 are user-readable. These values are
* loaded into the real SPRGs when resuming the
* guest. */
case SPRN_SPRG4:
vcpu->arch.sprg4 = vcpu->arch.gpr[rs]; break;
case SPRN_SPRG5:
vcpu->arch.sprg5 = vcpu->arch.gpr[rs]; break;
case SPRN_SPRG6:
vcpu->arch.sprg6 = vcpu->arch.gpr[rs]; break;
case SPRN_SPRG7:
vcpu->arch.sprg7 = vcpu->arch.gpr[rs]; break;
case SPRN_IVPR:
vcpu->arch.ivpr = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR0:
vcpu->arch.ivor[0] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR1:
vcpu->arch.ivor[1] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR2:
vcpu->arch.ivor[2] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR3:
vcpu->arch.ivor[3] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR4:
vcpu->arch.ivor[4] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR5:
vcpu->arch.ivor[5] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR6:
vcpu->arch.ivor[6] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR7:
vcpu->arch.ivor[7] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR8:
vcpu->arch.ivor[8] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR9:
vcpu->arch.ivor[9] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR10:
vcpu->arch.ivor[10] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR11:
vcpu->arch.ivor[11] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR12:
vcpu->arch.ivor[12] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR13:
vcpu->arch.ivor[13] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR14:
vcpu->arch.ivor[14] = vcpu->arch.gpr[rs]; break;
case SPRN_IVOR15:
vcpu->arch.ivor[15] = vcpu->arch.gpr[rs]; break;
default:
printk("mtspr: unknown spr %x\n", sprn);
emulated = EMULATE_FAIL;
break;
}
break;
case 470: /* dcbi */
/* Do nothing. The guest is performing dcbi because
* hardware DMA is not snooped by the dcache, but
* emulated DMA either goes through the dcache as
* normal writes, or the host kernel has handled dcache
* coherence. */
break;
case 534: /* lwbrx */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 0);
break;
case 566: /* tlbsync */
break;
case 662: /* stwbrx */
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
emulated = kvmppc_handle_store(run, vcpu,
vcpu->arch.gpr[rs],
4, 0);
break;
case 978: /* tlbwe */
emulated = kvmppc_emul_tlbwe(vcpu, inst);
break;
case 914: { /* tlbsx */
int index;
unsigned int as = get_mmucr_sts(vcpu);
unsigned int pid = get_mmucr_stid(vcpu);
rt = get_rt(inst);
ra = get_ra(inst);
rb = get_rb(inst);
rc = get_rc(inst);
ea = vcpu->arch.gpr[rb];
if (ra)
ea += vcpu->arch.gpr[ra];
index = kvmppc_44x_tlb_index(vcpu, ea, pid, as);
if (rc) {
/* tlbsx. (Rc=1): record hit or miss in CR0[EQ]. */
if (index < 0)
vcpu->arch.cr &= ~0x20000000;
else
vcpu->arch.cr |= 0x20000000;
}
vcpu->arch.gpr[rt] = index;
}
break;
case 790: /* lhbrx */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 0);
break;
case 918: /* sthbrx */
rs = get_rs(inst);
ra = get_ra(inst);
rb = get_rb(inst);
emulated = kvmppc_handle_store(run, vcpu,
vcpu->arch.gpr[rs],
2, 0);
break;
case 966: /* iccci */
break;
default:
printk("unknown: op %d xop %d\n", get_op(inst),
get_xop(inst));
emulated = EMULATE_FAIL;
break;
}
break;
case 32: /* lwz */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
break;
case 33: /* lwzu */
ra = get_ra(inst);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
vcpu->arch.gpr[ra] = vcpu->arch.paddr_accessed;
break;
case 34: /* lbz */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
break;
case 35: /* lbzu */
ra = get_ra(inst);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
vcpu->arch.gpr[ra] = vcpu->arch.paddr_accessed;
break;
case 36: /* stw */
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
4, 1);
break;
case 37: /* stwu */
ra = get_ra(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
4, 1);
vcpu->arch.gpr[ra] = vcpu->arch.paddr_accessed;
break;
case 38: /* stb */
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
1, 1);
break;
case 39: /* stbu */
ra = get_ra(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
1, 1);
vcpu->arch.gpr[ra] = vcpu->arch.paddr_accessed;
break;
case 40: /* lhz */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
break;
case 41: /* lhzu */
ra = get_ra(inst);
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
vcpu->arch.gpr[ra] = vcpu->arch.paddr_accessed;
break;
case 44: /* sth */
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
2, 1);
break;
case 45: /* sthu */
ra = get_ra(inst);
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu, vcpu->arch.gpr[rs],
2, 1);
vcpu->arch.gpr[ra] = vcpu->arch.paddr_accessed;
break;
default:
printk("unknown op %d\n", get_op(inst));
emulated = EMULATE_FAIL;
break;
}
if (advance)
vcpu->arch.pc += 4; /* Advance past emulated instruction. */
return emulated;
}
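
The switch above dispatches on fields taken from the 32-bit instruction word through helpers such as get_op() and get_xop(), whose definitions are not part of this hunk. The stand-alone sketch below shows where those fields sit in the PowerPC instruction encoding. The sketch_* names and the two sample instruction words are illustrative assumptions, not the kernel's own helpers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the field accessors used by the emulator. */
static inline unsigned int sketch_get_op(uint32_t inst)
{
	return inst >> 26;		/* primary opcode, top 6 bits */
}

static inline unsigned int sketch_get_xop(uint32_t inst)
{
	return (inst >> 1) & 0x3ff;	/* extended opcode (X-form) */
}

static inline unsigned int sketch_get_rt(uint32_t inst)
{
	return (inst >> 21) & 0x1f;	/* target (or source) register */
}

static inline unsigned int sketch_get_ra(uint32_t inst)
{
	return (inst >> 16) & 0x1f;
}

static inline unsigned int sketch_get_rb(uint32_t inst)
{
	return (inst >> 11) & 0x1f;
}

static inline unsigned int sketch_get_rc(uint32_t inst)
{
	return inst & 0x1;		/* record bit */
}

/* Hand-assembled sample words, used only for illustration. */
#define SKETCH_LWZ_R3_8_R4	0x80640008u	/* lwz r3, 8(r4) */
#define SKETCH_TLBSX_R3_0_R5	0x7c602f25u	/* tlbsx. r3, 0, r5 */
```

With these accessors the first sample decodes to primary opcode 32 (the lwz case), and the second to primary opcode 31 with extended opcode 914 and the record bit set (the tlbsx case that updates CR0).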

arch/powerpc/kvm/powerpc.c Normal file

@ -0,0 +1,436 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
* Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
*/
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
#include <linux/module.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>
#include <asm/cputable.h>
#include <asm/uaccess.h>
#include <asm/kvm_ppc.h>
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
{
return gfn;
}
int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
{
/* XXX implement me */
return 0;
}
int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
{
return 1;
}
int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu)
{
enum emulation_result er;
int r;
er = kvmppc_emulate_instruction(run, vcpu);
switch (er) {
case EMULATE_DONE:
/* Future optimization: only reload non-volatiles if they were
* actually modified. */
r = RESUME_GUEST_NV;
break;
case EMULATE_DO_MMIO:
run->exit_reason = KVM_EXIT_MMIO;
/* We must reload nonvolatiles because "update" load/store
* instructions modify register state. */
/* Future optimization: only reload non-volatiles if they were
* actually modified. */
r = RESUME_HOST_NV;
break;
case EMULATE_FAIL:
/* XXX Deliver Program interrupt to guest. */
printk(KERN_EMERG "%s: emulation failed (%08x)\n", __func__,
vcpu->arch.last_inst);
r = RESUME_HOST;
break;
default:
BUG();
}
return r;
}
void kvm_arch_hardware_enable(void *garbage)
{
}
void kvm_arch_hardware_disable(void *garbage)
{
}
int kvm_arch_hardware_setup(void)
{
return 0;
}
void kvm_arch_hardware_unsetup(void)
{
}
void kvm_arch_check_processor_compat(void *rtn)
{
int r;
if (strcmp(cur_cpu_spec->platform, "ppc440") == 0)
r = 0;
else
r = -ENOTSUPP;
*(int *)rtn = r;
}
struct kvm *kvm_arch_create_vm(void)
{
struct kvm *kvm;
kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
if (!kvm)
return ERR_PTR(-ENOMEM);
return kvm;
}
static void kvmppc_free_vcpus(struct kvm *kvm)
{
unsigned int i;
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
if (kvm->vcpus[i]) {
kvm_arch_vcpu_free(kvm->vcpus[i]);
kvm->vcpus[i] = NULL;
}
}
}
void kvm_arch_destroy_vm(struct kvm *kvm)
{
kvmppc_free_vcpus(kvm);
kvm_free_physmem(kvm);
kfree(kvm);
}
int kvm_dev_ioctl_check_extension(long ext)
{
int r;
switch (ext) {
case KVM_CAP_USER_MEMORY:
r = 1;
break;
default:
r = 0;
break;
}
return r;
}
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
return -EINVAL;
}
int kvm_arch_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
struct kvm_memory_slot old,
int user_alloc)
{
return 0;
}
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
{
struct kvm_vcpu *vcpu;
int err;
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu) {
err = -ENOMEM;
goto out;
}
err = kvm_vcpu_init(vcpu, kvm, id);
if (err)
goto free_vcpu;
return vcpu;
free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu);
out:
return ERR_PTR(err);
}
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
{
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
kvm_arch_vcpu_free(vcpu);
}
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
{
unsigned int priority = exception_priority[BOOKE_INTERRUPT_DECREMENTER];
return test_bit(priority, &vcpu->arch.pending_exceptions);
}
static void kvmppc_decrementer_func(unsigned long data)
{
struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_DECREMENTER);
}
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
setup_timer(&vcpu->arch.dec_timer, kvmppc_decrementer_func,
(unsigned long)vcpu);
return 0;
}
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
{
}
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
}
void decache_vcpus_on_cpu(int cpu)
{
}
int kvm_arch_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
struct kvm_debug_guest *dbg)
{
return -ENOTSUPP;
}
static void kvmppc_complete_dcr_load(struct kvm_vcpu *vcpu,
struct kvm_run *run)
{
u32 *gpr = &vcpu->arch.gpr[vcpu->arch.io_gpr];
*gpr = run->dcr.data;
}
static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu,
struct kvm_run *run)
{
u32 *gpr = &vcpu->arch.gpr[vcpu->arch.io_gpr];
if (run->mmio.len > sizeof(*gpr)) {
printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len);
return;
}
if (vcpu->arch.mmio_is_bigendian) {
switch (run->mmio.len) {
case 4: *gpr = *(u32 *)run->mmio.data; break;
case 2: *gpr = *(u16 *)run->mmio.data; break;
case 1: *gpr = *(u8 *)run->mmio.data; break;
}
} else {
/* Convert BE data from userland back to LE. */
switch (run->mmio.len) {
case 4: *gpr = ld_le32((u32 *)run->mmio.data); break;
case 2: *gpr = ld_le16((u16 *)run->mmio.data); break;
case 1: *gpr = *(u8 *)run->mmio.data; break;
}
}
}
int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int rt, unsigned int bytes, int is_bigendian)
{
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %u\n", __func__,
bytes);
}
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
run->mmio.len = bytes;
run->mmio.is_write = 0;
vcpu->arch.io_gpr = rt;
vcpu->arch.mmio_is_bigendian = is_bigendian;
vcpu->mmio_needed = 1;
vcpu->mmio_is_write = 0;
return EMULATE_DO_MMIO;
}
int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
u32 val, unsigned int bytes, int is_bigendian)
{
void *data = run->mmio.data;
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %u\n", __func__,
bytes);
}
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
run->mmio.len = bytes;
run->mmio.is_write = 1;
vcpu->mmio_needed = 1;
vcpu->mmio_is_write = 1;
/* Store the value at the lowest bytes in 'data'. */
if (is_bigendian) {
switch (bytes) {
case 4: *(u32 *)data = val; break;
case 2: *(u16 *)data = val; break;
case 1: *(u8 *)data = val; break;
}
} else {
/* Store LE value into 'data'. */
switch (bytes) {
case 4: st_le32(data, val); break;
case 2: st_le16(data, val); break;
case 1: *(u8 *)data = val; break;
}
}
return EMULATE_DO_MMIO;
}
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
int r;
sigset_t sigsaved;
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
if (vcpu->mmio_needed) {
if (!vcpu->mmio_is_write)
kvmppc_complete_mmio_load(vcpu, run);
vcpu->mmio_needed = 0;
} else if (vcpu->arch.dcr_needed) {
if (!vcpu->arch.dcr_is_write)
kvmppc_complete_dcr_load(vcpu, run);
vcpu->arch.dcr_needed = 0;
}
kvmppc_check_and_deliver_interrupts(vcpu);
local_irq_disable();
kvm_guest_enter();
r = __kvmppc_vcpu_run(run, vcpu);
kvm_guest_exit();
local_irq_enable();
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &sigsaved, NULL);
return r;
}
int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
{
kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_EXTERNAL);
return 0;
}
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
return -EINVAL;
}
int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
return -EINVAL;
}
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
struct kvm_vcpu *vcpu = filp->private_data;
void __user *argp = (void __user *)arg;
long r;
switch (ioctl) {
case KVM_INTERRUPT: {
struct kvm_interrupt irq;
r = -EFAULT;
if (copy_from_user(&irq, argp, sizeof(irq)))
goto out;
r = kvm_vcpu_ioctl_interrupt(vcpu, &irq);
break;
}
default:
r = -EINVAL;
}
out:
return r;
}
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
return -ENOTSUPP;
}
long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
long r;
switch (ioctl) {
default:
r = -EINVAL;
}
return r;
}
int kvm_arch_init(void *opaque)
{
return 0;
}
void kvm_arch_exit(void)
{
}
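
kvmppc_handle_store() and kvmppc_complete_mmio_load() above place a value of 1, 2 or 4 bytes at the start of the MMIO data buffer in either big-endian or little-endian byte order, relying on the host's native order and the st_le*/ld_le* helpers. A minimal user-space sketch of the same byte layout, written with shifts so that it does not depend on host endianness, could look like this. The sketch_* names are hypothetical and the code assumes bytes is at most 4.

```c
#include <stdint.h>

/* Write the low 'bytes' bytes of 'val' to data[0..bytes-1]. */
static void sketch_mmio_store(uint8_t *data, uint32_t val,
			      unsigned int bytes, int is_bigendian)
{
	unsigned int i;

	for (i = 0; i < bytes; i++) {
		/* Big-endian: most significant byte first. */
		unsigned int shift = is_bigendian ? 8 * (bytes - 1 - i)
						  : 8 * i;
		data[i] = (uint8_t)(val >> shift);
	}
}

/* Read back a value of 'bytes' bytes stored by sketch_mmio_store(). */
static uint32_t sketch_mmio_load(const uint8_t *data,
				 unsigned int bytes, int is_bigendian)
{
	uint32_t val = 0;
	unsigned int i;

	for (i = 0; i < bytes; i++) {
		unsigned int shift = is_bigendian ? 8 * (bytes - 1 - i)
						  : 8 * i;
		val |= (uint32_t)data[i] << shift;
	}
	return val;
}
```

A byte-reversed store such as stwbrx corresponds to is_bigendian == 0 here, while the ordinary stw path corresponds to is_bigendian == 1.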

View File

@ -62,6 +62,10 @@ config GENERIC_LOCKBREAK
default y
depends on SMP && PREEMPT
config PGSTE
bool
default y if KVM
mainmenu "Linux Kernel Configuration"
config S390
@ -69,6 +73,7 @@ config S390
select HAVE_OPROFILE
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_KVM if 64BIT
source "init/Kconfig"
@ -515,6 +520,13 @@ config ZFCPDUMP
Select this option if you want to build a zfcpdump-enabled kernel.
Refer to <file:Documentation/s390/zfcpdump.txt> for more details on this.
config S390_GUEST
bool "s390 guest support (EXPERIMENTAL)"
depends on 64BIT && EXPERIMENTAL
select VIRTIO
select VIRTIO_RING
help
Select this option if you want to run the kernel as a guest under s390 Linux.
endmenu
source "net/Kconfig"
@ -536,3 +548,5 @@ source "security/Kconfig"
source "crypto/Kconfig"
source "lib/Kconfig"
source "arch/s390/kvm/Kconfig"

View File

@ -87,7 +87,7 @@ LDFLAGS_vmlinux := -e start
head-y := arch/s390/kernel/head.o arch/s390/kernel/init_task.o
core-y += arch/s390/mm/ arch/s390/kernel/ arch/s390/crypto/ \
arch/s390/appldata/ arch/s390/hypfs/
arch/s390/appldata/ arch/s390/hypfs/ arch/s390/kvm/
libs-y += arch/s390/lib/
drivers-y += drivers/s390/
drivers-$(CONFIG_MATHEMU) += arch/s390/math-emu/

View File

@ -144,6 +144,10 @@ static noinline __init void detect_machine_type(void)
/* Running on a P/390 ? */
if (cpuinfo->cpu_id.machine == 0x7490)
machine_flags |= 4;
/* Running under KVM ? */
if (cpuinfo->cpu_id.version == 0xfe)
machine_flags |= 64;
}
#ifdef CONFIG_64BIT

View File

@ -316,7 +316,11 @@ static int __init early_parse_ipldelay(char *p)
early_param("ipldelay", early_parse_ipldelay);
#ifdef CONFIG_S390_SWITCH_AMODE
#ifdef CONFIG_PGSTE
unsigned int switch_amode = 1;
#else
unsigned int switch_amode = 0;
#endif
EXPORT_SYMBOL_GPL(switch_amode);
static void set_amode_and_uaccess(unsigned long user_amode,
@ -797,9 +801,13 @@ setup_arch(char **cmdline_p)
"This machine has an IEEE fpu\n" :
"This machine has no IEEE fpu\n");
#else /* CONFIG_64BIT */
printk((MACHINE_IS_VM) ?
"We are running under VM (64 bit mode)\n" :
"We are running native (64 bit mode)\n");
if (MACHINE_IS_VM)
printk("We are running under VM (64 bit mode)\n");
else if (MACHINE_IS_KVM) {
printk("We are running under KVM (64 bit mode)\n");
add_preferred_console("ttyS", 1, NULL);
} else
printk("We are running native (64 bit mode)\n");
#endif /* CONFIG_64BIT */
/* Save unparsed command line copy for /proc/cmdline */

View File

@ -110,6 +110,7 @@ void account_system_vtime(struct task_struct *tsk)
S390_lowcore.steal_clock -= cputime << 12;
account_system_time(tsk, 0, cputime);
}
EXPORT_SYMBOL_GPL(account_system_vtime);
static inline void set_vtimer(__u64 expires)
{

arch/s390/kvm/Kconfig Normal file

@ -0,0 +1,46 @@
#
# KVM configuration
#
config HAVE_KVM
bool
menuconfig VIRTUALIZATION
bool "Virtualization"
default y
---help---
Say Y here to get to see options for using your Linux host to run other
operating systems inside virtual machines (guests).
This option alone does not add any kernel code.
If you say N, all options in this submenu will be skipped and disabled.
if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && EXPERIMENTAL
select PREEMPT_NOTIFIERS
select ANON_INODES
select S390_SWITCH_AMODE
select PREEMPT
---help---
Support hosting paravirtualized guest machines using the SIE
virtualization capability on the mainframe. This should work
on any 64-bit machine.
This module provides access to the hardware capabilities through
a character device node named /dev/kvm.
To compile this as a module, choose M here: the module
will be called kvm.
If unsure, say N.
config KVM_TRACE
bool
# OK, it's a little counter-intuitive to do this, but it puts it neatly under
# the virtualization menu.
source drivers/virtio/Kconfig
endif # VIRTUALIZATION

arch/s390/kvm/Makefile Normal file

@ -0,0 +1,14 @@
# Makefile for kernel virtual machines on s390
#
# Copyright IBM Corp. 2008
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License (version 2 only)
# as published by the Free Software Foundation.
common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o sigp.o diag.o
obj-$(CONFIG_KVM) += kvm.o

arch/s390/kvm/diag.c Normal file

@ -0,0 +1,67 @@
/*
* diag.c - handling diagnose instructions
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
* Christian Borntraeger <borntraeger@de.ibm.com>
*/
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include "kvm-s390.h"
static int __diag_time_slice_end(struct kvm_vcpu *vcpu)
{
VCPU_EVENT(vcpu, 5, "%s", "diag time slice end");
vcpu->stat.diagnose_44++;
vcpu_put(vcpu);
schedule();
vcpu_load(vcpu);
return 0;
}
static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
{
unsigned int reg = vcpu->arch.sie_block->ipa & 0xf;
unsigned long subcode = vcpu->arch.guest_gprs[reg] & 0xffff;
VCPU_EVENT(vcpu, 5, "diag ipl functions, subcode %lx", subcode);
switch (subcode) {
case 3:
vcpu->run->s390_reset_flags = KVM_S390_RESET_CLEAR;
break;
case 4:
vcpu->run->s390_reset_flags = 0;
break;
default:
return -ENOTSUPP;
}
atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
vcpu->run->s390_reset_flags |= KVM_S390_RESET_SUBSYSTEM;
vcpu->run->s390_reset_flags |= KVM_S390_RESET_IPL;
vcpu->run->s390_reset_flags |= KVM_S390_RESET_CPU_INIT;
vcpu->run->exit_reason = KVM_EXIT_S390_RESET;
VCPU_EVENT(vcpu, 3, "requesting userspace resets %lx",
vcpu->run->s390_reset_flags);
return -EREMOTE;
}
int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
{
int code = (vcpu->arch.sie_block->ipb & 0xfff0000) >> 16;
switch (code) {
case 0x44:
return __diag_time_slice_end(vcpu);
case 0x308:
return __diag_ipl_functions(vcpu);
default:
return -ENOTSUPP;
}
}
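
kvm_s390_handle_diag() above takes the diagnose function code from bits of the intercepted instruction text (ipb), and __diag_ipl_functions() turns the subcode into a set of reset flags for userspace. The sketch below mirrors that logic in plain C. The SKETCH_RESET_* bit values are placeholders chosen for the example, not the KVM_S390_RESET_* values, and the function names are hypothetical.

```c
#include <stdint.h>

/* Placeholder flag bits; the real KVM_S390_RESET_* values may differ. */
enum {
	SKETCH_RESET_CLEAR	= 1u << 0,
	SKETCH_RESET_SUBSYSTEM	= 1u << 1,
	SKETCH_RESET_IPL	= 1u << 2,
	SKETCH_RESET_CPU_INIT	= 1u << 3,
};

/* Extract the 12-bit diagnose code, as done with ipb in the handler. */
static unsigned int sketch_diag_code(uint32_t ipb)
{
	return (ipb & 0x0fff0000u) >> 16;
}

/*
 * Map a diagnose 0x308 subcode to reset flags.
 * Returns 0 on success, -1 for a subcode the sketch does not handle.
 */
static int sketch_ipl_reset_flags(unsigned long subcode, unsigned int *flags)
{
	switch (subcode) {
	case 3:
		*flags = SKETCH_RESET_CLEAR;
		break;
	case 4:
		*flags = 0;
		break;
	default:
		return -1;
	}
	/* Both supported subcodes request these three resets. */
	*flags |= SKETCH_RESET_SUBSYSTEM | SKETCH_RESET_IPL |
		  SKETCH_RESET_CPU_INIT;
	return 0;
}
```

In this sketch the only difference between subcodes 3 and 4 is the clear-reset flag, as in the handler above.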

arch/s390/kvm/gaccess.h Normal file

@ -0,0 +1,274 @@
/*
* gaccess.h - access guest memory
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
*/
#ifndef __KVM_S390_GACCESS_H
#define __KVM_S390_GACCESS_H
#include <linux/compiler.h>
#include <linux/kvm_host.h>
#include <asm/uaccess.h>
static inline void __user *__guestaddr_to_user(struct kvm_vcpu *vcpu,
u64 guestaddr)
{
u64 prefix = vcpu->arch.sie_block->prefix;
u64 origin = vcpu->kvm->arch.guest_origin;
u64 memsize = vcpu->kvm->arch.guest_memsize;
/* Apply prefixing: the first two pages and the two pages at the
* prefix address are swapped. */
if (guestaddr < 2 * PAGE_SIZE)
guestaddr += prefix;
else if ((guestaddr >= prefix) && (guestaddr < prefix + 2 * PAGE_SIZE))
guestaddr -= prefix;
if (guestaddr > memsize)
return (void __user __force *) ERR_PTR(-EFAULT);
guestaddr += origin;
return (void __user *) guestaddr;
}
static inline int get_guest_u64(struct kvm_vcpu *vcpu, u64 guestaddr,
u64 *result)
{
void __user *uptr = __guestaddr_to_user(vcpu, guestaddr);
BUG_ON(guestaddr & 7);
if (IS_ERR((void __force *) uptr))
return PTR_ERR((void __force *) uptr);
return get_user(*result, (u64 __user *) uptr);
}
static inline int get_guest_u32(struct kvm_vcpu *vcpu, u64 guestaddr,
u32 *result)
{
void __user *uptr = __guestaddr_to_user(vcpu, guestaddr);
BUG_ON(guestaddr & 3);
if (IS_ERR((void __force *) uptr))
return PTR_ERR((void __force *) uptr);
return get_user(*result, (u32 __user *) uptr);
}
static inline int get_guest_u16(struct kvm_vcpu *vcpu, u64 guestaddr,
u16 *result)
{
void __user *uptr = __guestaddr_to_user(vcpu, guestaddr);
BUG_ON(guestaddr & 1);
if (IS_ERR((void __force *) uptr))
return PTR_ERR((void __force *) uptr);
return get_user(*result, (u16 __user *) uptr);
}
static inline int get_guest_u8(struct kvm_vcpu *vcpu, u64 guestaddr,
u8 *result)
{
void __user *uptr = __guestaddr_to_user(vcpu, guestaddr);
if (IS_ERR((void __force *) uptr))
return PTR_ERR((void __force *) uptr);
return get_user(*result, (u8 __user *) uptr);
}
static inline int put_guest_u64(struct kvm_vcpu *vcpu, u64 guestaddr,
u64 value)
{
void __user *uptr = __guestaddr_to_user(vcpu, guestaddr);
BUG_ON(guestaddr & 7);
if (IS_ERR((void __force *) uptr))
return PTR_ERR((void __force *) uptr);
return put_user(value, (u64 __user *) uptr);
}
static inline int put_guest_u32(struct kvm_vcpu *vcpu, u64 guestaddr,
u32 value)
{
void __user *uptr = __guestaddr_to_user(vcpu, guestaddr);
BUG_ON(guestaddr & 3);
if (IS_ERR((void __force *) uptr))
return PTR_ERR((void __force *) uptr);
return put_user(value, (u32 __user *) uptr);
}
static inline int put_guest_u16(struct kvm_vcpu *vcpu, u64 guestaddr,
u16 value)
{
void __user *uptr = __guestaddr_to_user(vcpu, guestaddr);
BUG_ON(guestaddr & 1);
if (IS_ERR((void __force *) uptr))
return PTR_ERR((void __force *) uptr);
return put_user(value, (u16 __user *) uptr);
}
static inline int put_guest_u8(struct kvm_vcpu *vcpu, u64 guestaddr,
u8 value)
{
void __user *uptr = __guestaddr_to_user(vcpu, guestaddr);
if (IS_ERR((void __force *) uptr))
return PTR_ERR((void __force *) uptr);
return put_user(value, (u8 __user *) uptr);
}
static inline int __copy_to_guest_slow(struct kvm_vcpu *vcpu, u64 guestdest,
const void *from, unsigned long n)
{
int rc;
unsigned long i;
const u8 *data = from;
for (i = 0; i < n; i++) {
rc = put_guest_u8(vcpu, guestdest++, *(data++));
if (rc < 0)
return rc;
}
return 0;
}
static inline int copy_to_guest(struct kvm_vcpu *vcpu, u64 guestdest,
const void *from, unsigned long n)
{
u64 prefix = vcpu->arch.sie_block->prefix;
u64 origin = vcpu->kvm->arch.guest_origin;
u64 memsize = vcpu->kvm->arch.guest_memsize;
if ((guestdest < 2 * PAGE_SIZE) && (guestdest + n > 2 * PAGE_SIZE))
goto slowpath;
if ((guestdest < prefix) && (guestdest + n > prefix))
goto slowpath;
if ((guestdest < prefix + 2 * PAGE_SIZE)
&& (guestdest + n > prefix + 2 * PAGE_SIZE))
goto slowpath;
if (guestdest < 2 * PAGE_SIZE)
guestdest += prefix;
else if ((guestdest >= prefix) && (guestdest < prefix + 2 * PAGE_SIZE))
guestdest -= prefix;
if (guestdest + n > memsize)
return -EFAULT;
if (guestdest + n < guestdest)
return -EFAULT;
guestdest += origin;
return copy_to_user((void __user *) guestdest, from, n);
slowpath:
return __copy_to_guest_slow(vcpu, guestdest, from, n);
}
static inline int __copy_from_guest_slow(struct kvm_vcpu *vcpu, void *to,
u64 guestsrc, unsigned long n)
{
int rc;
unsigned long i;
u8 *data = to;
for (i = 0; i < n; i++) {
rc = get_guest_u8(vcpu, guestsrc++, data++);
if (rc < 0)
return rc;
}
return 0;
}
static inline int copy_from_guest(struct kvm_vcpu *vcpu, void *to,
u64 guestsrc, unsigned long n)
{
u64 prefix = vcpu->arch.sie_block->prefix;
u64 origin = vcpu->kvm->arch.guest_origin;
u64 memsize = vcpu->kvm->arch.guest_memsize;
if ((guestsrc < 2 * PAGE_SIZE) && (guestsrc + n > 2 * PAGE_SIZE))
goto slowpath;
if ((guestsrc < prefix) && (guestsrc + n > prefix))
goto slowpath;
if ((guestsrc < prefix + 2 * PAGE_SIZE)
&& (guestsrc + n > prefix + 2 * PAGE_SIZE))
goto slowpath;
if (guestsrc < 2 * PAGE_SIZE)
guestsrc += prefix;
else if ((guestsrc >= prefix) && (guestsrc < prefix + 2 * PAGE_SIZE))
guestsrc -= prefix;
if (guestsrc + n > memsize)
return -EFAULT;
if (guestsrc + n < guestsrc)
return -EFAULT;
guestsrc += origin;
return copy_from_user(to, (void __user *) guestsrc, n);
slowpath:
return __copy_from_guest_slow(vcpu, to, guestsrc, n);
}
static inline int copy_to_guest_absolute(struct kvm_vcpu *vcpu, u64 guestdest,
const void *from, unsigned long n)
{
u64 origin = vcpu->kvm->arch.guest_origin;
u64 memsize = vcpu->kvm->arch.guest_memsize;
if (guestdest + n > memsize)
return -EFAULT;
if (guestdest + n < guestdest)
return -EFAULT;
guestdest += origin;
return copy_to_user((void __user *) guestdest, from, n);
}
static inline int copy_from_guest_absolute(struct kvm_vcpu *vcpu, void *to,
u64 guestsrc, unsigned long n)
{
u64 origin = vcpu->kvm->arch.guest_origin;
u64 memsize = vcpu->kvm->arch.guest_memsize;
if (guestsrc + n > memsize)
return -EFAULT;
if (guestsrc + n < guestsrc)
return -EFAULT;
guestsrc += origin;
return copy_from_user(to, (void __user *) guestsrc, n);
}
#endif
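
The accessors in gaccess.h share two ideas: prefixing, which swaps the first two pages with the two pages at the prefix address, and a slow path for copies that straddle one of the prefixing boundaries. A minimal stand-alone sketch of both checks follows. It assumes a 4 KiB page size, and the sketch_* names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* Translate a guest real address to a guest absolute address. */
static uint64_t sketch_apply_prefix(uint64_t addr, uint64_t prefix)
{
	if (addr < 2 * SKETCH_PAGE_SIZE)
		return addr + prefix;
	if (addr >= prefix && addr < prefix + 2 * SKETCH_PAGE_SIZE)
		return addr - prefix;
	return addr;
}

/* True if [start, start + n) begins below 'boundary' and ends above it. */
static int sketch_crosses(uint64_t start, uint64_t n, uint64_t boundary)
{
	return start < boundary && start + n > boundary;
}

/* A copy needs byte-wise handling if it straddles a prefixing boundary. */
static int sketch_needs_slow_path(uint64_t addr, uint64_t n, uint64_t prefix)
{
	return sketch_crosses(addr, n, 2 * SKETCH_PAGE_SIZE) ||
	       sketch_crosses(addr, n, prefix) ||
	       sketch_crosses(addr, n, prefix + 2 * SKETCH_PAGE_SIZE);
}
```

For a prefix of at least two pages the translation is its own inverse, which is why the same helper serves both directions. A copy that stays inside one region can be translated once and handed to a bulk copy, while a straddling copy is translated byte by byte.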

arch/s390/kvm/intercept.c Normal file

@ -0,0 +1,216 @@
/*
* intercept.c - in-kernel handling for sie intercepts
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
* Christian Borntraeger <borntraeger@de.ibm.com>
*/
#include <linux/kvm_host.h>
#include <linux/errno.h>
#include <linux/pagemap.h>
#include <asm/kvm_host.h>
#include "kvm-s390.h"
#include "gaccess.h"
static int handle_lctg(struct kvm_vcpu *vcpu)
{
int reg1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
int reg3 = vcpu->arch.sie_block->ipa & 0x000f;
int base2 = vcpu->arch.sie_block->ipb >> 28;
int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16) +
((vcpu->arch.sie_block->ipb & 0xff00) << 4);
u64 useraddr;
int reg, rc;
vcpu->stat.instruction_lctg++;
if ((vcpu->arch.sie_block->ipb & 0xff) != 0x2f)
return -ENOTSUPP;
useraddr = disp2;
if (base2)
useraddr += vcpu->arch.guest_gprs[base2];
reg = reg1;
VCPU_EVENT(vcpu, 5, "lctg r1:%x, r3:%x,b2:%x,d2:%x", reg1, reg3, base2,
disp2);
do {
rc = get_guest_u64(vcpu, useraddr,
&vcpu->arch.sie_block->gcr[reg]);
if (rc == -EFAULT) {
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
break;
}
useraddr += 8;
if (reg == reg3)
break;
reg = (reg + 1) % 16;
} while (1);
return 0;
}
static int handle_lctl(struct kvm_vcpu *vcpu)
{
int reg1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
int reg3 = vcpu->arch.sie_block->ipa & 0x000f;
int base2 = vcpu->arch.sie_block->ipb >> 28;
int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
u64 useraddr;
u32 val = 0;
int reg, rc;
vcpu->stat.instruction_lctl++;
useraddr = disp2;
if (base2)
useraddr += vcpu->arch.guest_gprs[base2];
VCPU_EVENT(vcpu, 5, "lctl r1:%x, r3:%x,b2:%x,d2:%x", reg1, reg3, base2,
disp2);
reg = reg1;
do {
rc = get_guest_u32(vcpu, useraddr, &val);
if (rc == -EFAULT) {
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
break;
}
vcpu->arch.sie_block->gcr[reg] &= 0xffffffff00000000ul;
vcpu->arch.sie_block->gcr[reg] |= val;
useraddr += 4;
if (reg == reg3)
break;
reg = (reg + 1) % 16;
} while (1);
return 0;
}
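
handle_lctg() and handle_lctl() above decode their operands from the two instruction-text fields of the SIE block (ipa and ipb) and then walk the control registers from r1 to r3, wrapping from 15 back to 0. The sketch below repeats the field extraction used by handle_lctg() and counts the registers in such a wrapping range. The struct, the function names and the sample instruction words in the usage note are illustrative assumptions. As in the handler, the 20-bit displacement is treated as unsigned.

```c
#include <stdint.h>

struct sketch_rsy {
	unsigned int r1;	/* first register of the range */
	unsigned int r3;	/* last register of the range */
	unsigned int b2;	/* base register, 0 means no base */
	uint32_t d2;		/* displacement, low 12 bits plus high 8 bits */
};

/* Extract the operand fields the same way handle_lctg() does. */
static struct sketch_rsy sketch_decode_rsy(uint16_t ipa, uint32_t ipb)
{
	struct sketch_rsy f;

	f.r1 = (ipa & 0x00f0) >> 4;
	f.r3 = ipa & 0x000f;
	f.b2 = ipb >> 28;
	f.d2 = ((ipb & 0x0fff0000u) >> 16) + ((ipb & 0xff00u) << 4);
	return f;
}

/* Number of registers touched when walking r1..r3 with wrap-around. */
static unsigned int sketch_reg_count(unsigned int r1, unsigned int r3)
{
	unsigned int n = 1;
	unsigned int reg = r1;

	while (reg != r3) {
		reg = (reg + 1) % 16;
		n++;
	}
	return n;
}
```

For example, ipa 0xeb0f with ipb 0x2123002f decodes to r1 = 0, r3 = 15, b2 = 2 and d2 = 0x123, and the range 14..1 covers four registers (14, 15, 0, 1).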
static intercept_handler_t instruction_handlers[256] = {
[0x83] = kvm_s390_handle_diag,
[0xae] = kvm_s390_handle_sigp,
[0xb2] = kvm_s390_handle_priv,
[0xb7] = handle_lctl,
[0xeb] = handle_lctg,
};
static int handle_noop(struct kvm_vcpu *vcpu)
{
switch (vcpu->arch.sie_block->icptcode) {
case 0x10:
vcpu->stat.exit_external_request++;
break;
case 0x14:
vcpu->stat.exit_external_interrupt++;
break;
default:
break; /* nothing */
}
return 0;
}
static int handle_stop(struct kvm_vcpu *vcpu)
{
int rc = 0;
vcpu->stat.exit_stop_request++;
atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
spin_lock_bh(&vcpu->arch.local_int.lock);
if (vcpu->arch.local_int.action_bits & ACTION_STORE_ON_STOP) {
vcpu->arch.local_int.action_bits &= ~ACTION_STORE_ON_STOP;
rc = __kvm_s390_vcpu_store_status(vcpu,
KVM_S390_STORE_STATUS_NOADDR);
if (rc >= 0)
rc = -ENOTSUPP;
}
/* Keep the store-status result: only override rc when stopping. */
if (vcpu->arch.local_int.action_bits & ACTION_STOP_ON_STOP) {
vcpu->arch.local_int.action_bits &= ~ACTION_STOP_ON_STOP;
VCPU_EVENT(vcpu, 3, "%s", "cpu stopped");
rc = -ENOTSUPP;
}
spin_unlock_bh(&vcpu->arch.local_int.lock);
return rc;
}
static int handle_validity(struct kvm_vcpu *vcpu)
{
int viwhy = vcpu->arch.sie_block->ipb >> 16;
vcpu->stat.exit_validity++;
if (viwhy == 0x37) {
fault_in_pages_writeable((char __user *)
vcpu->kvm->arch.guest_origin +
vcpu->arch.sie_block->prefix,
PAGE_SIZE);
return 0;
}
VCPU_EVENT(vcpu, 2, "unhandled validity intercept code %d",
viwhy);
return -ENOTSUPP;
}
static int handle_instruction(struct kvm_vcpu *vcpu)
{
intercept_handler_t handler;
vcpu->stat.exit_instruction++;
handler = instruction_handlers[vcpu->arch.sie_block->ipa >> 8];
if (handler)
return handler(vcpu);
return -ENOTSUPP;
}
static int handle_prog(struct kvm_vcpu *vcpu)
{
vcpu->stat.exit_program_interruption++;
return kvm_s390_inject_program_int(vcpu, vcpu->arch.sie_block->iprcc);
}
static int handle_instruction_and_prog(struct kvm_vcpu *vcpu)
{
int rc, rc2;
vcpu->stat.exit_instr_and_program++;
rc = handle_instruction(vcpu);
rc2 = handle_prog(vcpu);
if (rc == -ENOTSUPP)
vcpu->arch.sie_block->icptcode = 0x04;
if (rc)
return rc;
return rc2;
}
static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
[0x00 >> 2] = handle_noop,
[0x04 >> 2] = handle_instruction,
[0x08 >> 2] = handle_prog,
[0x0C >> 2] = handle_instruction_and_prog,
[0x10 >> 2] = handle_noop,
[0x14 >> 2] = handle_noop,
[0x1C >> 2] = kvm_s390_handle_wait,
[0x20 >> 2] = handle_validity,
[0x28 >> 2] = handle_stop,
};
int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
{
intercept_handler_t func;
u8 code = vcpu->arch.sie_block->icptcode;
if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
return -ENOTSUPP;
func = intercept_funcs[code >> 2];
if (func)
return func(vcpu);
return -ENOTSUPP;
}
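
kvm_handle_sie_intercept() above is a table dispatch: intercept codes are multiples of 4, so code >> 2 indexes an array of handlers, and empty slots mean "not handled in the kernel". The table is declared with 0x48 >> 2 = 18 entries, so valid indices run from 0 to 17. The sketch below shows the pattern with the bound taken from the array itself. The handlers, the return values and SKETCH_UNSUPPORTED are placeholders invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UNSUPPORTED (-1)	/* placeholder for -ENOTSUPP */

typedef int (*sketch_handler_t)(void);

static int sketch_noop(void)        { return 0; }
static int sketch_instruction(void) { return 4; }

/* Sparse table: unlisted slots are NULL. */
static const sketch_handler_t sketch_funcs[0x48 >> 2] = {
	[0x00 >> 2] = sketch_noop,
	[0x04 >> 2] = sketch_instruction,
};

static int sketch_dispatch(uint8_t code)
{
	size_t nr = sizeof(sketch_funcs) / sizeof(sketch_funcs[0]);

	/* Reject codes that are not multiples of 4 or fall past the table. */
	if ((code & 3) || (size_t)(code >> 2) >= nr)
		return SKETCH_UNSUPPORTED;
	if (sketch_funcs[code >> 2])
		return sketch_funcs[code >> 2]();
	return SKETCH_UNSUPPORTED;
}
```

Deriving the limit from the array size keeps the check correct if entries are added to or removed from the table later.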

arch/s390/kvm/interrupt.c Normal file

@ -0,0 +1,592 @@
/*
* interrupt.c - handling kvm guest interrupts
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
*/
#include <asm/lowcore.h>
#include <asm/uaccess.h>
#include <linux/kvm_host.h>
#include "kvm-s390.h"
#include "gaccess.h"
static int psw_extint_disabled(struct kvm_vcpu *vcpu)
{
return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_EXT);
}
static int psw_interrupts_disabled(struct kvm_vcpu *vcpu)
{
if ((vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PER) ||
(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_IO) ||
(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_EXT))
return 0;
return 1;
}
static int __interrupt_is_deliverable(struct kvm_vcpu *vcpu,
struct interrupt_info *inti)
{
switch (inti->type) {
case KVM_S390_INT_EMERGENCY:
if (psw_extint_disabled(vcpu))
return 0;
if (vcpu->arch.sie_block->gcr[0] & 0x4000ul)
return 1;
return 0;
case KVM_S390_INT_SERVICE:
if (psw_extint_disabled(vcpu))
return 0;
if (vcpu->arch.sie_block->gcr[0] & 0x200ul)
return 1;
return 0;
case KVM_S390_INT_VIRTIO:
if (psw_extint_disabled(vcpu))
return 0;
if (vcpu->arch.sie_block->gcr[0] & 0x200ul)
return 1;
return 0;
case KVM_S390_PROGRAM_INT:
case KVM_S390_SIGP_STOP:
case KVM_S390_SIGP_SET_PREFIX:
case KVM_S390_RESTART:
return 1;
default:
BUG();
}
return 0;
}
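
For the external interrupt types, __interrupt_is_deliverable() above applies a two-level test: the external mask bit in the guest PSW must be set, and the matching subclass bit in control register 0 must be set as well. A compact sketch of that test is shown below. The CR0 bit values are the ones used in the function above, the PSW mask value is an assumption made for the example, and all SKETCH_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed value for the external-interrupt mask bit of the PSW. */
#define SKETCH_PSW_MASK_EXT	0x0100000000000000ULL
/* CR0 subclass mask bits as tested in __interrupt_is_deliverable(). */
#define SKETCH_CR0_EMERGENCY	0x4000ULL
#define SKETCH_CR0_SERVICE	0x200ULL

/* Returns 1 if an external interrupt of the given subclass may be delivered. */
static int sketch_ext_deliverable(uint64_t psw_mask, uint64_t cr0,
				  uint64_t submask)
{
	if (!(psw_mask & SKETCH_PSW_MASK_EXT))
		return 0;	/* external interrupts disabled in the PSW */
	return (cr0 & submask) != 0;
}
```

In the function above, virtio interrupts are tested against the same CR0 bit as service signals, so the sketch covers both with SKETCH_CR0_SERVICE.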
static void __set_cpu_idle(struct kvm_vcpu *vcpu)
{
BUG_ON(vcpu->vcpu_id > KVM_MAX_VCPUS - 1);
atomic_set_mask(CPUSTAT_WAIT, &vcpu->arch.sie_block->cpuflags);
set_bit(vcpu->vcpu_id, vcpu->arch.local_int.float_int->idle_mask);
}
static void __unset_cpu_idle(struct kvm_vcpu *vcpu)
{
BUG_ON(vcpu->vcpu_id > KVM_MAX_VCPUS - 1);
atomic_clear_mask(CPUSTAT_WAIT, &vcpu->arch.sie_block->cpuflags);
clear_bit(vcpu->vcpu_id, vcpu->arch.local_int.float_int->idle_mask);
}
static void __reset_intercept_indicators(struct kvm_vcpu *vcpu)
{
atomic_clear_mask(CPUSTAT_ECALL_PEND |
CPUSTAT_IO_INT | CPUSTAT_EXT_INT | CPUSTAT_STOP_INT,
&vcpu->arch.sie_block->cpuflags);
vcpu->arch.sie_block->lctl = 0x0000;
}
static void __set_cpuflag(struct kvm_vcpu *vcpu, u32 flag)
{
atomic_set_mask(flag, &vcpu->arch.sie_block->cpuflags);
}
static void __set_intercept_indicator(struct kvm_vcpu *vcpu,
struct interrupt_info *inti)
{
switch (inti->type) {
case KVM_S390_INT_EMERGENCY:
case KVM_S390_INT_SERVICE:
case KVM_S390_INT_VIRTIO:
if (psw_extint_disabled(vcpu))
__set_cpuflag(vcpu, CPUSTAT_EXT_INT);
else
vcpu->arch.sie_block->lctl |= LCTL_CR0;
break;
case KVM_S390_SIGP_STOP:
__set_cpuflag(vcpu, CPUSTAT_STOP_INT);
break;
default:
BUG();
}
}
static void __do_deliver_interrupt(struct kvm_vcpu *vcpu,
struct interrupt_info *inti)
{
const unsigned short table[] = { 2, 4, 4, 6 };
int rc, exception = 0;
switch (inti->type) {
case KVM_S390_INT_EMERGENCY:
VCPU_EVENT(vcpu, 4, "%s", "interrupt: sigp emerg");
vcpu->stat.deliver_emergency_signal++;
rc = put_guest_u16(vcpu, __LC_EXT_INT_CODE, 0x1201);
if (rc == -EFAULT)
exception = 1;
rc = copy_to_guest(vcpu, __LC_EXT_OLD_PSW,
&vcpu->arch.sie_block->gpsw, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
rc = copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
__LC_EXT_NEW_PSW, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
break;
case KVM_S390_INT_SERVICE:
VCPU_EVENT(vcpu, 4, "interrupt: sclp parm:%x",
inti->ext.ext_params);
vcpu->stat.deliver_service_signal++;
rc = put_guest_u16(vcpu, __LC_EXT_INT_CODE, 0x2401);
if (rc == -EFAULT)
exception = 1;
rc = copy_to_guest(vcpu, __LC_EXT_OLD_PSW,
&vcpu->arch.sie_block->gpsw, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
rc = copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
__LC_EXT_NEW_PSW, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
rc = put_guest_u32(vcpu, __LC_EXT_PARAMS, inti->ext.ext_params);
if (rc == -EFAULT)
exception = 1;
break;
case KVM_S390_INT_VIRTIO:
VCPU_EVENT(vcpu, 4, "interrupt: virtio parm:%x,parm64:%lx",
inti->ext.ext_params, inti->ext.ext_params2);
vcpu->stat.deliver_virtio_interrupt++;
rc = put_guest_u16(vcpu, __LC_EXT_INT_CODE, 0x2603);
if (rc == -EFAULT)
exception = 1;
rc = put_guest_u16(vcpu, __LC_CPU_ADDRESS, 0x0d00);
if (rc == -EFAULT)
exception = 1;
rc = copy_to_guest(vcpu, __LC_EXT_OLD_PSW,
&vcpu->arch.sie_block->gpsw, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
rc = copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
__LC_EXT_NEW_PSW, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
rc = put_guest_u32(vcpu, __LC_EXT_PARAMS, inti->ext.ext_params);
if (rc == -EFAULT)
exception = 1;
rc = put_guest_u64(vcpu, __LC_PFAULT_INTPARM,
inti->ext.ext_params2);
if (rc == -EFAULT)
exception = 1;
break;
case KVM_S390_SIGP_STOP:
VCPU_EVENT(vcpu, 4, "%s", "interrupt: cpu stop");
vcpu->stat.deliver_stop_signal++;
__set_intercept_indicator(vcpu, inti);
break;
case KVM_S390_SIGP_SET_PREFIX:
VCPU_EVENT(vcpu, 4, "interrupt: set prefix to %x",
inti->prefix.address);
vcpu->stat.deliver_prefix_signal++;
vcpu->arch.sie_block->prefix = inti->prefix.address;
vcpu->arch.sie_block->ihcpu = 0xffff;
break;
case KVM_S390_RESTART:
VCPU_EVENT(vcpu, 4, "%s", "interrupt: cpu restart");
vcpu->stat.deliver_restart_signal++;
rc = copy_to_guest(vcpu, offsetof(struct _lowcore,
restart_old_psw), &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
rc = copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
offsetof(struct _lowcore, restart_psw), sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
break;
case KVM_S390_PROGRAM_INT:
VCPU_EVENT(vcpu, 4, "interrupt: pgm check code:%x, ilc:%x",
inti->pgm.code,
table[vcpu->arch.sie_block->ipa >> 14]);
vcpu->stat.deliver_program_int++;
rc = put_guest_u16(vcpu, __LC_PGM_INT_CODE, inti->pgm.code);
if (rc == -EFAULT)
exception = 1;
rc = put_guest_u16(vcpu, __LC_PGM_ILC,
table[vcpu->arch.sie_block->ipa >> 14]);
if (rc == -EFAULT)
exception = 1;
rc = copy_to_guest(vcpu, __LC_PGM_OLD_PSW,
&vcpu->arch.sie_block->gpsw, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
rc = copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
__LC_PGM_NEW_PSW, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
break;
default:
BUG();
}
if (exception) {
VCPU_EVENT(vcpu, 1, "%s", "program exception while delivering"
" interrupt");
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
if (inti->type == KVM_S390_PROGRAM_INT) {
printk(KERN_WARNING "kvm: recursive program check\n");
BUG();
}
}
}
static int __try_deliver_ckc_interrupt(struct kvm_vcpu *vcpu)
{
int rc, exception = 0;
if (psw_extint_disabled(vcpu))
return 0;
if (!(vcpu->arch.sie_block->gcr[0] & 0x800ul))
return 0;
rc = put_guest_u16(vcpu, __LC_EXT_INT_CODE, 0x1004);
if (rc == -EFAULT)
exception = 1;
rc = copy_to_guest(vcpu, __LC_EXT_OLD_PSW,
&vcpu->arch.sie_block->gpsw, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
rc = copy_from_guest(vcpu, &vcpu->arch.sie_block->gpsw,
__LC_EXT_NEW_PSW, sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
if (exception) {
VCPU_EVENT(vcpu, 1, "%s", "program exception while delivering"
" ckc interrupt");
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
return 0;
}
return 1;
}
int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
{
struct local_interrupt *li = &vcpu->arch.local_int;
struct float_interrupt *fi = vcpu->arch.local_int.float_int;
struct interrupt_info *inti;
int rc = 0;
if (atomic_read(&li->active)) {
spin_lock_bh(&li->lock);
list_for_each_entry(inti, &li->list, list)
if (__interrupt_is_deliverable(vcpu, inti)) {
rc = 1;
break;
}
spin_unlock_bh(&li->lock);
}
if ((!rc) && atomic_read(&fi->active)) {
spin_lock_bh(&fi->lock);
list_for_each_entry(inti, &fi->list, list)
if (__interrupt_is_deliverable(vcpu, inti)) {
rc = 1;
break;
}
spin_unlock_bh(&fi->lock);
}
if ((!rc) && (vcpu->arch.sie_block->ckc <
get_clock() + vcpu->arch.sie_block->epoch)) {
if ((!psw_extint_disabled(vcpu)) &&
(vcpu->arch.sie_block->gcr[0] & 0x800ul))
rc = 1;
}
return rc;
}
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
{
return 0;
}
int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
{
u64 now, sltime;
DECLARE_WAITQUEUE(wait, current);
vcpu->stat.exit_wait_state++;
if (kvm_cpu_has_interrupt(vcpu))
return 0;
if (psw_interrupts_disabled(vcpu)) {
VCPU_EVENT(vcpu, 3, "%s", "disabled wait");
__unset_cpu_idle(vcpu);
return -ENOTSUPP; /* disabled wait */
}
if (psw_extint_disabled(vcpu) ||
(!(vcpu->arch.sie_block->gcr[0] & 0x800ul))) {
VCPU_EVENT(vcpu, 3, "%s", "enabled wait w/o timer");
goto no_timer;
}
now = get_clock() + vcpu->arch.sie_block->epoch;
if (vcpu->arch.sie_block->ckc < now) {
__unset_cpu_idle(vcpu);
return 0;
}
sltime = (vcpu->arch.sie_block->ckc - now) / (0xf4240000ul / HZ) + 1;
vcpu->arch.ckc_timer.expires = jiffies + sltime;
add_timer(&vcpu->arch.ckc_timer);
VCPU_EVENT(vcpu, 5, "enabled wait timer:%lx jiffies", sltime);
no_timer:
spin_lock_bh(&vcpu->arch.local_int.float_int->lock);
spin_lock_bh(&vcpu->arch.local_int.lock);
__set_cpu_idle(vcpu);
vcpu->arch.local_int.timer_due = 0;
add_wait_queue(&vcpu->arch.local_int.wq, &wait);
while (list_empty(&vcpu->arch.local_int.list) &&
list_empty(&vcpu->arch.local_int.float_int->list) &&
(!vcpu->arch.local_int.timer_due) &&
!signal_pending(current)) {
set_current_state(TASK_INTERRUPTIBLE);
spin_unlock_bh(&vcpu->arch.local_int.lock);
spin_unlock_bh(&vcpu->arch.local_int.float_int->lock);
vcpu_put(vcpu);
schedule();
vcpu_load(vcpu);
spin_lock_bh(&vcpu->arch.local_int.float_int->lock);
spin_lock_bh(&vcpu->arch.local_int.lock);
}
__unset_cpu_idle(vcpu);
__set_current_state(TASK_RUNNING);
remove_wait_queue(&vcpu->arch.local_int.wq, &wait);
spin_unlock_bh(&vcpu->arch.local_int.lock);
spin_unlock_bh(&vcpu->arch.local_int.float_int->lock);
del_timer(&vcpu->arch.ckc_timer);
return 0;
}
void kvm_s390_idle_wakeup(unsigned long data)
{
struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
spin_lock_bh(&vcpu->arch.local_int.lock);
vcpu->arch.local_int.timer_due = 1;
if (waitqueue_active(&vcpu->arch.local_int.wq))
wake_up_interruptible(&vcpu->arch.local_int.wq);
spin_unlock_bh(&vcpu->arch.local_int.lock);
}
void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu)
{
struct local_interrupt *li = &vcpu->arch.local_int;
struct float_interrupt *fi = vcpu->arch.local_int.float_int;
struct interrupt_info *n, *inti = NULL;
int deliver;
__reset_intercept_indicators(vcpu);
if (atomic_read(&li->active)) {
do {
deliver = 0;
spin_lock_bh(&li->lock);
list_for_each_entry_safe(inti, n, &li->list, list) {
if (__interrupt_is_deliverable(vcpu, inti)) {
list_del(&inti->list);
deliver = 1;
break;
}
__set_intercept_indicator(vcpu, inti);
}
if (list_empty(&li->list))
atomic_set(&li->active, 0);
spin_unlock_bh(&li->lock);
if (deliver) {
__do_deliver_interrupt(vcpu, inti);
kfree(inti);
}
} while (deliver);
}
if ((vcpu->arch.sie_block->ckc <
get_clock() + vcpu->arch.sie_block->epoch))
__try_deliver_ckc_interrupt(vcpu);
if (atomic_read(&fi->active)) {
do {
deliver = 0;
spin_lock_bh(&fi->lock);
list_for_each_entry_safe(inti, n, &fi->list, list) {
if (__interrupt_is_deliverable(vcpu, inti)) {
list_del(&inti->list);
deliver = 1;
break;
}
__set_intercept_indicator(vcpu, inti);
}
if (list_empty(&fi->list))
atomic_set(&fi->active, 0);
spin_unlock_bh(&fi->lock);
if (deliver) {
__do_deliver_interrupt(vcpu, inti);
kfree(inti);
}
} while (deliver);
}
}
int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code)
{
struct local_interrupt *li = &vcpu->arch.local_int;
struct interrupt_info *inti;
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
if (!inti)
return -ENOMEM;
inti->type = KVM_S390_PROGRAM_INT;
inti->pgm.code = code;
VCPU_EVENT(vcpu, 3, "inject: program check %d (from kernel)", code);
spin_lock_bh(&li->lock);
list_add(&inti->list, &li->list);
atomic_set(&li->active, 1);
BUG_ON(waitqueue_active(&li->wq));
spin_unlock_bh(&li->lock);
return 0;
}
int kvm_s390_inject_vm(struct kvm *kvm,
struct kvm_s390_interrupt *s390int)
{
struct local_interrupt *li;
struct float_interrupt *fi;
struct interrupt_info *inti;
int sigcpu;
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
if (!inti)
return -ENOMEM;
switch (s390int->type) {
case KVM_S390_INT_VIRTIO:
VM_EVENT(kvm, 5, "inject: virtio parm:%x,parm64:%lx",
s390int->parm, s390int->parm64);
inti->type = s390int->type;
inti->ext.ext_params = s390int->parm;
inti->ext.ext_params2 = s390int->parm64;
break;
case KVM_S390_INT_SERVICE:
VM_EVENT(kvm, 5, "inject: sclp parm:%x", s390int->parm);
inti->type = s390int->type;
inti->ext.ext_params = s390int->parm;
break;
case KVM_S390_PROGRAM_INT:
case KVM_S390_SIGP_STOP:
case KVM_S390_INT_EMERGENCY:
default:
kfree(inti);
return -EINVAL;
}
mutex_lock(&kvm->lock);
fi = &kvm->arch.float_int;
spin_lock_bh(&fi->lock);
list_add_tail(&inti->list, &fi->list);
atomic_set(&fi->active, 1);
sigcpu = find_first_bit(fi->idle_mask, KVM_MAX_VCPUS);
if (sigcpu == KVM_MAX_VCPUS) {
do {
sigcpu = fi->next_rr_cpu++;
if (sigcpu == KVM_MAX_VCPUS)
sigcpu = fi->next_rr_cpu = 0;
} while (fi->local_int[sigcpu] == NULL);
}
li = fi->local_int[sigcpu];
spin_lock_bh(&li->lock);
atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
if (waitqueue_active(&li->wq))
wake_up_interruptible(&li->wq);
spin_unlock_bh(&li->lock);
spin_unlock_bh(&fi->lock);
mutex_unlock(&kvm->lock);
return 0;
}
int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
struct kvm_s390_interrupt *s390int)
{
struct local_interrupt *li;
struct interrupt_info *inti;
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
if (!inti)
return -ENOMEM;
switch (s390int->type) {
case KVM_S390_PROGRAM_INT:
if (s390int->parm & 0xffff0000) {
kfree(inti);
return -EINVAL;
}
inti->type = s390int->type;
inti->pgm.code = s390int->parm;
VCPU_EVENT(vcpu, 3, "inject: program check %d (from user)",
s390int->parm);
break;
case KVM_S390_SIGP_STOP:
case KVM_S390_RESTART:
case KVM_S390_SIGP_SET_PREFIX:
case KVM_S390_INT_EMERGENCY:
VCPU_EVENT(vcpu, 3, "inject: type %x", s390int->type);
inti->type = s390int->type;
break;
case KVM_S390_INT_VIRTIO:
case KVM_S390_INT_SERVICE:
default:
kfree(inti);
return -EINVAL;
}
mutex_lock(&vcpu->kvm->lock);
li = &vcpu->arch.local_int;
spin_lock_bh(&li->lock);
if (inti->type == KVM_S390_PROGRAM_INT)
list_add(&inti->list, &li->list);
else
list_add_tail(&inti->list, &li->list);
atomic_set(&li->active, 1);
if (inti->type == KVM_S390_SIGP_STOP)
li->action_bits |= ACTION_STOP_ON_STOP;
atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
if (waitqueue_active(&li->wq))
wake_up_interruptible(&vcpu->arch.local_int.wq);
spin_unlock_bh(&li->lock);
mutex_unlock(&vcpu->kvm->lock);
return 0;
}
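
As a side note on kvm_s390_handle_wait() above: the sleep time is derived from the distance between the guest clock comparator and the current guest TOD value. The following is a minimal userspace sketch of that arithmetic and is not part of the patch. It assumes that one TOD unit is 1/4096 of a microsecond, so that 0xf4240000 units make up one second; the helper name ckc_sleep_jiffies and the explicit hz parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: TOD clock bit 51 ticks once per microsecond, so one second
 * is 1000000 * 4096 = 0xf4240000 TOD units.
 */
#define TOD_UNITS_PER_SEC 0xf4240000ull

/*
 * Hypothetical helper mirroring the sltime computation in
 * kvm_s390_handle_wait(). Returns 0 if the comparator has already
 * expired, otherwise the number of jiffies to sleep, rounded up by one.
 */
static uint64_t ckc_sleep_jiffies(uint64_t ckc, uint64_t now, unsigned int hz)
{
	assert(hz > 0 && hz <= TOD_UNITS_PER_SEC);

	if (ckc < now)
		return 0;	/* timer is already due, no sleep needed */

	/* TOD units per jiffy is TOD_UNITS_PER_SEC / hz */
	return (ckc - now) / (TOD_UNITS_PER_SEC / hz) + 1;
}
```

With hz set to 100, a comparator that lies one second ahead yields 101 jiffies; the trailing + 1 rounds up so that the timer does not fire before the comparator value is reached.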

685
arch/s390/kvm/kvm-s390.c Normal file

@ -0,0 +1,685 @@
/*
* kvm-s390.c -- hosting zSeries kernel virtual machines
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
* Christian Borntraeger <borntraeger@de.ibm.com>
* Heiko Carstens <heiko.carstens@de.ibm.com>
*/
#include <linux/compiler.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/timer.h>
#include <asm/lowcore.h>
#include <asm/pgtable.h>
#include "kvm-s390.h"
#include "gaccess.h"
#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "userspace_handled", VCPU_STAT(exit_userspace) },
{ "exit_validity", VCPU_STAT(exit_validity) },
{ "exit_stop_request", VCPU_STAT(exit_stop_request) },
{ "exit_external_request", VCPU_STAT(exit_external_request) },
{ "exit_external_interrupt", VCPU_STAT(exit_external_interrupt) },
{ "exit_instruction", VCPU_STAT(exit_instruction) },
{ "exit_program_interruption", VCPU_STAT(exit_program_interruption) },
{ "exit_instr_and_program_int", VCPU_STAT(exit_instr_and_program) },
{ "instruction_lctg", VCPU_STAT(instruction_lctg) },
{ "instruction_lctl", VCPU_STAT(instruction_lctl) },
{ "deliver_emergency_signal", VCPU_STAT(deliver_emergency_signal) },
{ "deliver_service_signal", VCPU_STAT(deliver_service_signal) },
{ "deliver_virtio_interrupt", VCPU_STAT(deliver_virtio_interrupt) },
{ "deliver_stop_signal", VCPU_STAT(deliver_stop_signal) },
{ "deliver_prefix_signal", VCPU_STAT(deliver_prefix_signal) },
{ "deliver_restart_signal", VCPU_STAT(deliver_restart_signal) },
{ "deliver_program_interruption", VCPU_STAT(deliver_program_int) },
{ "exit_wait_state", VCPU_STAT(exit_wait_state) },
{ "instruction_stidp", VCPU_STAT(instruction_stidp) },
{ "instruction_spx", VCPU_STAT(instruction_spx) },
{ "instruction_stpx", VCPU_STAT(instruction_stpx) },
{ "instruction_stap", VCPU_STAT(instruction_stap) },
{ "instruction_storage_key", VCPU_STAT(instruction_storage_key) },
{ "instruction_stsch", VCPU_STAT(instruction_stsch) },
{ "instruction_chsc", VCPU_STAT(instruction_chsc) },
{ "instruction_stsi", VCPU_STAT(instruction_stsi) },
{ "instruction_stfl", VCPU_STAT(instruction_stfl) },
{ "instruction_sigp_sense", VCPU_STAT(instruction_sigp_sense) },
{ "instruction_sigp_emergency", VCPU_STAT(instruction_sigp_emergency) },
{ "instruction_sigp_stop", VCPU_STAT(instruction_sigp_stop) },
{ "instruction_sigp_set_arch", VCPU_STAT(instruction_sigp_arch) },
{ "instruction_sigp_set_prefix", VCPU_STAT(instruction_sigp_prefix) },
{ "instruction_sigp_restart", VCPU_STAT(instruction_sigp_restart) },
{ "diagnose_44", VCPU_STAT(diagnose_44) },
{ NULL }
};
/* Section: not file related */
void kvm_arch_hardware_enable(void *garbage)
{
/* every s390 is virtualization enabled ;-) */
}
void kvm_arch_hardware_disable(void *garbage)
{
}
void decache_vcpus_on_cpu(int cpu)
{
}
int kvm_arch_hardware_setup(void)
{
return 0;
}
void kvm_arch_hardware_unsetup(void)
{
}
void kvm_arch_check_processor_compat(void *rtn)
{
}
int kvm_arch_init(void *opaque)
{
return 0;
}
void kvm_arch_exit(void)
{
}
/* Section: device related */
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
if (ioctl == KVM_S390_ENABLE_SIE)
return s390_enable_sie();
return -EINVAL;
}
int kvm_dev_ioctl_check_extension(long ext)
{
return 0;
}
/* Section: vm related */
/*
* Get (and clear) the dirty memory log for a memory slot.
*/
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log)
{
return 0;
}
long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
struct kvm *kvm = filp->private_data;
void __user *argp = (void __user *)arg;
int r;
switch (ioctl) {
case KVM_S390_INTERRUPT: {
struct kvm_s390_interrupt s390int;
r = -EFAULT;
if (copy_from_user(&s390int, argp, sizeof(s390int)))
break;
r = kvm_s390_inject_vm(kvm, &s390int);
break;
}
default:
r = -EINVAL;
}
return r;
}
struct kvm *kvm_arch_create_vm(void)
{
struct kvm *kvm;
int rc;
char debug_name[16];
rc = s390_enable_sie();
if (rc)
goto out_nokvm;
rc = -ENOMEM;
kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
if (!kvm)
goto out_nokvm;
kvm->arch.sca = (struct sca_block *) get_zeroed_page(GFP_KERNEL);
if (!kvm->arch.sca)
goto out_nosca;
sprintf(debug_name, "kvm-%u", current->pid);
kvm->arch.dbf = debug_register(debug_name, 8, 2, 8 * sizeof(long));
if (!kvm->arch.dbf)
goto out_nodbf;
spin_lock_init(&kvm->arch.float_int.lock);
INIT_LIST_HEAD(&kvm->arch.float_int.list);
debug_register_view(kvm->arch.dbf, &debug_sprintf_view);
VM_EVENT(kvm, 3, "%s", "vm created");
try_module_get(THIS_MODULE);
return kvm;
out_nodbf:
free_page((unsigned long)(kvm->arch.sca));
out_nosca:
kfree(kvm);
out_nokvm:
return ERR_PTR(rc);
}
void kvm_arch_destroy_vm(struct kvm *kvm)
{
debug_unregister(kvm->arch.dbf);
free_page((unsigned long)(kvm->arch.sca));
kfree(kvm);
module_put(THIS_MODULE);
}
/* Section: vcpu related */
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
return 0;
}
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
{
/* kvm common code refers to this, but doesn't call it */
BUG();
}
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
save_fp_regs(&vcpu->arch.host_fpregs);
save_access_regs(vcpu->arch.host_acrs);
vcpu->arch.guest_fpregs.fpc &= FPC_VALID_MASK;
restore_fp_regs(&vcpu->arch.guest_fpregs);
restore_access_regs(vcpu->arch.guest_acrs);
if (signal_pending(current))
atomic_set_mask(CPUSTAT_STOP_INT,
&vcpu->arch.sie_block->cpuflags);
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
save_fp_regs(&vcpu->arch.guest_fpregs);
save_access_regs(vcpu->arch.guest_acrs);
restore_fp_regs(&vcpu->arch.host_fpregs);
restore_access_regs(vcpu->arch.host_acrs);
}
static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
{
/* this equals the initial cpu reset in the PoP (Principles of Operation),
 * but we don't switch to ESA */
vcpu->arch.sie_block->gpsw.mask = 0UL;
vcpu->arch.sie_block->gpsw.addr = 0UL;
vcpu->arch.sie_block->prefix = 0UL;
vcpu->arch.sie_block->ihcpu = 0xffff;
vcpu->arch.sie_block->cputm = 0UL;
vcpu->arch.sie_block->ckc = 0UL;
vcpu->arch.sie_block->todpr = 0;
memset(vcpu->arch.sie_block->gcr, 0, 16 * sizeof(__u64));
vcpu->arch.sie_block->gcr[0] = 0xE0UL;
vcpu->arch.sie_block->gcr[14] = 0xC2000000UL;
vcpu->arch.guest_fpregs.fpc = 0;
asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc));
vcpu->arch.sie_block->gbea = 1;
}
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
atomic_set(&vcpu->arch.sie_block->cpuflags, CPUSTAT_ZARCH);
vcpu->arch.sie_block->gmslm = 0xffffffffffUL;
vcpu->arch.sie_block->gmsor = 0x000000000000;
vcpu->arch.sie_block->ecb = 2;
vcpu->arch.sie_block->eca = 0xC1002001U;
setup_timer(&vcpu->arch.ckc_timer, kvm_s390_idle_wakeup,
(unsigned long) vcpu);
get_cpu_id(&vcpu->arch.cpu_id);
vcpu->arch.cpu_id.version = 0xfe;
return 0;
}
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
unsigned int id)
{
struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
int rc = -ENOMEM;
if (!vcpu)
goto out_nomem;
vcpu->arch.sie_block = (struct sie_block *) get_zeroed_page(GFP_KERNEL);
if (!vcpu->arch.sie_block)
goto out_free_cpu;
vcpu->arch.sie_block->icpua = id;
BUG_ON(!kvm->arch.sca);
BUG_ON(kvm->arch.sca->cpu[id].sda);
kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32);
vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
spin_lock_init(&vcpu->arch.local_int.lock);
INIT_LIST_HEAD(&vcpu->arch.local_int.list);
vcpu->arch.local_int.float_int = &kvm->arch.float_int;
spin_lock_bh(&kvm->arch.float_int.lock);
kvm->arch.float_int.local_int[id] = &vcpu->arch.local_int;
init_waitqueue_head(&vcpu->arch.local_int.wq);
vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags;
spin_unlock_bh(&kvm->arch.float_int.lock);
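rc = kvm_vcpu_init(vcpu, kvm, id);
if (rc)
goto out_free_sie_block;
VM_EVENT(kvm, 3, "create cpu %d at %p, sie block at %p", id, vcpu,
vcpu->arch.sie_block);
try_module_get(THIS_MODULE);
return vcpu;
out_free_sie_block:
/* do not leak the sie block page if kvm_vcpu_init() failed */
free_page((unsigned long)(vcpu->arch.sie_block));
out_free_cpu:
kfree(vcpu);
out_nomem:
return ERR_PTR(rc);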
}
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
VCPU_EVENT(vcpu, 3, "%s", "destroy cpu");
free_page((unsigned long)(vcpu->arch.sie_block));
kfree(vcpu);
module_put(THIS_MODULE);
}
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
/* kvm common code refers to this, but never calls it */
BUG();
return 0;
}
static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
{
vcpu_load(vcpu);
kvm_s390_vcpu_initial_reset(vcpu);
vcpu_put(vcpu);
return 0;
}
int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
vcpu_load(vcpu);
memcpy(&vcpu->arch.guest_gprs, &regs->gprs, sizeof(regs->gprs));
vcpu_put(vcpu);
return 0;
}
int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
vcpu_load(vcpu);
memcpy(&regs->gprs, &vcpu->arch.guest_gprs, sizeof(regs->gprs));
vcpu_put(vcpu);
return 0;
}
int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
struct kvm_sregs *sregs)
{
vcpu_load(vcpu);
memcpy(&vcpu->arch.guest_acrs, &sregs->acrs, sizeof(sregs->acrs));
memcpy(&vcpu->arch.sie_block->gcr, &sregs->crs, sizeof(sregs->crs));
vcpu_put(vcpu);
return 0;
}
int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
struct kvm_sregs *sregs)
{
vcpu_load(vcpu);
memcpy(&sregs->acrs, &vcpu->arch.guest_acrs, sizeof(sregs->acrs));
memcpy(&sregs->crs, &vcpu->arch.sie_block->gcr, sizeof(sregs->crs));
vcpu_put(vcpu);
return 0;
}
int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
vcpu_load(vcpu);
memcpy(&vcpu->arch.guest_fpregs.fprs, &fpu->fprs, sizeof(fpu->fprs));
vcpu->arch.guest_fpregs.fpc = fpu->fpc;
vcpu_put(vcpu);
return 0;
}
int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
vcpu_load(vcpu);
memcpy(&fpu->fprs, &vcpu->arch.guest_fpregs.fprs, sizeof(fpu->fprs));
fpu->fpc = vcpu->arch.guest_fpregs.fpc;
vcpu_put(vcpu);
return 0;
}
static int kvm_arch_vcpu_ioctl_set_initial_psw(struct kvm_vcpu *vcpu, psw_t psw)
{
int rc = 0;
vcpu_load(vcpu);
if (atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_RUNNING)
rc = -EBUSY;
else
vcpu->arch.sie_block->gpsw = psw;
vcpu_put(vcpu);
return rc;
}
int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr)
{
return -EINVAL; /* not implemented yet */
}
int kvm_arch_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
struct kvm_debug_guest *dbg)
{
return -EINVAL; /* not implemented yet */
}
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
return -EINVAL; /* not implemented yet */
}
int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
return -EINVAL; /* not implemented yet */
}
static void __vcpu_run(struct kvm_vcpu *vcpu)
{
memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);
if (need_resched())
schedule();
vcpu->arch.sie_block->icptcode = 0;
local_irq_disable();
kvm_guest_enter();
local_irq_enable();
VCPU_EVENT(vcpu, 6, "entering sie flags %x",
atomic_read(&vcpu->arch.sie_block->cpuflags));
sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs);
VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
vcpu->arch.sie_block->icptcode);
local_irq_disable();
kvm_guest_exit();
local_irq_enable();
memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16);
}
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
int rc;
sigset_t sigsaved;
vcpu_load(vcpu);
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
atomic_set_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL);
switch (kvm_run->exit_reason) {
case KVM_EXIT_S390_SIEIC:
vcpu->arch.sie_block->gpsw.mask = kvm_run->s390_sieic.mask;
vcpu->arch.sie_block->gpsw.addr = kvm_run->s390_sieic.addr;
break;
case KVM_EXIT_UNKNOWN:
case KVM_EXIT_S390_RESET:
break;
default:
BUG();
}
might_sleep();
do {
kvm_s390_deliver_pending_interrupts(vcpu);
__vcpu_run(vcpu);
rc = kvm_handle_sie_intercept(vcpu);
} while (!signal_pending(current) && !rc);
if (signal_pending(current) && !rc)
rc = -EINTR;
if (rc == -ENOTSUPP) {
/* intercept cannot be handled in-kernel, prepare kvm-run */
kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode;
kvm_run->s390_sieic.mask = vcpu->arch.sie_block->gpsw.mask;
kvm_run->s390_sieic.addr = vcpu->arch.sie_block->gpsw.addr;
kvm_run->s390_sieic.ipa = vcpu->arch.sie_block->ipa;
kvm_run->s390_sieic.ipb = vcpu->arch.sie_block->ipb;
rc = 0;
}
if (rc == -EREMOTE) {
/* intercept was handled, but userspace support is needed
* kvm_run has been prepared by the handler */
rc = 0;
}
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &sigsaved, NULL);
vcpu_put(vcpu);
vcpu->stat.exit_userspace++;
return rc;
}
static int __guestcopy(struct kvm_vcpu *vcpu, u64 guestdest, const void *from,
unsigned long n, int prefix)
{
if (prefix)
return copy_to_guest(vcpu, guestdest, from, n);
else
return copy_to_guest_absolute(vcpu, guestdest, from, n);
}
/*
* store status at address
* we have two special cases:
* KVM_S390_STORE_STATUS_NOADDR: -> 0x1200 on 64 bit
* KVM_S390_STORE_STATUS_PREFIXED: -> prefix
*/
int __kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr)
{
const unsigned char archmode = 1;
int prefix;
if (addr == KVM_S390_STORE_STATUS_NOADDR) {
if (copy_to_guest_absolute(vcpu, 163ul, &archmode, 1))
return -EFAULT;
addr = SAVE_AREA_BASE;
prefix = 0;
} else if (addr == KVM_S390_STORE_STATUS_PREFIXED) {
if (copy_to_guest(vcpu, 163ul, &archmode, 1))
return -EFAULT;
addr = SAVE_AREA_BASE;
prefix = 1;
} else
prefix = 0;
if (__guestcopy(vcpu, addr + offsetof(struct save_area_s390x, fp_regs),
vcpu->arch.guest_fpregs.fprs, 128, prefix))
return -EFAULT;
if (__guestcopy(vcpu, addr + offsetof(struct save_area_s390x, gp_regs),
vcpu->arch.guest_gprs, 128, prefix))
return -EFAULT;
if (__guestcopy(vcpu, addr + offsetof(struct save_area_s390x, psw),
&vcpu->arch.sie_block->gpsw, 16, prefix))
return -EFAULT;
if (__guestcopy(vcpu, addr + offsetof(struct save_area_s390x, pref_reg),
&vcpu->arch.sie_block->prefix, 4, prefix))
return -EFAULT;
if (__guestcopy(vcpu,
addr + offsetof(struct save_area_s390x, fp_ctrl_reg),
&vcpu->arch.guest_fpregs.fpc, 4, prefix))
return -EFAULT;
if (__guestcopy(vcpu, addr + offsetof(struct save_area_s390x, tod_reg),
&vcpu->arch.sie_block->todpr, 4, prefix))
return -EFAULT;
if (__guestcopy(vcpu, addr + offsetof(struct save_area_s390x, timer),
&vcpu->arch.sie_block->cputm, 8, prefix))
return -EFAULT;
if (__guestcopy(vcpu, addr + offsetof(struct save_area_s390x, clk_cmp),
&vcpu->arch.sie_block->ckc, 8, prefix))
return -EFAULT;
if (__guestcopy(vcpu, addr + offsetof(struct save_area_s390x, acc_regs),
&vcpu->arch.guest_acrs, 64, prefix))
return -EFAULT;
if (__guestcopy(vcpu,
addr + offsetof(struct save_area_s390x, ctrl_regs),
&vcpu->arch.sie_block->gcr, 128, prefix))
return -EFAULT;
return 0;
}
static int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr)
{
int rc;
vcpu_load(vcpu);
rc = __kvm_s390_vcpu_store_status(vcpu, addr);
vcpu_put(vcpu);
return rc;
}
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
struct kvm_vcpu *vcpu = filp->private_data;
void __user *argp = (void __user *)arg;
switch (ioctl) {
case KVM_S390_INTERRUPT: {
struct kvm_s390_interrupt s390int;
if (copy_from_user(&s390int, argp, sizeof(s390int)))
return -EFAULT;
return kvm_s390_inject_vcpu(vcpu, &s390int);
}
case KVM_S390_STORE_STATUS:
return kvm_s390_vcpu_store_status(vcpu, arg);
case KVM_S390_SET_INITIAL_PSW: {
psw_t psw;
if (copy_from_user(&psw, argp, sizeof(psw)))
return -EFAULT;
return kvm_arch_vcpu_ioctl_set_initial_psw(vcpu, psw);
}
case KVM_S390_INITIAL_RESET:
return kvm_arch_vcpu_ioctl_initial_reset(vcpu);
default:
;
}
return -EINVAL;
}
/* Section: memory related */
int kvm_arch_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
struct kvm_memory_slot old,
int user_alloc)
{
/* A few sanity checks. We can have exactly one memory slot, which has
to start at guest physical address zero and which has to start and end
at a page boundary in userland. The memory in userland may be
fragmented across several vmas. It is okay to mmap() and munmap()
memory in this slot at any time after this call. */
if (mem->slot)
return -EINVAL;
if (mem->guest_phys_addr)
return -EINVAL;
if (mem->userspace_addr & (PAGE_SIZE - 1))
return -EINVAL;
if (mem->memory_size & (PAGE_SIZE - 1))
return -EINVAL;
kvm->arch.guest_origin = mem->userspace_addr;
kvm->arch.guest_memsize = mem->memory_size;
/* FIXME: we do want to interrupt running CPUs and update their memory
configuration now to avoid race conditions. But hey, changing the
memory layout while virtual CPUs are running is usually bad
programming practice. */
return 0;
}
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
{
return gfn;
}
static int __init kvm_s390_init(void)
{
return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE);
}
static void __exit kvm_s390_exit(void)
{
kvm_exit();
}
module_init(kvm_s390_init);
module_exit(kvm_s390_exit);
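
The sanity checks in kvm_arch_set_memory_region() above can be exercised in isolation. What follows is a rough userspace sketch, not kernel code: the function name sketch_check_region, the scalar parameters, and the fixed 4 KiB page size are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel code uses PAGE_SIZE instead. */
#define SKETCH_PAGE_SIZE 4096ull

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two for the mask checks");

/*
 * Hypothetical stand-in for the checks in kvm_arch_set_memory_region():
 * only slot 0 is accepted, it must start at guest address zero, and both
 * the userland address and the size must be page aligned.
 */
static int sketch_check_region(uint32_t slot, uint64_t guest_phys_addr,
			       uint64_t userspace_addr, uint64_t memory_size)
{
	if (slot)
		return -EINVAL;
	if (guest_phys_addr)
		return -EINVAL;
	if (userspace_addr & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;
	if (memory_size & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;
	return 0;
}
```

A region in slot 0 at guest address zero with a page-aligned userland address and size passes; any other slot, a non-zero guest address, or a misaligned address or size returns -EINVAL.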

64
arch/s390/kvm/kvm-s390.h Normal file

@ -0,0 +1,64 @@
/*
* kvm-s390.h - definitions for kvm on s390
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
* Christian Borntraeger <borntraeger@de.ibm.com>
*/
#ifndef ARCH_S390_KVM_S390_H
#define ARCH_S390_KVM_S390_H
#include <linux/kvm.h>
#include <linux/kvm_host.h>
typedef int (*intercept_handler_t)(struct kvm_vcpu *vcpu);
int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu);
#define VM_EVENT(d_kvm, d_loglevel, d_string, d_args...)\
do { \
debug_sprintf_event(d_kvm->arch.dbf, d_loglevel, d_string "\n", \
d_args); \
} while (0)
#define VCPU_EVENT(d_vcpu, d_loglevel, d_string, d_args...)\
do { \
debug_sprintf_event(d_vcpu->kvm->arch.dbf, d_loglevel, \
"%02d[%016lx-%016lx]: " d_string "\n", d_vcpu->vcpu_id, \
d_vcpu->arch.sie_block->gpsw.mask, d_vcpu->arch.sie_block->gpsw.addr,\
d_args); \
} while (0)
static inline int __cpu_is_stopped(struct kvm_vcpu *vcpu)
{
return atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_STOP_INT;
}
int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
void kvm_s390_idle_wakeup(unsigned long data);
void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
int kvm_s390_inject_vm(struct kvm *kvm,
struct kvm_s390_interrupt *s390int);
int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
struct kvm_s390_interrupt *s390int);
int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code);
/* implemented in priv.c */
int kvm_s390_handle_priv(struct kvm_vcpu *vcpu);
/* implemented in sigp.c */
int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
/* implemented in kvm-s390.c */
int __kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu,
unsigned long addr);
/* implemented in diag.c */
int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
#endif

323
arch/s390/kvm/priv.c Normal file

@ -0,0 +1,323 @@
/*
* priv.c - handling privileged instructions
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
* Christian Borntraeger <borntraeger@de.ibm.com>
*/
#include <linux/kvm.h>
#include <linux/errno.h>
#include <asm/current.h>
#include <asm/debug.h>
#include <asm/ebcdic.h>
#include <asm/sysinfo.h>
#include "gaccess.h"
#include "kvm-s390.h"
static int handle_set_prefix(struct kvm_vcpu *vcpu)
{
int base2 = vcpu->arch.sie_block->ipb >> 28;
int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
u64 operand2;
u32 address = 0;
u8 tmp;
vcpu->stat.instruction_spx++;
operand2 = disp2;
if (base2)
operand2 += vcpu->arch.guest_gprs[base2];
/* must be word boundary */
if (operand2 & 3) {
kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
goto out;
}
/* get the value */
if (get_guest_u32(vcpu, operand2, &address)) {
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
goto out;
}
address = address & 0x7fffe000u;
/* make sure that the new value is valid memory */
if (copy_from_guest_absolute(vcpu, &tmp, address, 1) ||
(copy_from_guest_absolute(vcpu, &tmp, address + PAGE_SIZE, 1))) {
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
goto out;
}
vcpu->arch.sie_block->prefix = address;
vcpu->arch.sie_block->ihcpu = 0xffff;
VCPU_EVENT(vcpu, 5, "setting prefix to %x", address);
out:
return 0;
}
static int handle_store_prefix(struct kvm_vcpu *vcpu)
{
int base2 = vcpu->arch.sie_block->ipb >> 28;
int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
u64 operand2;
u32 address;
vcpu->stat.instruction_stpx++;
operand2 = disp2;
if (base2)
operand2 += vcpu->arch.guest_gprs[base2];
/* must be word boundary */
if (operand2 & 3) {
kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
goto out;
}
address = vcpu->arch.sie_block->prefix;
address = address & 0x7fffe000u;
/* store the value */
if (put_guest_u32(vcpu, operand2, address)) {
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
goto out;
}
VCPU_EVENT(vcpu, 5, "storing prefix to %x", address);
out:
return 0;
}
static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
{
int base2 = vcpu->arch.sie_block->ipb >> 28;
int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
u64 useraddr;
int rc;
vcpu->stat.instruction_stap++;
useraddr = disp2;
if (base2)
useraddr += vcpu->arch.guest_gprs[base2];
if (useraddr & 1) {
kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
goto out;
}
rc = put_guest_u16(vcpu, useraddr, vcpu->vcpu_id);
if (rc == -EFAULT) {
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
goto out;
}
VCPU_EVENT(vcpu, 5, "storing cpu address to %lx", useraddr);
out:
return 0;
}
static int handle_skey(struct kvm_vcpu *vcpu)
{
vcpu->stat.instruction_storage_key++;
vcpu->arch.sie_block->gpsw.addr -= 4;
VCPU_EVENT(vcpu, 4, "%s", "retrying storage key operation");
return 0;
}
static int handle_stsch(struct kvm_vcpu *vcpu)
{
vcpu->stat.instruction_stsch++;
VCPU_EVENT(vcpu, 4, "%s", "store subchannel - CC3");
/* condition code 3 */
vcpu->arch.sie_block->gpsw.mask &= ~(3ul << 44);
vcpu->arch.sie_block->gpsw.mask |= (3 & 3ul) << 44;
return 0;
}
static int handle_chsc(struct kvm_vcpu *vcpu)
{
vcpu->stat.instruction_chsc++;
VCPU_EVENT(vcpu, 4, "%s", "channel subsystem call - CC3");
/* condition code 3 */
vcpu->arch.sie_block->gpsw.mask &= ~(3ul << 44);
vcpu->arch.sie_block->gpsw.mask |= (3 & 3ul) << 44;
return 0;
}
static unsigned int kvm_stfl(void)
{
asm volatile(
" .insn s,0xb2b10000,0(0)\n" /* stfl */
"0:\n"
EX_TABLE(0b, 0b));
return S390_lowcore.stfl_fac_list;
}
static int handle_stfl(struct kvm_vcpu *vcpu)
{
unsigned int facility_list = kvm_stfl();
int rc;
vcpu->stat.instruction_stfl++;
facility_list &= ~(1UL<<24); /* no stfle */
rc = copy_to_guest(vcpu, offsetof(struct _lowcore, stfl_fac_list),
&facility_list, sizeof(facility_list));
if (rc == -EFAULT)
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
else
VCPU_EVENT(vcpu, 5, "store facility list value %x",
facility_list);
return 0;
}
static int handle_stidp(struct kvm_vcpu *vcpu)
{
int base2 = vcpu->arch.sie_block->ipb >> 28;
int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
u64 operand2;
int rc;
vcpu->stat.instruction_stidp++;
operand2 = disp2;
if (base2)
operand2 += vcpu->arch.guest_gprs[base2];
if (operand2 & 7) {
kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
goto out;
}
rc = put_guest_u64(vcpu, operand2, vcpu->arch.stidp_data);
if (rc == -EFAULT) {
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
goto out;
}
VCPU_EVENT(vcpu, 5, "%s", "store cpu id");
out:
return 0;
}
static void handle_stsi_3_2_2(struct kvm_vcpu *vcpu, struct sysinfo_3_2_2 *mem)
{
struct float_interrupt *fi = &vcpu->kvm->arch.float_int;
int cpus = 0;
int n;
spin_lock_bh(&fi->lock);
for (n = 0; n < KVM_MAX_VCPUS; n++)
if (fi->local_int[n])
cpus++;
spin_unlock_bh(&fi->lock);
/* deal with other level 3 hypervisors */
if (stsi(mem, 3, 2, 2) == -ENOSYS)
mem->count = 0;
if (mem->count < 8)
mem->count++;
for (n = mem->count - 1; n > 0 ; n--)
memcpy(&mem->vm[n], &mem->vm[n - 1], sizeof(mem->vm[0]));
mem->vm[0].cpus_total = cpus;
mem->vm[0].cpus_configured = cpus;
mem->vm[0].cpus_standby = 0;
mem->vm[0].cpus_reserved = 0;
mem->vm[0].caf = 1000;
memcpy(mem->vm[0].name, "KVMguest", 8);
ASCEBC(mem->vm[0].name, 8);
memcpy(mem->vm[0].cpi, "KVM/Linux ", 16);
ASCEBC(mem->vm[0].cpi, 16);
}
static int handle_stsi(struct kvm_vcpu *vcpu)
{
int fc = (vcpu->arch.guest_gprs[0] & 0xf0000000) >> 28;
int sel1 = vcpu->arch.guest_gprs[0] & 0xff;
int sel2 = vcpu->arch.guest_gprs[1] & 0xffff;
int base2 = vcpu->arch.sie_block->ipb >> 28;
int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
u64 operand2;
unsigned long mem;
vcpu->stat.instruction_stsi++;
VCPU_EVENT(vcpu, 4, "stsi: fc: %x sel1: %x sel2: %x", fc, sel1, sel2);
operand2 = disp2;
if (base2)
operand2 += vcpu->arch.guest_gprs[base2];
if (operand2 & 0xfff && fc > 0)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
switch (fc) {
case 0:
vcpu->arch.guest_gprs[0] = 3 << 28;
vcpu->arch.sie_block->gpsw.mask &= ~(3ul << 44);
return 0;
case 1: /* same handling for 1 and 2 */
case 2:
mem = get_zeroed_page(GFP_KERNEL);
if (!mem)
goto out_fail;
if (stsi((void *) mem, fc, sel1, sel2) == -ENOSYS)
goto out_mem;
break;
case 3:
if (sel1 != 2 || sel2 != 2)
goto out_fail;
mem = get_zeroed_page(GFP_KERNEL);
if (!mem)
goto out_fail;
handle_stsi_3_2_2(vcpu, (void *) mem);
break;
default:
goto out_fail;
}
if (copy_to_guest_absolute(vcpu, operand2, (void *) mem, PAGE_SIZE)) {
kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
goto out_mem;
}
free_page(mem);
vcpu->arch.sie_block->gpsw.mask &= ~(3ul << 44);
vcpu->arch.guest_gprs[0] = 0;
return 0;
out_mem:
free_page(mem);
out_fail:
/* condition code 3 */
vcpu->arch.sie_block->gpsw.mask |= 3ul << 44;
return 0;
}
static intercept_handler_t priv_handlers[256] = {
[0x02] = handle_stidp,
[0x10] = handle_set_prefix,
[0x11] = handle_store_prefix,
[0x12] = handle_store_cpu_address,
[0x29] = handle_skey,
[0x2a] = handle_skey,
[0x2b] = handle_skey,
[0x34] = handle_stsch,
[0x5f] = handle_chsc,
[0x7d] = handle_stsi,
[0xb1] = handle_stfl,
};
int kvm_s390_handle_priv(struct kvm_vcpu *vcpu)
{
intercept_handler_t handler;
handler = priv_handlers[vcpu->arch.sie_block->ipa & 0x00ff];
if (handler)
return handler(vcpu);
return -ENOTSUPP;
}
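Several handlers in priv.c repeat the same two idioms: computing the second-operand address from the base and displacement fields of the intercepted instruction, and writing a condition code into the guest PSW mask. The standalone sketch below restates both outside the kernel. The function names (ipb_base2, ipb_disp2, operand2_address, psw_set_cc) are hypothetical helpers introduced for illustration and are not part of the patch; only the shifts and masks are taken from the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Base register number: the top four bits of the instruction parameter block. */
static unsigned int ipb_base2(uint32_t ipb)
{
	return ipb >> 28;
}

/* 12-bit displacement: the field selected by mask 0x0fff0000. */
static unsigned int ipb_disp2(uint32_t ipb)
{
	return (ipb & 0x0fff0000u) >> 16;
}

/*
 * Effective address of the second operand. Base register 0 means
 * "no base", so only the displacement is used in that case.
 */
static uint64_t operand2_address(uint32_t ipb, const uint64_t gprs[16])
{
	uint64_t addr = ipb_disp2(ipb);
	unsigned int base = ipb_base2(ipb);

	if (base)
		addr += gprs[base];
	return addr;
}

/* Replace the two condition-code bits (shift 44) in a PSW mask. */
static uint64_t psw_set_cc(uint64_t mask, unsigned int cc)
{
	mask &= ~(3ull << 44);
	mask |= (uint64_t)(cc & 3u) << 44;
	return mask;
}
```

A handler that wants to report "not operational" would, under these assumptions, call psw_set_cc(mask, 3), which matches the open-coded pair of statements in handle_stsch and handle_chsc.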

47
arch/s390/kvm/sie64a.S Normal file

@ -0,0 +1,47 @@
/*
* sie64a.S - low level sie call
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Heiko Carstens <heiko.carstens@de.ibm.com>
*/
#include <linux/errno.h>
#include <asm/asm-offsets.h>
SP_R5 = 5 * 8 # offset into stackframe
SP_R6 = 6 * 8
/*
* sie64a calling convention:
* %r2 pointer to sie control block
* %r3 guest register save area
*/
.globl sie64a
sie64a:
lgr %r5,%r3
stmg %r5,%r14,SP_R5(%r15) # save register on entry
lgr %r14,%r2 # pointer to sie control block
lmg %r0,%r13,0(%r3) # load guest gprs 0-13
sie_inst:
sie 0(%r14)
lg %r14,SP_R5(%r15)
stmg %r0,%r13,0(%r14) # save guest gprs 0-13
lghi %r2,0
lmg %r6,%r14,SP_R6(%r15)
br %r14
sie_err:
lg %r14,SP_R5(%r15)
stmg %r0,%r13,0(%r14) # save guest gprs 0-13
lghi %r2,-EFAULT
lmg %r6,%r14,SP_R6(%r15)
br %r14
.section __ex_table,"a"
.quad sie_inst,sie_err
.previous

288
arch/s390/kvm/sigp.c Normal file

@ -0,0 +1,288 @@
/*
* sigp.c - handling interprocessor communication
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
* Christian Borntraeger <borntraeger@de.ibm.com>
*/
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include "gaccess.h"
#include "kvm-s390.h"
/* sigp order codes */
#define SIGP_SENSE 0x01
#define SIGP_EXTERNAL_CALL 0x02
#define SIGP_EMERGENCY 0x03
#define SIGP_START 0x04
#define SIGP_STOP 0x05
#define SIGP_RESTART 0x06
#define SIGP_STOP_STORE_STATUS 0x09
#define SIGP_INITIAL_CPU_RESET 0x0b
#define SIGP_CPU_RESET 0x0c
#define SIGP_SET_PREFIX 0x0d
#define SIGP_STORE_STATUS_ADDR 0x0e
#define SIGP_SET_ARCH 0x12
/* cpu status bits */
#define SIGP_STAT_EQUIPMENT_CHECK 0x80000000UL
#define SIGP_STAT_INCORRECT_STATE 0x00000200UL
#define SIGP_STAT_INVALID_PARAMETER 0x00000100UL
#define SIGP_STAT_EXT_CALL_PENDING 0x00000080UL
#define SIGP_STAT_STOPPED 0x00000040UL
#define SIGP_STAT_OPERATOR_INTERV 0x00000020UL
#define SIGP_STAT_CHECK_STOP 0x00000010UL
#define SIGP_STAT_INOPERATIVE 0x00000004UL
#define SIGP_STAT_INVALID_ORDER 0x00000002UL
#define SIGP_STAT_RECEIVER_CHECK 0x00000001UL
static int __sigp_sense(struct kvm_vcpu *vcpu, u16 cpu_addr, u64 *reg)
{
struct float_interrupt *fi = &vcpu->kvm->arch.float_int;
int rc;
if (cpu_addr >= KVM_MAX_VCPUS)
return 3; /* not operational */
spin_lock_bh(&fi->lock);
if (fi->local_int[cpu_addr] == NULL)
rc = 3; /* not operational */
else if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
& CPUSTAT_RUNNING) {
*reg &= 0xffffffff00000000UL;
rc = 1; /* status stored */
} else {
*reg &= 0xffffffff00000000UL;
*reg |= SIGP_STAT_STOPPED;
rc = 1; /* status stored */
}
spin_unlock_bh(&fi->lock);
VCPU_EVENT(vcpu, 4, "sensed status of cpu %x rc %x", cpu_addr, rc);
return rc;
}
static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
{
struct float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct local_interrupt *li;
struct interrupt_info *inti;
int rc;
if (cpu_addr >= KVM_MAX_VCPUS)
return 3; /* not operational */
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
if (!inti)
return -ENOMEM;
inti->type = KVM_S390_INT_EMERGENCY;
spin_lock_bh(&fi->lock);
li = fi->local_int[cpu_addr];
if (li == NULL) {
rc = 3; /* not operational */
kfree(inti);
goto unlock;
}
spin_lock_bh(&li->lock);
list_add_tail(&inti->list, &li->list);
atomic_set(&li->active, 1);
atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
if (waitqueue_active(&li->wq))
wake_up_interruptible(&li->wq);
spin_unlock_bh(&li->lock);
rc = 0; /* order accepted */
unlock:
spin_unlock_bh(&fi->lock);
VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr);
return rc;
}
static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int store)
{
struct float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct local_interrupt *li;
struct interrupt_info *inti;
int rc;
if (cpu_addr >= KVM_MAX_VCPUS)
return 3; /* not operational */
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
if (!inti)
return -ENOMEM;
inti->type = KVM_S390_SIGP_STOP;
spin_lock_bh(&fi->lock);
li = fi->local_int[cpu_addr];
if (li == NULL) {
rc = 3; /* not operational */
kfree(inti);
goto unlock;
}
spin_lock_bh(&li->lock);
list_add_tail(&inti->list, &li->list);
atomic_set(&li->active, 1);
atomic_set_mask(CPUSTAT_STOP_INT, li->cpuflags);
if (store)
li->action_bits |= ACTION_STORE_ON_STOP;
li->action_bits |= ACTION_STOP_ON_STOP;
if (waitqueue_active(&li->wq))
wake_up_interruptible(&li->wq);
spin_unlock_bh(&li->lock);
rc = 0; /* order accepted */
unlock:
spin_unlock_bh(&fi->lock);
VCPU_EVENT(vcpu, 4, "sent sigp stop to cpu %x", cpu_addr);
return rc;
}
static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter)
{
int rc;
switch (parameter & 0xff) {
case 0:
printk(KERN_WARNING "kvm: request to switch to ESA/390 mode"
" not supported");
rc = 3; /* not operational */
break;
case 1:
case 2:
rc = 0; /* order accepted */
break;
default:
rc = -ENOTSUPP;
}
return rc;
}
static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
u64 *reg)
{
struct float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct local_interrupt *li;
struct interrupt_info *inti;
int rc;
u8 tmp;
/* make sure that the new value is valid memory */
address = address & 0x7fffe000u;
if ((copy_from_guest(vcpu, &tmp,
(u64) (address + vcpu->kvm->arch.guest_origin) , 1)) ||
(copy_from_guest(vcpu, &tmp, (u64) (address +
vcpu->kvm->arch.guest_origin + PAGE_SIZE), 1))) {
*reg |= SIGP_STAT_INVALID_PARAMETER;
return 1; /* invalid parameter */
}
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
if (!inti)
return 2; /* busy */
spin_lock_bh(&fi->lock);
li = fi->local_int[cpu_addr];
if ((cpu_addr >= KVM_MAX_VCPUS) || (li == NULL)) {
rc = 1; /* incorrect state */
*reg &= SIGP_STAT_INCORRECT_STATE;
kfree(inti);
goto out_fi;
}
spin_lock_bh(&li->lock);
/* cpu must be in stopped state */
if (atomic_read(li->cpuflags) & CPUSTAT_RUNNING) {
rc = 1; /* incorrect state */
*reg &= SIGP_STAT_INCORRECT_STATE;
kfree(inti);
goto out_li;
}
inti->type = KVM_S390_SIGP_SET_PREFIX;
inti->prefix.address = address;
list_add_tail(&inti->list, &li->list);
atomic_set(&li->active, 1);
if (waitqueue_active(&li->wq))
wake_up_interruptible(&li->wq);
rc = 0; /* order accepted */
VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address);
out_li:
spin_unlock_bh(&li->lock);
out_fi:
spin_unlock_bh(&fi->lock);
return rc;
}
int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu)
{
int r1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
int r3 = vcpu->arch.sie_block->ipa & 0x000f;
int base2 = vcpu->arch.sie_block->ipb >> 28;
int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
u32 parameter;
u16 cpu_addr = vcpu->arch.guest_gprs[r3];
u8 order_code;
int rc;
order_code = disp2;
if (base2)
order_code += vcpu->arch.guest_gprs[base2];
if (r1 % 2)
parameter = vcpu->arch.guest_gprs[r1];
else
parameter = vcpu->arch.guest_gprs[r1 + 1];
switch (order_code) {
case SIGP_SENSE:
vcpu->stat.instruction_sigp_sense++;
rc = __sigp_sense(vcpu, cpu_addr,
&vcpu->arch.guest_gprs[r1]);
break;
case SIGP_EMERGENCY:
vcpu->stat.instruction_sigp_emergency++;
rc = __sigp_emergency(vcpu, cpu_addr);
break;
case SIGP_STOP:
vcpu->stat.instruction_sigp_stop++;
rc = __sigp_stop(vcpu, cpu_addr, 0);
break;
case SIGP_STOP_STORE_STATUS:
vcpu->stat.instruction_sigp_stop++;
rc = __sigp_stop(vcpu, cpu_addr, 1);
break;
case SIGP_SET_ARCH:
vcpu->stat.instruction_sigp_arch++;
rc = __sigp_set_arch(vcpu, parameter);
break;
case SIGP_SET_PREFIX:
vcpu->stat.instruction_sigp_prefix++;
rc = __sigp_set_prefix(vcpu, cpu_addr, parameter,
&vcpu->arch.guest_gprs[r1]);
break;
case SIGP_RESTART:
vcpu->stat.instruction_sigp_restart++;
/* user space must know about restart */
default:
return -ENOTSUPP;
}
if (rc < 0)
return rc;
vcpu->arch.sie_block->gpsw.mask &= ~(3ul << 44);
vcpu->arch.sie_block->gpsw.mask |= (rc & 3ul) << 44;
return 0;
}
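Two small rules in sigp.c are easy to miss when reading the handler: the parameter comes from the odd register of the r1 pair, and SIGP SENSE clears the low 32 bits of the status register before reporting the stopped state. The sketch below isolates both as pure functions. The names sigp_parameter_reg and sigp_sense_status are hypothetical and exist only for this illustration; the constant value is copied from the defines above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SIGP_STAT_STOPPED 0x00000040ull

/*
 * Index of the register holding the SIGP parameter: r1 itself when r1
 * is odd, otherwise r1 + 1 (the odd register of the even/odd pair).
 */
static unsigned int sigp_parameter_reg(unsigned int r1)
{
	return (r1 % 2) ? r1 : r1 + 1;
}

/*
 * New value of the status register after a SENSE order on an existing
 * cpu: the upper 32 bits are preserved, the lower 32 bits carry status.
 */
static uint64_t sigp_sense_status(uint64_t reg, int running)
{
	reg &= 0xffffffff00000000ull;
	if (!running)
		reg |= SKETCH_SIGP_STAT_STOPPED;
	return reg;
}
```

In both branches the handler above returns condition code 1 (status stored); only the stored status word differs between a running and a stopped target.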


@ -30,11 +30,27 @@
#define TABLES_PER_PAGE 4
#define FRAG_MASK 15UL
#define SECOND_HALVES 10UL
void clear_table_pgstes(unsigned long *table)
{
clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
memset(table + 256, 0, PAGE_SIZE/4);
clear_table(table + 512, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
memset(table + 768, 0, PAGE_SIZE/4);
}
#else
#define ALLOC_ORDER 2
#define TABLES_PER_PAGE 2
#define FRAG_MASK 3UL
#define SECOND_HALVES 2UL
void clear_table_pgstes(unsigned long *table)
{
clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/2);
memset(table + 256, 0, PAGE_SIZE/2);
}
#endif
unsigned long *crst_table_alloc(struct mm_struct *mm, int noexec)
@ -153,7 +169,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
unsigned long *table;
unsigned long bits;
bits = mm->context.noexec ? 3UL : 1UL;
bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
spin_lock(&mm->page_table_lock);
page = NULL;
if (!list_empty(&mm->context.pgtable_list)) {
@ -170,7 +186,10 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
pgtable_page_ctor(page);
page->flags &= ~FRAG_MASK;
table = (unsigned long *) page_to_phys(page);
clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
if (mm->context.pgstes)
clear_table_pgstes(table);
else
clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
spin_lock(&mm->page_table_lock);
list_add(&page->lru, &mm->context.pgtable_list);
}
@ -191,7 +210,7 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
struct page *page;
unsigned long bits;
bits = mm->context.noexec ? 3UL : 1UL;
bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
bits <<= (__pa(table) & (PAGE_SIZE - 1)) / 256 / sizeof(unsigned long);
page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
spin_lock(&mm->page_table_lock);
@ -228,3 +247,43 @@ void disable_noexec(struct mm_struct *mm, struct task_struct *tsk)
mm->context.noexec = 0;
update_mm(mm, tsk);
}
/*
* switch on pgstes for its userspace process (for kvm)
*/
int s390_enable_sie(void)
{
struct task_struct *tsk = current;
struct mm_struct *mm;
int rc;
task_lock(tsk);
rc = 0;
if (tsk->mm->context.pgstes)
goto unlock;
rc = -EINVAL;
if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
tsk->mm != tsk->active_mm || tsk->mm->ioctx_list)
goto unlock;
tsk->mm->context.pgstes = 1; /* dirty little tricks .. */
mm = dup_mm(tsk);
tsk->mm->context.pgstes = 0;
rc = -ENOMEM;
if (!mm)
goto unlock;
mmput(tsk->mm);
tsk->mm = tsk->active_mm = mm;
preempt_disable();
update_mm(mm, tsk);
cpu_set(smp_processor_id(), mm->cpu_vm_mask);
preempt_enable();
rc = 0;
unlock:
task_unlock(tsk);
return rc;
}
EXPORT_SYMBOL_GPL(s390_enable_sie);
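The fragment-bit arithmetic in page_table_alloc() and page_table_free() is compact, so a worked version may help. The sketch below assumes the 64-bit layout from the hunk above (a 4096-byte page holding two 2048-byte tables of 256 eight-byte entries). The name frag_bits and the SKETCH_ constants are hypothetical; the expression itself mirrors the `bits <<= ...` line in page_table_free().

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE  4096UL
#define SKETCH_ENTRY_SIZE 8UL	/* sizeof(unsigned long) on 64-bit, assumed */

/*
 * Fragment bits for the table at physical address table_pa.
 * paired is nonzero when the table also occupies the second half of
 * the page (noexec shadow table, or pgstes for kvm).
 */
static unsigned long frag_bits(unsigned long table_pa, int paired)
{
	unsigned long bits = paired ? 3UL : 1UL;
	unsigned long index;

	/* Which 256-entry table inside the page does the address name? */
	index = (table_pa & (SKETCH_PAGE_SIZE - 1)) / 256 / SKETCH_ENTRY_SIZE;
	return bits << index;
}
```

With pgstes enabled a table always starts at the page boundary and claims both fragments, which is why the patch treats mm->context.pgstes the same way as mm->context.noexec when choosing between 3UL and 1UL.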


@ -373,6 +373,25 @@ config VMI
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.
config KVM_CLOCK
bool "KVM paravirtualized clock"
select PARAVIRT
depends on !(X86_VISWS || X86_VOYAGER)
help
Turning on this option will allow you to run a paravirtualized clock
when running over the KVM hypervisor. Instead of relying on a PIT
(or probably other) emulation by the underlying device model, the host
provides the guest with timing infrastructure such as time of day and
system time.
config KVM_GUEST
bool "KVM Guest support"
select PARAVIRT
depends on !(X86_VISWS || X86_VOYAGER)
help
This option enables various optimizations for running under the KVM
hypervisor.
source "arch/x86/lguest/Kconfig"
config PARAVIRT


@ -80,6 +80,8 @@ obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o
obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o
obj-$(CONFIG_KVM_GUEST) += kvm.o
obj-$(CONFIG_KVM_CLOCK) += kvmclock.o
obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o
ifdef CONFIG_INPUT_PCSPKR


@ -25,6 +25,7 @@
#include <asm/hpet.h>
#include <linux/kdebug.h>
#include <asm/smp.h>
#include <asm/reboot.h>
#include <mach_ipi.h>
@ -117,7 +118,7 @@ static void nmi_shootdown_cpus(void)
}
#endif
void machine_crash_shutdown(struct pt_regs *regs)
void native_machine_crash_shutdown(struct pt_regs *regs)
{
/* This function is only called after the system
* has panicked or is otherwise in a critical state.

248
arch/x86/kernel/kvm.c Normal file

@ -0,0 +1,248 @@
/*
* KVM paravirt_ops implementation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright (C) 2007, Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
* Copyright IBM Corporation, 2007
* Authors: Anthony Liguori <aliguori@us.ibm.com>
*/
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kvm_para.h>
#include <linux/cpu.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/hardirq.h>
#define MMU_QUEUE_SIZE 1024
struct kvm_para_state {
u8 mmu_queue[MMU_QUEUE_SIZE];
int mmu_queue_len;
enum paravirt_lazy_mode mode;
};
static DEFINE_PER_CPU(struct kvm_para_state, para_state);
static struct kvm_para_state *kvm_para_state(void)
{
return &per_cpu(para_state, raw_smp_processor_id());
}
/*
* No need for any "IO delay" on KVM
*/
static void kvm_io_delay(void)
{
}
static void kvm_mmu_op(void *buffer, unsigned len)
{
int r;
unsigned long a1, a2;
do {
a1 = __pa(buffer);
a2 = 0; /* on i386 __pa() always returns <4G */
r = kvm_hypercall3(KVM_HC_MMU_OP, len, a1, a2);
buffer += r;
len -= r;
} while (len);
}
static void mmu_queue_flush(struct kvm_para_state *state)
{
if (state->mmu_queue_len) {
kvm_mmu_op(state->mmu_queue, state->mmu_queue_len);
state->mmu_queue_len = 0;
}
}
static void kvm_deferred_mmu_op(void *buffer, int len)
{
struct kvm_para_state *state = kvm_para_state();
if (state->mode != PARAVIRT_LAZY_MMU) {
kvm_mmu_op(buffer, len);
return;
}
if (state->mmu_queue_len + len > sizeof state->mmu_queue)
mmu_queue_flush(state);
memcpy(state->mmu_queue + state->mmu_queue_len, buffer, len);
state->mmu_queue_len += len;
}
static void kvm_mmu_write(void *dest, u64 val)
{
__u64 pte_phys;
struct kvm_mmu_op_write_pte wpte;
#ifdef CONFIG_HIGHPTE
struct page *page;
unsigned long dst = (unsigned long) dest;
page = kmap_atomic_to_page(dest);
pte_phys = page_to_pfn(page);
pte_phys <<= PAGE_SHIFT;
pte_phys += (dst & ~(PAGE_MASK));
#else
pte_phys = (unsigned long)__pa(dest);
#endif
wpte.header.op = KVM_MMU_OP_WRITE_PTE;
wpte.pte_val = val;
wpte.pte_phys = pte_phys;
kvm_deferred_mmu_op(&wpte, sizeof wpte);
}
/*
* We only need to hook operations that are MMU writes. We hook these so that
* we can use lazy MMU mode to batch these operations. We could probably
* improve the performance of the host code if we used some of the information
* here to simplify processing of batched writes.
*/
static void kvm_set_pte(pte_t *ptep, pte_t pte)
{
kvm_mmu_write(ptep, pte_val(pte));
}
static void kvm_set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
kvm_mmu_write(ptep, pte_val(pte));
}
static void kvm_set_pmd(pmd_t *pmdp, pmd_t pmd)
{
kvm_mmu_write(pmdp, pmd_val(pmd));
}
#if PAGETABLE_LEVELS >= 3
#ifdef CONFIG_X86_PAE
static void kvm_set_pte_atomic(pte_t *ptep, pte_t pte)
{
kvm_mmu_write(ptep, pte_val(pte));
}
static void kvm_set_pte_present(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
kvm_mmu_write(ptep, pte_val(pte));
}
static void kvm_pte_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
kvm_mmu_write(ptep, 0);
}
static void kvm_pmd_clear(pmd_t *pmdp)
{
kvm_mmu_write(pmdp, 0);
}
#endif
static void kvm_set_pud(pud_t *pudp, pud_t pud)
{
kvm_mmu_write(pudp, pud_val(pud));
}
#if PAGETABLE_LEVELS == 4
static void kvm_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
kvm_mmu_write(pgdp, pgd_val(pgd));
}
#endif
#endif /* PAGETABLE_LEVELS >= 3 */
static void kvm_flush_tlb(void)
{
struct kvm_mmu_op_flush_tlb ftlb = {
.header.op = KVM_MMU_OP_FLUSH_TLB,
};
kvm_deferred_mmu_op(&ftlb, sizeof ftlb);
}
static void kvm_release_pt(u32 pfn)
{
struct kvm_mmu_op_release_pt rpt = {
.header.op = KVM_MMU_OP_RELEASE_PT,
.pt_phys = (u64)pfn << PAGE_SHIFT,
};
kvm_mmu_op(&rpt, sizeof rpt);
}
static void kvm_enter_lazy_mmu(void)
{
struct kvm_para_state *state = kvm_para_state();
paravirt_enter_lazy_mmu();
state->mode = paravirt_get_lazy_mode();
}
static void kvm_leave_lazy_mmu(void)
{
struct kvm_para_state *state = kvm_para_state();
mmu_queue_flush(state);
paravirt_leave_lazy(paravirt_get_lazy_mode());
state->mode = paravirt_get_lazy_mode();
}
static void paravirt_ops_setup(void)
{
pv_info.name = "KVM";
pv_info.paravirt_enabled = 1;
if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
pv_cpu_ops.io_delay = kvm_io_delay;
if (kvm_para_has_feature(KVM_FEATURE_MMU_OP)) {
pv_mmu_ops.set_pte = kvm_set_pte;
pv_mmu_ops.set_pte_at = kvm_set_pte_at;
pv_mmu_ops.set_pmd = kvm_set_pmd;
#if PAGETABLE_LEVELS >= 3
#ifdef CONFIG_X86_PAE
pv_mmu_ops.set_pte_atomic = kvm_set_pte_atomic;
pv_mmu_ops.set_pte_present = kvm_set_pte_present;
pv_mmu_ops.pte_clear = kvm_pte_clear;
pv_mmu_ops.pmd_clear = kvm_pmd_clear;
#endif
pv_mmu_ops.set_pud = kvm_set_pud;
#if PAGETABLE_LEVELS == 4
pv_mmu_ops.set_pgd = kvm_set_pgd;
#endif
#endif
pv_mmu_ops.flush_tlb_user = kvm_flush_tlb;
pv_mmu_ops.release_pte = kvm_release_pt;
pv_mmu_ops.release_pmd = kvm_release_pt;
pv_mmu_ops.release_pud = kvm_release_pt;
pv_mmu_ops.lazy_mode.enter = kvm_enter_lazy_mmu;
pv_mmu_ops.lazy_mode.leave = kvm_leave_lazy_mmu;
}
}
void __init kvm_guest_init(void)
{
if (!kvm_para_available())
return;
paravirt_ops_setup();
}
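The batching done by kvm_deferred_mmu_op() can be tried outside the kernel. The sketch below is a userspace model, not the patch's code: submit_ops() stands in for the KVM_HC_MMU_OP hypercall and only counts calls, and the names op_queue, submit_ops, queue_flush and queue_op are hypothetical. It assumes, as the original appears to, that a single operation never exceeds the queue size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_QUEUE_SIZE 16

struct op_queue {
	unsigned char buf[SKETCH_QUEUE_SIZE];
	size_t len;		/* bytes currently queued */
	int lazy;		/* nonzero while in lazy MMU mode */
	unsigned int submits;	/* number of simulated hypercalls */
	size_t bytes_sent;	/* total bytes handed to the host */
};

/* Stand-in for the hypercall: records that a buffer was handed over. */
static void submit_ops(struct op_queue *q, const void *buf, size_t len)
{
	(void)buf;
	q->submits++;
	q->bytes_sent += len;
}

/* Send everything queued so far as one batch. */
static void queue_flush(struct op_queue *q)
{
	if (q->len) {
		submit_ops(q, q->buf, q->len);
		q->len = 0;
	}
}

/* Queue an op in lazy mode, flushing first if it would not fit. */
static void queue_op(struct op_queue *q, const void *op, size_t len)
{
	if (!q->lazy) {
		submit_ops(q, op, len);
		return;
	}
	if (q->len + len > sizeof q->buf)
		queue_flush(q);
	memcpy(q->buf + q->len, op, len);
	q->len += len;
}
```

Three 6-byte operations in lazy mode therefore cost one submission when the third no longer fits, plus a final one when lazy mode is left, instead of three separate calls.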

187
arch/x86/kernel/kvmclock.c Normal file

@ -0,0 +1,187 @@
/* KVM paravirtual clock driver. A clocksource implementation
Copyright (C) 2008 Glauber de Oliveira Costa, Red Hat Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include <linux/clocksource.h>
#include <linux/kvm_para.h>
#include <asm/arch_hooks.h>
#include <asm/msr.h>
#include <asm/apic.h>
#include <linux/percpu.h>
#include <asm/reboot.h>
#define KVM_SCALE 22
static int kvmclock = 1;
static int parse_no_kvmclock(char *arg)
{
kvmclock = 0;
return 0;
}
early_param("no-kvmclock", parse_no_kvmclock);
/* The hypervisor will put information about time periodically here */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct kvm_vcpu_time_info, hv_clock);
#define get_clock(cpu, field) per_cpu(hv_clock, cpu).field
static inline u64 kvm_get_delta(u64 last_tsc)
{
int cpu = smp_processor_id();
u64 delta = native_read_tsc() - last_tsc;
return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE;
}
static struct kvm_wall_clock wall_clock;
static cycle_t kvm_clock_read(void);
/*
* The wallclock is the time of day when we booted. Some time may have
* elapsed since the hypervisor wrote the data, so we try to account for
* that with system time.
*/
unsigned long kvm_get_wallclock(void)
{
u32 wc_sec, wc_nsec;
u64 delta;
struct timespec ts;
int version, nsec;
int low, high;
low = (int)__pa(&wall_clock);
high = ((u64)__pa(&wall_clock) >> 32);
delta = kvm_clock_read();
native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
do {
version = wall_clock.wc_version;
rmb();
wc_sec = wall_clock.wc_sec;
wc_nsec = wall_clock.wc_nsec;
rmb();
} while ((wall_clock.wc_version != version) || (version & 1));
delta = kvm_clock_read() - delta;
delta += wc_nsec;
nsec = do_div(delta, NSEC_PER_SEC);
set_normalized_timespec(&ts, wc_sec + delta, nsec);
/*
* Of all mechanisms of time adjustment I've tested, this one
* was the champion!
*/
return ts.tv_sec + 1;
}
int kvm_set_wallclock(unsigned long now)
{
return 0;
}
/*
* This is our read_clock function. The host records a tsc timestamp each time
* it updates the system time. Without the tsc adjustment, we can have a situation
* in which a vcpu starts to run earlier (smaller system_time), but probes
* time later (compared to another vcpu), leading to backwards time
*/
static cycle_t kvm_clock_read(void)
{
u64 last_tsc, now;
int cpu;
preempt_disable();
cpu = smp_processor_id();
last_tsc = get_clock(cpu, tsc_timestamp);
now = get_clock(cpu, system_time);
now += kvm_get_delta(last_tsc);
preempt_enable();
return now;
}
static struct clocksource kvm_clock = {
.name = "kvm-clock",
.read = kvm_clock_read,
.rating = 400,
.mask = CLOCKSOURCE_MASK(64),
.mult = 1 << KVM_SCALE,
.shift = KVM_SCALE,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
static int kvm_register_clock(void)
{
int cpu = smp_processor_id();
int low, high;
low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
}
static void kvm_setup_secondary_clock(void)
{
/*
* Now that the first cpu already had this clocksource initialized,
* we shouldn't fail.
*/
WARN_ON(kvm_register_clock());
/* ok, done with our trickery, call native */
setup_secondary_APIC_clock();
}
/*
* After the clock is registered, the host will keep writing to the
* registered memory location. If the guest happens to shut down, this memory
* won't be valid. In cases like kexec, in which you install a new kernel, this
* means a random memory location will keep being written. So before any
* kind of shutdown from our side, we unregister the clock by writing anything
* that does not have the 'enable' bit set in the msr
*/
#ifdef CONFIG_KEXEC
static void kvm_crash_shutdown(struct pt_regs *regs)
{
native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
native_machine_crash_shutdown(regs);
}
#endif
static void kvm_shutdown(void)
{
native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
native_machine_shutdown();
}
void __init kvmclock_init(void)
{
if (!kvm_para_available())
return;
if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
if (kvm_register_clock())
return;
pv_time_ops.get_wallclock = kvm_get_wallclock;
pv_time_ops.set_wallclock = kvm_set_wallclock;
pv_time_ops.sched_clock = kvm_clock_read;
pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock;
machine_ops.shutdown = kvm_shutdown;
#ifdef CONFIG_KEXEC
machine_ops.crash_shutdown = kvm_crash_shutdown;
#endif
clocksource_register(&kvm_clock);
}
}
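The arithmetic behind kvm_clock_read() is a fixed-point scale of the TSC delta added to the host-provided system time. The sketch below works that out with plain integers. The names scaled_delta and clock_read_sketch are hypothetical, and the values passed in take the place of the per-cpu hv_clock fields and native_read_tsc() used by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KVM_SCALE 22

/*
 * Convert a TSC delta to time units using the multiplier supplied by
 * the host. Like the original expression, the product is computed in
 * 64 bits and can wrap for very large deltas.
 */
static uint64_t scaled_delta(uint64_t now_tsc, uint64_t last_tsc, uint32_t mul)
{
	return ((now_tsc - last_tsc) * mul) >> SKETCH_KVM_SCALE;
}

/* System time at the last host update plus the scaled time since then. */
static uint64_t clock_read_sketch(uint64_t system_time, uint64_t last_tsc,
				  uint64_t now_tsc, uint32_t mul)
{
	return system_time + scaled_delta(now_tsc, last_tsc, mul);
}
```

A multiplier of 1 << 22 passes the delta through unchanged, and 1 << 21 halves it. The clocksource above registers mult = 1 << KVM_SCALE with shift = KVM_SCALE, which appears to mean the value returned by the read function is used without further scaling.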


@ -399,7 +399,7 @@ static void native_machine_emergency_restart(void)
}
}
static void native_machine_shutdown(void)
void native_machine_shutdown(void)
{
/* Stop the cpus and apics */
#ifdef CONFIG_SMP
@ -470,7 +470,10 @@ struct machine_ops machine_ops = {
.shutdown = native_machine_shutdown,
.emergency_restart = native_machine_emergency_restart,
.restart = native_machine_restart,
.halt = native_machine_halt
.halt = native_machine_halt,
#ifdef CONFIG_KEXEC
.crash_shutdown = native_machine_crash_shutdown,
#endif
};
void machine_power_off(void)
@ -498,3 +501,9 @@ void machine_halt(void)
machine_ops.halt();
}
#ifdef CONFIG_KEXEC
void machine_crash_shutdown(struct pt_regs *regs)
{
machine_ops.crash_shutdown(regs);
}
#endif
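The reboot.c change turns machine_crash_shutdown() into a thin wrapper over a machine_ops hook, which is what lets kvmclock.c install its own shutdown handlers and still fall through to the native ones. The sketch below models that pattern in userspace. Every name in it (ops_sketch, native_shutdown_sketch, guest_shutdown_sketch, machine_shutdown_sketch) is hypothetical, and the counters replace the MSR write and the real shutdown work.

```c
#include <assert.h>

struct ops_sketch {
	void (*shutdown)(void);
};

static int native_calls;	/* how often the native handler ran */
static int clock_unregistered;	/* set once the guest hook has run */

/* Stand-in for native_machine_shutdown(). */
static void native_shutdown_sketch(void)
{
	native_calls++;
}

/* Guest override: do the extra cleanup first, then call the native code. */
static void guest_shutdown_sketch(void)
{
	clock_unregistered = 1;
	native_shutdown_sketch();
}

/* The table starts out pointing at the native implementation. */
static struct ops_sketch ops = {
	.shutdown = native_shutdown_sketch,
};

/* Public entry point: always dispatches through the table. */
static void machine_shutdown_sketch(void)
{
	ops.shutdown();
}
```

Because callers only ever use the wrapper, replacing one pointer at init time is enough to change behaviour everywhere, which is why the patch also has to make native_machine_shutdown() non-static.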


@ -47,6 +47,7 @@
#include <linux/pfn.h>
#include <linux/pci.h>
#include <linux/init_ohci1394_dma.h>
#include <linux/kvm_para.h>
#include <video/edid.h>
@ -820,6 +821,10 @@ void __init setup_arch(char **cmdline_p)
max_low_pfn = setup_memory();
#ifdef CONFIG_KVM_CLOCK
kvmclock_init();
#endif
#ifdef CONFIG_VMI
/*
* Must be after max_low_pfn is determined, and before kernel
@ -827,6 +832,7 @@ void __init setup_arch(char **cmdline_p)
*/
vmi_init();
#endif
kvm_guest_init();
/*
* NOTE: before this point _nobody_ is allowed to allocate


@ -42,6 +42,7 @@
#include <linux/ctype.h>
#include <linux/uaccess.h>
#include <linux/init_ohci1394_dma.h>
#include <linux/kvm_para.h>
#include <asm/mtrr.h>
#include <asm/uaccess.h>
@ -384,6 +385,10 @@ void __init setup_arch(char **cmdline_p)
io_delay_init();
#ifdef CONFIG_KVM_CLOCK
kvmclock_init();
#endif
#ifdef CONFIG_SMP
/* setup to use the early static init tables during kernel startup */
x86_cpu_to_apicid_early_ptr = (void *)x86_cpu_to_apicid_init;
@ -488,6 +493,8 @@ void __init setup_arch(char **cmdline_p)
init_apic_mappings();
ioapic_init_mappings();
kvm_guest_init();
/*
* We trust e820 completely. No explicit ROM probing in memory.
*/


@ -19,7 +19,7 @@ if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && EXPERIMENTAL
depends on HAVE_KVM
select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
@ -50,6 +50,17 @@ config KVM_AMD
Provides support for KVM on AMD processors equipped with the AMD-V
(SVM) extensions.
config KVM_TRACE
bool "KVM trace support"
depends on KVM && MARKERS && SYSFS
select RELAY
select DEBUG_FS
default n
---help---
This option allows reading a trace of kvm-related events through
relayfs. Note the ABI is not considered stable and will be
modified in future updates.
# OK, it's a little counter-intuitive to do this, but it puts it neatly under
# the virtualization menu.
source drivers/lguest/Kconfig


@ -3,10 +3,14 @@
#
common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o)
ifeq ($(CONFIG_KVM_TRACE),y)
common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o)
endif
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o
kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
i8254.o
obj-$(CONFIG_KVM) += kvm.o
kvm-intel-objs = vmx.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o

611
arch/x86/kvm/i8254.c Normal file

@ -0,0 +1,611 @@
/*
* 8253/8254 interval timer emulation
*
* Copyright (c) 2003-2004 Fabrice Bellard
* Copyright (c) 2006 Intel Corporation
* Copyright (c) 2007 Keir Fraser, XenSource Inc
* Copyright (c) 2008 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* Authors:
* Sheng Yang <sheng.yang@intel.com>
* Based on QEMU and Xen.
*/
#include <linux/kvm_host.h>
#include "irq.h"
#include "i8254.h"
#ifndef CONFIG_X86_64
#define mod_64(x, y) ((x) - (y) * div64_64(x, y))
#else
#define mod_64(x, y) ((x) % (y))
#endif
#define RW_STATE_LSB 1
#define RW_STATE_MSB 2
#define RW_STATE_WORD0 3
#define RW_STATE_WORD1 4
/* Compute with 96 bit intermediate result: (a*b)/c */
static u64 muldiv64(u64 a, u32 b, u32 c)
{
union {
u64 ll;
struct {
u32 low, high;
} l;
} u, res;
u64 rl, rh;
u.ll = a;
rl = (u64)u.l.low * (u64)b;
rh = (u64)u.l.high * (u64)b;
rh += (rl >> 32);
res.l.high = div64_64(rh, c);
res.l.low = div64_64(((mod_64(rh, c) << 32) + (rl & 0xffffffff)), c);
return res.ll;
}
static void pit_set_gate(struct kvm *kvm, int channel, u32 val)
{
struct kvm_kpit_channel_state *c =
&kvm->arch.vpit->pit_state.channels[channel];
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
switch (c->mode) {
default:
case 0:
case 4:
/* XXX: just disable/enable counting */
break;
case 1:
case 2:
case 3:
case 5:
/* Restart counting on rising edge. */
if (c->gate < val)
c->count_load_time = ktime_get();
break;
}
c->gate = val;
}
int pit_get_gate(struct kvm *kvm, int channel)
{
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
return kvm->arch.vpit->pit_state.channels[channel].gate;
}
static int pit_get_count(struct kvm *kvm, int channel)
{
struct kvm_kpit_channel_state *c =
&kvm->arch.vpit->pit_state.channels[channel];
s64 d, t;
int counter;
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
t = ktime_to_ns(ktime_sub(ktime_get(), c->count_load_time));
d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
switch (c->mode) {
case 0:
case 1:
case 4:
case 5:
counter = (c->count - d) & 0xffff;
break;
case 3:
/* XXX: may be incorrect for odd counts */
counter = c->count - (mod_64((2 * d), c->count));
break;
default:
counter = c->count - mod_64(d, c->count);
break;
}
return counter;
}
static int pit_get_out(struct kvm *kvm, int channel)
{
struct kvm_kpit_channel_state *c =
&kvm->arch.vpit->pit_state.channels[channel];
s64 d, t;
int out;
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
t = ktime_to_ns(ktime_sub(ktime_get(), c->count_load_time));
d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
switch (c->mode) {
default:
case 0:
out = (d >= c->count);
break;
case 1:
out = (d < c->count);
break;
case 2:
out = ((mod_64(d, c->count) == 0) && (d != 0));
break;
case 3:
out = (mod_64(d, c->count) < ((c->count + 1) >> 1));
break;
case 4:
case 5:
out = (d == c->count);
break;
}
return out;
}
static void pit_latch_count(struct kvm *kvm, int channel)
{
struct kvm_kpit_channel_state *c =
&kvm->arch.vpit->pit_state.channels[channel];
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
if (!c->count_latched) {
c->latched_count = pit_get_count(kvm, channel);
c->count_latched = c->rw_mode;
}
}
static void pit_latch_status(struct kvm *kvm, int channel)
{
struct kvm_kpit_channel_state *c =
&kvm->arch.vpit->pit_state.channels[channel];
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
if (!c->status_latched) {
/* TODO: Return NULL COUNT (bit 6). */
c->status = ((pit_get_out(kvm, channel) << 7) |
(c->rw_mode << 4) |
(c->mode << 1) |
c->bcd);
c->status_latched = 1;
}
}
int __pit_timer_fn(struct kvm_kpit_state *ps)
{
struct kvm_vcpu *vcpu0 = ps->pit->kvm->vcpus[0];
struct kvm_kpit_timer *pt = &ps->pit_timer;
atomic_inc(&pt->pending);
smp_mb__after_atomic_inc();
/* FIXME: handle case where the guest is in guest mode */
if (vcpu0 && waitqueue_active(&vcpu0->wq)) {
vcpu0->arch.mp_state = KVM_MP_STATE_RUNNABLE;
wake_up_interruptible(&vcpu0->wq);
}
pt->timer.expires = ktime_add_ns(pt->timer.expires, pt->period);
pt->scheduled = ktime_to_ns(pt->timer.expires);
return (pt->period == 0 ? 0 : 1);
}
int pit_has_pending_timer(struct kvm_vcpu *vcpu)
{
struct kvm_pit *pit = vcpu->kvm->arch.vpit;
if (pit && vcpu->vcpu_id == 0)
return atomic_read(&pit->pit_state.pit_timer.pending);
return 0;
}
static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
{
struct kvm_kpit_state *ps;
int restart_timer = 0;
ps = container_of(data, struct kvm_kpit_state, pit_timer.timer);
restart_timer = __pit_timer_fn(ps);
if (restart_timer)
return HRTIMER_RESTART;
else
return HRTIMER_NORESTART;
}
static void destroy_pit_timer(struct kvm_kpit_timer *pt)
{
pr_debug("pit: execute del timer!\n");
hrtimer_cancel(&pt->timer);
}
static void create_pit_timer(struct kvm_kpit_timer *pt, u32 val, int is_period)
{
s64 interval;
interval = muldiv64(val, NSEC_PER_SEC, KVM_PIT_FREQ);
pr_debug("pit: create pit timer, interval is %llu nsec\n", interval);
/* TODO: the new value should only take effect after the timer is retriggered */
hrtimer_cancel(&pt->timer);
pt->period = (is_period == 0) ? 0 : interval;
pt->timer.function = pit_timer_fn;
atomic_set(&pt->pending, 0);
hrtimer_start(&pt->timer, ktime_add_ns(ktime_get(), interval),
HRTIMER_MODE_ABS);
}
static void pit_load_count(struct kvm *kvm, int channel, u32 val)
{
struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
WARN_ON(!mutex_is_locked(&ps->lock));
pr_debug("pit: load_count val is %d, channel is %d\n", val, channel);
/*
* Though the spec says the state of the 8254 is undefined after power-up,
* some OSes such as Windows XP seem to depend on the IRQ0 interrupt
* when booting up.
* So set an initial rate for it here, rather than a specific number.
*/
if (val == 0)
val = 0x10000;
ps->channels[channel].count_load_time = ktime_get();
ps->channels[channel].count = val;
if (channel != 0)
return;
/* Two types of timer:
* mode 1 is one-shot, mode 2 is periodic; otherwise delete the timer */
switch (ps->channels[0].mode) {
case 1:
create_pit_timer(&ps->pit_timer, val, 0);
break;
case 2:
create_pit_timer(&ps->pit_timer, val, 1);
break;
default:
destroy_pit_timer(&ps->pit_timer);
}
}
void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
{
mutex_lock(&kvm->arch.vpit->pit_state.lock);
pit_load_count(kvm, channel, val);
mutex_unlock(&kvm->arch.vpit->pit_state.lock);
}
static void pit_ioport_write(struct kvm_io_device *this,
gpa_t addr, int len, const void *data)
{
struct kvm_pit *pit = (struct kvm_pit *)this->private;
struct kvm_kpit_state *pit_state = &pit->pit_state;
struct kvm *kvm = pit->kvm;
int channel, access;
struct kvm_kpit_channel_state *s;
u32 val = *(u32 *) data;
val &= 0xff;
addr &= KVM_PIT_CHANNEL_MASK;
mutex_lock(&pit_state->lock);
if (val != 0)
pr_debug("pit: write addr is 0x%x, len is %d, val is 0x%x\n",
(unsigned int)addr, len, val);
if (addr == 3) {
channel = val >> 6;
if (channel == 3) {
/* Read-Back Command. */
for (channel = 0; channel < 3; channel++) {
s = &pit_state->channels[channel];
if (val & (2 << channel)) {
if (!(val & 0x20))
pit_latch_count(kvm, channel);
if (!(val & 0x10))
pit_latch_status(kvm, channel);
}
}
} else {
/* Select Counter <channel>. */
s = &pit_state->channels[channel];
access = (val >> 4) & KVM_PIT_CHANNEL_MASK;
if (access == 0) {
pit_latch_count(kvm, channel);
} else {
s->rw_mode = access;
s->read_state = access;
s->write_state = access;
s->mode = (val >> 1) & 7;
if (s->mode > 5)
s->mode -= 4;
s->bcd = val & 1;
}
}
} else {
/* Write Count. */
s = &pit_state->channels[addr];
switch (s->write_state) {
default:
case RW_STATE_LSB:
pit_load_count(kvm, addr, val);
break;
case RW_STATE_MSB:
pit_load_count(kvm, addr, val << 8);
break;
case RW_STATE_WORD0:
s->write_latch = val;
s->write_state = RW_STATE_WORD1;
break;
case RW_STATE_WORD1:
pit_load_count(kvm, addr, s->write_latch | (val << 8));
s->write_state = RW_STATE_WORD0;
break;
}
}
mutex_unlock(&pit_state->lock);
}
static void pit_ioport_read(struct kvm_io_device *this,
gpa_t addr, int len, void *data)
{
struct kvm_pit *pit = (struct kvm_pit *)this->private;
struct kvm_kpit_state *pit_state = &pit->pit_state;
struct kvm *kvm = pit->kvm;
int ret, count;
struct kvm_kpit_channel_state *s;
addr &= KVM_PIT_CHANNEL_MASK;
s = &pit_state->channels[addr];
mutex_lock(&pit_state->lock);
if (s->status_latched) {
s->status_latched = 0;
ret = s->status;
} else if (s->count_latched) {
switch (s->count_latched) {
default:
case RW_STATE_LSB:
ret = s->latched_count & 0xff;
s->count_latched = 0;
break;
case RW_STATE_MSB:
ret = s->latched_count >> 8;
s->count_latched = 0;
break;
case RW_STATE_WORD0:
ret = s->latched_count & 0xff;
s->count_latched = RW_STATE_MSB;
break;
}
} else {
switch (s->read_state) {
default:
case RW_STATE_LSB:
count = pit_get_count(kvm, addr);
ret = count & 0xff;
break;
case RW_STATE_MSB:
count = pit_get_count(kvm, addr);
ret = (count >> 8) & 0xff;
break;
case RW_STATE_WORD0:
count = pit_get_count(kvm, addr);
ret = count & 0xff;
s->read_state = RW_STATE_WORD1;
break;
case RW_STATE_WORD1:
count = pit_get_count(kvm, addr);
ret = (count >> 8) & 0xff;
s->read_state = RW_STATE_WORD0;
break;
}
}
if (len > sizeof(ret))
len = sizeof(ret);
memcpy(data, (char *)&ret, len);
mutex_unlock(&pit_state->lock);
}
static int pit_in_range(struct kvm_io_device *this, gpa_t addr)
{
return ((addr >= KVM_PIT_BASE_ADDRESS) &&
(addr < KVM_PIT_BASE_ADDRESS + KVM_PIT_MEM_LENGTH));
}
static void speaker_ioport_write(struct kvm_io_device *this,
gpa_t addr, int len, const void *data)
{
struct kvm_pit *pit = (struct kvm_pit *)this->private;
struct kvm_kpit_state *pit_state = &pit->pit_state;
struct kvm *kvm = pit->kvm;
u32 val = *(u32 *) data;
mutex_lock(&pit_state->lock);
pit_state->speaker_data_on = (val >> 1) & 1;
pit_set_gate(kvm, 2, val & 1);
mutex_unlock(&pit_state->lock);
}
static void speaker_ioport_read(struct kvm_io_device *this,
gpa_t addr, int len, void *data)
{
struct kvm_pit *pit = (struct kvm_pit *)this->private;
struct kvm_kpit_state *pit_state = &pit->pit_state;
struct kvm *kvm = pit->kvm;
unsigned int refresh_clock;
int ret;
/* Refresh clock toggles at about 15us. We approximate as 2^14ns. */
refresh_clock = ((unsigned int)ktime_to_ns(ktime_get()) >> 14) & 1;
mutex_lock(&pit_state->lock);
ret = ((pit_state->speaker_data_on << 1) | pit_get_gate(kvm, 2) |
(pit_get_out(kvm, 2) << 5) | (refresh_clock << 4));
if (len > sizeof(ret))
len = sizeof(ret);
memcpy(data, (char *)&ret, len);
mutex_unlock(&pit_state->lock);
}
static int speaker_in_range(struct kvm_io_device *this, gpa_t addr)
{
return (addr == KVM_SPEAKER_BASE_ADDRESS);
}
void kvm_pit_reset(struct kvm_pit *pit)
{
int i;
struct kvm_kpit_channel_state *c;
mutex_lock(&pit->pit_state.lock);
for (i = 0; i < 3; i++) {
c = &pit->pit_state.channels[i];
c->mode = 0xff;
c->gate = (i != 2);
pit_load_count(pit->kvm, i, 0);
}
mutex_unlock(&pit->pit_state.lock);
atomic_set(&pit->pit_state.pit_timer.pending, 0);
pit->pit_state.inject_pending = 1;
}
struct kvm_pit *kvm_create_pit(struct kvm *kvm)
{
struct kvm_pit *pit;
struct kvm_kpit_state *pit_state;
pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
if (!pit)
return NULL;
mutex_init(&pit->pit_state.lock);
mutex_lock(&pit->pit_state.lock);
/* Initialize PIO device */
pit->dev.read = pit_ioport_read;
pit->dev.write = pit_ioport_write;
pit->dev.in_range = pit_in_range;
pit->dev.private = pit;
kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
pit->speaker_dev.read = speaker_ioport_read;
pit->speaker_dev.write = speaker_ioport_write;
pit->speaker_dev.in_range = speaker_in_range;
pit->speaker_dev.private = pit;
kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
kvm->arch.vpit = pit;
pit->kvm = kvm;
pit_state = &pit->pit_state;
pit_state->pit = pit;
hrtimer_init(&pit_state->pit_timer.timer,
CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
mutex_unlock(&pit->pit_state.lock);
kvm_pit_reset(pit);
return pit;
}
void kvm_free_pit(struct kvm *kvm)
{
struct hrtimer *timer;
if (kvm->arch.vpit) {
mutex_lock(&kvm->arch.vpit->pit_state.lock);
timer = &kvm->arch.vpit->pit_state.pit_timer.timer;
hrtimer_cancel(timer);
mutex_unlock(&kvm->arch.vpit->pit_state.lock);
kfree(kvm->arch.vpit);
}
}
void __inject_pit_timer_intr(struct kvm *kvm)
{
mutex_lock(&kvm->lock);
kvm_ioapic_set_irq(kvm->arch.vioapic, 0, 1);
kvm_ioapic_set_irq(kvm->arch.vioapic, 0, 0);
kvm_pic_set_irq(pic_irqchip(kvm), 0, 1);
kvm_pic_set_irq(pic_irqchip(kvm), 0, 0);
mutex_unlock(&kvm->lock);
}
void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
{
struct kvm_pit *pit = vcpu->kvm->arch.vpit;
struct kvm *kvm = vcpu->kvm;
struct kvm_kpit_state *ps;
if (vcpu && pit) {
ps = &pit->pit_state;
/* Try to inject pending interrupts when:
* 1. A pending interrupt exists
* 2. The last interrupt was accepted, or we have waited too long */
if (atomic_read(&ps->pit_timer.pending) &&
(ps->inject_pending ||
(jiffies - ps->last_injected_time
>= KVM_MAX_PIT_INTR_INTERVAL))) {
ps->inject_pending = 0;
__inject_pit_timer_intr(kvm);
ps->last_injected_time = jiffies;
}
}
}
void kvm_pit_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
{
struct kvm_arch *arch = &vcpu->kvm->arch;
struct kvm_kpit_state *ps;
if (vcpu && arch->vpit) {
ps = &arch->vpit->pit_state;
if (atomic_read(&ps->pit_timer.pending) &&
(((arch->vpic->pics[0].imr & 1) == 0 &&
arch->vpic->pics[0].irq_base == vec) ||
(arch->vioapic->redirtbl[0].fields.vector == vec &&
arch->vioapic->redirtbl[0].fields.mask != 1))) {
ps->inject_pending = 1;
atomic_dec(&ps->pit_timer.pending);
ps->channels[0].count_load_time = ktime_get();
}
}
}
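
The 96-bit intermediate used by muldiv64() in this file can be tried outside the kernel. The following is a minimal userspace sketch, assuming C99 fixed-width types in place of u64/u32 and ordinary 64-bit division in place of div64_64(); the name muldiv64_sketch is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute (a * b) / c without overflowing 64 bits in the product.
 * The product is kept as two partial results: rh holds the upper
 * 64 bits of the 96-bit value and the low 32 bits of rl hold the rest.
 */
static uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    uint64_t rl, rh, hi, lo;

    assert(c != 0);                      /* division by zero is not handled */

    rl = (a & 0xffffffffu) * (uint64_t)b; /* low half of a times b */
    rh = (a >> 32) * (uint64_t)b;         /* high half of a times b */
    rh += rl >> 32;                       /* carry from the low product */

    hi = rh / c;                          /* upper 32 bits of the quotient */
    /* rh % c is below 2^32, so shifting it left by 32 cannot overflow. */
    lo = (((rh % c) << 32) + (rl & 0xffffffffu)) / c;

    /* As in the original, quotient bits above 64 are silently dropped. */
    return (hi << 32) | (lo & 0xffffffffu);
}
```

With KVM_PIT_FREQ (1193181) and NSEC_PER_SEC (1000000000) as arguments, muldiv64_sketch(t, 1193181, 1000000000) converts elapsed nanoseconds to PIT ticks, which is how pit_get_count() and pit_get_out() use the helper; swapping the last two arguments converts a tick count back to nanoseconds, as create_pit_timer() does.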

63
arch/x86/kvm/i8254.h Normal file

@ -0,0 +1,63 @@
#ifndef __I8254_H
#define __I8254_H
#include "iodev.h"
struct kvm_kpit_timer {
struct hrtimer timer;
int irq;
s64 period; /* unit: ns */
s64 scheduled;
ktime_t last_update;
atomic_t pending;
};
struct kvm_kpit_channel_state {
u32 count; /* can be 65536 */
u16 latched_count;
u8 count_latched;
u8 status_latched;
u8 status;
u8 read_state;
u8 write_state;
u8 write_latch;
u8 rw_mode;
u8 mode;
u8 bcd; /* not supported */
u8 gate; /* timer start */
ktime_t count_load_time;
};
struct kvm_kpit_state {
struct kvm_kpit_channel_state channels[3];
struct kvm_kpit_timer pit_timer;
u32 speaker_data_on;
struct mutex lock;
struct kvm_pit *pit;
bool inject_pending; /* whether to inject pending interrupts */
unsigned long last_injected_time;
};
struct kvm_pit {
unsigned long base_addresss;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;
struct kvm_kpit_state pit_state;
};
#define KVM_PIT_BASE_ADDRESS 0x40
#define KVM_SPEAKER_BASE_ADDRESS 0x61
#define KVM_PIT_MEM_LENGTH 4
#define KVM_PIT_FREQ 1193181
#define KVM_MAX_PIT_INTR_INTERVAL (HZ / 100)
#define KVM_PIT_CHANNEL_MASK 0x3
void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_pit_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
struct kvm_pit *kvm_create_pit(struct kvm *kvm);
void kvm_free_pit(struct kvm *kvm);
void kvm_pit_reset(struct kvm_pit *pit);
#endif
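
The control-word handling in pit_ioport_write() (a write to port 0x43, i.e. offset 3 from KVM_PIT_BASE_ADDRESS) can be illustrated in isolation. This is a minimal sketch, assuming a standalone userspace build; struct pit_control and pit_decode_control() are hypothetical names used only for illustration and do not exist in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a kernel structure. */
struct pit_control {
    unsigned channel; /* bits 7..6: counter select, 3 = read-back command */
    unsigned access;  /* bits 5..4: 0 = latch, 1 = LSB, 2 = MSB, 3 = LSB then MSB */
    unsigned mode;    /* bits 3..1: operating mode, folded into 0..5 */
    unsigned bcd;     /* bit 0: BCD counting (not supported by the emulation) */
};

/*
 * Split a control byte the same way the "Select Counter" branch does.
 * For a read-back command (channel == 3) the lower bits carry a
 * different meaning, so access/mode/bcd should be ignored in that case.
 */
static void pit_decode_control(uint8_t val, struct pit_control *out)
{
    assert(out != NULL);

    out->channel = val >> 6;
    out->access = (val >> 4) & 0x3;
    out->mode = (val >> 1) & 0x7;
    if (out->mode > 5)
        out->mode -= 4;   /* encodings 6 and 7 alias modes 2 and 3 */
    out->bcd = val & 1;
}
```

For example, the byte 0x34 decodes to channel 0, LSB-then-MSB access, mode 2, which is the periodic case that pit_load_count() turns into a restarting hrtimer.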


@ -23,6 +23,22 @@
#include <linux/kvm_host.h>
#include "irq.h"
#include "i8254.h"
/*
* check if there are pending timer events
* to be processed.
*/
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
{
int ret;
ret = pit_has_pending_timer(vcpu);
ret |= apic_has_pending_timer(vcpu);
return ret;
}
EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
/*
* check if there is pending interrupt without
@ -66,6 +82,7 @@ EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
{
kvm_inject_apic_timer_irqs(vcpu);
kvm_inject_pit_timer_irqs(vcpu);
/* TODO: PIT, RTC etc. */
}
EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
@ -73,6 +90,7 @@ EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec)
{
kvm_apic_timer_intr_post(vcpu, vec);
kvm_pit_timer_intr_post(vcpu, vec);
/* TODO: PIT, RTC etc. */
}
EXPORT_SYMBOL_GPL(kvm_timer_intr_post);


@ -85,4 +85,7 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
int pit_has_pending_timer(struct kvm_vcpu *vcpu);
int apic_has_pending_timer(struct kvm_vcpu *vcpu);
#endif


@ -39,6 +39,8 @@ struct vcpu_svm {
unsigned long host_db_regs[NUM_DB_REGS];
unsigned long host_dr6;
unsigned long host_dr7;
u32 *msrpm;
};
#endif


@ -338,10 +338,10 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
} else
apic_clear_vector(vector, apic->regs + APIC_TMR);
if (vcpu->arch.mp_state == VCPU_MP_STATE_RUNNABLE)
if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
kvm_vcpu_kick(vcpu);
else if (vcpu->arch.mp_state == VCPU_MP_STATE_HALTED) {
vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE;
else if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
if (waitqueue_active(&vcpu->wq))
wake_up_interruptible(&vcpu->wq);
}
@ -362,11 +362,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
case APIC_DM_INIT:
if (level) {
if (vcpu->arch.mp_state == VCPU_MP_STATE_RUNNABLE)
if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
printk(KERN_DEBUG
"INIT on a runnable vcpu %d\n",
vcpu->vcpu_id);
vcpu->arch.mp_state = VCPU_MP_STATE_INIT_RECEIVED;
vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
kvm_vcpu_kick(vcpu);
} else {
printk(KERN_DEBUG
@ -379,9 +379,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
case APIC_DM_STARTUP:
printk(KERN_DEBUG "SIPI to vcpu %d vector 0x%02x\n",
vcpu->vcpu_id, vector);
if (vcpu->arch.mp_state == VCPU_MP_STATE_INIT_RECEIVED) {
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
vcpu->arch.sipi_vector = vector;
vcpu->arch.mp_state = VCPU_MP_STATE_SIPI_RECEIVED;
vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
if (waitqueue_active(&vcpu->wq))
wake_up_interruptible(&vcpu->wq);
}
@ -658,7 +658,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
PRIx64 ", "
"timer initial count 0x%x, period %lldns, "
"expire @ 0x%016" PRIx64 ".\n", __FUNCTION__,
"expire @ 0x%016" PRIx64 ".\n", __func__,
APIC_BUS_CYCLE_NS, ktime_to_ns(now),
apic_get_reg(apic, APIC_TMICT),
apic->timer.period,
@ -691,7 +691,7 @@ static void apic_mmio_write(struct kvm_io_device *this,
/* too common printing */
if (offset != APIC_EOI)
apic_debug("%s: offset 0x%x with length 0x%x, and value is "
"0x%x\n", __FUNCTION__, offset, len, val);
"0x%x\n", __func__, offset, len, val);
offset &= 0xff0;
@ -822,6 +822,7 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
apic_set_tpr(apic, ((cr8 & 0x0f) << 4)
| (apic_get_reg(apic, APIC_TASKPRI) & 4));
}
EXPORT_SYMBOL_GPL(kvm_lapic_set_tpr);
u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu)
{
@ -869,7 +870,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
struct kvm_lapic *apic;
int i;
apic_debug("%s\n", __FUNCTION__);
apic_debug("%s\n", __func__);
ASSERT(vcpu);
apic = vcpu->arch.apic;
@ -907,7 +908,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
apic_update_ppr(apic);
apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr="
"0x%016" PRIx64 ", base_address=0x%0lx.\n", __FUNCTION__,
"0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__,
vcpu, kvm_apic_id(apic),
vcpu->arch.apic_base, apic->base_address);
}
@ -940,7 +941,7 @@ static int __apic_timer_fn(struct kvm_lapic *apic)
atomic_inc(&apic->timer.pending);
if (waitqueue_active(q)) {
apic->vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE;
apic->vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
wake_up_interruptible(q);
}
if (apic_lvtt_period(apic)) {
@ -952,6 +953,16 @@ static int __apic_timer_fn(struct kvm_lapic *apic)
return result;
}
int apic_has_pending_timer(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *lapic = vcpu->arch.apic;
if (lapic)
return atomic_read(&lapic->timer.pending);
return 0;
}
static int __inject_apic_timer_irq(struct kvm_lapic *apic)
{
int vector;

File diff suppressed because it is too large.


@ -3,6 +3,12 @@
#include <linux/kvm_host.h>
#ifdef CONFIG_X86_64
#define TDP_ROOT_LEVEL PT64_ROOT_LEVEL
#else
#define TDP_ROOT_LEVEL PT32E_ROOT_LEVEL
#endif
static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
{
if (unlikely(vcpu->kvm->arch.n_free_mmu_pages < KVM_MIN_FREE_MMU_PAGES))


@ -130,7 +130,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
unsigned index, pt_access, pte_access;
gpa_t pte_gpa;
pgprintk("%s: addr %lx\n", __FUNCTION__, addr);
pgprintk("%s: addr %lx\n", __func__, addr);
walk:
walker->level = vcpu->arch.mmu.root_level;
pte = vcpu->arch.cr3;
@ -155,7 +155,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
pte_gpa += index * sizeof(pt_element_t);
walker->table_gfn[walker->level - 1] = table_gfn;
walker->pte_gpa[walker->level - 1] = pte_gpa;
pgprintk("%s: table_gfn[%d] %lx\n", __FUNCTION__,
pgprintk("%s: table_gfn[%d] %lx\n", __func__,
walker->level - 1, table_gfn);
kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
@ -222,7 +222,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
walker->pt_access = pt_access;
walker->pte_access = pte_access;
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
__FUNCTION__, (u64)pte, pt_access, pte_access);
__func__, (u64)pte, pt_access, pte_access);
return 1;
not_present:
@ -243,31 +243,30 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
}
static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
u64 *spte, const void *pte, int bytes,
int offset_in_pte)
u64 *spte, const void *pte)
{
pt_element_t gpte;
unsigned pte_access;
struct page *npage;
pfn_t pfn;
int largepage = vcpu->arch.update_pte.largepage;
gpte = *(const pt_element_t *)pte;
if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
if (!offset_in_pte && !is_present_pte(gpte))
if (!is_present_pte(gpte))
set_shadow_pte(spte, shadow_notrap_nonpresent_pte);
return;
}
if (bytes < sizeof(pt_element_t))
return;
pgprintk("%s: gpte %llx spte %p\n", __FUNCTION__, (u64)gpte, spte);
pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
pte_access = page->role.access & FNAME(gpte_access)(vcpu, gpte);
if (gpte_to_gfn(gpte) != vcpu->arch.update_pte.gfn)
return;
npage = vcpu->arch.update_pte.page;
if (!npage)
pfn = vcpu->arch.update_pte.pfn;
if (is_error_pfn(pfn))
return;
get_page(npage);
kvm_get_pfn(pfn);
mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
gpte & PT_DIRTY_MASK, NULL, gpte_to_gfn(gpte), npage);
gpte & PT_DIRTY_MASK, NULL, largepage, gpte_to_gfn(gpte),
pfn, true);
}
/*
@ -275,8 +274,8 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
*/
static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct guest_walker *walker,
int user_fault, int write_fault, int *ptwrite,
struct page *page)
int user_fault, int write_fault, int largepage,
int *ptwrite, pfn_t pfn)
{
hpa_t shadow_addr;
int level;
@ -304,11 +303,19 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
shadow_ent = ((u64 *)__va(shadow_addr)) + index;
if (level == PT_PAGE_TABLE_LEVEL)
break;
if (is_shadow_present_pte(*shadow_ent)) {
if (largepage && level == PT_DIRECTORY_LEVEL)
break;
if (is_shadow_present_pte(*shadow_ent)
&& !is_large_pte(*shadow_ent)) {
shadow_addr = *shadow_ent & PT64_BASE_ADDR_MASK;
continue;
}
if (is_large_pte(*shadow_ent))
rmap_remove(vcpu->kvm, shadow_ent);
if (level - 1 == PT_PAGE_TABLE_LEVEL
&& walker->level == PT_DIRECTORY_LEVEL) {
metaphysical = 1;
@ -329,7 +336,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
walker->pte_gpa[level - 2],
&curr_pte, sizeof(curr_pte));
if (r || curr_pte != walker->ptes[level - 2]) {
kvm_release_page_clean(page);
kvm_release_pfn_clean(pfn);
return NULL;
}
}
@ -342,7 +349,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
mmu_set_spte(vcpu, shadow_ent, access, walker->pte_access & access,
user_fault, write_fault,
walker->ptes[walker->level-1] & PT_DIRTY_MASK,
ptwrite, walker->gfn, page);
ptwrite, largepage, walker->gfn, pfn, false);
return shadow_ent;
}
@ -371,16 +378,16 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
u64 *shadow_pte;
int write_pt = 0;
int r;
struct page *page;
pfn_t pfn;
int largepage = 0;
pgprintk("%s: addr %lx err %x\n", __FUNCTION__, addr, error_code);
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
kvm_mmu_audit(vcpu, "pre page fault");
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
down_read(&vcpu->kvm->slots_lock);
/*
* Look up the shadow pte for the faulting address.
*/
@ -391,40 +398,45 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
* The page is not mapped by the guest. Let the guest handle it.
*/
if (!r) {
pgprintk("%s: guest page fault\n", __FUNCTION__);
pgprintk("%s: guest page fault\n", __func__);
inject_page_fault(vcpu, addr, walker.error_code);
vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
up_read(&vcpu->kvm->slots_lock);
return 0;
}
down_read(&current->mm->mmap_sem);
page = gfn_to_page(vcpu->kvm, walker.gfn);
if (walker.level == PT_DIRECTORY_LEVEL) {
gfn_t large_gfn;
large_gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE-1);
if (is_largepage_backed(vcpu, large_gfn)) {
walker.gfn = large_gfn;
largepage = 1;
}
}
pfn = gfn_to_pfn(vcpu->kvm, walker.gfn);
up_read(&current->mm->mmap_sem);
/* mmio */
if (is_error_pfn(pfn)) {
pgprintk("gfn %x is mmio\n", walker.gfn);
kvm_release_pfn_clean(pfn);
return 1;
}
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_free_some_pages(vcpu);
shadow_pte = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
&write_pt, page);
pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __FUNCTION__,
largepage, &write_pt, pfn);
pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
shadow_pte, *shadow_pte, write_pt);
if (!write_pt)
vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
/*
* mmio: emulate if accessible, otherwise its a guest fault.
*/
if (shadow_pte && is_io_pte(*shadow_pte)) {
spin_unlock(&vcpu->kvm->mmu_lock);
up_read(&vcpu->kvm->slots_lock);
return 1;
}
++vcpu->stat.pf_fixed;
kvm_mmu_audit(vcpu, "post page fault (fixed)");
spin_unlock(&vcpu->kvm->mmu_lock);
up_read(&vcpu->kvm->slots_lock);
return write_pt;
}
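
The large-page path in page_fault() rounds the faulting gfn down to a huge-page boundary before looking up the pfn. The following is a minimal sketch of that rounding, assuming 2 MiB huge pages over 4 KiB base pages (512 pages per huge page); PAGES_PER_HPAGE_SKETCH, gfn_sketch_t and large_gfn_of() are hypothetical stand-ins for KVM_PAGES_PER_HPAGE, gfn_t and the inline expression in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: 2 MiB huge pages divided into 4 KiB base pages. */
#define PAGES_PER_HPAGE_SKETCH 512u

typedef uint64_t gfn_sketch_t;

/* Round a guest frame number down to the start of its huge page. */
static gfn_sketch_t large_gfn_of(gfn_sketch_t gfn)
{
    /* The mask trick is only valid for a power-of-two page count. */
    assert((PAGES_PER_HPAGE_SKETCH & (PAGES_PER_HPAGE_SKETCH - 1)) == 0);

    return gfn & ~(gfn_sketch_t)(PAGES_PER_HPAGE_SKETCH - 1);
}
```

In the patch the rounded value replaces walker.gfn only when is_largepage_backed() reports that the whole range is backed by a host large page; otherwise the original gfn is mapped with a normal 4 KiB shadow pte.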


@ -1,29 +0,0 @@
#ifndef __SEGMENT_DESCRIPTOR_H
#define __SEGMENT_DESCRIPTOR_H
struct segment_descriptor {
u16 limit_low;
u16 base_low;
u8 base_mid;
u8 type : 4;
u8 system : 1;
u8 dpl : 2;
u8 present : 1;
u8 limit_high : 4;
u8 avl : 1;
u8 long_mode : 1;
u8 default_op : 1;
u8 granularity : 1;
u8 base_high;
} __attribute__((packed));
#ifdef CONFIG_X86_64
/* LDT or TSS descriptor in the GDT. 16 bytes. */
struct segment_descriptor_64 {
struct segment_descriptor s;
u32 base_higher;
u32 pad_zero;
};
#endif
#endif


@ -47,6 +47,18 @@ MODULE_LICENSE("GPL");
#define SVM_FEATURE_LBRV (1 << 1)
#define SVM_DEATURE_SVML (1 << 2)
#define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
/* enable NPT for AMD64 and X86 with PAE */
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
static bool npt_enabled = true;
#else
static bool npt_enabled = false;
#endif
static int npt = 1;
module_param(npt, int, S_IRUGO);
static void kvm_reput_irq(struct vcpu_svm *svm);
static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
@ -54,8 +66,7 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
return container_of(vcpu, struct vcpu_svm, vcpu);
}
unsigned long iopm_base;
unsigned long msrpm_base;
static unsigned long iopm_base;
struct kvm_ldttss_desc {
u16 limit0;
@ -182,7 +193,7 @@ static inline void flush_guest_tlb(struct kvm_vcpu *vcpu)
static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
if (!(efer & EFER_LMA))
if (!npt_enabled && !(efer & EFER_LMA))
efer &= ~EFER_LME;
to_svm(vcpu)->vmcb->save.efer = efer | MSR_EFER_SVME_MASK;
@ -219,12 +230,12 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
if (!svm->next_rip) {
printk(KERN_DEBUG "%s: NOP\n", __FUNCTION__);
printk(KERN_DEBUG "%s: NOP\n", __func__);
return;
}
if (svm->next_rip - svm->vmcb->save.rip > MAX_INST_SIZE)
printk(KERN_ERR "%s: ip 0x%llx next 0x%llx\n",
__FUNCTION__,
__func__,
svm->vmcb->save.rip,
svm->next_rip);
@ -279,11 +290,7 @@ static void svm_hardware_enable(void *garbage)
struct svm_cpu_data *svm_data;
uint64_t efer;
#ifdef CONFIG_X86_64
struct desc_ptr gdt_descr;
#else
struct desc_ptr gdt_descr;
#endif
struct desc_struct *gdt;
int me = raw_smp_processor_id();
@ -302,7 +309,6 @@ static void svm_hardware_enable(void *garbage)
svm_data->asid_generation = 1;
svm_data->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
svm_data->next_asid = svm_data->max_asid + 1;
svm_features = cpuid_edx(SVM_CPUID_FUNC);
asm volatile ("sgdt %0" : "=m"(gdt_descr));
gdt = (struct desc_struct *)gdt_descr.address;
@ -361,12 +367,51 @@ static void set_msr_interception(u32 *msrpm, unsigned msr,
BUG();
}
static void svm_vcpu_init_msrpm(u32 *msrpm)
{
memset(msrpm, 0xff, PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER));
#ifdef CONFIG_X86_64
set_msr_interception(msrpm, MSR_GS_BASE, 1, 1);
set_msr_interception(msrpm, MSR_FS_BASE, 1, 1);
set_msr_interception(msrpm, MSR_KERNEL_GS_BASE, 1, 1);
set_msr_interception(msrpm, MSR_LSTAR, 1, 1);
set_msr_interception(msrpm, MSR_CSTAR, 1, 1);
set_msr_interception(msrpm, MSR_SYSCALL_MASK, 1, 1);
#endif
set_msr_interception(msrpm, MSR_K6_STAR, 1, 1);
set_msr_interception(msrpm, MSR_IA32_SYSENTER_CS, 1, 1);
set_msr_interception(msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
set_msr_interception(msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
}
static void svm_enable_lbrv(struct vcpu_svm *svm)
{
u32 *msrpm = svm->msrpm;
svm->vmcb->control.lbr_ctl = 1;
set_msr_interception(msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
set_msr_interception(msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
set_msr_interception(msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
}
static void svm_disable_lbrv(struct vcpu_svm *svm)
{
u32 *msrpm = svm->msrpm;
svm->vmcb->control.lbr_ctl = 0;
set_msr_interception(msrpm, MSR_IA32_LASTBRANCHFROMIP, 0, 0);
set_msr_interception(msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0);
set_msr_interception(msrpm, MSR_IA32_LASTINTFROMIP, 0, 0);
set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
}
static __init int svm_hardware_setup(void)
{
int cpu;
struct page *iopm_pages;
struct page *msrpm_pages;
void *iopm_va, *msrpm_va;
void *iopm_va;
int r;
iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
@ -379,41 +424,33 @@ static __init int svm_hardware_setup(void)
clear_bit(0x80, iopm_va); /* allow direct access to PC debug port */
iopm_base = page_to_pfn(iopm_pages) << PAGE_SHIFT;
msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
r = -ENOMEM;
if (!msrpm_pages)
goto err_1;
msrpm_va = page_address(msrpm_pages);
memset(msrpm_va, 0xff, PAGE_SIZE * (1 << MSRPM_ALLOC_ORDER));
msrpm_base = page_to_pfn(msrpm_pages) << PAGE_SHIFT;
#ifdef CONFIG_X86_64
set_msr_interception(msrpm_va, MSR_GS_BASE, 1, 1);
set_msr_interception(msrpm_va, MSR_FS_BASE, 1, 1);
set_msr_interception(msrpm_va, MSR_KERNEL_GS_BASE, 1, 1);
set_msr_interception(msrpm_va, MSR_LSTAR, 1, 1);
set_msr_interception(msrpm_va, MSR_CSTAR, 1, 1);
set_msr_interception(msrpm_va, MSR_SYSCALL_MASK, 1, 1);
#endif
set_msr_interception(msrpm_va, MSR_K6_STAR, 1, 1);
set_msr_interception(msrpm_va, MSR_IA32_SYSENTER_CS, 1, 1);
set_msr_interception(msrpm_va, MSR_IA32_SYSENTER_ESP, 1, 1);
set_msr_interception(msrpm_va, MSR_IA32_SYSENTER_EIP, 1, 1);
if (boot_cpu_has(X86_FEATURE_NX))
kvm_enable_efer_bits(EFER_NX);
for_each_online_cpu(cpu) {
r = svm_cpu_init(cpu);
if (r)
goto err_2;
goto err;
}
svm_features = cpuid_edx(SVM_CPUID_FUNC);
if (!svm_has(SVM_FEATURE_NPT))
npt_enabled = false;
if (npt_enabled && !npt) {
printk(KERN_INFO "kvm: Nested Paging disabled\n");
npt_enabled = false;
}
if (npt_enabled) {
printk(KERN_INFO "kvm: Nested Paging enabled\n");
kvm_enable_tdp();
}
return 0;
err_2:
__free_pages(msrpm_pages, MSRPM_ALLOC_ORDER);
msrpm_base = 0;
err_1:
err:
__free_pages(iopm_pages, IOPM_ALLOC_ORDER);
iopm_base = 0;
return r;
@ -421,9 +458,8 @@ static __init int svm_hardware_setup(void)
static __exit void svm_hardware_unsetup(void)
{
__free_pages(pfn_to_page(msrpm_base >> PAGE_SHIFT), MSRPM_ALLOC_ORDER);
__free_pages(pfn_to_page(iopm_base >> PAGE_SHIFT), IOPM_ALLOC_ORDER);
iopm_base = msrpm_base = 0;
iopm_base = 0;
}
static void init_seg(struct vmcb_seg *seg)
@ -443,15 +479,14 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t type)
seg->base = 0;
}
static void init_vmcb(struct vmcb *vmcb)
static void init_vmcb(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &vmcb->control;
struct vmcb_save_area *save = &vmcb->save;
struct vmcb_control_area *control = &svm->vmcb->control;
struct vmcb_save_area *save = &svm->vmcb->save;
control->intercept_cr_read = INTERCEPT_CR0_MASK |
INTERCEPT_CR3_MASK |
INTERCEPT_CR4_MASK |
INTERCEPT_CR8_MASK;
INTERCEPT_CR4_MASK;
control->intercept_cr_write = INTERCEPT_CR0_MASK |
INTERCEPT_CR3_MASK |
@ -471,23 +506,13 @@ static void init_vmcb(struct vmcb *vmcb)
INTERCEPT_DR7_MASK;
control->intercept_exceptions = (1 << PF_VECTOR) |
(1 << UD_VECTOR);
(1 << UD_VECTOR) |
(1 << MC_VECTOR);
control->intercept = (1ULL << INTERCEPT_INTR) |
(1ULL << INTERCEPT_NMI) |
(1ULL << INTERCEPT_SMI) |
/*
* selective cr0 intercept bug?
* 0: 0f 22 d8 mov %eax,%cr3
* 3: 0f 20 c0 mov %cr0,%eax
* 6: 0d 00 00 00 80 or $0x80000000,%eax
* b: 0f 22 c0 mov %eax,%cr0
* set cr3 ->interception
* get cr0 ->interception
* set cr0 -> no interception
*/
/* (1ULL << INTERCEPT_SELECTIVE_CR0) | */
(1ULL << INTERCEPT_CPUID) |
(1ULL << INTERCEPT_INVD) |
(1ULL << INTERCEPT_HLT) |
@ -508,7 +533,7 @@ static void init_vmcb(struct vmcb *vmcb)
(1ULL << INTERCEPT_MWAIT);
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = msrpm_base;
control->msrpm_base_pa = __pa(svm->msrpm);
control->tsc_offset = 0;
control->int_ctl = V_INTR_MASKING_MASK;
@ -550,13 +575,30 @@ static void init_vmcb(struct vmcb *vmcb)
save->cr0 = 0x00000010 | X86_CR0_PG | X86_CR0_WP;
save->cr4 = X86_CR4_PAE;
/* rdx = ?? */
if (npt_enabled) {
/* Setup VMCB for Nested Paging */
control->nested_ctl = 1;
control->intercept &= ~(1ULL << INTERCEPT_TASK_SWITCH);
control->intercept_exceptions &= ~(1 << PF_VECTOR);
control->intercept_cr_read &= ~(INTERCEPT_CR0_MASK|
INTERCEPT_CR3_MASK);
control->intercept_cr_write &= ~(INTERCEPT_CR0_MASK|
INTERCEPT_CR3_MASK);
save->g_pat = 0x0007040600070406ULL;
/* enable caching because the QEMU Bios doesn't enable it */
save->cr0 = X86_CR0_ET;
save->cr3 = 0;
save->cr4 = 0;
}
force_new_asid(&svm->vcpu);
}
static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
init_vmcb(svm->vmcb);
init_vmcb(svm);
if (vcpu->vcpu_id != 0) {
svm->vmcb->save.rip = 0;
@ -571,6 +613,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
{
struct vcpu_svm *svm;
struct page *page;
struct page *msrpm_pages;
int err;
svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
@ -589,12 +632,19 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
goto uninit;
}
err = -ENOMEM;
msrpm_pages = alloc_pages(GFP_KERNEL, MSRPM_ALLOC_ORDER);
if (!msrpm_pages)
goto uninit;
svm->msrpm = page_address(msrpm_pages);
svm_vcpu_init_msrpm(svm->msrpm);
svm->vmcb = page_address(page);
clear_page(svm->vmcb);
svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
svm->asid_generation = 0;
memset(svm->db_regs, 0, sizeof(svm->db_regs));
init_vmcb(svm->vmcb);
init_vmcb(svm);
fx_init(&svm->vcpu);
svm->vcpu.fpu_active = 1;
@ -617,6 +667,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
}
@ -731,6 +782,13 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
var->unusable = !var->present;
}
static int svm_get_cpl(struct kvm_vcpu *vcpu)
{
struct vmcb_save_area *save = &to_svm(vcpu)->vmcb->save;
return save->cpl;
}
static void svm_get_idt(struct kvm_vcpu *vcpu, struct descriptor_table *dt)
{
struct vcpu_svm *svm = to_svm(vcpu);
@ -784,6 +842,9 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
}
}
#endif
if (npt_enabled)
goto set;
if ((vcpu->arch.cr0 & X86_CR0_TS) && !(cr0 & X86_CR0_TS)) {
svm->vmcb->control.intercept_exceptions &= ~(1 << NM_VECTOR);
vcpu->fpu_active = 1;
@ -791,18 +852,29 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
vcpu->arch.cr0 = cr0;
cr0 |= X86_CR0_PG | X86_CR0_WP;
cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
if (!vcpu->fpu_active) {
svm->vmcb->control.intercept_exceptions |= (1 << NM_VECTOR);
cr0 |= X86_CR0_TS;
}
set:
/*
* re-enable caching here because the QEMU bios
* does not do it - this results in some delay at
* reboot
*/
cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
svm->vmcb->save.cr0 = cr0;
}
static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
vcpu->arch.cr4 = cr4;
to_svm(vcpu)->vmcb->save.cr4 = cr4 | X86_CR4_PAE;
unsigned long host_cr4_mce = read_cr4() & X86_CR4_MCE;
vcpu->arch.cr4 = cr4;
if (!npt_enabled)
cr4 |= X86_CR4_PAE;
cr4 |= host_cr4_mce;
to_svm(vcpu)->vmcb->save.cr4 = cr4;
}
static void svm_set_segment(struct kvm_vcpu *vcpu,
@ -833,13 +905,6 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
}
/* FIXME:
svm(vcpu)->vmcb->control.int_ctl &= ~V_TPR_MASK;
svm(vcpu)->vmcb->control.int_ctl |= (sregs->cr8 & V_TPR_MASK);
*/
static int svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_debug_guest *dbg)
{
return -EOPNOTSUPP;
@ -920,7 +985,7 @@ static void svm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long value,
}
default:
printk(KERN_DEBUG "%s: unexpected dr %u\n",
__FUNCTION__, dr);
__func__, dr);
*exception = UD_VECTOR;
return;
}
@ -962,6 +1027,19 @@ static int nm_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
return 1;
}
static int mc_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
{
/*
* On an #MC intercept the MCE handler is not called automatically in
* the host. So do it by hand here.
*/
asm volatile (
"int $0x12\n");
/* not sure if we ever come back to this point */
return 1;
}
static int shutdown_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
{
/*
@ -969,7 +1047,7 @@ static int shutdown_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
* so reinitialize it.
*/
clear_page(svm->vmcb);
init_vmcb(svm->vmcb);
init_vmcb(svm);
kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
return 0;
@ -1033,9 +1111,18 @@ static int invalid_op_interception(struct vcpu_svm *svm,
static int task_switch_interception(struct vcpu_svm *svm,
struct kvm_run *kvm_run)
{
pr_unimpl(&svm->vcpu, "%s: task switch is unsupported\n", __FUNCTION__);
kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
return 0;
u16 tss_selector;
tss_selector = (u16)svm->vmcb->control.exit_info_1;
if (svm->vmcb->control.exit_info_2 &
(1ULL << SVM_EXITINFOSHIFT_TS_REASON_IRET))
return kvm_task_switch(&svm->vcpu, tss_selector,
TASK_SWITCH_IRET);
if (svm->vmcb->control.exit_info_2 &
(1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
return kvm_task_switch(&svm->vcpu, tss_selector,
TASK_SWITCH_JMP);
return kvm_task_switch(&svm->vcpu, tss_selector, TASK_SWITCH_CALL);
}
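
The reason decoding in task_switch_interception() can be tried in isolation. The following is a minimal user-space sketch, assuming the two bit positions from the svm.h hunk of this commit; the enum and the helper names are hypothetical stand-ins for the kernel's TASK_SWITCH_* constants and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as defined in the svm.h hunk of this commit. */
#define SVM_EXITINFOSHIFT_TS_REASON_IRET 36
#define SVM_EXITINFOSHIFT_TS_REASON_JMP  38

/* Both shifts are applied to a 64-bit value, so they must stay below 64. */
static_assert(SVM_EXITINFOSHIFT_TS_REASON_JMP < 64, "shift must fit in u64");

/* Hypothetical stand-ins for TASK_SWITCH_CALL/IRET/JMP. */
enum ts_reason { TS_CALL, TS_IRET, TS_JMP };

/* Same test order as the handler: IRET first, then JMP, else CALL. */
static enum ts_reason decode_ts_reason(uint64_t exit_info_2)
{
	if (exit_info_2 & (1ULL << SVM_EXITINFOSHIFT_TS_REASON_IRET))
		return TS_IRET;
	if (exit_info_2 & (1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
		return TS_JMP;
	return TS_CALL;
}

/* The handler keeps only the low 16 bits of exit_info_1 as the selector. */
static uint16_t decode_tss_selector(uint64_t exit_info_1)
{
	return (uint16_t)exit_info_1;
}
```

Anything that is neither an IRET nor a JMP falls through to the CALL case, which mirrors the final return statement of the handler.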
static int cpuid_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
@ -1049,7 +1136,7 @@ static int emulate_on_interception(struct vcpu_svm *svm,
struct kvm_run *kvm_run)
{
if (emulate_instruction(&svm->vcpu, NULL, 0, 0, 0) != EMULATE_DONE)
pr_unimpl(&svm->vcpu, "%s: failed\n", __FUNCTION__);
pr_unimpl(&svm->vcpu, "%s: failed\n", __func__);
return 1;
}
@ -1179,8 +1266,19 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
svm->vmcb->save.sysenter_esp = data;
break;
case MSR_IA32_DEBUGCTLMSR:
pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
__FUNCTION__, data);
if (!svm_has(SVM_FEATURE_LBRV)) {
pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTL 0x%llx, nop\n",
__func__, data);
break;
}
if (data & DEBUGCTL_RESERVED_BITS)
return 1;
svm->vmcb->save.dbgctl = data;
if (data & (1ULL<<0))
svm_enable_lbrv(svm);
else
svm_disable_lbrv(svm);
break;
case MSR_K7_EVNTSEL0:
case MSR_K7_EVNTSEL1:
@ -1265,6 +1363,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
[SVM_EXIT_EXCP_BASE + UD_VECTOR] = ud_interception,
[SVM_EXIT_EXCP_BASE + PF_VECTOR] = pf_interception,
[SVM_EXIT_EXCP_BASE + NM_VECTOR] = nm_interception,
[SVM_EXIT_EXCP_BASE + MC_VECTOR] = mc_interception,
[SVM_EXIT_INTR] = nop_on_interception,
[SVM_EXIT_NMI] = nop_on_interception,
[SVM_EXIT_SMI] = nop_on_interception,
@ -1290,14 +1389,34 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
[SVM_EXIT_WBINVD] = emulate_on_interception,
[SVM_EXIT_MONITOR] = invalid_op_interception,
[SVM_EXIT_MWAIT] = invalid_op_interception,
[SVM_EXIT_NPF] = pf_interception,
};
static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
u32 exit_code = svm->vmcb->control.exit_code;
if (npt_enabled) {
int mmu_reload = 0;
if ((vcpu->arch.cr0 ^ svm->vmcb->save.cr0) & X86_CR0_PG) {
svm_set_cr0(vcpu, svm->vmcb->save.cr0);
mmu_reload = 1;
}
vcpu->arch.cr0 = svm->vmcb->save.cr0;
vcpu->arch.cr3 = svm->vmcb->save.cr3;
if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
if (!load_pdptrs(vcpu, vcpu->arch.cr3)) {
kvm_inject_gp(vcpu, 0);
return 1;
}
}
if (mmu_reload) {
kvm_mmu_reset_context(vcpu);
kvm_mmu_load(vcpu);
}
}
kvm_reput_irq(svm);
if (svm->vmcb->control.exit_code == SVM_EXIT_ERR) {
@ -1308,10 +1427,11 @@ static int handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
}
if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR)
exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
exit_code != SVM_EXIT_NPF)
printk(KERN_ERR "%s: unexpected exit_ini_info 0x%x "
"exit_code 0x%x\n",
__FUNCTION__, svm->vmcb->control.exit_int_info,
__func__, svm->vmcb->control.exit_int_info,
exit_code);
if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
@ -1364,6 +1484,27 @@ static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
svm_inject_irq(svm, irq);
}
static void update_cr8_intercept(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
int max_irr, tpr;
if (!irqchip_in_kernel(vcpu->kvm) || vcpu->arch.apic->vapic_addr)
return;
vmcb->control.intercept_cr_write &= ~INTERCEPT_CR8_MASK;
max_irr = kvm_lapic_find_highest_irr(vcpu);
if (max_irr == -1)
return;
tpr = kvm_lapic_get_cr8(vcpu) << 4;
if (tpr >= (max_irr & 0xf0))
vmcb->control.intercept_cr_write |= INTERCEPT_CR8_MASK;
}
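
The decision made by update_cr8_intercept() comes down to comparing the task priority with the priority class of the highest pending interrupt. A minimal sketch of that comparison follows; the function name is hypothetical, and the irqchip and vapic_addr checks from the kernel function are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when CR8 writes should be intercepted.
 *
 * max_irr: highest pending vector, or -1 if nothing is pending
 * cr8:     guest CR8 value (0..15), i.e. TPR bits 7:4
 */
static bool need_cr8_write_intercept(int max_irr, unsigned int cr8)
{
	int tpr;

	assert(max_irr >= -1 && max_irr <= 255);
	assert(cr8 <= 0xf);

	if (max_irr == -1)
		return false;		/* nothing pending, nothing to unmask */

	tpr = (int)(cr8 << 4);		/* expand CR8 to the TPR priority class */

	/* Intercept only while the TPR is masking the pending vector. */
	return tpr >= (max_irr & 0xf0);
}
```

For example, with CR8 = 5 a pending vector 0x51 is masked (0x50 >= 0x50), so the guest's next CR8 write has to be seen by the hypervisor, while a pending vector 0x61 is deliverable and no intercept is needed.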
static void svm_intr_assist(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@ -1376,14 +1517,14 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
SVM_EVTINJ_VEC_MASK;
vmcb->control.exit_int_info = 0;
svm_inject_irq(svm, intr_vector);
return;
goto out;
}
if (vmcb->control.int_ctl & V_IRQ_MASK)
return;
goto out;
if (!kvm_cpu_has_interrupt(vcpu))
return;
goto out;
if (!(vmcb->save.rflags & X86_EFLAGS_IF) ||
(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
@ -1391,12 +1532,14 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu)
/* unable to deliver irq, set pending irq */
vmcb->control.intercept |= (1ULL << INTERCEPT_VINTR);
svm_inject_irq(svm, 0x0);
return;
goto out;
}
/* Okay, we can deliver the interrupt: grab it and update PIC state. */
intr_vector = kvm_cpu_get_interrupt(vcpu);
svm_inject_irq(svm, intr_vector);
kvm_timer_intr_post(vcpu, intr_vector);
out:
update_cr8_intercept(vcpu);
}
static void kvm_reput_irq(struct vcpu_svm *svm)
@ -1482,6 +1625,29 @@ static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
{
}
static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
if (!(svm->vmcb->control.intercept_cr_write & INTERCEPT_CR8_MASK)) {
int cr8 = svm->vmcb->control.int_ctl & V_TPR_MASK;
kvm_lapic_set_tpr(vcpu, cr8);
}
}
static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
u64 cr8;
if (!irqchip_in_kernel(vcpu->kvm))
return;
cr8 = kvm_get_cr8(vcpu);
svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
}
static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
struct vcpu_svm *svm = to_svm(vcpu);
@ -1491,6 +1657,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
pre_svm_run(svm);
sync_lapic_to_cr8(vcpu);
save_host_msrs(vcpu);
fs_selector = read_fs();
gs_selector = read_gs();
@ -1499,6 +1667,9 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
svm->host_dr6 = read_dr6();
svm->host_dr7 = read_dr7();
svm->vmcb->save.cr2 = vcpu->arch.cr2;
/* required for live migration with NPT */
if (npt_enabled)
svm->vmcb->save.cr3 = vcpu->arch.cr3;
if (svm->vmcb->save.dr7 & 0xff) {
write_dr7(0);
@ -1635,6 +1806,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
stgi();
sync_cr8_to_lapic(vcpu);
svm->next_rip = 0;
}
@ -1642,6 +1815,12 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
{
struct vcpu_svm *svm = to_svm(vcpu);
if (npt_enabled) {
svm->vmcb->control.nested_cr3 = root;
force_new_asid(vcpu);
return;
}
svm->vmcb->save.cr3 = root;
force_new_asid(vcpu);
@ -1709,6 +1888,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.get_segment_base = svm_get_segment_base,
.get_segment = svm_get_segment,
.set_segment = svm_set_segment,
.get_cpl = svm_get_cpl,
.get_cs_db_l_bits = kvm_get_cs_db_l_bits,
.decache_cr4_guest_bits = svm_decache_cr4_guest_bits,
.set_cr0 = svm_set_cr0,

@ -238,6 +238,9 @@ struct __attribute__ ((__packed__)) vmcb {
#define SVM_EXITINTINFO_VALID SVM_EVTINJ_VALID
#define SVM_EXITINTINFO_VALID_ERR SVM_EVTINJ_VALID_ERR
#define SVM_EXITINFOSHIFT_TS_REASON_IRET 36
#define SVM_EXITINFOSHIFT_TS_REASON_JMP 38
#define SVM_EXIT_READ_CR0 0x000
#define SVM_EXIT_READ_CR3 0x003
#define SVM_EXIT_READ_CR4 0x004

arch/x86/kvm/tss.h (new file, 59 lines)

@ -0,0 +1,59 @@
#ifndef __TSS_SEGMENT_H
#define __TSS_SEGMENT_H
struct tss_segment_32 {
u32 prev_task_link;
u32 esp0;
u32 ss0;
u32 esp1;
u32 ss1;
u32 esp2;
u32 ss2;
u32 cr3;
u32 eip;
u32 eflags;
u32 eax;
u32 ecx;
u32 edx;
u32 ebx;
u32 esp;
u32 ebp;
u32 esi;
u32 edi;
u32 es;
u32 cs;
u32 ss;
u32 ds;
u32 fs;
u32 gs;
u32 ldt_selector;
u16 t;
u16 io_map;
};
struct tss_segment_16 {
u16 prev_task_link;
u16 sp0;
u16 ss0;
u16 sp1;
u16 ss1;
u16 sp2;
u16 ss2;
u16 ip;
u16 flag;
u16 ax;
u16 cx;
u16 dx;
u16 bx;
u16 sp;
u16 bp;
u16 si;
u16 di;
u16 es;
u16 cs;
u16 ss;
u16 ds;
u16 ldt;
};
#endif
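
Since these structures are meant to overlay the in-memory task state segments, their sizes and field offsets matter. The sketch below restates the two layouts compactly in user space (the u16/u32 typedefs are stand-ins for the kernel types) and checks them at compile time against the architectural sizes of 104 and 44 bytes; the checks themselves are an illustration and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's fixed-width types. */
typedef uint16_t u16;
typedef uint32_t u32;

/* Same field order as tss.h above, written compactly. */
struct tss_segment_32 {
	u32 prev_task_link;
	u32 esp0, ss0, esp1, ss1, esp2, ss2;
	u32 cr3, eip, eflags;
	u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
	u32 es, cs, ss, ds, fs, gs;
	u32 ldt_selector;
	u16 t;
	u16 io_map;
};

struct tss_segment_16 {
	u16 prev_task_link;
	u16 sp0, ss0, sp1, ss1, sp2, ss2;
	u16 ip, flag;
	u16 ax, cx, dx, bx, sp, bp, si, di;
	u16 es, cs, ss, ds;
	u16 ldt;
};

/* Compile-time layout checks: no padding, architectural offsets. */
static_assert(sizeof(struct tss_segment_32) == 104, "32-bit TSS is 104 bytes");
static_assert(offsetof(struct tss_segment_32, cr3) == 28, "CR3 at byte 28");
static_assert(offsetof(struct tss_segment_32, io_map) == 102,
	      "I/O map base at byte 102");
static_assert(sizeof(struct tss_segment_16) == 44, "16-bit TSS is 44 bytes");
static_assert(offsetof(struct tss_segment_16, ldt) == 42, "LDT at byte 42");
```

Because every member is naturally aligned, the compiler inserts no padding and the structures need no packed attribute for these checks to hold.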

@ -17,7 +17,6 @@
#include "irq.h"
#include "vmx.h"
#include "segment_descriptor.h"
#include "mmu.h"
#include <linux/kvm_host.h>
@ -37,6 +36,12 @@ MODULE_LICENSE("GPL");
static int bypass_guest_pf = 1;
module_param(bypass_guest_pf, bool, 0);
static int enable_vpid = 1;
module_param(enable_vpid, bool, 0);
static int flexpriority_enabled = 1;
module_param(flexpriority_enabled, bool, 0);
struct vmcs {
u32 revision_id;
u32 abort;
@ -71,6 +76,7 @@ struct vcpu_vmx {
unsigned rip;
} irq;
} rmode;
int vpid;
};
static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@ -85,6 +91,10 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
static struct page *vmx_io_bitmap_a;
static struct page *vmx_io_bitmap_b;
static struct page *vmx_msr_bitmap;
static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
static DEFINE_SPINLOCK(vmx_vpid_lock);
static struct vmcs_config {
int size;
@ -176,6 +186,11 @@ static inline int is_external_interrupt(u32 intr_info)
== (INTR_TYPE_EXT_INTR | INTR_INFO_VALID_MASK);
}
static inline int cpu_has_vmx_msr_bitmap(void)
{
return (vmcs_config.cpu_based_exec_ctrl & CPU_BASED_USE_MSR_BITMAPS);
}
static inline int cpu_has_vmx_tpr_shadow(void)
{
return (vmcs_config.cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW);
@ -194,8 +209,9 @@ static inline int cpu_has_secondary_exec_ctrls(void)
static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
{
return (vmcs_config.cpu_based_2nd_exec_ctrl &
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
return flexpriority_enabled
&& (vmcs_config.cpu_based_2nd_exec_ctrl &
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
}
static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
@ -204,6 +220,12 @@ static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
(irqchip_in_kernel(kvm)));
}
static inline int cpu_has_vmx_vpid(void)
{
return (vmcs_config.cpu_based_2nd_exec_ctrl &
SECONDARY_EXEC_ENABLE_VPID);
}
static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
{
int i;
@ -214,6 +236,20 @@ static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
return -1;
}
static inline void __invvpid(int ext, u16 vpid, gva_t gva)
{
struct {
u64 vpid : 16;
u64 rsvd : 48;
u64 gva;
} operand = { vpid, 0, gva };
asm volatile (ASM_VMX_INVVPID
/* CF==1 or ZF==1 --> rc = -1 */
"; ja 1f ; ud2 ; 1:"
: : "a"(&operand), "c"(ext) : "cc", "memory");
}
static struct kvm_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
{
int i;
@ -257,6 +293,14 @@ static void vcpu_clear(struct vcpu_vmx *vmx)
vmx->launched = 0;
}
static inline void vpid_sync_vcpu_all(struct vcpu_vmx *vmx)
{
if (vmx->vpid == 0)
return;
__invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vmx->vpid, 0);
}
static unsigned long vmcs_readl(unsigned long field)
{
unsigned long value;
@ -353,7 +397,7 @@ static void reload_tss(void)
* VT restores TR but not its size. Useless.
*/
struct descriptor_table gdt;
struct segment_descriptor *descs;
struct desc_struct *descs;
get_gdt(&gdt);
descs = (void *)gdt.base;
@ -485,11 +529,12 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 phys_addr = __pa(vmx->vmcs);
u64 tsc_this, delta;
u64 tsc_this, delta, new_offset;
if (vcpu->cpu != cpu) {
vcpu_clear(vmx);
kvm_migrate_apic_timer(vcpu);
vpid_sync_vcpu_all(vmx);
}
if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
@ -524,8 +569,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
* Make sure the time stamp counter is monotonous.
*/
rdtscll(tsc_this);
delta = vcpu->arch.host_tsc - tsc_this;
vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + delta);
if (tsc_this < vcpu->arch.host_tsc) {
delta = vcpu->arch.host_tsc - tsc_this;
new_offset = vmcs_read64(TSC_OFFSET) + delta;
vmcs_write64(TSC_OFFSET, new_offset);
}
}
}
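
The guarded TSC_OFFSET update at the end of vmx_vcpu_load() can be modelled as a pure function. This is a minimal sketch under the assumption that the three values are plain 64-bit counters; the helper names are hypothetical. It shows why the comparison matters: without it, the unsigned subtraction would wrap whenever the current TSC is ahead of the saved one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the TSC offset to use after a vcpu load. */
static uint64_t adjust_tsc_offset(uint64_t offset, uint64_t saved_host_tsc,
				  uint64_t tsc_now)
{
	/* Compensate only when the host TSC appears to have gone backwards. */
	if (tsc_now < saved_host_tsc)
		offset += saved_host_tsc - tsc_now;
	return offset;
}

/* Usage: the two cases the guarded update distinguishes. */
static void tsc_offset_examples(void)
{
	/* The new CPU's TSC is 100 ticks behind: the offset grows by 100. */
	assert(adjust_tsc_offset(100, 1000, 900) == 200);

	/* The TSC moved forward: the offset is left alone. */
	assert(adjust_tsc_offset(100, 1000, 1500) == 100);
}
```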
@ -596,7 +644,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
{
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
nr | INTR_TYPE_EXCEPTION
| (has_error_code ? INTR_INFO_DELIEVER_CODE_MASK : 0)
| (has_error_code ? INTR_INFO_DELIVER_CODE_MASK : 0)
| INTR_INFO_VALID_MASK);
if (has_error_code)
vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
@ -959,6 +1007,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
CPU_BASED_MOV_DR_EXITING |
CPU_BASED_USE_TSC_OFFSETING;
opt = CPU_BASED_TPR_SHADOW |
CPU_BASED_USE_MSR_BITMAPS |
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
&_cpu_based_exec_control) < 0)
@ -971,7 +1020,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
min = 0;
opt = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
SECONDARY_EXEC_WBINVD_EXITING;
SECONDARY_EXEC_WBINVD_EXITING |
SECONDARY_EXEC_ENABLE_VPID;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS2,
&_cpu_based_2nd_exec_control) < 0)
return -EIO;
@ -1080,6 +1130,10 @@ static __init int hardware_setup(void)
{
if (setup_vmcs_config(&vmcs_config) < 0)
return -EIO;
if (boot_cpu_has(X86_FEATURE_NX))
kvm_enable_efer_bits(EFER_NX);
return alloc_kvm_area();
}
@ -1214,7 +1268,7 @@ static void enter_lmode(struct kvm_vcpu *vcpu)
guest_tr_ar = vmcs_read32(GUEST_TR_AR_BYTES);
if ((guest_tr_ar & AR_TYPE_MASK) != AR_TYPE_BUSY_64_TSS) {
printk(KERN_DEBUG "%s: tss fixup for long mode. \n",
__FUNCTION__);
__func__);
vmcs_write32(GUEST_TR_AR_BYTES,
(guest_tr_ar & ~AR_TYPE_MASK)
| AR_TYPE_BUSY_64_TSS);
@ -1239,6 +1293,11 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
#endif
static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
{
vpid_sync_vcpu_all(to_vmx(vcpu));
}
static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
{
vcpu->arch.cr4 &= KVM_GUEST_CR4_MASK;
@ -1275,6 +1334,7 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
vmx_flush_tlb(vcpu);
vmcs_writel(GUEST_CR3, cr3);
if (vcpu->arch.cr0 & X86_CR0_PE)
vmx_fpu_deactivate(vcpu);
@ -1288,14 +1348,14 @@ static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
vcpu->arch.cr4 = cr4;
}
#ifdef CONFIG_X86_64
static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct kvm_msr_entry *msr = find_msr_entry(vmx, MSR_EFER);
vcpu->arch.shadow_efer = efer;
if (!msr)
return;
if (efer & EFER_LMA) {
vmcs_write32(VM_ENTRY_CONTROLS,
vmcs_read32(VM_ENTRY_CONTROLS) |
@ -1312,8 +1372,6 @@ static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
setup_msrs(vmx);
}
#endif
static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
{
struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
@ -1344,6 +1402,20 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
var->unusable = (ar >> 16) & 1;
}
static int vmx_get_cpl(struct kvm_vcpu *vcpu)
{
struct kvm_segment kvm_seg;
if (!(vcpu->arch.cr0 & X86_CR0_PE)) /* if real mode */
return 0;
if (vmx_get_rflags(vcpu) & X86_EFLAGS_VM) /* if virtual 8086 */
return 3;
vmx_get_segment(vcpu, &kvm_seg, VCPU_SREG_CS);
return kvm_seg.selector & 3;
}
static u32 vmx_segment_access_rights(struct kvm_segment *var)
{
u32 ar;
@ -1433,7 +1505,6 @@ static int init_rmode_tss(struct kvm *kvm)
int ret = 0;
int r;
down_read(&kvm->slots_lock);
r = kvm_clear_guest_page(kvm, fn, 0, PAGE_SIZE);
if (r < 0)
goto out;
@ -1456,7 +1527,6 @@ static int init_rmode_tss(struct kvm *kvm)
ret = 1;
out:
up_read(&kvm->slots_lock);
return ret;
}
@ -1494,6 +1564,46 @@ static int alloc_apic_access_page(struct kvm *kvm)
return r;
}
static void allocate_vpid(struct vcpu_vmx *vmx)
{
int vpid;
vmx->vpid = 0;
if (!enable_vpid || !cpu_has_vmx_vpid())
return;
spin_lock(&vmx_vpid_lock);
vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
if (vpid < VMX_NR_VPIDS) {
vmx->vpid = vpid;
__set_bit(vpid, vmx_vpid_bitmap);
}
spin_unlock(&vmx_vpid_lock);
}
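
The allocation scheme in allocate_vpid() is a first-free-bit search over a bitmap. Below is a single-threaded user-space sketch of the same idea. Several things are assumptions made for illustration: the pool is only 64 entries (the kernel uses VMX_NR_VPIDS), there is no lock (the kernel takes vmx_vpid_lock), bit 0 is reserved up front because the code above treats vpid 0 as "no VPID", and vpid_free() is a hypothetical counterpart that this excerpt does not show.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VPIDS 64	/* hypothetical small pool for the sketch */

/* Bit n set means ID n is taken. Bit 0 is pre-set: 0 means "no VPID". */
static uint64_t vpid_bitmap = 1;

/* Returns a free ID in 1..NR_VPIDS-1, or 0 when the pool is exhausted. */
static int vpid_alloc(void)
{
	for (int id = 0; id < NR_VPIDS; id++) {
		if (!(vpid_bitmap & (1ULL << id))) {
			vpid_bitmap |= 1ULL << id;	/* claim the first free bit */
			return id;
		}
	}
	return 0;
}

/* Hypothetical release path: hands an ID back to the pool. */
static void vpid_free(int id)
{
	assert(id > 0 && id < NR_VPIDS);
	vpid_bitmap &= ~(1ULL << id);
}
```

A caller that receives 0 simply runs without a VPID, which matches how the setup code above clears SECONDARY_EXEC_ENABLE_VPID when vmx->vpid is 0.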
void vmx_disable_intercept_for_msr(struct page *msr_bitmap, u32 msr)
{
void *va;
if (!cpu_has_vmx_msr_bitmap())
return;
/*
* See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
* have the write-low and read-high bitmap offsets the wrong way round.
* We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
*/
va = kmap(msr_bitmap);
if (msr <= 0x1fff) {
__clear_bit(msr, va + 0x000); /* read-low */
__clear_bit(msr, va + 0x800); /* write-low */
} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
msr &= 0x1fff;
__clear_bit(msr, va + 0x400); /* read-high */
__clear_bit(msr, va + 0xc00); /* write-high */
}
kunmap(msr_bitmap);
}
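
The bitmap layout used by vmx_disable_intercept_for_msr() can be exercised on an ordinary 4 KiB buffer. The sketch below is a user-space illustration, assuming little-endian bit numbering as on x86; the helper names are hypothetical, while the four region offsets and the two MSR ranges are taken from the function above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_SIZE 4096

/* Byte offsets of the four 1 KiB regions, as used in the function above. */
#define READ_LOW   0x000
#define READ_HIGH  0x400
#define WRITE_LOW  0x800
#define WRITE_HIGH 0xc00

/* Each 1 KiB region holds one bit per MSR of a 0x2000-entry range. */
static_assert(0x400 * 8 == 0x2000, "region size must match the MSR range");

/* Start with every bit set, i.e. every MSR access intercepted. */
static void msr_bitmap_init(uint8_t bitmap[MSR_BITMAP_SIZE])
{
	memset(bitmap, 0xff, MSR_BITMAP_SIZE);
}

static void clear_bit_in(uint8_t *base, uint32_t bit)
{
	base[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

/* Returns true if the MSR lies in a controllable range and was opened up. */
static bool msr_bitmap_allow(uint8_t bitmap[MSR_BITMAP_SIZE], uint32_t msr)
{
	if (msr <= 0x1fff) {
		clear_bit_in(bitmap + READ_LOW, msr);
		clear_bit_in(bitmap + WRITE_LOW, msr);
		return true;
	}
	if (msr >= 0xc0000000u && msr <= 0xc0001fffu) {
		msr &= 0x1fff;
		clear_bit_in(bitmap + READ_HIGH, msr);
		clear_bit_in(bitmap + WRITE_HIGH, msr);
		return true;
	}
	return false;	/* outside both ranges: cannot be controlled here */
}
```

For MSR 0x174 (IA32_SYSENTER_CS) this clears bit 4 of byte 0x2e in both low regions; for MSR 0xc0000100 (FS_BASE) it clears bit 0 of byte 0x20 in both high regions.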
/*
* Sets up the vmcs for emulated real mode.
*/
@ -1511,6 +1621,9 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
vmcs_write64(IO_BITMAP_A, page_to_phys(vmx_io_bitmap_a));
vmcs_write64(IO_BITMAP_B, page_to_phys(vmx_io_bitmap_b));
if (cpu_has_vmx_msr_bitmap())
vmcs_write64(MSR_BITMAP, page_to_phys(vmx_msr_bitmap));
vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
/* Control */
@ -1532,6 +1645,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
if (!vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
exec_control &=
~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
if (vmx->vpid == 0)
exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
}
@ -1613,6 +1728,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
u64 msr;
int ret;
down_read(&vcpu->kvm->slots_lock);
if (!init_rmode_tss(vmx->vcpu.kvm)) {
ret = -ENOMEM;
goto out;
@ -1621,7 +1737,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmx->vcpu.arch.rmode.active = 0;
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
set_cr8(&vmx->vcpu, 0);
kvm_set_cr8(&vmx->vcpu, 0);
msr = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
if (vmx->vcpu.vcpu_id == 0)
msr |= MSR_IA32_APICBASE_BSP;
@ -1704,18 +1820,22 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write64(APIC_ACCESS_ADDR,
page_to_phys(vmx->vcpu.kvm->arch.apic_access_page));
if (vmx->vpid != 0)
vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
vmx->vcpu.arch.cr0 = 0x60000010;
vmx_set_cr0(&vmx->vcpu, vmx->vcpu.arch.cr0); /* enter rmode */
vmx_set_cr4(&vmx->vcpu, 0);
#ifdef CONFIG_X86_64
vmx_set_efer(&vmx->vcpu, 0);
#endif
vmx_fpu_activate(&vmx->vcpu);
update_exception_bitmap(&vmx->vcpu);
return 0;
vpid_sync_vcpu_all(vmx);
ret = 0;
out:
up_read(&vcpu->kvm->slots_lock);
return ret;
}
@ -1723,6 +1843,8 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu, int irq)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
KVMTRACE_1D(INJ_VIRQ, vcpu, (u32)irq, handler);
if (vcpu->arch.rmode.active) {
vmx->rmode.irq.pending = true;
vmx->rmode.irq.vector = irq;
@ -1844,7 +1966,7 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
if ((vect_info & VECTORING_INFO_VALID_MASK) &&
!is_page_fault(intr_info))
printk(KERN_ERR "%s: unexpected, vectoring info 0x%x "
"intr info 0x%x\n", __FUNCTION__, vect_info, intr_info);
"intr info 0x%x\n", __func__, vect_info, intr_info);
if (!irqchip_in_kernel(vcpu->kvm) && is_external_interrupt(vect_info)) {
int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
@ -1869,10 +1991,12 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
error_code = 0;
rip = vmcs_readl(GUEST_RIP);
if (intr_info & INTR_INFO_DELIEVER_CODE_MASK)
if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
if (is_page_fault(intr_info)) {
cr2 = vmcs_readl(EXIT_QUALIFICATION);
KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
(u32)((u64)cr2 >> 32), handler);
return kvm_mmu_page_fault(vcpu, cr2, error_code);
}
@ -1901,6 +2025,7 @@ static int handle_external_interrupt(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
++vcpu->stat.irq_exits;
KVMTRACE_1D(INTR, vcpu, vmcs_read32(VM_EXIT_INTR_INFO), handler);
return 1;
}
@ -1958,25 +2083,27 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
reg = (exit_qualification >> 8) & 15;
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
KVMTRACE_3D(CR_WRITE, vcpu, (u32)cr, (u32)vcpu->arch.regs[reg],
(u32)((u64)vcpu->arch.regs[reg] >> 32), handler);
switch (cr) {
case 0:
vcpu_load_rsp_rip(vcpu);
set_cr0(vcpu, vcpu->arch.regs[reg]);
kvm_set_cr0(vcpu, vcpu->arch.regs[reg]);
skip_emulated_instruction(vcpu);
return 1;
case 3:
vcpu_load_rsp_rip(vcpu);
set_cr3(vcpu, vcpu->arch.regs[reg]);
kvm_set_cr3(vcpu, vcpu->arch.regs[reg]);
skip_emulated_instruction(vcpu);
return 1;
case 4:
vcpu_load_rsp_rip(vcpu);
set_cr4(vcpu, vcpu->arch.regs[reg]);
kvm_set_cr4(vcpu, vcpu->arch.regs[reg]);
skip_emulated_instruction(vcpu);
return 1;
case 8:
vcpu_load_rsp_rip(vcpu);
set_cr8(vcpu, vcpu->arch.regs[reg]);
kvm_set_cr8(vcpu, vcpu->arch.regs[reg]);
skip_emulated_instruction(vcpu);
if (irqchip_in_kernel(vcpu->kvm))
return 1;
@ -1990,6 +2117,7 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
vcpu->arch.cr0 &= ~X86_CR0_TS;
vmcs_writel(CR0_READ_SHADOW, vcpu->arch.cr0);
vmx_fpu_activate(vcpu);
KVMTRACE_0D(CLTS, vcpu, handler);
skip_emulated_instruction(vcpu);
return 1;
case 1: /*mov from cr*/
@ -1998,18 +2126,24 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
vcpu_load_rsp_rip(vcpu);
vcpu->arch.regs[reg] = vcpu->arch.cr3;
vcpu_put_rsp_rip(vcpu);
KVMTRACE_3D(CR_READ, vcpu, (u32)cr,
(u32)vcpu->arch.regs[reg],
(u32)((u64)vcpu->arch.regs[reg] >> 32),
handler);
skip_emulated_instruction(vcpu);
return 1;
case 8:
vcpu_load_rsp_rip(vcpu);
vcpu->arch.regs[reg] = get_cr8(vcpu);
vcpu->arch.regs[reg] = kvm_get_cr8(vcpu);
vcpu_put_rsp_rip(vcpu);
KVMTRACE_2D(CR_READ, vcpu, (u32)cr,
(u32)vcpu->arch.regs[reg], handler);
skip_emulated_instruction(vcpu);
return 1;
}
break;
case 3: /* lmsw */
lmsw(vcpu, (exit_qualification >> LMSW_SOURCE_DATA_SHIFT) & 0x0f);
kvm_lmsw(vcpu, (exit_qualification >> LMSW_SOURCE_DATA_SHIFT) & 0x0f);
skip_emulated_instruction(vcpu);
return 1;
@ -2049,6 +2183,7 @@ static int handle_dr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
val = 0;
}
vcpu->arch.regs[reg] = val;
KVMTRACE_2D(DR_READ, vcpu, (u32)dr, (u32)val, handler);
} else {
/* mov to dr */
}
@ -2073,6 +2208,9 @@ static int handle_rdmsr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
return 1;
}
KVMTRACE_3D(MSR_READ, vcpu, ecx, (u32)data, (u32)(data >> 32),
handler);
/* FIXME: handling of bits 32:63 of rax, rdx */
vcpu->arch.regs[VCPU_REGS_RAX] = data & -1u;
vcpu->arch.regs[VCPU_REGS_RDX] = (data >> 32) & -1u;
@ -2086,6 +2224,9 @@ static int handle_wrmsr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u)
| ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32);
KVMTRACE_3D(MSR_WRITE, vcpu, ecx, (u32)data, (u32)(data >> 32),
handler);
if (vmx_set_msr(vcpu, ecx, data) != 0) {
kvm_inject_gp(vcpu, 0);
return 1;
@ -2110,6 +2251,9 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu,
cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
KVMTRACE_0D(PEND_INTR, vcpu, handler);
/*
* If the user space waits to inject interrupts, exit as soon as
* possible
@ -2152,6 +2296,8 @@ static int handle_apic_access(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
exit_qualification = vmcs_read64(EXIT_QUALIFICATION);
offset = exit_qualification & 0xffful;
KVMTRACE_1D(APIC_ACCESS, vcpu, (u32)offset, handler);
er = emulate_instruction(vcpu, kvm_run, 0, 0, 0);
if (er != EMULATE_DONE) {
@ -2163,6 +2309,20 @@ static int handle_apic_access(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
return 1;
}
static int handle_task_switch(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
unsigned long exit_qualification;
u16 tss_selector;
int reason;
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
reason = (u32)exit_qualification >> 30;
tss_selector = exit_qualification;
return kvm_task_switch(vcpu, tss_selector, reason);
}
/*
* The exit handlers return 1 if the exit was handled fully and guest execution
* may resume. Otherwise they set the kvm_run parameter to indicate what needs
@ -2185,6 +2345,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu,
[EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
[EXIT_REASON_APIC_ACCESS] = handle_apic_access,
[EXIT_REASON_WBINVD] = handle_wbinvd,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
};
static const int kvm_vmx_max_exit_handlers =
@ -2200,6 +2361,9 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 vectoring_info = vmx->idt_vectoring_info;
KVMTRACE_3D(VMEXIT, vcpu, exit_reason, (u32)vmcs_readl(GUEST_RIP),
(u32)((u64)vmcs_readl(GUEST_RIP) >> 32), entryexit);
if (unlikely(vmx->fail)) {
kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
kvm_run->fail_entry.hardware_entry_failure_reason
@ -2210,7 +2374,7 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
exit_reason != EXIT_REASON_EXCEPTION_NMI)
printk(KERN_WARNING "%s: unexpected, valid vectoring info and "
"exit reason is 0x%x\n", __FUNCTION__, exit_reason);
"exit reason is 0x%x\n", __func__, exit_reason);
if (exit_reason < kvm_vmx_max_exit_handlers
&& kvm_vmx_exit_handlers[exit_reason])
return kvm_vmx_exit_handlers[exit_reason](vcpu, kvm_run);
@ -2221,10 +2385,6 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
return 0;
}
static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
{
}
static void update_tpr_threshold(struct kvm_vcpu *vcpu)
{
int max_irr, tpr;
@ -2285,11 +2445,13 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
return;
}
KVMTRACE_1D(REDELIVER_EVT, vcpu, idtv_info_field, handler);
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, idtv_info_field);
vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
if (unlikely(idtv_info_field & INTR_INFO_DELIEVER_CODE_MASK))
if (unlikely(idtv_info_field & INTR_INFO_DELIVER_CODE_MASK))
vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
vmcs_read32(IDT_VECTORING_ERROR_CODE));
if (unlikely(has_ext_irq))
@ -2470,8 +2632,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
/* We need to handle NMIs before interrupts are enabled */
if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) /* nmi */
if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) { /* nmi */
KVMTRACE_0D(NMI, vcpu, handler);
asm("int $2");
}
}
static void vmx_free_vmcs(struct kvm_vcpu *vcpu)
@ -2489,6 +2653,10 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
spin_lock(&vmx_vpid_lock);
if (vmx->vpid != 0)
__clear_bit(vmx->vpid, vmx_vpid_bitmap);
spin_unlock(&vmx_vpid_lock);
vmx_free_vmcs(vcpu);
kfree(vmx->host_msrs);
kfree(vmx->guest_msrs);
@ -2505,6 +2673,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (!vmx)
return ERR_PTR(-ENOMEM);
allocate_vpid(vmx);
err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
if (err)
goto free_vcpu;
@ -2591,14 +2761,13 @@ static struct kvm_x86_ops vmx_x86_ops = {
.get_segment_base = vmx_get_segment_base,
.get_segment = vmx_get_segment,
.set_segment = vmx_set_segment,
.get_cpl = vmx_get_cpl,
.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
.decache_cr4_guest_bits = vmx_decache_cr4_guest_bits,
.set_cr0 = vmx_set_cr0,
.set_cr3 = vmx_set_cr3,
.set_cr4 = vmx_set_cr4,
#ifdef CONFIG_X86_64
.set_efer = vmx_set_efer,
#endif
.get_idt = vmx_get_idt,
.set_idt = vmx_set_idt,
.get_gdt = vmx_get_gdt,
@ -2626,7 +2795,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
static int __init vmx_init(void)
{
void *iova;
void *va;
int r;
vmx_io_bitmap_a = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
@ -2639,28 +2808,48 @@ static int __init vmx_init(void)
goto out;
}
vmx_msr_bitmap = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
if (!vmx_msr_bitmap) {
r = -ENOMEM;
goto out1;
}
/*
* Allow direct access to the PC debug port (it is often used for I/O
* delays, but the vmexits simply slow things down).
*/
iova = kmap(vmx_io_bitmap_a);
memset(iova, 0xff, PAGE_SIZE);
clear_bit(0x80, iova);
va = kmap(vmx_io_bitmap_a);
memset(va, 0xff, PAGE_SIZE);
clear_bit(0x80, va);
kunmap(vmx_io_bitmap_a);
iova = kmap(vmx_io_bitmap_b);
memset(iova, 0xff, PAGE_SIZE);
va = kmap(vmx_io_bitmap_b);
memset(va, 0xff, PAGE_SIZE);
kunmap(vmx_io_bitmap_b);
va = kmap(vmx_msr_bitmap);
memset(va, 0xff, PAGE_SIZE);
kunmap(vmx_msr_bitmap);
set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), THIS_MODULE);
if (r)
goto out1;
goto out2;
vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_FS_BASE);
vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_GS_BASE);
vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_CS);
vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_ESP);
vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_EIP);
if (bypass_guest_pf)
kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
return 0;
out2:
__free_page(vmx_msr_bitmap);
out1:
__free_page(vmx_io_bitmap_b);
out:
@ -2670,6 +2859,7 @@ static int __init vmx_init(void)
static void __exit vmx_exit(void)
{
__free_page(vmx_msr_bitmap);
__free_page(vmx_io_bitmap_b);
__free_page(vmx_io_bitmap_a);
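
vmx_init() above fills both I/O bitmap pages with 0xff and then clears bit 0x80 in bitmap A, so that guest accesses to the PC debug port no longer cause a vmexit. The sketch below models that setup in plain user-space C. It assumes the architectural split in which bitmap A covers ports 0x0000-0x7fff and bitmap B covers ports 0x8000-0xffff, and a little-endian bit layout as on x86. The struct io_bitmaps type and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define IO_BITMAP_BYTES 4096	/* one page per bitmap */

/* Hypothetical stand-in for the two bitmap pages. */
struct io_bitmaps {
	uint8_t a[IO_BITMAP_BYTES];	/* ports 0x0000-0x7fff */
	uint8_t b[IO_BITMAP_BYTES];	/* ports 0x8000-0xffff */
};

static void io_bitmaps_init(struct io_bitmaps *bm)
{
	/* A set bit means "intercept this port". */
	memset(bm->a, 0xff, sizeof(bm->a));
	memset(bm->b, 0xff, sizeof(bm->b));
	/* Let port 0x80 through, mirroring clear_bit(0x80, va). */
	bm->a[0x80 / 8] &= (uint8_t)~(1u << (0x80 % 8));
}

static int io_port_intercepted(const struct io_bitmaps *bm, uint16_t port)
{
	const uint8_t *page = (port < 0x8000) ? bm->a : bm->b;
	unsigned int bit = port & 0x7fff;

	return (page[bit / 8] >> (bit % 8)) & 1;
}
```

After io_bitmaps_init(), io_port_intercepted() reports 0 for port 0x80 and 1 for every other port.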

@ -49,6 +49,7 @@
* Definitions of Secondary Processor-Based VM-Execution Controls.
*/
#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001
#define SECONDARY_EXEC_ENABLE_VPID 0x00000020
#define SECONDARY_EXEC_WBINVD_EXITING 0x00000040
@ -65,6 +66,7 @@
/* VMCS Encodings */
enum vmcs_field {
VIRTUAL_PROCESSOR_ID = 0x00000000,
GUEST_ES_SELECTOR = 0x00000800,
GUEST_CS_SELECTOR = 0x00000802,
GUEST_SS_SELECTOR = 0x00000804,
@ -231,12 +233,12 @@ enum vmcs_field {
*/
#define INTR_INFO_VECTOR_MASK 0xff /* 7:0 */
#define INTR_INFO_INTR_TYPE_MASK 0x700 /* 10:8 */
#define INTR_INFO_DELIEVER_CODE_MASK 0x800 /* 11 */
#define INTR_INFO_DELIVER_CODE_MASK 0x800 /* 11 */
#define INTR_INFO_VALID_MASK 0x80000000 /* 31 */
#define VECTORING_INFO_VECTOR_MASK INTR_INFO_VECTOR_MASK
#define VECTORING_INFO_TYPE_MASK INTR_INFO_INTR_TYPE_MASK
#define VECTORING_INFO_DELIEVER_CODE_MASK INTR_INFO_DELIEVER_CODE_MASK
#define VECTORING_INFO_DELIVER_CODE_MASK INTR_INFO_DELIVER_CODE_MASK
#define VECTORING_INFO_VALID_MASK INTR_INFO_VALID_MASK
#define INTR_TYPE_EXT_INTR (0 << 8) /* external interrupt */
@ -321,4 +323,8 @@ enum vmcs_field {
#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT 9
#define VMX_NR_VPIDS (1 << 16)
#define VMX_VPID_EXTENT_SINGLE_CONTEXT 1
#define VMX_VPID_EXTENT_ALL_CONTEXT 2
#endif
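
The VPID pieces in this diff (VMX_NR_VPIDS, the bit reserved for the host in vmx_init(), and the __clear_bit() call in vmx_free_vcpu()) suggest a simple bitmap allocator. The body of allocate_vpid() is not visible in this part of the diff, so the allocation side below is an assumption: it takes the first free bit and falls back to 0 when none is left. Locking is omitted, and all names are hypothetical.

```c
#include <limits.h>

#define NR_VPIDS	(1 << 16)
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

static unsigned long vpid_bitmap[NR_VPIDS / BITS_PER_WORD];

static void vpid_init(void)
{
	vpid_bitmap[0] |= 1UL;	/* 0 is reserved for the host */
}

/* Assumed policy: return the first free VPID, or 0 if none is left. */
static unsigned int vpid_alloc(void)
{
	unsigned int i;

	for (i = 1; i < NR_VPIDS; i++) {
		unsigned long mask = 1UL << (i % BITS_PER_WORD);

		if (!(vpid_bitmap[i / BITS_PER_WORD] & mask)) {
			vpid_bitmap[i / BITS_PER_WORD] |= mask;
			return i;
		}
	}
	return 0;
}

/* Mirrors the free path in the diff: VPID 0 is never released. */
static void vpid_free(unsigned int vpid)
{
	if (vpid != 0)
		vpid_bitmap[vpid / BITS_PER_WORD] &=
			~(1UL << (vpid % BITS_PER_WORD));
}
```

In the kernel the bitmap is protected by vmx_vpid_lock, as the vmx_free_vcpu() hunk shows; a user-space version shared between threads would need an equivalent lock.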

File diff suppressed because it is too large

@ -65,6 +65,14 @@
#define MemAbs (1<<9) /* Memory operand is absolute displacement */
#define String (1<<10) /* String instruction (rep capable) */
#define Stack (1<<11) /* Stack instruction (push/pop) */
#define Group (1<<14) /* Bits 3:5 of modrm byte extend opcode */
#define GroupDual (1<<15) /* Alternate decoding of mod == 3 */
#define GroupMask 0xff /* Group number stored in bits 0:7 */
enum {
Group1_80, Group1_81, Group1_82, Group1_83,
Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
};
static u16 opcode_table[256] = {
/* 0x00 - 0x07 */
@ -123,14 +131,14 @@ static u16 opcode_table[256] = {
ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
/* 0x80 - 0x87 */
ByteOp | DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, DstMem | SrcImmByte | ModRM,
Group | Group1_80, Group | Group1_81,
Group | Group1_82, Group | Group1_83,
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
/* 0x88 - 0x8F */
ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov,
ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
0, ModRM | DstReg, 0, DstMem | SrcNone | ModRM | Mov | Stack,
0, ModRM | DstReg, 0, Group | Group1A,
/* 0x90 - 0x9F */
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, ImplicitOps | Stack, ImplicitOps | Stack, 0, 0,
@ -164,16 +172,15 @@ static u16 opcode_table[256] = {
0, 0, 0, 0,
/* 0xF0 - 0xF7 */
0, 0, 0, 0,
ImplicitOps, ImplicitOps,
ByteOp | DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM,
ImplicitOps, ImplicitOps, Group | Group3_Byte, Group | Group3,
/* 0xF8 - 0xFF */
ImplicitOps, 0, ImplicitOps, ImplicitOps,
0, 0, ByteOp | DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM
0, 0, Group | Group4, Group | Group5,
};
static u16 twobyte_table[256] = {
/* 0x00 - 0x0F */
0, SrcMem | ModRM | DstReg, 0, 0, 0, 0, ImplicitOps, 0,
0, Group | GroupDual | Group7, 0, 0, 0, 0, ImplicitOps, 0,
ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
/* 0x10 - 0x1F */
0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0, 0,
@ -229,6 +236,56 @@ static u16 twobyte_table[256] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
static u16 group_table[] = {
[Group1_80*8] =
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
[Group1_81*8] =
DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
[Group1_82*8] =
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
[Group1_83*8] =
DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
[Group1A*8] =
DstMem | SrcNone | ModRM | Mov | Stack, 0, 0, 0, 0, 0, 0, 0,
[Group3_Byte*8] =
ByteOp | SrcImm | DstMem | ModRM, 0,
ByteOp | DstMem | SrcNone | ModRM, ByteOp | DstMem | SrcNone | ModRM,
0, 0, 0, 0,
[Group3*8] =
DstMem | SrcImm | ModRM | SrcImm, 0,
DstMem | SrcNone | ModRM, ByteOp | DstMem | SrcNone | ModRM,
0, 0, 0, 0,
[Group4*8] =
ByteOp | DstMem | SrcNone | ModRM, ByteOp | DstMem | SrcNone | ModRM,
0, 0, 0, 0, 0, 0,
[Group5*8] =
DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM, 0, 0,
SrcMem | ModRM, 0, SrcMem | ModRM | Stack, 0,
[Group7*8] =
0, 0, ModRM | SrcMem, ModRM | SrcMem,
SrcNone | ModRM | DstMem | Mov, 0,
SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
};
static u16 group2_table[] = {
[Group7*8] =
SrcNone | ModRM, 0, 0, 0,
SrcNone | ModRM | DstMem | Mov, 0,
SrcMem16 | ModRM | Mov, 0,
};
/* EFLAGS bit definitions. */
#define EFLG_OF (1<<11)
#define EFLG_DF (1<<10)
@ -317,7 +374,7 @@ static u16 twobyte_table[256] = {
#define __emulate_2op(_op,_src,_dst,_eflags,_bx,_by,_wx,_wy,_lx,_ly,_qx,_qy) \
do { \
unsigned long _tmp; \
unsigned long __tmp; \
switch ((_dst).bytes) { \
case 1: \
__asm__ __volatile__ ( \
@ -325,7 +382,7 @@ static u16 twobyte_table[256] = {
_op"b %"_bx"3,%1; " \
_POST_EFLAGS("0", "4", "2") \
: "=m" (_eflags), "=m" ((_dst).val), \
"=&r" (_tmp) \
"=&r" (__tmp) \
: _by ((_src).val), "i" (EFLAGS_MASK)); \
break; \
default: \
@ -426,29 +483,40 @@ static u16 twobyte_table[256] = {
(_type)_x; \
})
/* Access/update address held in a register, based on addressing mode. */
#define address_mask(reg) \
((c->ad_bytes == sizeof(unsigned long)) ? \
(reg) : ((reg) & ((1UL << (c->ad_bytes << 3)) - 1)))
#define register_address(base, reg) \
((base) + address_mask(reg))
#define register_address_increment(reg, inc) \
do { \
/* signed type ensures sign extension to long */ \
int _inc = (inc); \
if (c->ad_bytes == sizeof(unsigned long)) \
(reg) += _inc; \
else \
(reg) = ((reg) & \
~((1UL << (c->ad_bytes << 3)) - 1)) | \
(((reg) + _inc) & \
((1UL << (c->ad_bytes << 3)) - 1)); \
} while (0)
static inline unsigned long ad_mask(struct decode_cache *c)
{
return (1UL << (c->ad_bytes << 3)) - 1;
}
#define JMP_REL(rel) \
do { \
register_address_increment(c->eip, rel); \
} while (0)
/* Access/update address held in a register, based on addressing mode. */
static inline unsigned long
address_mask(struct decode_cache *c, unsigned long reg)
{
if (c->ad_bytes == sizeof(unsigned long))
return reg;
else
return reg & ad_mask(c);
}
static inline unsigned long
register_address(struct decode_cache *c, unsigned long base, unsigned long reg)
{
return base + address_mask(c, reg);
}
static inline void
register_address_increment(struct decode_cache *c, unsigned long *reg, int inc)
{
if (c->ad_bytes == sizeof(unsigned long))
*reg += inc;
else
*reg = (*reg & ~ad_mask(c)) | ((*reg + inc) & ad_mask(c));
}
static inline void jmp_rel(struct decode_cache *c, int rel)
{
register_address_increment(c, &c->eip, rel);
}
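
The inline helpers above replace the old macros and keep register updates confined to the current address size: only the low ad_bytes of the register change, and the upper bits are preserved. The sketch below reproduces that arithmetic outside the emulator. It assumes a 64-bit register width (uint64_t in place of unsigned long) and uses a hypothetical struct decode_state that carries only ad_bytes.

```c
#include <stdint.h>

/* Hypothetical reduced decode state: only the address size in bytes. */
struct decode_state {
	unsigned int ad_bytes;	/* 2, 4 or 8 */
};

/* Only valid for ad_bytes < 8; a 64-bit shift would be undefined. */
static uint64_t ad_mask(const struct decode_state *c)
{
	return (UINT64_C(1) << (c->ad_bytes << 3)) - 1;
}

static uint64_t addr_mask(const struct decode_state *c, uint64_t reg)
{
	if (c->ad_bytes == sizeof(uint64_t))
		return reg;
	return reg & ad_mask(c);
}

static void reg_addr_increment(const struct decode_state *c, uint64_t *reg,
			       int inc)
{
	/* Sign-extend the increment before adding it to the register. */
	uint64_t delta = (uint64_t)(int64_t)inc;

	if (c->ad_bytes == sizeof(uint64_t)) {
		*reg += delta;
	} else {
		uint64_t mask = ad_mask(c);

		/* Wrap within the address size, keep the upper bits. */
		*reg = (*reg & ~mask) | ((*reg + delta) & mask);
	}
}
```

With ad_bytes set to 2, decrementing a register that holds 0x12340000 by 2 gives 0x1234fffe: the low 16 bits wrap while the upper half is untouched.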
static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops,
@ -763,7 +831,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
struct decode_cache *c = &ctxt->decode;
int rc = 0;
int mode = ctxt->mode;
int def_op_bytes, def_ad_bytes;
int def_op_bytes, def_ad_bytes, group;
/* Shadow copy of register state. Committed on successful emulation. */
@ -864,12 +932,24 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
c->b = insn_fetch(u8, 1, c->eip);
c->d = twobyte_table[c->b];
}
}
/* Unrecognised? */
if (c->d == 0) {
DPRINTF("Cannot emulate %02x\n", c->b);
return -1;
}
if (c->d & Group) {
group = c->d & GroupMask;
c->modrm = insn_fetch(u8, 1, c->eip);
--c->eip;
group = (group << 3) + ((c->modrm >> 3) & 7);
if ((c->d & GroupDual) && (c->modrm >> 6) == 3)
c->d = group2_table[group];
else
c->d = group_table[group];
}
/* Unrecognised? */
if (c->d == 0) {
DPRINTF("Cannot emulate %02x\n", c->b);
return -1;
}
if (mode == X86EMUL_MODE_PROT64 && (c->d & Stack))
@ -924,6 +1004,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
*/
if ((c->d & ModRM) && c->modrm_mod == 3) {
c->src.type = OP_REG;
c->src.val = c->modrm_val;
break;
}
c->src.type = OP_MEM;
@ -967,6 +1048,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
case DstMem:
if ((c->d & ModRM) && c->modrm_mod == 3) {
c->dst.type = OP_REG;
c->dst.val = c->dst.orig_val = c->modrm_val;
break;
}
c->dst.type = OP_MEM;
@ -984,8 +1066,8 @@ static inline void emulate_push(struct x86_emulate_ctxt *ctxt)
c->dst.type = OP_MEM;
c->dst.bytes = c->op_bytes;
c->dst.val = c->src.val;
register_address_increment(c->regs[VCPU_REGS_RSP], -c->op_bytes);
c->dst.ptr = (void *) register_address(ctxt->ss_base,
register_address_increment(c, &c->regs[VCPU_REGS_RSP], -c->op_bytes);
c->dst.ptr = (void *) register_address(c, ctxt->ss_base,
c->regs[VCPU_REGS_RSP]);
}
@ -995,13 +1077,13 @@ static inline int emulate_grp1a(struct x86_emulate_ctxt *ctxt,
struct decode_cache *c = &ctxt->decode;
int rc;
rc = ops->read_std(register_address(ctxt->ss_base,
rc = ops->read_std(register_address(c, ctxt->ss_base,
c->regs[VCPU_REGS_RSP]),
&c->dst.val, c->dst.bytes, ctxt->vcpu);
if (rc != 0)
return rc;
register_address_increment(c->regs[VCPU_REGS_RSP], c->dst.bytes);
register_address_increment(c, &c->regs[VCPU_REGS_RSP], c->dst.bytes);
return 0;
}
@ -1043,26 +1125,6 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
switch (c->modrm_reg) {
case 0 ... 1: /* test */
/*
* Special case in Grp3: test has an immediate
* source operand.
*/
c->src.type = OP_IMM;
c->src.ptr = (unsigned long *)c->eip;
c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
if (c->src.bytes == 8)
c->src.bytes = 4;
switch (c->src.bytes) {
case 1:
c->src.val = insn_fetch(s8, 1, c->eip);
break;
case 2:
c->src.val = insn_fetch(s16, 2, c->eip);
break;
case 4:
c->src.val = insn_fetch(s32, 4, c->eip);
break;
}
emulate_2op_SrcV("test", c->src, c->dst, ctxt->eflags);
break;
case 2: /* not */
@ -1076,7 +1138,6 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
rc = X86EMUL_UNHANDLEABLE;
break;
}
done:
return rc;
}
@ -1084,7 +1145,6 @@ static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops)
{
struct decode_cache *c = &ctxt->decode;
int rc;
switch (c->modrm_reg) {
case 0: /* inc */
@ -1094,36 +1154,11 @@ static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
emulate_1op("dec", c->dst, ctxt->eflags);
break;
case 4: /* jmp abs */
if (c->b == 0xff)
c->eip = c->dst.val;
else {
DPRINTF("Cannot emulate %02x\n", c->b);
return X86EMUL_UNHANDLEABLE;
}
c->eip = c->src.val;
break;
case 6: /* push */
/* 64-bit mode: PUSH always pushes a 64-bit operand. */
if (ctxt->mode == X86EMUL_MODE_PROT64) {
c->dst.bytes = 8;
rc = ops->read_std((unsigned long)c->dst.ptr,
&c->dst.val, 8, ctxt->vcpu);
if (rc != 0)
return rc;
}
register_address_increment(c->regs[VCPU_REGS_RSP],
-c->dst.bytes);
rc = ops->write_emulated(register_address(ctxt->ss_base,
c->regs[VCPU_REGS_RSP]), &c->dst.val,
c->dst.bytes, ctxt->vcpu);
if (rc != 0)
return rc;
c->dst.type = OP_NONE;
emulate_push(ctxt);
break;
default:
DPRINTF("Cannot emulate %02x\n", c->b);
return X86EMUL_UNHANDLEABLE;
}
return 0;
}
@ -1361,19 +1396,19 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
c->dst.type = OP_MEM;
c->dst.bytes = c->op_bytes;
c->dst.val = c->src.val;
register_address_increment(c->regs[VCPU_REGS_RSP],
register_address_increment(c, &c->regs[VCPU_REGS_RSP],
-c->op_bytes);
c->dst.ptr = (void *) register_address(
ctxt->ss_base, c->regs[VCPU_REGS_RSP]);
c, ctxt->ss_base, c->regs[VCPU_REGS_RSP]);
break;
case 0x58 ... 0x5f: /* pop reg */
pop_instruction:
if ((rc = ops->read_std(register_address(ctxt->ss_base,
if ((rc = ops->read_std(register_address(c, ctxt->ss_base,
c->regs[VCPU_REGS_RSP]), c->dst.ptr,
c->op_bytes, ctxt->vcpu)) != 0)
goto done;
register_address_increment(c->regs[VCPU_REGS_RSP],
register_address_increment(c, &c->regs[VCPU_REGS_RSP],
c->op_bytes);
c->dst.type = OP_NONE; /* Disable writeback. */
break;
@ -1393,9 +1428,9 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
1,
(c->d & ByteOp) ? 1 : c->op_bytes,
c->rep_prefix ?
address_mask(c->regs[VCPU_REGS_RCX]) : 1,
address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
(ctxt->eflags & EFLG_DF),
register_address(ctxt->es_base,
register_address(c, ctxt->es_base,
c->regs[VCPU_REGS_RDI]),
c->rep_prefix,
c->regs[VCPU_REGS_RDX]) == 0) {
@ -1409,9 +1444,9 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
0,
(c->d & ByteOp) ? 1 : c->op_bytes,
c->rep_prefix ?
address_mask(c->regs[VCPU_REGS_RCX]) : 1,
address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
(ctxt->eflags & EFLG_DF),
register_address(c->override_base ?
register_address(c, c->override_base ?
*c->override_base :
ctxt->ds_base,
c->regs[VCPU_REGS_RSI]),
@ -1425,7 +1460,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
int rel = insn_fetch(s8, 1, c->eip);
if (test_cc(c->b, ctxt->eflags))
JMP_REL(rel);
jmp_rel(c, rel);
break;
}
case 0x80 ... 0x83: /* Grp1 */
@ -1477,7 +1512,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
case 0x88 ... 0x8b: /* mov */
goto mov;
case 0x8d: /* lea r16/r32, m */
c->dst.val = c->modrm_val;
c->dst.val = c->modrm_ea;
break;
case 0x8f: /* pop (sole member of Grp1a) */
rc = emulate_grp1a(ctxt, ops);
@ -1501,27 +1536,27 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
case 0xa4 ... 0xa5: /* movs */
c->dst.type = OP_MEM;
c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
c->dst.ptr = (unsigned long *)register_address(
c->dst.ptr = (unsigned long *)register_address(c,
ctxt->es_base,
c->regs[VCPU_REGS_RDI]);
if ((rc = ops->read_emulated(register_address(
if ((rc = ops->read_emulated(register_address(c,
c->override_base ? *c->override_base :
ctxt->ds_base,
c->regs[VCPU_REGS_RSI]),
&c->dst.val,
c->dst.bytes, ctxt->vcpu)) != 0)
goto done;
register_address_increment(c->regs[VCPU_REGS_RSI],
register_address_increment(c, &c->regs[VCPU_REGS_RSI],
(ctxt->eflags & EFLG_DF) ? -c->dst.bytes
: c->dst.bytes);
register_address_increment(c->regs[VCPU_REGS_RDI],
register_address_increment(c, &c->regs[VCPU_REGS_RDI],
(ctxt->eflags & EFLG_DF) ? -c->dst.bytes
: c->dst.bytes);
break;
case 0xa6 ... 0xa7: /* cmps */
c->src.type = OP_NONE; /* Disable writeback. */
c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
c->src.ptr = (unsigned long *)register_address(
c->src.ptr = (unsigned long *)register_address(c,
c->override_base ? *c->override_base :
ctxt->ds_base,
c->regs[VCPU_REGS_RSI]);
@ -1533,7 +1568,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
c->dst.type = OP_NONE; /* Disable writeback. */
c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
c->dst.ptr = (unsigned long *)register_address(
c->dst.ptr = (unsigned long *)register_address(c,
ctxt->es_base,
c->regs[VCPU_REGS_RDI]);
if ((rc = ops->read_emulated((unsigned long)c->dst.ptr,
@ -1546,10 +1581,10 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
emulate_2op_SrcV("cmp", c->src, c->dst, ctxt->eflags);
register_address_increment(c->regs[VCPU_REGS_RSI],
register_address_increment(c, &c->regs[VCPU_REGS_RSI],
(ctxt->eflags & EFLG_DF) ? -c->src.bytes
: c->src.bytes);
register_address_increment(c->regs[VCPU_REGS_RDI],
register_address_increment(c, &c->regs[VCPU_REGS_RDI],
(ctxt->eflags & EFLG_DF) ? -c->dst.bytes
: c->dst.bytes);
@ -1557,11 +1592,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
case 0xaa ... 0xab: /* stos */
c->dst.type = OP_MEM;
c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
c->dst.ptr = (unsigned long *)register_address(
c->dst.ptr = (unsigned long *)register_address(c,
ctxt->es_base,
c->regs[VCPU_REGS_RDI]);
c->dst.val = c->regs[VCPU_REGS_RAX];
register_address_increment(c->regs[VCPU_REGS_RDI],
register_address_increment(c, &c->regs[VCPU_REGS_RDI],
(ctxt->eflags & EFLG_DF) ? -c->dst.bytes
: c->dst.bytes);
break;
@ -1569,7 +1604,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
c->dst.type = OP_REG;
c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
c->dst.ptr = (unsigned long *)&c->regs[VCPU_REGS_RAX];
if ((rc = ops->read_emulated(register_address(
if ((rc = ops->read_emulated(register_address(c,
c->override_base ? *c->override_base :
ctxt->ds_base,
c->regs[VCPU_REGS_RSI]),
@ -1577,7 +1612,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
c->dst.bytes,
ctxt->vcpu)) != 0)
goto done;
register_address_increment(c->regs[VCPU_REGS_RSI],
register_address_increment(c, &c->regs[VCPU_REGS_RSI],
(ctxt->eflags & EFLG_DF) ? -c->dst.bytes
: c->dst.bytes);
break;
@ -1616,14 +1651,14 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto cannot_emulate;
}
c->src.val = (unsigned long) c->eip;
JMP_REL(rel);
jmp_rel(c, rel);
c->op_bytes = c->ad_bytes;
emulate_push(ctxt);
break;
}
case 0xe9: /* jmp rel */
case 0xeb: /* jmp rel short */
JMP_REL(c->src.val);
jmp_rel(c, c->src.val);
c->dst.type = OP_NONE; /* Disable writeback. */
break;
case 0xf4: /* hlt */
@ -1690,6 +1725,8 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto done;
kvm_emulate_hypercall(ctxt->vcpu);
/* Disable writeback. */
c->dst.type = OP_NONE;
break;
case 2: /* lgdt */
rc = read_descriptor(ctxt, ops, c->src.ptr,
@ -1697,6 +1734,8 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
if (rc)
goto done;
realmode_lgdt(ctxt->vcpu, size, address);
/* Disable writeback. */
c->dst.type = OP_NONE;
break;
case 3: /* lidt/vmmcall */
if (c->modrm_mod == 3 && c->modrm_rm == 1) {
@ -1712,27 +1751,25 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto done;
realmode_lidt(ctxt->vcpu, size, address);
}
/* Disable writeback. */
c->dst.type = OP_NONE;
break;
case 4: /* smsw */
if (c->modrm_mod != 3)
goto cannot_emulate;
*(u16 *)&c->regs[c->modrm_rm]
= realmode_get_cr(ctxt->vcpu, 0);
c->dst.bytes = 2;
c->dst.val = realmode_get_cr(ctxt->vcpu, 0);
break;
case 6: /* lmsw */
if (c->modrm_mod != 3)
goto cannot_emulate;
realmode_lmsw(ctxt->vcpu, (u16)c->modrm_val,
&ctxt->eflags);
realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
&ctxt->eflags);
break;
case 7: /* invlpg */
emulate_invlpg(ctxt->vcpu, memop);
/* Disable writeback. */
c->dst.type = OP_NONE;
break;
default:
goto cannot_emulate;
}
/* Disable writeback. */
c->dst.type = OP_NONE;
break;
case 0x06:
emulate_clts(ctxt->vcpu);
@ -1823,7 +1860,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto cannot_emulate;
}
if (test_cc(c->b, ctxt->eflags))
JMP_REL(rel);
jmp_rel(c, rel);
c->dst.type = OP_NONE;
break;
}
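
The Group decoding added to x86_decode_insn() in this file selects a table slot from the group number stored in the opcode table entry and from the reg field of the ModRM byte. Entries marked GroupDual switch to group2_table when the mod field is 3. The sketch below isolates that index computation. The flag values are copied from the diff; the two function names are hypothetical.

```c
#include <stdint.h>

#define Group		(1 << 14)	/* Bits 3:5 of modrm byte extend opcode */
#define GroupDual	(1 << 15)	/* Alternate decoding of mod == 3 */
#define GroupMask	0xff		/* Group number stored in bits 0:7 */

/* Slot in group_table/group2_table: eight entries per group. */
static unsigned int group_index(uint16_t d, uint8_t modrm)
{
	unsigned int group = d & GroupMask;

	return (group << 3) + ((modrm >> 3) & 7);
}

/* Non-zero when the register form (mod == 3) has its own table. */
static int uses_dual_table(uint16_t d, uint8_t modrm)
{
	return (d & GroupDual) && (modrm >> 6) == 3;
}
```

Group7 is the tenth enumerator and therefore has the value 9, so a ModRM byte of 0x38 (mod 0, reg 7) maps to slot 9 * 8 + 7 = 79 in group_table, while 0xc1 (mod 3, reg 0) maps to slot 72 in group2_table.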

@ -5,7 +5,7 @@
CFLAGS_sysinfo.o += -Iinclude/math-emu -Iarch/s390/math-emu -w
obj-y += s390mach.o sysinfo.o s390_rdev.o
obj-y += cio/ block/ char/ crypto/ net/ scsi/
obj-y += cio/ block/ char/ crypto/ net/ scsi/ kvm/
drivers-y += drivers/s390/built-in.o

@ -0,0 +1,9 @@
# Makefile for kvm guest drivers on s390
#
# Copyright IBM Corp. 2008
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License (version 2 only)
# as published by the Free Software Foundation.
obj-$(CONFIG_VIRTIO) += kvm_virtio.o

@ -0,0 +1,338 @@
/*
* kvm_virtio.c - virtio for kvm on s390
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Christian Borntraeger <borntraeger@de.ibm.com>
*/
#include <linux/init.h>
#include <linux/bootmem.h>
#include <linux/err.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <linux/interrupt.h>
#include <linux/virtio_ring.h>
#include <asm/io.h>
#include <asm/kvm_para.h>
#include <asm/kvm_virtio.h>
#include <asm/setup.h>
#include <asm/s390_ext.h>
#define VIRTIO_SUBCODE_64 0x0D00
/*
* The pointer to our page of device descriptions.
*/
static void *kvm_devices;
/*
* Unique numbering for kvm devices.
*/
static unsigned int dev_index;
struct kvm_device {
struct virtio_device vdev;
struct kvm_device_desc *desc;
};
#define to_kvmdev(vd) container_of(vd, struct kvm_device, vdev)
/*
* memory layout:
* - kvm_device_descriptor
* struct kvm_device_desc
* - configuration
* struct kvm_vqconfig
* - feature bits
* - config space
*/
static struct kvm_vqconfig *kvm_vq_config(const struct kvm_device_desc *desc)
{
return (struct kvm_vqconfig *)(desc + 1);
}
static u8 *kvm_vq_features(const struct kvm_device_desc *desc)
{
return (u8 *)(kvm_vq_config(desc) + desc->num_vq);
}
static u8 *kvm_vq_configspace(const struct kvm_device_desc *desc)
{
return kvm_vq_features(desc) + desc->feature_len * 2;
}
/*
* The total size of the config page used by this device (incl. desc)
*/
static unsigned desc_size(const struct kvm_device_desc *desc)
{
return sizeof(*desc)
+ desc->num_vq * sizeof(struct kvm_vqconfig)
+ desc->feature_len * 2
+ desc->config_len;
}
/*
* This tests (and acknowledges) a feature bit.
*/
static bool kvm_feature(struct virtio_device *vdev, unsigned fbit)
{
struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
u8 *features;
if (fbit / 8 > desc->feature_len)
return false;
features = kvm_vq_features(desc);
if (!(features[fbit / 8] & (1 << (fbit % 8))))
return false;
/*
* We set the matching bit in the other half of the bitmap to tell the
* Host we want to use this feature.
*/
features[desc->feature_len + fbit / 8] |= (1 << (fbit % 8));
return true;
}
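
kvm_feature() above treats the feature area as two halves of feature_len bytes each: the host offers bits in the first half, and the guest acknowledges them by setting the matching bit in the second half. The sketch below shows that handshake on a plain byte array. The function name is hypothetical. The range check uses >=, which is an assumption and differs from the > comparison in the hunk; with >, a bit index of exactly feature_len * 8 would appear to read the first byte of the acknowledgement half.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * features points at 2 * feature_len bytes:
 *   [0, feature_len)               bits offered by the host
 *   [feature_len, 2 * feature_len) bits acknowledged by the guest
 */
static bool feature_test_and_ack(uint8_t *features, unsigned int feature_len,
				 unsigned int fbit)
{
	/* Assumption: bytes at or past feature_len are out of range. */
	if (fbit / 8 >= feature_len)
		return false;

	if (!(features[fbit / 8] & (1u << (fbit % 8))))
		return false;

	/* Set the matching bit in the second half to accept the feature. */
	features[feature_len + fbit / 8] |= (uint8_t)(1u << (fbit % 8));
	return true;
}
```

A bit that the host did not offer is never acknowledged, so the second half always ends up as a subset of the first.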
/*
* Reading and writing elements in config space
*/
static void kvm_get(struct virtio_device *vdev, unsigned int offset,
void *buf, unsigned len)
{
struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
BUG_ON(offset + len > desc->config_len);
memcpy(buf, kvm_vq_configspace(desc) + offset, len);
}
static void kvm_set(struct virtio_device *vdev, unsigned int offset,
const void *buf, unsigned len)
{
struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
BUG_ON(offset + len > desc->config_len);
memcpy(kvm_vq_configspace(desc) + offset, buf, len);
}
/*
* The operations to get and set the status word just access
* the status field of the device descriptor. set_status will also
* make a hypercall to the host to tell it about status changes.
*/
static u8 kvm_get_status(struct virtio_device *vdev)
{
return to_kvmdev(vdev)->desc->status;
}
static void kvm_set_status(struct virtio_device *vdev, u8 status)
{
BUG_ON(!status);
to_kvmdev(vdev)->desc->status = status;
kvm_hypercall1(KVM_S390_VIRTIO_SET_STATUS,
(unsigned long) to_kvmdev(vdev)->desc);
}
/*
* To reset the device, we use the KVM_S390_VIRTIO_RESET hypercall, passing the
* descriptor address. The Host will zero the status and all the
* features.
*/
static void kvm_reset(struct virtio_device *vdev)
{
kvm_hypercall1(KVM_S390_VIRTIO_RESET,
(unsigned long) to_kvmdev(vdev)->desc);
}
/*
* When the virtio_ring code wants to notify the Host, it calls us here and we
* make a hypercall. We hand the address of the virtqueue so the Host
* knows which virtqueue we're talking about.
*/
static void kvm_notify(struct virtqueue *vq)
{
struct kvm_vqconfig *config = vq->priv;
kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, config->address);
}
/*
* This routine finds the virtqueue with the given index in the configuration
* of this device and sets it up.
*/
static struct virtqueue *kvm_find_vq(struct virtio_device *vdev,
unsigned index,
void (*callback)(struct virtqueue *vq))
{
struct kvm_device *kdev = to_kvmdev(vdev);
struct kvm_vqconfig *config;
struct virtqueue *vq;
int err;
if (index >= kdev->desc->num_vq)
return ERR_PTR(-ENOENT);
config = kvm_vq_config(kdev->desc)+index;
if (add_shared_memory(config->address,
vring_size(config->num, PAGE_SIZE))) {
err = -ENOMEM;
goto out;
}
vq = vring_new_virtqueue(config->num, vdev, (void *) config->address,
kvm_notify, callback);
if (!vq) {
err = -ENOMEM;
goto unmap;
}
/*
* Register a callback token.
* The host will send this via the external interrupt parameter.
*/
config->token = (u64) vq;
vq->priv = config;
return vq;
unmap:
remove_shared_memory(config->address, vring_size(config->num,
PAGE_SIZE));
out:
return ERR_PTR(err);
}
static void kvm_del_vq(struct virtqueue *vq)
{
struct kvm_vqconfig *config = vq->priv;
vring_del_virtqueue(vq);
remove_shared_memory(config->address,
vring_size(config->num, PAGE_SIZE));
}
/*
* The config ops structure as defined by virtio config
*/
static struct virtio_config_ops kvm_vq_configspace_ops = {
.feature = kvm_feature,
.get = kvm_get,
.set = kvm_set,
.get_status = kvm_get_status,
.set_status = kvm_set_status,
.reset = kvm_reset,
.find_vq = kvm_find_vq,
.del_vq = kvm_del_vq,
};
/*
* The root device for the kvm virtio devices.
* This makes them appear as /sys/devices/kvm_s390/0,1,2 not /sys/devices/0,1,2.
*/
static struct device kvm_root = {
.parent = NULL,
.bus_id = "kvm_s390",
};
/*
* Adds a new device and registers it with virtio.
* Appropriate drivers are loaded by the device model.
*/
static void add_kvm_device(struct kvm_device_desc *d)
{
struct kvm_device *kdev;
kdev = kzalloc(sizeof(*kdev), GFP_KERNEL);
if (!kdev) {
printk(KERN_EMERG "Cannot allocate kvm dev %u\n",
dev_index++);
return;
}
kdev->vdev.dev.parent = &kvm_root;
kdev->vdev.index = dev_index++;
kdev->vdev.id.device = d->type;
kdev->vdev.config = &kvm_vq_configspace_ops;
kdev->desc = d;
if (register_virtio_device(&kdev->vdev) != 0) {
printk(KERN_ERR "Failed to register kvm device %u\n",
kdev->vdev.index);
kfree(kdev);
}
}
/*
* scan_devices() simply iterates through the device page.
* The type 0 is reserved to mean "end of devices".
*/
static void scan_devices(void)
{
unsigned int i;
struct kvm_device_desc *d;
for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
d = kvm_devices + i;
if (d->type == 0)
break;
add_kvm_device(d);
}
}
/*
* we emulate the request_irq behaviour on top of s390 extints
*/
static void kvm_extint_handler(u16 code)
{
void *data = (void *) *(long *) __LC_PFAULT_INTPARM;
u16 subcode = S390_lowcore.cpu_addr;
if ((subcode & 0xff00) != VIRTIO_SUBCODE_64)
return;
vring_interrupt(0, data);
}
/*
* Init function for virtio
* devices are in a single page above top of "normal" mem
*/
static int __init kvm_devices_init(void)
{
int rc;
if (!MACHINE_IS_KVM)
return -ENODEV;
rc = device_register(&kvm_root);
if (rc) {
printk(KERN_ERR "Could not register kvm_s390 root device");
return rc;
}
if (add_shared_memory((max_pfn) << PAGE_SHIFT, PAGE_SIZE)) {
device_unregister(&kvm_root);
return -ENOMEM;
}
kvm_devices = (void *) (max_pfn << PAGE_SHIFT);
ctl_set_bit(0, 9);
register_external_interrupt(0x2603, kvm_extint_handler);
scan_devices();
return 0;
}
/*
* We do this after core stuff, but before the drivers.
*/
postcore_initcall(kvm_devices_init);
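
The device-page walk in scan_devices() can be sketched in user space as follows. This is only an illustration: the two-byte descriptor layout, the demo_* names and DEMO_PAGE_SIZE are assumptions made for the example and are not the real struct kvm_device_desc or desc_size(). Only the variable-stride loop and the "type 0 ends the list" rule are taken from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical descriptor: a small header followed by payload_len bytes. */
struct demo_desc {
	uint8_t type;        /* 0 is reserved to mean "end of devices" */
	uint8_t payload_len; /* number of payload bytes after the header */
};

/* Size of one descriptor including its variable-length payload. */
static size_t demo_desc_size(const struct demo_desc *d)
{
	return sizeof(*d) + d->payload_len;
}

/* Count the descriptors in one page, stopping at the first type-0 entry. */
static unsigned int demo_count_devices(const uint8_t *page)
{
	unsigned int count = 0;
	size_t i = 0;

	assert(page != NULL);
	while (i + sizeof(struct demo_desc) <= DEMO_PAGE_SIZE) {
		const struct demo_desc *d = (const struct demo_desc *)(page + i);

		if (d->type == 0)
			break;
		count++;
		i += demo_desc_size(d); /* stride depends on the entry just read */
	}
	return count;
}
```

Unlike the kernel loop, the sketch also checks that a full header still fits in the page before reading it; that bound is an addition for the example.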


@ -21,6 +21,10 @@
#define ia64_invala_fr(regnum) asm volatile ("invala.e f%0" :: "i"(regnum))
#define ia64_flushrs() asm volatile ("flushrs;;":::"memory")
#define ia64_loadrs() asm volatile ("loadrs;;":::"memory")
extern void ia64_bad_param_for_setreg (void);
extern void ia64_bad_param_for_getreg (void);
@ -517,6 +521,14 @@ do { \
#define ia64_ptrd(addr, size) \
asm volatile ("ptr.d %0,%1" :: "r"(addr), "r"(size) : "memory")
#define ia64_ttag(addr) \
({ \
__u64 ia64_intri_res; \
asm volatile ("ttag %0=%1" : "=r"(ia64_intri_res) : "r" (addr)); \
ia64_intri_res; \
})
/* Values for lfhint in ia64_lfetch and ia64_lfetch_fault */
#define ia64_lfhint_none 0


@ -1,6 +1,205 @@
#ifndef __LINUX_KVM_IA64_H
#define __LINUX_KVM_IA64_H
#ifndef __ASM_IA64_KVM_H
#define __ASM_IA64_KVM_H
/* ia64 does not support KVM */
/*
* asm-ia64/kvm.h: kvm structure definitions for ia64
*
* Copyright (C) 2007 Xiantao Zhang <xiantao.zhang@intel.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
#include <asm/types.h>
#include <asm/fpu.h>
#include <linux/ioctl.h>
/* Architectural interrupt line count. */
#define KVM_NR_INTERRUPTS 256
#define KVM_IOAPIC_NUM_PINS 24
struct kvm_ioapic_state {
__u64 base_address;
__u32 ioregsel;
__u32 id;
__u32 irr;
__u32 pad;
union {
__u64 bits;
struct {
__u8 vector;
__u8 delivery_mode:3;
__u8 dest_mode:1;
__u8 delivery_status:1;
__u8 polarity:1;
__u8 remote_irr:1;
__u8 trig_mode:1;
__u8 mask:1;
__u8 reserve:7;
__u8 reserved[4];
__u8 dest_id;
} fields;
} redirtbl[KVM_IOAPIC_NUM_PINS];
};
#define KVM_IRQCHIP_PIC_MASTER 0
#define KVM_IRQCHIP_PIC_SLAVE 1
#define KVM_IRQCHIP_IOAPIC 2
#define KVM_CONTEXT_SIZE (8*1024)
union context {
/* 8K size */
char dummy[KVM_CONTEXT_SIZE];
struct {
unsigned long psr;
unsigned long pr;
unsigned long caller_unat;
unsigned long pad;
unsigned long gr[32];
unsigned long ar[128];
unsigned long br[8];
unsigned long cr[128];
unsigned long rr[8];
unsigned long ibr[8];
unsigned long dbr[8];
unsigned long pkr[8];
struct ia64_fpreg fr[128];
};
};
struct thash_data {
union {
struct {
unsigned long p : 1; /* 0 */
unsigned long rv1 : 1; /* 1 */
unsigned long ma : 3; /* 2-4 */
unsigned long a : 1; /* 5 */
unsigned long d : 1; /* 6 */
unsigned long pl : 2; /* 7-8 */
unsigned long ar : 3; /* 9-11 */
unsigned long ppn : 38; /* 12-49 */
unsigned long rv2 : 2; /* 50-51 */
unsigned long ed : 1; /* 52 */
unsigned long ig1 : 11; /* 53-63 */
};
struct {
unsigned long __rv1 : 53; /* 0-52 */
unsigned long contiguous : 1; /*53 */
unsigned long tc : 1; /* 54 TR or TC */
unsigned long cl : 1;
/* 55 I side or D side cache line */
unsigned long len : 4; /* 56-59 */
unsigned long io : 1; /* 60 entry is for io or not */
unsigned long nomap : 1;
/* 61 entry can't be inserted into machine TLB. */
unsigned long checked : 1;
/* 62 for VTLB/VHPT sanity check */
unsigned long invalid : 1;
/* 63 invalid entry */
};
unsigned long page_flags;
}; /* same for VHPT and TLB */
union {
struct {
unsigned long rv3 : 2;
unsigned long ps : 6;
unsigned long key : 24;
unsigned long rv4 : 32;
};
unsigned long itir;
};
union {
struct {
unsigned long ig2 : 12;
unsigned long vpn : 49;
unsigned long vrn : 3;
};
unsigned long ifa;
unsigned long vadr;
struct {
unsigned long tag : 63;
unsigned long ti : 1;
};
unsigned long etag;
};
union {
struct thash_data *next;
unsigned long rid;
unsigned long gpaddr;
};
};
#define NITRS 8
#define NDTRS 8
struct saved_vpd {
unsigned long vhpi;
unsigned long vgr[16];
unsigned long vbgr[16];
unsigned long vnat;
unsigned long vbnat;
unsigned long vcpuid[5];
unsigned long vpsr;
unsigned long vpr;
unsigned long vcr[128];
};
struct kvm_regs {
char *saved_guest;
char *saved_stack;
struct saved_vpd vpd;
/*Arch-regs*/
int mp_state;
unsigned long vmm_rr;
/* TR and TC. */
struct thash_data itrs[NITRS];
struct thash_data dtrs[NDTRS];
/* Bit is set if there is a tr/tc for the region. */
unsigned char itr_regions;
unsigned char dtr_regions;
unsigned char tc_regions;
char irq_check;
unsigned long saved_itc;
unsigned long itc_check;
unsigned long timer_check;
unsigned long timer_pending;
unsigned long last_itc;
unsigned long vrr[8];
unsigned long ibr[8];
unsigned long dbr[8];
unsigned long insvc[4]; /* Interrupt in service. */
unsigned long xtp;
unsigned long metaphysical_rr0; /* from kvm_arch (so is pinned) */
unsigned long metaphysical_rr4; /* from kvm_arch (so is pinned) */
unsigned long metaphysical_saved_rr0; /* from kvm_arch */
unsigned long metaphysical_saved_rr4; /* from kvm_arch */
unsigned long fp_psr; /*used for lazy float register */
unsigned long saved_gp;
/* for physical emulation */
};
struct kvm_sregs {
};
struct kvm_fpu {
};
#endif
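
The bit positions noted in the comments of struct thash_data can also be read with explicit shifts and masks, which avoids relying on compiler-specific bit-field layout. A minimal sketch, assuming a 64-bit page_flags word; the demo_* helpers are hypothetical and not part of the header:

```c
#include <assert.h>
#include <stdint.h>

/* Extract `bits` bits starting at bit `shift` from a 64-bit word. */
static inline uint64_t demo_field(uint64_t v, unsigned int shift,
				  unsigned int bits)
{
	assert(bits > 0 && bits < 64 && shift + bits <= 64);
	return (v >> shift) & ((UINT64_C(1) << bits) - 1);
}

/* p: bit 0 */
static inline int demo_pte_present(uint64_t page_flags)
{
	return (int)demo_field(page_flags, 0, 1);
}

/* ma: bits 2-4 */
static inline unsigned int demo_pte_ma(uint64_t page_flags)
{
	return (unsigned int)demo_field(page_flags, 2, 3);
}

/* ppn: bits 12-49 */
static inline uint64_t demo_pte_ppn(uint64_t page_flags)
{
	return demo_field(page_flags, 12, 38);
}
```

For example, a word built as (0x12345 << 12) | (5 << 2) | 1 decodes to ppn 0x12345, ma 5 and p 1.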

524
include/asm-ia64/kvm_host.h Normal file

@ -0,0 +1,524 @@
/*
* kvm_host.h: used by the kvm module; holds the ia64-specific sections.
*
* Copyright (C) 2007, Intel Corporation.
*
* Xiantao Zhang <xiantao.zhang@intel.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
#ifndef __ASM_KVM_HOST_H
#define __ASM_KVM_HOST_H
#include <linux/types.h>
#include <linux/mm.h>
#include <linux/kvm.h>
#include <linux/kvm_para.h>
#include <linux/kvm_types.h>
#include <asm/pal.h>
#include <asm/sal.h>
#define KVM_MAX_VCPUS 4
#define KVM_MEMORY_SLOTS 32
/* memory slots that are not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS 4
/* define exit reasons from vmm to kvm*/
#define EXIT_REASON_VM_PANIC 0
#define EXIT_REASON_MMIO_INSTRUCTION 1
#define EXIT_REASON_PAL_CALL 2
#define EXIT_REASON_SAL_CALL 3
#define EXIT_REASON_SWITCH_RR6 4
#define EXIT_REASON_VM_DESTROY 5
#define EXIT_REASON_EXTERNAL_INTERRUPT 6
#define EXIT_REASON_IPI 7
#define EXIT_REASON_PTC_G 8
/*Define vmm address space and vm data space.*/
#define KVM_VMM_SIZE (16UL<<20)
#define KVM_VMM_SHIFT 24
#define KVM_VMM_BASE 0xD000000000000000UL
#define VMM_SIZE (8UL<<20)
/*
* Define vm_buffer, used by PAL Services, base address.
* Note: vmbuffer is in the VMM-BLOCK, the size must be < 8M
*/
#define KVM_VM_BUFFER_BASE (KVM_VMM_BASE + VMM_SIZE)
#define KVM_VM_BUFFER_SIZE (8UL<<20)
/*Define Virtual machine data layout.*/
#define KVM_VM_DATA_SHIFT 24
#define KVM_VM_DATA_SIZE (1UL << KVM_VM_DATA_SHIFT)
#define KVM_VM_DATA_BASE (KVM_VMM_BASE + KVM_VMM_SIZE)
#define KVM_P2M_BASE KVM_VM_DATA_BASE
#define KVM_P2M_OFS 0
#define KVM_P2M_SIZE (8UL << 20)
#define KVM_VHPT_BASE (KVM_P2M_BASE + KVM_P2M_SIZE)
#define KVM_VHPT_OFS KVM_P2M_SIZE
#define KVM_VHPT_BLOCK_SIZE (2UL << 20)
#define VHPT_SHIFT 18
#define VHPT_SIZE (1UL << VHPT_SHIFT)
#define VHPT_NUM_ENTRIES (1<<(VHPT_SHIFT-5))
#define KVM_VTLB_BASE (KVM_VHPT_BASE+KVM_VHPT_BLOCK_SIZE)
#define KVM_VTLB_OFS (KVM_VHPT_OFS+KVM_VHPT_BLOCK_SIZE)
#define KVM_VTLB_BLOCK_SIZE (1UL<<20)
#define VTLB_SHIFT 17
#define VTLB_SIZE (1UL<<VTLB_SHIFT)
#define VTLB_NUM_ENTRIES (1<<(VTLB_SHIFT-5))
#define KVM_VPD_BASE (KVM_VTLB_BASE+KVM_VTLB_BLOCK_SIZE)
#define KVM_VPD_OFS (KVM_VTLB_OFS+KVM_VTLB_BLOCK_SIZE)
#define KVM_VPD_BLOCK_SIZE (2UL<<20)
#define VPD_SHIFT 16
#define VPD_SIZE (1UL<<VPD_SHIFT)
#define KVM_VCPU_BASE (KVM_VPD_BASE+KVM_VPD_BLOCK_SIZE)
#define KVM_VCPU_OFS (KVM_VPD_OFS+KVM_VPD_BLOCK_SIZE)
#define KVM_VCPU_BLOCK_SIZE (2UL<<20)
#define VCPU_SHIFT 18
#define VCPU_SIZE (1UL<<VCPU_SHIFT)
#define MAX_VCPU_NUM (KVM_VCPU_BLOCK_SIZE/VCPU_SIZE)
#define KVM_VM_BASE (KVM_VCPU_BASE+KVM_VCPU_BLOCK_SIZE)
#define KVM_VM_OFS (KVM_VCPU_OFS+KVM_VCPU_BLOCK_SIZE)
#define KVM_VM_BLOCK_SIZE (1UL<<19)
#define KVM_MEM_DIRTY_LOG_BASE (KVM_VM_BASE+KVM_VM_BLOCK_SIZE)
#define KVM_MEM_DIRTY_LOG_OFS (KVM_VM_OFS+KVM_VM_BLOCK_SIZE)
#define KVM_MEM_DIRTY_LOG_SIZE (1UL<<19)
/* Get vpd, vhpt, tlb, vcpu, base*/
#define VPD_ADDR(n) (KVM_VPD_BASE+(n)*VPD_SIZE)
#define VHPT_ADDR(n) (KVM_VHPT_BASE+(n)*VHPT_SIZE)
#define VTLB_ADDR(n) (KVM_VTLB_BASE+(n)*VTLB_SIZE)
#define VCPU_ADDR(n) (KVM_VCPU_BASE+(n)*VCPU_SIZE)
/*IO section definitions*/
#define IOREQ_READ 1
#define IOREQ_WRITE 0
#define STATE_IOREQ_NONE 0
#define STATE_IOREQ_READY 1
#define STATE_IOREQ_INPROCESS 2
#define STATE_IORESP_READY 3
/*Guest Physical address layout.*/
#define GPFN_MEM (0UL << 60) /* Guest pfn is normal mem */
#define GPFN_FRAME_BUFFER (1UL << 60) /* VGA framebuffer */
#define GPFN_LOW_MMIO (2UL << 60) /* Low MMIO range */
#define GPFN_PIB (3UL << 60) /* PIB base */
#define GPFN_IOSAPIC (4UL << 60) /* IOSAPIC base */
#define GPFN_LEGACY_IO (5UL << 60) /* Legacy I/O base */
#define GPFN_GFW (6UL << 60) /* Guest Firmware */
#define GPFN_HIGH_MMIO (7UL << 60) /* High MMIO range */
#define GPFN_IO_MASK (7UL << 60) /* Guest pfn is I/O type */
#define GPFN_INV_MASK (1UL << 63) /* Guest pfn is invalid */
#define INVALID_MFN (~0UL)
#define MEM_G (1UL << 30)
#define MEM_M (1UL << 20)
#define MMIO_START (3 * MEM_G)
#define MMIO_SIZE (512 * MEM_M)
#define VGA_IO_START 0xA0000UL
#define VGA_IO_SIZE 0x20000
#define LEGACY_IO_START (MMIO_START + MMIO_SIZE)
#define LEGACY_IO_SIZE (64 * MEM_M)
#define IO_SAPIC_START 0xfec00000UL
#define IO_SAPIC_SIZE 0x100000
#define PIB_START 0xfee00000UL
#define PIB_SIZE 0x200000
#define GFW_START (4 * MEM_G - 16 * MEM_M)
#define GFW_SIZE (16 * MEM_M)
/* Delivery mode, defined for ioapic.c */
#define dest_Fixed IOSAPIC_FIXED
#define dest_LowestPrio IOSAPIC_LOWEST_PRIORITY
#define NMI_VECTOR 2
#define ExtINT_VECTOR 0
#define NULL_VECTOR (-1)
#define IA64_SPURIOUS_INT_VECTOR 0x0f
#define VCPU_LID(v) (((u64)(v)->vcpu_id) << 24)
/*
*Delivery mode
*/
#define SAPIC_DELIV_SHIFT 8
#define SAPIC_FIXED 0x0
#define SAPIC_LOWEST_PRIORITY 0x1
#define SAPIC_PMI 0x2
#define SAPIC_NMI 0x4
#define SAPIC_INIT 0x5
#define SAPIC_EXTINT 0x7
/*
* vcpu->requests bit members for arch
*/
#define KVM_REQ_PTC_G 32
#define KVM_REQ_RESUME 33
#define KVM_PAGES_PER_HPAGE 1
struct kvm;
struct kvm_vcpu;
struct kvm_guest_debug{
};
struct kvm_mmio_req {
uint64_t addr; /* physical address */
uint64_t size; /* size in bytes */
uint64_t data; /* data (or paddr of data) */
uint8_t state:4;
uint8_t dir:1; /* 1=read, 0=write */
};
/*Pal data struct */
struct kvm_pal_call{
/*In area*/
uint64_t gr28;
uint64_t gr29;
uint64_t gr30;
uint64_t gr31;
/*Out area*/
struct ia64_pal_retval ret;
};
/* Sal data structure */
struct kvm_sal_call{
/*In area*/
uint64_t in0;
uint64_t in1;
uint64_t in2;
uint64_t in3;
uint64_t in4;
uint64_t in5;
uint64_t in6;
uint64_t in7;
struct sal_ret_values ret;
};
/*Guest change rr6*/
struct kvm_switch_rr6 {
uint64_t old_rr;
uint64_t new_rr;
};
union ia64_ipi_a{
unsigned long val;
struct {
unsigned long rv : 3;
unsigned long ir : 1;
unsigned long eid : 8;
unsigned long id : 8;
unsigned long ib_base : 44;
};
};
union ia64_ipi_d {
unsigned long val;
struct {
unsigned long vector : 8;
unsigned long dm : 3;
unsigned long ig : 53;
};
};
/*ipi check exit data*/
struct kvm_ipi_data{
union ia64_ipi_a addr;
union ia64_ipi_d data;
};
/*global purge data*/
struct kvm_ptc_g {
unsigned long vaddr;
unsigned long rr;
unsigned long ps;
struct kvm_vcpu *vcpu;
};
/*Exit control data */
struct exit_ctl_data{
uint32_t exit_reason;
uint32_t vm_status;
union {
struct kvm_mmio_req ioreq;
struct kvm_pal_call pal_data;
struct kvm_sal_call sal_data;
struct kvm_switch_rr6 rr_data;
struct kvm_ipi_data ipi_data;
struct kvm_ptc_g ptc_g_data;
} u;
};
union pte_flags {
unsigned long val;
struct {
unsigned long p : 1; /*0 */
unsigned long : 1; /* 1 */
unsigned long ma : 3; /* 2-4 */
unsigned long a : 1; /* 5 */
unsigned long d : 1; /* 6 */
unsigned long pl : 2; /* 7-8 */
unsigned long ar : 3; /* 9-11 */
unsigned long ppn : 38; /* 12-49 */
unsigned long : 2; /* 50-51 */
unsigned long ed : 1; /* 52 */
};
};
union ia64_pta {
unsigned long val;
struct {
unsigned long ve : 1;
unsigned long reserved0 : 1;
unsigned long size : 6;
unsigned long vf : 1;
unsigned long reserved1 : 6;
unsigned long base : 49;
};
};
struct thash_cb {
/* THASH base information */
struct thash_data *hash; /* hash table pointer */
union ia64_pta pta;
int num;
};
struct kvm_vcpu_stat {
};
struct kvm_vcpu_arch {
int launched;
int last_exit;
int last_run_cpu;
int vmm_tr_slot;
int vm_tr_slot;
#define KVM_MP_STATE_RUNNABLE 0
#define KVM_MP_STATE_UNINITIALIZED 1
#define KVM_MP_STATE_INIT_RECEIVED 2
#define KVM_MP_STATE_HALTED 3
int mp_state;
#define MAX_PTC_G_NUM 3
int ptc_g_count;
struct kvm_ptc_g ptc_g_data[MAX_PTC_G_NUM];
/*halt timer to wake up sleepy vcpus*/
struct hrtimer hlt_timer;
long ht_active;
struct kvm_lapic *apic; /* kernel irqchip context */
struct vpd *vpd;
/* Exit data for vmm_transition*/
struct exit_ctl_data exit_data;
cpumask_t cache_coherent_map;
unsigned long vmm_rr;
unsigned long host_rr6;
unsigned long psbits[8];
unsigned long cr_iipa;
unsigned long cr_isr;
unsigned long vsa_base;
unsigned long dirty_log_lock_pa;
unsigned long __gp;
/* TR and TC. */
struct thash_data itrs[NITRS];
struct thash_data dtrs[NDTRS];
/* Bit is set if there is a tr/tc for the region. */
unsigned char itr_regions;
unsigned char dtr_regions;
unsigned char tc_regions;
/* purge all */
unsigned long ptce_base;
unsigned long ptce_count[2];
unsigned long ptce_stride[2];
/* itc/itm */
unsigned long last_itc;
long itc_offset;
unsigned long itc_check;
unsigned long timer_check;
unsigned long timer_pending;
unsigned long vrr[8];
unsigned long ibr[8];
unsigned long dbr[8];
unsigned long insvc[4]; /* Interrupt in service. */
unsigned long xtp;
unsigned long metaphysical_rr0; /* from kvm_arch (so is pinned) */
unsigned long metaphysical_rr4; /* from kvm_arch (so is pinned) */
unsigned long metaphysical_saved_rr0; /* from kvm_arch */
unsigned long metaphysical_saved_rr4; /* from kvm_arch */
unsigned long fp_psr; /*used for lazy float register */
unsigned long saved_gp;
/* for physical emulation */
int mode_flags;
struct thash_cb vtlb;
struct thash_cb vhpt;
char irq_check;
char irq_new_pending;
unsigned long opcode;
unsigned long cause;
union context host;
union context guest;
};
struct kvm_vm_stat {
u64 remote_tlb_flush;
};
struct kvm_sal_data {
unsigned long boot_ip;
unsigned long boot_gp;
};
struct kvm_arch {
unsigned long vm_base;
unsigned long metaphysical_rr0;
unsigned long metaphysical_rr4;
unsigned long vmm_init_rr;
unsigned long vhpt_base;
unsigned long vtlb_base;
unsigned long vpd_base;
spinlock_t dirty_log_lock;
struct kvm_ioapic *vioapic;
struct kvm_vm_stat stat;
struct kvm_sal_data rdv_sal_data;
};
union cpuid3_t {
u64 value;
struct {
u64 number : 8;
u64 revision : 8;
u64 model : 8;
u64 family : 8;
u64 archrev : 8;
u64 rv : 24;
};
};
struct kvm_pt_regs {
/* The following registers are saved by SAVE_MIN: */
unsigned long b6; /* scratch */
unsigned long b7; /* scratch */
unsigned long ar_csd; /* used by cmp8xchg16 (scratch) */
unsigned long ar_ssd; /* reserved for future use (scratch) */
unsigned long r8; /* scratch (return value register 0) */
unsigned long r9; /* scratch (return value register 1) */
unsigned long r10; /* scratch (return value register 2) */
unsigned long r11; /* scratch (return value register 3) */
unsigned long cr_ipsr; /* interrupted task's psr */
unsigned long cr_iip; /* interrupted task's instruction pointer */
unsigned long cr_ifs; /* interrupted task's function state */
unsigned long ar_unat; /* interrupted task's NaT register (preserved) */
unsigned long ar_pfs; /* prev function state */
unsigned long ar_rsc; /* RSE configuration */
/* The following two are valid only if cr_ipsr.cpl > 0: */
unsigned long ar_rnat; /* RSE NaT */
unsigned long ar_bspstore; /* RSE bspstore */
unsigned long pr; /* 64 predicate registers (1 bit each) */
unsigned long b0; /* return pointer (bp) */
unsigned long loadrs; /* size of dirty partition << 16 */
unsigned long r1; /* the gp pointer */
unsigned long r12; /* interrupted task's memory stack pointer */
unsigned long r13; /* thread pointer */
unsigned long ar_fpsr; /* floating point status (preserved) */
unsigned long r15; /* scratch */
/* The remaining registers are NOT saved for system calls. */
unsigned long r14; /* scratch */
unsigned long r2; /* scratch */
unsigned long r3; /* scratch */
unsigned long r16; /* scratch */
unsigned long r17; /* scratch */
unsigned long r18; /* scratch */
unsigned long r19; /* scratch */
unsigned long r20; /* scratch */
unsigned long r21; /* scratch */
unsigned long r22; /* scratch */
unsigned long r23; /* scratch */
unsigned long r24; /* scratch */
unsigned long r25; /* scratch */
unsigned long r26; /* scratch */
unsigned long r27; /* scratch */
unsigned long r28; /* scratch */
unsigned long r29; /* scratch */
unsigned long r30; /* scratch */
unsigned long r31; /* scratch */
unsigned long ar_ccv; /* compare/exchange value (scratch) */
/*
* Floating point registers that the kernel considers scratch:
*/
struct ia64_fpreg f6; /* scratch */
struct ia64_fpreg f7; /* scratch */
struct ia64_fpreg f8; /* scratch */
struct ia64_fpreg f9; /* scratch */
struct ia64_fpreg f10; /* scratch */
struct ia64_fpreg f11; /* scratch */
unsigned long r4; /* preserved */
unsigned long r5; /* preserved */
unsigned long r6; /* preserved */
unsigned long r7; /* preserved */
unsigned long eml_unat; /* used for emulating instruction */
unsigned long pad0; /* alignment pad */
};
static inline struct kvm_pt_regs *vcpu_regs(struct kvm_vcpu *v)
{
return (struct kvm_pt_regs *) ((unsigned long) v + IA64_STK_OFFSET) - 1;
}
typedef int kvm_vmm_entry(void);
typedef void kvm_tramp_entry(union context *host, union context *guest);
struct kvm_vmm_info{
struct module *module;
kvm_vmm_entry *vmm_entry;
kvm_tramp_entry *tramp_entry;
unsigned long vmm_ivt;
};
int kvm_highest_pending_irq(struct kvm_vcpu *vcpu);
int kvm_emulate_halt(struct kvm_vcpu *vcpu);
int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
void kvm_sal_emul(struct kvm_vcpu *vcpu);
#endif
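
The VM data layout defined in this header can be checked with a little arithmetic. The sketch below copies the block sizes from the KVM_*_SIZE definitions under DEMO_ names (the names and helper functions are illustrative only) and verifies that the blocks fill the 16 MiB data area exactly:

```c
#include <assert.h>

/* Block sizes copied from the header above. */
#define DEMO_P2M_SIZE        (8UL << 20)
#define DEMO_VHPT_BLOCK_SIZE (2UL << 20)
#define DEMO_VTLB_BLOCK_SIZE (1UL << 20)
#define DEMO_VPD_BLOCK_SIZE  (2UL << 20)
#define DEMO_VCPU_BLOCK_SIZE (2UL << 20)
#define DEMO_VM_BLOCK_SIZE   (1UL << 19)
#define DEMO_DIRTY_LOG_SIZE  (1UL << 19)
#define DEMO_VM_DATA_SIZE    (1UL << 24)

/* Per-VCPU slice sizes, also from the header above. */
#define DEMO_VCPU_SIZE        (1UL << 18)
#define DEMO_VHPT_SIZE        (1UL << 18)
#define DEMO_VHPT_NUM_ENTRIES (1UL << (18 - 5))

/* Offset just past the last block, i.e. the total size of the layout. */
#define DEMO_LAYOUT_END \
	(DEMO_P2M_SIZE + DEMO_VHPT_BLOCK_SIZE + DEMO_VTLB_BLOCK_SIZE + \
	 DEMO_VPD_BLOCK_SIZE + DEMO_VCPU_BLOCK_SIZE + DEMO_VM_BLOCK_SIZE + \
	 DEMO_DIRTY_LOG_SIZE)

/* Fails at compile time if the blocks do not fill the data area. */
static_assert(DEMO_LAYOUT_END == DEMO_VM_DATA_SIZE,
	      "blocks must fill the VM data area exactly");

static unsigned long demo_layout_end(void)
{
	return DEMO_LAYOUT_END;
}

/* Number of VCPU_SIZE slices that fit in the VCPU block. */
static unsigned long demo_vcpu_slots(void)
{
	return DEMO_VCPU_BLOCK_SIZE / DEMO_VCPU_SIZE;
}

/* Bytes available per VHPT hash entry. */
static unsigned long demo_vhpt_entry_bytes(void)
{
	return DEMO_VHPT_SIZE / DEMO_VHPT_NUM_ENTRIES;
}
```

With these numbers the seven blocks add up to 8 + 2 + 1 + 2 + 2 + 0.5 + 0.5 = 16 MiB, the VCPU block has room for 8 slices of 256 KiB, and each VHPT entry works out to 32 bytes, which would match the four 8-byte words of struct thash_data on a 64-bit target.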


@ -0,0 +1,29 @@
#ifndef __IA64_KVM_PARA_H
#define __IA64_KVM_PARA_H
/*
* asm-ia64/kvm_para.h
*
* Copyright (C) 2007 Xiantao Zhang <xiantao.zhang@intel.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 59 Temple
* Place - Suite 330, Boston, MA 02111-1307 USA.
*
*/
static inline unsigned int kvm_arch_para_features(void)
{
return 0;
}
#endif


@ -119,6 +119,69 @@ struct ia64_psr {
__u64 reserved4 : 19;
};
union ia64_isr {
__u64 val;
struct {
__u64 code : 16;
__u64 vector : 8;
__u64 reserved1 : 8;
__u64 x : 1;
__u64 w : 1;
__u64 r : 1;
__u64 na : 1;
__u64 sp : 1;
__u64 rs : 1;
__u64 ir : 1;
__u64 ni : 1;
__u64 so : 1;
__u64 ei : 2;
__u64 ed : 1;
__u64 reserved2 : 20;
};
};
union ia64_lid {
__u64 val;
struct {
__u64 rv : 16;
__u64 eid : 8;
__u64 id : 8;
__u64 ig : 32;
};
};
union ia64_tpr {
__u64 val;
struct {
__u64 ig0 : 4;
__u64 mic : 4;
__u64 rsv : 8;
__u64 mmi : 1;
__u64 ig1 : 47;
};
};
union ia64_itir {
__u64 val;
struct {
__u64 rv3 : 2; /* 0-1 */
__u64 ps : 6; /* 2-7 */
__u64 key : 24; /* 8-31 */
__u64 rv4 : 32; /* 32-63 */
};
};
union ia64_rr {
__u64 val;
struct {
__u64 ve : 1; /* enable hw walker */
__u64 reserved0: 1; /* reserved */
__u64 ps : 6; /* log page size */
__u64 rid : 24; /* region id */
__u64 reserved1: 32; /* reserved */
};
};
/*
* CPU type, hardware bug flags, and per-CPU state. Frequently used
* state comes earlier:


@ -1,6 +1,55 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#ifndef __LINUX_KVM_POWERPC_H
#define __LINUX_KVM_POWERPC_H
/* powerpc does not support KVM */
#include <asm/types.h>
#endif
struct kvm_regs {
__u64 pc;
__u64 cr;
__u64 ctr;
__u64 lr;
__u64 xer;
__u64 msr;
__u64 srr0;
__u64 srr1;
__u64 pid;
__u64 sprg0;
__u64 sprg1;
__u64 sprg2;
__u64 sprg3;
__u64 sprg4;
__u64 sprg5;
__u64 sprg6;
__u64 sprg7;
__u64 gpr[32];
};
struct kvm_sregs {
};
struct kvm_fpu {
__u64 fpr[32];
};
#endif /* __LINUX_KVM_POWERPC_H */


@ -0,0 +1,55 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2008
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#ifndef __POWERPC_KVM_ASM_H__
#define __POWERPC_KVM_ASM_H__
/* IVPR must be 64KiB-aligned. */
#define VCPU_SIZE_ORDER 4
#define VCPU_SIZE_LOG (VCPU_SIZE_ORDER + 12)
#define VCPU_TLB_PGSZ PPC44x_TLB_64K
#define VCPU_SIZE_BYTES (1<<VCPU_SIZE_LOG)
#define BOOKE_INTERRUPT_CRITICAL 0
#define BOOKE_INTERRUPT_MACHINE_CHECK 1
#define BOOKE_INTERRUPT_DATA_STORAGE 2
#define BOOKE_INTERRUPT_INST_STORAGE 3
#define BOOKE_INTERRUPT_EXTERNAL 4
#define BOOKE_INTERRUPT_ALIGNMENT 5
#define BOOKE_INTERRUPT_PROGRAM 6
#define BOOKE_INTERRUPT_FP_UNAVAIL 7
#define BOOKE_INTERRUPT_SYSCALL 8
#define BOOKE_INTERRUPT_AP_UNAVAIL 9
#define BOOKE_INTERRUPT_DECREMENTER 10
#define BOOKE_INTERRUPT_FIT 11
#define BOOKE_INTERRUPT_WATCHDOG 12
#define BOOKE_INTERRUPT_DTLB_MISS 13
#define BOOKE_INTERRUPT_ITLB_MISS 14
#define BOOKE_INTERRUPT_DEBUG 15
#define BOOKE_MAX_INTERRUPT 15
#define RESUME_FLAG_NV (1<<0) /* Reload guest nonvolatile state? */
#define RESUME_FLAG_HOST (1<<1) /* Resume host? */
#define RESUME_GUEST 0
#define RESUME_GUEST_NV RESUME_FLAG_NV
#define RESUME_HOST RESUME_FLAG_HOST
#define RESUME_HOST_NV (RESUME_FLAG_HOST|RESUME_FLAG_NV)
#endif /* __POWERPC_KVM_ASM_H__ */
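
The RESUME_* values above are two independent flag bits combined into four return codes. A minimal sketch of how a caller might decode them; the flag values are copied from the header, while the demo_* helpers are hypothetical and only illustrate the encoding:

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the header above. */
#define RESUME_FLAG_NV   (1 << 0) /* reload guest nonvolatile state? */
#define RESUME_FLAG_HOST (1 << 1) /* resume host? */

#define RESUME_GUEST     0
#define RESUME_GUEST_NV  RESUME_FLAG_NV
#define RESUME_HOST      RESUME_FLAG_HOST
#define RESUME_HOST_NV   (RESUME_FLAG_HOST | RESUME_FLAG_NV)

#define VCPU_SIZE_ORDER  4
#define VCPU_SIZE_LOG    (VCPU_SIZE_ORDER + 12)
#define VCPU_SIZE_BYTES  (1 << VCPU_SIZE_LOG)

/* The two flags are orthogonal, so the combined code is simply 3. */
static_assert(RESUME_HOST_NV == 3, "both flag bits set");

/* True if the exit handler asked to return to the host. */
static bool demo_resume_to_host(int r)
{
	return (r & RESUME_FLAG_HOST) != 0;
}

/* True if nonvolatile guest state should be reloaded. */
static bool demo_reload_nonvolatile(int r)
{
	return (r & RESUME_FLAG_NV) != 0;
}
```

VCPU_SIZE_BYTES evaluates to 1 << 16, that is 64 KiB, which is consistent with the "IVPR must be 64KiB-aligned" comment.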


@ -0,0 +1,152 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#ifndef __POWERPC_KVM_HOST_H__
#define __POWERPC_KVM_HOST_H__
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/types.h>
#include <linux/kvm_types.h>
#include <asm/kvm_asm.h>
#define KVM_MAX_VCPUS 1
#define KVM_MEMORY_SLOTS 32
/* memory slots that are not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS 4
/* We don't currently support large pages. */
#define KVM_PAGES_PER_HPAGE (1<<31)
struct kvm;
struct kvm_run;
struct kvm_vcpu;
struct kvm_vm_stat {
u32 remote_tlb_flush;
};
struct kvm_vcpu_stat {
u32 sum_exits;
u32 mmio_exits;
u32 dcr_exits;
u32 signal_exits;
u32 light_exits;
/* Account for special types of light exits: */
u32 itlb_real_miss_exits;
u32 itlb_virt_miss_exits;
u32 dtlb_real_miss_exits;
u32 dtlb_virt_miss_exits;
u32 syscall_exits;
u32 isi_exits;
u32 dsi_exits;
u32 emulated_inst_exits;
u32 dec_exits;
u32 ext_intr_exits;
};
struct tlbe {
u32 tid; /* Only the low 8 bits are used. */
u32 word0;
u32 word1;
u32 word2;
};
struct kvm_arch {
};
struct kvm_vcpu_arch {
/* Unmodified copy of the guest's TLB. */
struct tlbe guest_tlb[PPC44x_TLB_SIZE];
/* TLB that's actually used when the guest is running. */
struct tlbe shadow_tlb[PPC44x_TLB_SIZE];
/* Pages which are referenced in the shadow TLB. */
struct page *shadow_pages[PPC44x_TLB_SIZE];
/* Copy of the host's TLB. */
struct tlbe host_tlb[PPC44x_TLB_SIZE];
u32 host_stack;
u32 host_pid;
u64 fpr[32];
u32 gpr[32];
u32 pc;
u32 cr;
u32 ctr;
u32 lr;
u32 xer;
u32 msr;
u32 mmucr;
u32 sprg0;
u32 sprg1;
u32 sprg2;
u32 sprg3;
u32 sprg4;
u32 sprg5;
u32 sprg6;
u32 sprg7;
u32 srr0;
u32 srr1;
u32 csrr0;
u32 csrr1;
u32 dsrr0;
u32 dsrr1;
u32 dear;
u32 esr;
u32 dec;
u32 decar;
u32 tbl;
u32 tbu;
u32 tcr;
u32 tsr;
u32 ivor[16];
u32 ivpr;
u32 pir;
u32 pid;
u32 pvr;
u32 ccr0;
u32 ccr1;
u32 dbcr0;
u32 dbcr1;
u32 last_inst;
u32 fault_dear;
u32 fault_esr;
gpa_t paddr_accessed;
u8 io_gpr; /* GPR used as IO source/target */
u8 mmio_is_bigendian;
u8 dcr_needed;
u8 dcr_is_write;
u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
struct timer_list dec_timer;
unsigned long pending_exceptions;
};
struct kvm_guest_debug {
int enabled;
unsigned long bp[4];
int singlestep;
};
#endif /* __POWERPC_KVM_HOST_H__ */


@ -0,0 +1,37 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2008
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#ifndef __POWERPC_KVM_PARA_H__
#define __POWERPC_KVM_PARA_H__
#ifdef __KERNEL__
static inline int kvm_para_available(void)
{
return 0;
}
static inline unsigned int kvm_arch_para_features(void)
{
return 0;
}
#endif /* __KERNEL__ */
#endif /* __POWERPC_KVM_PARA_H__ */


@ -0,0 +1,88 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2008
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
#ifndef __POWERPC_KVM_PPC_H__
#define __POWERPC_KVM_PPC_H__
/* This file exists just so we can dereference kvm_vcpu, avoiding nested header
* dependencies. */
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/types.h>
#include <linux/kvm_types.h>
#include <linux/kvm_host.h>
struct kvm_tlb {
struct tlbe guest_tlb[PPC44x_TLB_SIZE];
struct tlbe shadow_tlb[PPC44x_TLB_SIZE];
};
enum emulation_result {
EMULATE_DONE, /* no further processing */
EMULATE_DO_MMIO, /* kvm_run filled with MMIO request */
EMULATE_DO_DCR, /* kvm_run filled with DCR request */
EMULATE_FAIL, /* can't emulate this instruction */
};
extern const unsigned char exception_priority[];
extern const unsigned char priority_exception[];
extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
extern char kvmppc_handlers_start[];
extern unsigned long kvmppc_handler_len;
extern void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu);
extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int rt, unsigned int bytes,
int is_bigendian);
extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
u32 val, unsigned int bytes, int is_bigendian);
extern int kvmppc_emulate_instruction(struct kvm_run *run,
struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn,
u64 asid, u32 flags);
extern void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, u64 eaddr, u64 asid);
extern void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode);
extern void kvmppc_check_and_deliver_interrupts(struct kvm_vcpu *vcpu);
static inline void kvmppc_queue_exception(struct kvm_vcpu *vcpu, int exception)
{
unsigned int priority = exception_priority[exception];
set_bit(priority, &vcpu->arch.pending_exceptions);
}
static inline void kvmppc_clear_exception(struct kvm_vcpu *vcpu, int exception)
{
unsigned int priority = exception_priority[exception];
clear_bit(priority, &vcpu->arch.pending_exceptions);
}
static inline void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
{
if ((new_msr & MSR_PR) != (vcpu->arch.msr & MSR_PR))
kvmppc_mmu_priv_switch(vcpu, new_msr & MSR_PR);
vcpu->arch.msr = new_msr;
}
#endif /* __POWERPC_KVM_PPC_H__ */
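
kvmppc_queue_exception() and kvmppc_clear_exception() map an exception number to a priority and then set or clear that bit in a pending bitmap. The user-space sketch below shows the idea under stated assumptions: the three exceptions, the priority table and the "lowest bit is delivered first" rule are hypothetical (the real exception_priority[] table is defined elsewhere and not shown here), and plain bit operations stand in for the kernel's atomic set_bit()/clear_bit().

```c
#include <assert.h>

enum {
	DEMO_EXC_PROGRAM,
	DEMO_EXC_EXTERNAL,
	DEMO_EXC_DECREMENTER,
	DEMO_EXC_MAX
};

/* Hypothetical mapping: a lower number is delivered first in this sketch. */
static const unsigned char demo_exception_priority[DEMO_EXC_MAX] = {
	[DEMO_EXC_PROGRAM]     = 0,
	[DEMO_EXC_EXTERNAL]    = 1,
	[DEMO_EXC_DECREMENTER] = 2,
};

struct demo_vcpu {
	unsigned long pending_exceptions; /* one bit per priority */
};

static void demo_queue_exception(struct demo_vcpu *v, int exception)
{
	assert(exception >= 0 && exception < DEMO_EXC_MAX);
	v->pending_exceptions |= 1UL << demo_exception_priority[exception];
}

static void demo_clear_exception(struct demo_vcpu *v, int exception)
{
	assert(exception >= 0 && exception < DEMO_EXC_MAX);
	v->pending_exceptions &= ~(1UL << demo_exception_priority[exception]);
}

/* Return the lowest pending priority, or -1 if nothing is pending. */
static int demo_next_priority(const struct demo_vcpu *v)
{
	unsigned int p;

	for (p = 0; p < DEMO_EXC_MAX; p++)
		if (v->pending_exceptions & (1UL << p))
			return (int)p;
	return -1;
}
```

Keeping the bitmap indexed by priority rather than by exception number means the delivery code only has to find the first set bit, at the cost of one table lookup on queue and clear.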


@ -53,6 +53,8 @@
#ifndef __ASSEMBLY__
extern unsigned int tlb_44x_hwater;
typedef struct {
unsigned long id;
unsigned long vdso_base;


@ -7,6 +7,7 @@ header-y += tape390.h
header-y += ucontext.h
header-y += vtoc.h
header-y += zcrypt.h
header-y += kvm.h
unifdef-y += cmb.h
unifdef-y += debug.h


@ -1,6 +1,45 @@
#ifndef __LINUX_KVM_S390_H
#define __LINUX_KVM_S390_H
/* s390 does not support KVM */
/*
* asm-s390/kvm.h - KVM s390 specific structures and definitions
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
* Christian Borntraeger <borntraeger@de.ibm.com>
*/
#include <asm/types.h>
/* for KVM_GET_IRQCHIP and KVM_SET_IRQCHIP */
struct kvm_pic_state {
/* no PIC for s390 */
};
struct kvm_ioapic_state {
/* no IOAPIC for s390 */
};
/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
/* general purpose regs for s390 */
__u64 gprs[16];
};
/* for KVM_GET_SREGS and KVM_SET_SREGS */
struct kvm_sregs {
__u32 acrs[16];
__u64 crs[16];
};
/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
__u32 fpc;
__u64 fprs[16];
};
#endif
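The register structures above are fixed-size blobs exchanged with userspace through the KVM_GET_*/KVM_SET_* ioctls named in their comments. As a rough sanity check, the sketch below mirrors them with `<stdint.h>` types (an assumption: `__u32`/`__u64` are taken to be 32-bit and 64-bit unsigned integers) and verifies the sizes implied by the field counts. The `_sketch` names are illustrative and not part of the kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirrors of the s390 structures, using stdint stand-ins. */
struct kvm_regs_sketch {
	uint64_t gprs[16];   /* 16 general purpose registers */
};

struct kvm_sregs_sketch {
	uint32_t acrs[16];   /* 16 access registers */
	uint64_t crs[16];    /* 16 control registers */
};

struct kvm_fpu_sketch {
	uint32_t fpc;        /* floating point control */
	uint64_t fprs[16];   /* 16 floating point registers */
};

/* 16 * 8 bytes of general purpose registers. */
_Static_assert(sizeof(struct kvm_regs_sketch) == 128, "kvm_regs size");
/* 16 * 4 + 16 * 8 bytes; no padding is needed between the two arrays. */
_Static_assert(sizeof(struct kvm_sregs_sketch) == 192, "kvm_sregs size");

/* Offset of fprs; any padding after fpc depends on the ABI's alignment
 * rules for 64-bit integers, so it is reported rather than hard-coded. */
static size_t fpu_fprs_offset(void)
{
	return offsetof(struct kvm_fpu_sketch, fprs);
}
```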

234
include/asm-s390/kvm_host.h Normal file
View File

@ -0,0 +1,234 @@
/*
* asm-s390/kvm_host.h - definition for kernel virtual machines on s390
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Carsten Otte <cotte@de.ibm.com>
*/
#ifndef ASM_KVM_HOST_H
#define ASM_KVM_HOST_H
#include <linux/kvm_host.h>
#include <asm/debug.h>
#define KVM_MAX_VCPUS 64
#define KVM_MEMORY_SLOTS 32
/* memory slots that are not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS 4
struct kvm_guest_debug {
};
struct sca_entry {
atomic_t scn;
__u64 reserved;
__u64 sda;
__u64 reserved2[2];
} __attribute__((packed));
struct sca_block {
__u64 ipte_control;
__u64 reserved[5];
__u64 mcn;
__u64 reserved2;
struct sca_entry cpu[64];
} __attribute__((packed));
#define KVM_PAGES_PER_HPAGE 256
#define CPUSTAT_HOST 0x80000000
#define CPUSTAT_WAIT 0x10000000
#define CPUSTAT_ECALL_PEND 0x08000000
#define CPUSTAT_STOP_INT 0x04000000
#define CPUSTAT_IO_INT 0x02000000
#define CPUSTAT_EXT_INT 0x01000000
#define CPUSTAT_RUNNING 0x00800000
#define CPUSTAT_RETAINED 0x00400000
#define CPUSTAT_TIMING_SUB 0x00020000
#define CPUSTAT_SIE_SUB 0x00010000
#define CPUSTAT_RRF 0x00008000
#define CPUSTAT_SLSV 0x00004000
#define CPUSTAT_SLSR 0x00002000
#define CPUSTAT_ZARCH 0x00000800
#define CPUSTAT_MCDS 0x00000100
#define CPUSTAT_SM 0x00000080
#define CPUSTAT_G 0x00000008
#define CPUSTAT_J 0x00000002
#define CPUSTAT_P 0x00000001
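The CPUSTAT_* values are single-bit masks for the `cpuflags` word of the SIE control block. A small sketch of how such masks are typically tested and set follows. The helper names are hypothetical, the meaning of each flag is inferred from its name only, and the kernel manipulates `cpuflags` through atomic operations rather than plain integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the definitions above, with an unsigned suffix. */
#define CPUSTAT_WAIT     0x10000000u
#define CPUSTAT_STOP_INT 0x04000000u
#define CPUSTAT_RUNNING  0x00800000u

/* Return nonzero if the wait bit is set in a cpuflags snapshot. */
static int cpu_is_waiting(uint32_t cpuflags)
{
	return (cpuflags & CPUSTAT_WAIT) != 0;
}

/* Return the snapshot with the stop-interrupt bit added. */
static uint32_t request_stop(uint32_t cpuflags)
{
	return cpuflags | CPUSTAT_STOP_INT;
}
```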
struct sie_block {
atomic_t cpuflags; /* 0x0000 */
__u32 prefix; /* 0x0004 */
__u8 reserved8[32]; /* 0x0008 */
__u64 cputm; /* 0x0028 */
__u64 ckc; /* 0x0030 */
__u64 epoch; /* 0x0038 */
__u8 reserved40[4]; /* 0x0040 */
#define LCTL_CR0 0x8000
__u16 lctl; /* 0x0044 */
__s16 icpua; /* 0x0046 */
__u32 ictl; /* 0x0048 */
__u32 eca; /* 0x004c */
__u8 icptcode; /* 0x0050 */
__u8 reserved51; /* 0x0051 */
__u16 ihcpu; /* 0x0052 */
__u8 reserved54[2]; /* 0x0054 */
__u16 ipa; /* 0x0056 */
__u32 ipb; /* 0x0058 */
__u32 scaoh; /* 0x005c */
__u8 reserved60; /* 0x0060 */
__u8 ecb; /* 0x0061 */
__u8 reserved62[2]; /* 0x0062 */
__u32 scaol; /* 0x0064 */
__u8 reserved68[4]; /* 0x0068 */
__u32 todpr; /* 0x006c */
__u8 reserved70[16]; /* 0x0070 */
__u64 gmsor; /* 0x0080 */
__u64 gmslm; /* 0x0088 */
psw_t gpsw; /* 0x0090 */
__u64 gg14; /* 0x00a0 */
__u64 gg15; /* 0x00a8 */
__u8 reservedb0[30]; /* 0x00b0 */
__u16 iprcc; /* 0x00ce */
__u8 reservedd0[48]; /* 0x00d0 */
__u64 gcr[16]; /* 0x0100 */
__u64 gbea; /* 0x0180 */
__u8 reserved188[120]; /* 0x0188 */
} __attribute__((packed));
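The offset comments in `struct sie_block` describe a fixed, packed layout. One way to keep such comments honest is to check them at compile time. The sketch below mirrors only the first fields of the structure (up to `icptcode`) with `<stdint.h>` stand-ins; `uint32_t` replaces `atomic_t` as an assumption, and the `sie_block_head` name is illustrative. It relies on the GCC/Clang `packed` attribute, as the header itself does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced mirror of the leading part of struct sie_block. */
struct sie_block_head {
	uint32_t cpuflags;      /* 0x0000, atomic_t in the kernel */
	uint32_t prefix;        /* 0x0004 */
	uint8_t  reserved8[32]; /* 0x0008 */
	uint64_t cputm;         /* 0x0028 */
	uint64_t ckc;           /* 0x0030 */
	uint64_t epoch;         /* 0x0038 */
	uint8_t  reserved40[4]; /* 0x0040 */
	uint16_t lctl;          /* 0x0044 */
	int16_t  icpua;         /* 0x0046 */
	uint32_t ictl;          /* 0x0048 */
	uint32_t eca;           /* 0x004c */
	uint8_t  icptcode;      /* 0x0050 */
} __attribute__((packed));

/* Compile-time checks that the mirror matches the commented offsets. */
_Static_assert(offsetof(struct sie_block_head, cputm) == 0x28, "cputm");
_Static_assert(offsetof(struct sie_block_head, lctl) == 0x44, "lctl");
_Static_assert(offsetof(struct sie_block_head, eca) == 0x4c, "eca");
_Static_assert(offsetof(struct sie_block_head, icptcode) == 0x50, "icptcode");
```

Summing the field sizes of the full structure in the same way ends at 0x200, that is, 512 bytes.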
struct kvm_vcpu_stat {
u32 exit_userspace;
u32 exit_external_request;
u32 exit_external_interrupt;
u32 exit_stop_request;
u32 exit_validity;
u32 exit_instruction;
u32 instruction_lctl;
u32 instruction_lctg;
u32 exit_program_interruption;
u32 exit_instr_and_program;
u32 deliver_emergency_signal;
u32 deliver_service_signal;
u32 deliver_virtio_interrupt;
u32 deliver_stop_signal;
u32 deliver_prefix_signal;
u32 deliver_restart_signal;
u32 deliver_program_int;
u32 exit_wait_state;
u32 instruction_stidp;
u32 instruction_spx;
u32 instruction_stpx;
u32 instruction_stap;
u32 instruction_storage_key;
u32 instruction_stsch;
u32 instruction_chsc;
u32 instruction_stsi;
u32 instruction_stfl;
u32 instruction_sigp_sense;
u32 instruction_sigp_emergency;
u32 instruction_sigp_stop;
u32 instruction_sigp_arch;
u32 instruction_sigp_prefix;
u32 instruction_sigp_restart;
u32 diagnose_44;
};
struct io_info {
__u16 subchannel_id; /* 0x0b8 */
__u16 subchannel_nr; /* 0x0ba */
__u32 io_int_parm; /* 0x0bc */
__u32 io_int_word; /* 0x0c0 */
};
struct ext_info {
__u32 ext_params;
__u64 ext_params2;
};
#define PGM_OPERATION 0x01
#define PGM_PRIVILEGED_OPERATION 0x02
#define PGM_EXECUTE 0x03
#define PGM_PROTECTION 0x04
#define PGM_ADDRESSING 0x05
#define PGM_SPECIFICATION 0x06
#define PGM_DATA 0x07
struct pgm_info {
__u16 code;
};
struct prefix_info {
__u32 address;
};
struct interrupt_info {
struct list_head list;
u64 type;
union {
struct io_info io;
struct ext_info ext;
struct pgm_info pgm;
struct prefix_info prefix;
};
};
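`struct interrupt_info` is a tagged union: `type` selects which member of the anonymous union is valid. The sketch below shows one way a consumer might dispatch on the tag. The tag values in `enum irq_type_sketch` are hypothetical (the header does not define them), the list linkage is omitted, and the choice of which payload field to return is purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values; the header does not define what 'type' holds. */
enum irq_type_sketch { IRQ_IO = 1, IRQ_EXT, IRQ_PGM, IRQ_PREFIX };

struct io_info_sketch {
	uint16_t subchannel_id;
	uint16_t subchannel_nr;
	uint32_t io_int_parm;
	uint32_t io_int_word;
};

struct ext_info_sketch {
	uint32_t ext_params;
	uint64_t ext_params2;
};

struct pgm_info_sketch {
	uint16_t code;
};

struct prefix_info_sketch {
	uint32_t address;
};

/* Mirror of struct interrupt_info without the list_head member. */
struct interrupt_info_sketch {
	uint64_t type;
	union {
		struct io_info_sketch io;
		struct ext_info_sketch ext;
		struct pgm_info_sketch pgm;
		struct prefix_info_sketch prefix;
	};
};

/* Return one representative payload field, chosen by the type tag. */
static uint64_t interrupt_payload(const struct interrupt_info_sketch *i)
{
	switch (i->type) {
	case IRQ_IO:
		return i->io.io_int_parm;
	case IRQ_EXT:
		return i->ext.ext_params;
	case IRQ_PGM:
		return i->pgm.code;
	case IRQ_PREFIX:
		return i->prefix.address;
	}
	assert(!"unknown interrupt type");
	return 0;
}
```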
/* for local_interrupt.action_bits */
#define ACTION_STORE_ON_STOP 1
#define ACTION_STOP_ON_STOP 2
struct local_interrupt {
spinlock_t lock;
struct list_head list;
atomic_t active;
struct float_interrupt *float_int;
int timer_due; /* event indicator for waitqueue below */
wait_queue_head_t wq;
atomic_t *cpuflags;
unsigned int action_bits;
};
struct float_interrupt {
spinlock_t lock;
struct list_head list;
atomic_t active;
int next_rr_cpu;
unsigned long idle_mask [(64 + sizeof(long) - 1) / sizeof(long)];
struct local_interrupt *local_int[64];
};
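`idle_mask` is sized with `(64 + sizeof(long) - 1) / sizeof(long)`, which divides by the byte size of `long` rather than its bit size. If the field is used as a one-bit-per-CPU bitmap (an assumption based on its name), the array is therefore larger than strictly needed, which is harmless. The sketch below compares the two computations and shows plain, non-atomic bit helpers; all `SKETCH_*` names and the helper functions are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_MAX_CPUS 64
#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Number of longs needed to hold n bits. */
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* The expression used in the header: rounds up by bytes, not bits. */
#define HEADER_IDLE_MASK_LONGS \
	((SKETCH_MAX_CPUS + sizeof(long) - 1) / sizeof(long))

static unsigned long idle_mask_sketch[HEADER_IDLE_MASK_LONGS];

/* Set the idle bit of one CPU (non-atomic, illustration only). */
static void mark_idle(unsigned int cpu)
{
	assert(cpu < SKETCH_MAX_CPUS);
	idle_mask_sketch[cpu / SKETCH_BITS_PER_LONG] |=
		1UL << (cpu % SKETCH_BITS_PER_LONG);
}

/* Test the idle bit of one CPU. */
static int is_idle(unsigned int cpu)
{
	assert(cpu < SKETCH_MAX_CPUS);
	return (idle_mask_sketch[cpu / SKETCH_BITS_PER_LONG] >>
		(cpu % SKETCH_BITS_PER_LONG)) & 1UL;
}
```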
struct kvm_vcpu_arch {
struct sie_block *sie_block;
unsigned long guest_gprs[16];
s390_fp_regs host_fpregs;
unsigned int host_acrs[NUM_ACRS];
s390_fp_regs guest_fpregs;
unsigned int guest_acrs[NUM_ACRS];
struct local_interrupt local_int;
struct timer_list ckc_timer;
union {
cpuid_t cpu_id;
u64 stidp_data;
};
};
struct kvm_vm_stat {
u32 remote_tlb_flush;
};
struct kvm_arch{
unsigned long guest_origin;
unsigned long guest_memsize;
struct sca_block *sca;
debug_info_t *dbf;
struct float_interrupt float_int;
};
extern int sie64a(struct sie_block *, __u64 *);
#endif

150
include/asm-s390/kvm_para.h Normal file
View File

@ -0,0 +1,150 @@
/*
* asm-s390/kvm_para.h - definition for paravirtual devices on s390
*
* Copyright IBM Corp. 2008
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Christian Borntraeger <borntraeger@de.ibm.com>
*/
#ifndef __S390_KVM_PARA_H
#define __S390_KVM_PARA_H
/*
* Hypercalls for KVM on s390. The calling convention is similar to the
* s390 ABI, so we use R2-R6 for parameters 1-5. In addition we use R1
* as hypercall number and R7 as parameter 6. The return value is
* written to R2. We use the diagnose instruction as the hypercall. To avoid
* conflicts with existing diagnoses for LPAR and z/VM, we do not use
* the instruction-encoded number, but specify the number in R1 and
* use 0x500 as the KVM hypercall code.
*
* Copyright IBM Corp. 2007,2008
* Author(s): Christian Borntraeger <borntraeger@de.ibm.com>
*
* This work is licensed under the terms of the GNU GPL, version 2.
*/
static inline long kvm_hypercall0(unsigned long nr)
{
register unsigned long __nr asm("1") = nr;
register long __rc asm("2");
asm volatile ("diag 2,4,0x500\n"
: "=d" (__rc) : "d" (__nr): "memory", "cc");
return __rc;
}
static inline long kvm_hypercall1(unsigned long nr, unsigned long p1)
{
register unsigned long __nr asm("1") = nr;
register unsigned long __p1 asm("2") = p1;
register long __rc asm("2");
asm volatile ("diag 2,4,0x500\n"
: "=d" (__rc) : "d" (__nr), "0" (__p1) : "memory", "cc");
return __rc;
}
static inline long kvm_hypercall2(unsigned long nr, unsigned long p1,
unsigned long p2)
{
register unsigned long __nr asm("1") = nr;
register unsigned long __p1 asm("2") = p1;
register unsigned long __p2 asm("3") = p2;
register long __rc asm("2");
asm volatile ("diag 2,4,0x500\n"
: "=d" (__rc) : "d" (__nr), "0" (__p1), "d" (__p2)
: "memory", "cc");
return __rc;
}
static inline long kvm_hypercall3(unsigned long nr, unsigned long p1,
unsigned long p2, unsigned long p3)
{
register unsigned long __nr asm("1") = nr;
register unsigned long __p1 asm("2") = p1;
register unsigned long __p2 asm("3") = p2;
register unsigned long __p3 asm("4") = p3;
register long __rc asm("2");
asm volatile ("diag 2,4,0x500\n"
: "=d" (__rc) : "d" (__nr), "0" (__p1), "d" (__p2),
"d" (__p3) : "memory", "cc");
return __rc;
}
static inline long kvm_hypercall4(unsigned long nr, unsigned long p1,
unsigned long p2, unsigned long p3,
unsigned long p4)
{
register unsigned long __nr asm("1") = nr;
register unsigned long __p1 asm("2") = p1;
register unsigned long __p2 asm("3") = p2;
register unsigned long __p3 asm("4") = p3;
register unsigned long __p4 asm("5") = p4;
register long __rc asm("2");
asm volatile ("diag 2,4,0x500\n"
: "=d" (__rc) : "d" (__nr), "0" (__p1), "d" (__p2),
"d" (__p3), "d" (__p4) : "memory", "cc");
return __rc;
}
static inline long kvm_hypercall5(unsigned long nr, unsigned long p1,
unsigned long p2, unsigned long p3,
unsigned long p4, unsigned long p5)
{
register unsigned long __nr asm("1") = nr;
register unsigned long __p1 asm("2") = p1;
register unsigned long __p2 asm("3") = p2;
register unsigned long __p3 asm("4") = p3;
register unsigned long __p4 asm("5") = p4;
register unsigned long __p5 asm("6") = p5;
register long __rc asm("2");
asm volatile ("diag 2,4,0x500\n"
: "=d" (__rc) : "d" (__nr), "0" (__p1), "d" (__p2),
"d" (__p3), "d" (__p4), "d" (__p5) : "memory", "cc");
return __rc;
}
static inline long kvm_hypercall6(unsigned long nr, unsigned long p1,
unsigned long p2, unsigned long p3,
unsigned long p4, unsigned long p5,
unsigned long p6)
{
register unsigned long __nr asm("1") = nr;
register unsigned long __p1 asm("2") = p1;
register unsigned long __p2 asm("3") = p2;
register unsigned long __p3 asm("4") = p3;
register unsigned long __p4 asm("5") = p4;
register unsigned long __p5 asm("6") = p5;
register unsigned long __p6 asm("7") = p6;
register long __rc asm("2");
asm volatile ("diag 2,4,0x500\n"
: "=d" (__rc) : "d" (__nr), "0" (__p1), "d" (__p2),
"d" (__p3), "d" (__p4), "d" (__p5), "d" (__p6)
: "memory", "cc");
return __rc;
}
/* kvm on s390 is always paravirtualization enabled */
static inline int kvm_para_available(void)
{
return 1;
}
/* No feature bits are currently assigned for kvm on s390 */
static inline unsigned int kvm_arch_para_features(void)
{
return 0;
}
#endif /* __S390_KVM_PARA_H */
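The `diag` instruction only exists on s390, so the hypercall wrappers above cannot be exercised elsewhere. The portable sketch below models just the register convention described in the header comment: the call number goes in R1, parameters 1-6 go in R2-R7, and the result comes back in R2. The register file, the `hypervisor_fn` callback standing in for the host side of `diag 2,4,0x500`, and the demo handler are all hypothetical.

```c
#include <assert.h>

/* Hypothetical model of the 16 general purpose registers. */
struct regs_sketch {
	unsigned long r[16];
};

/* Stand-in for the host side of the diagnose instruction. */
typedef long (*hypervisor_fn)(struct regs_sketch *regs);

/* Load registers per the documented convention, "trap" into the host
 * callback, and read the return value back from R2. */
static long hypercall_sketch(hypervisor_fn host, unsigned long nr,
			     const unsigned long *params, unsigned int nparams)
{
	struct regs_sketch regs = { { 0 } };
	unsigned int i;

	assert(nparams <= 6);
	regs.r[1] = nr;                    /* hypercall number in R1 */
	for (i = 0; i < nparams; i++)
		regs.r[2 + i] = params[i]; /* parameters 1-6 in R2-R7 */

	regs.r[2] = (unsigned long)host(&regs); /* result written to R2 */
	return (long)regs.r[2];
}

/* Hypothetical host handler: call number 1 adds its two parameters,
 * every other call number is rejected with -1. */
static long demo_host(struct regs_sketch *regs)
{
	if (regs->r[1] == 1)
		return (long)(regs->r[2] + regs->r[3]);
	return -1;
}
```

Note that R2 carries both the first parameter and the return value, which is why the real wrappers bind `__rc` and `__p1` to the same register and use the `"0"` matching constraint.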
