linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-26 13:55:09 +07:00

Author	SHA1	Message	Date
Paul Mackerras	9e368f2915	KVM: PPC: book3s_hv: Add support for PPC970-family processors This adds support for running KVM guests in supervisor mode on those PPC970 processors that have a usable hypervisor mode. Unfortunately, Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to 1), but the YDL PowerStation does have a usable hypervisor mode. There are several differences between the PPC970 and POWER7 in how guests are managed. These differences are accommodated using the CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature bits. Notably, on PPC970: * The LPCR, LPID or RMOR registers don't exist, and the functions of those registers are provided by bits in HID4 and one bit in HID0. * External interrupts can be directed to the hypervisor, but unlike POWER7 they are masked by MSR[EE] in non-hypervisor modes and use SRR0/1 not HSRR0/1. * There is no virtual RMA (VRMA) mode; the guest must use an RMO (real mode offset) area. * The TLB entries are not tagged with the LPID, so it is necessary to flush the whole TLB on partition switch. Furthermore, when switching partitions we have to ensure that no other CPU is executing the tlbie or tlbsync instructions in either the old or the new partition, otherwise undefined behaviour can occur. * The PMU has 8 counters (PMC registers) rather than 6. * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist. * The SLB has 64 entries rather than 32. * There is no mediated external interrupt facility, so if we switch to a guest that has a virtual external interrupt pending but the guest has MSR[EE] = 0, we have to arrange to have an interrupt pending for it so that we can get control back once it re-enables interrupts. We do that by sending ourselves an IPI with smp_send_reschedule after hard-disabling interrupts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:59 +03:00
Paul Mackerras	aa04b4cc5b	KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead. This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
Paul Mackerras	371fefd6f2	KVM: PPC: Allow book3s_hv guests to use SMT processor modes This lifts the restriction that book3s_hv guests can only run one hardware thread per core, and allows them to use up to 4 threads per core on POWER7. The host still has to run single-threaded. This capability is advertised to qemu through a new KVM_CAP_PPC_SMT capability. The return value of the ioctl querying this capability is the number of vcpus per virtual CPU core (vcore), currently 4. To use this, the host kernel should be booted with all threads active, and then all the secondary threads should be offlined. This will put the secondary threads into nap mode. KVM will then wake them from nap mode and use them for running guest code (while they are still offline). To wake the secondary threads, we send them an IPI using a new xics_wake_cpu() function, implemented in arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage we assume that the platform has a XICS interrupt controller and we are using icp-native.c to drive it. Since the woken thread will need to acknowledge and clear the IPI, we also export the base physical address of the XICS registers using kvmppc_set_xics_phys() for use in the low-level KVM book3s code. When a vcpu is created, it is assigned to a virtual CPU core. The vcore number is obtained by dividing the vcpu number by the number of threads per core in the host. This number is exported to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes to run the guest in single-threaded mode, it should make all vcpu numbers be multiples of the number of threads per core. We distinguish three states of a vcpu: runnable (i.e., ready to execute the guest), blocked (that is, idle), and busy in host. We currently implement a policy that the vcore can run only when all its threads are runnable or blocked. This way, if a vcpu needs to execute elsewhere in the kernel or in qemu, it can do so without being starved of CPU by the other vcpus. When a vcore starts to run, it executes in the context of one of the vcpu threads. The other vcpu threads all go to sleep and stay asleep until something happens requiring the vcpu thread to return to qemu, or to wake up to run the vcore (this can happen when another vcpu thread goes from busy in host state to blocked). It can happen that a vcpu goes from blocked to runnable state (e.g. because of an interrupt), and the vcore it belongs to is already running. In that case it can start to run immediately as long as the none of the vcpus in the vcore have started to exit the guest. We send the next free thread in the vcore an IPI to get it to start to execute the guest. It synchronizes with the other threads via the vcore->entry_exit_count field to make sure that it doesn't go into the guest if the other vcpus are exiting by the time that it is ready to actually enter the guest. Note that there is no fixed relationship between the hardware thread number and the vcpu number. Hardware threads are assigned to vcpus as they become runnable, so we will always use the lower-numbered hardware threads in preference to higher-numbered threads if not all the vcpus in the vcore are runnable, regardless of which vcpus are runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
David Gibson	54738c0971	KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode This improves I/O performance for guests using the PAPR paravirtualization interface by making the H_PUT_TCE hcall faster, by implementing it in real mode. H_PUT_TCE is used for updating virtual IOMMU tables, and is used both for virtual I/O and for real I/O in the PAPR interface. Since this moves the IOMMU tables into the kernel, we define a new KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The ioctl returns a file descriptor which can be used to mmap the newly created table. The qemu driver models use them in the same way as userspace managed tables, but they can be updated directly by the guest with a real-mode H_PUT_TCE implementation, reducing the number of host/guest context switches during guest IO. There are certain circumstances where it is useful for userland qemu to write to the TCE table even if the kernel H_PUT_TCE path is used most of the time. Specifically, allowing this will avoid awkwardness when we need to reset the table. More importantly, we will in the future need to write the table in order to restore its state after a checkpoint resume or migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:56 +03:00
Paul Mackerras	a8606e20e4	KVM: PPC: Handle some PAPR hcalls in the kernel This adds the infrastructure for handling PAPR hcalls in the kernel, either early in the guest exit path while we are still in real mode, or later once the MMU has been turned back on and we are in the full kernel context. The advantage of handling hcalls in real mode if possible is that we avoid two partition switches -- and this will become more important when we support SMT4 guests, since a partition switch means we have to pull all of the threads in the core out of the guest. The disadvantage is that we can only access the kernel linear mapping, not anything vmalloced or ioremapped, since the MMU is off. This also adds code to handle the following hcalls in real mode: H_ENTER Add an HPTE to the hashed page table H_REMOVE Remove an HPTE from the hashed page table H_READ Read HPTEs from the hashed page table H_PROTECT Change the protection bits in an HPTE H_BULK_REMOVE Remove up to 4 HPTEs from the hashed page table H_SET_DABR Set the data address breakpoint register Plus code to handle the following hcalls in the kernel: H_CEDE Idle the vcpu until an interrupt or H_PROD hcall arrives H_PROD Wake up a ceded vcpu H_REGISTER_VPA Register a virtual processor area (VPA) The code that runs in real mode has to be in the base kernel, not in the module, if KVM is compiled as a module. The real-mode code can only access the kernel linear mapping, not vmalloc or ioremap space. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:55 +03:00
Paul Mackerras	de56a948b9	KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:54 +03:00
Paul Mackerras	df6909e5d5	KVM: PPC: Move guest enter/exit down into subarch-specific code Instead of doing the kvm_guest_enter/exit() and local_irq_dis/enable() calls in powerpc.c, this moves them down into the subarch-specific book3s_pr.c and booke.c. This eliminates an extra local_irq_enable() call in book3s_pr.c, and will be needed for when we do SMT4 guest support in the book3s hypervisor mode code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:51 +03:00
Paul Mackerras	f9e0554dec	KVM: PPC: Pass init/destroy vm and prepare/commit memory region ops down This arranges for the top-level arch/powerpc/kvm/powerpc.c file to pass down some of the calls it gets to the lower-level subarchitecture specific code. The lower-level implementations (in booke.c and book3s.c) are no-ops. The coming book3s_hv.c will need this. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:50 +03:00
Scott Wood	a4cd8b23ac	KVM: PPC: e500: enable magic page This is a shared page used for paravirtualization. It is always present in the guest kernel's effective address space at the address indicated by the hypercall that enables it. The physical address specified by the hypercall is not used, as e500 does not have real mode. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:37 +03:00
Scott Wood	5ce941ee42	KVM: PPC: booke: add sregs support Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-05-22 08:47:53 -04:00
Scott Wood	eab176722f	KVM: PPC: booke: save/restore VRSAVE (a.k.a. USPRG0) Linux doesn't use USPRG0 (now renamed VRSAVE in the architecture, even when Altivec isn't involved), but a guest might. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-05-22 08:47:50 -04:00
Bharat Bhushan	09000adb86	KVM: PPC: Fix issue clearing exit timing counters Following dump is observed on host when clearing the exit timing counters [root@p1021mds kvm]# echo -n 'c' > vm1200_vcpu0_timing INFO: task echo:1276 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. echo D 0ff5bf94 0 1276 1190 0x00000000 Call Trace: [c2157e40] [c0007908] __switch_to+0x9c/0xc4 [c2157e50] [c040293c] schedule+0x1b4/0x3bc [c2157e90] [c04032dc] __mutex_lock_slowpath+0x74/0xc0 [c2157ec0] [c00369e4] kvmppc_init_timing_stats+0x20/0xb8 [c2157ed0] [c0036b00] kvmppc_exit_timing_write+0x84/0x98 [c2157ef0] [c00b9f90] vfs_write+0xc0/0x16c [c2157f10] [c00ba284] sys_write+0x4c/0x90 [c2157f40] [c000e320] ret_from_syscall+0x0/0x3c The vcpu->mutex is used by kvm_ioctl_* (KVM_RUN etc) and same was used when clearing the stats (in kvmppc_init_timing_stats()). What happens is that when the guest is idle then it held the vcpu->mutx. While the exiting timing process waits for guest to release the vcpu->mutex and a hang state is reached. Now using seprate lock for exit timing stats. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:04 -04:00
Jan Kiszka	d89f5eff70	KVM: Clean up vm creation and release IA64 support forces us to abstract the allocation of the kvm structure. But instead of mixing this up with arch-specific initialization and doing the same on destruction, split both steps. This allows to move generic destruction calls into generic code. It also fixes error clean-up on failures of kvm_create_vm for IA64. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:09 +02:00
Vasiliy Kulikov	d8cdddcd64	KVM: PPC: fix information leak to userland Structure kvm_ppc_pvinfo is copied to userland with flags and pad fields unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-05 14:42:26 -02:00
Alexander Graf	7b4203e8cb	KVM: PPC: Expose level based interrupt cap Now that we have all the level interrupt magic in place, let's expose the capability to user space, so it can make use of it! Signed-off-by: Alexander Graf <agraf@suse.de>	2010-10-24 10:52:19 +02:00
Alexander Graf	df1bfa25d8	KVM: PPC: Put segment registers in shared page Now that the actual mtsr doesn't do anything anymore, we can move the sr contents over to the shared page, so a guest can directly read and write its sr contents from guest context. Signed-off-by: Alexander Graf <agraf@suse.de>	2010-10-24 10:52:11 +02:00
Alexander Graf	7508e16c9f	KVM: PPC: Add feature bitmap for magic page We will soon add SR PV support to the shared page, so we need some infrastructure that allows the guest to query for features KVM exports. This patch adds a second return value to the magic mapping that indicated to the guest which features are available. Signed-off-by: Alexander Graf <agraf@suse.de>	2010-10-24 10:52:09 +02:00
Alexander Graf	15711e9c92	KVM: PPC: Add get_pvinfo interface to query hypercall instructions We need to tell the guest the opcodes that make up a hypercall through interfaces that are controlled by userspace. So we need to add a call for userspace to allow it to query those opcodes so it can pass them on. This is required because the hypercall opcodes can change based on the hypervisor conditions. If we're running in hardware accelerated hypervisor mode, a hypercall looks different from when we're running without hardware acceleration. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:57 +02:00
Alexander Graf	5fc87407b5	KVM: PPC: Expose magic page support to guest Now that we have the shared page in place and the MMU code knows about the magic page, we can expose that capability to the guest! Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:49 +02:00
Alexander Graf	2a342ed577	KVM: PPC: Implement hypervisor interface To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:45 +02:00
Alexander Graf	666e7252a1	KVM: PPC: Convert MSR to shared page One of the most obvious registers to share with the guest directly is the MSR. The MSR contains the "interrupts enabled" flag which the guest has to toggle in critical sections. So in order to bring the overhead of interrupt en- and disabling down, let's put msr into the shared page. Keep in mind that even though you can fully read its contents, writing to it doesn't always update all state. There are a few safe fields that don't require hypervisor interaction. See the documentation for a list of MSR bits that are safe to be set from inside the guest. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:43 +02:00
Avi Kivity	a1f4d39500	KVM: Remove memory alias support As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:00 +03:00
Denis Kirjanov	69b61833f7	KVM: PPC: fix build warning in kvm_arch_vcpu_ioctl_run Fix compile warning: CC [M] arch/powerpc/kvm/powerpc.o arch/powerpc/kvm/powerpc.c: In function 'kvm_arch_vcpu_ioctl_run': arch/powerpc/kvm/powerpc.c:290: warning: 'gpr' may be used uninitialized in this function arch/powerpc/kvm/powerpc.c:290: note: 'gpr' was declared here Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:36 +03:00
Avi Kivity	9373662463	KVM: Consolidate arch specific vcpu ioctl locking Now that all arch specific ioctls have centralized locking, it is easy to move it to the central dispatcher. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:48 +03:00
Avi Kivity	19483d1440	KVM: PPC: Centralize locking of arch specific vcpu ioctls Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:48 +03:00
Avi Kivity	2122ff5eab	KVM: move vcpu locking to dispatcher for generic vcpu ioctls All vcpu ioctls need to be locked, so instead of locking each one specifically we lock at the generic dispatcher. This patch only updates generic ioctls and leaves arch specific ioctls alone. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:47 +03:00
Alexander Graf	c7f38f46f2	KVM: PPC: Improve indirect svcpu accessors We already have some inline fuctions we use to access vcpu or svcpu structs, depending on whether we're on booke or book3s. Since we just put a few more registers into the svcpu, we also need to make sure the respective callbacks are available and get used. So this patch moves direct use of the now in the svcpu struct fields to inline function calls. While at it, it also moves the definition of those inline function calls to respective header files for booke and book3s, greatly improving readability. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:26 +03:00
Alexander Graf	287d5611fa	KVM: PPC: Only use QPRs when available BookE KVM doesn't know about QPRs, so let's not try to access then. This fixes a build error on BookE KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:17:24 +03:00
Alexander Graf	ad0a048b09	KVM: PPC: Add OSI hypercall interface MOL uses its own hypercall interface to call back into userspace when the guest wants to do something. So let's implement that as an exit reason, specify it with a CAP and only really use it when userspace wants us to. The only user of it so far is MOL. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:17:10 +03:00
Alexander Graf	71fbfd5f38	KVM: Add support for enabling capabilities per-vcpu Some times we don't want all capabilities to be available to all our vcpus. One example for that is the OSI interface, implemented in the next patch. In order to have a generic mechanism in how to enable capabilities individually, this patch introduces a new ioctl that can be used for this purpose. That way features we don't want in all guests or userspace configurations can just not be enabled and we're good. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:17:09 +03:00
Alexander Graf	18978768d8	KVM: PPC: Allow userspace to unset the IRQ line Userspace can tell us that it wants to trigger an interrupt. But so far it can't tell us that it wants to stop triggering one. So let's interpret the parameter to the ioctl that we have anyways to tell us if we want to raise or lower the interrupt line. Signed-off-by: Alexander Graf <agraf@suse.de> v2 -> v3: - Add CAP for unset irq Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:16:51 +03:00
Wei Yongjun	06056bfb94	KVM: PPC: Do not create debugfs if fail to create vcpu If fail to create the vcpu, we should not create the debugfs for it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Alexander Graf <agraf@suse.de> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:15:21 +03:00
Alexander Graf	a595405df9	KVM: PPC: Destory timer on vcpu destruction When we destory a vcpu, we should also make sure to kill all pending timers that could still be up. When not doing this, hrtimers might dereference null pointers trying to call our code. This patch fixes spontanious kernel panics seen after closing VMs. Signed-off-by: Alexander Graf <alex@csgraf.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:39:25 +03:00
Alexander Graf	c10207fe86	KVM: PPC: Add capability for paired singles We need to tell userspace that we can emulate paired single instructions. So let's add a capability export. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:37:47 +03:00
Alexander Graf	3587d5348c	KVM: PPC: Teach MMIO Signedness The guest I was trying to get to run uses the LHA and LHAU instructions. Those instructions basically do a load, but also sign extend the result. Since we need to fill our registers by hand when doing MMIO, we also need to sign extend manually. This patch implements sign extended MMIO and the LHA(U) instructions. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:44 +03:00
Alexander Graf	b104d06632	KVM: PPC: Enable MMIO to do 64 bits, fprs and qprs Right now MMIO access can only happen for GPRs and is at most 32 bit wide. That's actually enough for almost all types of hardware out there. Unfortunately, the guest I was using used FPU writes to MMIO regions, so it ended up writing 64 bit MMIOs using FPRs and QPRs. So let's add code to handle those odd cases too. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-04-25 12:34:41 +03:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Marcelo Tosatti	6474920477	KVM: fix cleanup_srcu_struct on vm destruction cleanup_srcu_struct on VM destruction remains broken: BUG: unable to handle kernel paging request at ffffffffffffffff IP: [<ffffffff802533d2>] srcu_read_lock+0x16/0x21 RIP: 0010:[<ffffffff802533d2>] [<ffffffff802533d2>] srcu_read_lock+0x16/0x21 Call Trace: [<ffffffffa05354c4>] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm] [<ffffffffa05339c6>] kvm_vcpu_uninit+0x9/0x15 [kvm] [<ffffffffa0569f7d>] vmx_free_vcpu+0x7f/0x8f [kvm_intel] [<ffffffffa05357b5>] kvm_arch_destroy_vm+0x78/0x111 [kvm] [<ffffffffa053315b>] kvm_put_kvm+0xd4/0xfe [kvm] Move it to kvm_arch_destroy_vm. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Jan Kiszka <jan.kiszka@siemens.com>	2010-03-01 12:36:01 -03:00
Alexander Graf	8e5b26b55a	KVM: PPC: Use accessor functions for GPR access All code in PPC KVM currently accesses gprs in the vcpu struct directly. While there's nothing wrong with that wrt the current way gprs are stored and loaded, it doesn't suffice for the PACA acceleration that will follow in this patchset. So let's just create little wrapper inline functions that we call whenever a GPR needs to be read from or written to. The compiled code shouldn't really change at all for now. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:47 -03:00
Marcelo Tosatti	f7784b8ec9	KVM: split kvm_arch_set_memory_region into prepare and commit Required for SRCU convertion later. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:35:44 -03:00
Benjamin Herrenschmidt	bcd6acd51f	Merge commit 'origin/master' into next Conflicts: include/linux/kvm.h	2009-12-09 17:14:38 +11:00
Alexander Graf	e15a113700	powerpc/kvm: Sync guest visible MMU state Currently userspace has no chance to find out which virtual address space we're in and resolve addresses. While that is a big problem for migration, it's also unpleasent when debugging, as gdb and the monitor don't work on virtual addresses. This patch exports enough of the MMU segment state to userspace to make debugging work and thus also includes the groundwork for migration. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-08 16:02:50 +11:00
Alexander Graf	10474ae894	KVM: Activate Virtualization On Demand X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can result in unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. To circumvent this problem at least a bit, this patch introduces on demand activation of virtualization. This means, that instead virtualization is enabled on creation of the first virtual machine and disabled on destruction of the last one. So using this, KVM can be easily autoloaded, while keeping other hypervisors usable. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:10 +02:00
Avi Kivity	367e1319b2	KVM: Return -ENOTTY on unrecognized ioctls Not the incorrect -EINVAL. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:08 +02:00
Alexander Graf	544c6761bb	Use hrtimers for the decrementer Following S390's good example we should use hrtimers for the decrementer too! This patch converts the timer from the old mechanism to hrtimers. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-05 16:51:05 +11:00
Alexander Graf	4e755758cb	Move dirty logging code to sub-arch PowerPC code handles dirty logging in the generic parts atm. While this is great for "return -ENOTSUPP", we need to be rather target specific when actually implementing it. So let's split it to implementation specific code, so we can implement it for book3s. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-05 16:49:51 +11:00
Gleb Natapov	a1b37100d9	KVM: Reduce runnability interface with arch support code Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from interface between general code and arch code. kvm_arch_vcpu_runnable() checks for interrupts instead. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:13 +03:00
Marcelo Tosatti	46f43c6ee0	KVM: powerpc: convert marker probes to event trace [avi: make it build] [avi: fold trace-arch.h into trace.h] CC: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:03 +03:00
Gleb Natapov	988a2cae6a	KVM: Use macro to iterate over vcpus. [christian: remove unused variables on s390] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:52 +03:00
Gleb Natapov	78646121e9	KVM: Fix interrupt unhalting a vcpu when it shouldn't kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking if interrupt window is actually opened. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:33 +03:00
Hollis Blanchard	f5d0906b5b	KVM: ppc: remove debug support broken by KVM debug rewrite After the rewrite of KVM's debug support, this code doesn't even build any more. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:03:01 +02:00
Hollis Blanchard	ecc0981ff0	KVM: ppc: cosmetic changes to mmu hook names Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:55 +02:00
Jan Kiszka	d0bfb940ec	KVM: New guest debug interface This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic part, controlling the "main switch" and the single-step feature. The arch specific part adds an x86 interface for intercepting both types of debug exceptions separately and re-injecting them when the host was not interested. Moveover, the foundation for guest debugging via debug registers is layed. To signal breakpoint events properly back to userland, an arch-specific data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block contains the PC, the debug exception, and relevant debug registers to tell debug events properly apart. The availability of this new interface is signaled by KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are provided. Note that both SVM and VTX are supported, but only the latter was tested yet. Based on the experience with all those VTX corner case, I would be fairly surprised if SVM will work out of the box. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-03-24 11:02:49 +02:00
Sheng Yang	ad8ba2cd44	KVM: Add kvm_arch_sync_events to sync with asynchronize events kvm_arch_sync_events is introduced to quiet down all other events may happen contemporary with VM destroy process, like IRQ handler and work struct for assigned device. For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so the state of KVM here is legal and can provide a environment to quiet down other events. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-02-15 02:47:36 +02:00
Avi Kivity	ca9edaee1a	KVM: Consolidate userspace memory capability reporting into common code Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:55:46 +02:00
Hollis Blanchard	73e75b416f	KVM: ppc: Implement in-kernel exit timing statistics Existing KVM statistics are either just counters (kvm_stat) reported for KVM generally or trace based aproaches like kvm_trace. For KVM on powerpc we had the need to track the timings of the different exit types. While this could be achieved parsing data created with a kvm_trace extension this adds too much overhead (at least on embedded PowerPC) slowing down the workloads we wanted to measure. Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm code. These statistic is available per vm&vcpu under the kvm debugfs directory. As this statistic is low, but still some overhead it can be enabled via a .config entry and should be off by default. Since this patch touched all powerpc kvm_stat code anyway this code is now merged and simplified together with the exit timing statistic code (still working with exit timing disabled in .config). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:55:41 +02:00
Hollis Blanchard	5cf8ca2214	KVM: ppc: adjust vcpu types to support 64-bit cores However, some of these fields could be split into separate per-core structures in the future. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:52:22 +02:00
Hollis Blanchard	db93f5745d	KVM: ppc: create struct kvm_vcpu_44x and introduce container_of() accessor This patch doesn't yet move all 44x-specific data into the new structure, but is the first step down that path. In the future we may also want to create a struct kvm_vcpu_booke. Based on patch from Liu Yu <yu.liu@freescale.com>. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:52:22 +02:00
Hollis Blanchard	9dd921cfea	KVM: ppc: Refactor powerpc.c to relocate 440-specific code This introduces a set of core-provided hooks. For 440, some of these are implemented by booke.c, with the rest in (the new) 44x.c. Note that these hooks are link-time, not run-time. Since it is not possible to build a single kernel for both e500 and 440 (for example), using function pointers would only add overhead. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-12-31 16:52:21 +02:00
Paul Mackerras	fad7b9b51e	powerpc: Fix KVM build on ppc440 Commit `2a4aca1144` ("powerpc/mm: Split low level tlb invalidate for nohash processors") changed a call to _tlbia to _tlbil_all but didn't include the header that defines _tlbil_all, leading to a build failure on 440 if KVM is enabled. This fixes it. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-23 14:58:30 +11:00
Benjamin Herrenschmidt	2a4aca1144	powerpc/mm: Split low level tlb invalidate for nohash processors Currently, the various forms of low level TLB invalidations are all implemented in misc_32.S for 32-bit processors, in a fairly scary mess of #ifdef's and with interesting duplication such as a whole bunch of code for FSL _tlbie and _tlbia which are no longer used. This moves things around such that _tlbie is now defined in hash_low_32.S and is only used by the 32-bit hash code, and all nohash CPUs use the various _tlbil_* forms that are now moved to a new file, tlb_nohash_low.S. I moved all the definitions for that stuff out of include/asm/tlbflush.h as they are really internal mm stuff, into mm/mmu_decl.h The code should have no functional changes. I kept some variants inline for trivial forms on things like 40x and 8xx. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Hollis Blanchard	c30f8a6c6d	KVM: ppc: stop leaking host memory on VM exit When the VM exits, we must call put_page() for every page referenced in the shadow TLB. Without this patch, we usually leak 30-50 host pages (120 - 200 KiB with 4 KiB pages). The maximum number of pages leaked is the size of our shadow TLB, 64 pages. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2008-11-25 12:02:48 +02:00
Hollis Blanchard	83aae4a809	KVM: ppc: Write only modified shadow entries into the TLB on exit Track which TLB entries need to be written, instead of overwriting everything below the high water mark. Typically only a single guest TLB entry will be modified in a single exit. Guest boot time performance improvement: about 15%. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-10-15 10:15:16 +02:00
Hollis Blanchard	6a0ab738ef	KVM: ppc: guest breakpoint support Allow host userspace to program hardware debug registers to set breakpoints inside guests. Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-10-15 10:15:16 +02:00
Marcelo Tosatti	34d4cb8fca	KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction Flush the shadow mmu before removing regions to avoid stale entries. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:40 +03:00
Laurent Vivier	588968b6b7	KVM: Add coalesced MMIO support (powerpc part) This patch enables coalesced MMIO for powerpc architecture. It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO. It enables the compilation of coalesced_mmio.c. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:31 +03:00
Avi Kivity	7cc8883074	KVM: Remove decache_vcpus_on_cpu() and related callbacks Obsoleted by the vmx-specific per-cpu list. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-20 12:42:25 +03:00
Hollis Blanchard	45c5eb67da	KVM: ppc: Handle guest idle by emulating MSR[WE] writes This reduces host CPU usage when the guest is idle. However, the guest must set MSR[WE] in its idle loop, which Linux did not do until 2.6.26. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Jerone Young <jyoung5@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-05-04 14:44:44 +03:00
Hollis Blanchard	bbf45ba57e	KVM: ppc: PowerPC 440 KVM implementation This functionality is definitely experimental, but is capable of running unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only tested with 440EP "Bamboo" guests so far, but with appropriate userspace support other SoC/board combinations should work.) See Documentation/powerpc/kvm_440.txt for technical details. [stephen: build fix] Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:39 +03:00

1 2 3 4 5

219 Commits