linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-30 19:26:38 +07:00

Author	SHA1	Message	Date
Christoffer Dall	4b4b4512da	arm/arm64: KVM: Rework the arch timer to use level-triggered semantics The arch timer currently uses edge-triggered semantics in the sense that the line is never sampled by the vgic and lowering the line from the timer to the vgic doesn't have any effect on the pending state of virtual interrupts in the vgic. This means that we do not support a guest with the otherwise valid behavior of (1) disable interrupts (2) enable the timer (3) disable the timer (4) enable interrupts. Such a guest would validly not expect to see any interrupts on real hardware, but will see interrupts on KVM. This patch fixes this shortcoming through the following series of changes. First, we change the flow of the timer/vgic sync/flush operations. Now the timer is always flushed/synced before the vgic, because the vgic samples the state of the timer output. This has the implication that we move the timer operations in to non-preempible sections, but that is fine after the previous commit getting rid of hrtimer schedules on every entry/exit. Second, we change the internal behavior of the timer, letting the timer keep track of its previous output state, and only lower/raise the line to the vgic when the state changes. Note that in theory this could have been accomplished more simply by signalling the vgic every time the state potentially changed, but we don't want to be hitting the vgic more often than necessary. Third, we get rid of the use of the map->active field in the vgic and instead simply set the interrupt as active on the physical distributor whenever the input to the GIC is asserted and conversely clear the physical active state when the input to the GIC is deasserted. Fourth, and finally, we now initialize the timer PPIs (and all the other unused PPIs for now), to be level-triggered, and modify the sync code to sample the line state on HW sync and re-inject a new interrupt if it is still pending at that time. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-22 23:01:44 +02:00
Christoffer Dall	54723bb37f	arm/arm64: KVM: Use appropriate define in VGIC reset code We currently initialize the SGIs to be enabled in the VGIC code, but we use the VGIC_NR_PPIS define for this purpose, instead of the the more natural VGIC_NR_SGIS. Change this slightly confusing use of the defines. Note: This should have no functional change, as both names are defined to the number 16. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-22 23:01:43 +02:00
Christoffer Dall	8bf9a701e1	arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only. We currently simulate this behavior by writing a hardcoded value to the register for the SGIs and PPIs on every write of these bits to the register (ignoring what the guest actually wrote), and by writing the same value as the reset value to the register. This is a bit counter-intuitive, as the register is RO for these bits, and we can just implement it that way, allowing us to control the value of the bits purely in the reset code. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-22 23:01:42 +02:00
Christoffer Dall	9103617df2	arm/arm64: KVM: vgic: Factor out level irq processing on guest exit Currently vgic_process_maintenance() processes dealing with a completed level-triggered interrupt directly, but we are soon going to reuse this logic for level-triggered mapped interrupts with the HW bit set, so move this logic into a separate static function. Probably the most scary part of this commit is convincing yourself that the current flow is safe compared to the old one. In the following I try to list the changes and why they are harmless: Move vgic_irq_clear_queued after kvm_notify_acked_irq: Harmless because the only potential effect of clearing the queued flag wrt. kvm_set_irq is that vgic_update_irq_pending does not set the pending bit on the emulated CPU interface or in the pending_on_cpu bitmask if the function is called with level=1. However, the point of kvm_notify_acked_irq is to call kvm_set_irq with level=0, and we set the queued flag again in __kvm_vgic_sync_hwstate later on if the level is stil high. Move vgic_set_lr before kvm_notify_acked_irq: Also, harmless because the LR are cpu-local operations and kvm_notify_acked only affects the dist Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq: Also harmless, because now we check the level state in the clear_soft_pend function and lower the pending bits if the level is low. Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-22 23:01:42 +02:00
Christoffer Dall	d35268da66	arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block We currently schedule a soft timer every time we exit the guest if the timer did not expire while running the guest. This is really not necessary, because the only work we do in the timer work function is to kick the vcpu. Kicking the vcpu does two things: (1) If the vpcu thread is on a waitqueue, make it runnable and remove it from the waitqueue. (2) If the vcpu is running on a different physical CPU from the one doing the kick, it sends a reschedule IPI. The second case cannot happen, because the soft timer is only ever scheduled when the vcpu is not running. The first case is only relevant when the vcpu thread is on a waitqueue, which is only the case when the vcpu thread has called kvm_vcpu_block(). Therefore, we only need to make sure a timer is scheduled for kvm_vcpu_block(), which we do by encapsulating all calls to kvm_vcpu_block() with kvm_timer_{un}schedule calls. Additionally, we only schedule a soft timer if the timer is enabled and unmasked, since it is useless otherwise. Note that theoretically userspace can use the SET_ONE_REG interface to change registers that should cause the timer to fire, even if the vcpu is blocked without a scheduled timer, but this case was not supported before this patch and we leave it for future work for now. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-22 23:01:42 +02:00
Christoffer Dall	3217f7c25b	KVM: Add kvm_arch_vcpu_{un}blocking callbacks Some times it is useful for architecture implementations of KVM to know when the VCPU thread is about to block or when it comes back from blocking (arm/arm64 needs to know this to properly implement timers, for example). Therefore provide a generic architecture callback function in line with what we do elsewhere for KVM generic-arch interactions. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-22 23:01:41 +02:00
Christoffer Dall	0d997491f8	arm/arm64: KVM: Fix disabled distributor operation We currently do a single update of the vgic state when the distributor enable/disable control register is accessed and then bypass updating the state for as long as the distributor remains disabled. This is incorrect, because updating the state does not consider the distributor enable bit, and this you can end up in a situation where an interrupt is marked as pending on the CPU interface, but not pending on the distributor, which is an impossible state to be in, and triggers a warning. Consider for example the following sequence of events: 1. An interrupt is marked as pending on the distributor - the interrupt is also forwarded to the CPU interface 2. The guest turns off the distributor (it's about to do a reboot) - we stop updating the CPU interface state from now on 3. The guest disables the pending interrupt - we remove the pending state from the distributor, but don't touch the CPU interface, see point 2. Since the distributor disable bit really means that no interrupts should be forwarded to the CPU interface, we modify the code to keep updating the internal VGIC state, but always set the CPU interface pending bits to zero when the distributor is disabled. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-20 18:09:13 +02:00
Christoffer Dall	544c572e03	arm/arm64: KVM: Clear map->active on pend/active clear When a guest reboots or offlines/onlines CPUs, it is not uncommon for it to clear the pending and active states of an interrupt through the emulated VGIC distributor. However, since the architected timers are defined by the architecture to be level triggered and the guest rightfully expects them to be that, but we emulate them as edge-triggered, we have to mimic level-triggered behavior for an edge-triggered virtual implementation. We currently do not signal the VGIC when the map->active field is true, because it indicates that the guest has already been signalled of the interrupt as required. Normally this field is set to false when the guest deactivates the virtual interrupt through the sync path. We also need to catch the case where the guest deactivates the interrupt through the emulated distributor, again allowing guests to boot even if the original virtual timer signal hit before the guest's GIC initialization sequence is run. Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-20 18:06:34 +02:00
Christoffer Dall	cff9211eb1	arm/arm64: KVM: Fix arch timer behavior for disabled interrupts We have an interesting issue when the guest disables the timer interrupt on the VGIC, which happens when turning VCPUs off using PSCI, for example. The problem is that because the guest disables the virtual interrupt at the VGIC level, we never inject interrupts to the guest and therefore never mark the interrupt as active on the physical distributor. The host also never takes the timer interrupt (we only use the timer device to trigger a guest exit and everything else is done in software), so the interrupt does not become active through normal means. The result is that we keep entering the guest with a programmed timer that will always fire as soon as we context switch the hardware timer state and run the guest, preventing forward progress for the VCPU. Since the active state on the physical distributor is really part of the timer logic, it is the job of our virtual arch timer driver to manage this state. The timer->map->active boolean field indicates whether we have signalled this interrupt to the vgic and if that interrupt is still pending or active. As long as that is the case, the hardware doesn't have to generate physical interrupts and therefore we mark the interrupt as active on the physical distributor. We also have to restore the pending state of an interrupt that was queued to an LR but was retired from the LR for some reason, while remaining pending in the LR. Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-20 18:04:54 +02:00
Pavel Fedin	437f9963bc	KVM: arm/arm64: Do not inject spurious interrupts When lowering a level-triggered line from userspace, we forgot to lower the pending bit on the emulated CPU interface and we also did not re-compute the pending_on_cpu bitmap for the CPU affected by the change. Update vgic_update_irq_pending() to fix the two issues above and also raise a warning in vgic_quue_irq_to_lr if we encounter an interrupt pending on a CPU which is neither marked active nor pending. [ Commit text reworked completely - Christoffer ] Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-10-20 18:04:43 +02:00
Andrey Smetanin	f33143d809	kvm/irqchip: allow only multiple irqchip routes per GSI Any other irq routing types (MSI, S390_ADAPTER, upcoming Hyper-V SynIC) map one-to-one to GSI. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:30 +02:00
Andrey Smetanin	c9a5eccac1	kvm/eventfd: add arch-specific set_irq Allow for arch-specific interrupt types to be set. For that, add kvm_arch_set_irq() which takes interrupt type-specific action if it recognizes the interrupt type given, and -EWOULDBLOCK otherwise. The default implementation always returns -EWOULDBLOCK. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:29 +02:00
Andrey Smetanin	ba1aefcd6d	kvm/eventfd: factor out kvm_notify_acked_gsi() Factor out kvm_notify_acked_gsi() helper to iterate over EOI listeners and notify those matching the given gsi. It will be reused in the upcoming Hyper-V SynIC implementation. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:29 +02:00
Andrey Smetanin	351dc6477c	kvm/eventfd: avoid loop inside irqfd_update() The loop(for) inside irqfd_update() is unnecessary because any other value for irq_entry.type will just trigger schedule_work(&irqfd->inject) in irqfd_wakeup. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-16 10:34:28 +02:00
Kosuke Tatsukawa	6003a42010	kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c async_pf_execute() seems to be missing a memory barrier which might cause the waker to not notice the waiter and miss sending a wake_up as in the following figure. async_pf_execute kvm_vcpu_block ------------------------------------------------------------------------ spin_lock(&vcpu->async_pf.lock); if (waitqueue_active(&vcpu->wq)) /* The CPU might reorder the test for the waitqueue up here, before prior writes complete / prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); /if (kvm_vcpu_check_block(vcpu) < 0) / /if (kvm_arch_vcpu_runnable(vcpu)) { / ... return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE && !vcpu->arch.apf.halted) \|\| !list_empty_careful(&vcpu->async_pf.done) ... return 0; list_add_tail(&apf->link, &vcpu->async_pf.done); spin_unlock(&vcpu->async_pf.lock); waited = true; schedule(); ------------------------------------------------------------------------ The attached patch adds the missing memory barrier. I found this issue when I was looking through the linux source code for places calling waitqueue_active() before wake_up(), but without preceding memory barriers, after sending a patch to fix a similar issue in drivers/tty/n_tty.c (Details about the original issue can be found here: https://lkml.org/lkml/2015/9/28/849). Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-14 16:41:08 +02:00
Jean-Philippe Brucker	4f64cb65bf	arm/arm64: KVM: Only allow 64bit hosts to build VGICv3 Hardware virtualisation of GICv3 is only supported by 64bit hosts for the moment. Some VGICv3 bits are missing from the 32bit side, and this patch allows to still be able to build 32bit hosts when CONFIG_ARM_GIC_V3 is selected. To this end, we introduce a new option, CONFIG_KVM_ARM_VGIC_V3, that is only enabled on the 64bit side. The selection is done unconditionally because CONFIG_ARM_GIC_V3 is always enabled on arm64. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-10-09 23:11:57 +01:00
Feng Wu	bf9f6ac8d7	KVM: Update Posted-Interrupts Descriptor when vCPU is blocked This patch updates the Posted-Interrupts Descriptor when vCPU is blocked. pre-block: - Add the vCPU to the blocked per-CPU list - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR post-block: - Remove the vCPU from the per-CPU list Signed-off-by: Feng Wu <feng.wu@intel.com> [Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:53 +02:00
Feng Wu	f70c20aaf1	KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' This patch adds an arch specific hooks 'arch_update' in 'struct kvm_kernel_irqfd'. On Intel side, it is used to update the IRTE when VT-d posted-interrupts is used. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:47 +02:00
Eric Auger	9016cfb577	KVM: eventfd: add irq bypass consumer management This patch adds the registration/unregistration of an irq_bypass_consumer on irqfd assignment/deassignment. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:46 +02:00
Eric Auger	1a02b27035	KVM: introduce kvm_arch functions for IRQ bypass This patch introduces - kvm_arch_irq_bypass_add_producer - kvm_arch_irq_bypass_del_producer - kvm_arch_irq_bypass_stop - kvm_arch_irq_bypass_start They make possible to specialize the KVM IRQ bypass consumer in case CONFIG_KVM_HAVE_IRQ_BYPASS is set. Signed-off-by: Eric Auger <eric.auger@linaro.org> [Add weak implementations of the callbacks. - Feng] Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:45 +02:00
Eric Auger	166c9775f1	KVM: create kvm_irqfd.h Move _irqfd_resampler and _irqfd struct declarations in a new public header: kvm_irqfd.h. They are respectively renamed into kvm_kernel_irqfd_resampler and kvm_kernel_irqfd. Those datatypes will be used by architecture specific code, in the context of IRQ bypass manager integration. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:44 +02:00
Feng Wu	37d9fe4783	virt: Add virt directory to the top Makefile We need to build files in virt/lib/, which are now used by KVM and VFIO, so add virt directory to the top Makefile. Signed-off-by: Feng Wu <feng.wu@intel.com> Acked-by: Michal Marek <mmarek@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:44 +02:00
Alex Williamson	f73f817312	virt: IRQ bypass manager When a physical I/O device is assigned to a virtual machine through facilities like VFIO and KVM, the interrupt for the device generally bounces through the host system before being injected into the VM. However, hardware technologies exist that often allow the host to be bypassed for some of these scenarios. Intel Posted Interrupts allow the specified physical edge interrupts to be directly injected into a guest when delivered to a physical processor while the vCPU is running. ARM IRQ Forwarding allows forwarded physical interrupts to be directly deactivated by the guest. The IRQ bypass manager here is meant to provide the shim to connect interrupt producers, generally the host physical device driver, with interrupt consumers, generally the hypervisor, in order to configure these bypass mechanism. To do this, we base the connection on a shared, opaque token. For KVM-VFIO this is expected to be an eventfd_ctx since this is the connection we already use to connect an eventfd to an irqfd on the in-kernel path. When a producer and consumer with matching tokens is found, callbacks via both registered participants allow the bypass facilities to be automatically enabled. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Tested-by: Eric Auger <eric.auger@linaro.org> Tested-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:43 +02:00
Jason Wang	e9ea5069d9	kvm: add capability for any-length ioeventfds Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:31 +02:00
Jason Wang	d3febddde9	kvm: use kmalloc() instead of kzalloc() during iodev register/unregister All fields of kvm_io_range were initialized or copied explicitly afterwards. So switch to use kmalloc(). Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:29 +02:00
Steve Rutherford	b053b2aef2	KVM: x86: Add EOI exit bitmap inference In order to support a userspace IOAPIC interacting with an in kernel APIC, the EOI exit bitmaps need to be configurable. If the IOAPIC is in userspace (i.e. the irqchip has been split), the EOI exit bitmaps will be set whenever the GSI Routes are configured. In particular, for the low MSI routes are reservable for userspace IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the destination vector of the route will be set for the destination VCPU. The intention is for the userspace IOAPICs to use the reservable MSI routes to inject interrupts into the guest. This is a slight abuse of the notion of an MSI Route, given that MSIs classically bypass the IOAPIC. It might be worthwhile to add an additional route type to improve clarity. Compile tested for Intel x86. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-01 15:06:28 +02:00
David Hildenbrand	920552b213	KVM: disable halt_poll_ns as default for s390x We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-25 10:31:30 +02:00
Paolo Bonzini	efe4d36a75	Second set of KVM/ARM changes for 4.3-rc2 - Workaround for a Cortex-A57 erratum - Bug fix for the debugging infrastructure - Fix for 32bit guests with more than 4GB of address space on a 32bit host - A number of fixes for the (unusual) case when we don't use the in-kernel GIC emulation - Removal of ThumbEE handling on arm64, since these have been dropped from the architecture before anyone actually ever built a CPU - Remove the KVM_ARM_MAX_VCPUS limitation which has become fairly pointless -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV+slQAAoJECPQ0LrRPXpD+WUQAMLC3ZUasJX1gsVixd++zAwB FXu0TFlKCUsLWllXZtyhGI6ya7ljuCzfhRbA/eZFmFVbwDnULt1p5ahw7eHCIZ2a yY93TS6XN3YHwVpY7f2lDsvLhBLyWeTdWhj5TtLy6mslQyEUqxdmsiC7gl40Fp2S 8tKIxoYYRpmbgKl/Lbi8GxdHH6c0aQ2Nt7Fq4nV9dJqy5tiGdg6OxqgU/rVmkdkv Rv1jrdtncstNRi9NBbKRRDp5DTqWboF35HJQpdIRpR8jJTLuuzzCimP5Hz9crKuO uXchIq2GtQB60NklZtPL15zMdmfdq+JHwdC14v05kB5Ai8NThGwKYQ3JF+krO3cG RKsAlrIq0AwPN8hAboLcKGzjLFFryaHZsa+d7elxaaDQz1FGz4uP56fIUURoGZuX vWTsKLRKcuPCYtnV6Frg2BCTB6nq1cRgjmMC9TABnraelZ3z0lDl4wFngg4aL2u6 QYOdP8L++/S1HAPOF7VhFYndXkbM3KoVLAepev8jvzRnwg4QVrqsvfgwFSdMNcMz ga7bJ4pUEP+Qq1i0qc41P9O708bCGm7TIw3CzTdKIZhc/l0t137lw1rhv67JfXZh cAni4osjhpdZUT0F9lIl/6OQB3Kgk6on3cs909Y/tT1srh9s+iVO1AwpGY1j5T4j gFRy90o2LBuepoI/8yF3 =NNtz -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.3-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master Second set of KVM/ARM changes for 4.3-rc2 - Workaround for a Cortex-A57 erratum - Bug fix for the debugging infrastructure - Fix for 32bit guests with more than 4GB of address space on a 32bit host - A number of fixes for the (unusual) case when we don't use the in-kernel GIC emulation - Removal of ThumbEE handling on arm64, since these have been dropped from the architecture before anyone actually ever built a CPU - Remove the KVM_ARM_MAX_VCPUS limitation which has become fairly pointless	2015-09-17 16:51:59 +02:00
Ming Lei	ef748917b5	arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' This patch removes config option of KVM_ARM_MAX_VCPUS, and like other ARCHs, just choose the maximum allowed value from hardware, and follows the reasons: 1) from distribution view, the option has to be defined as the max allowed value because it need to meet all kinds of virtulization applications and need to support most of SoCs; 2) using a bigger value doesn't introduce extra memory consumption, and the help text in Kconfig isn't accurate because kvm_vpu structure isn't allocated until request of creating VCPU is sent from QEMU; 3) the main effect is that the field of vcpus[] in 'struct kvm' becomes a bit bigger(sizeof(void *) per vcpu) and need more cache lines to hold the structure, but 'struct kvm' is one generic struct, and it has worked well on other ARCHs already in this way. Also, the world switch frequecy is often low, for example, it is ~2000 when running kernel building load in VM from APM xgene KVM host, so the effect is very small, and the difference can't be observed in my test at all. Cc: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-09-17 13:13:27 +01:00
Paolo Bonzini	62bea5bff4	KVM: add halt_attempted_poll to VCPU stats This new statistic can help diagnosing VCPUs that, for any reason, trigger bad behavior of halt_poll_ns autotuning. For example, say halt_poll_ns = 480000, and wakeups are spaced exactly like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes 10+20+40+80+160+320+480 = 1110 microseconds out of every 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then is consuming about 30% more CPU than it would use without polling. This would show as an abnormally high number of attempted polling compared to the successful polls. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com< Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-16 12:17:00 +02:00
Jason Wang	8f4216c7d2	kvm: fix zero length mmio searching Currently, if we had a zero length mmio eventfd assigned on KVM_MMIO_BUS. It will never be found by kvm_io_bus_cmp() since it always compares the kvm_io_range() with the length that guest wrote. This will cause e.g for vhost, kick will be trapped by qemu userspace instead of vhost. Fixing this by using zero length if an iodevice is zero length. Cc: stable@vger.kernel.org Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-15 16:59:46 +02:00
Jason Wang	eefd6b06b1	kvm: fix double free for fast mmio eventfd We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS and once on KVM_FAST_MMIO_BUS but with a single iodev instance. This will lead to an issue: kvm_io_bus_destroy() knows nothing about the devices on two buses pointing to a single dev. Which will lead to double free[1] during exit. Fix this by allocating two instances of iodevs then registering one on KVM_MMIO_BUS and another on KVM_FAST_MMIO_BUS. CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013 task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000 RIP: 0010:[<ffffffffc07e25d8>] [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm] RSP: 0018:ffff88020e7f3bc8 EFLAGS: 00010292 RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900 RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000 R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880 FS: 00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0 Stack: ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622 ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80 0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8 Call Trace: [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm] [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm] [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm] [<ffffffff811f69f7>] __fput+0xe7/0x250 [<ffffffff811f6bae>] ____fput+0xe/0x10 [<ffffffff81093f04>] task_work_run+0xd4/0xf0 [<ffffffff81079358>] do_exit+0x368/0xa50 [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60 [<ffffffff81079ad5>] do_group_exit+0x45/0xb0 [<ffffffff81085c71>] get_signal+0x291/0x750 [<ffffffff810144d8>] do_signal+0x28/0xab0 [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0 [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20 [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170 [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0 [<ffffffff817cb9af>] int_signal+0x12/0x17 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00 RIP [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm] RSP <ffff88020e7f3bc8> Cc: stable@vger.kernel.org Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-15 16:59:31 +02:00
Jason Wang	85da11ca58	kvm: factor out core eventfd assign/deassign logic This patch factors out core eventfd assign/deassign logic and leaves the argument checking and bus index selection to callers. Cc: stable@vger.kernel.org Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-15 16:58:47 +02:00
Jason Wang	8453fecbec	kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd We only want zero length mmio eventfd to be registered on KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to make sure this. Cc: stable@vger.kernel.org Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-15 16:58:27 +02:00
Wei Yang	81523aac01	KVM: make the declaration of functions within 80 characters After 'commit `0b8ba4a2b6` ("KVM: fix checkpatch.pl errors in kvm/coalesced_mmio.h")', the declaration of the two function will exceed 80 characters. This patch reduces the TAPs to make each line in 80 characters. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-14 18:43:19 +02:00
Paolo Bonzini	51256484c0	KVM/ARM changes for 4.3-rc2 - Fix timer interrupt injection after the rework that went in during the merge window - Reset the timer to zero on reboot - Make sure the TCR_EL2 RES1 bits are really set to 1 - Fix a PSCI affinity bug for non-existing vcpus -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV9VQYAAoJECPQ0LrRPXpDYFsQAIp+nIxv6LijQdq310EF7Z95 1j16vx9NfsAsNH0pwiRmx4gfNOHlPp6f30Kb1HJf2TN08Rc7TS8j4Dr3zYnZEdQj 9eNGTjCjlz93VQwKnTBvBvYkHsCEWC76pOxiYoFmnLsGc4m/OilHhohOzOZTAMFC Le07VXJwrhGHQ5bjdIMmFj+5rMj4eWfT2bYg8uVl5EUNFNkDgAMtT/Nbml90friC x+r2I9H+G8hHmemOv6hefW7l2JScLSYpqLeGdxZUtnGy9LNP3+jTD1QyqUftUPuS HtcB4zCtRk62nMupgFNV514Kf26wcwpS53vhd8aM2kxSkpr5xkUFoL+uJNT8ssIH Y+FwAOeTD0osWbmxw8Gf9d3qYzPBQSCRP3E4mCbNNNsTlMRU+Hu9au0VNBKggkfF fZ4UphCVX2C+cWaQcB40950h1F76Vx3boAfcMgiXoO+8LJx9OogDtr8+j5cBBbKU +tMngLhelsMtSCTam9c0jS6BsQfHDy3W1CQNl30cqd2RFjvygOSAqzdCMlBk0lLG 4Sq0eib1zJM+rE2OQs8SaAftyQ0kbElmM3XZB3ZiitaHgkUKV7kt9dMUG9Vfn2SE /ycgR9eoBq5P6FKhyROmL+0g824nYJmn4fwpecw1rHoRr8jyZjvpbE3VSL0aI5XW w4UYsAQSTcqM6bHpNYuu =G02h -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM changes for 4.3-rc2 - Fix timer interrupt injection after the rework that went in during the merge window - Reset the timer to zero on reboot - Make sure the TCR_EL2 RES1 bits are really set to 1 - Fix a PSCI affinity bug for non-existing vcpus	2015-09-14 17:07:35 +02:00
Wanpeng Li	edb9272f35	KVM: fix polling for guest halt continued even if disable it If there is already some polling ongoing, it's impossible to disable the polling, since as soon as somebody sets halt_poll_ns to 0, polling will never stop, as grow and shrink are only handled if halt_poll_ns is != 0. This patch fix it by reset vcpu->halt_poll_ns in order to stop polling when polling is disabled. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-14 17:07:04 +02:00
Linus Torvalds	33e247c7e5	Merge branch 'akpm' (patches from Andrew) Merge third patch-bomb from Andrew Morton: - even more of the rest of MM - lib/ updates - checkpatch updates - small changes to a few scruffy filesystems - kmod fixes/cleanups - kexec updates - a dma-mapping cleanup series from hch * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits) dma-mapping: consolidate dma_set_mask dma-mapping: consolidate dma_supported dma-mapping: cosolidate dma_mapping_error dma-mapping: consolidate dma_{alloc,free}_noncoherent dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() mm: make sure all file VMAs have ->vm_ops set mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() mm: mark most vm_operations_struct const namei: fix warning while make xmldocs caused by namei.c ipc: convert invalid scenarios to use WARN_ON zlib_deflate/deftree: remove bi_reverse() lib/decompress_unlzma: Do a NULL check for pointer lib/decompressors: use real out buf size for gunzip with kernel fs/affs: make root lookup from blkdev logical size sysctl: fix int -> unsigned long assignments in INT_MIN case kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo kexec: align crash_notes allocation to make it be inside one physical page kexec: remove unnecessary test in kimage_alloc_crash_control_pages() kexec: split kexec_load syscall from kexec core code ...	2015-09-10 18:19:42 -07:00
Vladimir Davydov	1d7715c676	mmu-notifier: add clear_young callback In the scope of the idle memory tracking feature, which is introduced by the following patch, we need to clear the referenced/accessed bit not only in primary, but also in secondary ptes. The latter is required in order to estimate wss of KVM VMs. At the same time we want to avoid flushing tlb, because it is quite expensive and it won't really affect the final result. Currently, there is no function for clearing pte young bit that would meet our requirements, so this patch introduces one. To achieve that we have to add a new mmu-notifier callback, clear_young, since there is no method for testing-and-clearing a secondary pte w/o flushing tlb. The new method is not mandatory and currently only implemented by KVM. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-10 13:29:01 -07:00
Sudip Mukherjee	ba60c41ae3	kvm: irqchip: fix memory leak We were taking the exit path after checking ue->flags and return value of setup_routing_entry(), but 'e' was not freed incase of a failure. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-08 11:16:41 +02:00
Wanpeng Li	2cbd78244f	KVM: trace kvm_halt_poll_ns grow/shrink Tracepoint for dynamic halt_pool_ns, fired on every potential change. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-06 16:33:14 +02:00
Wanpeng Li	aca6ff29c4	KVM: dynamic halt-polling There is a downside of always-poll since poll is still happened for idle vCPUs which can waste cpu usage. This patchset add the ability to adjust halt_poll_ns dynamically, to grow halt_poll_ns when shot halt is detected, and to shrink halt_poll_ns when long halt is detected. There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. no-poll always-poll dynamic-poll ----------------------------------------------------------------------- Idle (nohz) vCPU %c0 0.15% 0.3% 0.2% Idle (250HZ) vCPU %c0 1.1% 4.6%~14% 1.2% TCP_RR latency 34us 27us 26.7us "Idle (X) vCPU %c0" is the percent of time the physical cpu spent in c0 over 60 seconds (each vCPU is pinned to a pCPU). (nohz) means the guest was tickless. (250HZ) means the guest was ticking at 250HZ. The big win is with ticking operating systems. Running the linux guest with nohz=off (and HZ=250), we save 3.4%~12.8% CPUs/second and get close to no-polling overhead levels by using the dynamic-poll. The savings should be even higher for higher frequency ticks. Suggested-by: David Matlack <dmatlack@google.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> [Simplify the patch. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-06 16:32:43 +02:00
Wanpeng Li	19020f8ab8	KVM: make halt_poll_ns per-vCPU Change halt_poll_ns into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-06 16:27:10 +02:00
Christoffer Dall	4ad9e16af3	arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0 Provide a better quality of implementation and be architecture compliant on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on reset of the timer. This change alone fixes the UEFI reset issue reported by Laszlo back in February. Cc: Laszlo Ersek <lersek@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Drew Jones <drjones@redhat.com> Cc: Wei Huang <wei@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-09-04 16:26:56 +01:00
Christoffer Dall	04bdfa8ab5	arm/arm64: KVM: vgic: Move active state handling to flush_hwstate We currently set the physical active state only when we inject a new pending virtual interrupt, but this is actually not correct, because we could have been preempted and run something else on the system that resets the active state to clear. This causes us to run the VM with the timer set to fire, but without setting the physical active state. The solution is to always check the LR configurations, and we if have a mapped interrupt in the LR in either the pending or active state (virtual), then set the physical active state. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-09-04 16:26:52 +01:00
Paolo Bonzini	e3dbc572fe	Patch queue for ppc - 2015-08-22 Highlights for KVM PPC this time around: - Book3S: A few bug fixes - Book3S: Allow micro-threading on POWER8 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJV2D6aAAoJECszeR4D/txggBMP/3nHD3UjEAFUhhA6VjfK2wNw IW2aXQ5+2T51l1K8iSGMyKpW2w4zG5Bv9LdBP2badhaVpgM4//nVf7kcEBrdhjYq ns7V3klzTuNY5RBbWZz3Zri0mgCkJVF1XlC3xBzGPSNKpZyrkORhlxfg5GXig8lj pvUcku7XgkCFabAIIZmf0pg9hpDHpH3k1G9yZxuA8pys951IPRoo1CgsYmWSbmzh jfA2CxBl10dHZOuk/ENyJveJgtthmBB4ezCbWXy+wcMzBKhMC5R93LUoiKXMLWpM HkziNGjHA1gFSxDtfUVgkcXfan3a5JmlC+u50dLCTetXOVL7m2beIiXwv3smfjLn AkpcChceEChxn0MxwKJjNvU+RVh3kmv8rklfPlBXHTtQ5ZSXxlcxYrmgL64stmrt e27dzvJd9J7KX6wEpNyuZINsmFyn3lM3IoxqmSsVCRd43fzhZt9QGcYEXMIe1+lb E7QncsYMuuWB/sfSieyPaXtmK5ym2+R220xlKezBZdzWdtisPrpCRyl7BdiqCj6O 1gROi6qEyj3m5Qw/eGbFKBF0d8oVXqo1wBJkbihMl55D+jMeZMk673aeGhno8au1 kH+Im+H5xU3oEzdqvC9y3c9kE2sRkzj43GjepIb86Y463fg6KQ5j2gbZUZolGsGH AnRSGcbbVer/q+9kymPw =t+9t -----END PGP SIGNATURE----- Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue Patch queue for ppc - 2015-08-22 Highlights for KVM PPC this time around: - Book3S: A few bug fixes - Book3S: Allow micro-threading on POWER8	2015-08-22 14:57:59 -07:00
Marc Zyngier	f120cd6533	KVM: arm/arm64: timer: Allow the timer to control the active state In order to remove the crude hack where we sneak the masked bit into the timer's control register, make use of the phys_irq_map API control the active state of the interrupt. This causes some limited changes to allow for potential error propagation. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-08-12 11:28:26 +01:00
Marc Zyngier	773299a570	KVM: arm/arm64: vgic: Prevent userspace injection of a mapped interrupt Virtual interrupts mapped to a HW interrupt should only be triggered from inside the kernel. Otherwise, you could end up confusing the kernel (and the GIC's) state machine. Rearrange the injection path so that kvm_vgic_inject_irq is used for non-mapped interrupts, and kvm_vgic_inject_mapped_irq is used for mapped interrupts. The latter should only be called from inside the kernel (timer, irqfd). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-08-12 11:28:26 +01:00
Marc Zyngier	6e84e0e067	KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active In order to control the active state of an interrupt, introduce a pair of accessors allowing the state to be set/queried. This only affects the logical state, and the HW state will only be applied at world-switch time. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-08-12 11:28:26 +01:00
Marc Zyngier	08fd6461e8	KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest To allow a HW interrupt to be injected into a guest, we lookup the guest virtual interrupt in the irq_phys_map list, and if we have a match, encode both interrupts in the LR. We also mark the interrupt as "active" at the host distributor level. On guest EOI on the virtual interrupt, the host interrupt will be deactivated. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-08-12 11:28:25 +01:00
Marc Zyngier	6c3d63c9a2	KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts In order to be able to feed physical interrupts to a guest, we need to be able to establish the virtual-physical mapping between the two worlds. The mappings are kept in a set of RCU lists, indexed by virtual interrupts. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-08-12 11:28:25 +01:00
Marc Zyngier	7a67b4b7e0	KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs We only set the irq_queued flag for level interrupts, meaning that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate for all interrupts. This will allow us to inject edge HW interrupts, for which the state ACTIVE+PENDING is not allowed. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-08-12 11:28:25 +01:00
Marc Zyngier	fb182cf845	KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR Now that struct vgic_lr supports the LR_HW bit and carries a hwirq field, we can encode that information into the list registers. This patch provides implementations for both GICv2 and GICv3. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-08-12 11:28:24 +01:00
Paolo Bonzini	dd489240a2	KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-30 16:02:54 +02:00
Paolo Bonzini	d71ba78834	KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 This is another remnant of ia64 support. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-29 14:27:21 +02:00
Paolo Bonzini	5544eb9b81	KVM: count number of assigned devices If there are no assigned devices, the guest PAT are not providing any useful information and can be overridden to writeback; VMX always does this because it has the "IPAT" bit in its extended page table entries, but SVM does not have anything similar. Hook into VFIO and legacy device assignment so that they provide this information to KVM. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-10 13:25:26 +02:00
Peter Zijlstra	2ecd9d29ab	sched, preempt_notifier: separate notifier registration from static_key inc/dec Commit `1cde2930e1` ("sched/preempt: Add static_key() to preempt_notifiers") had two problems. First, the preempt-notifier API needs to sleep with the addition of the static_key, we do however need to hold off preemption while modifying the preempt notifier list, otherwise a preemption could observe an inconsistent list state. KVM correctly registers and unregisters preempt notifiers with preemption disabled, so the sleep caused dmesg splats. Second, KVM registers and unregisters preemption notifiers very often (in vcpu_load/vcpu_put). With a single uniprocessor guest the static key would move between 0 and 1 continuously, hitting the slow path on every userspace exit. To fix this, wrap the static_key inc/dec in a new API, and call it from KVM. Fixes: `1cde2930e1` ("sched/preempt: Add static_key() to preempt_notifiers") Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com> Reported-by: Takashi Iwai <tiwai@suse.de> Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-03 18:55:00 +02:00
Linus Torvalds	e3d8238d7f	arm64 updates for 4.2, mostly refactoring/clean-up: - CPU ops and PSCI (Power State Coordination Interface) refactoring following the merging of the arm64 ACPI support, together with handling of Trusted (secure) OS instances - Using fixmap for permanent FDT mapping, removing the initial dtb placement requirements (within 512MB from the start of the kernel image). This required moving the FDT self reservation out of the memreserve processing - Idmap (1:1 mapping used for MMU on/off) handling clean-up - Removing flush_cache_all() - not safe on ARM unless the MMU is off. Last stages of CPU power down/up are handled by firmware already - "Alternatives" (run-time code patching) refactoring and support for immediate branch patching, GICv3 CPU interface access - User faults handling clean-up And some fixes: - Fix for VDSO building with broken ELF toolchains - Fixing another case of init_mm.pgd usage for user mappings (during ASID roll-over broadcasting) - Fix for FPSIMD reloading after CPU hotplug - Fix for missing syscall trace exit - Workaround for .inst asm bug - Compat fix for switching the user tls tpidr_el0 register -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVitZgAAoJEGvWsS0AyF7x+ToP/0Yci5bNsYVwVay8N1rK6WHh SGzDMzyxcSBjQpz2IhhTJ28eTAEH+a+HWQms+0tBehjqxqkvjuzBN0okDkc/z8NB 7Z0BV2aLkQcMwTbjgIh5jm25ZpGmvmvbWPD5oBwgmgQ4v4i1OLRKgx7+YQ+z9rWZ zC70d0UwyWjs2oxmjd2ZrAkps7v/qozEFhcRHxLzCn8Mbw+3FcTQsqMbfnoWGnH0 YuGfHQQqBY4/HC7uAslMCy7tXeJXqb+NsgrnAovjfEbHGDjsg0KNl06K++LHwE37 A5Noa3M0AQEPYqx/sg0Ec8RNUUEMB4RA2DCaibp7XlVGncXOwFfiyk/M5uVrYXIO ku5QF0ytUfZKzrMq3yQKbEDuCPOFTqjjdVpkeXKFdW66zYTohKVc3vUBV5xHZ5uO 7Kr8H0ZnhAv3OxPyKdEwAuQ5sJdWwQSvZyGClxMUO4eC/UzD0Mwwf1Y8WYtiAXx+ NSTeBKw/m33W3/KhNuNH1+qGEOKhuXuKX7AcYA84Rab8ytxYWcurHCG2bmhzpEse 2DZtNMybrP/HMQPyJlYgGac8B3QbsAIAkkU1f+dJTAv9otuBDhscaDQyZ9Y6WVht /k8zJiZeMEuGAmwgTkzLmWs/8pQq42nW4J4eQdXPZAwp4ghCIypPWfaZASAwee6/ p+es3v8P4k9wkv2TFZMh =YeGl -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Mostly refactoring/clean-up: - CPU ops and PSCI (Power State Coordination Interface) refactoring following the merging of the arm64 ACPI support, together with handling of Trusted (secure) OS instances - Using fixmap for permanent FDT mapping, removing the initial dtb placement requirements (within 512MB from the start of the kernel image). This required moving the FDT self reservation out of the memreserve processing - Idmap (1:1 mapping used for MMU on/off) handling clean-up - Removing flush_cache_all() - not safe on ARM unless the MMU is off. Last stages of CPU power down/up are handled by firmware already - "Alternatives" (run-time code patching) refactoring and support for immediate branch patching, GICv3 CPU interface access - User faults handling clean-up And some fixes: - Fix for VDSO building with broken ELF toolchains - Fix another case of init_mm.pgd usage for user mappings (during ASID roll-over broadcasting) - Fix for FPSIMD reloading after CPU hotplug - Fix for missing syscall trace exit - Workaround for .inst asm bug - Compat fix for switching the user tls tpidr_el0 register" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits) arm64: use private ratelimit state along with show_unhandled_signals arm64: show unhandled SP/PC alignment faults arm64: vdso: work-around broken ELF toolchains in Makefile arm64: kernel: rename __cpu_suspend to keep it aligned with arm arm64: compat: print compat_sp instead of sp arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP arm64: entry: fix context tracking for el0_sp_pc arm64: defconfig: enable memtest arm64: mm: remove reference to tlb.S from comment block arm64: Do not attempt to use init_mm in reset_context() arm64: KVM: Switch vgic save/restore to alternative_insn arm64: alternative: Introduce feature for GICv3 CPU interface arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning arm64: fix bug for reloading FPSIMD state after CPU hotplug. arm64: kernel thread don't need to save fpsimd context. arm64: fix missing syscall trace exit arm64: alternative: Work around .inst assembler bugs arm64: alternative: Merge alternative-asm.h into alternative.h arm64: alternative: Allow immediate branch as alternative instruction arm64: Rework alternate sequence for ARM erratum 845719 ...	2015-06-24 10:02:15 -07:00
Kevin Mulvey	0b8ba4a2b6	KVM: fix checkpatch.pl errors in kvm/coalesced_mmio.h Tabs rather than spaces Signed-off-by: Kevin Mulvey <kmulvey@linux.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:26 +02:00
Kevin Mulvey	d626f3d5b3	KVM: fix checkpatch.pl errors in kvm/async_pf.h fix brace spacing Signed-off-by: Kevin Mulvey <kmulvey@linux.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:25 +02:00
Joerg Roedel	e73f61e41f	kvm: irqchip: Break up high order allocations of kvm_irq_routing_table The allocation size of the kvm_irq_routing_table depends on the number of irq routing entries because they are all allocated with one kzalloc call. When the irq routing table gets bigger this requires high order allocations which fail from time to time: qemu-kvm: page allocation failure: order:4, mode:0xd0 This patch fixes this issue by breaking up the allocation of the table and its entries into individual kzalloc calls. These could all be satisfied with order-0 allocations, which are less likely to fail. The downside of this change is the lower performance, because of more calls to kzalloc. But given how often kvm_set_irq_routing is called in the lifetime of a guest, it doesn't really matter much. Signed-off-by: Joerg Roedel <jroedel@suse.de> [Avoid sparse warning through rcu_access_pointer. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:25 +02:00
Paolo Bonzini	05fe125fa3	KVM/ARM changes for v4.2: - Proper guest time accounting - FP access fix for 32bit - The usual pile of GIC fixes - PSCI fixes - Random cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVg/c2AAoJECPQ0LrRPXpDjSoQALjxhdY0ROAz2yjVqcYD5Goj 8XHLbwDxlXQIqXulEakpx/2CtQs3tYsnt7gg3HoO3aY0TqxKY+0nh9RuZ/b15KcP hailUK/hbdLo9sDmoPBRBBP26gqjnXVMYTNqSFk4u2KADYUYgl5auhXcTNARH6Gx +c3LJG9HUbbvKkTG+RB+Xgl3f9N0uaIaV15oTX+WsolNIQVmA8Sh5X7nXPnFXV4r klb9pcR3FPWA9m+fDW8VszoLZ39a12kfQsK0yshKy5ep/AA1liPEYDVc6Wp/cEqz ybDfGuO9GPoPLfzhYq1Cw/SeKaAFruu6QD1ugsSlh1DVZ2evme+OsEvzRsfCqEYc VkKkX2Rc8DQTjsJoaTCOmNfU+v0dWwriCXmfLPBj01+55M4oCDcntTasrEsDAc3o YkTk68T9HJxsNesWpkZDo0pYzGeu3GDS1Hg/ElESminCNfK/S/GBTtidLq7Rs7KU VJCaUeob6CoQnRdmNEm+nHInh5pSoOlKzdduFM9IcvOO5B12YlNMcVnr48Y16ea7 ElYGkxR/n/uLt33A3Yvaj8+PeJbwxQ1RcWRDJy8Fw77AFLEaT2VTr2kZgJIqTatQ ENnCLMxNP2uAt1TrGhEempndYZdWK/RiuaA9MjnxSscFkrXPfn4bMJzV58ghF9L+ 62hk2S8I8q69SvqbPTgv =VQWP -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM changes for v4.2: - Proper guest time accounting - FP access fix for 32bit - The usual pile of GIC fixes - PSCI fixes - Random cleanups	2015-06-19 17:15:24 +02:00
Marc Zyngier	c62e631d4a	KVM: arm/arm64: vgic: Remove useless arm-gic.h #include Back in the days, vgic.c used to have an intimate knowledge of the actual GICv2. These days, this has been abstracted away into hardware-specific backends. Remove the now useless arm-gic.h #include directive, making it clear that GICv2 specific code doesn't belong here. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-06-18 15:50:31 +01:00
Marc Zyngier	4839ddc27b	KVM: arm/arm64: vgic: Avoid injecting reserved IRQ numbers Commit `fd1d0ddf2a` (KVM: arm/arm64: check IRQ number on userland injection) rightly limited the range of interrupts userspace can inject in a guest, but failed to consider the (unlikely) case where a guest is configured with 1024 interrupts. In this case, interrupts ranging from 1020 to 1023 are unuseable, as they have a special meaning for the GIC CPU interface. Make sure that these number cannot be used as an IRQ. Also delete a redundant (and similarily buggy) check in kvm_set_irq. Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: Andre Przywara <andre.przywara@arm.com> Cc: <stable@vger.kernel.org> # 4.1, 4.0, 3.19, 3.18 Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-06-17 15:18:59 +01:00
Marc Zyngier	f5a202db12	KVM: arm: vgic: Drop useless Group0 warning If a GICv3-enabled guest tries to configure Group0, we print a warning on the console (because we don't support Group0 interrupts). This is fairly pointless, and would allow a guest to spam the console. Let's just drop the warning. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-06-17 09:58:12 +01:00
Marc Zyngier	8a14849b4a	arm64: KVM: Switch vgic save/restore to alternative_insn So far, we configured the world-switch by having a small array of pointers to the save and restore functions, depending on the GIC used on the platform. Loading these values each time is a bit silly (they never change), and it makes sense to rely on the instruction patching instead. This leads to a nice cleanup of the code. Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-12 15:12:08 +01:00
Andre Przywara	c11b532910	KVM: arm64: add active register handling to GICv3 emulation as well Commit `47a98b15ba` ("arm/arm64: KVM: support for un-queuing active IRQs") introduced handling of the GICD_I[SC]ACTIVER registers, but only for the GICv2 emulation. For the sake of completeness and as this is a pre-requisite for save/restore of the GICv3 distributor state, we should also emulate their handling in the distributor and redistributor frames of an emulated GICv3. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-06-09 18:05:17 +01:00
Paolo Bonzini	f481b069e6	KVM: implement multiple address spaces Only two ioctls have to be modified; the address space id is placed in the higher 16 bits of their slot id argument. As of this patch, no architecture defines more than one address space; x86 will be the first. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:35 +02:00
Paolo Bonzini	8e73485c79	KVM: add vcpu-specific functions to read/write/translate GFNs We need to hide SMRAM from guests not running in SMM. Therefore, all uses of kvm_read_guest* and kvm_write_guest* must be changed to use different address spaces, depending on whether the VCPU is in system management mode. We need to introduce a new family of functions for this purpose. For now, the VCPU-based functions have the same behavior as the existing per-VM ones, they just accept a different type for the first argument. Later however they will be changed to use one of many "struct kvm_memslots" stored in struct kvm, through an architecture hook. VM-based functions will unconditionally use the first memslots pointer. Whenever possible, this patch introduces slot-based functions with an __ prefix, with two wrappers for generic and vcpu-based actions. The exceptions are kvm_read_guest and kvm_write_guest, which are copied into the new functions kvm_vcpu_read_guest and kvm_vcpu_write_guest. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:34 +02:00
Paolo Bonzini	bc009e4330	KVM: remove unused argument from mark_page_dirty_in_slot Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-28 10:43:37 +02:00
Paolo Bonzini	e37afc6ee5	KVM: remove __gfn_to_pfn Most of the function that wrap it can be rewritten without it, except for gfn_to_pfn_prot. Just inline it into gfn_to_pfn_prot, and rewrite the other function on top of gfn_to_pfn_memslot*. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-28 10:43:34 +02:00
Paolo Bonzini	d9ef13c2b3	KVM: pass kvm_memory_slot to gfn_to_page_many_atomic The memory slot is already available from gfn_to_memslot_dirty_bitmap. Isn't it a shame to look it up again? Plus, it makes gfn_to_page_many_atomic agnostic of multiple VCPU address spaces. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-28 10:43:33 +02:00
Paolo Bonzini	f36f3f2846	KVM: add "new" argument to kvm_arch_commit_memory_region This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space will be supported. Unfortunately, the "const"ness of the new argument must be casted away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-28 10:42:58 +02:00
Paolo Bonzini	15f46015ee	KVM: add memslots argument to kvm_arch_memslots_updated Prepare for the case of multiple address spaces. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-26 12:40:17 +02:00
Paolo Bonzini	09170a4942	KVM: const-ify uses of struct kvm_userspace_memory_region Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-26 12:40:13 +02:00
Paolo Bonzini	9f6b802978	KVM: use kvm_memslots whenever possible kvm_memslots provides lockdep checking. Use it consistently instead of explicit dereferencing of kvm->memslots. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-26 12:40:08 +02:00
Paolo Bonzini	a47d2b07ea	KVM: introduce kvm_alloc/free_memslots kvm_alloc_memslots is extracted out of previously scattered code that was in kvm_init_memslots_id and kvm_create_vm. kvm_free_memslot and kvm_free_memslots are new names of kvm_free_physmem and kvm_free_physmem_slot, but they also take an explicit pointer to struct kvm_memslots. This will simplify the transition to multiple address spaces, each represented by one pointer to struct kvm_memslots. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-26 12:39:53 +02:00
Paolo Bonzini	3520469d65	KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_async gfn_to_pfn_async is used in just one place, and because of x86-specific treatment that place will need to look at the memory slot. Hence inline it into try_async_pf and export __gfn_to_pfn_memslot. The patch also switches the subsequent call to gfn_to_pfn_prot to use __gfn_to_pfn_memslot. This is a small optimization. Finally, remove the now-unused async argument of __gfn_to_pfn. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:45 +02:00
Heiko Carstens	a4cca3b419	KVM: remove pointless cpu hotplug messages On cpu hotplug only KVM emits an unconditional message that its notifier has been called. It certainly can be assumed that calling cpu hotplug notifiers work, therefore there is no added value if KVM prints a message. If an error happens on cpu online KVM will still emit a warning. So let's remove this superfluous message. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-08 10:51:46 +02:00
Radim Krčmář	251eb84144	KVM: reuse memslot in kvm_write_guest_page Caching memslot value and using mark_page_dirty_in_slot() avoids another O(log N) search when dirtying the page. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1428695247-27603-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-08 10:51:38 +02:00
Paolo Bonzini	2fa462f826	KVM/ARM changes for v4.1, take #2 : Rather small this time: - a fix for a nasty bug with virtual IRQ injection - a fix for irqfd -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVN7ecAAoJECPQ0LrRPXpDmcEP/jTcP89kPm8r7xk0YggalpQT 4An+Nx5dnumO7ZJMVZM5eqBuXT3C2tvgKdz1VpEBr9J7F31QBEANCXd9P6smEhXI /0GuJ82tz+VC/XaVFUXsmtSW0Kkt7dIEJ29OCHdSjchxsM3C4X175IMhtVvKDGOc qVvHJllS4WBddNRTrWuk5su62QmEra6TgQMdXPabtcVcUYgQ3DLHnp/rvlYFsCfD MvZKCrlqAHNftzxW4fcimPdaToUWVNmKADm0ejujIxDRfNBH/6w7vmcFkNICAyqa Q/UgVKpVxa/baRr6jKyXh32S0ewQSki2OViIijs45tHXlSQMgUnOQ9ZMKkgR215q J5QsQFDJ5QjVth42CX3jAN6+iZFO1QX7xMBCutjg/If8rL+Cv5zKEGPzv4mOjt2M slSvgXo0Rv2c/o+7XOz15rCIW8Io4+MA2y1M6htZ8UCVAmQCGwkvlIo/9ZU2b8Hl NGnA+C8thXtrsMxjLfNWFQBGlBHKWuptTpB0qypws0C/KSXG3LM/R41q+WXqmJKy UZJWgC75ITeRle9PAHSoIzdDiH+hNhhUUyRRruLXoses98lIgUyuTvHNFhPHjIfC V/S6Ai8lBAMk/jBw/iMH1G+ubw2i2ZzFVw132NQrpucPchZvZ1DYMPA1tZEQT7Tl QkfD5j64y5qiQWbbqfZL =Cfx3 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM changes for v4.1, take #2: Rather small this time: - a fix for a nasty bug with virtual IRQ injection - a fix for irqfd	2015-04-22 17:08:12 +02:00
Andre Przywara	fd1d0ddf2a	KVM: arm/arm64: check IRQ number on userland injection When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently only check it against a fixed limit, which historically is set to 127. With the new dynamic IRQ allocation the effective limit may actually be smaller (64). So when now a malicious or buggy userland injects a SPI in that range, we spill over on our VGIC bitmaps and bytemaps memory. I could trigger a host kernel NULL pointer dereference with current mainline by injecting some bogus IRQ number from a hacked kvmtool: ----------------- .... DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1) DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1) DEBUG: IRQ #114 still in the game, writing to bytemap now... Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = ffffffc07652e000 [00000000] pgd=00000000f658b003, pud=00000000f658b003, *pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027 Hardware name: FVP Base (DT) task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000 PC is at kvm_vgic_inject_irq+0x234/0x310 LR is at kvm_vgic_inject_irq+0x30c/0x310 pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145 ..... So this patch fixes this by checking the SPI number against the actual limit. Also we remove the former legacy hard limit of 127 in the ioctl code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18 [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__, as suggested by Christopher Covington] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-04-22 15:42:24 +01:00
Eric Auger	0b3289ebc2	KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi irqfd/arm curently does not support routing. kvm_irq_map_gsi is supposed to return all the routing entries associated with the provided gsi and return the number of those entries. We should return 0 at this point. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-04-22 15:37:54 +01:00
Paul Mackerras	e23a808b16	KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vmnnnn, where nnnn is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory. The contents of the file consist of a series of lines like this: 3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196 The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third entry contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields are described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2015-04-21 15:21:31 +02:00
Linus Torvalds	9003601310	The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJVJ9vmAAoJEL/70l94x66DoMEH/R3rh8IMf4jTiWRkcqohOMPX k1+NaSY/lCKayaSgggJ2hcQenMbQoXEOdslvaA/H0oC+VfJGK+lmU6E63eMyyhjQ Y+Px6L85NENIzDzaVu/TIWWuhil5PvIRr3VO8cvntExRoCjuekTUmNdOgCvN2ObW wswN2qRdPIeEj2kkulbnye+9IV4G0Ne9bvsmUdOdfSSdi6ZcV43JcvrpOZT++mKj RrKB+3gTMZYGJXMMLBwMkdl8mK1ozriD+q0mbomT04LUyGlPwYLl4pVRDBqyksD7 KsSSybaK2E4i5R80WEljgDMkNqrCgNfg6VZe4n9Y+CfAAOToNnkMJaFEi+yuqbs= =yu2b -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...	2015-04-13 09:47:01 -07:00
Radim Krčmář	ca3f087472	KVM: use slowpath for cross page cached accesses kvm_write_guest_cached() does not mark all written pages as dirty and code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot with cross page accesses. Fix all the easy way. The check is '<= 1' to have the same result for 'len = 0' cache anywhere in the page. (nr_pages_needed is 0 on page boundary.) Fixes: `8f964525a1` ("KVM: Allow cross page reads and writes from cached translations.") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <20150408121648.GA3519@potion.brq.redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-10 16:04:45 +02:00
Paolo Bonzini	3180a7fcbc	KVM: remove kvm_read_hva and kvm_read_hva_atomic The corresponding write functions just use __copy_to_user. Do the same on the read side. This reverts what's left of commit `86ab8cffb4` (KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21) Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1427976500-28533-1-git-send-email-pbonzini@redhat.com>	2015-04-08 10:46:54 +02:00
Paolo Bonzini	7f22b45d66	Features and fixes for 4.1 (kvm/next) 1. Assorted changes 1.1 allow more feature bits for the guest 1.2 Store breaking event address on program interrupts 2. Interrupt handling rework 2.1 Fix copy_to_user while holding a spinlock (cc stable) 2.2 Rework floating interrupts to follow the priorities 2.3 Allow to inject all local interrupts via new ioctl 2.4 allow to get/set the full local irq state, e.g. for migration and introspection -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJVGvEUAAoJEBF7vIC1phx82tEP/3KrwsDRs+buiBqyv9k+qCFV v+R94gReBB5ggfbGfUYgBJMR2/4XQ+0jcZ55jfBCC4osOq6Juw/8HIj2nSgbQHmz F9Go0n8IqJ3DnqPTc0KYdFZ7kqDvMV5ME3XJrFiAHv1TUL9H/KpZArkcVIwD2NOo w01AVrCDY4bTajYqKShzGFymQl1K5vTGGvgxhh4kAHct4Nt5N5HFmyROm0RrsFZx Sycx4t177O7zhCN2tv5Zy8iWaEvzHAESoXkhZ2cJ6t+FXii2Eov5IgyyfYRXBfbm YACyvlFD087UdFGTt85ggPVS/S/5hn9xXmVHuIimHeyZU7CXCN5vYPcn+ZyksYr5 uA8+/2OPAgcaeDa2f7nCjl8jmcLR3hkQ0n/urA+pPYAZANJoFDfiGOr/kVk6aKff JTGSFUjNK891/IGEsdrSk2p64U5xMd8LFa3Il++kZT91gc2nrZOHNz5FGlXlkLdJ sADeNFWhoprEt/2P4aX6W2j26L8G874XkldDSjrS41U8L55+IiEm09r8oAWgfc5A pryeDaN4nSjFC+HOtlPkcVkAcsswiI6nHIm3+/XFetCq+v4pnVKFMHWsTeEjiQgQ H5aV9mfEKTJaCPrAJMsj8ZsKq0usG+BeRNqpIvxPAQB8fyl3jw9iu+RHeY1xWsTg BRHB/+CGYIxDu4XdRexv =Rrx5 -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-20150331' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Features and fixes for 4.1 (kvm/next) 1. Assorted changes 1.1 allow more feature bits for the guest 1.2 Store breaking event address on program interrupts 2. Interrupt handling rework 2.1 Fix copy_to_user while holding a spinlock (cc stable) 2.2 Rework floating interrupts to follow the priorities 2.3 Allow to inject all local interrupts via new ioctl 2.4 allow to get/set the full local irq state, e.g. for migration and introspection	2015-04-07 18:10:03 +02:00
Paolo Bonzini	bf0fb67cf9	KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVHQ0kAAoJECPQ0LrRPXpDHKQQALjw6STaZd7n20OFopNgHd4P qVeWYEKBxnsiSvL4p3IOSlZlEul+7x08aZqtyxWQRQcDT4ggTI+3FKKfc+8yeRpH WV6YJP0bGqz7039PyMLuIgs48xkSZtntePw69hPJfHZh4C1RBlP5T2SfE8mU8VZX fWToiU3W12QfKnmN7JFgxZopnGhrYCrG0EexdTDziAZu0GEMlDrO4wnyTR60WCvT 4TEF73R0kpAz4yplKuhcDHuxIG7VFhQ4z7b09M1JtR0gQ3wUvfbD3Wqqi49SwHkv NQOStcyLsIlDosSRcLXNCwb3IxjObXTBcAxnzgm2Aoc1xMMZX1ZPQNNs6zFZzycb 2c6QMiQ35zm7ellbvrG+bT+BP86JYWcAtHjWcaUFgqSJjb8MtqcMtsCea/DURhqx /kictqbPYBBwKW6SKbkNkisz59hPkuQnv35fuf992MRCbT9LAXLPRLbcirucCzkE p1MOotsWoO3ldJMZaVn0KYk3sQf6mCIfbYPEdOcw3fhJlvyy3NdjVkLOFbA5UUg1 rQ7Ru2rTemBc0ExVrymngNTMpMB4XcEeJzXfhcgMl3DWbDj60Ku/O26sDtZ6bsFv JuDYn8FVDHz9gpEQHgiUi1YMsBKXLhnILa1ppaa6AflykU3BRfYjAk1SXmX84nQK mJUJEdFuxi6pHN0UKxUI =avA4 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups	2015-04-07 18:09:20 +02:00
Paolo Bonzini	8999602d08	Fixes for KVM/ARM for 4.0-rc5. Fixes page refcounting issues in our Stage-2 page table management code, fixes a missing unlock in a gicv3 error path, and fixes a race that can cause lost interrupts if signals are pending just prior to entering the guest. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVBs95AAoJEEtpOizt6ddyAfoH/Rwj2T38ZDikImPpgfeFrmJs ZlWC+Z3akwjVHPv308/MsKdyashtA7OjiMp3DOheMFMYJay/ecgY/92vFCc6uh5S LDoXCbp+Pneth6C6bbU2Gw+aoCD07ZYCn9PeFq40MfpQUhCEGWhx41OFzHppqOZx e+jodHRE+sBVTFUtbz+HubAfcM46f/8bP7682CEKsVZPeTSiHyeojdZEglfB37MG ar/iC1/cyO/097vWaBqv1t1WZoHbWmMrDlzo5X+AtayVXFNdv4Ztw0Rz2kRhnLB8 8GXYawoSQoTN8FX1oyTyr5YWcWD7wDTzhcHsHS1xZHhvrdLCEcFrHeEWkuUlYjU= =YS6j -----END PGP SIGNATURE----- Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' Fixes for KVM/ARM for 4.0-rc5. Fixes page refcounting issues in our Stage-2 page table management code, fixes a missing unlock in a gicv3 error path, and fixes a race that can cause lost interrupts if signals are pending just prior to entering the guest.	2015-04-07 18:06:01 +02:00
Jens Freimann	47b43c52ee	KVM: s390: add ioctl to inject local interrupts We have introduced struct kvm_s390_irq a while ago which allows to inject all kinds of interrupts as defined in the Principles of Operation. Add ioctl to inject interrupts with the extended struct kvm_s390_irq Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>	2015-03-31 21:07:30 +02:00
Andre Przywara	950324ab81	KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus Currently we have struct kvm_exit_mmio for encapsulating MMIO abort data to be passed on from syndrome decoding all the way down to the VGIC register handlers. Now as we switch the MMIO handling to be routed through the KVM MMIO bus, it does not make sense anymore to use that structure already from the beginning. So we keep the data in local variables until we put them into the kvm_io_bus framework. Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private structure. On that way we replace the data buffer in that structure with a pointer pointing to a single location in a local variable, so we get rid of some copying on the way. With all of the virtual GIC emulation code now being registered with the kvm_io_bus, we can remove all of the old MMIO handling code and its dispatching functionality. I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something), because that touches a lot of code lines without any good reason. This is based on an original patch by Nikolay. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-30 17:07:19 +01:00
Andre Przywara	fb8f61abab	KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handling Using the framework provided by the recent vgic.c changes, we register a kvm_io_bus device on mapping the virtual GICv3 resources. The distributor mapping is pretty straight forward, but the redistributors need some more love, since they need to be tagged with the respective redistributor (read: VCPU) they are connected with. We use the kvm_io_bus framework to register one devices per VCPU. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-30 17:07:13 +01:00
Andre Przywara	0ba10d5392	KVM: arm/arm64: merge GICv3 RD_base and SGI_base register frames Currently we handle the redistributor registers in two separate MMIO regions, one for the overall behaviour and SPIs and one for the SGIs/PPIs. That latter forces the creation of _two_ KVM I/O bus devices for each redistributor. Since the spec mandates those two pages to be contigious, we could as well merge them and save the churn with the second KVM I/O bus device. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-30 17:07:08 +01:00
Andre Przywara	a9cf86f62b	KVM: arm/arm64: prepare GICv2 emulation to be handled by kvm_io_bus Using the framework provided by the recent vgic.c changes we register a kvm_io_bus device when initializing the virtual GICv2. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-26 21:43:15 +00:00
Andre Przywara	6777f77f0f	KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC Currently we use a lot of VGIC specific code to do the MMIO dispatching. Use the previous reworks to add kvm_io_bus style MMIO handlers. Those are not yet called by the MMIO abort handler, also the actual VGIC emulator function do not make use of it yet, but will be enabled with the following patches. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-26 21:43:14 +00:00
Andre Przywara	9f199d0a0e	KVM: arm/arm64: simplify vgic_find_range() and callers The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio argument, but actually only used the length field in there. Since we need to get rid of that structure in that part of the code anyway, let's rework the function (and it's callers) to pass the length argument to the function directly. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-26 21:43:14 +00:00
Andre Przywara	cf50a1eb43	KVM: arm/arm64: rename struct kvm_mmio_range to vgic_io_range The name "kvm_mmio_range" is a bit bold, given that it only covers the VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename it to vgic_io_range. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-26 21:43:14 +00:00
Andre Przywara	af669ac6dc	KVM: move iodev.h from virt/kvm/ to include/kvm iodev.h contains definitions for the kvm_io_bus framework. This is needed both by the generic KVM code in virt/kvm as well as by architecture specific code under arch/. Putting the header file in virt/kvm and using local includes in the architecture part seems at least dodgy to me, so let's move the file into include/kvm, so that a more natural "#include <kvm/iodev.h>" can be used by all of the code. This also solves a problem later when using struct kvm_io_device in arm_vgic.h. Fixing up the FSF address in the GPL header and a wrong include path on the way. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-26 21:43:12 +00:00
Nikolay Nikolaev	e32edf4fd0	KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks. This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-26 21:43:11 +00:00
Igor Mammedov	744961341d	kvm: avoid page allocation failure in kvm_set_memory_region() KVM guest can fail to startup with following trace on host: qemu-system-x86: page allocation failure: order:4, mode:0x40d0 Call Trace: dump_stack+0x47/0x67 warn_alloc_failed+0xee/0x150 __alloc_pages_direct_compact+0x14a/0x150 __alloc_pages_nodemask+0x776/0xb80 alloc_kmem_pages+0x3a/0x110 kmalloc_order+0x13/0x50 kmemdup+0x1b/0x40 __kvm_set_memory_region+0x24a/0x9f0 [kvm] kvm_set_ioapic+0x130/0x130 [kvm] kvm_set_memory_region+0x21/0x40 [kvm] kvm_vm_ioctl+0x43f/0x750 [kvm] Failure happens when attempting to allocate pages for 'struct kvm_memslots', however it doesn't have to be present in physically contiguous (kmalloc-ed) address space, change allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more then a page. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-23 21:23:44 -03:00
Takuya Yoshikawa	58d2930f4e	KVM: Eliminate extra function calls in kvm_get_dirty_log_protect() When all bits in mask are not set, kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do. But since it needs to be called from the generic code, it cannot be inlined, and a few function calls, two when PML is enabled, are wasted. Since it is common to see many pages remain clean, e.g. framebuffers can stay calm for a long time, it is worth eliminating this overhead. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-18 22:23:33 -03:00
Marcelo Tosatti	f710a12d73	Fixes for KVM/ARM for 4.0-rc5. Fixes page refcounting issues in our Stage-2 page table management code, fixes a missing unlock in a gicv3 error path, and fixes a race that can cause lost interrupts if signals are pending just prior to entering the guest. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVBs95AAoJEEtpOizt6ddyAfoH/Rwj2T38ZDikImPpgfeFrmJs ZlWC+Z3akwjVHPv308/MsKdyashtA7OjiMp3DOheMFMYJay/ecgY/92vFCc6uh5S LDoXCbp+Pneth6C6bbU2Gw+aoCD07ZYCn9PeFq40MfpQUhCEGWhx41OFzHppqOZx e+jodHRE+sBVTFUtbz+HubAfcM46f/8bP7682CEKsVZPeTSiHyeojdZEglfB37MG ar/iC1/cyO/097vWaBqv1t1WZoHbWmMrDlzo5X+AtayVXFNdv4Ztw0Rz2kRhnLB8 8GXYawoSQoTN8FX1oyTyr5YWcWD7wDTzhcHsHS1xZHhvrdLCEcFrHeEWkuUlYjU= =YS6j -----END PGP SIGNATURE----- Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm Fixes for KVM/ARM for 4.0-rc5. Fixes page refcounting issues in our Stage-2 page table management code, fixes a missing unlock in a gicv3 error path, and fixes a race that can cause lost interrupts if signals are pending just prior to entering the guest.	2015-03-16 20:08:56 -03:00
Christoffer Dall	1a74847885	arm/arm64: KVM: Fix migration race in the arch timer When a VCPU is no longer running, we currently check to see if it has a timer scheduled in the future, and if it does, we schedule a host hrtimer to notify is in case the timer expires while the VCPU is still not running. When the hrtimer fires, we mask the guest's timer and inject the timer IRQ (still relying on the guest unmasking the time when it receives the IRQ). This is all good and fine, but when migration a VM (checkpoint/restore) this introduces a race. It is unlikely, but possible, for the following sequence of events to happen: 1. Userspace stops the VM 2. Hrtimer for VCPU is scheduled 3. Userspace checkpoints the VGIC state (no pending timer interrupts) 4. The hrtimer fires, schedules work in a workqueue 5. Workqueue function runs, masks the timer and injects timer interrupt 6. Userspace checkpoints the timer state (timer masked) At restore time, you end up with a masked timer without any timer interrupts and your guest halts never receiving timer interrupts. Fix this by only kicking the VCPU in the workqueue function, and sample the expired state of the timer when entering the guest again and inject the interrupt and mask the timer only then. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-14 13:48:00 +01:00
Christoffer Dall	47a98b15ba	arm/arm64: KVM: support for un-queuing active IRQs Migrating active interrupts causes the active state to be lost completely. This implements some additional bitmaps to track the active state on the distributor and export this to user space. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-14 13:46:44 +01:00
Alex Bennée	71760950bf	arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn This helps re-factor away some of the repetitive code and makes the code flow more nicely. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-14 13:44:52 +01:00
Christoffer Dall	ae705930fc	arm/arm64: KVM: Keep elrsr/aisr in sync with software model There is an interesting bug in the vgic code, which manifests itself when the KVM run loop has a signal pending or needs a vmid generation rollover after having disabled interrupts but before actually switching to the guest. In this case, we flush the vgic as usual, but we sync back the vgic state and exit to userspace before entering the guest. The consequence is that we will be syncing the list registers back to the software model using the GICH_ELRSR and GICH_EISR from the last execution of the guest, potentially overwriting a list register containing an interrupt. This showed up during migration testing where we would capture a state where the VM has masked the arch timer but there were no interrupts, resulting in a hung test. Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Alex Bennee <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-14 13:42:07 +01:00
Wei Yongjun	b52104e509	arm/arm64: KVM: fix missing unlock on error in kvm_vgic_create() Add the missing unlock before return from function kvm_vgic_create() in the error handling case. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-13 11:40:57 +01:00
Eric Auger	174178fed3	KVM: arm/arm64: add irqfd support This patch enables irqfd on arm/arm64. Both irqfd and resamplefd are supported. Injection is implemented in vgic.c without routing. This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. Irqfd injection is restricted to SPI. The rationale behind not supporting PPI irqfd injection is that any device using a PPI would be a private-to-the-CPU device (timer for instance), so its state would have to be context-switched along with the VCPU and would require in-kernel wiring anyhow. It is not a relevant use case for irqfds. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-12 15:15:34 +01:00
Eric Auger	649cf73994	KVM: arm/arm64: remove coarse grain dist locking at kvm_vgic_sync_hwstate To prepare for irqfd addition, coarse grain locking is removed at kvm_vgic_sync_hwstate level and finer grain locking is introduced in vgic_process_maintenance only. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-12 15:15:33 +01:00
Eric Auger	01c94e64f5	KVM: introduce kvm_arch_intc_initialized and use it in irqfd Introduce __KVM_HAVE_ARCH_INTC_INITIALIZED define and associated kvm_arch_intc_initialized function. This latter allows to test whether the virtual interrupt controller is initialized and ready to accept virtual IRQ injection. On some architectures, the virtual interrupt controller is dynamically instantiated, justifying that kind of check. The new function can now be used by irqfd to check whether the virtual interrupt controller is ready on KVM_IRQFD request. If not, KVM_IRQFD returns -EAGAIN. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-12 15:15:32 +01:00
Mark Rutland	0f37247574	KVM: vgic: add virt-capable compatible strings Several dts only list "arm,cortex-a7-gic" or "arm,gic-400" in their GIC compatible list, and while this is correct (and supported by the GIC driver), KVM will fail to detect that it can support these cases. This patch adds the missing strings to the VGIC code. The of_device_id entries are padded to keep the probe function data aligned. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michal Simek <monstr@monstr.eu> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-03-11 15:27:48 +01:00
Paolo Bonzini	dc9be0fac7	kvm: move advertising of KVM_CAP_IRQFD to common code POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: `297e21053a` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 21:18:59 -03:00
Xiubo Li	1170adc6dd	KVM: Use pr_info/pr_err in kvm_main.c WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... + printk(KERN_INFO "kvm: exiting hardware virtualization\n"); WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... + printk(KERN_ERR "kvm: misc device register failed\n"); Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:45 -03:00
Xiubo Li	20e87b7224	KVM: Fix indentation in kvm_main.c ERROR: code indent should use tabs where possible + const struct kvm_io_range r2)$ WARNING: please, no spaces at the start of a line + const struct kvm_io_range r2)$ This patch fixes this ERROR & WARNING to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:44 -03:00
Xiubo Li	b7d409deb9	KVM: no space before tabs in kvm_main.c WARNING: please, no space before tabs + * ^I^Ikvm->lock --> kvm->slots_lock --> kvm->irq_lock$ WARNING: please, no space before tabs +^I^I * ^I- gfn_to_hva (kvm_read_guest, gfn_to_pfn)$ WARNING: please, no space before tabs +^I^I * ^I- kvm_is_visible_gfn (mmu_check_roots)$ This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:44 -03:00
Xiubo Li	f95ef0cd02	KVM: Missing blank line after declarations in kvm_main.c There are many Warnings like this: WARNING: Missing a blank line after declarations + struct kvm_coalesced_mmio_zone zone; + r = -EFAULT; This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:44 -03:00
Xiubo Li	ee543159d5	KVM: EXPORT_SYMBOL should immediately follow its function WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable +EXPORT_SYMBOL_GPL(gfn_to_page); This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:44 -03:00
Xiubo Li	f4fee93270	KVM: Fix ERROR: do not initialise statics to 0 or NULL in kvm_main.c ERROR: do not initialise statics to 0 or NULL +static int kvm_usage_count = 0; The kvm_usage_count will be placed to .bss segment when linking, so not need to set it to 0 here obviously. This patch fixes this ERROR to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:44 -03:00
Xiubo Li	a642a17567	KVM: Fix WARNING: labels should not be indented in kvm_main.c WARNING: labels should not be indented + out_free_irq_routing: This patch fixes this WARNING to reduce noise when checking new patches in kvm_main.c. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:43 -03:00
Xiubo Li	893bdbf165	KVM: Fix WARNINGs for 'sizeof(X)' instead of 'sizeof X' in kvm_main.c There are many WARNINGs like this: WARNING: sizeof tr should be sizeof(tr) + if (copy_from_user(&tr, argp, sizeof tr)) In kvm_main.c many places are using 'sizeof(X)', and the other places are using 'sizeof X', while the kernel recommands to use 'sizeof(X)', so this patch will replace all 'sizeof X' to 'sizeof(X)' to make them consistent and at the same time to reduce the WARNINGs noise when we are checking new patches. Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:43 -03:00
Thomas Huth	548ef28449	KVM: Get rid of kvm_kvfree() kvm_kvfree() provides exactly the same functionality as the new common kvfree() function - so let's simply replace the kvm function with the common function. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:43 -03:00
Christian Borntraeger	0fa9778895	KVM: make halt_poll_ns static halt_poll_ns is used only locally. Make it static. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:43 -03:00
Kevin Mulvey	ae548c5c80	KVM: fix checkpatch.pl errors in kvm/irqchip.c Fix whitespace around while Signed-off-by: Kevin Mulvey <kmulvey@linux.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:42 -03:00
Kevin Mulvey	bfda0e8491	KVM: white space formatting in kvm_main.c Better alignment of loop using tabs rather than spaces, this makes checkpatch.pl happier. Signed-off-by: Kevin Mulvey <kmulvey@linux.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:42 -03:00
Linus Torvalds	b9085bcbf5	Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: the highlights are support for GICv3 emulation and dirty page tracking s390: several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. ARM has other conflicts where functions are added in the same place by 3.19-rc and 3.20 patches. These are not large though, and entirely within KVM. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5 DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9 LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak= =7gdx -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...	2015-02-13 09:55:09 -08:00
Andrea Arcangeli	0664e57ff0	mm: gup: kvm use get_user_pages_unlocked Use the more generic get_user_pages_unlocked which has the additional benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault (which allows the first page fault in an unmapped area to be always able to block indefinitely by being allowed to release the mmap_sem). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Peter Feiner <pfeiner@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:05 -08:00
Christian Borntraeger	de8e5d7440	KVM: Disable compat ioctl for s390 We never had a 31bit QEMU/kuli running. We would need to review several ioctls to check if this creates holes, bugs or whatever to make it work. Lets just disable compat support for KVM on s390. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-09 12:44:14 +01:00
Paolo Bonzini	f781951299	kvm: add halt_poll_ns module parameter This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-06 13:08:37 +01:00
Kai Huang	3b0f1d01e5	KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty We don't have to write protect guest memory for dirty logging if architecture supports hardware dirty logging, such as PML on VMX, so rename it to be more generic. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:30:38 +01:00
Tiejun Chen	b0165f1b41	kvm: update_memslots: clean flags for invalid memslots Indeed, any invalid memslots should be new->npages = 0, new->base_gfn = 0 and new->flags = 0 at the same time. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-27 21:31:44 +01:00
Christoffer Dall	4b99058995	KVM: Remove unused config symbol The dirty patch logging series introduced both HAVE_KVM_ARCH_DIRTY_LOG_PROTECT and KVM_GENERIC_DIRTYLOG_READ_PROTECT config symbols, but only KVM_GENERIC_DIRTYLOG_READ_PROTECT is used. Just remove the unused one. (The config symbol was renamed during the development of the patch series and the old name just creeped in by accident.() Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-23 10:52:03 +01:00
Andre Przywara	4fa96afd94	arm/arm64: KVM: force alignment of VGIC dist/CPU/redist addresses Although the GIC architecture requires us to map the MMIO regions only at page aligned addresses, we currently do not enforce this from the kernel side. Restrict any vGICv2 regions to be 4K aligned and any GICv3 regions to be 64K aligned. Document this requirement. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:33 +01:00
Andre Przywara	ac3d373564	arm/arm64: KVM: allow userland to request a virtual GICv3 With all of the GICv3 code in place now we allow userland to ask the kernel for using a virtual GICv3 in the guest. Also we provide the necessary support for guests setting the memory addresses for the virtual distributor and redistributors. This requires some userland code to make use of that feature and explicitly ask for a virtual GICv3. Document that KVM_CREATE_IRQCHIP only works for GICv2, but is considered legacy and using KVM_CREATE_DEVICE is preferred. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:33 +01:00
Andre Przywara	b5d84ff600	arm/arm64: KVM: enable kernel side of GICv3 emulation With all the necessary GICv3 emulation code in place, we can now connect the code to the GICv3 backend in the kernel. The LR register handling is different depending on the emulated GIC model, so provide different implementations for each. Also allow non-v2-compatible GICv3 implementations (which don't provide MMIO regions for the virtual CPU interface in the DT), but restrict those hosts to support GICv3 guests only. If the device tree provides a GICv2 compatible GICV resource entry, but that one is faulty, just disable the GICv2 emulation and let the user use at least the GICv3 emulation for guests. To provide proper support for the legacy KVM_CREATE_IRQCHIP ioctl, note virtual GICv2 compatibility in struct vgic_params and use it on creating a VGICv2. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:32 +01:00
Andre Przywara	6d52f35af1	arm64: KVM: add SGI generation register emulation While the generation of a (virtual) inter-processor interrupt (SGI) on a GICv2 works by writing to a MMIO register, GICv3 uses the system register ICC_SGI1R_EL1 to trigger them. Add a trap handler function that calls the new SGI register handler in the GICv3 code. As ICC_SRE_EL1.SRE at this point is still always 0, this will not trap yet, but will only be used later when all the data structures have been initialized properly. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:32 +01:00
Andre Przywara	a0675c25d6	arm/arm64: KVM: add virtual GICv3 distributor emulation With everything separated and prepared, we implement a model of a GICv3 distributor and redistributors by using the existing framework to provide handler functions for each register group. Currently we limit the emulation to a model enforcing a single security state, with SRE==1 (forcing system register access) and ARE==1 (allowing more than 8 VCPUs). We share some of the functions provided for GICv2 emulation, but take the different ways of addressing (v)CPUs into account. Save and restore is currently not implemented. Similar to the split-off of the GICv2 specific code, the new emulation code goes into a new file (vgic-v3-emul.c). Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:31 +01:00
Andre Przywara	9fedf14677	arm/arm64: KVM: add opaque private pointer to MMIO data For a GICv2 there is always only one (v)CPU involved: the one that does the access. On a GICv3 the access to a CPU redistributor is memory-mapped, but not banked, so the (v)CPU affected is determined by looking at the MMIO address region being accessed. To allow passing the affected CPU into the accessors later, extend struct kvm_exit_mmio to add an opaque private pointer parameter. The current GICv2 emulation just does not use it. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:30 +01:00
Andre Przywara	1d916229e3	arm/arm64: KVM: split GICv2 specific emulation code from vgic.c vgic.c is currently a mixture of generic vGIC emulation code and functions specific to emulating a GICv2. To ease the addition of GICv3, split off strictly v2 specific parts into a new file vgic-v2-emul.c. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> ------- As the diff isn't always obvious here (and to aid eventual rebases), here is a list of high-level changes done to the code: * added new file to respective arm/arm64 Makefiles * moved GICv2 specific functions to vgic-v2-emul.c: - handle_mmio_misc() - handle_mmio_set_enable_reg() - handle_mmio_clear_enable_reg() - handle_mmio_set_pending_reg() - handle_mmio_clear_pending_reg() - handle_mmio_priority_reg() - vgic_get_target_reg() - vgic_set_target_reg() - handle_mmio_target_reg() - handle_mmio_cfg_reg() - handle_mmio_sgi_reg() - vgic_v2_unqueue_sgi() - read_set_clear_sgi_pend_reg() - write_set_clear_sgi_pend_reg() - handle_mmio_sgi_set() - handle_mmio_sgi_clear() - vgic_v2_handle_mmio() - vgic_get_sgi_sources() - vgic_dispatch_sgi() - vgic_v2_queue_sgi() - vgic_v2_map_resources() - vgic_v2_init() - vgic_v2_add_sgi_source() - vgic_v2_init_model() - vgic_v2_init_emulation() - handle_cpu_mmio_misc() - handle_mmio_abpr() - handle_cpu_mmio_ident() - vgic_attr_regs_access() - vgic_create() (renamed to vgic_v2_create()) - vgic_destroy() (renamed to vgic_v2_destroy()) - vgic_has_attr() (renamed to vgic_v2_has_attr()) - vgic_set_attr() (renamed to vgic_v2_set_attr()) - vgic_get_attr() (renamed to vgic_v2_get_attr()) - struct kvm_mmio_range vgic_dist_ranges[] - struct kvm_mmio_range vgic_cpu_ranges[] - struct kvm_device_ops kvm_arm_vgic_v2_ops {} Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:30 +01:00
Andre Przywara	832158125d	arm/arm64: KVM: add vgic.h header file vgic.c is currently a mixture of generic vGIC emulation code and functions specific to emulating a GICv2. To ease the addition of GICv3 later, we create new header file vgic.h, which holds constants and prototypes of commonly used functions. Rename some identifiers to avoid name space clutter. I removed the long-standing comment about using the kvm_io_bus API to tackle the GIC register ranges, as it wouldn't be a win for us anymore. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> ------- As the diff isn't always obvious here (and to aid eventual rebases), here is a list of high-level changes done to the code: * moved definitions and prototypes from vgic.c to vgic.h: - VGIC_ADDR_UNDEF - ACCESS_{READ,WRITE}_* - vgic_init() - vgic_update_state() - vgic_kick_vcpus() - vgic_get_vmcr() - vgic_set_vmcr() - struct mmio_range {} (renamed to struct kvm_mmio_range) * removed static keyword and exported prototype in vgic.h: - vgic_bitmap_get_reg() - vgic_bitmap_set_irq_val() - vgic_bitmap_get_shared_map() - vgic_bytemap_get_reg() - vgic_dist_irq_set_pending() - vgic_dist_irq_clear_pending() - vgic_cpu_irq_clear() - vgic_reg_access() - handle_mmio_raz_wi() - vgic_handle_enable_reg() - vgic_handle_set_pending_reg() - vgic_handle_clear_pending_reg() - vgic_handle_cfg_reg() - vgic_unqueue_irqs() - find_matching_range() (renamed to vgic_find_range) - vgic_handle_mmio_range() - vgic_update_state() - vgic_get_vmcr() - vgic_set_vmcr() - vgic_queue_irq() - vgic_kick_vcpus() - vgic_init() - vgic_v2_init_emulation() - vgic_has_attr_regs() - vgic_set_common_attr() - vgic_get_common_attr() - vgic_destroy() - vgic_create() * moved functions to vgic.h (static inline): - mmio_data_read() - mmio_data_write() - is_in_range() Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:30 +01:00
Andre Przywara	b60da146c1	arm/arm64: KVM: refactor/wrap vgic_set/get_attr() vgic_set_attr() and vgic_get_attr() contain both code specific for the emulated GIC as well as code for the userland facing, generic part of the GIC. Split the guest GIC facing code of from the generic part to allow easier splitting later. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:29 +01:00
Andre Przywara	d97f683d0f	arm/arm64: KVM: refactor MMIO accessors The MMIO accessors for GICD_I[CS]ENABLER, GICD_I[CS]PENDR and GICD_ICFGR behave very similar for GICv2 and GICv3, although the way the affected VCPU is determined differs. Since we need them to access the registers from three different places in the future, we factor out a generic, backend-facing implementation and use small wrappers in the current GICv2 emulation. This will ease adding GICv3 accessors later. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:29 +01:00
Andre Przywara	2f5fa41a7a	arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the GIC CPU interface for EL1 (guests). Currently we force it to 0, but for proper GICv3 support we have to allow guests to use it (depending on their selected virtual GIC model). So add ICC_SRE_EL1 to the list of saved/restored registers on a world switch, but actually disallow a guest to change it by only restoring a fixed, once-initialized value. This value depends on the GIC model userland has chosen for a guest. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:28 +01:00
Andre Przywara	3caa2d8c3b	arm/arm64: KVM: make the maximum number of vCPUs a per-VM value Currently the maximum number of vCPUs supported is a global value limited by the used GIC model. GICv3 will lift this limit, but we still need to observe it for guests using GICv2. So the maximum number of vCPUs is per-VM value, depending on the GIC model the guest uses. Store and check the value in struct kvm_arch, but keep it down to 8 for now. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:28 +01:00
Andre Przywara	4ce7ebdfc6	arm/arm64: KVM: dont rely on a valid GICH base address To check whether the vGIC was already initialized, we currently check the GICH base address for not being NULL. Since with GICv3 we may get along without this address, lets use the irqchip_in_kernel() function to detect an already initialized vGIC. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:27 +01:00
Andre Przywara	ea2f83a7de	arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing Currently we unconditionally register the GICv2 emulation device during the host's KVM initialization. Since with GICv3 support we may end up with only v2 or only v3 or both supported, we move the registration into the GIC probing function, where we will later know which combination is valid. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:27 +01:00
Andre Przywara	b26e5fdac4	arm/arm64: KVM: introduce per-VM ops Currently we only have one virtual GIC model supported, so all guests use the same emulation code. With the addition of another model we end up with different guests using potentially different vGIC models, so we have to split up some functions to be per VM. Introduce a vgic_vm_ops struct to hold function pointers for those functions that are different and provide the necessary code to initialize them. Also split up the vgic_init() function to separate out VGIC model specific functionality into a separate function, which will later be different for a GICv3 model. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:26 +01:00
Andre Przywara	05bc8aafe6	arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones Some GICv3 registers can and will be accessed as 64 bit registers. Currently the register handling code can only deal with 32 bit accesses, so we do two consecutive calls to cover this. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:26 +01:00
Andre Przywara	96415257a1	arm/arm64: KVM: refactor vgic_handle_mmio() function Currently we only need to deal with one MMIO region for the GIC emulation (the GICv2 distributor), but we soon need to extend this. Refactor the existing code to allow easier addition of different ranges without code duplication. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:25 +01:00
Andre Przywara	59892136c4	arm/arm64: KVM: pass down user space provided GIC type into vGIC code With the introduction of a second emulated GIC model we need to let userspace specify the GIC model to use for each VM. Pass the userspace provided value down into the vGIC code and store it there to differentiate later. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-20 18:25:25 +01:00
Mario Smarduch	ba0513b5b8	KVM: Add generic support for dirty page logging kvm_get_dirty_log() provides generic handling of dirty bitmap, currently reused by several architectures. Building on that we intrdoduce kvm_get_dirty_log_protect() adding write protection to mark these pages dirty for future write access, before next KVM_GET_DIRTY_LOG ioctl call from user space. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>	2015-01-16 14:40:14 +01:00
Mario Smarduch	a6d5101661	KVM: Add architecture-defined TLB flush support Allow architectures to override the generic kvm_flush_remote_tlbs() function via HAVE_KVM_ARCH_TLB_FLUSH_ALL. ARMv7 will need this to provide its own TLB flush interface. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>	2015-01-16 14:40:14 +01:00
Eric Auger	065c003482	KVM: arm/arm64: vgic: add init entry to VGIC KVM device Since the advent of VGIC dynamic initialization, this latter is initialized quite late on the first vcpu run or "on-demand", when injecting an IRQ or when the guest sets its registers. This initialization could be initiated explicitly much earlier by the users-space, as soon as it has provided the requested dimensioning parameters. This patch adds a new entry to the VGIC KVM device that allows the user to manually request the VGIC init: - a new KVM_DEV_ARM_VGIC_GRP_CTRL group is introduced. - Its first attribute is KVM_DEV_ARM_VGIC_CTRL_INIT The rationale behind introducing a group is to be able to add other controls later on, if needed. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-11 14:12:15 +01:00
Eric Auger	66b030e48a	KVM: arm/arm64: vgic: vgic_init returns -ENODEV when no online vcpu To be more explicit on vgic initialization failure, -ENODEV is returned by vgic_init when no online vcpus can be found at init. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2015-01-11 14:12:15 +01:00
Wincy Van	ff651cb613	KVM: nVMX: Add nested msr load/restore algorithm Several hypervisors need MSR auto load/restore feature. We read MSRs from VM-entry MSR load area which specified by L1, and load them via kvm_set_msr in the nested entry. When nested exit occurs, we get MSRs via kvm_get_msr, writing them to L1`s MSR store area. After this, we read MSRs from VM-exit MSR load area, and load them via kvm_set_msr. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:45:14 +01:00
Richard Cochran	2eebdde652	timecounter: keep track of accumulated fractional nanoseconds The current timecounter implementation will drop a variable amount of resolution, depending on the magnitude of the time delta. In other words, reading the clock too often or too close to a time stamp conversion will introduce errors into the time values. This patch fixes the issue by introducing a fractional nanosecond field that accumulates the low order bits. Reported-by: Janusz Użycki <j.uzycki@elproma.com.pl> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-30 18:29:27 -05:00
Paolo Bonzini	dbaff30940	kvm: warn on more invariant breakage Modifying a non-existent slot is not allowed. Also check that the first loop doesn't move a deleted slot beyond the used part of the mslots array. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-28 10:01:25 +01:00
Paolo Bonzini	efbeec7098	kvm: fix sorting of memslots with base_gfn == 0 Before commit `0e60b0799f` (kvm: change memslot sorting rule from size to GFN, 2014-12-01), the memslots' sorting key was npages, meaning that a valid memslot couldn't have its sorting key equal to zero. On the other hand, a valid memslot can have base_gfn == 0, and invalid memslots are identified by base_gfn == npages == 0. Because of this, commit `0e60b0799f` broke the invariant that invalid memslots are at the end of the mslots array. When a memslot with base_gfn == 0 was created, any invalid memslot before it were left in place. This can be fixed by changing the insertion to use a ">=" comparison instead of "<=", but some care is needed to avoid breaking the case of deleting a memslot; see the comment in update_memslots. Thanks to Tiejun Chen for posting an initial patch for this bug. Reported-by: Jamie Heilman <jamie@audible.transient.net> Reported-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Jamie Heilman <jamie@audible.transient.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-28 10:01:17 +01:00
Linus Torvalds	66dcff86ba	3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware-assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support. Right now KVM is broken for PPC BookE in your tree (doesn't compile). I'll reply to the pull request with a patch, please apply it either before the pull request or in the merge commit, in order to preserve bisectability somewhat. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM 6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE= =Zwq1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...	2014-12-18 16:05:28 -08:00
Paolo Bonzini	333bce5aac	Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot problems, clarifies VCPU init, and fixes a regression concerning the VGIC init flow. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUjsVhAAoJEEtpOizt6ddy5rIH/1V/YVwhprC55YqdHelU9Qu2 Muzsx+7F71NxC7xgMGFqPD1YrPR+hxvoPhy+ADOBlvcqlolrkDnV9I+8e3geaYNc nZ/yEnoGTtbAggiS1smx7usBv34Z88Sd5txNjmj1cmHBy+VOWlyidWMkGBTsfBRe mVc61BDUfyC47udgRHXhwS80sbHLJHElmADisFOVmQNBYwwiHiTdx0hMBMnHcC3Y /3T0tKxHdeTISnmA+J+n7TcChtTIM4xqC6kwf3rw3b7XX8gdtTKylDHX2GLAg646 RdebAG2twmGpIc6SxXZbo38f3oY9OFo1Le5xZGa6iUjD56VDw/e4wg4iA2juo0Y= =J2Ut -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-3.19-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot problems, clarifies VCPU init, and fixes a regression concerning the VGIC init flow. Conflicts: arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]	2014-12-15 13:06:40 +01:00
Christoffer Dall	05971120fc	arm/arm64: KVM: Require in-kernel vgic for the arch timers It is curently possible to run a VM with architected timers support without creating an in-kernel VGIC, which will result in interrupts from the virtual timer going nowhere. To address this issue, move the architected timers initialization to the time when we run a VCPU for the first time, and then only initialize (and enable) the architected timers if we have a properly created and initialized in-kernel VGIC. When injecting interrupts from the virtual timer to the vgic, the current setup should ensure that this never calls an on-demand init of the VGIC, which is the only call path that could return an error from kvm_vgic_inject_irq(), so capture the return value and raise a warning if there's an error there. We also change the kvm_timer_init() function from returning an int to be a void function, since the function always succeeds. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-12-15 11:50:42 +01:00
Christoffer Dall	ca7d9c829d	arm/arm64: KVM: Initialize the vgic on-demand when injecting IRQs Userspace assumes that it can wire up IRQ injections after having created all VCPUs and after having created the VGIC, but potentially before starting the first VCPU. This can currently lead to lost IRQs because the state of that IRQ injection is not stored anywhere and we don't return an error to userspace. We haven't seen this problem manifest itself yet, presumably because guests reset the devices on boot, but this could cause issues with migration and other non-standard startup configurations. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-12-15 11:36:21 +01:00
Christoffer Dall	1f57be2895	arm/arm64: KVM: Add (new) vgic_initialized macro Some code paths will need to check to see if the internal state of the vgic has been initialized (such as when creating new VCPUs), so introduce such a macro that checks the nr_cpus field which is set when the vgic has been initialized. Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in vgic_init() will call this function, and code should never errornously assume the vgic to be properly initialized after an error. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-12-13 14:17:10 +01:00
Christoffer Dall	c52edf5f8c	arm/arm64: KVM: Rename vgic_initialized to vgic_ready The vgic_initialized() macro currently returns the state of the vgic->ready flag, which indicates if the vgic is ready to be used when running a VM, not specifically if its internal state has been initialized. Rename the macro accordingly in preparation for a more nuanced initialization flow. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-12-13 14:17:05 +01:00
Peter Maydell	6d3cfbe21b	arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps() VGIC initialization currently happens in three phases: (1) kvm_vgic_create() (triggered by userspace GIC creation) (2) vgic_init_maps() (triggered by userspace GIC register read/write requests, or from kvm_vgic_init() if not already run) (3) kvm_vgic_init() (triggered by first VM run) We were doing initialization of some state to correspond with the state of a freshly-reset GIC in kvm_vgic_init(); this is too late, since it will overwrite changes made by userspace using the register access APIs before the VM is run. Move this initialization earlier, into the vgic_init_maps() phase. This fixes a bug where QEMU could successfully restore a saved VM state snapshot into a VM that had already been run, but could not restore it "from cold" using the -loadvm command line option (the symptoms being that the restored VM would run but interrupts were ignored). Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to kvm_vgic_map_resources. [ This patch is originally written by Peter Maydell, but I have modified it somewhat heavily, renaming various bits and moving code around. If something is broken, I am to be blamed. - Christoffer ] Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-12-13 14:15:52 +01:00
Christian Borntraeger	7a72f7a140	KVM: track pid for VCPU only on KVM_RUN ioctl We currently track the pid of the task that runs the VCPU in vcpu_load. If a yield to that VCPU is triggered while the PID of the wrong thread is active, the wrong thread might receive a yield, but this will most likely not help the executing thread at all. Instead, if we only track the pid on the KVM_RUN ioctl, there are two possibilities: 1) the thread that did a non-KVM_RUN ioctl is holding a mutex that the VCPU thread is waiting for. In this case, the VCPU thread is not runnable, but we also do not do a wrong yield. 2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing something that does not block the VCPU thread. In this case, the VCPU thread can receive the directed yield correctly. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> CC: Rik van Riel <riel@redhat.com> CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> CC: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:13 +01:00
David Hildenbrand	eed6e79d73	KVM: don't check for PF_VCPU when yielding kvm_enter_guest() has to be called with preemption disabled and will set PF_VCPU. Current code takes PF_VCPU as a hint that the VCPU thread is running and therefore needs no yield. However, the check on PF_VCPU is wrong on s390, where preemption has to stay enabled in order to correctly process page faults. Thus, s390 reenables preemption and starts to execute the guest. The thread might be scheduled out between kvm_enter_guest() and kvm_exit_guest(), resulting in PF_VCPU being set but not being run. When this happens, the opportunity for directed yield is missed. However, this check is done already in kvm_vcpu_on_spin before calling kvm_vcpu_yield_loop: if (!ACCESS_ONCE(vcpu->preempted)) continue; so the check on PF_VCPU is superfluous in general, and this patch removes it. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:12 +01:00
Igor Mammedov	9c1a5d3878	kvm: optimize GFN to memslot lookup with large slots amount Current linear search doesn't scale well when large amount of memslots is used and looked up slot is not in the beginning memslots array. Taking in account that memslots don't overlap, it's possible to switch sorting order of memslots array from 'npages' to 'base_gfn' and use binary search for memslot lookup by GFN. As result of switching to binary search lookup times are reduced with large amount of memslots. Following is a table of search_memslot() cycles during WS2008R2 guest boot. boot, boot + ~10 min mostly same of using it, slot lookup randomized lookup max average average cycles cycles cycles 13 slots : 1450 28 30 13 slots : 1400 30 40 binary search 117 slots : 13000 30 460 117 slots : 2000 35 180 binary search Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:11 +01:00
Igor Mammedov	0e60b0799f	kvm: change memslot sorting rule from size to GFN it will allow to use binary search for GFN -> memslot lookups, reducing lookup cost with large slots amount. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:11 +01:00
Igor Mammedov	7f379cff11	kvm: update_memslots: drop not needed check for the same slot UP/DOWN shift loops will shift array in needed direction and stop at place where new slot should be placed regardless of old slot size. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:09 +01:00
Igor Mammedov	5a38b6e6b4	kvm: update_memslots: drop not needed check for the same number of pages if number of pages haven't changed sorting algorithm will do nothing, so there is no need to do extra check to avoid entering sorting logic. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:09 +01:00
Ard Biesheuvel	d3fccc7ef8	kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn() This reverts commit `85c8555ff0` ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-26 14:40:45 +01:00
Christoffer Dall	6b50f54064	arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create() If we detect another vCPU is running we just exit and return 0 as if we succesfully created the VGIC, but the VGIC wouldn't actual be created. This shouldn't break in-kernel behavior because the kernel will not observe the failed the attempt to create the VGIC, but userspace could be rightfully confused. Cc: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-26 14:40:44 +01:00
Shannon Zhao	016ed39c54	arm/arm64: KVM: vgic: kick the specific vcpu instead of iterating through all When call kvm_vgic_inject_irq to inject interrupt, we can known which vcpu the interrupt for by the irq_num and the cpuid. So we should just kick this vcpu to avoid iterating through all. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-11-26 10:19:37 +00:00
Christoffer Dall	b1e952b4e4	arm/arm64: vgic: Remove unreachable irq_clear_pending When 'injecting' an edge-triggered interrupt with a falling edge we shouldn't clear the pending state on the distributor. In fact, we don't, because the check in vgic_validate_injection would prevent us from ever reaching this bit of code. Remove the unreachable snippet. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-11-25 13:57:28 +00:00
Ard Biesheuvel	bf4bea8e9a	kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn() This reverts commit `85c8555ff0` ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-11-25 13:57:26 +00:00
wanghaibin	7d39f9e32c	KVM: ARM: VGIC: Optimize the vGIC vgic_update_irq_pending function. When vgic_update_irq_pending with level-sensitive false, it is need to deactivates an interrupt, and, it can go to out directly. Here return a false value, because it will be not need to kick. Signed-off-by: wanghaibin <wanghaibin.wang@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-11-25 13:57:26 +00:00
Radim Krčmář	c274e03af7	kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-23 18:33:36 +01:00
Paolo Bonzini	6ef768fac9	kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/ ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-21 18:02:37 +01:00
Paolo Bonzini	003f7de625	KVM: ia64: remove KVM for ia64 has been marked as broken not just once, but twice even, and the last patch from the maintainer is now roughly 5 years old. Time for it to rest in peace. Acked-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-20 11:08:33 +01:00
Paolo Bonzini	5cc1502799	kvm: simplify update_memslots invocation The update_memslots invocation is only needed in one case. Make the code clearer by moving it to __kvm_set_memory_region, and removing the wrapper around insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-17 12:16:13 +01:00
Paolo Bonzini	f2a8103651	kvm: commonize allocation of the new memory slots The two kmemdup invocations can be unified. I find that the new placement of the comment makes it easier to see what happens. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-17 12:15:34 +01:00
Paolo Bonzini	8593176c67	kvm: memslots: track id_to_index changes during the insertion sort This completes the optimization from the previous patch, by removing the KVM_MEM_SLOTS_NUM-iteration loop from insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-14 15:40:17 +01:00
Igor Mammedov	063584d443	kvm: memslots: replace heap sort with an insertion sort pass memslots is a sorted array. When a slot is changed, heapsort (lib/sort.c) would take O(n log n) time to update it; an optimized insertion sort will only cost O(n) on an array with just one item out of order. Replace sort() with a custom sort that takes advantage of memslots usage pattern and the known position of the changed slot. performance change of 128 memslots insertions with gradually increasing size (the worst case): heap sort custom sort max: 249747 2500 cycles with custom sort alg taking ~98% less then original update time. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-14 10:49:04 +01:00
Dominik Dingel	02d5d55b7e	KVM: trivial fix comment regarding __kvm_set_memory_region commit `72dc67a696` ("KVM: remove the usage of the mmap_sem for the protection of the memory slots.") changed the lock which will be taken. This should be reflected in the function commentary. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:30 +01:00
Nadav Amit	394457a928	KVM: x86: some apic broadcast modes does not work KVM does not deliver x2APIC broadcast messages with physical mode. Intel SDM (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of FFFF_FFFFH is used for broadcast of interrupts in both logical destination and physical destination modes." In addition, the local-apic enables cluster mode broadcast. As Intel SDM 10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all destination bits to one." This patch enables cluster mode broadcast. The fix tries to combine broadcast in different modes through a unified code. One rare case occurs when the source of IPI has its APIC disabled. In such case, the source can still issue IPIs, but since the source is not obliged to have the same LAPIC mode as the enabled ones, we cannot rely on it. Since it is a rare case, it is unoptimized and done on the slow-path. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> [As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from kvm_apic_broadcast. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:22 +01:00
Wanpeng Li	571ee1b685	kvm: vfio: fix unregister kvm_device_ops of vfio After commit `80ce163` (KVM: VFIO: register kvm_device_ops dynamically), kvm_device_ops of vfio can be registered dynamically. Commit `3c3c29fd` (kvm-vfio: do not use module_init) move the dynamic register invoked by kvm_init in order to fix broke unloading of the kvm module. However, kvm_device_ops of vfio is unregistered after rmmod kvm-intel module which lead to device type collision detection warning after kvm-intel module reinsmod. WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]() Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel] CPU: 1 PID: 10358 Comm: insmod Tainted: G W O 3.17.0-rc1 #2 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013 0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9 0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48 ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff Call Trace: [<ffffffff814a61d9>] dump_stack+0x49/0x60 [<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96 [<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm] [<ffffffff810417e6>] warn_slowpath_null+0x15/0x17 [<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm] [<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel] [<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel] [<ffffffff810002ab>] do_one_initcall+0xe3/0x170 [<ffffffff811168a9>] ? __vunmap+0xad/0xb8 [<ffffffff8109c58f>] do_init_module+0x2b/0x174 [<ffffffff8109d414>] load_module+0x43e/0x569 [<ffffffff8109c6d8>] ? do_init_module+0x174/0x174 [<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82 [<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20 [<ffffffff8109d65f>] SyS_init_module+0x54/0x81 [<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b ---[ end trace 0626f4a3ddea56f3 ]--- The bug can be reproduced by: rmmod kvm_intel.ko insmod kvm_intel.ko without rmmod/insmod kvm.ko This patch fixes the bug by unregistering kvm_device_ops of vfio when the kvm-intel module is removed. Reported-by: Liu Rongrong <rongrongx.liu@intel.com> Fixes: `3c3c29fd0d` Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:30:47 +02:00
Quentin Casasnovas	3d32e4dbe7	kvm: fix excessive pages un-pinning in kvm_iommu_map error path. The third parameter of kvm_unpin_pages() when called from kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin and not the page size. This error was facilitated with an inconsistent API: kvm_pin_pages() takes a size, but kvn_unpin_pages() takes a number of pages, so fix the problem by matching the two. This was introduced by commit `350b8bd` ("kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of un-pinning for pages intended to be un-pinned (i.e. memory leak) but unfortunately potentially aggravated the number of pages we un-pin that should have stayed pinned. As far as I understand though, the same practical mitigations apply. This issue was found during review of Red Hat 6.6 patches to prepare Ksplice rebootless updates. Thanks to Vegard for his time on a late Friday evening to help me in understanding this code. Fixes: `350b8bd` ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)") Cc: stable@vger.kernel.org Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jamie Iles <jamie.iles@oracle.com> Reviewed-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:30:37 +02:00
Linus Torvalds	8a5de18239	Second batch of changes for KVM/{arm,arm64} for 3.18 - Support for 48bit IPA and VA (EL2) - A number of fixes for devices mapped into guests - Yet another VGIC fix for BE - A fix for CPU hotplug - A few compile fixes (disabled VGIC, strict mm checks) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUQkyTAAoJECPQ0LrRPXpDwrUP/1WbELgB74W35CJ1zIc4KuBi unP1muW3QAr9Vmp/KovRKyLKFiRaTRlQsszaI78f4ZQ++0vzivU8dZwV81Gn1y/v 0qF63OB0UYsOgXMRrh5JTEqzUyNyNBLUH+FAQiEO/srDoH5WLp3Zq7ThjzjwGn7Q K2ArxFiml+p2BGIGKWe3XIrxNgpW4oWhfe1kW4WU7sshuJlut3Nee+q2lSIg9mZx 2VXYnLNzSsHizgQHuVEyXIqn8HA5FSCvjBYIUcLERlWB0I66WvzOqg9rH/BmNNR2 H+cBDY+9D8KBUBG9zZSG7hZ0mAONKcOnxGZWGzte3Oi7FMZkB3Y/zrIs0na4iB5Y FxE8j+2qclZk9fkHQ7wn9Ws8hpGR2OrFlc2O5ZoBJJ2KJ4wMRHMeEbYRBCRQbTCN +81SUW7mh2j/La0JBqZ6DhhTiymUdIB+6v78im9WGlHsFRAIHBt0Q0u/pIyY+GJs OH7FoswI3vF5iODlHeRO1yjaO3rkj+IJwqTuUuhAIGu9+qnof3ge+eM1cOqrudNa u2kDz+BC21+Q8dflOF99Ryz7cMWqMiwtR+N+OUYpxc7RL7mCeHVANJxdWIFHKa2Z XJaHmbKjmw8AoR0fbS6YWOl2xmIhqU+FAngI+mow/Hz4pJDpR2K3w17ASXIVVQVX go2bvGHONkdn8Ji3Asap =3fl5 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-3.18-take-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm Pull second batch of changes for KVM/{arm,arm64} from Marc Zyngier: "The most obvious thing is the sizeable MMU changes to support 48bit VAs on arm64. Summary: - support for 48bit IPA and VA (EL2) - a number of fixes for devices mapped into guests - yet another VGIC fix for BE - a fix for CPU hotplug - a few compile fixes (disabled VGIC, strict mm checks)" [ I'm pulling directly from Marc at the request of Paolo Bonzini, whose backpack was stolen at Düsseldorf airport and will do new keys and rebuild his web of trust. - Linus ] * tag 'kvm-arm-for-3.18-take-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm: arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs arm: kvm: STRICT_MM_TYPECHECKS fix for user_mem_abort arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 arm/arm64: KVM: map MMIO regions at creation time arm64: kvm: define PAGE_S2_DEVICE as read-only by default ARM: kvm: define PAGE_S2_DEVICE as read-only by default arm/arm64: KVM: add 'writable' parameter to kvm_phys_addr_ioremap arm/arm64: KVM: fix potential NULL dereference in user_mem_abort() arm/arm64: KVM: use __GFP_ZERO not memset() to get zeroed pages ARM: KVM: fix vgic-disabled build arm: kvm: fix CPU hotplug	2014-10-18 14:32:31 -07:00
Christoffer Dall	2df36a5dd6	arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs The EIRSR and ELRSR registers are 32-bit registers on GICv2, and we store these as an array of two such registers on the vgic vcpu struct. However, we access them as a single 64-bit value or as a bitmap pointer in the generic vgic code, which breaks BE support. Instead, store them as u64 values on the vgic structure and do the word-swapping in the assembly code, which already handles the byte order for BE systems. Tested-by: Victor Kamensky <victor.kamensky@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-10-16 10:57:41 +02:00
Linus Torvalds	23971bdfff	IOMMU Updates for Linux v3.18 This pull-request includes: * Change in the IOMMU-API to convert the former iommu_domain_capable function to just iommu_capable * Various fixes in handling RMRR ranges for the VT-d driver (one fix requires a device driver core change which was acked by Greg KH) * The AMD IOMMU driver now assigns and deassigns complete alias groups to fix issues with devices using the wrong PCI request-id * MMU-401 support for the ARM SMMU driver * Multi-master IOMMU group support for the ARM SMMU driver * Various other small fixes all over the place -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJUPNxYAAoJECvwRC2XARrjMwMP/RLSr+oA31rGVjLXcmcCHl7Q Uj7xpcnG19qB0aqNR1JeJuZNkK/tw44pE353MQPbz4N9UVUiogklGIVD1iJvFV53 0qm84bvpDJIof4aP35B3H3Umft2USTn/lmsQg/RklQcNTW8DzNj63b8BTNR7k/GL G7bLg7F1BUCl0shZCCsFspOIulQPAJYN2OvHlfYBav/bfDvfouQ3lrV+loGrK44r F2Hmp+imXlIhUCjfbiWz6wKFxvPrxZx482vm2pXBCSnXEdW4/fz6nf9VHUK/Cfsq JAimY1CfiDo1aqH9/yVHUOw5SD/NYOXq6E5bFPg/WENbipbbae5cK2u6PX5MMBAn CG4BM8l9xicfGPqgn5YFSRY/6qC6K7NlxMnt9U8l18QIkDVDqEtUgJQISJuce7wx FWx6eSWaxpIe5yhq19/h2ELalUUyR/fPq+UXXjYDL1kLV/vcvC/lC3mbNAQU93zU WK0bG2tDg88JHavc25Ewa2aOn4BVM2BpwuLbYlgQReaEmsQRnEPgtmRNyLJHqbFE wwpCj8pBWdufsJWRyvpnXQ+CfA7oSz4e7hz1G+0/5uiDmagfvg16Ql5JtPmmuLUm Kc3dVIiG0s1ewohZIIJETGCqprQbCSqs8CCQqB6p2zDBWFKpNT7F38lm/KlehkCz JpAiI7Y2K9Jejp0VIPrt =OMOt -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "This pull-request includes: - change in the IOMMU-API to convert the former iommu_domain_capable function to just iommu_capable - various fixes in handling RMRR ranges for the VT-d driver (one fix requires a device driver core change which was acked by Greg KH) - the AMD IOMMU driver now assigns and deassigns complete alias groups to fix issues with devices using the wrong PCI request-id - MMU-401 support for the ARM SMMU driver - multi-master IOMMU group support for the ARM SMMU driver - various other small fixes all over the place" * tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits) iommu/vt-d: Work around broken RMRR firmware entries iommu/vt-d: Store bus information in RMRR PCI device path iommu/vt-d: Only remove domain when device is removed driver core: Add BUS_NOTIFY_REMOVED_DEVICE event iommu/amd: Fix devid mapping for ivrs_ioapic override iommu/irq_remapping: Fix the regression of hpet irq remapping iommu: Fix bus notifier breakage iommu/amd: Split init_iommu_group() from iommu_init_device() iommu: Rework iommu_group_get_for_pci_dev() iommu: Make of_device_id array const amd_iommu: do not dereference a NULL pointer address. iommu/omap: Remove omap_iommu unused owner field iommu: Remove iommu_domain_has_cap() API function IB/usnic: Convert to use new iommu_capable() API function vfio: Convert to use new iommu_capable() API function kvm: iommu: Convert to use new iommu_capable() API function iommu/tegra: Convert to iommu_capable() API function iommu/msm: Convert to iommu_capable() API function iommu/vt-d: Convert to iommu_capable() API function iommu/fsl: Convert to iommu_capable() API function ...	2014-10-15 07:23:49 +02:00
Ard Biesheuvel	c40f2f8ff8	arm/arm64: KVM: add 'writable' parameter to kvm_phys_addr_ioremap Add support for read-only MMIO passthrough mappings by adding a 'writable' parameter to kvm_phys_addr_ioremap. For the moment, mappings will be read-write even if 'writable' is false, but once the definition of PAGE_S2_DEVICE gets changed, those mappings will be created read-only. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-10-10 13:07:37 +02:00
Linus Torvalds	80213c03c4	PCI changes for the v3.18 merge window: Enumeration - Check Vendor ID only for Config Request Retry Status (Rajat Jain) - Enable Config Request Retry Status when supported (Rajat Jain) - Add generic domain handling (Catalin Marinas) - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado) Resource management - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu) - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr) PCI device hotplug - Prevent NULL dereference during pciehp probe (Andreas Noever) - Move _HPP & _HPX handling into core (Bjorn Helgaas) - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas) - Apply _HPP/_HPX to display devices (Bjorn Helgaas) - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas) - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas) - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas) - Fix wait time in pciehp timeout message (Yinghai Lu) - Add more pciehp Slot Control debug output (Yinghai Lu) - Stop disabling pciehp notifications during init (Yinghai Lu) MSI - Remove arch_msi_check_device() (Alexander Gordeev) - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev) - Move D0 check into pci_msi_check_device() (Alexander Gordeev) - Remove unused kobject from struct msi_desc (Yijing Wang) - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang) - Add "msi_bus" sysfs MSI/MSI-X control for endpoints (Yijing Wang) - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang) - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang) - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang) Power management - Drop unused runtime PM support code for PCIe ports (Rafael J. Wysocki) - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki) AER - Add additional AER error strings (Gong Chen) - Make <linux/aer.h> standalone includable (Thierry Reding) Virtualization - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson) - Add ACS quirk for Intel 10G NICs (Alex Williamson) - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp) - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson) - Add device flag helpers (Ethan Zhao) - Assume all Mellanox devices have broken INTx masking (Gavin Shan) Generic host bridge driver - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau) - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau) - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau) - Fix the conversion of IO ranges into IO resources (Liviu Dudau) - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau) - Add support for parsing PCI host bridge resources from DT (Liviu Dudau) - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau) - Add arm64 architectural support for PCI (Liviu Dudau) APM X-Gene - Add APM X-Gene PCIe driver (Tanmay Inamdar) - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar) Freescale i.MX6 - Probe in module_init(), not fs_initcall() (Lucas Stach) - Delay enabling reference clock for SS until it stabilizes (Tim Harvey) Marvell MVEBU - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni) NVIDIA Tegra - Make sure the PCIe PLL is really reset (Eric Yuen) - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang) - Fix extended configuration space mapping (Peter Daifuku) - Implement resource hierarchy (Thierry Reding) - Clear CLKREQ# enable on port disable (Thierry Reding) - Add Tegra124 support (Thierry Reding) ST Microelectronics SPEAr13xx - Pass config resource through reg property (Pratyush Anand) Synopsys DesignWare - Use NULL instead of false (Fabio Estevam) - Parse bus-range property from devicetree (Lucas Stach) - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach) - Remove pci_assign_unassigned_resources() (Lucas Stach) - Check private_data validity in single place (Lucas Stach) - Setup and clear exactly one MSI at a time (Lucas Stach) - Remove open-coded bitmap operations (Lucas Stach) - Fix configuration base address when using 'reg' (Minghuan Lian) - Fix IO resource end address calculation (Minghuan Lian) - Rename get_msi_data() to get_msi_addr() (Minghuan Lian) - Add get_msi_data() to pcie_host_ops (Minghuan Lian) - Add support for v3.65 hardware (Murali Karicheri) - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand) TI Keystone - Add TI Keystone PCIe driver (Murali Karicheri) - Limit MRSS for all downstream devices (Murali Karicheri) - Assume controller is already in RC mode (Murali Karicheri) - Set device ID based on SoC to support multiple ports (Murali Karicheri) Xilinx AXI - Add Xilinx AXI PCIe driver (Srikanth Thokala) - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter) Miscellaneous - Clean up whitespace (Quentin Lambert) - Remove assignments from "if" conditions (Quentin Lambert) - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri) - x86: Mark DMI tables as initialization data (Mathias Krause) - x86: Move __init annotation to the correct place (Mathias Krause) - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause) - x86: Constify pci_mmcfg_probes[] array (Mathias Krause) - x86: Mark PCI BIOS initialization code as such (Mathias Krause) - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya) - Remove unnecessary variable in pci_add_dynid() (Tobias Klauser) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUNWmJAAoJEFmIoMA60/r8GncP/3uHRoBrnaF6pv+S1l1p3Fs/ l1kKH91/IuAAU7VJX8pkNybFqx02topWmiVVXAzqvD01PcRLGCLjPbWl5h+y5/Ja CHZH33AwHAmm0kt4BrOSOeHTLJhAigly2zV3P4F8jRIgyaeMoGZ6Ko4tkQUpm21k +ohrOd4cxYkmzzCjKwsZZhKnyRNpae8FmTk3VQBPuN8DbhvFPrqo5/+GeAdSZTdS HZHpfl2HL4095aY7uBVsZqNkjQyl6SnWwjkjLnuI8q3qA3BLgDZE/Jr8F/MNuW1V y01JIjerFWMDFyBIkpg7moYnODy6oP3KvczwYdKGmqsJja+0MQvYhLTwD+R/yTQS SewJA0mL3T3EJEfnFYkCiaIX27xIwk/FxHfaKPN91xgx/QM7xCVZNrU2/dXjhoX1 GqLKxOEaFHhWWTyT5Dj27I0ZcElzFZ3tIwvrHfs8y22oAuAlsAypaUgvUwRfL4CO hOj4ITZa0t041sYWqxCoGAA9Fdp8HMzNKKS5F4mhADz4Ad9v6uPCNv/s/RoxVsbm jhZOtPYJ0/iCA+kNVX563S8Z3VpfPI+7bBjcj2WKdzW+IlICvOKT+kvwL2Tv/rE7 w0hrNsbkgGsYbPldMx7LwCavsUtYFuNj0zoU6vkhP2jk6O2Tn5VXDmjrXH0v3iHI v03vlUtre0bQ26fzDyLQ =4Zv1 -----END PGP SIGNATURE----- Merge tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "The interesting things here are: - Turn on Config Request Retry Status Software Visibility. This caused hangs last time, but we included a fix this time. - Rework PCI device configuration to use _HPP/_HPX more aggressively - Allow PCI devices to be put into D3cold during system suspend - Add arm64 PCI support - Add APM X-Gene host bridge driver - Add TI Keystone host bridge driver - Add Xilinx AXI host bridge driver More detailed summary: Enumeration - Check Vendor ID only for Config Request Retry Status (Rajat Jain) - Enable Config Request Retry Status when supported (Rajat Jain) - Add generic domain handling (Catalin Marinas) - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado) Resource management - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu) - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr) PCI device hotplug - Prevent NULL dereference during pciehp probe (Andreas Noever) - Move _HPP & _HPX handling into core (Bjorn Helgaas) - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas) - Apply _HPP/_HPX to display devices (Bjorn Helgaas) - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas) - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas) - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas) - Fix wait time in pciehp timeout message (Yinghai Lu) - Add more pciehp Slot Control debug output (Yinghai Lu) - Stop disabling pciehp notifications during init (Yinghai Lu) MSI - Remove arch_msi_check_device() (Alexander Gordeev) - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev) - Move D0 check into pci_msi_check_device() (Alexander Gordeev) - Remove unused kobject from struct msi_desc (Yijing Wang) - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang) - Add "msi_bus" sysfs MSI/MSI-X control for endpoints (Yijing Wang) - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang) - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang) - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang) Power management - Drop unused runtime PM support code for PCIe ports (Rafael J. Wysocki) - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki) AER - Add additional AER error strings (Gong Chen) - Make <linux/aer.h> standalone includable (Thierry Reding) Virtualization - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson) - Add ACS quirk for Intel 10G NICs (Alex Williamson) - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp) - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson) - Add device flag helpers (Ethan Zhao) - Assume all Mellanox devices have broken INTx masking (Gavin Shan) Generic host bridge driver - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau) - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau) - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau) - Fix the conversion of IO ranges into IO resources (Liviu Dudau) - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau) - Add support for parsing PCI host bridge resources from DT (Liviu Dudau) - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau) - Add arm64 architectural support for PCI (Liviu Dudau) APM X-Gene - Add APM X-Gene PCIe driver (Tanmay Inamdar) - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar) Freescale i.MX6 - Probe in module_init(), not fs_initcall() (Lucas Stach) - Delay enabling reference clock for SS until it stabilizes (Tim Harvey) Marvell MVEBU - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni) NVIDIA Tegra - Make sure the PCIe PLL is really reset (Eric Yuen) - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang) - Fix extended configuration space mapping (Peter Daifuku) - Implement resource hierarchy (Thierry Reding) - Clear CLKREQ# enable on port disable (Thierry Reding) - Add Tegra124 support (Thierry Reding) ST Microelectronics SPEAr13xx - Pass config resource through reg property (Pratyush Anand) Synopsys DesignWare - Use NULL instead of false (Fabio Estevam) - Parse bus-range property from devicetree (Lucas Stach) - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach) - Remove pci_assign_unassigned_resources() (Lucas Stach) - Check private_data validity in single place (Lucas Stach) - Setup and clear exactly one MSI at a time (Lucas Stach) - Remove open-coded bitmap operations (Lucas Stach) - Fix configuration base address when using 'reg' (Minghuan Lian) - Fix IO resource end address calculation (Minghuan Lian) - Rename get_msi_data() to get_msi_addr() (Minghuan Lian) - Add get_msi_data() to pcie_host_ops (Minghuan Lian) - Add support for v3.65 hardware (Murali Karicheri) - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand) TI Keystone - Add TI Keystone PCIe driver (Murali Karicheri) - Limit MRSS for all downstream devices (Murali Karicheri) - Assume controller is already in RC mode (Murali Karicheri) - Set device ID based on SoC to support multiple ports (Murali Karicheri) Xilinx AXI - Add Xilinx AXI PCIe driver (Srikanth Thokala) - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter) Miscellaneous - Clean up whitespace (Quentin Lambert) - Remove assignments from "if" conditions (Quentin Lambert) - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri) - x86: Mark DMI tables as initialization data (Mathias Krause) - x86: Move __init annotation to the correct place (Mathias Krause) - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause) - x86: Constify pci_mmcfg_probes[] array (Mathias Krause) - x86: Mark PCI BIOS initialization code as such (Mathias Krause) - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya) - Remove unnecessary variable in pci_add_dynid() (Tobias Klauser)" * tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (109 commits) arm64: dts: Add APM X-Gene PCIe device tree nodes PCI: Add ACS quirk for AMD A88X southbridge devices PCI: xgene: Add APM X-Gene PCIe driver PCI: designware: Remove open-coded bitmap operations PCI/MSI: Remove unnecessary temporary variable PCI/MSI: Use __write_msi_msg() instead of write_msi_msg() MSI/powerpc: Use __read_msi_msg() instead of read_msi_msg() PCI/MSI: Use __get_cached_msi_msg() instead of get_cached_msi_msg() PCI/MSI: Add "msi_bus" sysfs MSI/MSI-X control for endpoints PCI/MSI: Remove "pos" from the struct msi_desc msi_attrib PCI/MSI: Remove unused kobject from struct msi_desc PCI/MSI: Rename pci_msi_check_device() to pci_msi_supported() PCI/MSI: Move D0 check into pci_msi_check_device() PCI/MSI: Remove arch_msi_check_device() irqchip: armada-370-xp: Remove arch_msi_check_device() PCI/MSI/PPC: Remove arch_msi_check_device() arm64: Add architectural support for PCI PCI: Add pci_remap_iospace() to map bus I/O resources of/pci: Add support for parsing PCI host bridge resources from DT of/pci: Add pci_get_new_domain_nr() and of_get_pci_domain_nr() ... Conflicts: arch/arm64/boot/dts/apm-storm.dtsi	2014-10-09 15:03:49 -04:00
Linus Torvalds	e4e65676f2	Fixes and features for 3.18. Apart from the usual cleanups, here is the summary of new features: - s390 moves closer towards host large page support - PowerPC has improved support for debugging (both inside the guest and via gdbstub) and support for e6500 processors - ARM/ARM64 support read-only memory (which is necessary to put firmware in emulated NOR flash) - x86 has the usual emulator fixes and nested virtualization improvements (including improved Windows support on Intel and Jailhouse hypervisor support on AMD), adaptive PLE which helps overcommitting of huge guests. Also included are some patches that make KVM more friendly to memory hot-unplug, and fixes for rare caching bugs. Two patches have trivial mm/ parts that were acked by Rik and Andrew. Note: I will soon switch to a subkey for signing purposes. To verify future signed pull requests from me, please update my key with "gpg --recv-keys 9B4D86F2". You should see 3 new subkeys---the one for signing will be a 2048-bit RSA key, 4E6B09D7. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJUL5sPAAoJEBvWZb6bTYbyfkEP/3MNhSyn6HCjPjtjLNPAl9KL WpExZSUFL2+4CztpdGIsek1BeJYHmqv3+c5S+WvaWVA1aqh2R7FT1D1ErBLjgLQq lq23IOr+XxmC3dXQUEEk+TlD+283UzypzEG4l4UD3JYg79fE3UrXAz82SeyewJDY x7aPYhkZG3RHu+wAyMPasG6E3zS5LySdUtGWbiPwz5BejrhBJoJdeb2WIL/RwnUK 7ppSLB5EoFj/uMkuyeAAdAbdfSrhHA6faDZxNdxS9k9wGutrhhfUoQ49ONrKG4dV sFo1tSPTVgRs8QFYUZ2fJUPBAmUVddsgqh2K9d0NftGTq7b8YszaCsfFrs2/Y4MU YxssWEhxsfszerCu12bbAJrv6JBZYQ7TwGvI9L7P0iFU6IVw/djmukU4AkM9/e91 YS/cue/PN+9Pn2ccXzL9J7xRtZb8FsOuRsCXTCmbOwDkLmrKPDBN2t3RUbeF+Eam ABrpWnLKX13kZSo4LKU+/niarzmPMp7odQfHVdr8ea0fiYLp4iN8puA20WaSPIgd CLvm+RAvXe5Lm91L4mpFotJ2uFyK6QlIYJV4FsgeWv/0D0qppWQi0Utb/aCNHCgy z8MyUMD48y7EpoQrFYr/7cddXIu0/NegnM8I1coVjIPEk4NfeebGUlCJ/V3D8wMG BgEfS2x6jRc5zB3hjwDr =iEVi -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Fixes and features for 3.18. Apart from the usual cleanups, here is the summary of new features: - s390 moves closer towards host large page support - PowerPC has improved support for debugging (both inside the guest and via gdbstub) and support for e6500 processors - ARM/ARM64 support read-only memory (which is necessary to put firmware in emulated NOR flash) - x86 has the usual emulator fixes and nested virtualization improvements (including improved Windows support on Intel and Jailhouse hypervisor support on AMD), adaptive PLE which helps overcommitting of huge guests. Also included are some patches that make KVM more friendly to memory hot-unplug, and fixes for rare caching bugs. Two patches have trivial mm/ parts that were acked by Rik and Andrew. Note: I will soon switch to a subkey for signing purposes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits) kvm: do not handle APIC access page if in-kernel irqchip is not in use KVM: s390: count vcpu wakeups in stat.halt_wakeup KVM: s390/facilities: allow TOD-CLOCK steering facility bit KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode arm/arm64: KVM: Report correct FSC for unsupported fault types arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc kvm: Fix kvm_get_page_retry_io __gup retval check arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset kvm: x86: Unpin and remove kvm_arch->apic_access_page kvm: vmx: Implement set_apic_access_page_addr kvm: x86: Add request bit to reload APIC access page address kvm: Add arch specific mmu notifier for page invalidation kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static kvm: Fix page ageing bugs kvm/x86/mmu: Pass gfn and level to rmapp callback. x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only kvm: x86: use macros to compute bank MSRs KVM: x86: Remove debug assertion of non-PAE reserved bits kvm: don't take vcpu mutex for obviously invalid vcpu ioctls kvm: Faults which trigger IO release the mmap_sem ...	2014-10-08 05:27:39 -04:00
Joerg Roedel	09b5269a1b	Merge branches 'arm/exynos', 'arm/omap', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next Conflicts: drivers/iommu/arm-smmu.c	2014-10-02 12:24:45 +02:00
Paolo Bonzini	e77d99d4a4	Changes for KVM for arm/arm64 for 3.18 This includes a bunch of changes: - Support read-only memory slots on arm/arm64 - Various changes to fix Sparse warnings - Correctly detect write vs. read Stage-2 faults - Various VGIC cleanups and fixes - Dynamic VGIC data strcuture sizing - Fix SGI set_clear_pend offset bug - Fix VTTBR_BADDR Mask - Correctly report the FSC on Stage-2 faults -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUJWAdAAoJEEtpOizt6ddy9cMH+gIoUPnRJLe+PPcOOyxOx6pr +CnD/zAd0sLvxZLP/LBOzu99H3YrbO5kwI/172/8G1zUNI2hp6YxEEJaBCTHrz6l RwgLy7a3EMMY51nJo5w2dkFUo8cUX9MsHqMpl2Xb7Dvo2ZHp+nDqRjwRY6yi+t4V dWSJTRG6X+DIWyysij6jBtfKU6MpU+4NW3Zdk1fapf8QDkn+cBtV5X2QcmERCaIe A1j9hiGi43KA3XWeeePU3aVaxC2XUhTayP8VsfVxoNG2manaS6lqjmbif5ghs/0h rw7R3/Aj0MJny2zT016MkvKJKRukuVRD6e1lcYghqnSJhL2FossowZ9fHRADpqU= =QgU8 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next Changes for KVM for arm/arm64 for 3.18 This includes a bunch of changes: - Support read-only memory slots on arm/arm64 - Various changes to fix Sparse warnings - Correctly detect write vs. read Stage-2 faults - Various VGIC cleanups and fixes - Dynamic VGIC data strcuture sizing - Fix SGI set_clear_pend offset bug - Fix VTTBR_BADDR Mask - Correctly report the FSC on Stage-2 faults Conflicts: virt/kvm/eventfd.c [duplicate, different patch where the kvm-arm version broke x86. The kvm tree instead has the right one]	2014-09-27 11:03:33 +02:00
Andres Lagar-Cavilla	bb0ca6acd4	kvm: Fix kvm_get_page_retry_io __gup retval check Confusion around -EBUSY and zero (inside a BUG_ON no less). Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-26 10:21:29 +02:00
Christoffer Dall	0fea6d7628	arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset The sgi values calculated in read_set_clear_sgi_pend_reg() and write_set_clear_sgi_pend_reg() were horribly incorrectly multiplied by 4 with catastrophic results in that subfunctions ended up overwriting memory not allocated for the expected purpose. This showed up as bugs in kfree() and the kernel complaining a lot of you turn on memory debugging. This addresses: http://marc.info/?l=kvm&m=141164910007868&w=2 Reported-by: Shannon Zhao <zhaoshenglong@huawei.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-25 19:38:25 +02:00
Joerg Roedel	ee5ba30ff7	kvm: iommu: Convert to use new iommu_capable() API function Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2014-09-25 15:47:38 +02:00
Tang Chen	fe71557afb	kvm: Add arch specific mmu notifier for page invalidation This will be used to let the guest run while the APIC access page is not pinned. Because subsequent patches will fill in the function for x86, place the (still empty) x86 implementation in the x86.c file instead of adding an inline function in kvm_host.h. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:59 +02:00
Tang Chen	445b823695	kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static Different architectures need different requests, and in fact we will use this function in architecture-specific code later. This will be outside kvm_main.c, so make it non-static and rename it to kvm_make_all_cpus_request(). Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:59 +02:00
Andres Lagar-Cavilla	5712846808	kvm: Fix page ageing bugs 1. We were calling clear_flush_young_notify in unmap_one, but we are within an mmu notifier invalidate range scope. The spte exists no more (due to range_start) and the accessed bit info has already been propagated (due to kvm_pfn_set_accessed). Simply call clear_flush_young. 2. We clear_flush_young on a primary MMU PMD, but this may be mapped as a collection of PTEs by the secondary MMU (e.g. during log-dirty). This required expanding the interface of the clear_flush_young mmu notifier, so a lot of code has been trivially touched. 3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate the access bit by blowing the spte. This requires proper synchronizing with MMU notifier consumers, like every other removal of spte's does. Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:58 +02:00
David Matlack	2ea75be321	kvm: don't take vcpu mutex for obviously invalid vcpu ioctls vcpu ioctls can hang the calling thread if issued while a vcpu is running. However, invalid ioctls can happen when userspace tries to probe the kind of file descriptors (e.g. isatty() calls ioctl(TCGETS)); in that case, we know the ioctl is going to be rejected as invalid anyway and we can fail before trying to take the vcpu mutex. This patch does not change functionality, it just makes invalid ioctls fail faster. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:55 +02:00
Andres Lagar-Cavilla	234b239bea	kvm: Faults which trigger IO release the mmap_sem When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory has been swapped out or is behind a filemap, this will trigger async readahead and return immediately. The rationale is that KVM will kick back the guest with an "async page fault" and allow for some other guest process to take over. If async PFs are enabled the fault is retried asap from an async workqueue. If not, it's retried immediately in the same code path. In either case the retry will not relinquish the mmap semaphore and will block on the IO. This is a bad thing, as other mmap semaphore users now stall as a function of swap or filemap latency. This patch ensures both the regular and async PF path re-enter the fault allowing for the mmap semaphore to be relinquished in the case of IO wait. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:54 +02:00
Paolo Bonzini	3c3c29fd0d	kvm-vfio: do not use module_init /me got confused between the kernel and QEMU. In the kernel, you can only have one module_init function, and it will prevent unloading the module unless you also have the corresponding module_exit function. So, commit `80ce163972` (KVM: VFIO: register kvm_device_ops dynamically, 2014-09-02) broke unloading of the kvm module, by adding a module_init function and no module_exit. Repair it by making kvm_vfio_ops_init weak, and checking it in kvm_init. Cc: Will Deacon <will.deacon@arm.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Alex Williamson <Alex.Williamson@redhat.com> Fixes: `80ce163972` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:06:36 +02:00
Christoffer Dall	29f1b65b59	KVM: EVENTFD: Remove inclusion of irq.h Commit `c77dcac` (KVM: Move more code under CONFIG_HAVE_KVM_IRQFD) added functionality that depends on definitions in ioapic.h when __KVM_HAVE_IOAPIC is defined. At the same time, kvm-arm commit `0ba0951` (KVM: EVENTFD: remove inclusion of irq.h) removed the inclusion of irq.h, an architecture-specific header that is not present on ARM but which happened to include ioapic.h on x86. Include ioapic.h directly in eventfd.c if __KVM_HAVE_IOAPIC is defined. This fixes x86 and lets ARM use eventfd.c. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 12:06:25 +02:00
Paolo Bonzini	9542639387	Fixes unaligned access to the gicv2 virtual cpu status. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUIJAcAAoJEEtpOizt6ddyXeAH/AiI0baWL4NZs3nFWt2+Pc1/ hQAEquucdoGd4I7BLIAbcrJMwSc2zYe2mQWkJ1EVroqSdk5upkqWGvnqdEwt3WXH kWpqK7B3JjNTShTESYpjRv/ymOYq/a/cjbW3B0wSuuc46TUc97pOa1VMK8k3XknJ 45GYKSxVCW4fKPnXN8H1PIxWy3oGia6kGm+yV/r8hKpX7dAK/AuOOVT3oaE9IAgJ MnwGCc4AHyj2wD5DVfiWlkABf/JcRv29wu42w1KGGfHjZPHXO8WmgcJai5HxLa9G NnBr/7sIIj5RFlAmFsbX4tLBwsNIHSs5TxPXeyYH+eEvQehqmiVjVh1ca/SuXGQ= =tqqB -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v3.17-rc7-or-final' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master Fixes unaligned access to the gicv2 virtual cpu status.	2014-09-23 15:18:02 +02:00
Christoffer Dall	1f2bb4acc1	arm/arm64: KVM: Fix unaligned access bug on gicv2 access We were using an atomic bitop on the vgic_v2.vgic_elrsr field which was not aligned to the natural size on 64-bit platforms. This bug showed up after QEMU correctly identifies the pl011 line as being level-triggered, and not edge-triggered. These data structures are protected by a spinlock so simply use a non-atomic version of the accessor instead. Tested-by: Joel Schopp <joel.schopp@amd.com> Reported-by: Riku Voipio <riku.voipio@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-22 23:05:56 +02:00
Sam Bobroff	27fbe64bfa	KVM: correct null pid check in kvm_vcpu_yield_to() Correct a simple mistake of checking the wrong variable before a dereference, resulting in the dereference not being properly protected by rcu_dereference(). Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-22 13:21:29 +02:00
Marc Zyngier	a98f26f183	arm/arm64: KVM: vgic: make number of irqs a configurable attribute In order to make the number of interrupts configurable, use the new fancy device management API to add KVM_DEV_ARM_VGIC_GRP_NR_IRQS as a VGIC configurable attribute. Userspace can now specify the exact size of the GIC (by increments of 32 interrupts). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-09-18 18:48:58 -07:00
Marc Zyngier	4956f2bc1f	arm/arm64: KVM: vgic: delay vgic allocation until init time It is now quite easy to delay the allocation of the vgic tables until we actually require it to be up and running (when the first vcpu is kicking around, or someones tries to access the GIC registers). This allow us to allocate memory for the exact number of CPUs we have. As nobody configures the number of interrupts just yet, use a fallback to VGIC_NR_IRQS_LEGACY. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-09-18 18:48:58 -07:00
Marc Zyngier	5fb66da640	arm/arm64: KVM: vgic: kill VGIC_NR_IRQS Nuke VGIC_NR_IRQS entierly, now that the distributor instance contains the number of IRQ allocated to this GIC. Also add VGIC_NR_IRQS_LEGACY to preserve the current API. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-09-18 18:48:57 -07:00
Marc Zyngier	c3c918361a	arm/arm64: KVM: vgic: handle out-of-range MMIO accesses Now that we can (almost) dynamically size the number of interrupts, we're facing an interesting issue: We have to evaluate at runtime whether or not an access hits a valid register, based on the sizing of this particular instance of the distributor. Furthermore, the GIC spec says that accessing a reserved register is RAZ/WI. For this, add a new field to our range structure, indicating the number of bits a single interrupts uses. That allows us to find out whether or not the access is in range. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-09-18 18:48:57 -07:00
Marc Zyngier	fc675e355e	arm/arm64: KVM: vgic: kill VGIC_MAX_CPUS We now have the information about the number of CPU interfaces in the distributor itself. Let's get rid of VGIC_MAX_CPUS, and just rely on KVM_MAX_VCPUS where we don't have the choice. Yet. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-09-18 18:48:57 -07:00
Marc Zyngier	fb65ab63b8	arm/arm64: KVM: vgic: Parametrize VGIC_NR_SHARED_IRQS Having a dynamic number of supported interrupts means that we cannot relly on VGIC_NR_SHARED_IRQS being fixed anymore. Instead, make it take the distributor structure as a parameter, so it can return the right value. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-09-18 18:48:56 -07:00
Marc Zyngier	c1bfb577ad	arm/arm64: KVM: vgic: switch to dynamic allocation So far, all the VGIC data structures are statically defined by the maximum number of vcpus and interrupts it supports. It means that we always have to oversize it to cater for the worse case. Start by changing the data structures to be dynamically sizeable, and allocate them at runtime. The sizes are still very static though. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-09-18 18:48:52 -07:00
Marc Zyngier	71afaba4a2	KVM: ARM: vgic: plug irq injection race As it stands, nothing prevents userspace from injecting an interrupt before the guest's GIC is actually initialized. This goes unnoticed so far (as everything is pretty much statically allocated), but ends up exploding in a spectacular way once we switch to a more dynamic allocation (the GIC data structure isn't there yet). The fix is to test for the "ready" flag in the VGIC distributor before trying to inject the interrupt. Note that in order to avoid breaking userspace, we have to ignore what is essentially an error. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-18 18:45:06 -07:00
Christoffer Dall	7e362919a5	arm/arm64: KVM: vgic: Clarify and correct vgic documentation The VGIC virtual distributor implementation documentation was written a very long time ago, before the true nature of the beast had been partially absorbed into my bloodstream. Clarify the docs. Plus, it fixes an actual bug. ICFRn, pfff. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-18 18:44:32 -07:00
Christoffer Dall	9da48b5502	arm/arm64: KVM: vgic: Fix SGI writes to GICD_I{CS}PENDR0 Writes to GICD_ISPENDR0 and GICD_ICPENDR0 ignore all settings of the pending state for SGIs. Make sure the implementation handles this correctly. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-18 18:44:32 -07:00
Christoffer Dall	faa1b46c3e	arm/arm64: KVM: vgic: Improve handling of GICD_I{CS}PENDRn Writes to GICD_ISPENDRn and GICD_ICPENDRn are currently not handled correctly for level-triggered interrupts. The spec states that for level-triggered interrupts, writes to the GICD_ISPENDRn activate the output of a flip-flop which is in turn or'ed with the actual input interrupt signal. Correspondingly, writes to GICD_ICPENDRn simply deactivates the output of that flip-flop, but does not (of course) affect the external input signal. Reads from GICC_IAR will also deactivate the flip-flop output. This requires us to track the state of the level-input separately from the state in the flip-flop. We therefore introduce two new variables on the distributor struct to track these two states. Astute readers may notice that this is introducing more state than required (because an OR of the two states gives you the pending state), but the remaining vgic code uses the pending bitmap for optimized operations to figure out, at the end of the day, if an interrupt is pending or not on the distributor side. Refactoring the code to consider the two state variables all the places where we currently access the precomputed pending value, did not look pretty. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-18 18:44:31 -07:00
Christoffer Dall	cced50c928	arm/arm64: KVM: vgic: Clear queued flags on unqueue If we unqueue a level-triggered interrupt completely, and the LR does not stick around in the active state (and will therefore no longer generate a maintenance interrupt), then we should clear the queued flag so that the vgic can actually queue this level-triggered interrupt at a later time and deal with its pending state then. Note: This should actually be properly fixed to handle the active state on the distributor. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-18 18:44:31 -07:00
Christoffer Dall	dbf20f9d81	arm/arm64: KVM: Rename irq_active to irq_queued We have a special bitmap on the distributor struct to keep track of when level-triggered interrupts are queued on the list registers. This was named irq_active, which is confusing, because the active state of an interrupt as per the GIC spec is a different thing, not specifically related to edge-triggered/level-triggered configurations but rather indicates an interrupt which has been ack'ed but not yet eoi'ed. Rename the bitmap and the corresponding accessor functions to irq_queued to clarify what this is actually used for. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-18 18:44:30 -07:00
Christoffer Dall	227844f538	arm/arm64: KVM: Rename irq_state to irq_pending The irq_state field on the distributor struct is ambiguous in its meaning; the comment says it's the level of the input put, but that doesn't make much sense for edge-triggered interrupts. The code actually uses this state variable to check if the interrupt is in the pending state on the distributor so clarify the comment and rename the actual variable and accessor methods. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-09-18 18:44:30 -07:00
Christoffer Dall	a875dafcf9	Merge remote-tracking branch 'kvm/next' into queue Conflicts: arch/arm64/include/asm/kvm_host.h virt/kvm/arm/vgic.c	2014-09-18 18:15:32 -07:00
Will Deacon	80ce163972	KVM: VFIO: register kvm_device_ops dynamically Now that we have a dynamic means to register kvm_device_ops, use that for the VFIO kvm device, instead of relying on the static table. This is achieved by a module_init call to register the ops with KVM. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Alex Williamson <Alex.Williamson@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-17 13:10:10 +02:00
Cornelia Huck	84877d9333	KVM: s390: register flic ops dynamically Using the new kvm_register_device_ops() interface makes us get rid of an #ifdef in common code. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-17 13:10:09 +02:00
Will Deacon	c06a841bf3	KVM: ARM: vgic: register kvm_device_ops dynamically Now that we have a dynamic means to register kvm_device_ops, use that for the ARM VGIC, instead of relying on the static table. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-17 13:10:09 +02:00
Will Deacon	d60eacb070	KVM: device: add simple registration mechanism for kvm_device_ops kvm_ioctl_create_device currently has knowledge of all the device types and their associated ops. This is fairly inflexible when adding support for new in-kernel device emulations, so move what we currently have out into a table, which can support dynamic registration of ops by new drivers for virtual hardware. Cc: Alex Williamson <Alex.Williamson@redhat.com> Cc: Alex Graf <agraf@suse.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-17 13:10:08 +02:00
Ethan Zhao	ad0d217ca6	KVM: Use PCI device flag helper functions Use PCI device flag helper functions when assigning or releasing device. No functional change. Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-16 16:18:40 -06:00
Zhang Haoyu	184564efae	kvm: ioapic: conditionally delay irq delivery duringeoi broadcast Currently, we call ioapic_service() immediately when we find the irq is still active during eoi broadcast. But for real hardware, there's some delay between the EOI writing and irq delivery. If we do not emulate this behavior, and re-inject the interrupt immediately after the guest sends an EOI and re-enables interrupts, a guest might spend all its time in the ISR if it has a broken handler for a level-triggered interrupt. Such livelock actually happens with Windows guests when resuming from hibernation. As there's no way to recognize the broken handle from new raised ones, this patch delays an interrupt if 10.000 consecutive EOIs found that the interrupt was still high. The guest can then make a little forward progress, until a proper IRQ handler is set or until some detection routine in the guest (such as Linux's note_interrupt()) recognizes the situation. Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-16 14:44:48 +02:00
Ard Biesheuvel	85c8555ff0	KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn() Read-only memory ranges may be backed by the zero page, so avoid misidentifying it a a MMIO pfn. This fixes another issue I identified when testing QEMU+KVM_UEFI, where a read to an uninitialized emulated NOR flash brought in the zero page, but mapped as a read-write device region, because kvm_is_mmio_pfn() misidentifies it as a MMIO pfn due to its PG_reserved bit being set. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Fixes: `b88657674d` ("ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-14 16:26:05 +02:00
Eric Auger	0ba09511dd	KVM: EVENTFD: remove inclusion of irq.h No more needed. irq.h would be void on ARM. Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2014-09-11 11:31:19 +01:00
Christian Borntraeger	f2a2516088	KVM: remove redundant assignments in __kvm_set_memory_region __kvm_set_memory_region sets r to EINVAL very early. Doing it again is not necessary. The same is true later on, where r is assigned -ENOMEM twice. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-05 12:01:15 +02:00
Christian Borntraeger	a13f533b2f	KVM: remove redundant assigment of return value in kvm_dev_ioctl The first statement of kvm_dev_ioctl is long r = -EINVAL; No need to reassign the same value. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-05 12:01:15 +02:00
Christian Borntraeger	3465611318	KVM: remove redundant check of in_spin_loop The expression `vcpu->spin_loop.in_spin_loop' is always true, because it is evaluated only when the condition `!vcpu->spin_loop.in_spin_loop' is false. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-05 12:01:14 +02:00
David Matlack	ee3d1570b5	kvm: fix potentially corrupt mmio cache vcpu exits and memslot mutations can run concurrently as long as the vcpu does not aquire the slots mutex. Thus it is theoretically possible for memslots to change underneath a vcpu that is handling an exit. If we increment the memslot generation number again after synchronize_srcu_expedited(), vcpus can safely cache memslot generation without maintaining a single rcu_dereference through an entire vm exit. And much of the x86/kvm code does not maintain a single rcu_dereference of the current memslots during each exit. We can prevent the following case: vcpu (CPU 0) \| thread (CPU 1) --------------------------------------------+-------------------------- 1 vm exit \| 2 srcu_read_unlock(&kvm->srcu) \| 3 decide to cache something based on \| old memslots \| 4 \| change memslots \| (increments generation) 5 \| synchronize_srcu(&kvm->srcu); 6 retrieve generation # from new memslots \| 7 tag cache with new memslot generation \| 8 srcu_read_unlock(&kvm->srcu) \| ... \| <action based on cache occurs even \| though the caching decision was based \| on the old memslots> \| ... \| <action continues to occur until next \| memslot generation change, which may \| be never> \| \| By incrementing the generation after synchronizing with kvm->srcu readers, we ensure that the generation retrieved in (6) will become invalid soon after (8). Keeping the existing increment is not strictly necessary, but we do keep it and just move it for consistency from update_memslots to install_new_memslots. It invalidates old cached MMIOs immediately, instead of having to wait for the end of synchronize_srcu_expedited, which makes the code more clearly correct in case CPU 1 is preempted right after synchronize_srcu() returns. To avoid halving the generation space in SPTEs, always presume that the low bit of the generation is zero when reconstructing a generation number out of an SPTE. This effectively disables MMIO caching in SPTEs during the call to synchronize_srcu_expedited. Using the low bit this way is somewhat like a seqcount---where the protected thing is a cache, and instead of retrying we can simply punt if we observe the low bit to be 1. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-03 10:03:41 +02:00
Paolo Bonzini	00f034a12f	KVM: do not bias the generation number in kvm_current_mmio_generation The next patch will give a meaning (a la seqcount) to the low bit of the generation number. Ensure that it matches between kvm->memslots->generation and kvm_current_mmio_generation(). Cc: stable@vger.kernel.org Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-03 10:03:35 +02:00
Radim Krčmář	13a34e067e	KVM: remove garbage arg to *hardware_{en,dis}able In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-29 16:35:55 +02:00
Christoffer Dall	0f8a4de3e0	KVM: Unconditionally export KVM_CAP_READONLY_MEM The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_READONLY_MEM was guarded by #ifdef __KVM_HAVE_READONLY_MEM, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_READONLY_MEM and change the in-kernel conditional to rely on __KVM_HAVE_READONLY_MEM. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-29 13:47:04 +02:00
Will Deacon	de56fb1923	KVM: vgic: declare probe function pointer as const We extract the vgic probe function from the of_device_id data pointer, which is const. Kill the sparse warning by ensuring that the local function pointer is also marked as const. Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-08-27 22:49:45 +02:00
Will Deacon	1fa451bcc6	KVM: vgic: return int instead of bool when checking I/O ranges vgic_ioaddr_overlap claims to return a bool, but in reality it returns an int. Shut sparse up by fixing the type signature. Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-08-27 22:49:45 +02:00
Christoffer Dall	64d831269c	KVM: Introduce gfn_to_hva_memslot_prot To support read-only memory regions on arm and arm64, we have a need to resolve a gfn to an hva given a pointer to a memslot to avoid looping through the memslots twice and to reuse the hva error checking of gfn_to_hva_prot(), add a new gfn_to_hva_memslot_prot() function and refactor gfn_to_hva_prot() to use this function. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2014-08-27 22:46:08 +02:00
Radim Krčmář	e790d9ef64	KVM: add kvm_arch_sched_in Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-21 18:45:21 +02:00
Christian Borntraeger	7103f60de8	KVM: avoid unnecessary synchronize_rcu We dont have to wait for a grace period if there is no oldpid that we are going to free. putpid also checks for NULL, so this patch only fences synchronize_rcu. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-21 13:50:22 +02:00
Chen Gang	30d1e0e806	virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it As a generic function, deassign_guest_irq() assumes it can be called even if assign_guest_irq() is not be called successfully (which can be triggered by ioctl from user mode, indirectly). So for assign_guest_irq() failure process, need set 'dev->irq_source_id' to -1 after free 'dev->irq_source_id', or deassign_guest_irq() may free it again. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:12:28 +02:00
Michael S. Tsirkin	350b8bdd68	kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601) The third parameter of kvm_iommu_put_pages is wrong, It should be 'gfn - slot->base_gfn'. By making gfn very large, malicious guest or userspace can cause kvm to go to this error path, and subsequently to pass a huge value as size. Alternatively if gfn is small, then pages would be pinned but never unpinned, causing host memory leak and local DOS. Passing a reasonable but large value could be the most dangerous case, because it would unpin a page that should have stayed pinned, and thus allow the device to DMA into arbitrary memory. However, this cannot happen because of the condition that can trigger the error: - out of memory (where you can't allocate even a single page) should not be possible for the attacker to trigger - when exceeding the iommu's address space, guest pages after gfn will also exceed the iommu's address space, and inside kvm_iommu_put_pages() the iommu_iova_to_phys() will fail. The page thus would not be unpinned at all. Reported-by: Jack Morgenstein <jackm@mellanox.com> Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:04:45 +02:00
Paolo Bonzini	c77dcacb39	KVM: Move more code under CONFIG_HAVE_KVM_IRQFD Commits `e4d57e1ee1` (KVM: Move irq notifier implementation into eventfd.c, 2014-06-30) included the irq notifier code unconditionally in eventfd.c, while it was under CONFIG_HAVE_KVM_IRQCHIP before. Similarly, commit `297e21053a` (KVM: Give IRQFD its own separate enabling Kconfig option, 2014-06-30) moved code from CONFIG_HAVE_IRQ_ROUTING to CONFIG_HAVE_KVM_IRQFD but forgot to move the pieces that used to be under CONFIG_HAVE_KVM_IRQCHIP. Together, this broke compilation without CONFIG_KVM_XICS. Fix by adding or changing the #ifdefs so that they point at CONFIG_HAVE_KVM_IRQFD. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-06 14:24:47 +02:00
Paul Mackerras	297e21053a	KVM: Give IRQFD its own separate enabling Kconfig option Currently, the IRQFD code is conditional on CONFIG_HAVE_KVM_IRQ_ROUTING. So that we can have the IRQFD code compiled in without having the IRQ routing code, this creates a new CONFIG_HAVE_KVM_IRQFD, makes the IRQFD code conditional on it instead of CONFIG_HAVE_KVM_IRQ_ROUTING, and makes all the platforms that currently select HAVE_KVM_IRQ_ROUTING also select HAVE_KVM_IRQFD. Signed-off-by: Paul Mackerras <paulus@samba.org> Tested-by: Eric Auger <eric.auger@linaro.org> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-05 14:26:28 +02:00
Paul Mackerras	e4d57e1ee1	KVM: Move irq notifier implementation into eventfd.c This moves the functions kvm_irq_has_notifier(), kvm_notify_acked_irq(), kvm_register_irq_ack_notifier() and kvm_unregister_irq_ack_notifier() from irqchip.c to eventfd.c. The reason for doing this is that those functions are used in connection with IRQFDs, which are implemented in eventfd.c. In future we will want to use IRQFDs on platforms that don't implement the GSI routing implemented in irqchip.c, so we won't be compiling in irqchip.c, but we still need the irq notifiers. The implementation is unchanged. Signed-off-by: Paul Mackerras <paulus@samba.org> Tested-by: Eric Auger <eric.auger@linaro.org> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-05 14:26:24 +02:00
Paul Mackerras	9957c86d65	KVM: Move all accesses to kvm::irq_routing into irqchip.c Now that struct _irqfd does not keep a reference to storage pointed to by the irq_routing field of struct kvm, we can move the statement that updates it out from under the irqfds.lock and put it in kvm_set_irq_routing() instead. That means we then have to take a srcu_read_lock on kvm->irq_srcu around the irqfd_update call in kvm_irqfd_assign(), since holding the kvm->irqfds.lock no longer ensures that that the routing can't change. Combined with changing kvm_irq_map_gsi() and kvm_irq_map_chip_pin() to take a struct kvm * argument instead of the pointer to the routing table, this allows us to to move all references to kvm->irq_routing into irqchip.c. That in turn allows us to move the definition of the kvm_irq_routing_table struct into irqchip.c as well. Signed-off-by: Paul Mackerras <paulus@samba.org> Tested-by: Eric Auger <eric.auger@linaro.org> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-05 14:26:20 +02:00

... 3 4 5 6 7 ...

1127 Commits