/*
 * Copyright (C) 2012 - Virtual Open Systems and Columbia University
 * Author: Christoffer Dall <c.dall@virtualopensystems.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License, version 2, as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 */

#ifndef __ARM_KVM_MMU_H__
#define __ARM_KVM_MMU_H__

ARM: KVM: switch to a dual-step HYP init code

Our HYP init code suffers from two major design issues:
- it cannot support CPU hotplug, as we tear down the idmap very early
- it cannot perform a TLB invalidation when switching from init to
  runtime mappings, as pages are manipulated from PL1 exclusively

The hotplug problem mandates that we keep two sets of page tables
(boot and runtime). The TLB problem mandates that we're able to
transition from one PGD to another while in HYP, invalidating the TLBs
in the process.

To be able to do this, we need to share a page between the two page
tables. A page that will have the same VA in both configurations. All we
need is a VA that has the following properties:
- This VA can't be used to represent a kernel mapping.
- This VA will not conflict with the physical address of the kernel text

The vectors page seems to satisfy this requirement:
- The kernel never maps anything else there
- The kernel text being copied at the beginning of the physical memory,
  it is unlikely to use the last 64kB (I doubt we'll ever support KVM
  on a system with something like 4MB of RAM, but patches are very
  welcome).

Let's call this VA the trampoline VA.

Now, we map our init page at 3 locations:
- idmap in the boot pgd
- trampoline VA in the boot pgd
- trampoline VA in the runtime pgd

The init scenario is now the following:
- We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
  runtime stack, runtime vectors
- Enable the MMU with the boot pgd
- Jump to a target into the trampoline page (remember, this is the same
  physical page!)
- Now switch to the runtime pgd (same VA, and still the same physical
  page!)
- Invalidate TLBs
- Set stack and vectors
- Profit! (or eret, if you only care about the code).

Note that we keep the boot mapping permanently (it is not strictly an
idmap anymore) to allow for CPU hotplug in later patches.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-13 01:12:06 +07:00
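
The invariant behind the trampoline VA can be illustrated with a toy, userspace-only model. This is a sketch under stated assumptions, not the kernel's HYP init code: `toy_pgd`, `toy_map`, `toy_translate`, `toy_dual_step_init` and the addresses passed to them are hypothetical names and values invented for the illustration. Each page table is modelled as a short VA-to-PA list, and the walk checks that every instruction fetch resolves to the init page before and after the switch from the boot pgd to the runtime pgd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy page table: a short list of page-granular VA -> PA pairs. */
#define TOY_MAX_ENTRIES 4

struct toy_pgd {
	uint32_t va[TOY_MAX_ENTRIES];
	uint32_t pa[TOY_MAX_ENTRIES];
	size_t nr;
};

/* Add one mapping; returns false if the toy table is full. */
static bool toy_map(struct toy_pgd *pgd, uint32_t va, uint32_t pa)
{
	if (pgd->nr >= TOY_MAX_ENTRIES)
		return false;
	pgd->va[pgd->nr] = va;
	pgd->pa[pgd->nr] = pa;
	pgd->nr++;
	return true;
}

/* Look up a VA; returns false when the VA is not mapped in this table. */
static bool toy_translate(const struct toy_pgd *pgd, uint32_t va, uint32_t *pa)
{
	for (size_t i = 0; i < pgd->nr; i++) {
		if (pgd->va[i] == va) {
			*pa = pgd->pa[i];
			return true;
		}
	}
	return false;
}

/*
 * Walk the init scenario: start at the idmap under the boot pgd, jump to
 * the trampoline VA, then switch to the runtime pgd. Returns true only if
 * every fetch resolves to the init page and the idmap stays out of the
 * runtime pgd.
 */
static bool toy_dual_step_init(uint32_t init_pa, uint32_t trampoline_va)
{
	struct toy_pgd boot = { .nr = 0 };
	struct toy_pgd runtime = { .nr = 0 };
	uint32_t pa;

	/* The init page is mapped at three locations. */
	if (!toy_map(&boot, init_pa, init_pa) ||	/* idmap, boot pgd */
	    !toy_map(&boot, trampoline_va, init_pa) ||	/* trampoline, boot pgd */
	    !toy_map(&runtime, trampoline_va, init_pa))	/* trampoline, runtime pgd */
		return false;

	/* MMU enabled with the boot pgd: still executing from the idmap. */
	if (!toy_translate(&boot, init_pa, &pa) || pa != init_pa)
		return false;

	/* Jump to the trampoline VA: same physical page. */
	if (!toy_translate(&boot, trampoline_va, &pa) || pa != init_pa)
		return false;

	/* Switch to the runtime pgd: same VA, still the same physical page. */
	if (!toy_translate(&runtime, trampoline_va, &pa) || pa != init_pa)
		return false;

	/* The trampoline VA must not collide with the idmap address. */
	return !toy_translate(&runtime, init_pa, &pa);
}
```

With an init page at a low physical address and the trampoline at the vectors page, the walk succeeds. If the two addresses coincide, the model reports failure, which mirrors the requirement that the trampoline VA must not conflict with the physical address of the kernel text.
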

#include <asm/memory.h>
#include <asm/page.h>

/*
 * We directly use the kernel VA for the HYP, as we can directly share
 * the mapping (HTTBR "covers" TTBR1).
 */
#define kern_hyp_va(kva)	(kva)

/*
 * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
 */
#define KVM_MMU_CACHE_MIN_PAGES	2

#ifndef __ASSEMBLY__

#include <linux/highmem.h>
#include <asm/cacheflush.h>
#include <asm/pgalloc.h>
#include <asm/stage2_pgtable.h>

int create_hyp_mappings(void *from, void *to, pgprot_t prot);
int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
void free_hyp_pgds(void);

void stage2_unmap_vm(struct kvm *kvm);
int kvm_alloc_stage2_pgd(struct kvm *kvm);
void kvm_free_stage2_pgd(struct kvm *kvm);
int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
			  phys_addr_t pa, unsigned long size, bool writable);

int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);

void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);

phys_addr_t kvm_mmu_get_httbr(void);
phys_addr_t kvm_get_idmap_vector(void);
int kvm_mmu_init(void);
void kvm_clear_hyp_idmap(void);

static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
{
	*pmd = new_pmd;
arm/kvm: excise redundant cache maintenance

When modifying Stage-2 page tables, we perform cache maintenance to
account for non-coherent page table walks. However, this is unnecessary,
as page table walks are guaranteed to be coherent in the presence of the
virtualization extensions.

Per ARM DDI 0406C.c, section B1.7 ("The Virtualization Extensions"), the
virtualization extensions mandate the multiprocessing extensions.

Per ARM DDI 0406C.c, section B3.10.1 ("General TLB maintenance
requirements"), as described in the sub-section titled "TLB maintenance
operations and the memory order model", this maintenance is not required
in the presence of the multiprocessing extensions.

Hence, we need not perform this cache maintenance when modifying Stage-2
entries.

This patch removes the logic for performing the redundant maintenance.

To ensure visibility and ordering of updates, a dsb(ishst) that was
otherwise implicit in the maintenance is folded into kvm_set_pmd() and
kvm_set_pte().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-08-30 23:05:55 +07:00
	dsb(ishst);
}

static inline void kvm_set_pte(pte_t *pte, pte_t new_pte)
{
	*pte = new_pte;
	dsb(ishst);
}

static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
	pte_val(pte) |= L_PTE_S2_RDWR;
	return pte;
}

static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
{
	pmd_val(pmd) |= L_PMD_S2_RDWR;
	return pmd;
}

static inline void kvm_set_s2pte_readonly(pte_t *pte)
{
	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
}

static inline bool kvm_s2pte_readonly(pte_t *pte)
{
	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
}

static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
{
	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
}

static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
{
	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
}
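
The stage-2 helpers above treat the access permission as a multi-bit field: the read/write mask doubles as the field mask, so making an entry read-only means clearing the whole field and then setting the read-only pattern. The following standalone sketch shows that arithmetic on plain integers. The `TOY_S2_*` constants and `toy_s2_*` helpers are hypothetical, and the bit positions (a two-bit field at bits 7:6) are an assumption made for illustration; the real `L_PTE_S2_*` and `L_PMD_S2_*` definitions live in the page table headers, which are not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings, for illustration only: a two-bit access field. */
#define TOY_S2_RDONLY	(UINT64_C(1) << 6)
#define TOY_S2_RDWR	(UINT64_C(3) << 6)

/* Grant write access by setting every bit of the field. */
static uint64_t toy_s2_mkwrite(uint64_t desc)
{
	return desc | TOY_S2_RDWR;
}

/* Clear the whole field first, then set the read-only pattern. */
static uint64_t toy_s2_mkreadonly(uint64_t desc)
{
	return (desc & ~TOY_S2_RDWR) | TOY_S2_RDONLY;
}

/* Compare the whole field, not a single bit, against the pattern. */
static bool toy_s2_readonly(uint64_t desc)
{
	return (desc & TOY_S2_RDWR) == TOY_S2_RDONLY;
}
```

Testing the full field matters because the read-only pattern is a subset of the read/write pattern: a plain `desc & TOY_S2_RDONLY` check would also report writable entries as read-only.
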

static inline bool kvm_page_empty(void *ptr)
{
	struct page *ptr_page = virt_to_page(ptr);
	return page_count(ptr_page) == 1;
}

#define kvm_pte_table_empty(kvm, ptep) kvm_page_empty(ptep)
#define kvm_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
#define kvm_pud_table_empty(kvm, pudp) false

#define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
#define hyp_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
#define hyp_pud_table_empty(pudp) false

struct kvm;

#define kvm_flush_dcache_to_poc(a,l)	__cpuc_flush_dcache_area((a), (l))

static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
}
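
The `0b101` mask in `vcpu_has_cache_enabled()` selects SCTLR bit 0 (M, MMU enable) and bit 2 (C, data and unified cache enable), and the guest is only considered to have caches on when both are set. A minimal sketch of the same test on a raw register value follows; `TOY_SCTLR_M`, `TOY_SCTLR_C` and `toy_cache_enabled` are hypothetical names, and the hexadecimal form is used because the `0b` prefix is a compiler extension in older C standards.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SCTLR_M	(UINT32_C(1) << 0)	/* MMU enable */
#define TOY_SCTLR_C	(UINT32_C(1) << 2)	/* data/unified cache enable */

/* True only when both the MMU and the data cache are enabled. */
static bool toy_cache_enabled(uint32_t sctlr)
{
	const uint32_t mask = TOY_SCTLR_M | TOY_SCTLR_C;	/* 0x5, i.e. 0b101 */

	return (sctlr & mask) == mask;
}
```
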
kvm: rename pfn_t to kvm_pfn_t

To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.

The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.

Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.

This patch (of 18):

The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 07:56:11 +07:00
static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
					       kvm_pfn_t pfn,
					       unsigned long size)
{
	/*
	 * If we are going to insert an instruction page and the icache is
	 * either VIPT or PIPT, there is a potential problem where the host
	 * (or another VM) may have used the same page as this guest, and we
	 * read incorrect data from the icache. If we're using a PIPT cache,
	 * we can invalidate just that page, but if we are using a VIPT cache
	 * we need to invalidate the entire icache - damn shame - as written
	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
	 *
	 * VIVT caches are tagged using both the ASID and the VMID and don't
	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault

When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.

That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.

At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.

Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-06 04:13:24 +07:00
	 *
	 * We need to do this through a kernel mapping (using the
	 * user-space mapping has proved to be the wrong
	 * solution). For that, we need to kmap one page at a time,
	 * and iterate over the range.
	 */

	VM_BUG_ON(size & ~PAGE_MASK);

	while (size) {
		void *va = kmap_atomic_pfn(pfn);

		kvm_flush_dcache_to_poc(va, PAGE_SIZE);

		if (icache_is_pipt())
			__cpuc_coherent_user_range((unsigned long)va,
						   (unsigned long)va + PAGE_SIZE);

		size -= PAGE_SIZE;
		pfn++;

		kunmap_atomic(va);
	}

	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
		/* any kind of VIPT cache */
		__flush_icache_all();
	}
}
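
The loop in `__coherent_cache_guest_page()` follows a simple shape: insist that the size is a whole number of pages, then handle exactly one page per iteration while advancing the frame number. The sketch below reproduces only that control flow in portable C. `TOY_PAGE_SIZE`, `toy_for_each_page`, `toy_page_fn` and `toy_record_last` are hypothetical names, the 4 KiB page size is an assumption, and the callback merely stands in for the map, flush and unmap steps, which need kernel facilities that a standalone example cannot use.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE	4096UL
#define TOY_PAGE_MASK	(~(TOY_PAGE_SIZE - 1))

/* Per-page work; a stand-in for the map, flush and unmap sequence. */
typedef void (*toy_page_fn)(unsigned long pfn, void *ctx);

/*
 * Visit every page frame in a range of 'size' bytes starting at 'pfn'.
 * Returns the number of pages visited, or -1 if 'size' is not a multiple
 * of the page size (the kernel code asserts this instead of returning).
 */
static long toy_for_each_page(unsigned long pfn, unsigned long size,
			      toy_page_fn fn, void *ctx)
{
	long visited = 0;

	if (size & ~TOY_PAGE_MASK)
		return -1;

	while (size) {
		fn(pfn, ctx);

		size -= TOY_PAGE_SIZE;
		pfn++;
		visited++;
	}

	return visited;
}

/* Example callback: remember the last page frame number seen. */
static void toy_record_last(unsigned long pfn, void *ctx)
{
	*(unsigned long *)ctx = pfn;
}
```

Working one page at a time is what lets the kernel version rely on short-lived atomic mappings: only a single page needs a kernel address at any moment, regardless of how large the range is.
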

static inline void __kvm_flush_dcache_pte(pte_t pte)
{
	void *va = kmap_atomic(pte_page(pte));

	kvm_flush_dcache_to_poc(va, PAGE_SIZE);

	kunmap_atomic(va);
}

static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
{
	unsigned long size = PMD_SIZE;
	kvm_pfn_t pfn = pmd_pfn(pmd);

	while (size) {
		void *va = kmap_atomic_pfn(pfn);

		kvm_flush_dcache_to_poc(va, PAGE_SIZE);

		pfn++;
		size -= PAGE_SIZE;

		kunmap_atomic(va);
	}
}

static inline void __kvm_flush_dcache_pud(pud_t pud)
{
}

#define kvm_virt_to_phys(x)		virt_to_idmap((unsigned long)(x))

void kvm_set_way_flush(struct kvm_vcpu *vcpu);
void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);

static inline bool __kvm_cpu_uses_extended_idmap(void)
{
	return false;
}

static inline void __kvm_extend_hypmap(pgd_t *boot_hyp_pgd,
				       pgd_t *hyp_pgd,
				       pgd_t *merged_hyp_pgd,
				       unsigned long hyp_idmap_start) { }

static inline unsigned int kvm_get_vmid_bits(void)
{
	return 8;
}

#endif	/* !__ASSEMBLY__ */

#endif /* __ARM_KVM_MMU_H__ */