/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Copyright (C) 2012 - Virtual Open Systems and Columbia University
 * Author: Christoffer Dall <c.dall@virtualopensystems.com>
 */

#ifndef __ARM_KVM_MMU_H__
#define __ARM_KVM_MMU_H__

ARM: KVM: switch to a dual-step HYP init code

Our HYP init code suffers from two major design issues:
- it cannot support CPU hotplug, as we tear down the idmap very early
- it cannot perform a TLB invalidation when switching from init to
  runtime mappings, as pages are manipulated from PL1 exclusively

The hotplug problem mandates that we keep two sets of page tables
(boot and runtime). The TLB problem mandates that we're able to
transition from one PGD to another while in HYP, invalidating the TLBs
in the process.

To be able to do this, we need to share a page between the two page
tables. A page that will have the same VA in both configurations. All we
need is a VA that has the following properties:
- This VA can't be used to represent a kernel mapping.
- This VA will not conflict with the physical address of the kernel text

The vectors page seems to satisfy this requirement:
- The kernel never maps anything else there
- The kernel text being copied at the beginning of the physical memory,
  it is unlikely to use the last 64kB (I doubt we'll ever support KVM
  on a system with something like 4MB of RAM, but patches are very
  welcome).

Let's call this VA the trampoline VA.

Now, we map our init page at 3 locations:
- idmap in the boot pgd
- trampoline VA in the boot pgd
- trampoline VA in the runtime pgd

The init scenario is now the following:
- We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
  runtime stack, runtime vectors
- Enable the MMU with the boot pgd
- Jump to a target into the trampoline page (remember, this is the same
  physical page!)
- Now switch to the runtime pgd (same VA, and still the same physical
  page!)
- Invalidate TLBs
- Set stack and vectors
- Profit! (or eret, if you only care about the code).

Note that we keep the boot mapping permanently (it is not strictly an
idmap anymore) to allow for CPU hotplug in later patches.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
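
The three mappings of the init page described in this commit message can be modelled in plain C. The sketch below is only a toy: the `toy_*` names, the physical address of the init page and the trampoline VA value are all assumptions made up for illustration, and a "pgd" is reduced to a short list of page mappings. It shows why the pgd switch is safe: the trampoline VA resolves to the same physical page under both page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: a "pgd" is just a short list of VA -> PA page mappings. */
struct toy_mapping { uint32_t va; uint32_t pa; };
struct toy_pgd { const struct toy_mapping *maps; size_t count; };

#define TOY_INIT_PAGE_PA	0x80008000u	/* hypothetical PA of the HYP init page */
#define TOY_TRAMPOLINE_VA	0xffff0000u	/* hypothetical VA of the vectors page */
#define TOY_NO_MAPPING		0xffffffffu	/* returned when a VA is not mapped */

/* Boot pgd: identity map of the init page plus the trampoline alias. */
static const struct toy_mapping toy_boot_maps[] = {
	{ TOY_INIT_PAGE_PA,  TOY_INIT_PAGE_PA },	/* idmap: VA == PA */
	{ TOY_TRAMPOLINE_VA, TOY_INIT_PAGE_PA },
};

/* Runtime pgd: only the trampoline alias of the init page. */
static const struct toy_mapping toy_runtime_maps[] = {
	{ TOY_TRAMPOLINE_VA, TOY_INIT_PAGE_PA },
};

static const struct toy_pgd toy_boot_pgd = { toy_boot_maps, 2 };
static const struct toy_pgd toy_runtime_pgd = { toy_runtime_maps, 1 };

/* Translate a page-aligned VA, or return TOY_NO_MAPPING. */
static uint32_t toy_translate(const struct toy_pgd *pgd, uint32_t va)
{
	for (size_t i = 0; i < pgd->count; i++)
		if (pgd->maps[i].va == va)
			return pgd->maps[i].pa;
	return TOY_NO_MAPPING;
}

/* Walk the init scenario and check that we never leave the init page. */
static void toy_check_init_sequence(void)
{
	/* Enable the MMU with the boot pgd: we run from the idmap. */
	uint32_t pa = toy_translate(&toy_boot_pgd, TOY_INIT_PAGE_PA);
	assert(pa == TOY_INIT_PAGE_PA);

	/* Jump to the trampoline VA: same physical page, boot pgd. */
	assert(toy_translate(&toy_boot_pgd, TOY_TRAMPOLINE_VA) == pa);

	/* Switch to the runtime pgd: same VA, still the same page. */
	assert(toy_translate(&toy_runtime_pgd, TOY_TRAMPOLINE_VA) == pa);
}
```

Calling `toy_check_init_sequence()` passes silently. Note that the runtime pgd in this model has no identity mapping at all, which is why the jump to the trampoline VA has to happen before the switch.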

#include <asm/memory.h>
#include <asm/page.h>

/*
 * We directly use the kernel VA for the HYP, as we can directly share
 * the mapping (HTTBR "covers" TTBR1).
 */
#define kern_hyp_va(kva)	(kva)

/* Contrary to arm64, there is no need to generate a PC-relative address */
#define hyp_symbol_addr(s)						\
	({								\
		typeof(s) *addr = &(s);					\
		addr;							\
	})

#ifndef __ASSEMBLY__

#include <linux/highmem.h>
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_hyp.h>
#include <asm/pgalloc.h>
#include <asm/stage2_pgtable.h>

/* Ensure compatibility with arm64 */
#define VA_BITS			32

#define kvm_phys_shift(kvm)		KVM_PHYS_SHIFT
#define kvm_phys_size(kvm)		(1ULL << kvm_phys_shift(kvm))
#define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - 1ULL)
#define kvm_vttbr_baddr_mask(kvm)	VTTBR_BADDR_MASK

#define stage2_pgd_size(kvm)	(PTRS_PER_S2_PGD * sizeof(pgd_t))
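
The shift/size/mask relationship behind the `kvm_phys_*` macros can be tried out in user space. This is a minimal sketch, not kernel code: the `toy_*` helpers are hypothetical, and the shift of 40 bits is an assumed value chosen for illustration rather than taken from this header.

```c
#include <stdint.h>

#define TOY_PHYS_SHIFT	40	/* assumed guest physical address width, in bits */

/* Size of the guest physical address space: 2^shift bytes. */
static inline uint64_t toy_phys_size(unsigned int shift)
{
	return 1ULL << shift;	/* 1ULL keeps the shift in 64-bit arithmetic */
}

/* Mask selecting the valid address bits: all ones below bit 'shift'. */
static inline uint64_t toy_phys_mask(unsigned int shift)
{
	return toy_phys_size(shift) - 1ULL;
}

/* Return 1 if 'ipa' fits in the guest physical address space, else 0. */
static inline int toy_ipa_in_range(uint64_t ipa, unsigned int shift)
{
	return (ipa & ~toy_phys_mask(shift)) == 0;
}
```

The `1ULL` matters on a 32-bit target: shifting a plain `1` by 40 would overflow a 32-bit `int`, which is why the macros above spell out the 64-bit constant.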

int create_hyp_mappings(void *from, void *to, pgprot_t prot);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
			   void __iomem **kaddr,
			   void __iomem **haddr);
int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
			     void **haddr);
void free_hyp_pgds(void);

void stage2_unmap_vm(struct kvm *kvm);
int kvm_alloc_stage2_pgd(struct kvm *kvm);
void kvm_free_stage2_pgd(struct kvm *kvm);
int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
			  phys_addr_t pa, unsigned long size, bool writable);

int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);

void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);

phys_addr_t kvm_mmu_get_httbr(void);
phys_addr_t kvm_get_idmap_vector(void);
int kvm_mmu_init(void);
void kvm_clear_hyp_idmap(void);

#define kvm_mk_pmd(ptep)	__pmd(__pa(ptep) | PMD_TYPE_TABLE)
#define kvm_mk_pud(pmdp)	__pud(__pa(pmdp) | PMD_TYPE_TABLE)
#define kvm_mk_pgd(pudp)	({ BUILD_BUG(); 0; })

#define kvm_pfn_pte(pfn, prot)	pfn_pte(pfn, prot)
#define kvm_pfn_pmd(pfn, prot)	pfn_pmd(pfn, prot)
#define kvm_pfn_pud(pfn, prot)	(__pud(0))

#define kvm_pud_pfn(pud)	({ WARN_ON(1); 0; })

#define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
/* No support for pud hugepages */
#define kvm_pud_mkhuge(pud)	({ WARN_ON(1); pud; })

/*
 * The following kvm_*pud*() functions are provided strictly to allow
 * sharing code with arm64. They should never be called in practice.
 */
static inline void kvm_set_s2pud_readonly(pud_t *pud)
{
	WARN_ON(1);
}

static inline bool kvm_s2pud_readonly(pud_t *pud)
{
	WARN_ON(1);
	return false;
}

static inline void kvm_set_pud(pud_t *pud, pud_t new_pud)
{
	WARN_ON(1);
}

static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
{
	WARN_ON(1);
	return pud;
}

static inline pud_t kvm_s2pud_mkexec(pud_t pud)
{
	WARN_ON(1);
	return pud;
}

static inline bool kvm_s2pud_exec(pud_t *pud)
{
	WARN_ON(1);
	return false;
}

static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
{
	BUG();
	return pud;
}

static inline bool kvm_s2pud_young(pud_t pud)
{
	WARN_ON(1);
	return false;
}

static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
{
	pte_val(pte) |= L_PTE_S2_RDWR;
	return pte;
}

static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
{
	pmd_val(pmd) |= L_PMD_S2_RDWR;
	return pmd;
}

static inline pte_t kvm_s2pte_mkexec(pte_t pte)
{
	pte_val(pte) &= ~L_PTE_XN;
	return pte;
}

static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
{
	pmd_val(pmd) &= ~PMD_SECT_XN;
	return pmd;
}

static inline void kvm_set_s2pte_readonly(pte_t *pte)
{
	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
}

static inline bool kvm_s2pte_readonly(pte_t *pte)
{
	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
}

static inline bool kvm_s2pte_exec(pte_t *pte)
{
	return !(pte_val(*pte) & L_PTE_XN);
}

static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
{
	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
}

static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
{
	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
}

static inline bool kvm_s2pmd_exec(pmd_t *pmd)
{
	return !(pmd_val(*pmd) & PMD_SECT_XN);
}
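
The read-only helpers above clear the whole RDWR field before setting RDONLY, and compare the masked field rather than testing a single bit. A small user-space sketch of that pattern follows. The `TOY_S2_*` bit positions are an assumption for illustration (a two-bit field in which the read-only encoding is a subset of the read-write one); the real `L_PTE_S2_*` values are defined elsewhere in the kernel and are not shown in this header.

```c
#include <stdint.h>

/* Assumed encoding of a two-bit stage-2 access permission field. */
#define TOY_S2_RDONLY	(UINT64_C(1) << 6)	/* read-only: low bit of the field */
#define TOY_S2_RDWR	(UINT64_C(3) << 6)	/* read-write: both bits of the field */

/* Grant write access by setting both bits of the field. */
static inline uint64_t toy_s2_mkwrite(uint64_t pte)
{
	return pte | TOY_S2_RDWR;
}

/* Downgrade to read-only: clear the whole field, then set RDONLY. */
static inline uint64_t toy_s2_set_readonly(uint64_t pte)
{
	return (pte & ~TOY_S2_RDWR) | TOY_S2_RDONLY;
}

/* Compare the whole field; testing only the RDONLY bit would misfire. */
static inline int toy_s2_is_readonly(uint64_t pte)
{
	return (pte & TOY_S2_RDWR) == TOY_S2_RDONLY;
}
```

Under this assumed encoding a writable entry also has the RDONLY bit set, so a test such as `pte & TOY_S2_RDONLY` would report writable entries as read-only. Masking with the full field and comparing avoids that.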

static inline bool kvm_page_empty(void *ptr)
{
	struct page *ptr_page = virt_to_page(ptr);
	return page_count(ptr_page) == 1;
}

#define kvm_pte_table_empty(kvm, ptep) kvm_page_empty(ptep)
#define kvm_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
#define kvm_pud_table_empty(kvm, pudp) false

#define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
#define hyp_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
#define hyp_pud_table_empty(pudp) false

struct kvm;

#define kvm_flush_dcache_to_poc(a,l)	__cpuc_flush_dcache_area((a), (l))

static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
}

arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault

When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.

That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.

At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.

Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
{
	/*
	 * Clean the dcache to the Point of Coherency.
	 *
	 * We need to do this through a kernel mapping (using the
	 * user-space mapping has proved to be the wrong
	 * solution). For that, we need to kmap one page at a time,
	 * and iterate over the range.
	 */

	VM_BUG_ON(size & ~PAGE_MASK);

	while (size) {
		void *va = kmap_atomic_pfn(pfn);

		kvm_flush_dcache_to_poc(va, PAGE_SIZE);

		size -= PAGE_SIZE;
		pfn++;

		kunmap_atomic(va);
	}
}
2015-01-06 04:13:24 +07:00
|
|
|
|
2017-10-23 23:11:22 +07:00
|
|
|
static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
|
2017-10-23 23:11:15 +07:00
|
|
|
unsigned long size)
|
2012-10-15 17:27:37 +07:00
|
|
|
{
|
2017-10-23 23:11:17 +07:00
|
|
|
u32 iclsz;
|
|
|
|
|
2012-10-15 17:27:37 +07:00
|
|
|
/*
|
|
|
|
* If we are going to insert an instruction page and the icache is
|
|
|
|
* either VIPT or PIPT, there is a potential problem where the host
|
|
|
|
* (or another VM) may have used the same page as this guest, and we
|
|
|
|
* read incorrect data from the icache. If we're using a PIPT cache,
|
|
|
|
* we can invalidate just that page, but if we are using a VIPT cache
|
|
|
|
* we need to invalidate the entire icache - damn shame - as written
|
|
|
|
* in the ARM ARM (DDI 0406C.b - Page B3-1393).
|
|
|
|
*
|
|
|
|
* VIVT caches are tagged using both the ASID and the VMID and doesn't
|
|
|
|
* need any kind of flushing (DDI 0406C.b - Page B3-1392).
|
|
|
|
*/
|
arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault
When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.
That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.
At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.
Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-06 04:13:24 +07:00
|
|
|
|
2015-02-08 04:21:20 +07:00
|
|
|
VM_BUG_ON(size & ~PAGE_MASK);
|
arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault
When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.
That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.
At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.
Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-06 04:13:24 +07:00
|
|
|
|
2017-10-23 23:11:15 +07:00
|
|
|
if (icache_is_vivt_asid_tagged())
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!icache_is_pipt()) {
|
2012-10-15 17:27:37 +07:00
|
|
|
/* any kind of VIPT cache */
|
|
|
|
__flush_icache_all();
|
2017-10-23 23:11:15 +07:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2017-10-23 23:11:17 +07:00
|
|
|
/*
|
|
|
|
* CTR IminLine contains Log2 of the number of words in the
|
|
|
|
* cache line, so we can get the number of words as
|
|
|
|
* 2 << (IminLine - 1). To get the number of bytes, we
|
|
|
|
* multiply by 4 (the number of bytes in a 32-bit word), and
|
|
|
|
* get 4 << (IminLine).
|
|
|
|
*/
|
|
|
|
iclsz = 4 << (read_cpuid(CPUID_CACHETYPE) & 0xf);
|
|
|
|
|
arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault
When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.
That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.
At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.
Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-06 04:13:24 +07:00
|
|
|
while (size) {
|
|
|
|
void *va = kmap_atomic_pfn(pfn);
|
2017-10-23 23:11:17 +07:00
|
|
|
void *end = va + PAGE_SIZE;
|
|
|
|
void *addr = va;
|
arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault
When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.
That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.
At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.
Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-06 04:13:24 +07:00
|
|
|
|
2017-10-23 23:11:17 +07:00
|
|
|
do {
|
|
|
|
write_sysreg(addr, ICIMVAU);
|
|
|
|
addr += iclsz;
|
|
|
|
} while (addr < end);
|
arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault
When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.
That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.
At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.
Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-06 04:13:24 +07:00
|
|
|
|
2017-10-23 23:11:17 +07:00
|
|
|
dsb(ishst);
|
|
|
|
isb();
|
2015-01-06 04:13:24 +07:00
		size -= PAGE_SIZE;
		pfn++;

		kunmap_atomic(va);
	}
2017-10-23 23:11:17 +07:00
	/* Check if we need to invalidate the BTB */
	if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 28) != 4) {
		write_sysreg(0, BPIALLIS);
		dsb(ishst);
		isb();
2012-10-15 17:27:37 +07:00
	}
}
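The BTB check above tests the top four bits of ID_MMFR1 and skips the branch predictor invalidation only when that field reads 4. A minimal userspace sketch of that test, assuming the register value is handed in as a plain integer; the helper name `btb_needs_invalidation` is hypothetical and not a kernel API:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: mirrors the `(mmfr1 >> 28) != 4` test in the
 * code above. It takes the raw ID_MMFR1 value as an argument instead
 * of reading the register.
 */
static bool btb_needs_invalidation(uint32_t id_mmfr1)
{
	/* Bits [31:28] of ID_MMFR1 hold the branch predictor field. */
	uint32_t brpred = id_mmfr1 >> 28;

	/* The code above treats the value 4 as "no flushing required". */
	return brpred != 4;
}
```

For example, `btb_needs_invalidation(0x40000000u)` is false, while any other value in the top nibble yields true.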
2014-12-19 23:48:06 +07:00
static inline void __kvm_flush_dcache_pte(pte_t pte)
{
	void *va = kmap_atomic(pte_page(pte));

	kvm_flush_dcache_to_poc(va, PAGE_SIZE);

	kunmap_atomic(va);
}

static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
{
	unsigned long size = PMD_SIZE;
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent memory may exhaust available DRAM, introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 07:56:11 +07:00
	kvm_pfn_t pfn = pmd_pfn(pmd);
2014-12-19 23:48:06 +07:00
	while (size) {
		void *va = kmap_atomic_pfn(pfn);

		kvm_flush_dcache_to_poc(va, PAGE_SIZE);

		pfn++;
		size -= PAGE_SIZE;

		kunmap_atomic(va);
	}
}
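The PMD flush above walks a block mapping one page at a time: map the page, flush it, advance the pfn, and shrink the remaining size until it reaches zero. A minimal userspace sketch of that iteration pattern, assuming 4 KiB pages; `SKETCH_PAGE_SIZE`, `page_fn`, `for_each_page` and `record_last_pfn` are hypothetical names introduced only for illustration, and the callback stands in for the map/flush/unmap sequence:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, not taken from the kernel */

/* Hypothetical callback standing in for "map, flush, unmap" of one page. */
typedef void (*page_fn)(unsigned long pfn, void *ctx);

/*
 * Visit every page frame covered by `size` bytes starting at `pfn`,
 * in the same order as the loop above. Returns the number of pages
 * visited.
 */
static size_t for_each_page(unsigned long pfn, unsigned long size,
			    page_fn fn, void *ctx)
{
	size_t visited = 0;

	/* The loop only terminates cleanly for whole pages. */
	assert(size % SKETCH_PAGE_SIZE == 0);

	while (size) {
		fn(pfn, ctx);
		pfn++;
		size -= SKETCH_PAGE_SIZE;
		visited++;
	}

	return visited;
}

/* Example callback: remember the last page frame number seen. */
static void record_last_pfn(unsigned long pfn, void *ctx)
{
	*(unsigned long *)ctx = pfn;
}
```

With a 2 MiB block starting at pfn 100, `for_each_page(100, 2UL * 1024 * 1024, record_last_pfn, &last)` visits 512 pages and leaves `last` at 611.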
static inline void __kvm_flush_dcache_pud(pud_t pud)
{
}
2013-11-20 02:59:12 +07:00
#define kvm_virt_to_phys(x)		virt_to_idmap((unsigned long)(x))
2013-04-13 01:12:06 +07:00
2014-12-19 23:05:31 +07:00
void kvm_set_way_flush(struct kvm_vcpu *vcpu);
void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
2014-01-15 19:50:23 +07:00
2015-03-19 23:42:28 +07:00
static inline bool __kvm_cpu_uses_extended_idmap(void)
{
	return false;
}
arm64: allow ID map to be extended to 52 bits
Currently, when using VA_BITS < 48, if the ID map text happens to be
placed in physical memory above VA_BITS, we increase the VA size (up to
48) and create a new table level, in order to map in the ID map text.
This is okay because the system always supports 48 bits of VA.
This patch extends the code such that if the system supports 52 bits of
VA, and the ID map text is placed that high up, then we increase the VA
size accordingly, up to 52.
One difference from the current implementation is that so far the
condition of VA_BITS < 48 has meant that the top level table is always
"full", with the maximum number of entries, and an extra table level is
always needed. Now, when VA_BITS = 48 (and using 64k pages), the top
level table is not full, and we simply need to increase the number of
entries in it, instead of creating a new table level.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: reduce arguments to __create_hyp_mappings()]
[catalin.marinas@arm.com: reworked/renamed __cpu_uses_extended_idmap_level()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-12-14 00:07:24 +07:00
static inline unsigned long __kvm_idmap_ptrs_per_pgd(void)
{
	return PTRS_PER_PGD;
}
2015-03-19 23:42:28 +07:00
static inline void __kvm_extend_hypmap(pgd_t *boot_hyp_pgd,
				       pgd_t *hyp_pgd,
				       pgd_t *merged_hyp_pgd,
				       unsigned long hyp_idmap_start) { }
2015-11-16 18:28:18 +07:00
static inline unsigned int kvm_get_vmid_bits(void)
{
	return 8;
}
2018-05-11 21:20:14 +07:00
/*
 * We are not in the kvm->srcu critical section most of the time, so we take
 * the SRCU read lock here. Since we copy the data from the user page, we
 * can immediately drop the lock again.
 */
static inline int kvm_read_guest_lock(struct kvm *kvm,
				      gpa_t gpa, void *data, unsigned long len)
{
	int srcu_idx = srcu_read_lock(&kvm->srcu);
	int ret = kvm_read_guest(kvm, gpa, data, len);

	srcu_read_unlock(&kvm->srcu, srcu_idx);

	return ret;
}
KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory
When halting a guest, QEMU flushes the virtual ITS caches, which
amounts to writing to the various tables that the guest has allocated.
When doing this, we fail to take the srcu lock, and the kernel
shouts loudly if running a lockdep kernel:
[ 69.680416] =============================
[ 69.680819] WARNING: suspicious RCU usage
[ 69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted
[ 69.682096] -----------------------------
[ 69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[ 69.683225]
[ 69.683225] other info that might help us debug this:
[ 69.683225]
[ 69.683975]
[ 69.683975] rcu_scheduler_active = 2, debug_locks = 1
[ 69.684598] 6 locks held by qemu-system-aar/4097:
[ 69.685059] #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[ 69.686087] #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[ 69.686919] #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.687698] #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.688475] #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.689978] #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[ 69.690729]
[ 69.690729] stack backtrace:
[ 69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18
[ 69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[ 69.692831] Call trace:
[ 69.694072] lockdep_rcu_suspicious+0xcc/0x110
[ 69.694490] gfn_to_memslot+0x174/0x190
[ 69.694853] kvm_write_guest+0x50/0xb0
[ 69.695209] vgic_its_save_tables_v0+0x248/0x330
[ 69.695639] vgic_its_set_attr+0x298/0x3a0
[ 69.696024] kvm_device_ioctl_attr+0x9c/0xd8
[ 69.696424] kvm_device_ioctl+0x8c/0xf8
[ 69.696788] do_vfs_ioctl+0xc8/0x960
[ 69.697128] ksys_ioctl+0x8c/0xa0
[ 69.697445] __arm64_sys_ioctl+0x28/0x38
[ 69.697817] el0_svc_common+0xd8/0x138
[ 69.698173] el0_svc_handler+0x38/0x78
[ 69.698528] el0_svc+0x8/0xc
The fix is to obviously take the srcu lock, just like we do on the
read side of things since bf308242ab98. One wonders why this wasn't
fixed at the same time, but hey...
Fixes: bf308242ab98 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-19 19:47:11 +07:00
static inline int kvm_write_guest_lock(struct kvm *kvm, gpa_t gpa,
				       const void *data, unsigned long len)
{
	int srcu_idx = srcu_read_lock(&kvm->srcu);
	int ret = kvm_write_guest(kvm, gpa, data, len);

	srcu_read_unlock(&kvm->srcu, srcu_idx);

	return ret;
}
2018-01-03 23:38:35 +07:00
static inline void *kvm_get_hyp_vector(void)
{
2018-02-01 18:07:35 +07:00
	switch(read_cpuid_part()) {
#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
	case ARM_CPU_PART_CORTEX_A12:
	case ARM_CPU_PART_CORTEX_A17:
	{
		extern char __kvm_hyp_vector_bp_inv[];
		return kvm_ksym_ref(__kvm_hyp_vector_bp_inv);
	}
2018-05-10 23:52:18 +07:00
	case ARM_CPU_PART_BRAHMA_B15:
2018-02-01 18:07:38 +07:00
	case ARM_CPU_PART_CORTEX_A15:
	{
		extern char __kvm_hyp_vector_ic_inv[];
		return kvm_ksym_ref(__kvm_hyp_vector_ic_inv);
	}
2018-02-01 18:07:35 +07:00
#endif
	default:
	{
		extern char __kvm_hyp_vector[];
		return kvm_ksym_ref(__kvm_hyp_vector);
	}
	}
2018-01-03 23:38:35 +07:00
}

static inline int kvm_map_vectors(void)
{
	return 0;
}
2018-05-29 19:11:16 +07:00
static inline int hyp_map_aux_data(void)
{
	return 0;
}
2017-12-14 00:07:18 +07:00
#define kvm_phys_to_vttbr(addr)		(addr)
2018-09-26 23:32:52 +07:00
static inline void kvm_set_ipa_limit(void) {}
2018-12-11 21:26:31 +07:00
static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
2018-07-31 20:08:57 +07:00
{
2018-12-11 21:26:31 +07:00
	struct kvm_vmid *vmid = &kvm->arch.vmid;
	u64 vmid_field, baddr;

	baddr = kvm->arch.pgd_phys;
	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
	return kvm_phys_to_vttbr(baddr) | vmid_field;
2018-07-31 20:08:57 +07:00
}
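kvm_get_vttbr() above builds the VTTBR value by OR-ing the stage-2 page table base address with the VMID shifted into its field. A minimal userspace sketch of that composition, assuming a shift of 48 for the VMID field; `SKETCH_VTTBR_VMID_SHIFT` and `make_vttbr` are hypothetical names, and the real shift comes from `VTTBR_VMID_SHIFT` in the kernel headers:

```c
#include <stdint.h>

/* Assumed bit position of the VMID field; illustrative only. */
#define SKETCH_VTTBR_VMID_SHIFT 48

/*
 * Hypothetical helper: combine a page table base address and a VMID
 * the same way the function above does. The VMID is widened to 64
 * bits before shifting, because shifting a 32-bit value by 48 would
 * be undefined behaviour in C.
 */
static uint64_t make_vttbr(uint64_t pgd_phys, uint32_t vmid)
{
	uint64_t vmid_field = (uint64_t)vmid << SKETCH_VTTBR_VMID_SHIFT;

	return pgd_phys | vmid_field;
}
```

Under these assumptions, `make_vttbr(0x80000000ULL, 1)` yields `0x0001000080000000`.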
2013-04-13 01:12:06 +07:00
#endif	/* !__ASSEMBLY__ */
2013-01-21 06:28:06 +07:00
#endif /* __ARM_KVM_MMU_H__ */