Currently, the internal vcpu finalization functions use a different
name ("what") for the feature parameter than the name ("feature")
used in the documentation.
To avoid future confusion, this patch converts everything to use
the name "feature" consistently.
No functional change.
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Correct virtualization of SVE relies for correctness on code in
set_sve_vls() that verifies consistency between the set of vector
lengths requested by userspace and the set of vector lengths
available on the host.
However, the purpose of this code is not obvious, and not likely to
be apparent at all to people who do not have detailed knowledge of
the SVE system-level architecture.
This patch adds a suitable comment to explain what these checks are
for.
No functional change.
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
A complicated DIV_ROUND_UP() expression is currently written out
explicitly in multiple places in order to specify the size of the
bitmap exchanged with userspace to represent the value of the
KVM_REG_ARM64_SVE_VLS pseudo-register.
Userspace currently has no direct way to work this out either: for
documentation purposes, the size is just quoted as 8 u64s.
To make this more intuitive, this patch replaces these with a
single define, which is also exported to userspace as
KVM_ARM64_SVE_VLS_WORDS.
Since the number of words in a bitmap is just the index of the last
word used + 1, this patch expresses the bound that way instead.
This should make it clearer what is being expressed.
For userspace convenience, the minimum and maximum possible vector
lengths relevant to the KVM ABI are exposed to UAPI as
KVM_ARM64_SVE_VQ_MIN, KVM_ARM64_SVE_VQ_MAX. Since the only direct
use for these at present is manipulation of KVM_REG_ARM64_SVE_VLS,
no corresponding _VL_ macros are defined. They could be added
later if a need arises.
Since use of DIV_ROUND_UP() was the only reason for including
<linux/kernel.h> in guest.c, this patch also removes that #include.
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
sve_reg_to_region() currently passes the result of
vcpu_sve_state_size() to array_index_nospec(), effectively
leading to a divide / modulo operation.
Currently the code bails out and returns -EINVAL if
vcpu_sve_state_size() turns out to be zero, in order to avoid going
ahead and attempting to divide by zero. This is reasonable, but it
should only happen if the kernel contains some other bug that
allowed this code to be reached without the vcpu having been
properly initialised.
To make it clear that this is a defence against bugs rather than
something that the user should be able to trigger, this patch marks
the check with WARN_ON().
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently, the way error codes are generated when processing the
SVE register access ioctls in a bit haphazard.
This patch refactors the code so that the behaviour is more
consistent: now, -EINVAL should be returned only for unrecognised
register IDs or when some other runtime error occurs. -ENOENT is
returned for register IDs that are recognised, but whose
corresponding register (or slice) does not exist for the vcpu.
To this end, in {get,set}_sve_reg() we now delegate the
vcpu_has_sve() check down into {get,set}_sve_vls() and
sve_reg_to_region(). The KVM_REG_ARM64_SVE_VLS special case is
picked off first, then sve_reg_to_region() plays the role of
exhaustively validating or rejecting the register ID and (where
accepted) computing the applicable register region as before.
sve_reg_to_region() is rearranged so that -ENOENT or -EPERM is not
returned prematurely, before checking whether reg->id is in a
recognised range.
-EPERM is now only returned when an attempt is made to access an
actually existing register slice on an unfinalized vcpu.
Fixes: e1c9c98345 ("KVM: arm64/sve: Add SVE support to register access ioctl interface")
Fixes: 9033bba4b5 ("KVM: arm64/sve: Add pseudo-register for the guest's vector lengths")
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* Remove a few redundant blank lines that are stylistically
inconsistent with code already in guest.c and are just taking up
space.
* Delete a couple of pointless empty default cases from switch
statements whose behaviour is otherwise obvious anyway.
* Fix some typos and consolidate some redundantly duplicated
comments.
* Respell the slice index check in sve_reg_to_region() as "> 0"
to be more consistent with what is logically being checked here
(i.e., "is the slice index too large"), even though we don't try
to cope with multiple slices yet.
No functional change.
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently, the SVE register ID macros are not all defined in the
same way, and advertise the fact that FFR maps onto the nonexistent
predicate register P16. This is really just for kernel
convenience, and may lead userspace into bad habits.
Instead, this patch masks the ID macro arguments so that
architecturally invalid register numbers will not be passed through
any more, and uses a literal KVM_REG_ARM64_SVE_FFR_BASE macro to
define KVM_REG_ARM64_SVE_FFR(), similarly to the way the _ZREG()
and _PREG() macros are defined.
Rather than plugging in magic numbers for the number of Z- and P-
registers and the maximum possible number of register slices, this
patch provides definitions for those too. Userspace is going to
need them in any case, and it makes sense for them to come from
<uapi/asm/kvm.h>.
sve_reg_to_region() uses convenience constants that are defined in
a different way, and also makes use of the fact that the FFR IDs
are really contiguous with the P15 IDs, so this patch retains the
existing convenience constants in guest.c, supplemented with a
couple of sanity checks to check for consistency with the UAPI
header.
Fixes: e1c9c98345 ("KVM: arm64/sve: Add SVE support to register access ioctl interface")
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Because of the logic in kvm_arm_sys_reg_{get,set}_reg() and
sve_id_visibility(), we should never call
{get,set}_id_aa64zfr0_el1() for a vcpu where !vcpu_has_sve(vcpu).
To avoid the code giving the impression that it is valid for these
functions to be called in this situation, and to help the compiler
make the right optimisation decisions, this patch adds WARN_ON()
for these cases.
Given the way the logic is spread out, this seems preferable to
dropping the checks altogether.
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The introduction of kvm_arm_init_arch_resources() looks like
premature factoring, since nothing else uses this hook yet and it
is not clear what will use it in the future.
For now, let's not pretend that this is a general thing:
This patch simply renames the function to kvm_arm_init_sve(),
retaining the arm stub version under the new name.
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The introduction of the SVE registers to userspace started with a
refactoring of the way we expose any register via the ONE_REG
interface.
Unfortunately, this change doesn't exactly behave as expected
if the number of registers is non-zero and consider everything
to be an error. The visible result is that QEMU barfs very early
when creating vcpus.
Make sure we only exit early in case there is an actual error, rather
than a positive number of registers...
Fixes: be25bbb392 ("KVM: arm64: Factor out core register ID enumeration")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
To provide a uniform way to check for KVM SVE support amongst other
features, this patch adds a suitable capability KVM_CAP_ARM_SVE,
and reports it as present when SVE is available.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that all the pieces are in place, this patch offers a new flag
KVM_ARM_VCPU_SVE that userspace can pass to KVM_ARM_VCPU_INIT to
turn on SVE for the guest, on a per-vcpu basis.
As part of this, support for initialisation and reset of the SVE
vector length set and registers is added in the appropriate places,
as well as finally setting the KVM_ARM64_GUEST_HAS_SVE vcpu flag,
to turn on the SVE support code.
Allocation of the SVE register storage in vcpu->arch.sve_state is
deferred until the SVE configuration is finalized, by which time
the size of the registers is known.
Setting the vector lengths supported by the vcpu is considered
configuration of the emulated hardware rather than runtime
configuration, so no support is offered for changing the vector
lengths available to an existing vcpu across reset.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
allow userspace to set and query the set of vector lengths visible
to the guest.
In the future, multiple register slices per SVE register may be
visible through the ioctl interface. Once the set of slices has
been determined we would not be able to allow the vector length set
to be changed any more, in order to avoid userspace seeing
inconsistent sets of registers. For this reason, this patch adds
support for explicit finalization of the SVE configuration via the
KVM_ARM_VCPU_FINALIZE ioctl.
Finalization is the proper place to allocate the SVE register state
storage in vcpu->arch.sve_state, so this patch adds that as
appropriate. The data is freed via kvm_arch_vcpu_uninit(), which
was previously a no-op on arm64.
To simplify the logic for determining what vector lengths can be
supported, some code is added to KVM init to work this out, in the
kvm_arm_init_arch_resources() hook.
The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
Subsequent patches will allow SVE to be turned on for guest vcpus,
making it visible.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch includes the SVE register IDs in the list returned by
KVM_GET_REG_LIST, as appropriate.
On a non-SVE-enabled vcpu, no new IDs are added.
On an SVE-enabled vcpu, IDs for the FPSIMD V-registers are removed
from the list, since userspace is required to access the Z-
registers instead in order to access the V-register content. For
the variably-sized SVE registers, the appropriate set of slice IDs
are enumerated, depending on the maximum vector length for the
vcpu.
As it currently stands, the SVE architecture never requires more
than one slice to exist per register, so this patch adds no
explicit support for enumerating multiple slices. The code can be
extended straightforwardly to support this in the future, if
needed.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch adds the following registers for access via the
KVM_{GET,SET}_ONE_REG interface:
* KVM_REG_ARM64_SVE_ZREG(n, i) (n = 0..31) (in 2048-bit slices)
* KVM_REG_ARM64_SVE_PREG(n, i) (n = 0..15) (in 256-bit slices)
* KVM_REG_ARM64_SVE_FFR(i) (in 256-bit slices)
In order to adapt gracefully to future architectural extensions,
the registers are logically divided up into slices as noted above:
the i parameter denotes the slice index.
This allows us to reserve space in the ABI for future expansion of
these registers. However, as of today the architecture does not
permit registers to be larger than a single slice, so no code is
needed in the kernel to expose additional slices, for now. The
code can be extended later as needed to expose them up to a maximum
of 32 slices (as carved out in the architecture itself) if they
really exist someday.
The registers are only visible for vcpus that have SVE enabled.
They are not enumerated by KVM_GET_REG_LIST on vcpus that do not
have SVE.
Accesses to the FPSIMD registers via KVM_REG_ARM_CORE is not
allowed for SVE-enabled vcpus: SVE-aware userspace can use the
KVM_REG_ARM64_SVE_ZREG() interface instead to access the same
register state. This avoids some complex and pointless emulation
in the kernel to convert between the two views of these aliased
registers.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to avoid the pointless complexity of maintaining two ioctl
register access views of the same data, this patch blocks ioctl
access to the FPSIMD V-registers on vcpus that support SVE.
This will make it more straightforward to add SVE register access
support.
Since SVE is an opt-in feature for userspace, this will not affect
existing users.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for adding logic to filter out some KVM_REG_ARM_CORE
registers from the KVM_GET_REG_LIST output, this patch factors out
the core register enumeration into a separate function and rebuilds
num_core_regs() on top of it.
This may be a little more expensive (depending on how good a job
the compiler does of specialising the code), but KVM_GET_REG_LIST
is not a hot path.
This will make it easier to consolidate ID filtering code in one
place.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
arch/arm64/kvm/guest.c uses the string functions, but the
corresponding header is not included.
We seem to get away with this for now, but for completeness this
patch adds the #include, in preparation for adding yet more
memset() calls.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to give each vcpu its own view of the SVE registers, this
patch adds context storage via a new sve_state pointer in struct
vcpu_arch. An additional member sve_max_vl is also added for each
vcpu, to determine the maximum vector length visible to the guest
and thus the value to be configured in ZCR_EL2.LEN while the vcpu
is active. This also determines the layout and size of the storage
in sve_state, which is read and written by the same backend
functions that are used for context-switching the SVE state for
host tasks.
On SVE-enabled vcpus, SVE access traps are now handled by switching
in the vcpu's SVE context and disabling the trap before returning
to the guest. On other vcpus, the trap is not handled and an exit
back to the host occurs, where the handle_sve() fallback path
reflects an undefined instruction exception back to the guest,
consistently with the behaviour of non-SVE-capable hardware (as was
done unconditionally prior to this patch).
No SVE handling is added on non-VHE-only paths, since VHE is an
architectural and Kconfig prerequisite of SVE.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch adds the necessary support for context switching ZCR_EL1
for each vcpu.
ZCR_EL1 is trapped alongside the FPSIMD/SVE registers, so it makes
sense for it to be handled as part of the guest FPSIMD/SVE context
for context switch purposes instead of handling it as a general
system register. This means that it can be switched in lazily at
the appropriate time. No effort is made to track host context for
this register, since SVE requires VHE: thus the hosts's value for
this register lives permanently in ZCR_EL2 and does not alias the
guest's value at any time.
The Hyp switch and fpsimd context handling code is extended
appropriately.
Accessors are added in sys_regs.c to expose the SVE system
registers and ID register fields. Because these need to be
conditionally visible based on the guest configuration, they are
implemented separately for now rather than by use of the generic
system register helpers. This may be abstracted better later on
when/if there are more features requiring this model.
ID_AA64ZFR0_EL1 is RO-RAZ for MRS/MSR when SVE is disabled for the
guest, but for compatibility with non-SVE aware KVM implementations
the register should not be enumerated at all for KVM_GET_REG_LIST
in this case. For consistency we also reject ioctl access to the
register. This ensures that a non-SVE-enabled guest looks the same
to userspace, irrespective of whether the kernel KVM implementation
supports SVE.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Some optional features of the Arm architecture add new system
registers that are not present in the base architecture.
Where these features are optional for the guest, the visibility of
these registers may need to depend on some runtime configuration,
such as a flag passed to KVM_ARM_VCPU_INIT.
For example, ZCR_EL1 and ID_AA64ZFR0_EL1 need to be hidden if SVE
is not enabled for the guest, even though these registers may be
present in the hardware and visible to the host at EL2.
Adding special-case checks all over the place for individual
registers is going to get messy as the number of conditionally-
visible registers grows.
In order to help solve this problem, this patch adds a new sysreg
method visibility() that can be used to hook in any needed runtime
visibility checks. This method can currently return
REG_HIDDEN_USER to inhibit enumeration and ioctl access to the
register for userspace, and REG_HIDDEN_GUEST to inhibit runtime
access by the guest using MSR/MRS. Wrappers are added to allow
these flags to be conveniently queried.
This approach allows a conditionally modified view of individual
system registers such as the CPU ID registers, in addition to
completely hiding register where appropriate.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Architecture features that are conditionally visible to the guest
will require run-time checks in the ID register accessor functions.
In particular, read_id_reg() will need to perform checks in order
to generate the correct emulated value for certain ID register
fields such as ID_AA64PFR0_EL1.SVE for example.
This patch propagates vcpu into read_id_reg() so that future
patches can add run-time checks on the guest configuration here.
For now, there is no functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The current FPSIMD/SVE context handling support for non-task (i.e.,
KVM vcpu) contexts does not take SVE into account. This means that
only task contexts can safely use SVE at present.
In preparation for enabling KVM guests to use SVE, it is necessary
to keep track of SVE state for non-task contexts too.
This patch adds the necessary support, removing assumptions from
the context switch code about the location of the SVE context
storage.
When binding a vcpu context, its vector length is arbitrarily
specified as SVE_VL_MIN for now. In any case, because TIF_SVE is
presently cleared at vcpu context bind time, the specified vector
length will not be used for anything yet. In later patches TIF_SVE
will be set here as appropriate, and the appropriate maximum vector
length for the vcpu will be passed when binding.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
kvm_arm_num_regs() adds together various partial register counts in
a freeform sum expression, which makes it harder than necessary to
read diffs that add, modify or remove a single term in the sum
(which is expected to the common case under maintenance).
This patch refactors the code to add the term one per line, for
maximum readability.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
for 32-bit guests
s390: interrupt cleanup, introduction of the Guest Information Block,
preparation for processor subfunctions in cpu models
PPC: bug fixes and improvements, especially related to machine checks
and protection keys
x86: many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations; plus AVIC fixes.
Generic: memcg accounting
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJci+7XAAoJEL/70l94x66DUMkIAKvEefhceySHYiTpfefjLjIC
16RewgHa+9CO4Oo5iXiWd90fKxtXLXmxDQOS4VGzN0rxvLGRw/fyXIxL1MDOkaAO
l8SLSNuewY4XBUgISL3PMz123r18DAGOuy9mEcYU/IMesYD2F+wy5lJ17HIGq6X2
RpoF1p3qO1jfkPTKOob6Ixd4H5beJNPKpdth7LY3PJaVhDxgouj32fxnLnATVSnN
gENQ10fnt8BCjshRYW6Z2/9bF15JCkUFR1xdBW2/xh1oj+kvPqqqk2bEN1eVQzUy
2hT/XkwtpthqjSbX8NNavWRSFnOnbMLTRKQyIXmFVsM5VoSrwtiGsCFzBgcT++I=
=XIzU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- some cleanups
- direct physical timer assignment
- cache sanitization for 32-bit guests
s390:
- interrupt cleanup
- introduction of the Guest Information Block
- preparation for processor subfunctions in cpu models
PPC:
- bug fixes and improvements, especially related to machine checks
and protection keys
x86:
- many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations
- AVIC fixes
Generic:
- memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
kvm: vmx: fix formatting of a comment
KVM: doc: Document the life cycle of a VM and its resources
MAINTAINERS: Add KVM selftests to existing KVM entry
Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
KVM: PPC: Fix compilation when KVM is not enabled
KVM: Minor cleanups for kvm_main.c
KVM: s390: add debug logging for cpu model subfunctions
KVM: s390: implement subfunction processor calls
arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
KVM: arm/arm64: Remove unused timer variable
KVM: PPC: Book3S: Improve KVM reference counting
KVM: PPC: Book3S HV: Fix build failure without IOMMU support
Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
x86: kvmguest: use TSC clocksource if invariant TSC is exposed
KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
...
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
- uaccess macros clean-up (unsafe user accessors also merged but
reverted, waiting for objtool support on arm64)
- ptrace regsets for Pointer Authentication (ARMv8.3) key management
- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by the
riscv maintainers)
- arm64/perf updates: PMU bindings converted to json-schema, unused
variable and misleading comment removed
- arm64/debug fixes to ensure checking of the triggering exception level
and to avoid the propagation of the UNKNOWN FAR value into the si_code
for debug signals
- Workaround for Fujitsu A64FX erratum 010001
- lib/raid6 ARM NEON optimisations
- NR_CPUS now defaults to 256 on arm64
- Minor clean-ups (documentation/comments, Kconfig warning, unused
asm-offsets, clang warnings)
- MAINTAINERS update for list information to the ARM64 ACPI entry
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlyCl0cACgkQa9axLQDI
XvEyKxAAiogBZLbyhcy8bTUHVzVoJE0FyAkdO2wWnnaff2Ohkhy1Y/npv33IeK2q
RknxqDIx2DUUVPJNRZGoI/WwBtTZdKaAnW4rIKG84yC1eAkFcd96WQasaZzcp1qY
HmvbJiYXM0bh+0J7i3Wgry/QzOkrltJFJW2kp6Wd5aFE+R1WyWyxT6d+Fp0J3vlA
bT70jlpBK6LXEOmmBS+04Ml02+8MvaGxIl8EInBHSfDLRLErj5E8n41rRHKUiSWz
maWI+kVoLYwOE68xiZlDftUBEeQpUSWgg2nxeK+640QSl1wJmVcRcY9nm6TZeMG2
AiZTR9a7cP5rrdSN5suUmb7d4AMMVlVMisGDlwb+9oCxeTRDzg0uwACaVgHfPqQr
UeBdHbL9nStN7uBH23H8L9mKk+tqpFmk0sgzdrKejOwysAiqWV8aazb/Na3qnVRl
J1B5opxMnGOsjXmHvtG/tiZl281Uwz5ZmzfLmIY3gUZgUgdA3511Egp0ry5y1dzJ
SkYC4Hmzb2ybQvXGIDDa3OzCwXXiqyqKsO+O8Egg1k4OIwbp3w+NHE7gKeA+dMgD
gjN7zEalCUi46Q28xiCPEb+88BpQ18czIWGQLb9mAnmYeZPjqqenXKXuRHr4lgVe
jPURJ/vqvFEglZJN1RDuQHKzHEcm5f2XE566sMZYdSoeiUCb0QM=
=2U56
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
- uaccess macros clean-up (unsafe user accessors also merged but
reverted, waiting for objtool support on arm64)
- ptrace regsets for Pointer Authentication (ARMv8.3) key management
- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by
the riscv maintainers)
- arm64/perf updates: PMU bindings converted to json-schema, unused
variable and misleading comment removed
- arm64/debug fixes to ensure checking of the triggering exception
level and to avoid the propagation of the UNKNOWN FAR value into the
si_code for debug signals
- Workaround for Fujitsu A64FX erratum 010001
- lib/raid6 ARM NEON optimisations
- NR_CPUS now defaults to 256 on arm64
- Minor clean-ups (documentation/comments, Kconfig warning, unused
asm-offsets, clang warnings)
- MAINTAINERS update for list information to the ARM64 ACPI entry
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
arm64: mmu: drop paging_init comments
arm64: debug: Ensure debug handlers check triggering exception level
arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
Revert "arm64: uaccess: Implement unsafe accessors"
arm64: avoid clang warning about self-assignment
arm64: Kconfig.platforms: fix warning unmet direct dependencies
lib/raid6: arm: optimize away a mask operation in NEON recovery routine
lib/raid6: use vdupq_n_u8 to avoid endianness warnings
arm64: io: Hook up __io_par() for inX() ordering
riscv: io: Update __io_[p]ar() macros to take an argument
asm-generic/io: Pass result of I/O accessor to __io_[p]ar()
arm64: Add workaround for Fujitsu A64FX erratum 010001
arm64: Rename get_thread_info()
arm64: Remove documentation about TIF_USEDFPU
arm64: irqflags: Fix clang build warnings
arm64: Enable the support of pseudo-NMIs
arm64: Skip irqflags tracing for NMI in IRQs disabled context
arm64: Skip preemption when exiting an NMI
arm64: Handle serror in NMI context
irqchip/gic-v3: Allow interrupts to be set as pseudo-NMI
...
Due to what looks like a typo dating back to the original addition
of FPEXC32_EL2 handling, KVM currently initialises this register to
an architecturally invalid value.
As a result, the VECITR field (RES1) in bits [10:8] is initialised
with 0, and the two reserved (RES0) bits [6:5] are initialised with
1. (In the Common VFP Subarchitecture as specified by ARMv7-A,
these two bits were IMP DEF. ARMv8-A removes them.)
This patch changes the reset value from 0x70 to 0x700, which
reflects the architectural constraints and is presumably what was
originally intended.
Cc: <stable@vger.kernel.org> # 4.12.x-
Cc: Christoffer Dall <christoffer.dall@arm.com>
Fixes: 62a89c4495 ("arm64: KVM: 32bit handling of coprocessor traps")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].
To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.
Having whitespaces after -I does not matter since commit 48f6e3cf5b
("kbuild: do not drop -I without parameter").
[1]: https://patchwork.kernel.org/patch/9632347/
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The header search path -I. in kernel Makefiles is very suspicious;
it allows the compiler to search for headers in the top of $(srctree),
where obviously no header file exists.
I was able to build without these extra header search paths.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
On SMP ARM systems, cache maintenance by set/way should only ever be
done in the context of onlining or offlining CPUs, which is typically
done by bare metal firmware and never in a virtual machine. For this
reason, we trap set/way cache maintenance operations and replace them
with conditional flushing of the entire guest address space.
Due to this trapping, the set/way arguments passed into the set/way
ops are completely ignored, and thus irrelevant. This also means that
the set/way geometry is equally irrelevant, and we can simply report
it as 1 set and 1 way, so that legacy 32-bit ARM system software (i.e.,
the kind that only receives odd fixes) doesn't take a performance hit
due to the trapping when iterating over the cachelines.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We currently permit CPUs in the same system to deviate in the exact
topology of the caches, and we subsequently hide this fact from user
space by exposing a sanitised value of the cache type register CTR_EL0.
However, guests running under KVM see the bare value of CTR_EL0, which
could potentially result in issues with, e.g., JITs or other pieces of
code that are sensitive to misreported cache line sizes.
So let's start trapping cache ID instructions if there is a mismatch,
and expose the sanitised version of CTR_EL0 to guests. Note that CTR_EL0
is treated as an invariant to KVM user space, so update that part as well.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
At the moment we have separate system register emulation handlers for
each timer register. Actually they are quite similar, and we rely on
kvm_arm_timer_[gs]et_reg() for the actual emulation anyways, so let's
just merge all of those handlers into one function, which just marshalls
the arguments and then hands off to a set of common accessors.
This makes extending the emulation to include EL2 timers much easier.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[Fixed 32-bit VM breakage and reduced to reworking existing code]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[Fixed 32bit host, general cleanup]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Instead of having an open-coded macro, reuse the sys_reg() macro
that does the exact same thing (the encoding is slightly different,
but the ordering property is the same).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
We currently eagerly save/restore MPIDR. It turns out to be
slightly pointless:
- On the host, this value is known as soon as we're scheduled on a
physical CPU
- In the guest, this value cannot change, as it is set by KVM
(and this is a read-only register)
The result of the above is that we can perfectly avoid the eager
saving of MPIDR_EL1, and only keep the restore. We just have
to setup the host contexts appropriately at boot time.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
We now call VHE code directly, without going through any central
dispatching function. Let's drop that code.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Until now, we haven't differentiated between HYP calls that
have a return value and those who don't. As we're about to
change this, introduce kvm_call_hyp_ret(), and change all
call sites that actually make use of a return value.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
On systems with VHE the kernel and KVM's world-switch code run at the
same exception level. Code that is only used on a VHE system does not
need to be annotated as __hyp_text as it can reside anywhere in the
kernel text.
__hyp_text was also used to prevent kprobes from patching breakpoint
instructions into this region, as this code runs at a different
exception level. While this is no longer true with VHE, KVM still
switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
world-switch code will cause a hyp-panic.
echo "p:weasel sysreg_save_guest_state_vhe" > /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/weasel/enable
lkvm run -k /boot/Image --console serial -p "console=ttyS0 earlycon=uart,mmio,0x3f8"
# lkvm run -k /boot/Image -m 384 -c 3 --name guest-1474
Info: Placing fdt at 0x8fe00000 - 0x8fffffff
Info: virtio-mmio.devices=0x200@0x10000:36
Info: virtio-mmio.devices=0x200@0x10200:37
Info: virtio-mmio.devices=0x200@0x10400:38
[ 614.178186] Kernel panic - not syncing: HYP panic:
[ 614.178186] PS:404003c9 PC:ffff0000100d70e0 ESR:f2000004
[ 614.178186] FAR:0000000080080000 HPFAR:0000000000800800 PAR:1d00007edbadc0de
[ 614.178186] VCPU:00000000f8de32f1
[ 614.178383] CPU: 2 PID: 1482 Comm: kvm-vcpu-0 Not tainted 5.0.0-rc2 #10799
[ 614.178446] Call trace:
[ 614.178480] dump_backtrace+0x0/0x148
[ 614.178567] show_stack+0x24/0x30
[ 614.178658] dump_stack+0x90/0xb4
[ 614.178710] panic+0x13c/0x2d8
[ 614.178793] hyp_panic+0xac/0xd8
[ 614.178880] kvm_vcpu_run_vhe+0x9c/0xe0
[ 614.178958] kvm_arch_vcpu_ioctl_run+0x454/0x798
[ 614.179038] kvm_vcpu_ioctl+0x360/0x898
[ 614.179087] do_vfs_ioctl+0xc4/0x858
[ 614.179174] ksys_ioctl+0x84/0xb8
[ 614.179261] __arm64_sys_ioctl+0x28/0x38
[ 614.179348] el0_svc_common+0x94/0x108
[ 614.179401] el0_svc_handler+0x38/0x78
[ 614.179487] el0_svc+0x8/0xc
[ 614.179558] SMP: stopping secondary CPUs
[ 614.179661] Kernel Offset: disabled
[ 614.179695] CPU features: 0x003,2a80aa38
[ 614.179758] Memory Limit: none
[ 614.179858] ---[ end Kernel panic - not syncing: HYP panic:
[ 614.179858] PS:404003c9 PC:ffff0000100d70e0 ESR:f2000004
[ 614.179858] FAR:0000000080080000 HPFAR:0000000000800800 PAR:1d00007edbadc0de
[ 614.179858] VCPU:00000000f8de32f1 ]---
Annotate the VHE world-switch functions that aren't marked
__hyp_text using NOKPROBE_SYMBOL().
Signed-off-by: James Morse <james.morse@arm.com>
Fixes: 3f5c90b890 ("KVM: arm64: Introduce VHE-specific kvm_vcpu_run")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Failing to properly reset system registers is pretty bad. But not
quite as bad as bringing the whole machine down... So warn loudly,
but slightly more gracefully.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
The current kvm_psci_vcpu_on implementation will directly try to
manipulate the state of the VCPU to reset it. However, since this is
not done on the thread that runs the VCPU, we can end up in a strangely
corrupted state when the source and target VCPUs are running at the same
time.
Fix this by factoring out all reset logic from the PSCI implementation
and forwarding the required information along with a request to the
target VCPU.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
We have two ways to reset a vcpu:
- either through VCPU_INIT
- or through a PSCI_ON call
The first one is easy to reason about. The second one is implemented
in a more bizarre way, as it is the vcpu that handles PSCI_ON that
resets the vcpu that is being powered-on. As we need to turn the logic
around and have the target vcpu to reset itself, we must take some
preliminary steps.
Resetting the VCPU state modifies the system register state in memory,
but this may interact with vcpu_load/vcpu_put if running with preemption
disabled, which in turn may lead to corrupted system register state.
Address this by disabling preemption and doing put/load if required
around the reset logic.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Interrupts masked by ICC_PMR_EL1 will not be signaled to the CPU. This
means that hypervisor will not receive masked interrupts while running a
guest.
We need to make sure that all maskable interrupts are masked from the
time we call local_irq_disable() in the main run loop, and remain so
until we call local_irq_enable() after returning from the guest, and we
need to ensure that we see no interrupts at all (including pseudo-NMIs)
in the middle of the VM world-switch, while at the same time we need to
ensure we exit the guest when there are interrupts for the host.
We can accomplish this with pseudo-NMIs enabled by:
(1) local_irq_disable: set the priority mask
(2) enter guest: set PSTATE.I
(3) clear the priority mask
(4) eret to guest
(5) exit guest: set the priotiy mask
clear PSTATE.I (and restore other host PSTATE bits)
(6) local_irq_enable: clear the priority mask.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We currently hide the LORegion feature, and generate an UNDEF
if the guest dares using the corresponding registers. This is
a bit extreme, as ARMv8.1 guarantees the feature to be present.
The guest should check the feature register before doing anything,
but we could also give the guest some slack (read "allow the
guest to be a bit stupid").
So instead of unconditionnaly deliver an exception, let's
only do it when the host doesn't support LORegion at all (or
when the feature has been sanitized out), and treat the registers
as RAZ/WI otherwise (with the exception of LORID_EL1 being RO).
Fixes: cc33c4e201 ("arm64/kvm: Prohibit guest LOR accesses")
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
- support -y option for merge_config.sh to avoid downgrading =y to =m
- remove S_OTHER symbol type, and touch include/config/*.h files correctly
- fix file name and line number in lexer warnings
- fix memory leak when EOF is encountered in quotation
- resolve all shift/reduce conflicts of the parser
- warn no new line at end of file
- make 'source' statement more strict to take only string literal
- rewrite the lexer and remove the keyword lookup table
- convert to SPDX License Identifier
- compile C files independently instead of including them from zconf.y
- fix various warnings of gconfig
- misc cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJcJieuAAoJED2LAQed4NsGHlIP/1s0fQ86XD9dIMyHzAO0gh2f
7rylfe2kEXJgIzJ0DyZdLu4iZtwbkEUqTQrRS1abriNGVemPkfBAnZdM5d92lOQX
3iREa700AJ2xo7V7gYZ6AbhZoG3p0S9U9Q2qE5S+tFTe8c2Gy4xtjnODF+Vel85r
S0P8tF5sE1/d00lm+yfMI/CJVfDjyNaMm+aVEnL0kZTPiRkaktjWgo6Fc2p4z1L5
HFmMMP6/iaXmRZ+tHJGPQ2AT70GFVZw5ePxPcl50EotUP25KHbuUdzs8wDpYm3U/
rcESVsIFpgqHWmTsdBk6dZk0q8yFZNkMlkaP/aYukVZpUn/N6oAXgTFckYl8dmQL
fQBkQi6DTfr9EBPVbj18BKm7xI3Y4DdQ2fzTfYkJ2XwNRGFA5r9N3sjd7ZTVGjxC
aeeMHCwvGdSx1x8PeZAhZfsUHW8xVDMSQiT713+ljBY+6cwzA+2NF0kP7B6OAqwr
ETFzd4Xu2/lZcL7gQRH8WU3L2S5iedmDG6RnZgJMXI0/9V4qAA+nlsWaCgnl1TgA
mpxYlLUMrd6AUJevE34FlnyFdk8IMn9iKRFsvF0f3doO5C7QzTVGqFdJu5a0CuWO
4NBJvZjFT8/4amoWLfnDlfApWXzTfwLbKG+r6V2F30fLuXpYg5LxWhBoGRPYLZSq
oi4xN1Mpx3TvXz6WcKVZ
=r3Fl
-----END PGP SIGNATURE-----
Merge tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kconfig updates from Masahiro Yamada:
- support -y option for merge_config.sh to avoid downgrading =y to =m
- remove S_OTHER symbol type, and touch include/config/*.h files correctly
- fix file name and line number in lexer warnings
- fix memory leak when EOF is encountered in quotation
- resolve all shift/reduce conflicts of the parser
- warn no new line at end of file
- make 'source' statement more strict to take only string literal
- rewrite the lexer and remove the keyword lookup table
- convert to SPDX License Identifier
- compile C files independently instead of including them from zconf.y
- fix various warnings of gconfig
- misc cleanups
* tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
kconfig: surround dbg_sym_flags with #ifdef DEBUG to fix gconf warning
kconfig: split images.c out of qconf.cc/gconf.c to fix gconf warnings
kconfig: add static qualifiers to fix gconf warnings
kconfig: split the lexer out of zconf.y
kconfig: split some C files out of zconf.y
kconfig: convert to SPDX License Identifier
kconfig: remove keyword lookup table entirely
kconfig: update current_pos in the second lexer
kconfig: switch to ASSIGN_VAL state in the second lexer
kconfig: stop associating kconf_id with yylval
kconfig: refactor end token rules
kconfig: stop supporting '.' and '/' in unquoted words
treewide: surround Kconfig file paths with double quotes
microblaze: surround string default in Kconfig with double quotes
kconfig: use T_WORD instead of T_VARIABLE for variables
kconfig: use specific tokens instead of T_ASSIGN for assignments
kconfig: refactor scanning and parsing "option" properties
kconfig: use distinct tokens for type and default properties
kconfig: remove redundant token defines
kconfig: rename depends_list to comment_option_list
...
single-stepping fixes, improved tracing, various timer and vGIC
fixes
* x86: Processor Tracing virtualization, STIBP support, some correctness fixes,
refactorings and splitting of vmx.c, use the Hyper-V range TLB flush hypercall,
reduce order of vcpu struct, WBNOINVD support, do not use -ftrace for __noclone
functions, nested guest support for PAUSE filtering on AMD, more Hyper-V
enlightenments (direct mode for synthetic timers)
* PPC: nested VFIO
* s390: bugfixes only this time
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJcH0vFAAoJEL/70l94x66Dw/wH/2FZp1YOM5OgiJzgqnXyDbyf
dNEfWo472MtNiLsuf+ZAfJojVIu9cv7wtBfXNzW+75XZDfh/J88geHWNSiZDm3Fe
aM4MOnGG0yF3hQrRQyEHe4IFhGFNERax8Ccv+OL44md9CjYrIrsGkRD08qwb+gNh
P8T/3wJEKwUcVHA/1VHEIM8MlirxNENc78p6JKd/C7zb0emjGavdIpWFUMr3SNfs
CemabhJUuwOYtwjRInyx1y34FzYwW3Ejuc9a9UoZ+COahUfkuxHE8u+EQS7vLVF6
2VGVu5SA0PqgmLlGhHthxLqVgQYo+dB22cRnsLtXlUChtVAq8q9uu5sKzvqEzuE=
=b4Jx
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- selftests improvements
- large PUD support for HugeTLB
- single-stepping fixes
- improved tracing
- various timer and vGIC fixes
x86:
- Processor Tracing virtualization
- STIBP support
- some correctness fixes
- refactorings and splitting of vmx.c
- use the Hyper-V range TLB flush hypercall
- reduce order of vcpu struct
- WBNOINVD support
- do not use -ftrace for __noclone functions
- nested guest support for PAUSE filtering on AMD
- more Hyper-V enlightenments (direct mode for synthetic timers)
PPC:
- nested VFIO
s390:
- bugfixes only this time"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: x86: Add CPUID support for new instruction WBNOINVD
kvm: selftests: ucall: fix exit mmio address guessing
Revert "compiler-gcc: disable -ftracer for __noclone functions"
KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
KVM: Make kvm_set_spte_hva() return int
KVM: Replace old tlb flush function with new one to flush a specified range.
KVM/MMU: Add tlb flush with range helper function
KVM/VMX: Add hv tlb range flush support
x86/hyper-v: Add HvFlushGuestAddressList hypercall support
KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
KVM: x86: Disable Intel PT when VMXON in L1 guest
KVM: x86: Set intercept for Intel PT MSRs read/write
KVM: x86: Implement Intel PT MSRs read/write emulation
...
The Kconfig lexer supports special characters such as '.' and '/' in
the parameter context. In my understanding, the reason is just to
support bare file paths in the source statement.
I do not see a good reason to complicate Kconfig for the room of
ambiguity.
The majority of code already surrounds file paths with double quotes,
and it makes sense since file paths are constant string literals.
Make it treewide consistent now.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
We're pretty blind when it comes to system register tracing,
and rely on the ESR value displayed by kvm_handle_sys, which
isn't much.
Instead, let's add an actual name to the sysreg entries, so that
we can finally print it as we're about to perform the access
itself.
The new tracepoint is conveniently called kvm_sys_access.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
vcpu_read_sys_reg should not be modifying the VCPU structure.
Eventually, to handle EL2 sysregs for nested virtualization, we will
call vcpu_read_sys_reg from places that have a const vcpu pointer, which
will complain about the lack of the const modifier on the read path.
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When KVM traps an unhandled sysreg/coproc access from a guest, it logs
the guest PC. To aid debugging, it would be helpful to know which
exception level the trap came from, along with other PSTATE/CPSR bits,
so let's log the PSTATE/CPSR too.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When we emulate a guest instruction, we don't advance the hardware
singlestep state machine, and thus the guest will receive a software
step exception after a next instruction which is not emulated by the
host.
We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
whether userspace requested a single step, and fake a debug exception
from within the kernel. Other times, we advance the HW singlestep state
rely on the HW to generate the exception for us. Thus, the observed step
behaviour differs for host and guest.
Let's make this simpler and consistent by always advancing the HW
singlestep state machine when we skip an instruction. Thus we can rely
on the hardware to generate the singlestep exception for us, and never
need to explicitly check for an active-pending step, nor do we need to
fake a debug exception from the guest.
Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In subsequent patches we're going to expose ptrauth to the host kernel
and userspace, but things are a bit trickier for guest kernels. For the
time being, let's hide ptrauth from KVM guests.
Regardless of how well-behaved the guest kernel is, guest userspace
could attempt to use ptrauth instructions, triggering a trap to EL2,
resulting in noise from kvm_handle_unknown_ec(). So let's write up a
handler for the PAC trap, which silently injects an UNDEF into the
guest, as if the feature were really missing.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Will Deacon <will.deacon@arm.com>