mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-28 11:18:45 +07:00
60ca05c3b4
Fix small typo (wiil -> will) in the "3.4. Nested virtual machines"
section.
Fixes: 5b76a3cff0
("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Cc: linux-kernel@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-doc@vger.kernel.org
Cc: trivial@kernel.org
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
611 lines
24 KiB
ReStructuredText
611 lines
24 KiB
ReStructuredText
L1TF - L1 Terminal Fault
|
|
========================
|
|
|
|
L1 Terminal Fault is a hardware vulnerability which allows unprivileged
|
|
speculative access to data which is available in the Level 1 Data Cache
|
|
when the page table entry controlling the virtual address, which is used
|
|
for the access, has the Present bit cleared or other reserved bits set.
|
|
|
|
Affected processors
|
|
-------------------
|
|
|
|
This vulnerability affects a wide range of Intel processors. The
|
|
vulnerability is not present on:
|
|
|
|
- Processors from AMD, Centaur and other non Intel vendors
|
|
|
|
- Older processor models, where the CPU family is < 6
|
|
|
|
- A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
|
|
Penwell, Pineview, Silvermont, Airmont, Merrifield)
|
|
|
|
- The Intel XEON PHI family
|
|
|
|
- Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
|
|
IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
|
|
by the Meltdown vulnerability either. These CPUs should become
|
|
available by end of 2018.
|
|
|
|
Whether a processor is affected or not can be read out from the L1TF
|
|
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
|
|
|
|
Related CVEs
|
|
------------
|
|
|
|
The following CVE entries are related to the L1TF vulnerability:
|
|
|
|
============= ================= ==============================
|
|
CVE-2018-3615 L1 Terminal Fault SGX related aspects
|
|
CVE-2018-3620 L1 Terminal Fault OS, SMM related aspects
|
|
CVE-2018-3646 L1 Terminal Fault Virtualization related aspects
|
|
============= ================= ==============================
|
|
|
|
Problem
|
|
-------
|
|
|
|
If an instruction accesses a virtual address for which the relevant page
|
|
table entry (PTE) has the Present bit cleared or other reserved bits set,
|
|
then speculative execution ignores the invalid PTE and loads the referenced
|
|
data if it is present in the Level 1 Data Cache, as if the page referenced
|
|
by the address bits in the PTE was still present and accessible.
|
|
|
|
While this is a purely speculative mechanism and the instruction will raise
|
|
a page fault when it is retired eventually, the pure act of loading the
|
|
data and making it available to other speculative instructions opens up the
|
|
opportunity for side channel attacks to unprivileged malicious code,
|
|
similar to the Meltdown attack.
|
|
|
|
While Meltdown breaks the user space to kernel space protection, L1TF
|
|
allows to attack any physical memory address in the system and the attack
|
|
works across all protection domains. It allows an attack of SGX and also
|
|
works from inside virtual machines because the speculation bypasses the
|
|
extended page table (EPT) protection mechanism.
|
|
|
|
|
|
Attack scenarios
|
|
----------------
|
|
|
|
1. Malicious user space
|
|
^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Operating Systems store arbitrary information in the address bits of a
|
|
PTE which is marked non present. This allows a malicious user space
|
|
application to attack the physical memory to which these PTEs resolve.
|
|
In some cases user-space can maliciously influence the information
|
|
encoded in the address bits of the PTE, thus making attacks more
|
|
deterministic and more practical.
|
|
|
|
The Linux kernel contains a mitigation for this attack vector, PTE
|
|
inversion, which is permanently enabled and has no performance
|
|
impact. The kernel ensures that the address bits of PTEs, which are not
|
|
marked present, never point to cacheable physical memory space.
|
|
|
|
A system with an up to date kernel is protected against attacks from
|
|
malicious user space applications.
|
|
|
|
2. Malicious guest in a virtual machine
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The fact that L1TF breaks all domain protections allows malicious guest
|
|
OSes, which can control the PTEs directly, and malicious guest user
|
|
space applications, which run on an unprotected guest kernel lacking the
|
|
PTE inversion mitigation for L1TF, to attack physical host memory.
|
|
|
|
A special aspect of L1TF in the context of virtualization is symmetric
|
|
multi threading (SMT). The Intel implementation of SMT is called
|
|
HyperThreading. The fact that Hyperthreads on the affected processors
|
|
share the L1 Data Cache (L1D) is important for this. As the flaw allows
|
|
only to attack data which is present in L1D, a malicious guest running
|
|
on one Hyperthread can attack the data which is brought into the L1D by
|
|
the context which runs on the sibling Hyperthread of the same physical
|
|
core. This context can be host OS, host user space or a different guest.
|
|
|
|
If the processor does not support Extended Page Tables, the attack is
|
|
only possible, when the hypervisor does not sanitize the content of the
|
|
effective (shadow) page tables.
|
|
|
|
While solutions exist to mitigate these attack vectors fully, these
|
|
mitigations are not enabled by default in the Linux kernel because they
|
|
can affect performance significantly. The kernel provides several
|
|
mechanisms which can be utilized to address the problem depending on the
|
|
deployment scenario. The mitigations, their protection scope and impact
|
|
are described in the next sections.
|
|
|
|
The default mitigations and the rationale for choosing them are explained
|
|
at the end of this document. See :ref:`default_mitigations`.
|
|
|
|
.. _l1tf_sys_info:
|
|
|
|
L1TF system information
|
|
-----------------------
|
|
|
|
The Linux kernel provides a sysfs interface to enumerate the current L1TF
|
|
status of the system: whether the system is vulnerable, and which
|
|
mitigations are active. The relevant sysfs file is:
|
|
|
|
/sys/devices/system/cpu/vulnerabilities/l1tf
|
|
|
|
The possible values in this file are:
|
|
|
|
=========================== ===============================
|
|
'Not affected' The processor is not vulnerable
|
|
'Mitigation: PTE Inversion' The host protection is active
|
|
=========================== ===============================
|
|
|
|
If KVM/VMX is enabled and the processor is vulnerable then the following
|
|
information is appended to the 'Mitigation: PTE Inversion' part:
|
|
|
|
- SMT status:
|
|
|
|
===================== ================
|
|
'VMX: SMT vulnerable' SMT is enabled
|
|
'VMX: SMT disabled' SMT is disabled
|
|
===================== ================
|
|
|
|
- L1D Flush mode:
|
|
|
|
================================ ====================================
|
|
'L1D vulnerable' L1D flushing is disabled
|
|
|
|
'L1D conditional cache flushes' L1D flush is conditionally enabled
|
|
|
|
'L1D cache flushes' L1D flush is unconditionally enabled
|
|
================================ ====================================
|
|
|
|
The resulting grade of protection is discussed in the following sections.
|
|
|
|
|
|
Host mitigation mechanism
|
|
-------------------------
|
|
|
|
The kernel is unconditionally protected against L1TF attacks from malicious
|
|
user space running on the host.
|
|
|
|
|
|
Guest mitigation mechanisms
|
|
---------------------------
|
|
|
|
.. _l1d_flush:
|
|
|
|
1. L1D flush on VMENTER
|
|
^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
To make sure that a guest cannot attack data which is present in the L1D
|
|
the hypervisor flushes the L1D before entering the guest.
|
|
|
|
Flushing the L1D evicts not only the data which should not be accessed
|
|
by a potentially malicious guest, it also flushes the guest
|
|
data. Flushing the L1D has a performance impact as the processor has to
|
|
bring the flushed guest data back into the L1D. Depending on the
|
|
frequency of VMEXIT/VMENTER and the type of computations in the guest
|
|
performance degradation in the range of 1% to 50% has been observed. For
|
|
scenarios where guest VMEXIT/VMENTER are rare the performance impact is
|
|
minimal. Virtio and mechanisms like posted interrupts are designed to
|
|
confine the VMEXITs to a bare minimum, but specific configurations and
|
|
application scenarios might still suffer from a high VMEXIT rate.
|
|
|
|
The kernel provides two L1D flush modes:
|
|
- conditional ('cond')
|
|
- unconditional ('always')
|
|
|
|
The conditional mode avoids L1D flushing after VMEXITs which execute
|
|
only audited code paths before the corresponding VMENTER. These code
|
|
paths have been verified that they cannot expose secrets or other
|
|
interesting data to an attacker, but they can leak information about the
|
|
address space layout of the hypervisor.
|
|
|
|
Unconditional mode flushes L1D on all VMENTER invocations and provides
|
|
maximum protection. It has a higher overhead than the conditional
|
|
mode. The overhead cannot be quantified correctly as it depends on the
|
|
workload scenario and the resulting number of VMEXITs.
|
|
|
|
The general recommendation is to enable L1D flush on VMENTER. The kernel
|
|
defaults to conditional mode on affected processors.
|
|
|
|
**Note**, that L1D flush does not prevent the SMT problem because the
|
|
sibling thread will also bring back its data into the L1D which makes it
|
|
attackable again.
|
|
|
|
L1D flush can be controlled by the administrator via the kernel command
|
|
line and sysfs control files. See :ref:`mitigation_control_command_line`
|
|
and :ref:`mitigation_control_kvm`.
|
|
|
|
.. _guest_confinement:
|
|
|
|
2. Guest VCPU confinement to dedicated physical cores
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
To address the SMT problem, it is possible to make a guest or a group of
|
|
guests affine to one or more physical cores. The proper mechanism for
|
|
that is to utilize exclusive cpusets to ensure that no other guest or
|
|
host tasks can run on these cores.
|
|
|
|
If only a single guest or related guests run on sibling SMT threads on
|
|
the same physical core then they can only attack their own memory and
|
|
restricted parts of the host memory.
|
|
|
|
Host memory is attackable, when one of the sibling SMT threads runs in
|
|
host OS (hypervisor) context and the other in guest context. The amount
|
|
of valuable information from the host OS context depends on the context
|
|
which the host OS executes, i.e. interrupts, soft interrupts and kernel
|
|
threads. The amount of valuable data from these contexts cannot be
|
|
declared as non-interesting for an attacker without deep inspection of
|
|
the code.
|
|
|
|
**Note**, that assigning guests to a fixed set of physical cores affects
|
|
the ability of the scheduler to do load balancing and might have
|
|
negative effects on CPU utilization depending on the hosting
|
|
scenario. Disabling SMT might be a viable alternative for particular
|
|
scenarios.
|
|
|
|
For further information about confining guests to a single or to a group
|
|
of cores consult the cpusets documentation:
|
|
|
|
https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
|
|
|
|
.. _interrupt_isolation:
|
|
|
|
3. Interrupt affinity
|
|
^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Interrupts can be made affine to logical CPUs. This is not universally
|
|
true because there are types of interrupts which are truly per CPU
|
|
interrupts, e.g. the local timer interrupt. Aside of that multi queue
|
|
devices affine their interrupts to single CPUs or groups of CPUs per
|
|
queue without allowing the administrator to control the affinities.
|
|
|
|
Moving the interrupts, which can be affinity controlled, away from CPUs
|
|
which run untrusted guests, reduces the attack vector space.
|
|
|
|
Whether the interrupts with are affine to CPUs, which run untrusted
|
|
guests, provide interesting data for an attacker depends on the system
|
|
configuration and the scenarios which run on the system. While for some
|
|
of the interrupts it can be assumed that they won't expose interesting
|
|
information beyond exposing hints about the host OS memory layout, there
|
|
is no way to make general assumptions.
|
|
|
|
Interrupt affinity can be controlled by the administrator via the
|
|
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
|
|
available at:
|
|
|
|
https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
|
|
|
|
.. _smt_control:
|
|
|
|
4. SMT control
|
|
^^^^^^^^^^^^^^
|
|
|
|
To prevent the SMT issues of L1TF it might be necessary to disable SMT
|
|
completely. Disabling SMT can have a significant performance impact, but
|
|
the impact depends on the hosting scenario and the type of workloads.
|
|
The impact of disabling SMT needs also to be weighted against the impact
|
|
of other mitigation solutions like confining guests to dedicated cores.
|
|
|
|
The kernel provides a sysfs interface to retrieve the status of SMT and
|
|
to control it. It also provides a kernel command line interface to
|
|
control SMT.
|
|
|
|
The kernel command line interface consists of the following options:
|
|
|
|
=========== ==========================================================
|
|
nosmt Affects the bring up of the secondary CPUs during boot. The
|
|
kernel tries to bring all present CPUs online during the
|
|
boot process. "nosmt" makes sure that from each physical
|
|
core only one - the so called primary (hyper) thread is
|
|
activated. Due to a design flaw of Intel processors related
|
|
to Machine Check Exceptions the non primary siblings have
|
|
to be brought up at least partially and are then shut down
|
|
again. "nosmt" can be undone via the sysfs interface.
|
|
|
|
nosmt=force Has the same effect as "nosmt" but it does not allow to
|
|
undo the SMT disable via the sysfs interface.
|
|
=========== ==========================================================
|
|
|
|
The sysfs interface provides two files:
|
|
|
|
- /sys/devices/system/cpu/smt/control
|
|
- /sys/devices/system/cpu/smt/active
|
|
|
|
/sys/devices/system/cpu/smt/control:
|
|
|
|
This file allows to read out the SMT control state and provides the
|
|
ability to disable or (re)enable SMT. The possible states are:
|
|
|
|
============== ===================================================
|
|
on SMT is supported by the CPU and enabled. All
|
|
logical CPUs can be onlined and offlined without
|
|
restrictions.
|
|
|
|
off SMT is supported by the CPU and disabled. Only
|
|
the so called primary SMT threads can be onlined
|
|
and offlined without restrictions. An attempt to
|
|
online a non-primary sibling is rejected
|
|
|
|
forceoff Same as 'off' but the state cannot be controlled.
|
|
Attempts to write to the control file are rejected.
|
|
|
|
notsupported The processor does not support SMT. It's therefore
|
|
not affected by the SMT implications of L1TF.
|
|
Attempts to write to the control file are rejected.
|
|
============== ===================================================
|
|
|
|
The possible states which can be written into this file to control SMT
|
|
state are:
|
|
|
|
- on
|
|
- off
|
|
- forceoff
|
|
|
|
/sys/devices/system/cpu/smt/active:
|
|
|
|
This file reports whether SMT is enabled and active, i.e. if on any
|
|
physical core two or more sibling threads are online.
|
|
|
|
SMT control is also possible at boot time via the l1tf kernel command
|
|
line parameter in combination with L1D flush control. See
|
|
:ref:`mitigation_control_command_line`.
|
|
|
|
5. Disabling EPT
|
|
^^^^^^^^^^^^^^^^
|
|
|
|
Disabling EPT for virtual machines provides full mitigation for L1TF even
|
|
with SMT enabled, because the effective page tables for guests are
|
|
managed and sanitized by the hypervisor. Though disabling EPT has a
|
|
significant performance impact especially when the Meltdown mitigation
|
|
KPTI is enabled.
|
|
|
|
EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
|
|
|
|
There is ongoing research and development for new mitigation mechanisms to
|
|
address the performance impact of disabling SMT or EPT.
|
|
|
|
.. _mitigation_control_command_line:
|
|
|
|
Mitigation control on the kernel command line
|
|
---------------------------------------------
|
|
|
|
The kernel command line allows to control the L1TF mitigations at boot
|
|
time with the option "l1tf=". The valid arguments for this option are:
|
|
|
|
============ =============================================================
|
|
full Provides all available mitigations for the L1TF
|
|
vulnerability. Disables SMT and enables all mitigations in
|
|
the hypervisors, i.e. unconditional L1D flushing
|
|
|
|
SMT control and L1D flush control via the sysfs interface
|
|
is still possible after boot. Hypervisors will issue a
|
|
warning when the first VM is started in a potentially
|
|
insecure configuration, i.e. SMT enabled or L1D flush
|
|
disabled.
|
|
|
|
full,force Same as 'full', but disables SMT and L1D flush runtime
|
|
control. Implies the 'nosmt=force' command line option.
|
|
(i.e. sysfs control of SMT is disabled.)
|
|
|
|
flush Leaves SMT enabled and enables the default hypervisor
|
|
mitigation, i.e. conditional L1D flushing
|
|
|
|
SMT control and L1D flush control via the sysfs interface
|
|
is still possible after boot. Hypervisors will issue a
|
|
warning when the first VM is started in a potentially
|
|
insecure configuration, i.e. SMT enabled or L1D flush
|
|
disabled.
|
|
|
|
flush,nosmt Disables SMT and enables the default hypervisor mitigation,
|
|
i.e. conditional L1D flushing.
|
|
|
|
SMT control and L1D flush control via the sysfs interface
|
|
is still possible after boot. Hypervisors will issue a
|
|
warning when the first VM is started in a potentially
|
|
insecure configuration, i.e. SMT enabled or L1D flush
|
|
disabled.
|
|
|
|
flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is
|
|
started in a potentially insecure configuration.
|
|
|
|
off Disables hypervisor mitigations and doesn't emit any
|
|
warnings.
|
|
============ =============================================================
|
|
|
|
The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
|
|
|
|
|
|
.. _mitigation_control_kvm:
|
|
|
|
Mitigation control for KVM - module parameter
|
|
-------------------------------------------------------------
|
|
|
|
The KVM hypervisor mitigation mechanism, flushing the L1D cache when
|
|
entering a guest, can be controlled with a module parameter.
|
|
|
|
The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
|
|
following arguments:
|
|
|
|
============ ==============================================================
|
|
always L1D cache flush on every VMENTER.
|
|
|
|
cond Flush L1D on VMENTER only when the code between VMEXIT and
|
|
VMENTER can leak host memory which is considered
|
|
interesting for an attacker. This still can leak host memory
|
|
which allows e.g. to determine the hosts address space layout.
|
|
|
|
never Disables the mitigation
|
|
============ ==============================================================
|
|
|
|
The parameter can be provided on the kernel command line, as a module
|
|
parameter when loading the modules and at runtime modified via the sysfs
|
|
file:
|
|
|
|
/sys/module/kvm_intel/parameters/vmentry_l1d_flush
|
|
|
|
The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
|
|
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
|
|
module parameter is ignored and writes to the sysfs file are rejected.
|
|
|
|
|
|
Mitigation selection guide
|
|
--------------------------
|
|
|
|
1. No virtualization in use
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The system is protected by the kernel unconditionally and no further
|
|
action is required.
|
|
|
|
2. Virtualization with trusted guests
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
If the guest comes from a trusted source and the guest OS kernel is
|
|
guaranteed to have the L1TF mitigations in place the system is fully
|
|
protected against L1TF and no further action is required.
|
|
|
|
To avoid the overhead of the default L1D flushing on VMENTER the
|
|
administrator can disable the flushing via the kernel command line and
|
|
sysfs control files. See :ref:`mitigation_control_command_line` and
|
|
:ref:`mitigation_control_kvm`.
|
|
|
|
|
|
3. Virtualization with untrusted guests
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
3.1. SMT not supported or disabled
|
|
""""""""""""""""""""""""""""""""""
|
|
|
|
If SMT is not supported by the processor or disabled in the BIOS or by
|
|
the kernel, it's only required to enforce L1D flushing on VMENTER.
|
|
|
|
Conditional L1D flushing is the default behaviour and can be tuned. See
|
|
:ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
|
|
|
|
3.2. EPT not supported or disabled
|
|
""""""""""""""""""""""""""""""""""
|
|
|
|
If EPT is not supported by the processor or disabled in the hypervisor,
|
|
the system is fully protected. SMT can stay enabled and L1D flushing on
|
|
VMENTER is not required.
|
|
|
|
EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
|
|
|
|
3.3. SMT and EPT supported and active
|
|
"""""""""""""""""""""""""""""""""""""
|
|
|
|
If SMT and EPT are supported and active then various degrees of
|
|
mitigations can be employed:
|
|
|
|
- L1D flushing on VMENTER:
|
|
|
|
L1D flushing on VMENTER is the minimal protection requirement, but it
|
|
is only potent in combination with other mitigation methods.
|
|
|
|
Conditional L1D flushing is the default behaviour and can be tuned. See
|
|
:ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
|
|
|
|
- Guest confinement:
|
|
|
|
Confinement of guests to a single or a group of physical cores which
|
|
are not running any other processes, can reduce the attack surface
|
|
significantly, but interrupts, soft interrupts and kernel threads can
|
|
still expose valuable data to a potential attacker. See
|
|
:ref:`guest_confinement`.
|
|
|
|
- Interrupt isolation:
|
|
|
|
Isolating the guest CPUs from interrupts can reduce the attack surface
|
|
further, but still allows a malicious guest to explore a limited amount
|
|
of host physical memory. This can at least be used to gain knowledge
|
|
about the host address space layout. The interrupts which have a fixed
|
|
affinity to the CPUs which run the untrusted guests can depending on
|
|
the scenario still trigger soft interrupts and schedule kernel threads
|
|
which might expose valuable information. See
|
|
:ref:`interrupt_isolation`.
|
|
|
|
The above three mitigation methods combined can provide protection to a
|
|
certain degree, but the risk of the remaining attack surface has to be
|
|
carefully analyzed. For full protection the following methods are
|
|
available:
|
|
|
|
- Disabling SMT:
|
|
|
|
Disabling SMT and enforcing the L1D flushing provides the maximum
|
|
amount of protection. This mitigation is not depending on any of the
|
|
above mitigation methods.
|
|
|
|
SMT control and L1D flushing can be tuned by the command line
|
|
parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
|
|
time with the matching sysfs control files. See :ref:`smt_control`,
|
|
:ref:`mitigation_control_command_line` and
|
|
:ref:`mitigation_control_kvm`.
|
|
|
|
- Disabling EPT:
|
|
|
|
Disabling EPT provides the maximum amount of protection as well. It is
|
|
not depending on any of the above mitigation methods. SMT can stay
|
|
enabled and L1D flushing is not required, but the performance impact is
|
|
significant.
|
|
|
|
EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
|
|
parameter.
|
|
|
|
3.4. Nested virtual machines
|
|
""""""""""""""""""""""""""""
|
|
|
|
When nested virtualization is in use, three operating systems are involved:
|
|
the bare metal hypervisor, the nested hypervisor and the nested virtual
|
|
machine. VMENTER operations from the nested hypervisor into the nested
|
|
guest will always be processed by the bare metal hypervisor. If KVM is the
|
|
bare metal hypervisor it will:
|
|
|
|
- Flush the L1D cache on every switch from the nested hypervisor to the
|
|
nested virtual machine, so that the nested hypervisor's secrets are not
|
|
exposed to the nested virtual machine;
|
|
|
|
- Flush the L1D cache on every switch from the nested virtual machine to
|
|
the nested hypervisor; this is a complex operation, and flushing the L1D
|
|
cache avoids that the bare metal hypervisor's secrets are exposed to the
|
|
nested virtual machine;
|
|
|
|
- Instruct the nested hypervisor to not perform any L1D cache flush. This
|
|
is an optimization to avoid double L1D flushing.
|
|
|
|
|
|
.. _default_mitigations:
|
|
|
|
Default mitigations
|
|
-------------------
|
|
|
|
The kernel default mitigations for vulnerable processors are:
|
|
|
|
- PTE inversion to protect against malicious user space. This is done
|
|
unconditionally and cannot be controlled.
|
|
|
|
- L1D conditional flushing on VMENTER when EPT is enabled for
|
|
a guest.
|
|
|
|
The kernel does not by default enforce the disabling of SMT, which leaves
|
|
SMT systems vulnerable when running untrusted guests with EPT enabled.
|
|
|
|
The rationale for this choice is:
|
|
|
|
- Force disabling SMT can break existing setups, especially with
|
|
unattended updates.
|
|
|
|
- If regular users run untrusted guests on their machine, then L1TF is
|
|
just an add on to other malware which might be embedded in an untrusted
|
|
guest, e.g. spam-bots or attacks on the local network.
|
|
|
|
There is no technical way to prevent a user from running untrusted code
|
|
on their machines blindly.
|
|
|
|
- It's technically extremely unlikely and from today's knowledge even
|
|
impossible that L1TF can be exploited via the most popular attack
|
|
mechanisms like JavaScript because these mechanisms have no way to
|
|
control PTEs. If this would be possible and not other mitigation would
|
|
be possible, then the default might be different.
|
|
|
|
- The administrators of cloud and hosting setups have to carefully
|
|
analyze the risk for their scenarios and make the appropriate
|
|
mitigation choices, which might even vary across their deployed
|
|
machines and also result in other changes of their overall setup.
|
|
There is no way for the kernel to provide a sensible default for this
|
|
kind of scenarios.
|