When switching processes, currently all user SLBEs are cleared, and a
few (exec_base, pc, and stack) are preloaded. In trivial testing with
small apps, this tends to miss the heap and low 256MB segments, and it
will also miss commonly accessed segments on large memory workloads.
Add a simple round-robin preload cache that just inserts the last SLB
miss into the head of the cache and preloads those at context switch
time. Every 256 context switches, the oldest entry is removed from the
cache to shrink the cache and require fewer slbmte if they are unused.
Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study of
large memory workloads. But this is a simple thing we can do now that
is an obvious win for common workloads.
With the full series, process switching speed on the context_switch
benchmark on POWER9/hash (with kernel speculation security masures
disabled) increases from 140K/s to 178K/s (27%).
POWER8 does not change much (within 1%), it's unclear why it does not
see a big gain like POWER9.
Booting to busybox init with 256MB segments has SLB misses go down
from 945 to 69, and with 1T segments 900 to 21. These could almost all
be eliminated by preloading a bit more carefully with ELF binary
loading.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This will be used by the SLB code in the next patch, but for now this
sets the slb_addr_limit to the correct size for 32-bit tasks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add 32-entry bitmaps to track the allocation status of the first 32
SLB entries, and whether they are user or kernel entries. These are
used to allocate free SLB entries first, before resorting to the round
robin allocator.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
User SLB mappig data is copied into the PACA from the mm->context so
it can be accessed by the SLB miss handlers.
After the C conversion, SLB miss handlers now run with relocation on,
and user SLB misses are able to take recursive kernel SLB misses, so
the user SLB mapping data can be removed from the paca and accessed
directly.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.
This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.
Arbitrary kernel memory may not be accessed when handling kernel space
SLB misses, so care should be taken there. However user SLB misses can
access any kernel memory, which can be used to move some fields out of
the paca (in later patches).
User SLB misses could quite easily reconcile IRQs and set up a first
class kernel environment and exit via ret_from_except, however that
doesn't seem to be necessary at the moment, so we only do that if a
bad fault is encountered.
[ Credit to Aneesh for bug fixes, error checks, and improvements to bad
address handling, etc ]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Since RFC:
- Added MSR[RI] handling
- Fixed up a register loss bug exposed by irq tracing (Aneesh)
- Reject misses outside the defined kernel regions (Aneesh)
- Added several more sanity checks and error handling (Aneesh), we may
look at consolidating these tests and tightenig up the code but for
a first pass we decided it's better to check carefully.
Since v1:
- Fixed SLB cache corruption (Aneesh)
- Fixed untidy SLBE allocation "leak" in get_vsid error case
- Now survives some stress testing on real hardware
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 introduces SLBIA IH=3, which invalidates all SLB entries and
associated lookaside information that have a class value of 1, which
Linux assigns to user addresses. This matches what switch_slb wants,
and allows a simple fast implementation that avoids the slb_cache
complexity.
As a side-effect, the POWER5 < DD2.1 SLB invalidation workaround is
also avoided on POWER9.
Process context switching rate is improved about 2.2% for a small
process that hits the slb cache which is the best case for the current
code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The SLBIA IH=1 hint will remove all non-zero SLBEs, but only
invalidate ERAT entries associated with a class value of 1, for
processors that support the hint (e.g., POWER6 and newer), which
Linux assigns to user addresses.
This prevents kernel ERAT entries from being invalidated when
context switchig (if the thread faulted in more than 8 user SLBEs).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove the vmalloc segment from bolted SLBEs. This is not required to
be bolted, and seems like it was added to help pre-load the SLB on
context switch. However there are now other segments like the vmemmap
segment and non-zero node memory that often take misses after a context
switch, so it is better to solve this in a more general way.
A subsequent change will track free SLB entries and uses those rather
than round-robin overwrite valid entries, which makes it far less
likely for kernel SLBEs to be evicted after they are installed.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The POWER5 < DD2.1 issue is that slbie needs to be issued more than
once. It came in with this change:
ChangeSet@1.1608, 2004-04-29 07:12:31-07:00, david@gibson.dropbear.id.au
[PATCH] POWER5 erratum workaround
Early POWER5 revisions (<DD2.1) have a problem requiring slbie
instructions to be repeated under some circumstances. The patch below
adds a workaround (patch made by Anton Blanchard).
(aka. 3e4520f7605243abf66a7ccd3d2e49e48e8c0483 in the full history tree)
The extra slbie in switch_slb is done even for the case where slbia is
called (slb_flush_and_rebolt). I don't believe that is required
because there are other slb_flush_and_rebolt callers which do not
issue the workaround slbie, which would be broken if it was required.
It also seems to be fine inside the isync with the first slbie, as it
is in the kernel stack switch code.
So move this workaround to where it is required. This is not much of
an optimisation because this is the fast path, but it makes the code
more understandable and neater.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Retain slbie_data initialisation to avoid compiler warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
I only have POWER8/9 to test, so just remove it for those.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This causes SLB alloation to start 1 beyond the start of the SLB.
There is no real problem because after it wraps it stats behaving
properly, it's just surprisig to see when looking at SLB traces.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that other platforms also implements real mode mce handler,
lets consolidate the code by sharing existing powernv machine check
early code. Rename machine_check_powernv_early to
machine_check_common_early and reuse the code.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Extract the MCE error details from RTAS extended log and display it to
console.
With this patch you should now see mce logs like below:
[ 142.371818] Severe Machine check interrupt [Recovered]
[ 142.371822] NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[ 142.371822] Initiator: CPU
[ 142.371823] Error type: SLB [Multihit]
[ 142.371824] Effective address: d00000000ca70000
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On pseries, as of today system crashes if we get a machine check
exceptions due to SLB errors. These are soft errors and can be fixed
by flushing the SLBs so the kernel can continue to function instead of
system crash. We do this in real mode before turning on MMU. Otherwise
we would run into nested machine checks. This patch now fetches the
rtas error log in real mode and flushes the SLBs on SLB/ERAT errors.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On pseries, the machine check error details are part of RTAS extended
event log passed under Machine check exception section. This patch adds
the definition of rtas MCE event section and related helper
functions.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The tbl pointer is being derefenced by IOMMU_PAGE_SIZE prior the check
if it is not NULL.
Just moving the dereference code to after the check, where there will
be guarantee that 'tbl' will not be NULL.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Function xive_native_get_ipi() might use chip_id without it being
initialized, if the CPU node is not found, as reported by smatch:
error: uninitialized symbol 'chip_id'
As suggested by Cédric, we can use xc->chip_id instead of consulting
the device tree for chip id, which is safe since xive_prepare_cpu()
should have initialized ->chip_id by the time xive_native_get_ipi() is
called.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[mpe: Tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When hot-removing memory release_mem_region_adjustable() splits iomem
resources if they are not the exact size of the memory being
hot-deleted. Adding this memory back to the kernel adds a new resource.
Eg a node has memory 0x0 - 0xfffffffff. Hot-removing 1GB from
0xf40000000 results in the single resource 0x0-0xfffffffff being split
into two resources: 0x0-0xf3fffffff and 0xf80000000-0xfffffffff.
When we hot-add the memory back we now have three resources:
0x0-0xf3fffffff, 0xf40000000-0xf7fffffff, and 0xf80000000-0xfffffffff.
This is an issue if we try to remove some memory that overlaps
resources. Eg when trying to remove 2GB at address 0xf40000000,
release_mem_region_adjustable() fails as it expects the chunk of memory
to be within the boundaries of a single resource. We then get the
warning: "Unable to release resource" and attempting to use memtrace
again gives us this error: "bash: echo: write error: Resource
temporarily unavailable"
This patch makes memtrace remove memory in chunks that are always the
same size from an address that is always equal to end_of_memory -
n*size, for some n. So hotremoving and hotadding memory of different
sizes will now not attempt to remove memory that spans multiple
resources.
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Static branch hints override dynamic branch prediction on recent
POWER CPUs. We should only use them when we are overwhelmingly
sure of the direction.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This hypervisor's call allows to remove up to 8 ptes with only call to
tlbie.
The virtual pages must be all within the same naturally aligned 8 pages
virtual address block and have the same page and segment size encodings.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This part of code will be called also when dealing with H_BLOCK_REMOVE.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This feature tells if the hcall H_BLOCK_REMOVE is available.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch simply fix part of the documentation on the HTM code.
This fixes reference to old fields that were renamed in commit
000ec280e3 ("powerpc: tm: Rename transct_(*) to ck(\1)_state")
It also documents better the flow after commit eb5c3f1c86 ("powerpc:
Always save/restore checkpointed regs during treclaim/trecheckpoint"),
where tm_recheckpoint can recheckpoint what is in ck{fp,vr}_state
blindly.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Deciding wich govenors should be built into the kernel can be left to
users to configure.
Fixes: 81f359027a ("cpufreq: powernv: Select CPUFreq related Kconfig options for powernv")
Signed-off-by: Joel Stanley <joel@jms.id.au>
[mpe: Update powernv/ppc64 defconfigs to enable them by default]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Call Frame Information is used by gdb for back-traces and inserting
breakpoints on function return for the "finish" command. This failed
when inside __kernel_clock_gettime. More concerning than difficulty
debugging is that CFI is also used by stack frame unwinding code to
implement exceptions. If you have an app that needs to handle
asynchronous exceptions for some reason, and you are unlucky enough to
get one inside the VDSO time functions, your app will crash.
What's wrong: There is control flow in __kernel_clock_gettime that
reaches label 99 without saving lr in r12. CFI info however is
interpreted by the unwinder without reference to control flow: It's a
simple matter of "Execute all the CFI opcodes up to the current
address". That means the unwinder thinks r12 contains the return
address at label 99. Disabuse it of that notion by resetting CFI for
the return address at label 99.
Note that the ".cfi_restore lr" could have gone anywhere from the
"mtlr r12" a few instructions earlier to the instruction at label 99.
I put the CFI as late as possible, because in general that's best
practice (and if possible grouped with other CFI in order to reduce
the number of CFI opcodes executed when unwinding). Using r12 as the
return address is perfectly fine after the "mtlr r12" since r12 on
that code path still contains the return address.
__get_datapage also has a CFI error. That function temporarily saves
lr in r0, and reflects that fact with ".cfi_register lr,r0". A later
use of r0 means the CFI at that point isn't correct, as r0 no longer
contains the return address. Fix that too.
Signed-off-by: Alan Modra <amodra@gmail.com>
Tested-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Currently on P9N DD2.1 we end up taking infinite TM facility
unavailable exceptions on the first TM usage by userspace.
In the special case of TM no suspend (P9N DD2.1), Linux is told TM is
off via CPU dt-ftrs but told to (partially) use it via
OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. So HFSCR[TM] will be off from
dt-ftrs but we need to turn it on for the no suspend case.
This patch fixes this by enabling HFSCR TM in this case.
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Prevent multiplication result truncation on 32bit. Introduced with
the early timestamp reworrk.
- Ensure microcode revision storage to be consistent under all
circumstances
- Prevent write tearing of PTEs
- Prevent confusion of user and kernel reegisters when dumping fatal
signals verbosely
- Make an error return value in a failure path of the vector
allocation negative. Returning EINVAL might the caller assume
success and causes further wreckage.
- A trivial kernel doc warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Use WRITE_ONCE() when setting PTEs
x86/apic/vector: Make error return value negative
x86/process: Don't mix user/kernel regs in 64bit __show_regs()
x86/tsc: Prevent result truncation on 32bit
x86: Fix kernel-doc atomic.h warnings
x86/microcode: Update the new microcode revision unconditionally
x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
ARM:
- Fix a VFP corruption in 32-bit guest
- Add missing cache invalidation for CoW pages
- Two small cleanups
s390:
- Fallout from the hugetlbfs support: pfmf interpretion and locking
- VSIE: fix keywrapping for nested guests
PPC:
- Fix a bug where pages might not get marked dirty, causing
guest memory corruption on migration,
- Fix a bug causing reads from guest memory to use the wrong guest
real address for very large HPT guests (>256G of memory), leading to
failures in instruction emulation.
x86:
- Fix out of bound access from malicious pv ipi hypercalls (introduced
in rc1)
- Fix delivery of pending interrupts when entering a nested guest,
preventing arbitrarily late injection
- Sanitize kvm_stat output after destroying a guest
- Fix infinite loop when emulating a nested guest page fault
and improve the surrounding emulation code
- Two minor cleanups
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJbk5gAAAoJEED/6hsPKofoS0UH/1clCzg/8x3jhpDcKKp6tDm7
9XHOOQ6XmydT0HXYJNqZepGNqU99ip+2u4x8E9LCT5MTvTMZ1BcNM6PmenjJVULY
GMJtwZhjqoklrOcNkXGqIye4Ec+I0pBuMmt0AN0N85CcHO8VUBpMzsdxgJLuxcRm
UT6OZnCLyJsock6BqkZmqVsJj/gemFnI9MpudnrU8cCFk60roXmQWJ66fMIFfKjt
q0R61t8nmbapQKE8pjqBNgbCsuotVOtU1zgMkeM5LkaYEfc65ZPdgt3sdpyG8Guq
WA7Vt6HEvmNrcQxHFX5P0GxTVM9lOVCUx1bKXE4+57CMZOYl/8hDaTudlcacutg=
=FyuN
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a VFP corruption in 32-bit guest
- Add missing cache invalidation for CoW pages
- Two small cleanups
s390:
- Fallout from the hugetlbfs support: pfmf interpretion and locking
- VSIE: fix keywrapping for nested guests
PPC:
- Fix a bug where pages might not get marked dirty, causing guest
memory corruption on migration
- Fix a bug causing reads from guest memory to use the wrong guest
real address for very large HPT guests (>256G of memory), leading
to failures in instruction emulation.
x86:
- Fix out of bound access from malicious pv ipi hypercalls
(introduced in rc1)
- Fix delivery of pending interrupts when entering a nested guest,
preventing arbitrarily late injection
- Sanitize kvm_stat output after destroying a guest
- Fix infinite loop when emulating a nested guest page fault and
improve the surrounding emulation code
- Two minor cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: LAPIC: Fix pv ipis out-of-bounds access
KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2
arm64: KVM: Remove pgd_lock
KVM: Remove obsolete kvm_unmap_hva notifier backend
arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW
KVM: s390: Properly lock mm context allow_gmap_hpage_1m setting
KVM: s390: vsie: copy wrapping keys to right place
KVM: s390: Fix pfmf and conditional skey emulation
tools/kvm_stat: re-animate display of dead guests
tools/kvm_stat: indicate dead guests as such
tools/kvm_stat: handle guest removals more gracefully
tools/kvm_stat: don't reset stats when setting PID filter for debugfs
tools/kvm_stat: fix updates for dead guests
tools/kvm_stat: fix handling of invalid paths in debugfs provider
tools/kvm_stat: fix python3 issues
KVM: x86: Unexport x86_emulate_instruction()
KVM: x86: Rename emulate_instruction() to kvm_emulate_instruction()
KVM: x86: Do not re-{try,execute} after failed emulation in L2
KVM: x86: Default to not allowing emulation retry in kvm_mmu_page_fault
...
A few more fixes who have trickled in:
- MMC bus width fixup for some Allwinner platforms
- Fix for NULL deref in ti-aemif when no platform data is passed in
- Fix div by 0 in SCMI code
- Add a missing module alias in a new RPi driver
-----BEGIN PGP SIGNATURE-----
iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAluUAp0PHG9sb2ZAbGl4
b20ubmV0AAoJEIwa5zzehBx3+6YP/2T9NuOUTjssbVBho92lF9dV58Y5xOgDv9wX
mFT7gePXovTPQrgrpDi4RWrv0wAkjMa3grJfL2RGZXSZtsgkyHstb3mXf1O6sbnF
Ry1yc4ByJ0+JKJRq2tBxhQmLpBVFNXiav4vhIdPNZRdtZid7WzZaqF0JrCj6iyNf
CDhiGFRAZC9NcaCdOvI0aHFVC47Cp/Uacbh3PzZmdRWJJ2rCGO9X4vwQoMai/1cq
vVuiOBOs2ArXQQvvDoVixb3sCcdblCsDoS57lArJ5jKrHFm8iu6Z2+6UGhi2QEhc
9PKp5tySctWVqitOn0Ueixq+nKCXF3/dVAqjMVViSfC7G0Pt2XIAeqZU+2Ou3Zkj
nFcHqTZAXfSs6I1hnXqJYQ9Me3JzwQ+pRFJY8/+tbq2eGv7eZzUuzUppr13eF62s
NeBzJiGiI7ab9sGJknhmoXVDyuB7ctuZXA8JgO/kZvL8dfuWcF3GNocs2p9916JD
uWGwnfXiTLMhbxKkYrjaOClaVyx2bf996M3Z4NqxBQ9XGNXyh+V/6bzUh9DGPSL0
+9W7YcRFT08v4I1Zh7/P5zXVAOyqj3awWeD6gpg7PAsmKPdN/f17EEqk6KH7rOVZ
Vvw3/w+Ef9u4onGpbpE/IyCco75vXrv1GtkHMX7VlMjLe0eAv5Cpw7UwLDO2tVnu
pEJFkk45
=oZbn
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A few more fixes who have trickled in:
- MMC bus width fixup for some Allwinner platforms
- Fix for NULL deref in ti-aemif when no platform data is passed in
- Fix div by 0 in SCMI code
- Add a missing module alias in a new RPi driver"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
memory: ti-aemif: fix a potential NULL-pointer dereference
firmware: arm_scmi: fix divide by zero when sustained_perf_level is zero
hwmon: rpi: add module alias to raspberrypi-hwmon
arm64: allwinner: dts: h6: fix Pine H64 MMC bus width
Just one fix for H6 mmc on the Pine H64: the mmc bus width was missing
from the device tree. This was added in 4.19-rc1.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCgAsFiEE2nN1m/hhnkhOWjtHOJpUIZwPJDAFAluQmz0OHHdlbnNAY3Np
ZS5vcmcACgkQOJpUIZwPJDD01RAApxHO0v7/7y9w/8pGGpjpTYpliF9lndaQYD3o
+Xc/Y7bcy2+Iy4Lz0TbkOObjIVUoLsQcpGKvttHa/gIsjbgd9xBpxd5X2PVBRmWx
/ERA5HdMG4RvznLD3P7X0JOAL/3w1ad/4DarOHOibqYk3KqX+iG6kphIRx326INt
SSqPZNNub/LXmHSUnyprQ+ccfKs87uiy9dT1LrSTxXGjh9tdXXmkGmDCOSX+oCKm
EXeFIK1uTmyGyE8OXa2NbCktwNylw6c4XwcaWLIPQeJTEW6oVh95IkewBphi+nFw
rU82W2aqCGqP2EYHJwzD7zx53V7cGAJVkb/u3ENXSXgE/kyTdmoFukxWRb7upfEb
9bjgQUMQ+6RG1f5lDYIHSVNXdk81AshMc1Y7qKG5EoCfJUIcG0gyyQYpO+lKji7V
nvTeiA0882a/PMYYkGU7vWGD7oIuPHEWEmnSZDWUNsqcKXaX5b3km/BsoLfTii9a
45MDQ9Wo2B26PL6zflN78BrDfuX+UgmX1bbxY0b+rOal4CKuz+VqwEnQIumu1SYE
9GaMHFKGMh2JCQ/U8o4AGdomEUjX79dgZbwz7W4KBnaS7K4iKrQfxcKLFXcXLtI9
EaA4nNsHeIe6ByE5z4FNVUPHEcLkfqlpqdFBRdd/xt+MfDYQaorh73NQfGvN4s0x
3pGu1fI=
=WbHO
-----END PGP SIGNATURE-----
Merge tag 'sunxi-fixes-for-4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes
Allwinner fixes for 4.19
Just one fix for H6 mmc on the Pine H64: the mmc bus width was missing
from the device tree. This was added in 4.19-rc1.
* tag 'sunxi-fixes-for-4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: allwinner: dts: h6: fix Pine H64 MMC bus width
Signed-off-by: Olof Johansson <olof@lixom.net>
When page-table entries are set, the compiler might optimize their
assignment by using multiple instructions to set the PTE. This might
turn into a security hazard if the user somehow manages to use the
interim PTE. L1TF does not make our lives easier, making even an interim
non-present PTE a security hazard.
Using WRITE_ONCE() to set PTEs and friends should prevent this potential
security hazard.
I skimmed the differences in the binary with and without this patch. The
differences are (obviously) greater when CONFIG_PARAVIRT=n as more
code optimizations are possible. For better and worse, the impact on the
binary with this patch is pretty small. Skimming the code did not cause
anything to jump out as a security hazard, but it seems that at least
move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
activate_managed() returns EINVAL instead of -EINVAL in case of
error. While this is unlikely to happen, the positive return value would
cause further malfunction at the call site.
Fixes: 2db1f959d9 ("x86/vector: Handle managed interrupts proper")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
- Remove accidental VM_WARN_ON
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJbkorgAAoJELescNyEwWM0my8IAKsVsc5heKBeL/0Ep5gfXJLS
H3kjkToFKfOeVADLfZXfTkPzlx9f1NrEP4+b/hQYgqGqXQcvCIwEXzpTMFg4pT4/
ERhYtq9qYBNQmg4AZnTHl2cKSRFt+s7knTZMoTEwNk1NxdBQAtbIZa9HB9Ly2mSn
xK6UP7zsZvRcY02BlyDQ0A/QBjzQAi3I83FRLizxjPYaSUhF0QqhrzTr0ANoKEjv
DnX04nJEMYqLEjSKWTn3rzot2PgLVDcMEjXKwMB3XB6LML3KLRUsvnTpxED5c+dW
tv+wzKKdaFeHWmfFxUgYZXSd4igh0IKf3OZDohRKz+lNOhKrTYUE35dtFyyw04I=
=xYN8
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"Just one small fix here, preventing a VM_WARN_ON when a !present
PMD/PUD is "freed" as part of a huge ioremap() operation.
The correct behaviour is to skip the free silently in this case, which
is a little weird (the function is a bit of a misnomer), but it
follows the x86 implementation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fix erroneous warnings in page freeing functions
Dan Carpenter reported that the untrusted data returns from kvm_register_read()
results in the following static checker warning:
arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi()
error: buffer underflow 'map->phys_map' 's32min-s32max'
KVM guest can easily trigger this by executing the following assembly sequence
in Ring0:
mov $10, %rax
mov $0xFFFFFFFF, %rbx
mov $0xFFFFFFFF, %rdx
mov $0, %rsi
vmcall
As this will cause KVM to execute the following code-path:
vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi()
which will reach out-of-bounds access.
This patch fixes it by adding a check to kvm_pv_send_ipi() against map->max_apic_id,
ignoring destinations that are not present and delivering the rest. We also check
whether or not map->phys_map[min + i] is NULL since the max_apic_id is set to the
max apic id, some phys_map maybe NULL when apic id is sparse, especially kvm
unconditionally set max_apic_id to 255 to reserve enough space for any xAPIC ID.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add second "if (min > map->max_apic_id)" to complete the fix. -Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Consider the case L1 had a IRQ/NMI event until it executed
VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
(e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
L0 needs to evaluate if this pending event should cause an exit from
L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
EXTERNAL_INTERRUPT).
Usually this would be handled by L0 requesting a IRQ/NMI window
by setting VMCS accordingly. However, this setting was done on
VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
requesting a KVM_REQ_EVENT.
Note that above scenario exists when L1 KVM is about to enter L2 but
requests an "immediate-exit". As in this case, L1 will
disable-interrupts and then send a self-IPI before entering L2.
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
- Fix a VFP corruption in 32-bit guest
- Add missing cache invalidation for CoW pages
- Two small cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJbkngmAAoJEEtpOizt6ddyeaoH/15bbGHlwWf23tGjSoDzhyD4
zAXfy+SJdm4cR8K7jEkVrNffkEMAby7Zl28hTHKB9jsY1K8DD+EuCE3Nd4kkVAsc
iHJwV4aiHil/zC5SyE0MqMzELeS8UhsxESYebG6yNF0ElQDQ0SG+QAFr47/OBN9S
u4I7x0rhyJP6Kg8z9U4KtEX0hM6C7VVunGWu44/xZSAecTaMuJnItCIM4UMdEkSs
xpAoI59lwM6BWrXLvEunekAkxEXoR7AVpQER2PDINoLK2I0i0oavhPim9Xdt2ZXs
rqQqfmwmPOVvYbexDp97JtfWo3/psGLqvgoK1tq9bzF3u6Y3ylnUK5IspyVYwuQ=
=TK8A
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-fixes-for-v4.19-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
Fixes for KVM/ARM for Linux v4.19 v2:
- Fix a VFP corruption in 32-bit guest
- Add missing cache invalidation for CoW pages
- Two small cleanups
The lock has never been used and the page tables are protected by
mmu_lock in struct kvm.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to
deal with. Drop the now obsolete code.
Fixes: fb1522e099 ("KVM: update to new mmu_notifier semantic v2")
Cc: James Hogan <jhogan@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
If trapping FPSIMD in the context of an AArch32 guest, it is critical
to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
not EL1.
Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
if we're not going to trap FPSIMD, as we then corrupt the existing
VFP state.
Moving the call to __activate_traps_fpsimd32 to the point where we
know for sure that we are going to trap ensures that we don't set that
bit spuriously.
Fixes: e6b673b741 ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
Cc: stable@vger.kernel.org # v4.18
Cc: Dave Martin <dave.martin@arm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Pull m68knommu fix from Greg Ungerer:
"A single change to fix booting on ColdFire platforms that have RAM
starting at a non-0 address"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: fix early memory reservation for ColdFire MMU systems
for systems with dcache aliasing. Those systems could previously observe
stale data, causing clock_gettime() & gettimeofday() to return incorrect
values.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW5GjkhUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN3LmgEA3bCpFJNpj6Ep4ffwHx6bLOgHzr3T
1CPx6UWPktIBZIgBAMmOWzCC5U/fqp3974lwohEG0orAzOfpuLn6uX9X0m4O
=TEC0
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_4.19_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fix from Paul Burton:
"A single fix for v4.19-rc3, resolving a problem with our VDSO data
page for systems with dcache aliasing. Those systems could previously
observe stale data, causing clock_gettime() & gettimeofday() to return
incorrect values"
* tag 'mips_fixes_4.19_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: VDSO: Match data page cache colouring when D$ aliases
In pmd_free_pte_page() and pud_free_pmd_page() we try to warn if they
hit a present non-table entry. In both cases we'll warn for non-present
entries, as the VM_WARN_ON() only checks the entry is not a table entry.
This has been observed to result in warnings when booting a v4.19-rc2
kernel under qemu.
Fix this by bailing out earlier for non-present entries.
Fixes: ec28bb9c9b ("arm64: Implement page table free interfaces")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When the kernel.print-fatal-signals sysctl has been enabled, a simple
userspace crash will cause the kernel to write a crash dump that contains,
among other things, the kernel gsbase into dmesg.
As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE
in this case.
This also moves the bitness-specific logic from show_regs() into
process_{32,64}.c.
Fixes: 45807a1df9 ("vdso: print fatal signals")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180831194151.123586-1-jannh@google.com
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then
dividing it by HZ.
Both tsc_khz and the temporary variable holding the multiplication result
are of type unsigned long, so on 32bit the result is truncated to the lower
32bit.
Use u64 as type for the temporary variable and cast tsc_khz to it before
multiplying.
[ tglx: Massaged changelog and removed pointless braces ]
Fixes: cf7a63ef4e ("x86/tsc: Calibrate tsc only once")
Signed-off-by: Chuanhua Lei <chuanhua.lei@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yixin.zhu@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@microsoft.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com
Here is the nds32 patch set based on 4.19-rc2.
Contained in here are the bug fixes, building error fixes and ftrace support
for nds32.
These are the LTP20170427 testing results.
Total Tests: 1902
Total Skipped Tests: 592
Total Failures: 420
Kernel Version: 4.19.0-rc2-00018-g2c9d30cc16f0-dirty
Machine Architecture: nds32
Signed-off-by: Greentime Hu <greentime@andestech.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
iQIcBAABAgAGBQJbj0lJAAoJEHfB0l0b2JxElb4P/3YWwh0q3kqbUxMmfm0Dp2zv
SIFQwD2N1HGs0ZW4vB4hVEYhMz3qWJdkzRRzkdLxAdPVnJ0Prc6jBmQtwvBmYcWG
zWGcUkF1fGSQ+2XAqXZeqbUd4/GIApcdOVimquOZFgKnvprhymwqc2jlf3Zed42B
EM1sdxmP2ADsNQo+sz/BmhbLtBNzfFxwSo9KmstArVNNkkwOVTnzowzg6PxYSvMu
3cVFk4iYcoAVc8JEbUN/pObq9mdVs0xnDbilgWAdpFBcnk020V8GTB0PbY3PnD9G
PRfYV/4zwkVAviqkbBV8LXQD4joR9vSSjp+tk9sT4WUYXK4EjUyHvg0iWiV5pnAB
NsFzlH9WWQWnp3VSVBfP2mIE2j2A3iGzUZVfDQbI2lwNI4GI0AKUZtscpFDRz/Pw
J0s/FdXilKWDviefHcX+C31dkH0ZPCm4lymGWgv8le158Bo6BOTo6aRAsTnP33IN
VOdODET0zf14v+5FooL/5T75EiEoS1MtLC9cMA1U4XZ3p3GrEOiSuNIMetZQ+cHd
Z+FPflfAyDaSBFJzRyLohnOBOaWWDNe6CyGMzIKZ4qSRz0BltW12Ig6LfMUZXEMN
U0jmS4b8rWYHiOhSOCKsg95GBYGDUocJj1RRmoBLy0+Mq1yf+V/r/GnACouFOTEZ
VQgbEAzh2rcLJJi5GoFp
=DJCS
-----END PGP SIGNATURE-----
Merge tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
Pull nds32 updates from Greentime Hu:
"Contained in here are the bug fixes, building error fixes and ftrace
support for nds32"
* tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
nds32: linker script: GCOV kernel may refers data in __exit
nds32: fix build error because of wrong semicolon
nds32: Fix a kernel panic issue because of wrong frame pointer access.
nds32: Only print one page of stack when die to prevent printing too much information.
nds32: Add macro definition for offset of lp register on stack
nds32: Remove the deprecated ABI implementation
nds32/stack: Get real return address by using ftrace_graph_ret_addr
nds32/ftrace: Support dynamic function graph tracer
nds32/ftrace: Support dynamic function tracer
nds32/ftrace: Add RECORD_MCOUNT support
nds32/ftrace: Support static function graph tracer
nds32/ftrace: Support static function tracer
nds32: Extract the checking and getting pointer to a macro
nds32: Clean up the coding style
nds32: Fix get_user/put_user macro expand pointer problem
nds32: Fix empty call trace
nds32: add NULL entry to the end of_device_id array
nds32: fix logic for module
This patch is used to fix nds32 allmodconfig/allyesconfig build error
because GCOV kernel embeds counters in the kernel for each line
and a part of that embed in __exit text. So we need to keep the
EXIT_TEXT and EXIT_DATA if CONFIG_GCOV_KERNEL=y.
Link: https://lkml.org/lkml/2018/9/1/125
Signed-off-by: Greentime Hu <greentime@andestech.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>