KVM's host TLB handling routines were using tlbw hazard barrier macros
around tlb_read(). Now that hazard barrier macros exist for tlbr, update
this case to use them.
Also fix various other unnecessary hazard barriers in this code.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The host kernel's exception vector base address is currently saved in
the VCPU structure at creation time, and restored on a guest exit.
However it doesn't change and can already be easily accessed from the
'ebase' variable (arch/mips/kernel/traps.c), so drop the host_ebase
member of kvm_vcpu_arch, export the 'ebase' variable to modules and load
from there instead.
This does result in a single extra instruction (lui) on the guest exit
path, but simplifies the code a bit and removes the redundant storage of
the host exception base address.
Credit for the idea goes to Cavium's VZ KVM implementation.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function kvm_mips_handle_mapped_seg_tlb_fault() has two completely
unused pointer arguments, hpa0 and hpa1, for which all users always pass
NULL.
Drop these two arguments and update the callers.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When handling TLB faults in the guest KSeg0 region, a pair of physical
addresses are read from the guest physical address map. However that
process is rather convoluted with an if/then/else statement. Simplify it
to just clear the lowest bit for the even entry and set the lowest bit
for the odd entry.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Several KVM module functions are indirected so that they can be accessed
from tlb.c which is statically built into the kernel. This is no longer
necessary as the relevant bits of code have moved into mmu.c which is
part of the KVM module, so drop the indirections.
Note: is_error_pfn() is defined inline in kvm_host.h, so didn't actually
require the KVM module to be loaded for it to work anyway.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Various functions in tlb.c perform higher level MMU handling, but don't
strictly need to be statically built into the kernel as they don't
directly manipulate TLB entries. Move these functions out into a
separate mmu.c which will be built into the KVM kernel module. This
allows them to directly reference KVM functions in the KVM kernel module
in future.
Module exports of these functions have been removed, since they aren't
needed outside of KVM.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The CP0 Cause register is passed around in KVM quite a bit, often as an
unsigned long, even though it is always 32-bits long.
Resize it to u32 throughout MIPS KVM.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert the MIPS KVM C code to use standard kernel sized types (e.g.
u32) instead of inttypes.h style ones (e.g. uint32_t) or other types as
appropriate.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function kvm_mips_sync_icache() is unused, so lets remove it.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The host EntryHi in the KVM VCPU context is virtually unused. It gets
stored on exceptions, but only ever used in a kvm_debug() when a TLB
miss occurs.
Drop it entirely, removing that information from the kvm_debug output.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The MIPS kvm_vcpu_arch::guest_inst isn't used, so drop it from the
struct and drop its asm-offsets definition.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When emulating TLB miss / invalid exceptions during CACHE instruction
emulation, be sure to set up the correct PC and host_cp0_badvaddr state
for the kvm_mips_emlulate_tlb*_ld() function to pick up for guest EPC
and BadVAddr.
PC needs to be rewound otherwise the guest EPC will end up pointing at
the next instruction after the faulting CACHE instruction.
host_cp0_badvaddr must be set because guest CACHE instructions trap with
a Coprocessor Unusable exception, which doesn't update the host BadVAddr
as a TLB exception would.
This doesn't tend to get hit when dynamic translation of emulated
instructions is enabled, since only the first execution of each CACHE
instruction actually goes through this code path, with subsequent
executions hitting the SYNCI instruction that it gets replaced with.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a CACHE instruction is emulated by kvm_mips_emulate_cache(), the PC
is first updated to point to the next instruction, and afterwards it
falls through the "dont_update_pc" label, which rewinds the PC back to
its original address.
This works when dynamic translation of emulated instructions is enabled,
since the CACHE instruction is replaced with a SYNCI which works without
trapping, however when dynamic translation is disabled the guest hangs
on CACHE instructions as they always trap and are never stepped over.
Roughly swap the meanings of the "done" and "dont_update_pc" to match
kvm_mips_emulate_CP0(), so that "done" will roll back the PC on failure,
and "dont_update_pc" won't change PC at all (for the sake of exceptions
that have already modified the PC).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When faulting guest addresses are matched against guest segments with
the KVM_GUEST_KSEGX() macro, change the mask to 0xe0000000 so as to
include bit 31.
This is mainly for safety's sake, as it prevents a rogue BadVAddr in the
host kseg2/kseg3 segments (e.g. 0xC*******) after a TLB exception from
matching the guest kseg0 segment (e.g. 0x4*******), triggering an
internal KVM error instead of allowing the corresponding guest kseg0
page to be mapped into the host vmalloc space.
Such a rogue BadVAddr was observed to happen with the host MIPS kernel
running under QEMU with KVM built as a module, due to a not entirely
transparent optimisation in the QEMU TLB handling. This has already been
worked around properly in a previous commit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Copy __kvm_mips_vcpu_run() into unmapped memory, so that we can never
get a TLB refill exception in it when KVM is built as a module.
This was observed to happen with the host MIPS kernel running under
QEMU, due to a not entirely transparent optimisation in the QEMU TLB
handling where TLB entries replaced with TLBWR are copied to a separate
part of the TLB array. Code in those pages continue to be executable,
but those mappings persist only until the next ASID switch, even if they
are marked global.
An ASID switch happens in __kvm_mips_vcpu_run() at exception level after
switching to the guest exception base. Subsequent TLB mapped kernel
instructions just prior to switching to the guest trigger a TLB refill
exception, which enters the guest exception handlers without updating
EPC. This appears as a guest triggered TLB refill on a host kernel
mapped (host KSeg2) address, which is not handled correctly as user
(guest) mode accesses to kernel (host) segments always generate address
error exceptions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
arch/x86/kvm/iommu.c includes <linux/intel-iommu.h> and <linux/dmar.h>, which
both are unnecessary, in fact incorrect to be here as they are intel specific.
Building kvm on x86 passed after removing above inclusion.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Make the function names more similar between KVM_REQ_NMI and KVM_REQ_SMI.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
If the processor exits to KVM while delivering an interrupt,
the hypervisor then requeues the interrupt for the next vmentry.
Trying to enter SMM in this same window causes to enter non-root
mode in emulated SMM (i.e. with IF=0) and with a request to
inject an IRQ (i.e. with a valid VM-entry interrupt info field).
This is invalid guest state (SDM 26.3.1.4 "Check on Guest RIP
and RFLAGS") and the processor fails vmentry.
The fix is to defer the injection from KVM_REQ_SMI to KVM_REQ_EVENT,
like we already do for e.g. NMIs. This patch doesn't change the
name of the process_smi function so that it can be applied to
stable releases. The next patch will modify the names so that
process_nmi and process_smi handle respectively KVM_REQ_NMI and
KVM_REQ_SMI.
This is especially common with Windows, probably due to the
self-IPI trick that it uses to deliver deferred procedure
calls (DPCs).
Reported-by: Laszlo Ersek <lersek@redhat.com>
Reported-by: Michał Zegan <webczat_200@poczta.onet.pl>
Fixes: 64d6067057
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
- two fixes for 4.6 vgic [Christoffer]
(cc stable)
- six fixes for 4.7 vgic [Marc]
x86:
- six fixes from syzkaller reports [Paolo]
(two of them cc stable)
- allow OS X to boot [Dmitry]
- don't trust compilers [Nadav]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCAAGBQJXUI6+AAoJEED/6hsPKofof70H/iL+tri1Mu9q+8+EtANPp9i1
XWjjAQCGszq27H/A1Io+igszW/ghhp0OoVYSB8wSUBubTSjjyba90dWe41UpBywt
r2W41xbGUBRdC6aK5rZ0+Zw2MIv5+ubSsbDKqfY8IsWQzWln+NHo+q5UU8+NfdZ7
Nu7+lRhXJfJWy31k/FeDp3FeZXPZk9b9iZBafmnNcg/i1JMjrrVK8nSBXOSUvZ22
Dy/b3ln+8emv4lvOX0jfostQH0HkfulpBmJouZ5TadoegBtQEDEKpkglxSyUHAcT
TmMhRwRyx4K6i5Bp3zAJ0LtjEks9/ayHVmerMJ+LH1MBaJqNmvxUzSPXvKGA8JE=
=PCWN
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- two fixes for 4.6 vgic [Christoffer] (cc stable)
- six fixes for 4.7 vgic [Marc]
x86:
- six fixes from syzkaller reports [Paolo] (two of them cc stable)
- allow OS X to boot [Dmitry]
- don't trust compilers [Nadav]"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
KVM: Handle MSR_IA32_PERF_CTL
KVM: x86: avoid write-tearing of TDP
KVM: arm/arm64: vgic-new: Removel harmful BUG_ON
arm64: KVM: vgic-v3: Relax synchronization when SRE==1
arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1
arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value
KVM: arm/arm64: vgic-v3: Always resample level interrupts
KVM: arm/arm64: vgic-v2: Always resample level interrupts
KVM: arm/arm64: vgic-v3: Clear all dirty LRs
KVM: arm/arm64: vgic-v2: Clear all dirty LRs
Intel CPUs having Turbo Boost feature implement an MSR to provide a
control interface via rdmsr/wrmsr instructions. One could detect the
presence of this feature by issuing one of these instructions and
handling the #GP exception which is generated in case the referenced MSR
is not implemented by the CPU.
KVM's vCPU model behaves exactly as a real CPU in this case by injecting
a fault when MSR_IA32_PERF_CTL is called (which KVM does not support).
However, some operating systems use this register during an early boot
stage in which their kernel is not capable of handling #GP correctly,
causing #DP and finally a triple fault effectively resetting the vCPU.
This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the
crashes.
Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In theory, nothing prevents the compiler from write-tearing PTEs, or
split PTE writes. These partially-modified PTEs can be fetched by other
cores and cause mayhem. I have not really encountered such case in
real-life, but it does seem possible.
For example, the compiler may try to do something creative for
kvm_set_pte_rmapp() and perform multiple writes to the PTE.
Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pull sparc fixes from David Miller:
"sparc64 mmu context allocation and trap return bug fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix return from trap window fill crashes.
sparc: Harden signal return frame checks.
sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
Pull s390 fixes from Martin Schwidefsky:
"Three bugs fixes and an update for the default configuration"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix info leak in do_sigsegv
s390/config: update default configuration
s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
s390/bpf: reduce maximum program size to 64 KB
The GICv3 backend of the vgic is quite barrier heavy, in order
to ensure synchronization of the system registers and the
memory mapped view for a potential GICv2 guest.
But when the guest is using a GICv3 model, there is absolutely
no need to execute all these heavy barriers, and it is actually
beneficial to avoid them altogether.
This patch makes the synchonization conditional, and ensures
that we do not change the EL1 SRE settings if we do not need to.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Both our GIC emulations are "strict", in the sense that we either
emulate a GICv2 or a GICv3, and not a GICv3 with GICv2 legacy
support.
But when running on a GICv3 host, we still allow the guest to
tinker with the ICC_SRE_EL1 register during its time slice:
it can switch SRE off, observe that it is off, and yet on the
next world switch, find the SRE bit to be set again. Not very
nice.
An obvious solution is to always trap accesses to ICC_SRE_EL1
(by clearing ICC_SRE_EL2.Enable), and to let the handler return
the programmed value on a read, or ignore the write.
That way, the guest can always observe that our GICv3 is SRE==1
only.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When we trap ICC_SRE_EL1, we handle it as RAZ/WI. It would be
more correct to actual make it RO, and return the configured
value when read.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When saving the state of the list registers, it is critical to
reset them zero, as we could otherwise leave unexpected EOI
interrupts pending for virtual level interrupts.
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.
Otherwise we can get an OOPS that looks like this:
ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted
TPC: <do_sparc64_fault+0x5c4/0x700>
g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
RPC: <do_sparc64_fault+0x5bc/0x700>
l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
I7: <user_rtt_fill_fixup+0x6c/0x7c>
Call Trace:
[0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c
The window trap handlers are slightly clever, the trap table entries for them are
composed of two pieces of code. First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.
The userland register window fill handler is:
add %sp, STACK_BIAS + 0x00, %g1; \
ldxa [%g1 + %g0] ASI, %l0; \
mov 0x08, %g2; \
mov 0x10, %g3; \
ldxa [%g1 + %g2] ASI, %l1; \
mov 0x18, %g5; \
ldxa [%g1 + %g3] ASI, %l2; \
ldxa [%g1 + %g5] ASI, %l3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %l4; \
ldxa [%g1 + %g2] ASI, %l5; \
ldxa [%g1 + %g3] ASI, %l6; \
ldxa [%g1 + %g5] ASI, %l7; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i0; \
ldxa [%g1 + %g2] ASI, %i1; \
ldxa [%g1 + %g3] ASI, %i2; \
ldxa [%g1 + %g5] ASI, %i3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i4; \
ldxa [%g1 + %g2] ASI, %i5; \
ldxa [%g1 + %g3] ASI, %i6; \
ldxa [%g1 + %g5] ASI, %i7; \
restored; \
retry; nop; nop; nop; nop; \
b,a,pt %xcc, fill_fixup_dax; \
b,a,pt %xcc, fill_fixup_mna; \
b,a,pt %xcc, fill_fixup;
And the way this works is that if any of those memory accesses
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took. In this way, the fault handler
doesn't have to know if it was a spill or a fill that it's handling
the fault for. It just always branches to the last instruction in
the parent trap's handler.
For example, for a regular fault, the code goes:
winfix_trampoline:
rdpr %tpc, %g3
or %g3, 0x7c, %g3
wrpr %g3, %tnpc
done
All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.
On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons. The largest being that from Niagara and onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).
This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary. Now if you look at them, we'll see at the end:
ba,a,pt %xcc, user_rtt_fill_fixup;
ba,a,pt %xcc, user_rtt_fill_fixup;
ba,a,pt %xcc, user_rtt_fill_fixup;
And oops, all three cases are handled like a fault.
This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.
So in the case where the stack was unaligned, the unaligned trap
handler sets up the arg registers one way, and then we branched to
the fault handler which expects them setup another way.
So the FAULT_TYPE_* value ends up basically being garbage, and
randomly would generate the backtrace seen above.
Reported-by: Nick Alcock <nix@esperi.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
All signal frames must be at least 16-byte aligned, because that is
the alignment we explicitly create when we build signal return stack
frames.
All stack pointers must be at least 8-byte aligned.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull string hash improvements from George Spelvin:
"This series does several related things:
- Makes the dcache hash (fs/namei.c) useful for general kernel use.
(Thanks to Bruce for noticing the zero-length corner case)
- Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
above.
- Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
32-bit multiplies will do well enough.
- Rids the world of the bad hash multipliers in hash_32.
This finishes the job started in commit 689de1d6ca ("Minimal
fix-up of bad hashing behavior of hash_64()")
The vast majority of Linux architectures have hardware support for
32x32-bit multiply and so derive no benefit from "simplified"
multipliers.
The few processors that do not (68000, h8/300 and some models of
Microblaze) have arch-specific implementations added. Those
patches are last in the series.
- Overhauls the dcache hash mixing.
The patch in commit 0fed3ac866 ("namei: Improve hash mixing if
CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
Replaced with a much more careful design that's simultaneously
faster and better. (My own invention, as there was noting suitable
in the literature I could find. Comments welcome!)
- Modify the hash_name() loop to skip the initial HASH_MIX(). This
would let us salt the hash if we ever wanted to.
- Sort out partial_name_hash().
The hash function is declared as using a long state, even though
it's truncated to 32 bits at the end and the extra internal state
contributes nothing to the result. And some callers do odd things:
- fs/hfs/string.c only allocates 32 bits of state
- fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes
- Modify bytemask_from_count to handle inputs of 1..sizeof(long)
rather than 0..sizeof(long)-1. This would simplify users other
than full_name_hash"
Special thanks to Bruce Fields for testing and finding bugs in v1. (I
learned some humbling lessons about "obviously correct" code.)
On the arch-specific front, the m68k assembly has been tested in a
standalone test harness, I've been in contact with the Microblaze
maintainers who mostly don't care, as the hardware multiplier is never
omitted in real-world applications, and I haven't heard anything from
the H8/300 world"
* 'hash' of git://ftp.sciencehorizons.net/linux:
h8300: Add <asm/hash.h>
microblaze: Add <asm/hash.h>
m68k: Add <asm/hash.h>
<linux/hash.h>: Add support for architecture-specific functions
fs/namei.c: Improve dcache hash function
Eliminate bad hash multipliers from hash_32() and hash_64()
Change hash_64() return value to 32 bits
<linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
fs/namei.c: Add hashlen_string() function
Pull out string hash to <linux/stringhash.h>
This will improve the performance of hash_32() and hash_64(), but due
to complete lack of multi-bit shift instructions on H8, performance will
still be bad in surrounding code.
Designing H8-specific hash algorithms to work around that is a separate
project. (But if the maintainers would like to get in touch...)
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
Microblaze is an FPGA soft core that can be configured various ways.
If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.
Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
for the original mc68000, which lacks a 32x32-bit multiply instruction.
Yes, the amount of optimization effort put in is excessive. :-)
Shift-add chain found by Yevgen Voronenko's Hcub algorithm at
http://spiral.ece.cmu.edu/mcm/gen.html
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
This is just the infrastructure; there are no users yet.
This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.
That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
MicroMIPS kernels may be expected to run on microMIPS only cores which
don't support the normal MIPS instruction set, so be sure to pass the
-mmicromips flag through to the VDSO cflags.
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/13349/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
In microMIPS kernels, handle_signal() sets the isa16 mode bit in the
vdso address so that the sigreturn trampolines (which are offset from
the VDSO) get executed as microMIPS.
However commit ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
changed the offsets to come from the VDSO image, which already have the
isa16 mode bit set correctly since they're extracted from the VDSO
shared library symbol table.
Drop the isa16 mode bit handling from handle_signal() to fix sigreturn
for cores which support both microMIPS and normal MIPS. This doesn't fix
microMIPS only cores, since the VDSO is still built for normal MIPS, but
thats a separate problem.
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/13348/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Here is the quote from [1]:
The unit-address must match the first address specified
in the reg property of the node. If the node has no reg property,
the @ and unit-address must be omitted and the node-name alone
differentiates the node from other nodes at the same level
This patch adjusts MIPS dts-files and devicetree binding
documentation in accordance with [1].
[1] Power.org(tm) Standard for Embedded Power Architecture(tm)
Platform Requirements (ePAPR). Version 1.1 – 08 April 2011.
Chapter 2.2.1.1 Node Name Requirements
Signed-off-by: Antony Pavlov <antonynpavlov@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13345/
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Avoid an aliasing issue causing a build error in VDSO:
In file included from include/linux/srcu.h:34:0,
from include/linux/notifier.h:15,
from ./arch/mips/include/asm/uprobes.h:9,
from include/linux/uprobes.h:61,
from include/linux/mm_types.h:13,
from ./arch/mips/include/asm/vdso.h:14,
from arch/mips/vdso/vdso.h:27,
from arch/mips/vdso/gettimeofday.c:11:
include/linux/workqueue.h: In function 'work_static':
include/linux/workqueue.h:186:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
return *work_data_bits(work) & WORK_STRUCT_STATIC;
^
cc1: all warnings being treated as errors
make[2]: *** [arch/mips/vdso/gettimeofday.o] Error 1
with a CONFIG_DEBUG_OBJECTS_WORK configuration and GCC 5.2.0. Include
`-fno-strict-aliasing' along with compiler options used, as required for
kernel code, fixing a problem present since the introduction of VDSO
with commit ebb5e78cc6 ("MIPS: Initial implementation of a VDSO").
Thanks to Tejun for diagnosing this properly!
Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.3+
Patchwork: https://patchwork.linux-mips.org/patch/13357/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Allow KASLR to be selected on Pistachio based systems. Tested on a
Creator Ci40.
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: Jonas Gorski <jogo@openwrt.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13356/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
On certain MIPS32 devices, the ftrace tracer "function_graph" uses
__lshrdi3() during the capturing of trace data. ftrace then attempts to
trace __lshrdi3() which leads to infinite recursion and a stack overflow.
Fix this by marking __lshrdi3() as notrace. Mark the other compiler
intrinsics as notrace in case the compiler decides to use them in the
ftrace path.
Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com>
Cc: <linux-mips@linux-mips.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 4.2.x-
Patchwork: https://patchwork.linux-mips.org/patch/13354/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The Hardware page Table Walker (HTW) is being misconfigured on 64-bit
kernels. The PWSize.PS (pointer size) bit determines whether pointers
within directories are loaded as 32-bit or 64-bit addresses, but was
never being set to 1 for 64-bit kernels where the unsigned long in pgd_t
is 64-bits wide.
This actually reduces rather than improves performance when the HTW is
enabled on P6600 since the HTW is initiated lots, but walks are all
aborted due I think to bad intermediate pointers.
Since we were already taking the width of the PTEs into account by
setting PWSize.PTEW, which is the left shift applied to the page table
index *in addition to* the native pointer size, we also need to reduce
PTEW by 1 when PS=1. This is done by calculating PTEW based on the
relative size of pte_t compared to pgd_t.
Finally in order for the HTW to be used when PS=1, the appropriate
XK/XS/XU bits corresponding to the different 64-bit segments need to be
set in PWCtl. We enable only XU for now to enable walking for XUSeg.
Supporting walking for XKSeg would be a bit more involved so is left for
a future patch. It would either require the use of a per-CPU top level
base directory if supported by the HTW (a bit like pgd_current but with
a second entry pointing at swapper_pg_dir), or the HTW would prepend bit
63 of the address to the global directory index which doesn't really
match how we split user and kernel page directories.
Fixes: cab25bc753 ("MIPS: Extend hardware table walking support to MIPS64")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13364/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>