Our local m68k architecture dump_fpu() is conditionally compiled in on
CONFIG_FPU. That is OK for all existing MMU enabled CPU types, but won't
handle the case for some ColdFire SoC CPU parts that we want to support
that have no FPU hardware.
dump_fpu() is expected to be present by the ELF loader, so we must always
have it available and exported.
Remove the conditional and reorganize the dump_fpu hard FPU code path
to let the compiler remove code when not needed.
This change based on changes and discussion from Yannick Gicquel
<yannick.gicquel@open.eurogiciel.org>.
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
The ACR registers of the ColdFire define at a macro level what regions
of the addresses space should have caching or other attribute types applied.
Currently for the MMU enabled setups we map the interal IO peripheral addres
space as uncachable based on the define for the MBAR address (CONFIG_MBAR).
Not all ColdFire SoC use a programmable MBAR register address. Some parts
have fixed addressing for their internal peripheral registers.
Generalize the way we get the internal peripheral base address so all types
can be accomodated in the ACR definitions. Each ColdFire SoC type now sets
its IO memory base and size definitions (which may be based on MBAR) which
are then used in the ACR definitions.
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
The early ColdFire bootmem_alloc() code is currently only included in
the board support for the Coldire 54xx platforms. It will be used on all
ColdFire MMU enabled platforms as others are supported. So move the
mcf54xx_bootmem_alloc() function to be generally available to all MMU
enabled ColdFire parts (and use a more generic name for it).
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Not all ColdFire SoC parts that have an MMU also have an FPU - so set
an FPU type (via m68k_fputype) appropriate for the configured platform.
With this set correctly /proc/cpuinfo will report FPU "none" on devices
that don't have one. And kernel code paths that initialize FPU hardware
will now only execute if an FPU is actually present.
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Create a new machine type for platforms based around the ColdFire 5441x
SoC family. Set that machine type on startup when building for this
platform type.
Currently the ColdFire head.S hard codes a M54xx machine type at startup -
since that is the only platform type currently supported with MMU enabled.
The m5441x has an MMU and this change forms part of the support required
to run it with the MMU enabled.
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Move the selection of CONFIG_FPU to each CPU type configuration.
Currently for m68k we have a global set of CONFIG_FPU based on if CONFIG_MMU
is enabled or not. There is at least one CPU family we support (m5441x)
that has an MMU but has no FPU hardware. So we need to be able to have
CONFIG_MMU set and CONFIG_FPU not set.
Whether we build for a CPU with MMU enabled or not doesn't change the
fact that it has FPU hardware support. Our current non-MMU builds have
never had CONIG_FPU enabled - and in fact the kernel will not compile
with that set and CONFIG_MMU not set at the moment. It is easy enough
to fix this - but it would involve a structure change to sigcontext.h,
and that is a user space exported header (so ABI change).
This change makes no configuration visible changes, and all configs
end up with the same configuration settings as before.
This change based on changes and discussion from Yannick Gicquel
<yannick.gicquel@open.eurogiciel.org>.
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
The pin write code that supports the UART signals is not using he correct
word write IO access method. It correctly reads the correct 16 bit
registrer, it should also write the new value back with a 16 bit write.
Fix it to use writew().
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Most ColdFire support code has switched to using IO memory access
methods (readb/writeb/etc) when reading and writing internal peripheral
device registers. The WildFire board specific halt code was missed.
As it is now the WildFire code is broken, since all register definitions
were changed to be register addresses only some time ago.
Fix the WildFire board code to use the appropriate IO access functions.
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
The early setup code for the ColdFire 53xx platform accesses variables
before the RAM and other system initialization steps may have taken place.
Currently it has 2 global variables that will end up in the bss section
that are accessed during this early setup. There is a special static RAM
stack setup at this time, but not necessarily the RAM where kernel data
sections will end up.
Even on system setups where RAM is setup by a boot loader the access
to the early setup variables is before the BSS section has been initialized.
This can potentially corrupt a ram loaded root filesystem that sits in that
memory area before it has been moved.
These 2 variables are not used at all after being set, and can just be
removed.
Reported-by: Christian Gieseler <christiangieseler@yahoo.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
- powernv/pci: Fix m64 checks for SR-IOV and window alignment from Russell Currey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX50CvAAoJEFHr6jzI4aWAqJAP/0/0D8YGwOuIoYD2GmfoasKR
TFbbuhX3xnfdiRG6w/sFBI3oh7icCw7hC+Qj1lNu9D3L/UkxOTBny+W07KvWzX44
Yu74nEHgq3mVrRAU4McztbKIUBK2zagGwwCcGZXZl/uQI1ylvmmpcH3xClQzF+oA
xKk8eB1OW2Ay6+y+FkSuyBHHSfww6QCk7ERPqaStCW9Uy+dDBjIwStLQuOpAhN/o
Z9K+JwpPJ8qgw1Pe9pvrD5MjcM0hR+tUZm6LklZCCk89feqlwcrz9cpOrmTdGuF+
n1iacpDaFf6IOlhI+6ImrT15llTgSk/nu9GNIRFDwOjVCuGy5aDQBtWuRFiVNggp
vkZWFSl594Jn5H9/s6MpMXygSl36NMKgM/ZKvUsEAe6mF0Kb9pZRB7b/aV+ajkCQ
rkQCe0KKSF6+D3wu3SmMe0NTc3/GkgxZN0lTnqUaB5PSRqwvVwurXugnAKr7arhj
JSu9/QSeOxNI5ytDF1Nf9/RN0DT+L1w0vun083DupyJkG1hrjzm9kI0lACQTr/QX
TxAWXGjiTsUOeM4pfNzqaJE4fNUc0TIc41jgWMx9qXzbKjhijgKEPtmyDMz93GVY
hFXyRAMsWUOsQGP5tiLFYG0PkNsmCDIwca+yg47EicBQGTpEsGLYUBRvIILYNBKI
ULl0yMLZWekl1rzthDdB
=35xQ
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull one more powerpc fix from Michael Ellerman:
"powernv/pci: Fix m64 checks for SR-IOV and window alignment from
Russell Currey"
* tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment
Commit 432c6bacbd ("MIPS: Use per-mm page to execute branch delay slot
instructions") accidentally removed use of the MIPS_FPU_EMU_INC_STATS
macro from do_dsemulret, leading to the ds_emul file in debugfs always
returning zero even though we perform delay slot emulations.
Fix this by re-adding the use of the MIPS_FPU_EMU_INC_STATS macro.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: 432c6bacbd ("MIPS: Use per-mm page to execute branch delay slot instructions")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14301/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch fixes the possibility of a deadlock when bringing up
secondary CPUs.
The deadlock occurs because the set_cpu_online() is called before
synchronise_count_slave(). This can cause a deadlock if the boot CPU,
having scheduled another thread, attempts to send an IPI to the
secondary CPU, which it sees has been marked online. The secondary is
blocked in synchronise_count_slave() waiting for the boot CPU to enter
synchronise_count_master(), but the boot cpu is blocked in
smp_call_function_many() waiting for the secondary to respond to it's
IPI request.
Fix this by marking the CPU online in cpu_callin_map and synchronising
counters before declaring the CPU online and calculating the maps for
IPIs.
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Reported-by: Justin Chen <justinpopo6@gmail.com>
Tested-by: Justin Chen <justinpopo6@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: stable@vger.kernel.org # v4.1+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14302/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Pull perf fixes from Thomas Gleixner:
"Three fixlets for perf:
- add a missing NULL pointer check in the intel BTS driver
- make BTS an exclusive PMU because BTS can only handle one event at
a time
- ensure that exclusive events are limited to one PMU so that several
exclusive events can be scheduled on different PMU instances"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Limit matching exclusive events to one PMU
perf/x86/intel/bts: Make it an exclusive PMU
perf/x86/intel/bts: Make sure debug store is valid
Pull locking fixes from Thomas Gleixner:
"Two smallish fixes:
- use the proper asm constraint in the Super-H atomic_fetch_ops
- a trivial typo fix in the Kconfig help text"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()
Pull EFI fixes from Thomas Gleixner:
"Two fixes for EFI/PAT:
- a 32bit overflow bug in the PAT code which was unearthed by the
large EFI mappings
- prevent a boot hang on large systems when EFI mixed mode is enabled
but not used"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Only map RAM into EFI page tables if in mixed-mode
x86/mm/pat: Prevent hang during boot when mapping pages
In case of error, the function platform_device_register_simple()
returns ERR_PTR() and never returns NULL. The NULL test in the
return value check must therefor be replaced with IS_ERR().
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Vadim Pasternak <vadimp@mellanox.com>
Cc: platform-driver-x86@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus reported the following objtool warning:
kernel/signal.o: warning: objtool: .altinstr_replacement+0x54: call without frame pointer save/setup
The warning is valid. It's caused by the fact that gcc placed the call
instruction in alternative_call_2()'s inline asm before the frame
pointer setup, which breaks frame pointer convention and can result in a
bad stack trace.
Force a stack frame to be created before the call instruction by listing
the stack pointer as an output operand in the inline asm statement.
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160923214939.j5o7c67nhepzmh3t@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Fix kgdb breakpoint insertion in read-only text sections (when
CONFIG_DEBUG_RODATA or CONFIG_DEBUG_SET_MODULE_RONX are enabled)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX5SUUAAoJEGvWsS0AyF7xJhsQAIBvpLUNntihhcKTLbBkpQ96
ni/cludUBgQ8fil3leiR9zmfDDI1meohgETct2Z7qMHaTCx5Ej72GG5cdQl71quy
a2Ks/k7BlwsMLMzHMQLreZhF/1z0XFzOwjbXIMAvxTLUNHCZRvtuZ3mES5+JTtYc
u/obI3JhuKJjkV74CFSfSd6NoRP+Xgmzqmj/afN2TKsZwczqJw33BeNABWxi41SR
o3OcqgJppmQbCyH7KeKsYrG3zghpzraDj9e/qwWwVDiw00bWar+rE2sxmQM5pHuT
RsCmRXIVy/nbVduKl9t4UamkvMsS+dL048bRD2WIOu8hUkZKgpT/c6KcBgspYGXm
ZFBSJZITamU3r9txNMn/WNVp/S2i3KaojrxzrTPG/6wencT5ZfCOfBqWB44I6Xf4
PG8mfmHzEJwpHUMwz5mSnkigNWe5y3CJhRNUO3GQp+4ULAVo3JbD1oyF57WC2PJh
30MOPfl+jxv+uMT83T1zDsIZ0JdufvuaHJ7d9AP1wVcK5PVGW6icuO5Krd3fk7CQ
VJ1yflwya9TVEqRUQDDHwnKlCHohILJomx9myV578/lmEOs8FgA5Dbx+/grgS6/P
+9YKB7hv8uK26uGO/bn41YCqhgCoJX2YqEi4qECRLwDjNtNyYD3XBLT25jAbiopi
Gfew0xtNcFagew1BHYzD
=NCBp
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"A couple of last-minute arm64 fixes for 4.8:
- Fix secondary CPU to NUMA node assignment
- Fix kgdb breakpoint insertion in read-only text sections (when
CONFIG_DEBUG_RODATA or CONFIG_DEBUG_SET_MODULE_RONX are enabled)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kgdb: handle read-only text / modules
arm64: Call numa_store_cpu_info() earlier.
In the mipsr2_decoder() function, used to emulate pre-MIPSr6
instructions that were removed in MIPSr6, the init_fpu() function is
called if a removed pre-MIPSr6 floating point instruction is the first
floating point instruction used by the task. However, init_fpu()
performs varous actions that rely upon not being migrated. For example
in the most basic case it sets the coprocessor 0 Status.CU1 bit to
enable the FPU & then loads FP register context into the FPU registers.
If the task were to migrate during this time, it may end up attempting
to load FP register context on a different CPU where it hasn't set the
CU1 bit, leading to errors such as:
do_cpu invoked from kernel context![#2]:
CPU: 2 PID: 7338 Comm: fp-prctl Tainted: G D 4.7.0-00424-g49b0c82 #2
task: 838e4000 ti: 88d38000 task.ti: 88d38000
$ 0 : 00000000 00000001 ffffffff 88d3fef8
$ 4 : 838e4000 88d38004 00000000 00000001
$ 8 : 3400fc01 801f8020 808e9100 24000000
$12 : dbffffff 807b69d8 807b0000 00000000
$16 : 00000000 80786150 00400fc4 809c0398
$20 : 809c0338 0040273c 88d3ff28 808e9d30
$24 : 808e9d30 00400fb4
$28 : 88d38000 88d3fe88 00000000 8011a2ac
Hi : 0040273c
Lo : 88d3ff28
epc : 80114178 _restore_fp+0x10/0xa0
ra : 8011a2ac mipsr2_decoder+0xd5c/0x1660
Status: 1400fc03 KERNEL EXL IE
Cause : 1080002c (ExcCode 0b)
PrId : 0001a920 (MIPS I6400)
Modules linked in:
Process fp-prctl (pid: 7338, threadinfo=88d38000, task=838e4000, tls=766527d0)
Stack : 00000000 00000000 00000000 88d3fe98 00000000 00000000 809c0398 809c0338
808e9100 00000000 88d3ff28 00400fc4 00400fc4 0040273c 7fb69e18 004a0000
004a0000 004a0000 7664add0 8010de18 00000000 00000000 88d3fef8 88d3ff28
808e9100 00000000 766527d0 8010e534 000c0000 85755000 8181d580 00000000
00000000 00000000 004a0000 00000000 766527d0 7fb69e18 004a0000 80105c20
...
Call Trace:
[<80114178>] _restore_fp+0x10/0xa0
[<8011a2ac>] mipsr2_decoder+0xd5c/0x1660
[<8010de18>] do_ri+0x90/0x6b8
[<80105c20>] ret_from_exception+0x0/0x10
Fix this by disabling preemption around the call to init_fpu(), ensuring
that it starts & completes on one CPU.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b0a668fb20 ("MIPS: kernel: mips-r2-to-r6-emul: Add R2 emulator for MIPS R6")
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.0+
Patchwork: https://patchwork.linux-mips.org/patch/14305/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Instead of comparing the name to a magic string, use archdata to
explicitly communicate whether the arch timer is suitable for
direct vdso access.
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Erratum A-008585 says that the ARM generic timer counter "has the
potential to contain an erroneous value for a small number of core
clock cycles every time the timer value changes". Accesses to TVAL
(both read and write) are also affected due to the implicit counter
read. Accesses to CVAL are not affected.
The workaround is to reread TVAL and count registers until successive
reads return the same value. Writes to TVAL are replaced with an
equivalent write to CVAL.
The workaround is to reread TVAL and count registers until successive reads
return the same value, and when writing TVAL to retry until counter
reads before and after the write return the same value.
The workaround is enabled if the fsl,erratum-a008585 property is found in
the timer node in the device tree. This can be overridden with the
clocksource.arm_arch_timer.fsl-a008585 boot parameter, which allows KVM
users to enable the workaround until a mechanism is implemented to
automatically communicate this information.
This erratum can be found on LS1043A and LS2080A.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Scott Wood <oss@buserror.net>
[will: renamed read macro to reflect that it's not usually unstable]
Signed-off-by: Will Deacon <will.deacon@arm.com>
A new accessor for gic_write_bpr1 is added to arch_gicv3.h in 4.9,
whilst the CP15 accessors are redifined in a separate branch.
This leads to a horrible clash, where the new accessor ends up with
a crap "asm volatile" definition.
Work around this by carrying our own definition of gic_write_bpr1,
creating a small conflict which will be obvious to resolve.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
commit d47529b2e9
"gpio: don't include module.h in shared driver header"
removed <linux/module.h> from the <linux/gpio/driver.h> header.
It seems arch/arm/mach-omap2/board-rx51-peripherals.c
is using __initdata_or_module from <linux/module.h> through
<linux/gpio.h> to <linux/gpio/driver.h>, so break this dependency
so that we get a clean compile.
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Lindgren <tony@atomide.com>
Fixes: d47529b2e9 ("gpio: don't include module.h in shared driver header")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Handle read-only cases when CONFIG_DEBUG_RODATA (4.0) or
CONFIG_DEBUG_SET_MODULE_RONX (3.18) are enabled by using
aarch64_insn_write() instead of probe_kernel_write() as introduced by
commit 2f896d5866 ("arm64: use fixmap for text patching") in 4.0.
Fixes: 11d91a770f ("arm64: Add CONFIG_DEBUG_SET_MODULE_RONX support")
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The wq_numa_init() function makes a private CPU to node map by calling
cpu_to_node() early in the boot process, before the non-boot CPUs are
brought online. Since the default implementation of cpu_to_node()
returns zero for CPUs that have never been brought online, the
workqueue system's view is that *all* CPUs are on node zero.
When the unbound workqueue for a non-zero node is created, the
tsk_cpus_allowed() for the worker threads is the empty set because
there are, in the view of the workqueue system, no CPUs on non-zero
nodes. The code in try_to_wake_up() using this empty cpumask ends up
using the cpumask empty set value of NR_CPUS as an index into the
per-CPU area pointer array, and gets garbage as it is one past the end
of the array. This results in:
[ 0.881970] Unable to handle kernel paging request at virtual address fffffb1008b926a4
[ 1.970095] pgd = fffffc00094b0000
[ 1.973530] [fffffb1008b926a4] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
[ 1.982610] Internal error: Oops: 96000004 [#1] SMP
[ 1.987541] Modules linked in:
[ 1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: G W 4.8.0-rc6-preempt-vol+ #9
[ 1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
[ 2.005159] task: fffffe0fe89cc300 task.stack: fffffe0fe8b8c000
[ 2.011158] PC is at try_to_wake_up+0x194/0x34c
[ 2.015737] LR is at try_to_wake_up+0x150/0x34c
[ 2.020318] pc : [<fffffc00080e7468>] lr : [<fffffc00080e7424>] pstate: 600000c5
[ 2.027803] sp : fffffe0fe8b8fb10
[ 2.031149] x29: fffffe0fe8b8fb10 x28: 0000000000000000
[ 2.036522] x27: fffffc0008c63bc8 x26: 0000000000001000
[ 2.041896] x25: fffffc0008c63c80 x24: fffffc0008bfb200
[ 2.047270] x23: 00000000000000c0 x22: 0000000000000004
[ 2.052642] x21: fffffe0fe89d25bc x20: 0000000000001000
[ 2.058014] x19: fffffe0fe89d1d00 x18: 0000000000000000
[ 2.063386] x17: 0000000000000000 x16: 0000000000000000
[ 2.068760] x15: 0000000000000018 x14: 0000000000000000
[ 2.074133] x13: 0000000000000000 x12: 0000000000000000
[ 2.079505] x11: 0000000000000000 x10: 0000000000000000
[ 2.084879] x9 : 0000000000000000 x8 : 0000000000000000
[ 2.090251] x7 : 0000000000000040 x6 : 0000000000000000
[ 2.095621] x5 : ffffffffffffffff x4 : 0000000000000000
[ 2.100991] x3 : 0000000000000000 x2 : 0000000000000000
[ 2.106364] x1 : fffffc0008be4c24 x0 : ffffff0ffffada80
[ 2.111737]
[ 2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfffffe0fe8b8c020)
[ 2.120102] Stack: (0xfffffe0fe8b8fb10 to 0xfffffe0fe8b90000)
[ 2.125914] fb00: fffffe0fe8b8fb80 fffffc00080e7648
.
.
.
[ 2.442859] Call trace:
[ 2.445327] Exception stack(0xfffffe0fe8b8f940 to 0xfffffe0fe8b8fa70)
[ 2.451843] f940: fffffe0fe89d1d00 0000040000000000 fffffe0fe8b8fb10 fffffc00080e7468
[ 2.459767] f960: fffffe0fe8b8f980 fffffc00080e4958 ffffff0ff91ab200 fffffc00080e4b64
[ 2.467690] f980: fffffe0fe8b8f9d0 fffffc00080e515c fffffe0fe8b8fa80 0000000000000000
[ 2.475614] f9a0: fffffe0fe8b8f9d0 fffffc00080e58e4 fffffe0fe8b8fa80 0000000000000000
[ 2.483540] f9c0: fffffe0fe8d10000 0000000000000040 fffffe0fe8b8fa50 fffffc00080e5ac4
[ 2.491465] f9e0: ffffff0ffffada80 fffffc0008be4c24 0000000000000000 0000000000000000
[ 2.499387] fa00: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000040
[ 2.507309] fa20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.515233] fa40: 0000000000000000 0000000000000000 0000000000000000 0000000000000018
[ 2.523156] fa60: 0000000000000000 0000000000000000
[ 2.528089] [<fffffc00080e7468>] try_to_wake_up+0x194/0x34c
[ 2.533723] [<fffffc00080e7648>] wake_up_process+0x28/0x34
[ 2.539275] [<fffffc00080d3764>] create_worker+0x110/0x19c
[ 2.544824] [<fffffc00080d69dc>] alloc_unbound_pwq+0x3cc/0x4b0
[ 2.550724] [<fffffc00080d6bcc>] wq_update_unbound_numa+0x10c/0x1e4
[ 2.557066] [<fffffc00080d7d78>] workqueue_online_cpu+0x220/0x28c
[ 2.563234] [<fffffc00080bd288>] cpuhp_invoke_callback+0x6c/0x168
[ 2.569398] [<fffffc00080bdf74>] cpuhp_up_callbacks+0x44/0xe4
[ 2.575210] [<fffffc00080be194>] cpuhp_thread_fun+0x13c/0x148
[ 2.581027] [<fffffc00080dfbac>] smpboot_thread_fn+0x19c/0x1a8
[ 2.586929] [<fffffc00080dbd64>] kthread+0xdc/0xf0
[ 2.591776] [<fffffc0008083380>] ret_from_fork+0x10/0x50
[ 2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821)
[ 2.603464] ---[ end trace 58c0cd36b88802bc ]---
[ 2.608138] Kernel panic - not syncing: Fatal exception
Fix by moving call to numa_store_cpu_info() for all CPUs into
smp_prepare_cpus(), which happens before wq_numa_init(). Since
smp_store_cpu_info() now contains only a single function call,
simplify by removing the function and out-lining its contents.
Suggested-by: Robert Richter <rric@kernel.org>
Fixes: 1a2db30034 ("arm64, numa: Add NUMA support for arm64 platforms.")
Cc: <stable@vger.kernel.org> # 4.7.x-
Signed-off-by: David Daney <david.daney@cavium.com>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Tested-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Run kvm-unit-tests/eventinj.flat in L1:
Sending NMI to self
After NMI to self
FAIL: NMI
This test scenario is to test whether VMM can handle NMI IDT-vectoring info correctly.
At the beginning, L2 writes LAPIC to send a self NMI, the EPT page tables on both L1
and L0 are empty so:
- The L2 accesses memory can generate EPT violation which can be intercepted by L0.
The EPT violation vmexit occurred during delivery of this NMI, and the NMI info is
recorded in vmcs02's IDT-vectoring info.
- L0 walks L1's EPT12 and L0 sees the mapping is invalid, it injects the EPT violation into L1.
The vmcs02's IDT-vectoring info is reflected to vmcs12's IDT-vectoring info since
it is a nested vmexit.
- L1 receives the EPT violation, then fixes its EPT12.
- L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exits to L0.
- L0 emulates VMRESUME which is called from L1, then return to L2.
L0 merges the requirement of vmcs12's IDT-vectoring info and injects it to L2 through
vmcs02.
- The L2 re-executes the fault instruction and cause EPT violation again.
- Since the L1's EPT12 is valid, L0 can fix its EPT02
- L0 resume L2
The EPT violation vmexit occurred during delivery of this NMI again, and the NMI info
is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI through vmentry
event injection since it is caused by EPT02's EPT violation.
However, vmx_inject_nmi() refuses to inject NMI from IDT-vectoring info if vCPU is in
guest mode, this patch fix it by permitting to inject NMI from IDT-vectoring if it is
the L0's responsibility to inject NMI from IDT-vectoring info to L2.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Bandan Das <bsd@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
I observed that kvmvapic(to optimize flexpriority=N or AMD) is used
to boost TPR access when testing kvm-unit-test/eventinj.flat tpr case
on my haswell desktop (w/ flexpriority, w/o APICv). Commit (8d14695f95
x86, apicv: add virtual x2apic support) disable virtual x2apic mode
completely if w/o APICv, and the author also told me that windows guest
can't enter into x2apic mode when he developed the APICv feature several
years ago. However, it is not truth currently, Interrupt Remapping and
vIOMMU is added to qemu and the developers from Intel test windows 8 can
work in x2apic mode w/ Interrupt Remapping enabled recently.
This patch enables TPR shadow for virtual x2apic mode to boost
windows guest in x2apic mode even if w/o APICv.
Can pass the kvm-unit-test.
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wincy Van <fanwenyi0529@gmail.com>
Cc: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
dump_stack+0x99/0xd0
__warn+0xd1/0xf0
warn_slowpath_fmt+0x4f/0x60
? prepare_to_swait+0x39/0xa0
? prepare_to_swait+0x39/0xa0
__might_sleep+0x7e/0x80
__gfn_to_pfn_memslot+0x156/0x480 [kvm]
gfn_to_pfn+0x2a/0x30 [kvm]
gfn_to_page+0xe/0x20 [kvm]
kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
? _raw_spin_unlock_irqrestore+0x36/0x80
vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
kvm_vcpu_check_block+0x12/0x60 [kvm]
kvm_vcpu_block+0x94/0x4c0 [kvm]
kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc5+ #47 Not tainted
-------------------------------
./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/4230:
#0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm]
stack backtrace:
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
dump_stack+0x99/0xd0
lockdep_rcu_suspicious+0xe7/0x120
gfn_to_memslot+0x12a/0x140 [kvm]
gfn_to_pfn+0x12/0x30 [kvm]
gfn_to_page+0xe/0x20 [kvm]
kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
? _raw_spin_unlock_irqrestore+0x36/0x80
vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
kvm_vcpu_check_block+0x12/0x60 [kvm]
kvm_vcpu_block+0x94/0x4c0 [kvm]
kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0xfd/0x210
? __lock_is_held+0x54/0x70
do_vfs_ioctl+0x96/0x6a0
? __fget+0x11c/0x210
? __fget+0x5/0x210
SyS_ioctl+0x79/0x90
do_syscall_64+0x81/0x220
entry_SYSCALL64_slow_path+0x25/0x25
These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat
The nested preemption timer is based on hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the new check_nested_events
hook. The current logic adds vCPU to a simple waitqueue (TASK_INTERRUPTIBLE)
if need to yield pCPU and w/o holding srcu read lock when accesses memslots,
both can be in nested preemption timer evaluation path which results in
the warning above.
This patch fix it by leveraging request bit to async reload APIC access
page before vmentry in order to avoid to reload directly during the nested
preemption timer evaluation, it is safe since the vmcs01 is loaded and
current is nested vmexit.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
kvm_guest.config is useful for KVM guests on other arches, and nothing
in it appears to be x86 specific, so just move the whole file. Kbuild
will find it in either location.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvmarm@lists.cs.columbia.edu
Cc: kvm@vger.kernel.org
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Just like intel_pt, intel_bts can only handle one event at a time,
which is the reason we introduced PERF_PMU_CAP_EXCLUSIVE in the first
place. However, at the moment one can have as many intel_bts events
within the same context at the same time as one pleases. Only one of
them, however, will get scheduled and receive the actual trace data.
Fix this by making intel_bts an "exclusive" PMU.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160920154811.3255-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We cannot use the "z" constraint twice, since its a single register
(r0). Change the one not used by movli.l/movco.l to "r".
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lazy unmap (defer tlb flush after unmap until dma address reuse) can
greatly reduce the number of RPCIT instructions in the best case. In
reality we are often far away from the best case scenario because our
implementation suffers from the following problem:
To create dma addresses we maintain an iommu bitmap and a pointer into
that bitmap to mark the start of the next search. That pointer moves from
the start to the end of that bitmap and we issue a global tlb flush
once that pointer wraps around. To prevent address reuse before we issue
the tlb flush we even have to move the next pointer during unmaps - when
clearing a bit > next. This could lead to a situation where we only use
the rear part of that bitmap and issue more tlb flushes than expected.
To fix this we no longer clear bits during unmap but maintain a 2nd
bitmap which we use to mark addresses that can't be reused until we issue
the global tlb flush after wrap around.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Split dma_update_trans into __dma_update_trans which handles updating
the dma translation tables and __dma_purge_tlb which takes care of
purging associated entries in the dma translation lookaside buffer.
The map_sg API makes use of this split approach by calling
__dma_update_trans once per physically contiguous address range but
__dma_purge_tlb only once per dma contiguous address range.
This results in less invocations of the expensive RPCIT instruction
when using map_sg.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Our map_sg implementation mapped sg entries independently of each other.
For ease of use and possible performance improvements this patch changes
the implementation to try to map as many (likely physically non-contiguous)
sglist entries as possible into a contiguous DMA segment.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Simplify the code we use to calculate dma addresses by putting
everything related in a dma_alloc_address function. Also provide
a dma_free_address counterpart.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We calculate dma addresses using an iommu bitmap. Since commit
69eea95c ("s390/pci_dma: fix DMA table corruption with > 4 TB main memory")
we've made sure that addresses created using that bitmap are below
the maximum reported by firmware. Thus the additional check for
that address to be within range can be removed.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
By now both VHE and non-VHE initialisation sequences query supported
VMID size. Lets keep only single instance of this code under
init_common_resources().
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
This patch allows to build and use vgic-v3 in 32-bit mode.
Unfortunately, it can not be split in several steps without extra
stubs to keep patches independent and bisectable. For instance,
virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling
access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre
to be already defined.
It is how support has been done:
* handle SGI requests from the guest
* report configured SRE on access to GICv3 cpu interface from the guest
* required vgic-v3 macros are provided via uapi.h
* static keys are used to select GIC backend
* to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with
the static inlines
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
vgic-v3 save/restore routines are written in such way that they map
arm64 system register naming nicely, but it does not fit to arm
world. To keep virt/kvm/arm/hyp/vgic-v3-sr.c untouched we create a
mapping with a function for each register mapping the 32-bit to the
64-bit accessors.
Please, note that 64-bit wide ICH_LR is split in two 32-bit halves
(ICH_LR and ICH_LRC) accessed independently.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Headers linux/irqchip/arm-gic.v3.h and arch/arm/include/asm/kvm_hyp.h
are included in virt/kvm/arm/hyp/vgic-v3-sr.c and both define macros
called __ACCESS_CP15 and __ACCESS_CP15_64 which obviously creates a
conflict. These macros were introduced independently for GIC and KVM
and, in fact, do the same thing.
As an option we could add prefixes to KVM and GIC version of macros so
they won't clash, but it'd introduce code duplication. Alternatively,
we could keep macro in, say, GIC header and include it in KVM one (or
vice versa), but such dependency would not look nicer.
So we follow arm64 way (it handles this via sysreg.h) and move only
single set of macros to asm/cp15.h
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
vgic-v3 driver uses architecture specific MPIDR_LEVEL_SHIFT macro to
encode the affinity in a form compatible with ICC_SGI* registers.
Unfortunately, that macro is missing on ARM, so let's add it.
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
By now ITS code guarded with KVM_ARM_VGIC_V3 config option which was
introduced to hide everything specific to vgic-v3 from 32-bit world.
We are going to support vgic-v3 in 32-bit world and KVM_ARM_VGIC_V3
will gone, but we don't have support for ITS there yet and we need to
continue keeping ITS away.
Introduce the new config option to prevent ITS code being build in
32-bit mode when support for vgic-v3 is done.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
So we can reuse the code under arch/arm
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Since we are going to share vgic-v3 save/restore code with ARM keep
arch specific accessors separately.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently GIC backend is selected via alternative framework and this
is fine. We are going to introduce vgic-v3 to 32-bit world and there
we don't have patching framework in hand, so we can either check
support for GICv3 every time we need to choose which backend to use or
try to optimise it by using static keys. The later looks quite
promising because we can share logic involved in selecting GIC backend
between architectures if both uses static keys.
This patch moves arm64 from alternative to static keys framework for
selecting GIC backend. For that we embed static key into vgic_global
and enable the key during vgic initialisation based on what has
already been exposed by the host GIC driver.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
In commit:
ec776ef6bb ("x86/mm: Add support for the non-standard protected e820 type")
Christoph references the original patch I wrote implementing pmem support.
The intent of the 'max_pfn' changes in that commit were to enable persistent
memory ranges to be covered by the struct page memmap by default.
However, that approach was abandoned when Christoph ported the patches [1], and
that functionality has since been replaced by devm_memremap_pages().
In the meantime, this max_pfn manipulation is confusing kdump [2] that
assumes that everything covered by the max_pfn is "System RAM". This
results in kdump hanging or crashing.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-March/000348.html
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1351098
So fix it.
Reported-by: Zhang Yi <yizhan@redhat.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Zhang Yi <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org> # v4.1 and later kernels
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@lists.01.org
Fixes: ec776ef6bb ("x86/mm: Add support for the non-standard protected e820 type")
Link: http://lkml.kernel.org/r/147448744538.34910.11287693517367139607.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
virt_addr_valid is supposed to return true if and only if virt_to_page
returns a valid page structure. The current macro does math on whatever
address is given and passes that to pfn_valid to verify. vmalloc and
module addresses can happen to generate a pfn that 'happens' to be
valid. Fix this by only performing the pfn_valid check on addresses that
have the potential to be valid.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add the UV4-specific function definitions and define an operations struct
to implement them in the BAU driver.
Many BAU MMRs, although functionally the same, have new addresses on UV4
due to hardware changes. Each MMR requires new read/write functions, but
their implementation in the driver does not change. Thus, it is enough to
enumerate them in the operations struct for the changes to take effect.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-11-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The BAU on UV4 does not need to maintain the payload queue tail pointer. Do
not initialize the tail pointer MMR on UV4.
Note that write_payload_tail is not an abstracted BAU function since it is
an operation specific to pre-UV4 versions. Then we must switch on the UV
version to control its usage, for which we use uvhub_version rather than
is_uv*_hub because it is quicker/more concise.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-10-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Software timeouts are not currently supported on BAU for UV4. Instead, the
BAU will rely on hardware-level fairness protocols to determine broadcast
timeouts.
Do not call enable_timeouts or calculate_destination_timeout on UV4. These
functions write to pre-UV4 MMRs so they generate error messages on UV4.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-9-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Convert the use of UV version-specific functions to their abstracted
counterparts.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-7-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many BAU functions have different implementations depending on the UV
version. Rather than switching on the uvhub_version throughout the driver,
we can define a set of operations for each version. This is especially
beneficial for UV4, which will require many new MMR read/write functions.
Currently, the set of abstracted functions are the same for UV1, UV2, and
UV3. The functions were chosen because each one will have a different
implementation for UV4. Other functions will be added as needed to handle
new implementations or to cleanup the existing differences between UV1,
UV2, and UV3, i.e. read_status and wait_completion.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-6-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The BAU driver should use the functions provided by uv_hub.h rather than
its own implementations. uv_physnodeaddr converts vaddrs to paddrs for
BAU MMR fields, but this is done better by uv_gpa_to_offset.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-5-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The payload queue first MMR requires the physical memory address and hub
GNODE of where the payload queue resides in memory, but the associated
variables are named as if the PNODE were used. Rename gnode-related
variables and clarify the definitions of the payload queue head, last, and
tail pointers.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-4-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.
This patch finishes step 2.
In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.
And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.
After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.
And finally we will use this array to make cpuid <-> nodeid persistent.
cpuid <-> apicid mapping is established at local apic registeration time.
But non-present or disabled cpus are ignored.
In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.
When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.
So here is the problem:
Assume we have the following cpuid <-> nodeid in the beginning:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119
and we hot-remove node2 and node3, it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
and we hot-add node4 and node5, it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119
But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.
static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}
Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
which will lead to memory allocation failure:
SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
node 0: slabs: 6172, objs: 259224, free: 245741
node 1: slabs: 3261, objs: 136962, free: 127656
It happens here:
create_worker(struct worker_pool *pool)
|--> worker = alloc_worker(pool->node);
static struct worker *alloc_worker(int node)
{
struct worker *worker;
worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.
......
return worker;
}
[Solution]
There are four mappings in the kernel:
1. nodeid (logical node id) <-> pxm
2. apicid (physical cpu id) <-> nodeid
3. cpuid (logical cpu id) <-> apicid
4. cpuid (logical cpu id) <-> nodeid
1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
mapping is setup at boot time. This mapping is persistent, won't change.
2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
persistent.
3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
allocated, lower ids first, and released at CPU hotremove time, reused for other
hotadded CPUs. So this mapping is not persistent.
4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.
apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
This is done by introducing an extra parameter to generic_processor_info to let the
caller control if disabled cpus are ignored.
2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
registering local apic. Store the mapping in this array.
3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
This is also done by introducing an extra parameter to these apis to let the caller
control if disabled cpus are ignored.
4. Establish all possible cpuid <-> nodeid mapping.
This is done via an additional acpi namespace walk for processors.
This patch finished step 1.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-3-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For now, x86 does not support memory-less node. A node without memory
will not be onlined, and the cpus on it will be mapped to the other
online nodes with memory in init_cpu_to_node(). The reason of doing this
is to ensure each cpu has mapped to a node with memory, so that it will
be able to allocate local memory for that cpu.
But we don't have to do it in this way.
In this series of patches, we are going to construct cpu <-> node mapping
for all possible cpus at boot time, which is a persistent mapping. It means
that the cpu will be mapped to the node which it belongs to, and will never
be changed. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node. And the memory-less node should be onlined.
Allocate pgdats for all memory-less nodes and online them at boot
time. Then build zonelists for these nodes. As a result, when cpus on these
memory-less nodes try to allocate memory from local node, it will
automatically fall back to the proper zones in the zonelists.
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-2-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Provide a nicer to_sa1111_device macro to convert a struct device to a
sa1111_dev. We will need this for drivers when converting them to
dev_pm_ops, or removing shutdown methods.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The page structures associated with the vDSO pages in the kernel image
are calculated using virt_to_page(), which uses __pa() under the hood to
find the pfn associated with the virtual address. The vDSO data pointers
however point to kernel symbols, so __pa_symbol() should really be used
instead.
Since there is no equivalent to virt_to_page() which uses __pa_symbol(),
fix init_vdso_image() to work directly with pfns, calculated with
__phys_to_pfn(__pa_symbol(...)).
This issue broke the Malta Enhanced Virtual Addressing (EVA)
configuration which has a non-default implementation of __pa_symbol().
This is because it uses a physical alias so that the kernel executes
from KSeg0 (VA 0x80000000 -> PA 0x00000000), while RAM is provided to
the kernel in the KUSeg range (VA 0x00000000 -> PA 0x80000000) which
uses the same underlying RAM.
Since there are no page structures associated with the low physical
address region, some arbitrary kernel memory would be interpreted as a
page structure for the vDSO pages and badness ensues.
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/14229/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The maximum size of e820 map array for EFI systems is defined as
E820_X_MAX (E820MAX + 3 * MAX_NUMNODES).
In x86_64 defconfig, this ends up with E820_X_MAX = 320, e820 and e820_saved
are 6404 bytes each.
With larger configs, for example Fedora kernels, E820_X_MAX = 3200, e820
and e820_saved are 64004 bytes each. Most of this space is wasted.
Typical machines have some 20-30 e820 areas at most.
After previous patch, e820 and e820_saved are pointers to e280 maps.
Change them to initially point to maps which are __initdata.
At the very end of kernel init, just before __init[data] sections are freed
in free_initmem(), allocate smaller blocks, copy maps there,
and change pointers.
The late switch makes sure that all functions which can be used to change
e820 maps are no longer accessible (they are all __init functions).
Run-tested.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160918182125.21000-1-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch turns e820 and e820_saved into pointers to e820 tables,
of the same size as before.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160917213927.1787-2-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
They are all called only from other __init functions in e820.c
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160917213927.1787-1-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 5958d19a14 checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags. This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.
The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.
Revert these cases to the previous behaviour, adding a new helper function
to do so. This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.
Fixes: 5958d19a14 ("Fix incorrect PE reservation attempt on some 64-bit BARs")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that memory initialization doesn't add default memory region
specify it explicitly in the memmap command line option in case somebody
wants to boot in non-DT-enabled configuration.
While at it update earlycon access mode to mmio32native to support both
LE and BE cores transparently.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Now that memory initialization doesn't add default memory region specify
it explicitly in the memmap command line option.
Save common_defconfig as defconfig so that it doesn't have all option
settings in it, only those that are different from the Kconfig defaults.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Now that memory initialization doesn't add default memory region specify
it explicitly in the memmap command line option.
Save iss_defconfig as defconfig so that it doesn't have all option
settings in it, only those that are different from the Kconfig defaults.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
ISS kernel by default has only low memory. But it may be configured to
support high memory and started in a simulator with more than 128M of
RAM. Simdisk driver in such configuration can get IO request with a
high memory page. There may be no TLB entry for that page, only page
table entry. However simulators don't do pagewalking, so such IO request
will fail. Touch IO buffer in the buffer read/write loop so that a TLB
entry is likely there when read or write simcall is invoked.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
A number of ISS platform functions use inline assembly to invoke
simulator exit, not all correctly. Define simc_exit(exit_code) and use
it instead of inline assembly.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Group platform_* functions together and turn two separate #ifdef/#ifndef
blocks into single #ifdef/#else. No functional changes.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
DT-enabled kernel should have a CPU node connected to a clock. This clock
is the CCOUNT clock. Use old platform_calibrate_ccount call as a fallback
when CPU node cannot be found or has no clock and in non-DT-enabled
configurations.
Drop no longer needed code that updates CPU clock-frequency property in
the DT; drop DT-related code from the platform_calibrate_ccount too.
Move of_clk_init to the top of time_init, so that clocks are initialized
before CCOUNT calibration is attempted.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Instead of querying hardcoded FPGA frequency register and then updating
clock-frequency property in specificly named DT nodes in machine setup
code register a clock provider that returns fixed-rate clock, configured
by register specified in DT. This way we have less magic/hardcoded names
and use more existing common clock framework code.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
With the following commit:
e18bcccd1a ("x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder")
The task pointer argument to show_stack_log_lvl() in show_stack() was
inadvertently changed to 'current'.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: byungchul.park@lge.com
Cc: fweisbec@gmail.com
Cc: keescook@chromium.org
Cc: linux-tip-commits@vger.kernel.org
Cc: luto@amacapital.net
Cc: nilayvaish@gmail.com
Cc: rostedt@goodmis.org
Cc: tip-bot for Josh Poimboeuf <tipbot@zytor.com>
Fixes: e18bcccd1a ("x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder")
Link: http://lkml.kernel.org/r/20160920155340.yhewlx7vmgmov5fb@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
new EFI memory map reservation code didn't align reservations to
EFI_PAGE_SIZE boundaries causing bogus regions to be inserted into
the global EFI memory map - Matt Fleming
-----BEGIN PGP SIGNATURE-----
iQI2BAABCAAgBQJX4UshGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ
wI0jPVX2RA/9HiT48K98R1fkiig/V3Wh8H8y7lQbwO+qA1WZ7q/D+DYau62qI3zM
sLgTSoV2zzjBr+4JWuGIdbtpYBjhHci0HwEcOGp5EFSZmgJGOpc7VY2VdFaYyZ2X
xhowGLXv2jgfKoAPt9MfgZX2Qh5Jt4CRlC+UcilUJp0wy1+uD4W7XmrLM1Rg7Ug2
8XTL/BA5QxQa3wx8+/z6xV/ADlhzf3DQ7BwI3qkx5RHSXSuy1/E2XGTnyuDt/7Oo
KmMwQGGrrv92uSjyq5vJ53DuNxXDJ7nGtlXTT2u0WtcZbzJXsXIaCZItbCA5rqW/
wWGgsz5XSc406m1WNv8Zs8xx9NAEP/+KXQfftwuQHFqoylG4QqYuO060V5XOxQlh
cdRuyVUgMONZAvTzEAUC+5SaN3c+WDCL8oEnrs/tdMxSzBL9Rj7yaHMW+ozVe5Nf
GFxMnO0l5SG9+o8hSAkkn5RDfIDzSTC1W5imqWJQXhC2BXHEdmh4wVWIG4QgPM/K
gqaTpX72Nu04lBjCXXpfKSK5Xqv67hZOzxUDkkjxtq1PtDbL1dPpzAF3s3GFdreI
FfQgetg4jM4EPnWDQWqWVJxQUv+uCfpDV9g5y9kMinVb2OpyGjcZpNckGHKnqIXr
hyntPYqyy793W6dPGEjTqfcVP751qisMjp6iIIqk3h+qr7HyJSF5nqs=
=S9gJ
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/core
Pull EFI fix from Matt Fleming:
* Fix a boot crash reported by Mike Galbraith and Mike Krinkin. The
new EFI memory map reservation code didn't align reservations to
EFI_PAGE_SIZE boundaries causing bogus regions to be inserted into
the global EFI memory map (Matt Fleming)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
by type conversion errors in the x86 pat code - Matt Fleming
-----BEGIN PGP SIGNATURE-----
iQI2BAABCAAgBQJX4UkgGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ
wI0jPVWaoA/9EqgJqgXP5JSoV5EFzGA0uyE65J46hhmpqry0gnJ0Jv4lf9nWVShI
iYm3VRW2bIjF6sNKw/Z9ZFzhXY96bn3axlPwoMbwsuCJoWHwIG0dYkI+N4oQGcaR
7xX5hrnWeRPdU28wTZNyvqWBojXJcY2Cv3mg+/1/wByL7wABljcAxmr3a0Vk1j/R
UsAdWKz0AFYMnKKhfwcmIwTRDFXlzmCVNP1luycFNjVXALW7vHOIav7Wzn7ZMCfv
Nd6FYlabUm5Nq/o8o7QD/KniBj/0maurBq2jnbtV/Iim6OsMpN62AsXYY06QNtrp
IlrDb4RIYJa+Oi4VDpjoEZicBlVuEBBJaziS3bPMUlswkO2S/v7LvddIqqEm/K1V
rqLTpcx+O66LODs1GHz2hDd/T8I5lHsR8F9ggmboQj8kTze4kowpeN4kuTB91+oK
4gybkXiB7olrjdWCxPe2UE/2c325nHsSmuObDmfJjTxCn1zsbJY233oXLLfaiMgn
y4URazGH7m2kTAfjbwfJQKLMAz9AOItxaY4f2nIO6BM/ZaNM59CyGQUKU9muhecd
p/e8qJtyP4narwtDFWm5XMUybS3W46F5jAAux1AGi83vhpfbVjnvajm5wzeonTq3
pAS3NBWMuJ0xHQsq/zkI+S5bvzErsQKwV0p+tNhpKSb0n72tDaN+CYk=
=qMBL
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/urgent
Pull EFI fixes from Matt Fleming:
* Fix a boot hang on large memory machines (multiple terabyte) caused
by type conversion errors in the x86 PAT code (Matt Fleming)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mike Galbraith reported that his machine started rebooting during boot
after,
commit 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
The ESRT table on his machine is 56 bytes and at no point in the
efi_arch_mem_reserve() call path is that size rounded up to
EFI_PAGE_SIZE, nor is the start address on an EFI_PAGE_SIZE boundary.
Since the EFI memory map only deals with whole pages, inserting an EFI
memory region with 56 bytes results in a new entry covering zero
pages, and completely screws up the calculations for the old regions
that were trimmed.
Round all sizes upwards, and start addresses downwards, to the nearest
EFI_PAGE_SIZE boundary.
Additionally, efi_memmap_insert() expects the mem::range::end value to
be one less than the end address for the region.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Tested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Since commit 4d4c474124 ("perf/x86/intel/bts: Fix BTS PMI detection")
my box goes boom on boot:
| .... node #0, CPUs: #1#2#3#4#5#6#7
| BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
| IP: [<ffffffff8100c463>] intel_bts_interrupt+0x43/0x130
| Call Trace:
| <NMI> d [<ffffffff8100b341>] intel_pmu_handle_irq+0x51/0x4b0
| [<ffffffff81004d47>] perf_event_nmi_handler+0x27/0x40
This happens because the code introduced in this commit dereferences the
debug store pointer unconditionally. The debug store is not guaranteed to
be available, so a NULL pointer check as on other places is required.
Fixes: 4d4c474124 ("perf/x86/intel/bts: Fix BTS PMI detection")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: vince@deater.net
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20160920131220.xg5pbdjtznszuyzb@breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
multi-terabyte HP machine results in boot crashes, because the EFI
region mapping functions loop forever while trying to map those
regions describing RAM.
While this patch doesn't fix the underlying hang, there's really no
reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
when mixed-mode is not in use at runtime.
Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
There's a mixture of signed 32-bit and unsigned 32-bit and 64-bit data
types used for keeping track of how many pages have been mapped.
This leads to hangs during boot when mapping large numbers of pages
(multiple terabytes, as reported by Waiman) because those values are
interpreted as being negative.
commit 742563777e ("x86/mm/pat: Avoid truncation when converting
cpa->numpages to address") fixed one of those bugs, but there is
another lurking in __change_page_attr_set_clr().
Additionally, the return value type for the populate_*() functions can
return negative values when a large number of pages have been mapped,
triggering the error paths even though no error occurred.
Consistently use 64-bit types on 64-bit platforms when counting pages.
Even in the signed case this gives us room for regions 8PiB
(pebibytes) in size whilst still allowing the usual negative value
error checking idiom.
Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Clean up the duplication in the IRQ chip implementation - we can compute
the register address from the interrupt number rather than duplicating
the code for each register.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Add a gpio_chip instance for SA1111 GPIOs. This allows us to use
gpiolib to lookup and manipulate SA1111 GPIOs.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Move the SA1111 interrupt cleanup to a separate function, so it can be
re-used in the probe error cleanup path.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Convert sa1111 to use devm_clk_get() to get its clock resource, and
strip out the clk_put() calls. This simplifies the error handling
a little.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Use devm_kzalloc() to allocate our driver data, so we can eliminate its
kfree() from the device removal and error cleanup paths.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When removing a SA1111 device, we try to remove all child devices.
However, we must only remove our own RAB bus typed devices from the
tree, there may be other devices present which should not be touched.
This is necessary before we introduce gpiochip to SA1111 to avoid
incorrectly trying to remove the gpiochip device, leading to an oops
in __release_resource().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
These files were only including module.h for exception table
related functions. We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.
The additions of uaccess.h are to deal with implict includes like:
arch/s390/kernel/traps.c: In function 'do_report_trap':
arch/s390/kernel/traps.c:56:4: error: implicit declaration of function 'extable_fixup' [-Werror=implicit-function-declaration]
arch/s390/kernel/traps.c: In function 'illegal_op':
arch/s390/kernel/traps.c:173:3: error: implicit declaration of function 'get_user' [-Werror=implicit-function-declaration]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Export clp.h for usage by userspace.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This enables UBSAN for s390. We have to disable the null sanitizer
as s390 code does access memory via a null pointer (the prefix page).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The combo of list_empty() check and return list_first_entry()
can be replaced with list_first_entry_or_null().
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
most unaligned accesses are reasonable efficient (no kernel emulation)
on s390, let's announce it
This also
- removes the ubsan false positives for unaligned accesses on s390 with
default config
- uses simpler arithmetic in several functions in several other areas
of the kernel like ethernet frame classification
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
These files were only including module.h for exception table
related functions. We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
vm_data->avic_vm_id is a u32, so the check for a error
return (less than zero) such as -EAGAIN from
avic_get_next_vm_id currently has no effect whatsoever.
Fix this by using a temporary int for the comparison
and assign vm_data->avic_vm_id to this. I used an explicit
u32 cast in the assignment to show why vm_data->avic_vm_id
cannot be used in the assign/compare steps.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Lately tsc page was implemented but filled with empty
values. This patch setup tsc page scale and offset based
on vcpu tsc, tsc_khz and HV_X64_MSR_TIME_REF_COUNT value.
The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr
reads count to zero which potentially improves performance.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
[Computation of TSC page parameters rewritten to use the Linux timekeeper
parameters. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a function that reads the exact nanoseconds value that is
provided to the guest in kvmclock. This crystallizes the notion of
kvmclock as a thin veneer over a stable TSC, that the guest will
(hopefully) convert with NTP. In other words, kvmclock is *not* a
paravirtualized host-to-guest NTP.
Drop the get_kernel_ns() function, that was used both to get the base
value of the master clock and to get the current value of kvmclock.
The former use is replaced by ktime_get_boot_ns(), the latter is
the purpose of get_kernel_ns().
This also allows KVM to provide a Hyper-V time reference counter that
is synchronized with the time that is computed from the TSC page.
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make the guest's kvmclock count up from zero, not from the host boot
time. The guest cannot rely on that anyway because it changes on
migration, the numbers are easier on the eye and finally it matches the
desired semantics of the Hyper-V time reference counter.
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We will use it in the next patches for KVM_GET_CLOCK and as a basis for the
contents of the Hyper-V TSC page. Get the values from the Linux
timekeeper even if kvmclock is not enabled.
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All previous users of dump_trace() have been converted to use the new
unwind interfaces, so we can remove it and the related
print_context_stack() and print_context_stack_bp() callback functions.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5b97da3572b40b5a4d8e185cf2429308d0987a13.1474045023.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Convert show_trace_log_lvl() to use the new unwinder. dump_trace() has
been deprecated.
show_trace_log_lvl() is special compared to other users of the unwinder.
It's the only place where both reliable *and* unreliable addresses are
needed. With frame pointers enabled, most callers of the unwinder don't
want to know about unreliable addresses. But in this case, when we're
dumping the stack to the console because something presumably went
wrong, the unreliable addresses are useful:
- They show stale data on the stack which can provide useful clues.
- If something goes wrong with the unwinder, or if frame pointers are
corrupt or missing, all the stack addresses still get shown.
So in order to show all addresses on the stack, and at the same time
figure out which addresses are reliable, we have to do the scanning and
the unwinding in parallel.
The scanning is done with the help of get_stack_info() to traverse the
stacks. The unwinding is done separately by the new unwinder.
In theory we could simplify show_trace_log_lvl() by instead pushing some
of this logic into the unwind code. But then we would need some kind of
"fake" frame logic in the unwinder which would add a lot of complexity
and wouldn't be worth it in order to support only one user.
Another benefit of this approach is that once we have a DWARF unwinder,
we should be able to just plug it in with minimal impact to this code.
Another change here is that callers of show_trace_log_lvl() don't need
to provide the 'bp' argument. The unwinder already finds the relevant
frame pointer by unwinding until it reaches the first frame after the
provided stack pointer.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/703b5998604c712a1f801874b43f35d6dac52ede.1474045023.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The x86 stack dump code is a bit of a mess. dump_trace() uses
callbacks, and each user of it seems to have slightly different
requirements, so there are several slightly different callbacks floating
around.
Also there are some upcoming features which will need more changes to
the stack dump code, including the printing of stack pt_regs, reliable
stack detection for live patching, and a DWARF unwinder. Each of those
features would at least need more callbacks and/or callback interfaces,
resulting in a much bigger mess than what we have today.
Before doing all that, we should try to clean things up and replace
dump_trace() with something cleaner and more flexible.
The new unwinder is a simple state machine which was heavily inspired by
a suggestion from Andy Lutomirski:
https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com
It's also similar to the libunwind API:
http://www.nongnu.org/libunwind/man/libunwind(3).html
Some if its advantages:
- Simplicity: no more callback sprawl and less code duplication.
- Flexibility: it allows the caller to stop and inspect the stack state
at each step in the unwinding process.
- Modularity: the unwinder code, console stack dump code, and stack
metadata analysis code are all better separated so that changing one
of them shouldn't have much of an impact on any of the others.
Two implementations are added which conform to the new unwind interface:
- The frame pointer unwinder which is used for CONFIG_FRAME_POINTER=y.
- The "guess" unwinder which is used for CONFIG_FRAME_POINTER=n. This
isn't an "unwinder" per se. All it does is scan the stack for kernel
text addresses. But with no frame pointers, guesses are better than
nothing in most cases.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6dc2f909c47533d213d0505f0a113e64585bec82.1474045023.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the addition of uses of GCC's condition code outputs in commit:
35ccfb7114 ("x86, asm: Use CC_SET()/CC_OUT() in <asm/rwsem.h>")
... there's now an overlap of outputs and clobbers in __down_write_trylock().
Such overlaps are generally getting tagged with an error (occasionally
even with an ICE). I can't really tell why plain GCC 6.2 doesn't detect
this (judging by the code it is meant to), while the slightly modified
one I use does. Since condition code clobbers are never necessary on x86
(other than perhaps for documentation purposes, which doesn't really
get done consistently), remove it altogether rather than inventing
something like CC_CLOBBER (to accompany CC_SET/CC_OUT).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/57E003CC0200007800110102@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Intel PT facility grew some new functionality:
* PTWRITE packet carries the payload of the new PTWRITE instruction
that can be used to instrument Intel PT traces with user-supplied
data. Packets of this type are only generated if 'ptwrite' capability
is set and PTWEn bit is set in the event attribute's config. Flow
update packets (FUP) can be generated on PTWRITE packets if FUPonPTW
config bit is set. Setting these bits is not allowed if 'ptwrite'
capability is not set.
* PWRE, PWRX, MWAIT, EXSTOP packets communicate core power management
events. These depend on 'power_event_tracing' capability and are
enabled by setting PwrEvtEn bit in the event attribute.
Extend the driver capabilities and provide the proper sanity checks in the
event validation function.
[ tglx: Massaged changelog ]
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: vince@deater.net
Cc: eranian@google.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/20160916134819.1978-1-alexander.shishkin@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
These files were only including module.h for exception table related
functions. We've now separated that content out into its own file
"extable.h" so now move over to that and avoid all the extra header content
in module.h that we don't really need to compile these files.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20160919210418.30243-1-paul.gortmaker@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via
CPUID") added code to retrieve the crystal and TSC frequency from CPUID
leaves. If the crystal freqency is enumerated as 0,the resulting TSC
frequency is 0 as well. For CPUs with a known fixed crystal frequency a
quirk list is available to set the frequency,
Kabylake and SkylakeX CPUs are missing in the list of CPUs which need this
quirk. Add them so the TSC frequency can be calculated correctly.
[ tglx: Removed the silly default case as the switch() is only invoked when
cpu_khz is 0. Massaged changelog. ]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/1474289501-31717-3-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
asm/intel-family.h contains defines for cpu ids which should be used
instead of hex constants. Convert the switch case in native_calibrate_tsc()
to use the defines before adding more cpu models.
[ tglx: Massaged changelog ]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/1474289501-31717-2-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The array has a size of MAX_LOCAL_APIC, which can be as large as 32k, so it
can consume up to 128k.
The array has been there forever and was never used for anything useful
other than a version mismatch check which was introduced in 2009.
There is no reason to store the version in an array. The kernel is not
prepared to handle different APIC versions anyway, so the real important
part is to detect a version mismatch and warn about it, which can be done
with a single variable as well.
[ tglx: Massaged changelog ]
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Borislav Petkov <bp@alien8.de>
CC: Brian Gerst <brgerst@gmail.com>
CC: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20160913181232.30815-1-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc6+ #5 Not tainted
-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/2/0.
stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.8.0-rc6+ #5
Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
0000000000000000 ffff8d1bd6003f10 ffffffff94446949 ffff8d1bd4a68000
0000000000000001 ffff8d1bd6003f40 ffffffff940e9247 ffff8d1bbdfcf3d0
000000000000080b 0000000000000000 0000000000000000 ffff8d1bd6003f70
Call Trace:
<IRQ> [<ffffffff94446949>] dump_stack+0x99/0xd0
[<ffffffff940e9247>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff9448e0d5>] do_trace_write_msr+0x135/0x140
[<ffffffff9406e750>] native_write_msr+0x20/0x30
[<ffffffff9406503d>] native_apic_msr_eoi_write+0x1d/0x30
[<ffffffff9405b17e>] smp_trace_call_function_interrupt+0x1e/0x270
[<ffffffff948cb1d6>] trace_call_function_interrupt+0x96/0xa0
<EOI> [<ffffffff947200f4>] ? cpuidle_enter_state+0xe4/0x360
[<ffffffff947200df>] ? cpuidle_enter_state+0xcf/0x360
[<ffffffff947203a7>] cpuidle_enter+0x17/0x20
[<ffffffff940df008>] cpu_startup_entry+0x338/0x4d0
[<ffffffff9405bfc4>] start_secondary+0x154/0x180
This can be reproduced readily by running ftrace test case of kselftest.
Move the irq_enter() call before ack_APIC_irq(), because irq_enter() tells
the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly. The same applies to
exiting_ack_irq() which calls ack_APIC_irq() after irq_exit().
[ tglx: Massaged changelog ]
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1474198491-3738-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The prctl code which references vdso_image_x32 is built when CONFIG_X86_X32
is set. This results in the following build failure:
LD init/built-in.o
arch/x86/built-in.o: In function `do_arch_prctl':
(.text+0x27466): undefined reference to `vdso_image_x32'
vdso_image_x32 depends on CONFIG_X86_X32_ABI. So we need to make the prctl
depend on that as well.
[ tglx: Massaged changelog ]
Fixes: 2eefd87896 ("x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/1474073513-6656-1-git-send-email-vlee@freedesktop.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull crypto fixes from Herbert Xu:
"This fixes a potential weakness in IPsec CBC IV generation, as well as
a number of issues that arose out of an OOM crash on ARM with CTR-mode
AES"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/aes-ctr - fix NULL dereference in tail processing
crypto: arm/aes-ctr - fix NULL dereference in tail processing
crypto: skcipher - Fix blkcipher walk OOM crash
crypto: echainiv - Replace chaining with multiplication
Install the callbacks via the state machine.
[ tglx: Renamed the state to MIPS_SOC_PREPARE so it can be reused by other
SOCs ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160906170457.32393-16-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Install the callbacks via the state machine.
CPU_UP_CANCELED_FROZEN() is not preserved: It is only there to free memory in an
error case because it is assumed if the CPU does show up on resume it won't be
seen ever again. As per Borislav:
|IOW, you don't need mc_cpu_dead().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160907164523.46a2xnffha4bv63g@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160906170457.32393-5-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add module parameter xilinx_uartps.rx_trigger_level=32 to command line
options for CSP to set Rx watermark for xuartps driver lower than the
default value, to avoid UART overruns at 115200 bps.
Signed-off-by: Scott Telford <stelford@cadence.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Add lost Kconfig symbol. This should have been part of 40e084a506
('MIPS: Add uprobes support.').
Fixes: 40e084a506 ('MIPS: Add uprobes support.')
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit 44a7185c2a ("of/platform: Add common method to populate
default bus") added new arch_initcall of_platform_default_populate_init()
that will override device_initcall octeon_publish_devices(). This broke
many OCTEON boards as important devices are not getting probed anymore
(e.g. on EdgeRouter Lite the USB mass storage/rootfs is missing).
Fix by changing octeon_publish_devices() to arch_initcall.
Fixes: 44a7185c2a ("of/platform: Add common method to populate default bus")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Rob Herring <robh@kernel.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14041/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit 1685ddbe35 ("MIPS: Octeon: Changes to support readq()/writeq()
usage.") added bitwise shift operations that assume that unsigned long
is always 64-bits. This broke the build of VDSO code, as it gets compiled
also in "faked" 32-bit mode. Althought the failing inline functions are
never executed in 32-bit mode, they still need to pass the compilation.
Fix by using 64-bit types explicitly.
The patch fixes the following build failure:
CC arch/mips/vdso/gettimeofday-o32.o
In file included from los/git/devel/linux/arch/mips/include/asm/io.h:32:0,
from los/git/devel/linux/arch/mips/include/asm/page.h:194,
from los/git/devel/linux/arch/mips/vdso/vdso.h:26,
from los/git/devel/linux/arch/mips/vdso/gettimeofday.c:11:
los/git/devel/linux/arch/mips/include/asm/mach-cavium-octeon/mangle-port.h: In function '__should_swizzle_bits':
los/git/devel/linux/arch/mips/include/asm/mach-cavium-octeon/mangle-port.h:19:40: error: right shift count >= width of type [-Werror=shift-count-overflow]
unsigned long did = ((unsigned long)a >> 40) & 0xff;
^~
Fixes: 1685ddbe35 ("MIPS: Octeon: Changes to support readq()/writeq() usage.")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Steven J. Hill <steven.hill@cavium.com>
Cc: Alex Smith <alex.smith@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14039/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
cpu_has_fpu macro uses smp_processor_id() and is currently executed
with preemption enabled, that triggers the warning at runtime.
It is assumed throughout the kernel that if any CPU has an FPU, then all
CPUs would have an FPU as well, so it is safe to perform the check with
preemption enabled - change the code to use raw_ variant of the check to
avoid the warning.
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # 4.0+
Patchwork: https://patchwork.linux-mips.org/patch/14125/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Pull perf fixes from Thomas Gleixner:
"A couple of small fixes to x86 perf drivers:
- Measure L2 for HW_CACHE* events on AMD
- Fix the address filter handling in the intel/pt driver
- Handle the BTS disabling at the proper place"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
perf/x86/intel/pt: Do validate the size of a kernel address filter
perf/x86/intel/pt: Fix kernel address filter's offset validation
perf/x86/intel/pt: Fix an off-by-one in address filter configuration
perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
Since commit acb2505d01 ("openrisc: fix copy_from_user()"),
copy_from_user() returns the number of bytes requested, not the
number of bytes not copied.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: acb2505d01 ("openrisc: fix copy_from_user()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
avr32 builds fail with:
arch/avr32/kernel/built-in.o: In function `arch_ptrace':
(.text+0x650): undefined reference to `___copy_from_user'
arch/avr32/kernel/built-in.o:(___ksymtab+___copy_from_user+0x0): undefined
reference to `___copy_from_user'
kernel/built-in.o: In function `proc_doulongvec_ms_jiffies_minmax':
(.text+0x5dd8): undefined reference to `___copy_from_user'
kernel/built-in.o: In function `proc_dointvec_minmax_sysadmin':
sysctl.c:(.text+0x6174): undefined reference to `___copy_from_user'
kernel/built-in.o: In function `ptrace_has_cap':
ptrace.c:(.text+0x69c0): undefined reference to `___copy_from_user'
kernel/built-in.o:ptrace.c:(.text+0x6b90): more undefined references to
`___copy_from_user' follow
Fixes: 8630c32275 ("avr32: fix copy_from_user()")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Havard Skinnemoen <hskinnemoen@gmail.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Fixes for code merged this cycle:
- Fix restore of SPRs upon wake up from hypervisor state loss from Gautham R. Shenoy
- Fix the state of root PE from Gavin Shan
- Detach from PE on releasing PCI device from Gavin Shan
- Fix size of NUM_CPU_FTR_KEYS on 32-bit
- Fix missed TCE invalidations that should fallback to OPAL
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX3QI8AAoJEFHr6jzI4aWAmDEP/3k8Tuk30d+QhfVm++N18cZ7
EFdpO25m+kJH83PVc3Lri6sj2z6/Bpm6Ib2V3gynExB9SxUCcAJXHqwioLkL9/PW
aUiotxsPvlpfBFAjNbk2myB8JSbc/+8yJaojKYWqwX796bjUdRkI7rmXtfrjmX6U
uhQQ9nvKNxThwY5eedMH9PCJ89BzgLefrExHUD171iR43qfaouLkUn/Ba+UIhC5m
pepwePCTXHEPm8e328hYVSNEmqWRgL+UN2EUZKqXjITNtDSHCdwGTF8iifwTku54
g/rrta8CgFD4x5chTROnOhJMkTD9MRoneVR8nE4QD6yMHj9k1huL8J8wlfnG/zbB
Ym6MNKBYbGPMAoYfbxAcvWr/7XL+szNoR+p+VWl+rgf2Z08dQaI4zNiB3aimCs1g
7yWW649Gd4gXyNygfeMCDWGZbVhQdQIHcNrcAKFuIRvkn3iPZ0cPa5GYxZ7o/32B
oKAtZMsufGN0eC21hbLaRkyeYPdqEjyk+T734t05cfBCvScWkHeBapnX9gYOoCqZ
ok7b8wXVqVFXZ+FSZ8Ec7YquUHBhHECpqofMgB6d9DqbWPlubwiA3g4YnjrpFDC7
u4a4bVKVZy8fk3w7+2ibkIdud35zL0LqkB2ZNhOn3IYM/yBD0zgUs+bIqruTKZ2+
AYapeGjmf+SBD3ytGtab
=MG5t
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes for code merged this cycle:
- Fix restore of SPRs upon wake up from hypervisor state loss from
Gautham R Shenoy
- Fix the state of root PE from Gavin Shan
- Detach from PE on releasing PCI device from Gavin Shan
- Fix size of NUM_CPU_FTR_KEYS on 32-bit
- Fix missed TCE invalidations that should fallback to OPAL"
* tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL
powerpc/powernv: Detach from PE on releasing PCI device
powerpc/powernv: Fix the state of root PE
powerpc/kernel: Fix size of NUM_CPU_FTR_KEYS on 32-bit
powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state loss
Here are a couple of bugfixes for v4.8-rc. Most of them have
actually been around for a while this time but for some reason
didn't get applied early on. The shmobile regulator fix is the
only one that isn't completely obvious.
device tree changes:
- archtimer interrupts must be level triggered (multiple platforms)
- fix for USB and MMC clocks on STiH410
- fix split DT repository in case of raspberry-pi 3
- A new use of skeleton.dtsi on arm64 has crept in after that
was removed.
defconfig updates:
- xilinx vdma has a new Kconfig symbol name
- keystone requires CONFIG_NOP_USB_XCEIV since v4.8-rc1
code fixes:
- fix regulator quirk on shmobile
- suspend-to-ram regression on EXYNOS
maintainer updates:
- Javier Martinez Canillas is now a reviewer for Samsung EXYNOS
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAV9wHNGCrR//JCVInAQKPBhAA4HWz5YoE1FwatmrZ7LyLgl+SD7ezDuGC
w/oo01kGYSq9vN8E7rTWqTW/lylTgt7adX3E6wNPIIfVg9dx9TnJ0HofH3TjHku4
K7HeqapNqqqWh3VF8xFZO6YkKi09uhsX+j8NOAGlhd50A4OrOA1xh1NtpIakLX7z
TYBpbjW1TB3SwNiq7CbC1PJUKzTfP49hQmf6dUdKajLZ2Wova4H0bonyo45DhanZ
JiZyZlR9NnieVcftAP+kGFskM8q2hyZPqtoCar/mWrmerWMUG3n1MWj9LyDTVsVc
zd7wBcX9dLOe26qGW88MUnbUBC/R2nZ+VDzMwyoYoIHlHALDcn2ldDotLDVIRp6A
xGMejt06Q2qG8zX4SCjyq9hu2LeyBRWHkRTaoAD2tsT5SD4KNMi3GeYnq9Su+iYw
hXrCOrua1pMDhWsU5RMGrfPXKbZSkkcvvt1MAoUn5h7xTqLQ1+PKLIUVOPnAR6Ns
lHR86oo1kAoXDPbKZRnMbHSQ76kW+nWF+vDOJ7ozXNwZtcmXZiqfKxs/RDVecqFL
kJMPJBPUGW5FAakarLb68f8XVsiHQr3ujofTyA77cUACqLBidbhxbfq+5NMWyck/
zXPLk4HEGBlg9v8g17g1MxdttS+Na9pzNVfE23CuGKc147QIh1M3DeLjoIZ9gSfH
p8cxVpe5gBs=
=tYAb
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a couple of bugfixes for v4.8-rc.
Most of them have actually been around for a while this time but for
some reason didn't get applied early on. The shmobile regulator fix
is the only one that isn't completely obvious.
Device tree changes:
- archtimer interrupts must be level triggered (multiple platforms)
- fix for USB and MMC clocks on STiH410
- fix split DT repository in case of raspberry-pi 3
- a new use of skeleton.dtsi on arm64 has crept in after that was
removed.
defconfig updates:
- xilinx vdma has a new Kconfig symbol name
- keystone requires CONFIG_NOP_USB_XCEIV since v4.8-rc1
Code fixes:
- fix regulator quirk on shmobile
- suspend-to-ram regression on EXYNOS
Maintainer updates:
- Javier Martinez Canillas is now a reviewer for Samsung EXYNOS"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: keystone: defconfig: Fix USB configuration
arm64: dts: Fix broken architected timer interrupt trigger
ARM: multi_v7_defconfig: update XILINX_VDMA
ARM64: dts: bcm: Use a symlink to R-Pi dtsi files from arch=arm
ARM: dts: Remove use of skeleton.dtsi from bcm283x.dtsi
ARM: dts: STiH407-family: Provide interconnect clock for consumption in ST SDHCI
ARM: dts: STiH410: Handle interconnect clock required by EHCI/OHCI (USB)
ARM: shmobile: fix regulator quirk for Gen2
ARM: EXYNOS: Clear OF_POPULATED flag from PMU node in IRQ init callback
MAINTAINERS: Add myself as reviewer for Samsung Exynos support
Pull ARM fixes from Russell King:
"Most of this update are fixes primarily discovered from testing on the
older StrongARM 1110 and PXA systems, as a result of recent interest
from several people in these platforms:
- Locomo interrupt handling incorrectly stores the handler data in
the chip's private data slot: when Locomo is combined with an
interrupt controller who's chip uses the chip private data, this
leads to an oops.
- SA1111 was missing a call to clk_disable() to clean up after a
failed probe.
- SA1111 and PCMCIA suspend/resume was broken:
The PCMCIA "ds" layer was using the legacy bus suspend/resume
methods, which the core PM code is no longer calling as a result of
device_pm_check_callbacks() introduced in commit aa8e54b559
("PM / sleep: Go direct_complete if driver has no callbacks").
SA1111 was broken due to changes to PCMCIA which makes PCMCIA
suspend itself later than the SA1111 code expects, and resume
before the SA1111 code has initialised access to the pcmcia
sub-device.
- the default SA1111 interrupt mask polarity got messed up when it
was converted to use a dynamic interrupt base number for its
interrupts.
- fix platform_get_irq() error code propagation, which was causing
problems on platforms where the interrupt may not be available at
probe time in DT setups.
- fix the lack of clock to PCMCIA code on PXA platforms, which was
omitted in conversions of PXA to CCF.
- fix an oops in the PXA PCMCIA code caused by a previous commit not
realising that Lubbock is different from the rest of the PXA PCMCIA
drivers.
- ensure that SA1111 low-level PCMCIA drivers propagate their error
codes to the main probe function, rather than the driver silently
accepting a failure.
- fix the sa11xx debugfs reporting of timing information, which
always indicated zero due to the clock being a factor of 1000 out.
- fix the polarity of the status change signal reported from the
sockets.
Lastly, one ARM specific commit from Stefan Agner fixing the LPAE
cache attributes"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: pxa/lubbock: add pcmcia clock
ARM: locomo: fix locomo irq handling
ARM: 8612/1: LPAE: initialize cache policy correctly
ARM: sa1111: fix missing clk_disable()
ARM: sa1111: fix pcmcia suspend/resume
ARM: sa1111: fix pcmcia interrupt mask polarity
ARM: sa1111: fix error code propagation in sa1111_probe()
pcmcia: lubbock: fix sockets configuration
pcmcia: sa1111: fix propagation of lowlevel board init return code
pcmcia: soc_common: fix SS_STSCHG polarity
pcmcia: sa11xx_base: add units to the timing information
pcmcia: sa11xx_base: fix reporting of timing information
pcmcia: ds: fix suspend/resume
Move the PMU name into a common header file so it may
be referenced by other users.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
ARMv8 machines can identify the micro/arch defined counters
that are available on a machine. Add all these counters to the
default armv8 perf map. At run-time disable the counters which
are not available on the given PMU.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for ACPI support, add a pmu_probe_info table to
the arm_pmu_device_probe() call. This table gets used when
probing in the absence of a devicetree node for PMU.
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This commit exports the following information to
user-space via the newly created per-vcpu debugfs
directory:
- TSC offset (as a signed number)
- TSC scaling ratio
- TSC scaling ratio fractinal bits
The original intention of this commit was to
export only the TSC offset, but the TSC scaling
information is exported for completeness.
We need to retrieve the TSC offset from user-space
in order to support the merging of host and guest
traces in trace-cmd. Today, we use the kvm_write_tsc_offset
tracepoint, but it has a number of problems (mainly,
it requires a running VM to be rebooted, ftrace setup,
and also tracepoints are not supposed to be ABIs).
The merging of host and guest traces is explained
in more detail in this thread:
[Qemu-devel] [RFC] host and guest kernel trace merging
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg00887.html
This commit creates the following files in debugfs:
/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-offset
/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-scaling-ratio
/sys/kernel/debug/kvm/66828-10/vcpu0/tsc-scaling-ratio-frac-bits
The last two are only created if TSC scaling is supported.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Two stubs are added:
o kvm_arch_has_vcpu_debugfs(): must return true if the arch
supports creating debugfs entries in the vcpu debugfs dir
(which will be implemented by the next commit)
o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
entries in the vcpu debugfs dir
For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The TSC offset can now be read directly from struct kvm_arch_vcpu.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A future commit will want to easily read a vCPU's TSC offset,
so we store it in struct kvm_arch_vcpu_arch for easy access.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
broke Suspend to RAM on Exynos boards due to lack of probing of
PMU (Power Management Unit) driver. Multiple drivers attach to
the PMU's DT node: irqchip, clock controller and PMU platform
driver for handling suspend. The new irqchip code marked the
PMU's DT node as OF_POPULATED but we need to attach to this
node also PMU platform driver.
2. Add Javier as additional reviewer for Exynos patches.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX25oaAAoJEME3ZuaGi4PXDa4P/0KG/BAzf++1/43zP9I3PAhM
od3QOLDrnR16ly2AL8QbJWq574ZvNa8ssZ5m3QrWgt4JEGVudc97OpByupTrgm51
xlfceTcsyG95OV1Psm20NJrIK2o1FEDIr4Bdh0/ixjOs/zajLyzUtx0qQ2rHa8mQ
hzJeJGbk4yu50jsaUDWBH2p9n2EHuCqWfvRroE/fEbGL6G120OkFpiP3AExEycnD
U3gUMm+QorEWavf9cMtAzJcnFNKDjaT4iStgTAeCrvcJ4ttwX4x1z5PJV9M/zbs4
XtGoE7mGfGdajhXXnFPPinUB+1MffJdttI6jHEuTWBc/Znfdgr796HnAUYp2bcPp
WQT/OcnFN0WUeYY4dONBI5/mIuHcj1p5esOttjEvIJNOVdoHevnl5KUhaAajy9DV
0alKU9DQ49Z5j+e2lxTJ749Lj1+FfnX63EDQik95hrDZc8QKbbIe3F4/OuR6xm4L
mPUaMSP2OazSpMVNiFoWL/qyB5ukSP0fyLR9cRggUbBUoq8A0WWT0QRMYaaATnQS
fzEeEoerHIxuqU2LKUBJJ7m3VUl0VbWt7VBU2jI73RQ13IzeH92eMh6ssMjO3k0d
R2OAU6R8H6oaCb9qsYajzRjMxm4zEalkUqNTXB4GMM7tsGckqibdgRqF/4fI/4D4
TtUkGN+FCEJOLzAgAntK
=Ljgr
-----END PGP SIGNATURE-----
Merge tag 'samsung-fixes-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into fixes
Pull "ARM: exynos: Fixes for v4.8, secound round" from Krzysztof Kozłowski:
1. A recent change in populating irqchip devices from Device Tree
broke Suspend to RAM on Exynos boards due to lack of probing of
PMU (Power Management Unit) driver. Multiple drivers attach to
the PMU's DT node: irqchip, clock controller and PMU platform
driver for handling suspend. The new irqchip code marked the
PMU's DT node as OF_POPULATED but we need to attach to this
node also PMU platform driver.
2. Add Javier as additional reviewer for Exynos patches.
* tag 'samsung-fixes-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: EXYNOS: Clear OF_POPULATED flag from PMU node in IRQ init callback
MAINTAINERS: Add myself as reviewer for Samsung Exynos support
show_stack_log_lvl() and friends allow a NULL pointer for the
task_struct to indicate the current task. This creates confusion and
can cause sneaky bugs.
Instead require the caller to pass 'current' directly.
This only changes the internal workings of the dumpstack code. The
dump_trace() and show_stack() interfaces still allow a NULL task
pointer. Those interfaces should also probably be fixed as well.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.
This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,
$ perf stat -e cache-references,cache-misses
measure completely different things.
Instead, make the AMD PMU measure instruction/data cache and TLB fill
requests to the L2 and instruction/data cache and TLB misses in the L2
when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
respectively. That way the events measure unified caches on both
platforms.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1472044328-21302-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Right now, the kernel address filters in PT are prone to integer overflow
that may happen in adding filter's size to its offset to obtain the end
of the range. Such an overflow would also throw a #GP in the PT event
configuration path.
Fix this by explicitly validating the result of this calculation.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The kernel_ip() filter is used mostly by the DS/LBR code to look at the
branch addresses, but Intel PT also uses it to validate the address
filter offsets for kernel addresses, for which it is not sufficient:
supplying something in bits 64:48 that's not a sign extension of the lower
address bits (like 0xf00d000000000000) throws a #GP.
This patch adds address validation for the user supplied kernel filters.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
PT address filter configuration requires that a range is specified by
its first and last address, but at the moment we're obtaining the end
of the range by adding user specified size to its start, which is off
by one from what it actually needs to be.
Fix this and make sure that zero-sized filters don't pass the filter
validation.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.7
Cc: stable@vger.kernel.org#v4.7
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915151352.21306-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This will prevent a crash if get_wchan() runs after the task stack
is freed.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/337aeca8614024aa4d8d9c81053bbf8fcffbe4ad.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Specifically, pin the stack in save_stack_trace_tsk() and
show_trace_log_lvl().
This will prevent a crash if the target task dies before or while
dumping its stack once we start freeing task stacks early.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cf0082cde65d1941a996d026f2b2cdbfaca17bfa.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When I rebased my thread_info changes onto Brian's switch_to()
changes, I carefully checked that I fixed up all the code correctly,
but I missed a comment :(
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 15f4eae70d ("x86: Move thread_info into task_struct")
Link: http://lkml.kernel.org/r/089fe1e1cbe8b258b064fccbb1a5a5fd23861031.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Windows because it uses RTC periodic interrupts.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJX2xpHAAoJEL/70l94x66D9NsIAIw+9oRA86qjehVnguV3fRKA
ITZ4OGFDiXPWuxqDaw8mHHXr0RYx8KcMTzfFNbV+YL5U0cq9xYzdaNhchKPpyF+3
7H5wL8Ku9wkYZ930kdCf5Q+LNCfg8d/wKlibPEbX0MDx4jL99kkcxLzEkmIRqFlq
bpXaQe/KR1xCWR6gI/a6aRJWLfGuFMV82YSnk/dCSjwotbAwjJUSt+IPhLwhx28o
7ddcxW3CxQqelJorcu2lvRiGnCvEzDhIdOvHJqibCjo3uzqbcI4PA2gs3rozbs9s
VMEzqZpNgK0XsyKyccSw75npViIHYPkjMzxoyHMDhgiP3eIwp/tJquxAfLjK4WE=
=h4P4
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fix from Paolo Bonzini:
"One fix for an x86 regression in VM migration, mostly visible with
Windows because it uses RTC periodic interrupts"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: correctly reset dest_map->vector when restoring LAPIC state
get_user_ex(x, ptr) should zero x on failure. It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ This sat in different branch from the uaccess fixes since mid-August ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When userspace sends KVM_SET_LAPIC, KVM schedules a check between
the vCPU's IRR and ISR and the IOAPIC redirection table, in order
to re-establish the IOAPIC's dest_map (the list of CPUs servicing
the real-time clock interrupt with the corresponding vectors).
However, __rtc_irq_eoi_tracking_restore_one was forgetting to
set dest_map->vectors. Because of this, the IOAPIC did not process
the real-time clock interrupt EOI, ioapic->rtc_status.pending_eoi
got stuck at a non-zero value, and further RTC interrupts were
reported to userspace as coalesced.
Fixes: 9e4aabe2bb
Fixes: 4d99ba898d
Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: David Gilbert <dgilbert@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Simply enabling CONFIG_KEYSTONE_USB_PHY doesn't work anymore
as it depends on CONFIG_NOP_USB_XCEIV. We need to enable
that as well.
This fixes USB on Keystone boards from v4.8-rc1 onwards.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
At the moment, intel_bts events get disabled from intel PMU's disable
callback, which includes event scheduling transactions of said PMU,
which have nothing to do with intel_bts events.
We do want to keep intel_bts events off inside the PMI handler to
avoid filling up their buffer too soon.
This patch moves intel_bts enabling/disabling directly to the PMI
handler.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160915082233.11065-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
... otherwise the compiler complains:
arch/x86/entry/vdso/vma.c:252:12: warning: ‘map_vdso_randomized’ defined but not used [-Wunused-function]
But the #ifdeffery here is getting pretty ugly, so move around
vdso_addr() as well to cluster the dependencies a bit more.
It's still not particulary pretty though ...
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: gorcunov@openvz.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: oleg@redhat.com
Cc: xemul@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kprobes searches backwards a finite number of instructions to determine if
there is an attempt to probe a load/store exclusive sequence. It stops when
it hits the maximum number of instructions or a load or store exclusive.
However this means it can run up past the beginning of the function and
start looking at literal constants. This has been shown to cause a false
positive and blocks insertion of the probe. To fix this, further limit the
backwards search to stop if it hits a symbol address from kallsyms. The
presumption is that this is the entry point to this code (particularly for
the common case of placing probes at the beginning of functions).
This also improves efficiency by not searching code that is not part of the
function. There may be some possibility that the label might not denote the
entry path to the probed instruction but the likelihood seems low and this
is just another example of how the kprobes user really needs to be
careful about what they are doing.
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In commit f0228c4130 ("powerpc/powernv/pci: Fallback to OPAL for TCE
invalidations"), we added logic to fallback to OPAL for doing TCE
invalidations if we can't do it in Linux.
Ben sent a v2 of the patch, containing these additional call sites, but
I had already applied v1 and didn't notice. So fix them now.
Fixes: f0228c4130 ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that most of the thread_info users have been cleaned up,
this is straightforward.
Most of this code was written by Linus.
Originally-from: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a50eab40abeaec9cb9a9e3cbdeafd32190206654.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
thread_info may move in the future, so use the accessors.
[ Andy Lutomirski wrote this changelog message and changed
"task_thread_info(child)->cpu" to "task_cpu(child)". ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3439705d9838940cc82733a7335fa8c654c37db8.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It was a nice optimization while it lasted, but thread_info is moving
and this optimization will no longer work.
Quoting Linus:
Oh Gods, Andy. That pt_regs_to_thread_info() thing made me want
to do unspeakable acts on a poor innocent wax figure that looked
_exactly_ like you.
[ Changelog written by Andy. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6376aa81c68798cc81631673f52bd91a3e078944.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because sched.h and thread_info.h are a tangled mess, I turned
in_compat_syscall() into a macro. If we had current_thread_struct()
or similar and we could use it from thread_info.h, then this would
be a bit cleaner.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ccc8a1b2f41f9c264a41f771bb4a6539a642ad72.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
in_exception_stack() has some recursion checking which makes sure the
stack trace code never traverses a given exception stack more than once.
This prevents an infinite loop if corruption somehow causes a stack's
"next stack" pointer to point to itself (directly or indirectly).
The recursion checking can be useful for other stacks in addition to the
exception stack, so extend it to work for all stacks.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/95de5db4cfe111754845a5cef04e20630d01423f.1473905218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When an interrupt happens in entry code while running on a software IRQ
stack, and the IRQ stack was empty, regs->sp will contain the stack end
address (e.g., irq_stack_ptr). If the regs are passed to dump_trace(),
get_stack_info() will report STACK_TYPE_UNKNOWN, causing dump_trace() to
return prematurely without trying to go to the next stack.
Update the bounds checking for software interrupt stacks so that the
ending address is now considered part of the stack.
This means that it's now possible for the 'walk_stack' callbacks --
print_context_stack() and print_context_stack_bp() -- to be called with
an empty stack. But that's fine; they're already prepared to deal with
that due to their on_stack() checks.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5a5e5de92dcf11e8dc6b6e8e50ad7639d067830b.1473905218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
valid_stack_ptr() is buggy: it assumes that all stacks are of size
THREAD_SIZE, which is not true for exception stacks. So the
walk_stack() callbacks will need to know the location of the beginning
of the stack as well as the end.
Another issue is that in general the various features of a stack (type,
size, next stack pointer, description string) are scattered around in
various places throughout the stack dump code.
Encapsulate all that information in a single place with a new stack_info
struct and a get_stack_info() interface.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8164dd0db96b7e6a279fa17ae5e6dc375eecb4a9.1473905218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
in_exception_stack() does some bad, bad things just so the unwinder can
print different values for different areas of the debug exception stack.
There's no need to clarify where exactly on the stack it is. Just print
"#DB" and be done with it.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e91cb410169dd576678dd427c35efb716fd0cee1.1473905218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The PCI hotplug can be part of EEH error recovery. The @pdn and
the device's PE number aren't removed and added afterwords. The
PE number in @pdn should be set to an invalid one. Otherwise, the
PE's device count is decreased on removing devices while failing
to be increased on adding devices. It leads to unbalanced PE's
device count and make normal PCI hotplug path broken.
Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Make the clocks visible options that can be selected by anyone. This
avoids the problems of:
1) Select is a reverse dependency and is hard for people to understand
and can sometimes be a pain to track down
2) Build coverage goes down because configs are hidden
3) Code bloat
Patch suggested by Stephen Boyd
Signed-off-by: Jon Mason <jonmason@broadcom.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Enumeration
Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)
Power management
Fix bridge_d3 update on device removal (Lukas Wunner)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX2az0AAoJEFmIoMA60/r8/EoP/28mGRiKi8mlqNAR3MYN3F0n
VSIm7WyxWNawH1gRJXKQBzNqgMJnj4qRGXSIvP3AIYyBDcJs/X7j91/eKOARNfQr
55A+gfSz4jUKlw+0WgPY8/U2/xQ4yoom1zhbsAYcIVeljZo/3JUg+wHpPjhIMkH0
2slTerHRDExrS43jxQi225toiEaO6lcY8EVmHCDo+jYlQz3sCEwIXg9hn1rwTbvG
sJI0zyUwHF+oWowgJqlwYxsbPPnelPAN5YAx7KrHuVmBdL0Bgo3oIRtbb3JZZ9Up
L9bQ6NpRjSARvijaZ2TAhueqIIDv2HGgvwNB01l4Yggw7Sm1dFCuUS6vj/e5tpZA
xntE3F6s2Z+I4I1D7pAX3jMYCdYx/QltiTCeGRp8pJv+f4ewW3jcel3FAksY3BEg
0NCjDrGFqGYai4hGRROpt/aXlW/Pn53eQLlu4Xg2qgkj0NMh0ODMrTjMnABB39ae
eGqIXab7WeVBxt10eU19J1u1RTqpUO2LJW+cMnvYdCfKAYby/gj8SD8vIsn3oDjZ
hQS/4fSHurc7LZwDmwOfaiHlGnvcQWV9EKwgScS0v8AxPQnC8pNUgYpzZcXY8Q6I
YXtyK7suFriSZPS0Qs4FrdfJrmTBaBQ55aZu9aftb5v4YqacN5qZo+HaXiONBy3v
RzsFK6xIbnIgb35g8vKy
=PQml
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Here are two changes for v4.8. The first fixes a "[Firmware Bug]: reg
0x10: invalid BAR (can't size)" warning on Haswell, and the second
fixes a problem in some new runtime suspend functionality we merged
for v4.8. Summary:
Enumeration:
Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)
Power management:
Fix bridge_d3 update on device removal (Lukas Wunner)"
* tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Fix bridge_d3 update on device removal
PCI: Mark Haswell Power Control Unit as having non-compliant BARs
The ARM architected timer specification mandates that the interrupt
associated with each timer is level triggered (which corresponds to
the "counter >= comparator" condition).
A number of DTs are being remarkably creative, declaring the interrupt
to be edge triggered. A quick look at the TRM for the corresponding ARM
CPUs clearly shows that this is wrong, and I've corrected those.
For non-ARM designs (and in the absence of a publicly available TRM),
I've made them active low as well, which can't be completely wrong
as the GIC cannot disinguish between level low and level high.
The respective maintainers are of course welcome to prove me wrong.
While I was at it, I took the liberty to fix a couple of related issue,
such as some spurious affinity bits on ThunderX, and their complete
absence on ls1043a (both of which seem to be related to copy-pasting
from other DTs).
Acked-by: Duc Dang <dhdang@apm.com>
Acked-by: Carlo Caione <carlo@endlessm.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Introduce new flags that defines which ABI to use on creating sigframe.
Those flags kernel will set according to sigaction syscall ABI,
which set handler for the signal being delivered.
So that will drop the dependency on TIF_IA32/TIF_X32 flags on signal deliver.
Those flags will be used only under CONFIG_COMPAT.
Similar way ARM uses sa_flags to differ in which mode deliver signal
for 26-bit applications (look at SA_THIRYTWO).
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: oleg@redhat.com
Cc: linux-mm@kvack.org
Cc: gorcunov@openvz.org
Cc: xemul@virtuozzo.com
Link: http://lkml.kernel.org/r/20160905133308.28234-7-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
As the task isn't executing at the moment of {GET,SET}REGS,
return regset that corresponds to code selector, rather than
value of TIF_IA32 flag.
I.e. if we ptrace i386 elf binary that has just changed it's
code selector to __USER_CS, than GET_REGS will return
full x86_64 register set.
Note, that this will work only if application has changed it's CS.
If the application does 32-bit syscall with __USER_CS, ptrace
will still return 64-bit register set. Which might be still confusing
for tools that expect TS_COMPACT to be exposed [1, 2].
So this this change should make PTRACE_GETREGSET more reliable and
this will be another step to drop TIF_{IA32,X32} flags.
[1]: https://sourceforge.net/p/strace/mailman/message/30471411/
[2]: https://lkml.org/lkml/2012/1/18/320
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: 0x7f454c46@gmail.com
Cc: oleg@redhat.com
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: Pedro Alves <palves@redhat.com>
Cc: gorcunov@openvz.org
Cc: xemul@virtuozzo.com
Link: http://lkml.kernel.org/r/20160905133308.28234-6-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull uaccess fixes from Al Viro:
"Fixes for broken uaccess primitives - mostly lack of proper zeroing
in copy_from_user()/get_user()/__get_user(), but for several
architectures there's more (broken clear_user() on frv and
strncpy_from_user() on hexagon)"
* 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
avr32: fix copy_from_user()
microblaze: fix __get_user()
microblaze: fix copy_from_user()
m32r: fix __get_user()
blackfin: fix copy_from_user()
sparc32: fix copy_from_user()
sh: fix copy_from_user()
sh64: failing __get_user() should zero
score: fix copy_from_user() and friends
score: fix __get_user/get_user
s390: get_user() should zero on failure
ppc32: fix copy_from_user()
parisc: fix copy_from_user()
openrisc: fix copy_from_user()
nios2: fix __get_user()
nios2: copy_from_user() should zero the tail of destination
mn10300: copy_from_user() should zero on access_ok() failure...
mn10300: failing __get_user() and get_user() should zero
mips: copy_from_user() must zero the destination on access_ok() failure
ARC: uaccess: get_user to zero out dest in cause of fault
...
show_stack_log_lvl() and dump_trace() are already preemption safe:
- If they're running in irq or exception context, preemption is already
disabled and the percpu stack pointers can be trusted.
- If they're running with preemption enabled, they must be running on
the task stack anyway, so it doesn't matter if they're comparing the
stack pointer against a percpu stack pointer from this CPU or another
one: either way it won't match.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0ca0b1044eca97d4f0ec7c1619cf80b3b65560d.1473371307.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>