When CPUs are stopped during an abnormal operation like panic
for each CPU a line is printed and the stack trace is dumped.
This information is only interesting for the aborting CPU
and on systems with many CPUs it only makes it harder to
debug if after the aborting CPU the log is flooded with data
about all other CPUs too.
Therefore remove the stack dump and printk of other CPUs
and only print a single line that the other CPUs are going to be
stopped and, in case any CPUs remain online list them.
Signed-off-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We already re-enable interrupts where necessary in the entry code, so
there is no need to do it again in do_page fault. This patch removes
the redundant code.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When hardware updates of the access and dirty states are enabled, the
default ptep_set_access_flags() implementation based on calling
set_pte_at() directly is potentially racy. This triggers the "racy dirty
state clearing" warning in set_pte_at() because an existing writable PTE
is overridden with a clean entry.
There are two main scenarios for this situation:
1. The CPU getting an access fault does not support hardware updates of
the access/dirty flags. However, a different agent in the system
(e.g. SMMU) can do this, therefore overriding a writable entry with a
clean one could potentially lose the automatically updated dirty
status
2. A more complex situation is possible when all CPUs support hardware
AF/DBM:
a) Initial state: shareable + writable vma and pte_none(pte)
b) Read fault taken by two threads of the same process on different
CPUs
c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
eventually reaches do_set_pte() which sets a writable + clean pte.
CPU0 releases the mmap_sem
d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
pte entry it reads is present, writable and clean and it continues
to pte_mkyoung()
e) CPU1 calls ptep_set_access_flags()
If between (d) and (e) the hardware (another CPU) updates the dirty
state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
marking the entry clean again.
This patch implements an arm64-specific ptep_set_access_flags() function
to perform an atomic update of the PTE flags.
Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.3+
[will: reworded comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Enable NUMA balancing for arm64 platforms.
Add pte, pmd protnone helpers for use by automatic NUMA balancing.
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Attempt to get the memory and CPU NUMA node via of_numa. If that
fails, default the dummy NUMA node and map all memory and CPUs to node
0.
Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In order to extract NUMA information from the device tree, we need to
have the tree in its unflattened form.
Move the call to bootmem_init() in the tail of paging_init() into
setup_arch, and adjust header files so that its declaration is
visible.
Move the unflatten_device_tree() call between the calls to
paging_init() and bootmem_init(). Follow on patches add NUMA handling
to bootmem_init().
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With a VHE capable CPU, kernel can run at EL2 and is a decided at early
boot. If some of the CPUs didn't start it EL2 or doesn't have VHE, we
could have CPUs running at different exception levels, all in the same
kernel! This patch adds an early check for the secondary CPUs to detect
such situations.
For each non-boot CPU add a sanity check to make sure we don't have
different run levels w.r.t the boot CPU. We save the information on
whether the boot CPU is running in hyp mode or not and ensure the
remaining CPUs match it.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[will: made boot_cpu_hyp_mode static]
Signed-off-by: Will Deacon <will.deacon@arm.com>
During the activation of a secondary CPU, we could report serious
configuration issues and hence request to crash the kernel. We do
this for CPU ASID bit check now. We will need it also for handling
mismatched exception levels for the CPUs with VHE. Hence, add a
helper to do the same for reusability.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCKDEP and CONFIG_TRACE_IRQFLAGS
enabled, lockdep will compare current->hardirqs_enabled with the flags from
local_irq_save().
When a debug exception occurs, interrupts are disabled in entry.S, but
lockdep isn't told, resulting in:
DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
------------[ cut here ]------------
WARNING: at ../kernel/locking/lockdep.c:3523
Modules linked in:
CPU: 3 PID: 1752 Comm: perf Not tainted 4.5.0-rc4+ #2204
Hardware name: ARM Juno development board (r1) (DT)
task: ffffffc974868000 ti: ffffffc975f40000 task.ti: ffffffc975f40000
PC is at check_flags.part.35+0x17c/0x184
LR is at check_flags.part.35+0x17c/0x184
pc : [<ffffff80080fc93c>] lr : [<ffffff80080fc93c>] pstate: 600003c5
[...]
---[ end trace 74631f9305ef5020 ]---
Call trace:
[<ffffff80080fc93c>] check_flags.part.35+0x17c/0x184
[<ffffff80080ffe30>] lock_acquire+0xa8/0xc4
[<ffffff8008093038>] breakpoint_handler+0x118/0x288
[<ffffff8008082434>] do_debug_exception+0x3c/0xa8
[<ffffff80080854b4>] el1_dbg+0x18/0x6c
[<ffffff80081e82f4>] do_filp_open+0x64/0xdc
[<ffffff80081d6e60>] do_sys_open+0x140/0x204
[<ffffff80081d6f58>] SyS_openat+0x10/0x18
[<ffffff8008085d30>] el0_svc_naked+0x24/0x28
possible reason: unannotated irqs-off.
irq event stamp: 65857
hardirqs last enabled at (65857): [<ffffff80081fb1c0>] lookup_mnt+0xf4/0x1b4
hardirqs last disabled at (65856): [<ffffff80081fb188>] lookup_mnt+0xbc/0x1b4
softirqs last enabled at (65790): [<ffffff80080bdca4>] __do_softirq+0x1f8/0x290
softirqs last disabled at (65757): [<ffffff80080be038>] irq_exit+0x9c/0xd0
This patch adds the annotations to do_debug_exception(), while trying not
to call trace_hardirqs_off() if el1_dbg() interrupted a task that already
had irqs disabled.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit 1cf4f629d9 ("cpu/hotplug: Move online calls to
hotplugged cpu") it is ensured that callbacks of CPU_ONLINE and
CPU_DOWN_PREPARE are processed on the hotplugged CPU. Due to this SMP
function calls are no longer required.
Replace smp_call_function_single() with a direct call of
hw_breakpoint_reset(). To keep the calling convention, interrupts are
explicitly disabled around the call.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit 1cf4f629d9 ("cpu/hotplug: Move online calls to
hotplugged cpu") it is ensured that callbacks of CPU_ONLINE and
CPU_DOWN_PREPARE are processed on the hotplugged CPU. Due to this SMP
function calls are no longer required.
Replace smp_call_function_single() with a direct call to
clear_os_lock(). The function writes the OSLAR register to clear OS
locking. This does not require to be called with interrupts disabled,
therefore the smp_call_function_single() calling convention is not
preserved.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The mapping of the kernel consist of four segments, each of which is mapped
with different permission attributes and/or lifetimes. To optimize the TLB
and translation table footprint, we define various opaque constants in the
linker script that resolve to different aligment values depending on the
page size and whether CONFIG_DEBUG_ALIGN_RODATA is set.
Considering that
- a 4 KB granule kernel benefits from a 64 KB segment alignment (due to
the fact that it allows the use of the contiguous bit),
- the minimum alignment of the .data segment is THREAD_SIZE already, not
PAGE_SIZE (i.e., we already have padding between _data and the start of
the .data payload in many cases),
- 2 MB is a suitable alignment value on all granule sizes, either for
mapping directly (level 2 on 4 KB), or via the contiguous bit (level 3 on
16 KB and 64 KB),
- anything beyond 2 MB exceeds the minimum alignment mandated by the boot
protocol, and can only be mapped efficiently if the physical alignment
happens to be the same,
we can simplify this by standardizing on 64 KB (or 2 MB) explicitly, i.e.,
regardless of granule size, all segments are aligned either to 64 KB, or to
2 MB if CONFIG_DEBUG_ALIGN_RODATA=y. This also means we can drop the Kconfig
dependency of CONFIG_DEBUG_ALIGN_RODATA on CONFIG_ARM64_4K_PAGES.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Keeping .head.text out of the .text mapping buys us very little: its actual
payload is only 4 KB, most of which is padding, but the page alignment may
add up to 2 MB (in case of CONFIG_DEBUG_ALIGN_RODATA=y) of additional
padding to the uncompressed kernel Image.
Also, on 4 KB granule kernels, the 4 KB misalignment of .text forces us to
map the adjacent 56 KB of code without the PTE_CONT attribute, and since
this region contains things like the vector table and the GIC interrupt
handling entry point, this region is likely to benefit from the reduced TLB
pressure that results from PTE_CONT mappings.
So remove the alignment between the .head.text and .text sections, and use
the [_text, _etext) rather than the [_stext, _etext) interval for mapping
the .text segment.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Apart from the arm64/linux and EFI header data structures, there is nothing
in the .head.text section that must reside at the beginning of the Image.
So let's move it to the .init section where it belongs.
Note that this involves some minor tweaking of the EFI header, primarily
because the address of 'stext' no longer coincides with the start of the
.text section. It also requires a couple of relocated symbol references
to be slightly rewritten or their definition moved to the linker script.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Replace the poorly defined term chunk with segment, which is a term that is
already used by the ELF spec to describe contiguous mappings with the same
permission attributes of statically allocated ranges of an executable.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the vmemmap region has been redefined to cover the linear region
rather than the entire physical address space, we no longer need to
perform a virtual-to-physical translation in the implementaion of
virt_to_page(). This restricts virt_to_page() translations to the linear
region, so redefine virt_addr_valid() as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This moves the vmemmap region right below PAGE_OFFSET, aka the start
of the linear region, and redefines its size to be a power of two.
Due to the placement of PAGE_OFFSET in the middle of the address space,
whose size is a power of two as well, this guarantees that virt to
page conversions and vice versa can be implemented efficiently, by
masking and shifting rather than ordinary arithmetic.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Before restricting virt_to_page() to the linear mapping, ensure that
the text patching code does not use it to resolve references into the
core kernel text, which is mapped in the vmalloc area.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The zero page is statically allocated, so grab its struct page pointer
without using virt_to_page(), which will be restricted to the linear
mapping later.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The implementation of free_initmem_default() expects __init_begin
and __init_end to be covered by the linear mapping, which is no
longer the case. So open code it instead, using addresses that are
explicitly translated from kernel virtual to linear virtual.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The translation performed by virt_to_page() is only valid for linear
addresses, and kernel symbols are no longer in the linear mapping.
So perform the __pa() translation explicitly, which does the right
thing in either case, and only then translate to a struct page offset.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This removes the relocate_initrd() implementation and invocation, which are
no longer needed now that the placement of the initrd is guaranteed to be
covered by the linear mapping.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Instead of going out of our way to relocate the initrd if it turns out
to occupy memory that is not covered by the linear mapping, just add the
initrd to the linear mapping. This puts the burden on the bootloader to
pass initrd= and mem= options that are mutually consistent.
Note that, since the placement of the linear region in the PA space is
also dependent on the placement of the kernel Image, which may reside
anywhere in memory, we may still end up with a situation where the initrd
and the kernel Image are simply too far apart to be covered by the linear
region.
Since we now leave it up to the bootloader to pass the initrd in memory
that is guaranteed to be accessible by the kernel, add a mention of this to
the arm64 boot protocol specification as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This reverts commit 36e5cd6b89, since the
section alignment is now guaranteed by construction when choosing the
value of memstart_addr.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This redefines ARM64_MEMSTART_ALIGN in terms of the minimal alignment
required by sparsemem vmemmap. This comes down to using 1 GB for all
translation granules if CONFIG_SPARSEMEM_VMEMMAP is enabled.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
After choosing memstart_addr to be the highest multiple of
ARM64_MEMSTART_ALIGN less than or equal to the first usable physical memory
address, we clip the memblocks to the maximum size of the linear region.
Since the kernel may be high up in memory, we take care not to clip the
kernel itself, which means we have to clip some memory from the bottom if
this occurs, to ensure that the distance between the first and the last
usable physical memory address can be covered by the linear region.
However, we fail to update memstart_addr if this clipping from the bottom
occurs, which means that we may still end up with virtual addresses that
wrap into the userland range. So increment memstart_addr as appropriate to
prevent this from happening.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, we check two pointers: cpu_ops and cpu_suspend on every idle
state entry. These pointers check can be avoided:
If cpu_ops has not been registered, arm_cpuidle_init() will return
-EOPNOTSUPP, so arm_cpuidle_suspend() will never have chance to
run. In other word, the cpu_ops check can be avoid.
Similarly, the cpu_suspend check could be avoided in this hot path by
moving it into arm_cpuidle_init().
I measured the 4096 * time from arm_cpuidle_suspend entry point to the
cpu_psci_cpu_suspend entry point. HW platform is Marvell BG4CT STB
board.
1. only one shell, no other process, hot-unplug secondary cpus, execute
the following cmd
while true
do
sleep 0.2
done
before the patch: 1581220ns
after the patch: 1579630ns
reduced by 0.1%
2. only one shell, no other process, hot-unplug secondary cpus, execute
the following cmd
while true
do
md5sum /tmp/testfile
sleep 0.2
done
NOTE: the testfile size should be larger than L1+L2 cache size
before the patch: 1961960ns
after the patch: 1912500ns
reduced by 2.5%
So the more complex the system load, the bigger the improvement.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There are some new cpu features which can be identified by id_aa64mmfr2,
this patch appends all fields of it.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull ARM fixes from Russell King:
"A couple of small fixes, and wiring up the new syscalls which appeared
during the merge window"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8550/1: protect idiv patching against undefined gcc behavior
ARM: wire up preadv2 and pwritev2 syscalls
ARM: SMP enable of cache maintanence broadcast
Pull parisc fixes from Helge Deller:
"Since commit 0de798584b ("parisc: Use generic extable search and
sort routines") module loading is boken on parisc, because the parisc
module loader wasn't prepared for the new R_PARISC_PCREL32 relocations.
In addition, due to that breakage, Mikulas Patocka noticed that
handling exceptions from modules probably never worked on parisc. It
was just masked by the fact that exceptions from modules don't happen
during normal use.
This patch series fixes those issues and survives the tests of the
lib/test_user_copy kernel module test. Some patches are tagged for
stable"
* 'parisc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Update comment regarding relative extable support
parisc: Unbreak handling exceptions from kernel modules
parisc: Fix kernel crash with reversed copy_from_user()
parisc: Avoid function pointers for kernel exception routines
parisc: Handle R_PARISC_PCREL32 relocations in kernel modules
- intel_pstate fixes for two issues exposed by the recent switch
over from using timers and for one issue introduced during the
4.4 cycle plus new comments describing data structures used by
the driver (Rafael Wysocki, Srinivas Pandruvada).
- intel_idle fixes related to CPU offline/online (Richard Cochran).
- intel_idle support (new CPU IDs and state definitions mostly) for
Skylake-X and Kabylake processors (Len Brown).
- PCC mailbox driver fix for an out-of-bounds memory access that
may cause the kernel to panic() (Shanker Donthineni).
- New (missing) CPU ID for one apparently overlooked Haswell model
in the Intel RAPL power capping driver (Srinivas Pandruvada).
- Fix for the PM core's wakeup IRQs framework to make it work after
wakeup settings reconfiguration from sysfs (Grygorii Strashko).
- Runtime PM documentation update to make it describe what needs
to be done during device removal more precisely (Krzysztof
Kozlowski).
- Stale comment removal cleanup in the cpufreq-dt driver (Viresh
Kumar).
- turbostat utility fixes and support for Broxton, Skylake-X
and Kabylake processors (Len Brown).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXCBaNAAoJEILEb/54YlRxDrkP/jdCfB3wOcRGp6LN6GHksC3u
k2yUQl+XJZhggLda+yUK2ovV5tCJwGmP1N1B/MacRr/5MeicGXBvt6ZrHGqOfTKo
9xAGKOt2xsoAbHkleB733GIRD9TYZpqCTx9n0dka7kRArQA/uhY+TlXYY2Kn+E0B
1c4lEN3d9j4G2qW55YxnfRUOhwUc03syokOsS2z9ChEAVPSrn7Zv8V+8rPwxIz5w
uUWphlnEQgaRwzwxACHwE0bht8sl1cJ1UUSVbzf30mzRlGHiteYyI/vuE2hF9SiV
DEv9mhSB+4U6WuBHr4Cwu+nf9YTlaY1ZaS3r5EDzJaNJb8tMqPZcGitfn+fTGtUz
9GlCQZIBcP1vWBDybuuPOzAH++QUFrikozuNfUu1d+WFEypRyadMIvUhtmZPG9mh
9+Vem+ta32eXos07dx4Dvth+yNvmG5bxPnteDp3GPsnCCDlXutcKaaJaaj+1NzHi
oqHyEynMhJuN/D2h+oIVIUppKBVO55M+lJMJrX891zICs/98K2LwOtFDuNRWQml0
3yR7Ieoj5gxVfbT6zNeH5z7CxP/MymNAMjbxfiwqtGHhkNoAv92ymMxnptPiLCHn
Zyycelsve3sneV+iyk1fAOZFvDXbzTUyigxGP7Kc87iia/Yd2TjLlvwJkMNK6x/K
2OSkjJFqqrqAqcXNIz3G
=f4MM
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Fixes for some issues discovered after recent changes and for some
that have just been found lately regardless of those changes
(intel_pstate, intel_idle, PM core, mailbox/pcc, turbostat) plus
support for some new CPU models (intel_idle, Intel RAPL driver,
turbostat) and documentation updates (intel_pstate, PM core).
Specifics:
- intel_pstate fixes for two issues exposed by the recent switch over
from using timers and for one issue introduced during the 4.4 cycle
plus new comments describing data structures used by the driver
(Rafael Wysocki, Srinivas Pandruvada).
- intel_idle fixes related to CPU offline/online (Richard Cochran).
- intel_idle support (new CPU IDs and state definitions mostly) for
Skylake-X and Kabylake processors (Len Brown).
- PCC mailbox driver fix for an out-of-bounds memory access that may
cause the kernel to panic() (Shanker Donthineni).
- New (missing) CPU ID for one apparently overlooked Haswell model in
the Intel RAPL power capping driver (Srinivas Pandruvada).
- Fix for the PM core's wakeup IRQs framework to make it work after
wakeup settings reconfiguration from sysfs (Grygorii Strashko).
- Runtime PM documentation update to make it describe what needs to
be done during device removal more precisely (Krzysztof Kozlowski).
- Stale comment removal cleanup in the cpufreq-dt driver (Viresh
Kumar).
- turbostat utility fixes and support for Broxton, Skylake-X and
Kabylake processors (Len Brown)"
* tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
intel_idle: Add KBL support
intel_idle: Add SKX support
intel_idle: Clean up all registered devices on exit.
intel_idle: Propagate hot plug errors.
intel_idle: Don't overreact to a cpuidle registration failure.
intel_idle: Setup the timer broadcast only on successful driver load.
intel_idle: Avoid a double free of the per-CPU data.
intel_idle: Fix dangling registration on error path.
intel_idle: Fix deallocation order on the driver exit path.
intel_idle: Remove redundant initialization calls.
intel_idle: Fix a helper function's return value.
intel_idle: remove useless return from void function.
...
Update the comment to reflect the changes of commit 0de7985 (parisc: Use
generic extable search and sort routines).
Signed-off-by: Helge Deller <deller@gmx.de>
Handling exceptions from modules never worked on parisc.
It was just masked by the fact that exceptions from modules
don't happen during normal use.
When a module triggers an exception in get_user() we need to load the
main kernel dp value before accessing the exception_data structure, and
afterwards restore the original dp value of the module on exit.
Noticed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
The kernel module testcase (lib/test_user_copy.c) exhibited a kernel
crash on parisc if the parameters for copy_from_user were reversed
("illegal reversed copy_to_user" testcase).
Fix this potential crash by checking the fault handler if the faulting
address is in the exception table.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
We want to avoid the kernel module loader to create function pointers
for the kernel fixup routines of get_user() and put_user(). Changing
the external reference from function type to int type fixes this.
This unbreaks exception handling for get_user() and put_user() when
called from a kernel module.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Commit 0de7985 (parisc: Use generic extable search and sort routines)
changed the exception tables to use 32bit relative offsets.
This patch now adds support to the kernel module loader to handle such
R_PARISC_PCREL32 relocations for 32- and 64-bit modules.
Signed-off-by: Helge Deller <deller@gmx.de>
* pm-core:
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
PM / runtime: Document steps for device removal
* powercap:
powercap: intel_rapl: Add missing Haswell model
* pm-tools:
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
It was reported that a kernel with CONFIG_ARM_PATCH_IDIV=y stopped
booting when compiled with the upcoming gcc 6. Turns out that turning
a function address into a writable array is undefined and gcc 6 decided
it was OK to omit the store to the first word of the function while
still preserving the store to the second word.
Even though gcc 6 is now fixed to behave more coherently, it is a
mystery that gcc 4 and gcc 5 actually produce wanted code in the kernel.
And in fact the reduced test case to illustrate the issue does indeed
break with gcc < 6 as well.
In any case, let's guard the kernel against undefined compiler behavior
by hiding the nature of the array location as suggested by gcc
developers.
Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70128
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reported-by: Marcin Juszkiewicz <mjuszkiewicz@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org # v4.5
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.
IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Let's see if anybody even notices. I doubt anybody uses this, and it
does expose addresses that should be randomized, so let's just remove
the code. It's old and traditional, and it used to be cute, but we
should have removed this long ago.
If it turns out anybody notices and this breaks something, we'll have to
revert this, and maybe we'll end up using other approaches instead
(using %pK or similar). But removing unnecessary code is always the
preferred option.
Noted-by: Emrah Demir <ed@abdsec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
others are usual stable material.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXA8x6AAoJEL/70l94x66D0x8H/RcBnc75994RQ++WmHSvD9GF
yruGB8soLDdjX+Oceol0aEPHokrBu3JtcdoTBe0GwbCKV/F5NkQZ4EfLxDtR3tte
7ILkPULLy5GElFpJNQuT4pmXzTEspFvXpqHhFik7WVBga3W9wMFQcjbrgmGBUzLE
p2aJVhZyErpKxGFkUYWhDnlqWsguTTIzv/pqNhLY4VVc0UrXN9AA0fq9RkvgU3KS
Hxk4/A6SV/b7dyzvttzITww0f1iu8FmlLj2TXapIEoOz7AnInD6KIN0RYpxbDjxN
bEzEfpahUtuDeM87/t2kHEj0Gn09iHK7/BbCC1Hrwo1CQhbAQ/D0GIvqYAQixf4=
=NugZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Miscellaneous bugfixes.
The ARM and s390 fixes are for new regressions from the merge window,
others are usual stable material"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
compiler-gcc: disable -ftracer for __noclone functions
kvm: x86: make lapic hrtimer pinned
s390/mm/kvm: fix mis-merge in gmap handling
kvm: set page dirty only if page has been writable
KVM: x86: reduce default value of halt_poll_ns parameter
KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled
KVM: x86: Inject pending interrupt even if pending nmi exist
arm64: KVM: Register CPU notifiers when the kernel runs at HYP
arm64: kvm: 4.6-rc1: Fix VTCR_EL2 VS setting
When a vCPU runs on a nohz_full core, the hrtimer used by
the lapic emulation code can be migrated to another core.
When this happens, it's possible to observe milisecond
latency when delivering timer IRQs to KVM guests.
The huge latency is mainly due to the fact that
apic_timer_fn() expects to run during a kvm exit. It
sets KVM_REQ_PENDING_TIMER and let it be handled on kvm
entry. However, if the timer fires on a different core,
we have to wait until the next kvm exit for the guest
to see KVM_REQ_PENDING_TIMER set.
This problem became visible after commit 9642d18ee. This
commit changed the timer migration code to always attempt
to migrate timers away from nohz_full cores. While it's
discussable if this is correct/desirable (I don't think
it is), it's clear that the lapic emulation code has
a requirement on firing the hrtimer in the same core
where it was started. This is achieved by making the
hrtimer pinned.
Lastly, note that KVM has code to migrate timers when a
vCPU is scheduled to run in different core. However, this
forced migration may fail. When this happens, we can have
the same problem. If we want 100% correctness, we'll have
to modify apic_timer_fn() to cause a kvm exit when it runs
on a different core than the vCPU. Not sure if this is
possible.
Here's a reproducer for the issue being fixed:
1. Set all cores but core0 to be nohz_full cores
2. Start a guest with a single vCPU
3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()
You'll see that apic_timer_fn() will run in core0 while
kvm_inject_apic_timer_irqs() runs in a different core. If
you get both on core0, try running a program that takes 100%
of the CPU and pin it to core0 to force the vCPU out.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
commit 1e133ab296 ("s390/mm: split arch/s390/mm/pgtable.c") dropped
some changes from commit a3a92c31bf ("KVM: s390: fix mismatch
between user and in-kernel guest limit") - this breaks KVM for some
memory sizes (kvm-s390: failed to commit memory region) like
exactly 2GB.
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull MIPS fixes from Ralf Baechle:
"This is the first round of MIPS fixes for 4.6:
- Fix spelling mistakes all over arch/mips
- Provide __bswapsi2 so XZ kernel compression will build with older GCC
- ATH79 clock fixes.
- Fix clock-rated copy-paste erros in ATH79 DTS.
- Fix gisb-arb compatible string for 7435 BMIPS
- Enable NAND and UBIFS support in CI20.
- Fix BUG() assertion caused by inapropriate smp_processor_id() use.
- Fix exception handling issues for the sake of debuggers
- Fix the last remaining instance of irq_to_gpio in the db1xxx_ss PCMCIA code
- Fix MSA unaligned load failures
- Panic if kernel is configured for a not TLB-supported page size
- Bail out on unsupported relocs in modules.
- Partial fix for Qemu breakage after recent IPI rewrite
- Wire up the preadv2 and pwrite2 syscalls
- Fix the ar724x clock calculation"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: traps.c: Verify the ISA for microMIPS RDHWR emulation
MIPS: BMIPS: Fix gisb-arb compatible string for 7435
MIPS: Bail on unsupported module relocs
MIPS: dts: qca: ar9132_tl_wr1043nd_v1.dts: use "ref" for reference clock name
MIPS: ath79: Fix the ar913x reference clock rate
MIPS: ath79: Fix the ar724x clock calculation
dt-bindings: clock: qca,ath79-pll: fix copy-paste typos
MIPS: traps: Correct the SIGTRAP debug ABI in `do_watch' and `do_trap_or_bp'
FIRMWARE: Broadcom: Fix grammar of warning messages in bcm47xx_sprom.c.
MIPS: ci20: Enable NAND and UBIFS support in defconfig.
MIPS: Fix misspellings in comments.
MIPS: tlb-r4k: panic if the MMU doesn't support PAGE_SIZE
MIPS: zboot: Remove copied source files on clean
MIPS: zboot: Fix the build with XZ compression on older GCC versions
MIPS: Wire up preadv2 and pwrite2 syscalls.
MIPS: cpu_name_string: Use raw_smp_processor_id().
pcmcia: db1xxx_ss: fix last irq_to_gpio user
MIPS: Fix MSA ld unaligned failure cases
MIPS: Fix broken malta qemu
- Safely migrate event channels between CPUs.
- Fix CPU hotplug.
- Maintainer changes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXAkQtAAoJEFxbo/MsZsTRpAoH/R0mVySTK3RloxRo4SPDWs//
3EIgDFBCde/JfDmhJw7OTLC6oHExq/ObBunV4I6HSStDYjZfHnMXTe7uiRo6oOUt
ql8/k1P4NM202L2qkjZU89ObPwOMbx50NiHtVG3JAIydZc/jgn4/brow9ZymDAUd
lp85Oj0d66uM5iIY9YVa5nY/calt5W0rr9EoV93HSf6GFefNJKXJ5u3KW8IgMyIl
I4/y8GraQLAcXBcmrOny51nlIxsiv1wTssJfExH49/8In3JH3SlbZDGuEiIovPUC
jJ96Tr/oOhFyPZIM3J7pFYpvn4en84V07zbaWcEUmVf8capv3pjwJNg2Xx64FdI=
=bSO5
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from David Vrabel:
"Regression and bug fixes for 4.6-rc2:
- safely migrate event channels between CPUs
- fix CPU hotplug
- maintainer changes"
* tag 'for-linus-4.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
MAINTAINERS: xen: Konrad to step down and Juergen to pick up
xen/events: Mask a moving irq
Xen on ARM and ARM64: update MAINTAINERS info
xen/x86: Call cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()
xen/apic: Provide Xen-specific version of cpu_present_to_apicid APIC op
Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
"PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
Let's stop pretending that pages in page cache are special. They are
not.
The first patch with most changes has been done with coccinelle. The
second is manual fixups on top.
The third patch removes macros definition"
[ I was planning to apply this just before rc2, but then I spaced out,
so here it is right _after_ rc2 instead.
As Kirill suggested as a possibility, I could have decided to only
merge the first two patches, and leave the old interfaces for
compatibility, but I'd rather get it all done and any out-of-tree
modules and patches can trivially do the converstion while still also
working with older kernels, so there is little reason to try to
maintain the redundant legacy model. - Linus ]
* PAGE_CACHE_SIZE-removal:
mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
Mostly direct substitution with occasional adjustment or removing
outdated comments.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make sure it's the microMIPS rather than MIPS16 ISA before emulating
microMIPS RDHWR. Mostly needed as an optimisation for configurations
where `cpu_has_mmips' is hardcoded to 0 and also a good measure in case
we add further microMIPS instructions to emulate in the future, as the
corresponding MIPS16 encoding is ADDIUSP, not supposed to trap.
Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12282/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>