Print the mask, max_cycles, and max_idle_ns values for
clocksources being registered.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-12-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A long running project has been to clean up remaining uses
of clocksource_register(), replacing it with the simpler
clocksource_register_khz/hz() functions.
However, there are a few cases where we need to self-define
our mult/shift values, so switch the function to a more
obviously internal __clocksource_register() name, and
consolidate much of the internal logic so we don't have
duplication.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stultz@linaro.org
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The clocksource watchdog reporting has been less helpful
then desired, as it just printed the delta between
the two clocksources. This prevents any useful analysis
of why the skew occurred.
Thus this patch tries to improve the output when we
mark a clocksource as unstable, printing out the cycle
last and now values for both the current clocksource
and the watchdog clocksource. This will allow us to see
if the result was due to a false positive caused by
a problematic watchdog.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-9-git-send-email-john.stultz@linaro.org
[ Minor cleanups of kernel messages. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It was suggested that the underflow/overflow protection
should probably throw some sort of warning out, rather
than just silently fixing the issue.
So this patch adds some warnings here. The flag variables
used are not protected by locks, but since we can't print
from the reading functions, just being able to say we
saw an issue in the update interval is useful enough,
and can be slightly racy without real consequence.
The big complication is that we're only under a read
seqlock, so the data could shift under us during
our calculation to see if there was a problem. This
patch avoids this issue by nesting another seqlock
which allows us to snapshot the just required values
atomically. So we shouldn't see false positives.
I also added some basic rate-limiting here, since
on one build machine w/ skewed TSCs it was fairly
noisy at bootup.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the case where there is a broken clocksource
where there are multiple actual clocks that
aren't perfectly aligned, we may see small "negative"
deltas when we subtract 'now' from 'cycle_last'.
The values are actually negative with respect to the
clocksource mask value, not necessarily negative
if cast to a s64, but we can check by checking the
delta to see if it is a small (relative to the mask)
negative value (again negative relative to the mask).
If so, we assume we jumped backwards somehow and
instead use zero for our delta.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When calculating the current delta since the last tick, we
currently have no hard protections to prevent a multiplication
overflow from occuring.
This patch introduces infrastructure to allow a cap that
limits the clocksource read delta value to the 'max_cycles' value,
which is where an overflow would occur.
Since this is in the hotpath, it adds the extra checking under
CONFIG_DEBUG_TIMEKEEPING=y.
There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were mis-scheduled
too far in the future, which would cause time to stop.
However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Recently there's been requests for better sanity
checking in the time code, so that it's more clear
when something is going wrong, since timekeeping issues
could manifest in a large number of strange ways in
various subsystems.
Thus, this patch adds some extra infrastructure to
add a check to update_wall_time() to print two new
warnings:
1) if we see the call delayed beyond the 'max_cycles'
overflow point,
2) or if we see the call delayed beyond the clocksource's
'max_idle_ns' value, which is currently 50% of the
overflow point.
This extra infrastructure is conditional on
a new CONFIG_DEBUG_TIMEKEEPING option, also
added in this patch - default off.
Tested this a bit by halting qemu for specified
lengths of time to trigger the warnings.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org
[ Improved the changelog and the messages a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to facilitate clocksource validation, add a
'max_cycles' field to the clocksource structure which
will hold the maximum cycle value that can safely be
multiplied without potentially causing an overflow.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The clocksource logic has a number of places where we try to
include a safety margin. Most of these are 12% safety margins,
but they are inconsistently applied and sometimes are applied
on top of each other.
Additionally, in the previous patch, we corrected an issue
where we unintentionally in effect created a 50% safety margin,
which these 12.5% margins where then added to.
So to simplify the logic here, this patch removes the various
12.5% margins, and consolidates adding the margin in one place:
clocks_calc_max_nsecs().
Additionally, Linus prefers a 50% safety margin, as it allows
bad clock values to be more easily caught. This should really
have no net effect, due to the corrected issue earlier which
caused greater then 50% margins to be used w/o issue.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Stephen Boyd <sboyd@codeaurora.org> (for the sched_clock.c bit)
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The previous clocks_calc_max_nsecs() code had some unecessarily
complex bit logic to find the max interval that could cause
multiplication overflows. Since this is not in the hot
path, just do the divide to make it easier to read.
The previous implementation also had a subtle issue
that it avoided overflows with signed 64-bit values, where
as the intervals are always unsigned. This resulted in
overly conservative intervals, which other safety margins
were then added to, reducing the intended interval length.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU9enEAAoJEHm+PkMAQRiG/ewIAJ4MW4tcAhaVj6ndCF3+uL/b
RaVm1apUjsTloe5Fl0TT9J5CO3zdOetmMNToy2sf0W4MJDIyHf21o83l7eniV/6q
al/c3fQ6HVtNjiSUNghTtzVlL+gUD1F60b9BGYi1V5h2Mp8u0NG1alTGLQfCB8sE
ArB+v2aWEdSPn7mZDA0Yuc1In+8bkpht3oy+OLD/8JNkqqLnml9YOyPjM1cuRpBr
NxKCLcPzSHH9/nR3T6XtkxXYV5xD3+CDm9roJhfHukoFmfT/G3C65Zcp2KEed/Cw
QQpu+ox7fpUs10F/Fbfm8AE+tRB4o2sGh97sprXrO5oaFdx6FPIBo4WN8i/Vy68=
=qpY+
-----END PGP SIGNATURE-----
Merge tag 'v4.0-rc2' into timers/core, to refresh the tree before pulling more changes
Pull locking fix from Ingo Molnar:
"An rtmutex deadlock path fixlet"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Set state back to running on error
The "usual" path is:
- rt_mutex_slowlock()
- set_current_state()
- task_blocks_on_rt_mutex() (ret 0)
- __rt_mutex_slowlock()
- sleep or not but do return with __set_current_state(TASK_RUNNING)
- back to caller.
In the early error case where task_blocks_on_rt_mutex() return
-EDEADLK we never change the task's state back to RUNNING. I
assume this is intended. Without this change after ww_mutex
using rt_mutex the selftest passes but later I get plenty of:
| bad: scheduling from the idle thread!
backtraces.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: afffc6c180 ("locking/rtmutex: Optimize setting task running after being blocked")
Link: http://lkml.kernel.org/r/1425056229-22326-4-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's a uname workaround for broken userspace which can't handle kernel
versions of 3.x. Update it for 4.x.
Signed-off-by: Jon DeVree <nuxi@vault24.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull livepatching fixes from Jiri Kosina:
"Two tiny fixes for livepatching infrastructure:
- extending RCU critical section to cover all accessess to
RCU-protected variable, by Petr Mladek
- proper format string passing to kobject_init_and_add(), by Jiri
Kosina"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: RCU protect struct klp_func all the time when used in klp_ftrace_handler()
livepatch: fix format string in kobject_init_and_add()
func->new_func has been accessed after rcu_read_unlock() in klp_ftrace_handler()
and therefore the access was not protected.
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for MIPS:
- a number of fixes that didn't make the 3.19 release.
- a number of cleanups.
- preliminary support for Cavium's Octeon 3 SOCs which feature up to
48 MIPS64 R3 cores with FPU and hardware virtualization.
- support for MIPS R6 processors.
Revision 6 of the MIPS architecture is a major revision of the MIPS
architecture which does away with many of original sins of the
architecture such as branch delay slots. This and other changes in
R6 require major changes throughout the entire MIPS core
architecture code and make up for the lion share of this pull
request.
- finally some preparatory work for eXtendend Physical Address
support, which allows support of up to 40 bit of physical address
space on 32 bit processors"
[ Ahh, MIPS can't leave the PAE brain damage alone. It's like
every CPU architect has to make that mistake, but pee in the snow
by changing the TLA. But whether it's called PAE, LPAE or XPA,
it's horrid crud - Linus ]
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (114 commits)
MIPS: sead3: Corrected get_c0_perfcount_int
MIPS: mm: Remove dead macro definitions
MIPS: OCTEON: irq: add CIB and other fixes
MIPS: OCTEON: Don't do acknowledge operations for level triggered irqs.
MIPS: OCTEON: More OCTEONIII support
MIPS: OCTEON: Remove setting of processor specific CVMCTL icache bits.
MIPS: OCTEON: Core-15169 Workaround and general CVMSEG cleanup.
MIPS: OCTEON: Update octeon-model.h code for new SoCs.
MIPS: OCTEON: Implement DCache errata workaround for all CN6XXX
MIPS: OCTEON: Add little-endian support to asm/octeon/octeon.h
MIPS: OCTEON: Implement the core-16057 workaround
MIPS: OCTEON: Delete unused COP2 saving code
MIPS: OCTEON: Use correct instruction to read 64-bit COP0 register
MIPS: OCTEON: Save and restore CP2 SHA3 state
MIPS: OCTEON: Fix FP context save.
MIPS: OCTEON: Save/Restore wider multiply registers in OCTEON III CPUs
MIPS: boot: Provide more uImage options
MIPS: Remove unneeded #ifdef __KERNEL__ from asm/processor.h
MIPS: ip22-gio: Remove legacy suspend/resume support
mips: pci: Add ifdef around pci_proc_domain
...
Pull ntp fix from Ingo Molnar:
"An adjtimex interface regression fix for 32-bit systems"
[ A check that was added in a previous commit is really only a concern
for 64bit systems, but was applied to both 32 and 64bit systems, which
results in breaking 32bit systems.
Thus the fix here is to make the check only apply to 64bit systems ]
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ntp: Fixup adjtimex freq validation on 32-bit systems
Pull locking fixes from Ingo Molnar:
"Two fixes: the paravirt spin_unlock() corruption/crash fix, and an
rtmutex NULL dereference crash fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/spinlocks/paravirt: Fix memory corruption on unlock
locking/rtmutex: Avoid a NULL pointer dereference on deadlock
Pull scheduler fixes from Ingo Molnar:
"Thiscontains misc fixes: preempt_schedule_common() and io_schedule()
recursion fixes, sched/dl fixes, a completion_done() revert, two
sched/rt fixes and a comment update patch"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Avoid obvious configuration fail
sched/autogroup: Fix failure to set cpu.rt_runtime_us
sched/dl: Do update_rq_clock() in yield_task_dl()
sched: Prevent recursion in io_schedule()
sched/completion: Serialize completion_done() with complete()
sched: Fix preempt_schedule_common() triggering tracing recursion
sched/dl: Prevent enqueue of a sleeping task in dl_task_timer()
sched: Make dl_task_time() use task_rq_lock()
sched: Clarify ordering between task_rq_lock() and move_queued_task()
Pull rcu fix and x86 irq fix from Ingo Molnar:
- Fix a bug that caused an RCU warning splat.
- Two x86 irq related fixes: a hotplug crash fix and an ACPI IRQ
registry fix.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Clear need_qs flag to prevent splat
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()
x86/irq: Fix regression caused by commit b568b8601f
* KDB: improved searching
* No longer enter debug core on panic if panic timeout is set
KGDB/KDB regressions / cleanups
* fix pdf doc build errors
* prevent junk characters on kdb console from printk levels
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU5qxHAAoJEIciOldedpOjliYP/izuoNZ/EtjjeihOL44ic0o0
cmvdSc/ovR/mO4fbDpftMB0nhzclgRyAvr+VTPd3Bp5Poh+wJ0ZKu1R7f+ioSN73
Y4ek9PJqPSBQr+JdfPK80N56Choeni48bsC6up12i3BTfXobj81zlu4Sj0SMOoHq
IkFkB7soRuiFoc5IkKMvf3N3T9j1PnEULmHteNDRr0hTmGipEzkD3zocc/bRFV/l
JTZRzIMGBNGnF01DPLDcuvbu0wGBh6ADMBLtx5v1UrhV32ypfJq2bgxFvgM/AXn2
3VG4HcRbVsGmlBOahFR6X0DE/WAplw01yu1EabR2EWUePyz41cnSkxl4nR/NNhwz
qMbr3uzu1iWUTTz5ySRcWxSuRRCihVQqNk6p+y21N/jY/5cr2jI03qJm0zZ/ObqL
VUcPE7CfdcriCDXoepgXZE4XfX65Jf5tUiyiCj+1ds05ab5qHELIwKOZdjU2ON1b
pb2ElPngGSEEoU/eSDgP2RVJ9Mk/k5s2TxaPXVJNkeWGNxPU5HtCytZpVI5hckbP
/NZWTtyUDZ85is8cWUkHEdjnQ+CdzaA/FwJEqnB0or2is91mo8uBxP5BvdqPnPL0
QdPPnVgD72dumXfJpH2HY3DdUs24LaP0vgSO8ELKgfA67nprS+5xztNSd8ekNnhF
4wMhZbuAhB68E6bA0X7G
=TH0R
-----END PGP SIGNATURE-----
Merge tag 'for_linux-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb
Pull kgdb/kdb updates from Jason Wessel:
"KGDB/KDB New:
- KDB: improved searching
- No longer enter debug core on panic if panic timeout is set
KGDB/KDB regressions / cleanups
- fix pdf doc build errors
- prevent junk characters on kdb console from printk levels"
* tag 'for_linux-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
kgdb, docs: Fix <para> pdfdocs build errors
debug: prevent entering debug mode on panic/exception.
kdb: Const qualifier for kdb_getstr's prompt argument
kdb: Provide forward search at more prompt
kdb: Fix a prompt management bug when using | grep
kdb: Remove stack dump when entering kgdb due to NMI
kdb: Avoid printing KERN_ levels to consoles
kdb: Fix off by one error in kdb_cpu()
kdb: fix incorrect counts in KDB summary command output
On non-developer devices, kgdb prevents the device from rebooting
after a panic.
Incase of panics and exceptions, to allow the device to reboot, prevent
entering debug mode to avoid getting stuck waiting for the user to
interact with debugger.
To avoid entering the debugger on panic/exception without any extra
configuration, panic_timeout is being used which can be set via
/proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the
default value.
Setting panic_timeout indicates that the user requested machine to
perform unattended reboot after panic. We dont want to get stuck waiting
for the user input incase of panic.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Cc: Android Kernel Team <kernel-team@android.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Colin Cross <ccross@android.com>
[Kiran: Added context to commit message.
panic_timeout is used instead of break_on_panic and
break_on_exception to honor CONFIG_PANIC_TIMEOUT
Modified the commit as per community feedback]
Signed-off-by: Kiran Raparthy <kiran.kumar@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
All current callers of kdb_getstr() can pass constant pointers via the
prompt argument. This patch adds a const qualification to make explicit
the fact that this is safe.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Currently kdb allows the output of comamnds to be filtered using the
| grep feature. This is useful but does not permit the output emitted
shortly after a string match to be examined without wading through the
entire unfiltered output of the command. Such a feature is particularly
useful to navigate function traces because these traces often have a
useful trigger string *before* the point of interest.
This patch reuses the existing filtering logic to introduce a simple
forward search to kdb that can be triggered from the more prompt.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Currently when the "| grep" feature is used to filter the output of a
command then the prompt is not displayed for the subsequent command.
Likewise any characters typed by the user are also not echoed to the
display. This rather disconcerting problem eventually corrects itself
when the user presses Enter and the kdb_grepping_flag is cleared as
kdb_parse() tries to make sense of whatever they typed.
This patch resolves the problem by moving the clearing of this flag
from the middle of command processing to the beginning.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Issuing a stack dump feels ergonomically wrong when entering due to NMI.
Entering due to NMI is normally a reaction to a user request, either the
NMI button on a server or a "magic knock" on a UART. Therefore the
backtrace behaviour on entry due to NMI should be like SysRq-g (no stack
dump) rather than like oops.
Note also that the stack dump does not offer any information that
cannot be trivial retrieved using the 'bt' command.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Currently when kdb traps printk messages then the raw log level prefix
(consisting of '\001' followed by a numeral) does not get stripped off
before the message is issued to the various I/O handlers supported by
kdb. This causes annoying visual noise as well as causing problems
grepping for ^. It is also a change of behaviour compared to normal usage
of printk() usage. For example <SysRq>-h ends up with different output to
that of kdb's "sr h".
This patch addresses the problem by stripping log levels from messages
before they are issued to the I/O handlers. printk() which can also
act as an i/o handler in some cases is special cased; if the caller
provided a log level then the prefix will be preserved when sent to
printk().
The addition of non-printable characters to the output of kdb commands is a
regression, albeit and extremely elderly one, introduced by commit
04d2c8c83d ("printk: convert the format for KERN_<LEVEL> to a 2 byte
pattern"). Note also that this patch does *not* restore the original
behaviour from v3.5. Instead it makes printk() from within a kdb command
display the message without any prefix (i.e. like printk() normally does).
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Joe Perches <joe@perches.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
There was a follow on replacement patch against the prior
"kgdb: Timeout if secondary CPUs ignore the roundup".
See: https://lkml.org/lkml/2015/1/7/442
This patch is the delta vs the patch that was committed upstream:
* Fix an off-by-one error in kdb_cpu().
* Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
really want a static limit.
* Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
(kgdb-next contains a patch which introduced pr_fmt() to this file
to the tag will now be applied automatically).
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The output of KDB 'summary' command should report MemTotal, MemFree
and Buffers output in kB. Current codes report in unit of pages.
A define of K(x) as
is defined in the code, but not used.
This patch would apply the define to convert the values to kB.
Please include me on Cc on replies. I do not subscribe to linux-kernel.
Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Pull kbuild updates from Michal Marek:
- several cleanups in kbuild
- serialize multiple *config targets so that 'make defconfig kvmconfig'
works
- The cc-ifversion macro got support for an else-branch
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild,gcov: simplify kernel/gcov/Makefile more
kbuild: allow cc-ifversion to have the argument for false condition
kbuild,gcov: simplify kernel/gcov/Makefile
kbuild,gcov: remove unnecessary workaround
kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion
kbuild: fix cc-ifversion macro
kbuild: drop $(version_h) from MRPROPER_FILES
kbuild: use mixed-targets when two or more config targets are given
kbuild: remove redundant line from bounds.h/asm-offsets.h
kbuild: merge bounds.h and asm-offsets.h rules
kbuild: Drop support for clean-rule
Setting the root group's cpu.rt_runtime_us to 0 is a bad thing; it
would disallow the kernel creating RT tasks.
One can of course still set it to 1, which will (likely) still wreck
your kernel, but at least make it clear that setting it to 0 is not
good.
Collect both sanity checks into the one place while we're there.
Suggested-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150209112715.GO24151@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because task_group() uses a cache of autogroup_task_group(), whose
output depends on sched_class, switching classes can generate
problems.
In particular, when started as fair, the cache points to the
autogroup, so when switching to RT the tg_rt_schedulable() test fails
for every cpu.rt_{runtime,period}_us change because now the autogroup
has tasks and no runtime.
Furthermore, going back to the previous semantics of varying
task_group() with sched_class has the down-side that the sched_debug
output varies as well, even though the task really is in the
autogroup.
Therefore add an autogroup exception to tg_has_rt_tasks() -- such that
both (all) task_group() usages in sched/core now have one. And remove
all the remnants of the variable task_group() output.
Reported-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Fixes: 8323f26ce3 ("sched: Fix race in task_group()")
Link: http://lkml.kernel.org/r/20150209112237.GR5029@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is not possible for the clockevents core to know which modes (other than
those with a corresponding feature flag) are supported by a particular
implementation. And drivers are expected to handle transition to all modes
elegantly, as ->set_mode() would be issued for them unconditionally.
Now, adding support for a new mode complicates things a bit if we want to use
the legacy ->set_mode() callback. We need to closely review all clockevents
drivers to see if they would break on addition of a new mode. And after such
reviews, it is found that we have to do non-trivial changes to most of the
drivers [1].
Introduce mode-specific set_mode_*() callbacks, some of which the drivers may or
may not implement. A missing callback would clearly convey the message that the
corresponding mode isn't supported.
A driver may still choose to keep supporting the legacy ->set_mode() callback,
but ->set_mode() wouldn't be supporting any new modes beyond RESUME. If a driver
wants to benefit from using a new mode, it would be required to migrate to
the mode specific callbacks.
The legacy ->set_mode() callback and the newly introduced mode-specific
callbacks are mutually exclusive. Only one of them should be supported by the
driver.
Sanity check is done at the time of registration to distinguish between optional
and required callbacks and to make error recovery and handling simpler. If the
legacy ->set_mode() callback is provided, all mode specific ones would be
ignored by the core but a warning is thrown if they are present.
Call sites calling ->set_mode() directly are also updated to use
__clockevents_set_mode() instead, as ->set_mode() may not be available anymore
for few drivers.
[1] https://lkml.org/lkml/2014/12/9/605
[2] https://lkml.org/lkml/2015/1/23/255
Suggested-by: Thomas Gleixner <tglx@linutronix.de> [2]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Link: http://lkml.kernel.org/r/792d59a40423f0acffc9bb0bec9de1341a06fa02.1423788565.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Additional validation of adjtimex freq values to avoid
potential multiplication overflows were added in commit
5e5aeb4367 (time: adjtimex: Validate the ADJ_FREQUENCY values)
Unfortunately the patch used LONG_MAX/MIN instead of
LLONG_MAX/MIN, which was fine on 64-bit systems, but being
much smaller on 32-bit systems caused false positives
resulting in most direct frequency adjustments to fail w/
EINVAL.
ntpd only does direct frequency adjustments at startup, so
the issue was not as easily observed there, but other time
sync applications like ptpd and chrony were more effected by
the bug.
See bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=92481https://bugzilla.redhat.com/show_bug.cgi?id=1188074
This patch changes the checks to use LLONG_MAX for
clarity, and additionally the checks are disabled
on 32-bit systems since LLONG_MAX/PPM_SCALE is always
larger then the 32-bit long freq value, so multiplication
overflows aren't possible there.
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: George Joseph <george.joseph@fairview5.com>
Tested-by: George Joseph <george.joseph@fairview5.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.19+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
[ Prettified the changelog and the comments a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
io_schedule() calls blk_flush_plug() which, depending on the
contents of current->plug, can initiate arbitrary blk-io requests.
Note that this contrasts with blk_schedule_flush_plug() which requires
all non-trivial work to be handed off to a separate thread.
This makes it possible for io_schedule() to recurse, and initiating
block requests could possibly call mempool_alloc() which, in times of
memory pressure, uses io_schedule().
Apart from any stack usage issues, io_schedule() will not behave
correctly when called recursively as delayacct_blkio_start() does
not allow for repeated calls.
So:
- use ->in_iowait to detect recursion. Set it earlier, and restore
it to the old value.
- move the call to "raw_rq" after the call to blk_flush_plug().
As this is some sort of per-cpu thing, we want some chance that
we are on the right CPU
- When io_schedule() is called recurively, use blk_schedule_flush_plug()
which cannot further recurse.
- as this makes io_schedule() a lot more complex and as io_schedule()
must match io_schedule_timeout(), but all the changes in io_schedule_timeout()
and make io_schedule a simple wrapper for that.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Moved the now rudimentary io_schedule() into sched.h. ]
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tony Battersby <tonyb@cybernetics.com>
Link: http://lkml.kernel.org/r/20150213162600.059fffb2@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit de30ec4730 "Remove unnecessary ->wait.lock serialization when
reading completion state" was not correct, without lock/unlock the code
like stop_machine_from_inactive_cpu()
while (!completion_done())
cpu_relax();
can return before complete() finishes its spin_unlock() which writes to
this memory. And spin_unlock_wait().
While at it, change try_wait_for_completion() to use READ_ONCE().
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Added a comment with the barrier. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Mc Guire <der.herr@hofr.at>
Cc: raghavendra.kt@linux.vnet.ibm.com
Cc: waiman.long@hp.com
Fixes: de30ec4730 ("sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state")
Link: http://lkml.kernel.org/r/20150212195913.GA30430@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the function graph tracer needs to disable preemption, it might
call preempt_schedule() after reenabling it if something triggered the
need for rescheduling in between.
Therefore we can't trace preempt_schedule() itself because we would
face a function tracing recursion otherwise as the tracer is always
called before PREEMPT_ACTIVE gets set to prevent that recursion. This is
why preempt_schedule() is tagged as "notrace".
But the same issue applies to every function called by preempt_schedule()
before PREEMPT_ACTIVE is actually set. And preempt_schedule_common() is
one such example. Unfortunately we forgot to tag it as notrace as well
and as a result we are encountering tracing recursion since it got
introduced by:
a18b5d0181 ("sched: Fix missing preemption opportunity")
Let's fix that by applying the appropriate function tag to
preempt_schedule_common().
Reported-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1424110807-15057-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A deadline task may be throttled and dequeued at the same time.
This happens, when it becomes throttled in schedule(), which
is called to go to sleep:
current->state = TASK_INTERRUPTIBLE;
schedule()
deactivate_task()
dequeue_task_dl()
update_curr_dl()
start_dl_timer()
__dequeue_task_dl()
prev->on_rq = 0;
Later the timer fires, but the task is still dequeued:
dl_task_timer()
enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */
Someone wakes it up:
try_to_wake_up()
enqueue_dl_entity()
BUG_ON(on_dl_rq())
Patch fixes this problem, it prevents queueing !on_rq tasks
on dl_rq.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Wrote comment. ]
Cc: Juri Lelli <juri.lelli@arm.com>
Fixes: 1019a359d3 ("sched/deadline: Fix stale yield state")
Link: http://lkml.kernel.org/r/1374601424090314@web4j.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kirill reported that a dl task can be throttled and dequeued at the
same time. This happens, when it becomes throttled in schedule(),
which is called to go to sleep:
current->state = TASK_INTERRUPTIBLE;
schedule()
deactivate_task()
dequeue_task_dl()
update_curr_dl()
start_dl_timer()
__dequeue_task_dl()
prev->on_rq = 0;
This invalidates the assumption from commit 0f397f2c90 ("sched/dl:
Fix race in dl_task_timer()"):
"The only reason we don't strictly need ->pi_lock now is because
we're guaranteed to have p->state == TASK_RUNNING here and are
thus free of ttwu races".
And therefore we have to use the full task_rq_lock() here.
This further amends the fact that we forgot to update the rq lock loop
for TASK_ON_RQ_MIGRATE, from commit cca26e8009 ("sched: Teach
scheduler to understand TASK_ON_RQ_MIGRATING state").
Reported-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Link: http://lkml.kernel.org/r/20150217123139.GN5029@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There was a wee bit of confusion around the exact ordering here;
clarify things.
Reported-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20150217121258.GM5029@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With task_blocks_on_rt_mutex() returning early -EDEADLK we never
add the waiter to the waitqueue. Later, we try to remove it via
remove_waiter() and go boom in rt_mutex_top_waiter() because
rb_entry() gives a NULL pointer.
( Tested on v3.18-RT where rtmutex is used for regular mutex and I
tried to get one twice in a row. )
Not sure when this started but I guess 397335f004 ("rtmutex: Fix
deadlock detector for real") or commit 3d5c9340d1 ("rtmutex:
Handle deadlock detection smarter").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # for v3.16 and later kernels
Link: http://lkml.kernel.org/r/1424187823-19600-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull getname/putname updates from Al Viro:
"Rework of getname/getname_kernel/etc., mostly from Paul Moore. Gets
rid of quite a pile of kludges between namei and audit..."
* 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
audit: replace getname()/putname() hacks with reference counters
audit: fix filename matching in __audit_inode() and __audit_inode_child()
audit: enable filename recording via getname_kernel()
simpler calling conventions for filename_mountpoint()
fs: create proper filename objects using getname_kernel()
fs: rework getname_kernel to handle up to PATH_MAX sized filenames
cut down the number of do_path_lookup() callers
Pull misc VFS updates from Al Viro:
"This cycle a lot of stuff sits on topical branches, so I'll be sending
more or less one pull request per branch.
This is the first pile; more to follow in a few. In this one are
several misc commits from early in the cycle (before I went for
separate branches), plus the rework of mntput/dput ordering on umount,
switching to use of fs_pin instead of convoluted games in
namespace_unlock()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch the IO-triggering parts of umount to fs_pin
new fs_pin killing logics
allow attaching fs_pin to a group not associated with some superblock
get rid of the second argument of acct_kill()
take count and rcu_head out of fs_pin
dcache: let the dentry count go down to zero without taking d_lock
pull bumping refcount into ->kill()
kill pin_put()
mode_t whack-a-mole: chelsio
file->f_path.dentry is pinned down for as long as the file is open...
get rid of lustre_dump_dentry()
gut proc_register() a bit
kill d_validate()
ncpfs: get rid of d_validate() nonsense
selinuxfs: don't open-code d_genocide()
Merge yet more updates from Andrew Morton:
- a pile of minor fs fixes and cleanups
- kexec updates
- random misc fixes in various places: vmcore, rbtree, eventfd, ipc, seccomp.
- a series of python-based kgdb helper scripts
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO
samples/seccomp: improve label helper
ipc,sem: use current->state helpers
scripts/gdb: disable pagination while printing from breakpoint handler
scripts/gdb: define maintainer
scripts/gdb: convert CpuList to generator function
scripts/gdb: convert ModuleList to generator function
scripts/gdb: use a generator instead of iterator for task list
scripts/gdb: ignore byte-compiled python files
scripts/gdb: port to python3 / gdb7.7
scripts/gdb: add basic documentation
scripts/gdb: add lx-lsmod command
scripts/gdb: add class to iterate over CPU masks
scripts/gdb: add lx_current convenience function
scripts/gdb: add internal helper and convenience function for per-cpu lookup
scripts/gdb: add get_gdbserver_type helper
scripts/gdb: add internal helper and convenience function to retrieve thread_info
scripts/gdb: add is_target_arch helper
scripts/gdb: add helper and convenience function to look up tasks
scripts/gdb: add task iteration class
...
The value resulting from the SECCOMP_RET_DATA mask could exceed MAX_ERRNO
when setting errno during a SECCOMP_RET_ERRNO filter action. This makes
sure we have a reliable value being set, so that an invalid errno will not
be ignored by userspace.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This provides a reliable breakpoint target, required for automatic symbol
loading via the gdb helper command 'lx-symbols'.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify the code around one of the conditionals in the kexec_load syscall
routine.
The original code was confusing with a redundant check on KEXEC_ON_CRASH
and comments outside of the conditional block. This change switches the
order of the conditional check, and cleans up the comments for the
conditional. There is no functional change to the code.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Maximilian Attems <max@stro.at>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>