Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:
scheduler switch to idle task
enable interrupts
Window starts here
----> interrupt happens (does not set NEED_RESCHED)
irq_exit() stops the tick
----> interrupt happens (does set NEED_RESCHED)
return from schedule()
cpu_idle(): preempt_disable();
Window ends here
The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.
The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.
Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.
cpu_idle()
{
preempt_disable();
while(1) {
tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
are in the idle loop
while (!need_resched())
halt();
tick_nohz_restart_sched_tick(); <- disables NOHZ mode
preempt_enable_no_resched();
schedule();
preempt_disable();
}
}
In hindsight we should have done this forever, but ...
/me grabs a large brown paperbag.
Debugged-by: Jack Ren <jack.ren@marvell.com>,
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Never terribly functional or popular, plagued by hard to fix bugs the time
to say goodbye has more than arrived.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Match the R4000 semantics for the initial state of interrupt/kernel
status register flags for the R3000 in kernel_thread().
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Convert old/obsolete NORET_TYPE and ATTRIB_NORET macros to use the
newer standard of "__noreturn" as defined in compiler-gcc.h.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This new routine doesn't lookup for symbol names. So we needn't
to pass any char buffers or pointer since we don't care about
names.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Make sure cpu_has_fpu (which uses smp_processor_id()) is used only in
atomic context.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If the PC was ret_from_irq or ret_from_exception, there will be no
more normal stackframe. Instead of stopping the unwinding, use PC and
RA saved by an exception handler to continue unwinding into the
interrupted context. This also simplifies the CONFIG_STACKTRACE code.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Implement stacktrace interface by using unwind_stack() and enable lockdep
support in Kconfig.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This array was used to 'cache' some frame info about scheduler
functions to speed up get_wchan(). This array was 1Ko size and
was only used when CONFIG_KALLSYMS was set but declared for all
configs.
Rather than make the array statement conditional, this patches
removes this array and its uses. Indeed the common case doesn't
seem to use this array and get_wchan() is not a critical path
anyways.
It results in a smaller bss and a smaller/cleaner code:
text data bss dec hex filename
2543808 254148 139296 2937252 2cd1a4 vmlinux-new-get-wchan
2544080 254148 143392 2941620 2ce2b4 vmlinux~old
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch adds 2 sanity checks.
The first one test that the start address of the function to analyze has been
set by the caller. If not return an error since nothing usefull can be done
without.
The second one checks that the function's size has been set. A null size can
happen if CONFIG_KALLSYMS is not set and it means that we don't know the size
of the function to analyze. In this case, we make it equal to 128 instructions
by default.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch allows unwind_stack() to return ra for leaf function.
But it tries to detects cases where get_frame_info() wrongly
consider nested function as a leaf one.
It also pass 'unsinged long *sp' instead of 'unsigned long **sp'
as second parameter. The code looks cleaner.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Now get_frame_info() wants to detect move sp instruction first. It
assumes that the save ra in the stack instruction can't happen
before allocating frame size space into the stack.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Instead of dump all possible address in the stack, unwind the stack frame
based on prologue code analysis, as like as get_wchan() does. While the
code analysis might fail for some reason, there is a new kernel option
"raw_show_trace" to disable this feature.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The only user of get_wchan is the proc fs - and proc can't be built modular.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move function prototypes to asm/signal.h to detect trivial errors and
add some __user tags to get rid of sparse warnings. Generated code
should not be changed.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Implement get_wchan() and frame_info_init() using kallsyms_lookup().
This fixes problem with static sched/lock functions and mfinfo[]
maintenance issue. If CONFIG_KALLSYMS was disabled, get_wchan() just
returns thread_saved_pc() value.
Also unwind stackframe based on "addiu sp,-imm" analysis instead of
frame pointer. This fixes problem with functions compiled without
-fomit-frame-pointer.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
dump_regs() is used by a bunch of drivers for their internal stuff;
renamed mips instance (one that is seen in system-wide headers)
to elf_dump_regs()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Run idle threads with preempt disabled.
Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?
Might fix the CPU hotplugging hang which Nigel Cunningham noted.
We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, then the CPU offlined.
After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep. The CPU will continue executing
previous idle and have no chance to call play_dead.
By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.
From: alexs <ashepard@u.washington.edu>
PPC build fix
From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
MIPS build fix
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!