This fixes the clock source updates in update_wall_time() to correctly
track the time coming in via current_tick_length(). Optimize the fast
paths to be as short as possible to keep the overhead low.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As suggested by Roman Zippel, change clocksource functions to use
clocksource_xyz rather than xyz_clocksource to avoid polluting the
namespace.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the time sources for i386 (acpi_pm, cyclone, hpet, pit, and tsc).
With this patch, the conversion of the i386 arch to the generic timekeeping
code should be complete.
The patch should be fairly straightforward, only adding the new clocksources.
[hirofumi@mail.parknet.co.jp: acpi_pm cleanup]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduces clocksource switching code and the arch generic time accessor
functions that use the clocksource infrastructure.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of incrementing xtime by tick_nsec + ntp adjustments, use the
clocksource abstraction to increment and scale time. Using the clocksource
abstraction allows other clocksources to be used consistently in the face of
late or lost ticks, while preserving the existing behavior via the jiffies
clocksource.
This removes the need to keep time_phase adjustments as we just use the
current_tick_length() function as the NTP interface and accumulate time using
shifted nanoseconds.
The basic design is Roman Zippel's; however, this is my own interpretation
and implementation, so the credit should go to him and the blame to me.
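The accumulation described above can be pictured with a small userspace
sketch. It is not the kernel code: struct tk_sketch, its fields and
tk_accumulate() are hypothetical names, and the per-interval length that
current_tick_length() would supply is simply stored in xtime_interval.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical, simplified timekeeping state (not the kernel's layout). */
struct tk_sketch {
    uint64_t cycle_last;      /* counter value at the last update */
    uint64_t cycle_interval;  /* counter cycles per tick interval */
    int64_t  xtime_interval;  /* ns per interval, shifted left by 'shift' */
    int64_t  xtime_nsec;      /* sub-second remainder, in shifted ns */
    int64_t  xtime_sec;       /* whole seconds */
    int      shift;           /* fixed-point shift of the clocksource */
};

/* Fold every whole interval that has elapsed into the wall time. */
static void tk_accumulate(struct tk_sketch *tk, uint64_t now)
{
    uint64_t offset = now - tk->cycle_last;

    while (offset >= tk->cycle_interval) {
        offset -= tk->cycle_interval;
        tk->cycle_last += tk->cycle_interval;
        tk->xtime_nsec += tk->xtime_interval;

        /* Carry whole seconds out of the shifted accumulator. */
        while (tk->xtime_nsec >= (NSEC_PER_SEC << tk->shift)) {
            tk->xtime_nsec -= NSEC_PER_SEC << tk->shift;
            tk->xtime_sec++;
        }
    }
}
```

Because the loop is driven by the counter rather than by the number of
interrupts seen, a late call that covers several intervals accumulates all
of them in one go.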
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the current_tick_length() function so it takes an argument which
specifies how much precision to return in shifted nanoseconds. This provides
a simple way to convert between NTP's internal nanoseconds shifted by
(SHIFT_SCALE - 10) to other shifted nanosecond units that are used by the
clocksource abstraction.
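As a rough illustration of what converting between shifted nanosecond
units involves, here is a standalone sketch. rescale_shifted_ns() is a
hypothetical helper, not a kernel function, and both shifts are plain
parameters rather than values taken from the NTP code.

```c
#include <stdint.h>

/*
 * Rescale a fixed-point nanosecond value from one shift to another.
 * Hypothetical helper; both shifts are passed in by the caller.
 */
static int64_t rescale_shifted_ns(int64_t value, int from_shift, int to_shift)
{
    if (to_shift >= from_shift)
        return value * ((int64_t)1 << (to_shift - from_shift));

    /* Narrowing drops low-order bits (arithmetic shift assumed). */
    return value >> (from_shift - to_shift);
}
```

Widening is exact; narrowing loses the bits below the new precision, which
is why the caller chooses the precision it needs.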
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify the update_wall_time function so it increments time using the
clocksource abstraction instead of jiffies. Since the only clocksource driver
currently provided is the jiffies clocksource, this should result in no
functional change. Additionally, timekeeping_init and timekeeping_resume
functions have been added to initialize and maintain some of the new
timekeeping state.
[hirofumi@mail.parknet.co.jp: fixlet]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces the clocksource management infrastructure. A clocksource is a
driver-like architecture generic abstraction of a free-running counter. This
code defines the clocksource structure, and provides management code for
registering, selecting, accessing and scaling clocksources.
Additionally, this includes the trivial jiffies clocksource, a lowest common
denominator clocksource, provided mainly for use as an example.
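To make the idea of a free-running counter with scaling concrete, here is
a minimal userspace sketch. The struct and helper names are hypothetical
and cut down for illustration; the fake 24-bit counter stands in for real
hardware.

```c
#include <stdint.h>

/* Hypothetical, cut-down clocksource; field names are illustrative. */
struct clocksource_sketch {
    const char *name;
    uint64_t  (*read)(void);  /* returns the free-running counter */
    uint64_t    mask;         /* valid bits of the counter */
    uint32_t    mult;         /* cycles -> ns multiplier */
    uint32_t    shift;        /* cycles -> ns divisor, as a power of two */
};

/* Convert a cycle delta to nanoseconds: (cycles * mult) >> shift.
 * Large deltas can overflow 64 bits; callers must keep deltas small. */
static uint64_t cs_cyc2ns(const struct clocksource_sketch *cs, uint64_t cycles)
{
    return (cycles * cs->mult) >> cs->shift;
}

/* Cycles elapsed since 'last', tolerating counter wrap via the mask. */
static uint64_t cs_delta(const struct clocksource_sketch *cs, uint64_t last)
{
    return (cs->read() - last) & cs->mask;
}

/* Stand-in counter for the example: a fake 24-bit counter. */
static uint64_t fake_counter_value;
static uint64_t fake_read(void) { return fake_counter_value; }
```

With a 1 MHz counter (1000 ns per cycle), mult = 1000 << 10 and shift = 10
give an exact conversion, and the mask lets a delta be computed across a
wrap of the 24-bit counter.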
[hirofumi@mail.parknet.co.jp: Don't enable IRQ too early]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert kernel/cpu.c from semaphore to mutex.
I've reviewed all lock_cpu_hotplug() critical sections, and they all seem to
fit mutex semantics.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It seems ppc64 wants to lock mutexes in early bootup code, with interrupts
disabled, and they expect interrupts to stay disabled, else they crash.
Work around this bug by making mutex debugging variants save/restore irq
flags.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commits
3e3318dee0 [PATCH] swsusp: x86_64 mark special saveable/unsaveable pages
b6370d96e0 [PATCH] swsusp: i386 mark special saveable/unsaveable pages
ce4ab0012b [PATCH] swsusp: add architecture special saveable pages support
because not only do they apparently cause page faults on x86, the
infrastructure doesn't compile on powerpc.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Not that x86-64 and other architecture support should be difficult to
add (trivial fixups to the data format and add the proper linker script
entry).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the current 2.6.17 implementation, the signal_struct referred to from
task_struct is used as a per-process data structure. The pacct facility also
uses it as a per-process data structure to store stime, utime, minflt and
majflt. But those members are saved in __exit_signal(), which is too late.
For example, if several threads exit at the same time, the pacct facility may
drop the accounting for some of those threads. (See the following 'The
results of original 2.6.17 kernel'.) I think accounting information should be
completely collected into the per-process data structure before an accounting
record is written out.
This patch fixes the problem. Accumulation of stime, utime, minflt and majflt
is done before the accounting record is generated.
[mingo@elte.hu: fix acct_collect() siglock bug found by lockdep]
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When the pacct facility generates the 'ac_flag' field of an accounting record,
it refers to the task_struct of the thread which died last in the process. Any
other task_structs are ignored.
Therefore, the pacct facility drops the ASU flag even if root-privilege
operations were used by any thread other than the last one. In addition, the
AFORK flag is always set when the group-leader thread didn't die last, even
though the process has called execve() after fork().
The same problem exists with ac_exitcode. The recorded ac_exitcode is the exit
code of the last thread in the process, which is not necessarily the group
leader's.
The pacct facility needs an I/O operation when an accounting record is
generated. This can wake up the OOM killer. If the OOM killer is activated, it
kills some processes to make them release their memory regions.
But acct_process() is called in the killed process's context before
exit_mm(), so those processes cannot release their own memory. As a result,
the processes stop at this point and the system eventually stalls.
Move kthread API kernel-doc from kthread.h to kthread.c & fix it.
Add kthread API to kernel-api DocBook.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix kernel-doc formatting in ktime.h and hrtimer.[ch] files.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a cpu hotplug callback fails on CPU_UP_PREPARE, all callbacks will be
called with CPU_UP_CANCELED. A few of these callbacks assume that on
CPU_UP_PREPARE a pointer to task has been stored in a percpu array. This
assumption is not true if CPU_UP_PREPARE fails and the following calls to
kthread_bind() in CPU_UP_CANCELED will cause an addressing exception
because of passing a NULL pointer.
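The failure mode and the kind of guard that avoids it can be mocked up in
userspace. Everything in this sketch is a stand-in: the enum, the
per_cpu_task[] array, bind_task() and hotplug_callback() are hypothetical
names, not the kernel's notifier API.

```c
#include <stddef.h>

enum hotplug_action { UP_PREPARE, UP_CANCELED };  /* stand-ins */

struct task { int bound_cpu; };

#define MAX_CPUS 4
static struct task *per_cpu_task[MAX_CPUS];

/* Stand-in for kthread_bind(); it dereferences its argument. */
static void bind_task(struct task *t, int cpu) { t->bound_cpu = cpu; }

/* Returns 1 if a task was cleaned up, 0 if there was nothing to do. */
static int hotplug_callback(enum hotplug_action action, int cpu)
{
    switch (action) {
    case UP_CANCELED:
        /* UP_PREPARE may have failed before this slot was filled. */
        if (!per_cpu_task[cpu])
            return 0;
        bind_task(per_cpu_task[cpu], 0);  /* move off the failed CPU */
        per_cpu_task[cpu] = NULL;
        return 1;
    default:
        return 0;
    }
}
```

Without the NULL check, the cancel path for a CPU whose prepare step never
ran would pass NULL to bind_task() and fault.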
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Update stop_machine.c to spawn stop_machine as kthreads rather than the
deprecated kernel_threads.
- Update stop_machine to use the more efficient kthread_bind() before
running task in place of set_cpus_allowed() after.
[akpm@osdl.org: remove now-wrong set_cpus_allowed()]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If futexes are disabled we fail to link on ppc64.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm testing glibc on MIPS64, little-endian, N32, O32 and N64 multilibs.
Among the NPTL test failures seen are some arising from sigsuspend problems
for N32: it blocks the wrong signals, so SIGCANCEL (SIGRTMIN) is blocked
despite glibc's carefully excluding it from sets of signals to block.
Specifically, testing suggests it blocks signal N^32 instead of signal N,
so (in the example tested) blocking SIGUSR1 (17) blocks signal 49 instead.
glibc's sigset_t uses an array of unsigned long, as does the kernel.
In both cases, signal N+1 is represented as
(1UL << (N % (8 * sizeof (unsigned long)))) in word number
(N / (8 * sizeof (unsigned long))).
Thus the N32 glibc uses an array of 32-bit words and the N64 kernel uses an
array of 64-bit words. For little-endian, the layout is the same, with
signals 1-32 in the first 4 bytes, signals 33-64 in the second, etc.; for
big-endian, userspace has that layout while in the kernel each 8 bytes have
the two halves swapped from the userspace layout.
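The word/bit formula quoted above can be checked with a short sketch.
sig_word() and sig_mask() are hypothetical helper names; the word size is
passed in bytes so the same code covers the 32-bit and 64-bit cases.

```c
#include <stddef.h>

/* Index of the sigset word holding signal 'sig' (1-based), given the size
 * in bytes of one word. Follows the formula above with N = sig - 1. */
static size_t sig_word(int sig, size_t word_bytes)
{
    return (size_t)(sig - 1) / (8 * word_bytes);
}

/* Bit mask for signal 'sig' within its word. */
static unsigned long long sig_mask(int sig, size_t word_bytes)
{
    return 1ULL << ((size_t)(sig - 1) % (8 * word_bytes));
}
```

With 4-byte words, signals 17 and 49 both use bit 16, in words 0 and 1
respectively. With 8-byte words, both live in word 0, at bits 16 and 48.
Swapping the two 32-bit halves therefore turns one into the other.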
The N32 sigsuspend syscall uses sigset_from_compat to convert the userspace
sigset to kernel format. If __COMPAT_ENDIAN_SWAP__ is *not* set, this uses
logic of the form
set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 )
to convert the userspace sigset to a kernel one. This looks correct to me
for both big and little endian, given that in userspace compat->sig[1] will
represent signals 33-64, and so will the high 32 bits of set->sig[0] in the
kernel. If however __COMPAT_ENDIAN_SWAP__ *is* set, as it is for
__MIPSEL__, it uses
set->sig[0] = compat->sig[1] | (((long)compat->sig[0]) << 32 );
which seems incorrect for both big and little endian, and would
explain the observed symptoms.
This code is the only use of __COMPAT_ENDIAN_SWAP__, so if incorrect
then that macro serves no purpose, in which case something like the
following patch would seem appropriate to remove it.
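The two conversion forms can be compared directly in a standalone sketch.
The struct and function names here are hypothetical, and uint64_t is used
in place of the (long) cast so the example behaves the same on any host.

```c
#include <stdint.h>

/* Userspace N32 layout: two 32-bit words covering signals 1-64. */
struct compat_sigset_sketch { uint32_t sig[2]; };

/* Conversion without the swap: the low word stays in the low half. */
static uint64_t convert_plain(const struct compat_sigset_sketch *c)
{
    return (uint64_t)c->sig[0] | ((uint64_t)c->sig[1] << 32);
}

/* Conversion with the swap, as in the __COMPAT_ENDIAN_SWAP__ branch. */
static uint64_t convert_swapped(const struct compat_sigset_sketch *c)
{
    return (uint64_t)c->sig[1] | ((uint64_t)c->sig[0] << 32);
}
```

Feeding in a set that contains only signal 17 (bit 16 of sig[0]) yields
bit 16 from the plain form and bit 48, i.e. signal 49, from the swapped
form, which matches the symptom reported above.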
Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The table is empty, why does it still exist?
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, enabling/disabling printk timestamps is only possible through
reboot (bootparam) or recompile. I normally do not run with timestamps
(since syslog handles that in a good manner), but for measuring small
kernel delays (e.g. irq probing - see parport thread) I needed subsecond
precision, but then again, just for some minutes rather than all kernel
messages to come. The following patch adds a module_param() with which the
timestamps can be en-/disabled in a live system through
/sys/module/printk/parameters/printk_time.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
copy_process() appears to be the only caller of acct_clear_integrals() and
does not pass in NULL task pointers. Remove the unnecessary check.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
schedule_on_each_cpu() presently does a large kmalloc: 96 kbytes on a
1024-CPU 64-bit system.
Rework it so that we do one 8192-byte allocation and then a pile of tiny ones,
via alloc_percpu(). This has a much higher chance of success (100% in the
current VM).
This also has the effect of reducing the memory requirements from NR_CPUS*n to
num_possible_cpus()*n.
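The figures above are consistent with a 96-byte per-CPU item and 8-byte
pointers. Both sizes are inferred from the numbers rather than stated, so
treat them as assumptions. A trivial sketch of the arithmetic, using a
hypothetical helper name:

```c
#include <stddef.h>

/* Size of one contiguous array with an entry per CPU slot. */
static size_t flat_array_bytes(size_t nr_cpus, size_t entry_bytes)
{
    return nr_cpus * entry_bytes;
}
```

With 1024 CPU slots, 96-byte entries need 98304 bytes (96 kbytes) in one
piece, whereas an array of 8-byte pointers needs only 8192 bytes, with the
items themselves allocated separately per CPU.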
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- proper prototypes for the following functions:
- ctrl_alt_del() (in include/linux/reboot.h)
- getrusage() (in include/linux/resource.h)
- make the following needlessly global functions static:
- kernel_restart_prepare()
- kernel_kexec()
[akpm@osdl.org: compile fix]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently printk is no use for early debugging because it refuses to
actually print anything to the console unless
cpu_online(smp_processor_id()) is true.
The stated explanation is that console drivers may require per-cpu
resources, or otherwise barf, because the system is not yet set up
correctly. Fair enough.
However some console drivers might be quite happy running early during
boot, in fact we have one, and so it'd be nice if printk understood that.
So I added a flag (which I would have called CON_BOOT, but that's taken)
called CON_ANYTIME, which indicates that a console is happy to be called
anytime, even if the cpu is not yet online.
Tested on a Power 5 machine, with both a CON_ANYTIME driver and a bogus
console driver that BUG()s if called while offline. No problems AFAICT.
Built for i386 UP & SMP.
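The decision this flag adds can be sketched as a small predicate. The flag
values, the struct and console_may_print() are illustrative stand-ins, not
the definitions used by the patch.

```c
/* Illustrative flag values; the real ones live in the console header. */
#define SK_CON_ENABLED  (1u << 0)
#define SK_CON_ANYTIME  (1u << 1)

struct console_sketch { unsigned flags; };

/* May this console be called right now? */
static int console_may_print(const struct console_sketch *con, int cpu_is_online)
{
    if (!(con->flags & SK_CON_ENABLED))
        return 0;

    /* Either the CPU is online, or the driver opted in to early calls. */
    return cpu_is_online || (con->flags & SK_CON_ANYTIME) != 0;
}
```

A driver that sets the anytime flag is called even while the CPU is still
offline; every other driver keeps the existing behaviour.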
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since raw_notifier chains don't benefit from any centralized locking
protections, they shouldn't suffer from the associated limitations. Under
some circumstances it might make sense for a raw_notifier callout routine
to unregister itself from the notifier chain. This patch (as678) changes
the notifier core to allow for such things.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.
This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.
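The shape of such a macro can be shown with a userspace mock. All sk_
names are hypothetical stand-ins for the kernel facilities, and
SK_ARCH_HAS_FAST_PERCPU is an invented switch used only to show the two
branches.

```c
#define SK_NR_CPUS 4

/* Stand-ins for kernel facilities; all names here are hypothetical. */
static int sk_current_cpu;                       /* pretend CPU id */
#define sk_raw_smp_processor_id()  (sk_current_cpu)
#define sk_per_cpu(var, cpu)       ((var)[(cpu)])

#ifdef SK_ARCH_HAS_FAST_PERCPU
/* An optimized arch would map straight to its fast accessor. */
#define sk_raw_get_cpu_var(var)    sk_fast_get_cpu_var(var)
#else
/* Generic fallback: index by the unchecked CPU id. */
#define sk_raw_get_cpu_var(var)    sk_per_cpu(var, sk_raw_smp_processor_id())
#endif

/* Example per-CPU variable, modelled as a plain array. */
static long sk_counter[SK_NR_CPUS];
```

Callers write sk_raw_get_cpu_var(sk_counter)++ and get whichever expansion
suits the architecture, without spelling out the CPU lookup themselves.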
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If CONFIG_KALLSYMS is defined and if it should happen that is_exported() is
given a NULL 'mod' and lookup_symbol(name, __start___ksymtab,
__stop___ksymtab) returns 0, then we'll end up dereferencing a NULL
pointer.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Considering that there isn't a lot of hw we can depend on during resume,
this is about as good as it gets.
This is x86-only for now, although the basic concept (and most of the
code) will certainly work on almost any platform.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Correct the return type of handle_IRQ_event() (inconsistency noticed during
Xen development), and remove redundant declarations. The return type
adjustment required breaking out the definition of irqreturn_t into a
separate header, in order to satisfy current include order dependencies.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
list_splice_init(list, head) does unneeded work if it is known that
list_empty(head) == 1. We can use list_replace_init() instead.
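A self-contained sketch of the two operations on a minimal doubly linked
list shows why the replacement is cheaper. The helpers are simplified
userspace re-creations written for this example, not copies of list.h.

```c
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Move everything from 'list' to the front of 'head', then empty 'list'. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
    if (!list_empty(list)) {
        struct list_head *first = list->next, *last = list->prev;
        struct list_head *at = head->next;

        first->prev = head;
        head->next = first;
        last->next = at;
        at->prev = last;
        init_list_head(list);
    }
}

/* Let 'repl' take over the entries of 'old', then empty 'old'.
 * Previous contents of 'repl' are ignored, so 'repl' must be empty. */
static void list_replace_init(struct list_head *old, struct list_head *repl)
{
    repl->next = old->next;
    repl->next->prev = repl;
    repl->prev = old->prev;
    repl->prev->next = repl;
    init_list_head(old);
}
```

The splice has to test the source for emptiness and stitch around the
destination's existing entries. When the destination is known to be empty,
the replace only rewrites four pointers and reinitializes the source.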
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove synchronize_kernel() (deprecated 2-APR-2005 in
http://lkml.org/lkml/2005/4/3/11) and make the RCU API inaccessible to
non-GPL Linux kernel modules (as was announced more than one year ago in
http://lkml.org/lkml/2005/4/3/8). Tested on x86 and ppc64.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kernel/sys.c doesn't have anything in it relying on linux/init.h -
remove the include.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If you get to that point in the code, desc->move_irq is set, so
pending_irq_cpumask[irq] and cpu_online_map should both be non-empty. There
is a pretty good chance that ANDing the two will still leave a non-empty
mask. So these two branch prediction hints should be inverted.
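As an illustration of the hint direction argued for here, a userspace
sketch using the GCC/Clang builtin follows. The sk_ macros have the same
shape as likely()/unlikely(); pick_target_cpus() and the plain bitmap
masks are hypothetical simplifications.

```c
/* GCC/Clang branch hints, in the same shape as likely()/unlikely(). */
#define sk_likely(x)   __builtin_expect(!!(x), 1)
#define sk_unlikely(x) __builtin_expect(!!(x), 0)

/* Returns the CPUs an IRQ can actually move to, or 0 if there are none.
 * Masks are plain bitmaps here; the names are hypothetical. */
static unsigned long pick_target_cpus(unsigned long pending, unsigned long online)
{
    unsigned long tmp = pending & online;

    /* An empty intersection is the rare case, so hint it as unlikely. */
    if (sk_unlikely(tmp == 0))
        return 0;
    return tmp;
}
```

The hint only affects how the compiler lays out the branches; the result
is the same either way.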
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the __might_sleep() check back to cond_resched().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set errorp in dup_fd(); it will also be used by sys_unshare().
Signed-off-by: Prasanna Meda <mlp@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
exit_aio() and exit_mmap() can sleep. But it's easy to accidentally call
mmput() from inside locks.
Cc: Dave Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create two files in /sys/kernel, kexec_loaded and kexec_crash_loaded. Each
file contains a simple boolean value indicating whether the relevant kernel
has been loaded into memory. The motivation for this is geared around
support.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make swsusp allocate only as much memory as needed to store the image data
and metadata during resume.
Without this patch swsusp additionally allocates many page frames that will
conflict with the "original" locations of the image data and are considered
as "unsafe", treating them as "eaten" pages (ie. allocated but unusable).
The patch makes swsusp allocate as many pages as it'll need to store the
data read from the image in one shot, creating a list of allocated "safe"
pages, and use the observation that all pages allocated by it are marked
with the PG_nosave and PG_nosave_free flags set. Namely, when it's about
to load an image page, swsusp can check whether the page frame
corresponding to the "original" location of this page has been allocated
(ie. if the page frame has the PG_nosave and PG_nosave_free flags set) and
if so, it can load the page directly into this page frame. Otherwise it
uses an allocated "safe" page from the list to store the data that will be
copied to their "original" location later on.
This allows us to save many page copyings and page allocations during
resume and in the future it may allow us to load images greater than 50% of
the normal zone.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: "Pavel Machek" <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- make needlessly global functions static
- make dummy functions static inline
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
swsusp allocates memory from the normal zone, so it cannot use lowmem
reserve pages from the lower zones. Therefore it should not count these
pages as available to it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1. Add architecture-specific page save/restore support. The next two patches
will use this to save/restore 'ACPI NVS' pages.
2. Allow reserved pages to be marked 'nosave'. This could avoid saving and
restoring BIOS reserved pages.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>