Commit Graph

1459 Commits

Author SHA1 Message Date
Linus Torvalds
179475a3b4 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, sparseirq: clean up Kconfig entry
  x86: turn CONFIG_SPARSE_IRQ off by default
  sparseirq: fix numa_migrate_irq_desc dependency and comments
  sparseirq: add kernel-doc notation for new member in irq_desc, -v2
  locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP
  sparseirq, xen: make sure irq_desc is allocated for interrupts
  sparseirq: fix !SMP building, #2
  x86, sparseirq: move irq_desc according to smp_affinity, v7
  proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
  sparse irqs: add irqnr.h to the user headers list
  sparse irqs: handle !GENIRQ platforms
  sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
  sparseirq: fix Alpha build failure
  sparseirq: fix typo in !CONFIG_IO_APIC case
  x86, MSI: pass irq_cfg and irq_desc
  x86: MSI start irq numbering from nr_irqs_gsi
  x86: use NR_IRQS_LEGACY
  sparse irq_desc[] array: core kernel and x86 changes
  genirq: record IRQ_LEVEL in irq_desc[]
  irq.h: remove padding from irq_desc on 64bits
2008-12-30 16:20:19 -08:00
Linus Torvalds
5f34fe1cfc Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  stacktrace: provide save_stack_trace_tsk() weak alias
  rcu: provide RCU options on non-preempt architectures too
  printk: fix discarding message when recursion_bug
  futex: clean up futex_(un)lock_pi fault handling
  "Tree RCU": scalable classic RCU implementation
  futex: rename field in futex_q to clarify single waiter semantics
  x86/swiotlb: add default swiotlb_arch_range_needs_mapping
  x86/swiotlb: add default phys<->bus conversion
  x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
  x86: add swiotlb allocation functions
  swiotlb: consolidate swiotlb info message printing
  swiotlb: support bouncing of HighMem pages
  swiotlb: factor out copy to/from device
  swiotlb: add arch hook to force mapping
  swiotlb: allow architectures to override phys<->bus<->phys conversions
  swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
  rcu: fix rcutorture behavior during reboot
  resources: skip sanity check of busy resources
  swiotlb: move some definitions to header
  swiotlb: allow architectures to override swiotlb pool allocation
  ...

Fix up trivial conflicts in
  arch/x86/kernel/Makefile
  arch/x86/mm/init_32.c
  include/linux/hardirq.h
as per Ingo's suggestions.
2008-12-30 16:10:19 -08:00
Frederic Weisbecker
36994e58a4 tracing/kmemtrace: normalize the raw tracer event to the unified tracing API
Impact: new tracer plugin

This patch adapts kmemtrace raw events tracing to the unified tracing API.

To enable and use this tracer, just do the following:

 echo kmemtrace > /debugfs/tracing/current_tracer
 cat /debugfs/tracing/trace

You will have the following output:

 # tracer: kmemtrace
 #
 #
 # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
 # FREE   |      |     |       |              |   |            |        |
 # |

type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584

That was to stay backward compatible with the format output produced in
inux/tracepoint.h.

This is the default ouput, but note that I tried something else.

If you change an option:

echo kmem_minimalistic > /debugfs/trace_options

and then cat /debugfs/trace, you will have the following output:

 # tracer: kmemtrace
 #
 #
 # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
 # FREE   |      |     |       |              |   |            |        |
 # |

   -      C                            0xffff88007c088780          file_free_rcu
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
   +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   -      C                            0xffff88007cad6000          putname
   +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
   +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
   +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
   +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
   +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp

Yeah I shall confess kmem_minimalistic should be: kmem_alternative.

Whatever, I find it more readable but this a personal opinion of course.
We can drop it if you want.

On the ALLOC/FREE column, + means an allocation and - a free.

On the type column, you have K = kmalloc, C = cache, P = page

I would like the flags to be GFP_* strings but that would not be easy to not
break the column with strings....

About the node...it seems to always be -1. I don't know why but that shouldn't
be difficult to find.

I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would
be more easy to find the tracer headers if they are all in their common
directory.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-30 09:36:13 +01:00
Rusty Russell
33edcf133b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-12-30 08:02:35 +10:30
Ingo Molnar
2ff9f9d962 Merge branch 'topic/kmemtrace' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 into tracing/kmemtrace 2008-12-29 15:16:24 +01:00
Eduard - Gabriel Munteanu
b9ce08c010 kmemtrace: Core implementation.
kmemtrace provides tracing for slab allocator functions, such as kmalloc,
kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed
to the userspace application in order to analyse allocation hotspots,
internal fragmentation and so on, making it possible to see how well an
allocator performs, as well as debug and profile kernel code.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-12-29 15:34:01 +02:00
Yinghai Lu
43a256322a sparseirq: move __weak symbols into separate compilation unit
GCC has a bug with __weak alias functions: if the functions are in
the same compilation unit as their call site, GCC can decide to
inline them - and thus rob the linker of the opportunity to override
the weak alias with the real thing.

So move all the IRQ handling related __weak symbols to kernel/irq/chip.c.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 12:15:49 +01:00
Ingo Molnar
e1df957670 Merge branch 'linus' into perfcounters/core
Conflicts:
	fs/exec.c
	include/linux/init_task.h

Simple context conflicts.
2008-12-29 09:45:15 +01:00
Linus Torvalds
96faec945f Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
  allow stripping of generated symbols under CONFIG_KALLSYMS_ALL
  kbuild: strip generated symbols from *.ko
  kbuild: simplify use of genksyms
  kernel-doc: check for extra kernel-doc notations
  kbuild: add headerdep used to detect inclusion cycles in header files
  kbuild: fix string equality testing in tags.sh
  kbuild: fix make tags/cscope
  kbuild: fix make incompatibility
  kbuild: remove TAR_IGNORE
  setlocalversion: add git-svn support
  setlocalversion: print correct subversion revision
  scripts: improve the decodecode script
  scripts/package: allow custom options to rpm
  genksyms: allow to ignore symbol checksum changes
  genksyms: track symbol checksum changes
  tags and cscope support really belongs in a shell script
  kconfig: fix options to check-lxdialog.sh
  kbuild: gen_init_cpio expands shell variables in file names
  remove bashisms from scripts/extract-ikconfig
  kbuild: teach mkmakfile to be silent
  ...
2008-12-28 15:13:48 -08:00
Linus Torvalds
b0f4b285d7 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
  sched, trace: update trace_sched_wakeup()
  tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
  Revert "x86: disable X86_PTRACE_BTS"
  ring-buffer: prevent false positive warning
  ring-buffer: fix dangling commit race
  ftrace: enable format arguments checking
  x86, bts: memory accounting
  x86, bts: add fork and exit handling
  ftrace: introduce tracing_reset_online_cpus() helper
  tracing: fix warnings in kernel/trace/trace_sched_switch.c
  tracing: fix warning in kernel/trace/trace.c
  tracing/ring-buffer: remove unused ring_buffer size
  trace: fix task state printout
  ftrace: add not to regex on filtering functions
  trace: better use of stack_trace_enabled for boot up code
  trace: add a way to enable or disable the stack tracer
  x86: entry_64 - introduce FTRACE_ frame macro v2
  tracing/ftrace: add the printk-msg-only option
  tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
  x86, bts: correctly report invalid bts records
  ...

Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.
2008-12-28 12:21:10 -08:00
Yinghai Lu
13a0c3c269 sparseirq: work around compiler optimizing away __weak functions
Impact: fix panic on null pointer with sparseirq

Some GCC versions seem to inline the weak global function,
when that function is empty.

Work it around, by making the functions return a (dummy) integer.

Signed-off-by: Yinghai <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-27 13:24:00 +01:00
Ingo Molnar
12d79bafb7 rcu: provide RCU options on non-preempt architectures too
Impact: build fix

Some old architectures still do not use kernel/Kconfig.preempt, so the
moving of the RCU options there broke their build:

 In file included from /home/mingo/tip/include/linux/sem.h:81,
                 from /home/mingo/tip/include/linux/sched.h:69,
                 from /home/mingo/tip/arch/alpha/kernel/asm-offsets.c:9:
 /home/mingo/tip/include/linux/rcupdate.h:62:2: error: #error "Unknown RCU implementation specified to kernel configuration"

Move these options back to init/Kconfig, which every architecture
includes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-25 09:31:28 +01:00
Jan Beulich
9bb482476c allow stripping of generated symbols under CONFIG_KALLSYMS_ALL
Building upon parts of the module stripping patch, this patch
introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y.
Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of
CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64)
kernels I tested with.

The patch also does away with the need to special case the kallsyms-
internal symbols by making them available even in the first linking
stage.

While it is a generated file, the patch includes the changes to
scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure
here is.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-12-19 22:47:10 +01:00
Paul E. McKenney
64db4cfff9 "Tree RCU": scalable classic RCU implementation
This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs.  Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.

This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion.  This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.

Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.

Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

o	Fixes from remainder of line-by-line code walkthrough,
	including comment spelling, initialization, undesirable
	narrowing due to type conversion, removing redundant memory
	barriers, removing redundant local-variable initialization,
	and removing redundant local variables.

	I do not believe that any of these fixes address the CPU-hotplug
	issues that Andi Kleen was seeing, but please do give it a whirl
	in case the machine is smarter than I am.

	A writeup from the walkthrough may be found at the following
	URL, in case you are suffering from terminal insomnia or
	masochism:

	http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

o	Made rcutree tracing use seq_file, as suggested some time
	ago by Lai Jiangshan.

o	Added a .csv variant of the rcudata debugfs trace file, to allow
	people having thousands of CPUs to drop the data into
	a spreadsheet.	Tested with oocalc and gnumeric.  Updated
	documentation to suit.

Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

o	Fix a theoretical race between grace-period initialization and
	force_quiescent_state() that could occur if more than three
	jiffies were required to carry out the grace-period
	initialization.  Which it might, if you had enough CPUs.

o	Apply Ingo's printk-standardization patch.

o	Substitute local variables for repeated accesses to global
	variables.

o	Fix comment misspellings and redundant (but harmless) increments
	of ->n_rcu_pending (this latter after having explicitly added it).

o	Apply checkpatch fixes.

Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

o	Fixed a number of problems noted by Gautham Shenoy, including
	the cpu-stall-detection bug that he was having difficulty
	convincing me was real.  ;-)

o	Changed cpu-stall detection to wait for ten seconds rather than
	three in order to reduce false positive, as suggested by Ingo
	Molnar.

o	Produced a design document (http://lwn.net/Articles/305782/).
	The act of writing this document uncovered a number of both
	theoretical and "here and now" bugs as noted below.

o	Fix dynticks_nesting accounting confusion, simplify WARN_ON()
	condition, fix kerneldoc comments, and add memory barriers
	in dynticks interface functions.

o	Add more data to tracing.

o	Remove unused "rcu_barrier" field from rcu_data structure.

o	Count calls to rcu_pending() from scheduling-clock interrupt
	to use as a surrogate timebase should jiffies stop counting.

o	Fix a theoretical race between force_quiescent_state() and
	grace-period initialization.  Yes, initialization does have to
	go on for some jiffies for this race to occur, but given enough
	CPUs...

Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

o	Fix a number of checkpatch.pl complaints.

o	Apply review comments from Ingo Molnar and Lai Jiangshan
	on the stall-detection code.

o	Fix several bugs in !CONFIG_SMP builds.

o	Fix a misspelled config-parameter name so that RCU now announces
	at boot time if stall detection is configured.

o	Run tests on numerous combinations of configurations parameters,
	which after the fixes above, now build and run correctly.

Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

o	Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
	changeset some time ago, and finally got around to retesting
	this option).

o	Fix some tracing bugs in rcupreempt that caused incorrect
	totals to be printed.

o	I now test with a more brutal random-selection online/offline
	script (attached).  Probably more brutal than it needs to be
	on the people reading it as well, but so it goes.

o	A number of optimizations and usability improvements:

	o	Make rcu_pending() ignore the grace-period timeout when
		there is no grace period in progress.

	o	Make force_quiescent_state() avoid going for a global
		lock in the case where there is no grace period in
		progress.

	o	Rearrange struct fields to improve struct layout.

	o	Make call_rcu() initiate a grace period if RCU was
		idle, rather than waiting for the next scheduling
		clock interrupt.

	o	Invoke rcu_irq_enter() and rcu_irq_exit() only when
		idle, as suggested by Andi Kleen.  I still don't
		completely trust this change, and might back it out.

	o	Make CONFIG_RCU_TRACE be the single config variable
		manipulated for all forms of RCU, instead of the prior
		confusion.

	o	Document tracing files and formats for both rcupreempt
		and rcutree.

Updates from v4 for those missing v5 given its bad subject line:

o	Separated dynticks interface so that NMIs and irqs call separate
	functions, greatly simplifying it.  In particular, this code
	no longer requires a proof of correctness.  ;-)

o	Separated dynticks state out into its own per-CPU structure,
	avoiding the duplicated accounting.

o	The case where a dynticks-idle CPU runs an irq handler that
	invokes call_rcu() is now correctly handled, forcing that CPU
	out of dynticks-idle mode.

o	Review comments have been applied (thank you all!!!).
	For but one example, fixed the dynticks-ordering issue that
	Manfred pointed out, saving me much debugging.  ;-)

o	Adjusted rcuclassic and rcupreempt to handle dynticks changes.

Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel.  It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space.  That said,
I have already gratefully stolen quite a few of Manfred's ideas.

This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy.  Defaults to 32 on 32-bit machines and 64 on
64-bit machines.  If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy.  By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware.  Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems.  I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future.  (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)

In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors.  This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.

Some shortcomings:

o	More bugs will probably surface as a result of an ongoing
	line-by-line code inspection.

	Patches will be provided as required.

o	There are probably hangs, rcutorture failures, &c.  Seems
	quite stable on a 128-CPU machine, but that is kind of small
	compared to 4096 CPUs.  However, seems to do better than
	mainline.

	Patches will be provided as required.

o	The memory footprint of this version is several KB larger
	than rcuclassic.

	A separate UP-only rcutiny patch will be provided, which will
	reduce the memory footprint significantly, even compared
	to the old rcuclassic.  One such patch passes light testing,
	and has a memory footprint smaller even than rcuclassic.
	Initial reaction from various embedded guys was "it is not
	worth it", so am putting it aside.

Credits:

o	Manfred Spraul for ideas, review comments, and bugs spotted,
	as well as some good friendly competition.  ;-)

o	Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
	Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
	for reviews and comments.

o	Thomas Gleixner for much-needed help with some timer issues
	(see patches below).

o	Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
	Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
	Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
	alive despite my heavy abuse^Wtesting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 21:56:04 +01:00
Rusty Russell
968ea6d80e Merge ../linux-2.6-x86
Conflicts:

	arch/x86/kernel/io_apic.c
	kernel/sched.c
	kernel/sched_stats.h
2008-12-13 21:55:51 +10:30
Rusty Russell
98a79d6a50 cpumask: centralize cpu_online_map and cpu_possible_map
Impact: cleanup

Each SMP arch defines these themselves.  Move them to a central
location.

Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
   CONFIG_INIT_ALL_POSSIBLE for this rather than break them.

2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
   Those archs simply have phys_cpu_present_map replaced everywhere.

3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
   so I just manipulate them both in sync.

4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
   declarations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
2008-12-13 21:19:41 +10:30
Ingo Molnar
8299608f14 Merge branches 'irq/sparseirq', 'x86/quirks' and 'x86/reboot' into cpus4096
We merge the irq/sparseirq, x86/quirks and x86/reboot trees into the
cpus4096 tree because the io-apic changes in the sparseirq change
conflict with the cpumask changes in the cpumask tree, and we
want to resolve those.
2008-12-12 13:49:24 +01:00
Ingo Molnar
4c59e4676d perfcounters: select ANON_INODES
The perfcounters subsystem depends on CONFIG_ANON_INODES facilities,
so make sure it's selected.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 19:38:33 +01:00
Thomas Gleixner
0793a61d4d performance counters: core code
Implement the core kernel bits of Performance Counters subsystem.

The Linux Performance Counter subsystem provides an abstraction of
performance counter hardware capabilities. It provides per task and per
CPU counters, and it provides event capabilities on top of those.

Performance counters are accessed via special file descriptors.
There's one file descriptor per virtual counter used.

The special file descriptor is opened via the perf_counter_open()
system call:

 int
 perf_counter_open(u32 hw_event_type,
                   u32 hw_event_period,
                   u32 record_type,
                   pid_t pid,
                   int cpu);

The syscall returns the new fd. The fd can be used via the normal
VFS system calls: read() can be used to read the counter, fcntl()
can be used to set the blocking mode, etc.

Multiple counters can be kept open at a time, and the counters
can be poll()ed.

See more details in Documentation/perf-counters.txt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:47:03 +01:00
Yinghai Lu
0b8f1efad3 sparse irq_desc[] array: core kernel and x86 changes
Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:51 +01:00
James Morris
ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Will Newton
1d926f2756 init/main.c: use ktime accessor function in initcall_debug code
Impact: fix initcall debug output on non-scalar ktime platforms (32-bit embedded)

The initcall_debug code access the tv64 member of ktime.  This won't work
correctly for large deltas on platforms that don't use the scalar ktime
implementation.

Signed-off-by: Will Newton <will.newton@gmail.com>
Acked-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:10:15 +01:00
Ingo Molnar
9676e73a9e Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core
Conflicts:
	kernel/trace/ftrace.c

[ We conflicted here because we backported a few fixes to
  tracing/urgent - which has different internal APIs. ]
2008-11-19 10:04:25 +01:00
Linus Torvalds
8c60bfb066 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpuset: fix regression when failed to generate sched domains
  sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers
  sched: fix kernel warning on /proc/sched_debug access
  sched: correct sched-rt-group.txt pathname in init/Kconfig
2008-11-18 08:06:21 -08:00
Mathieu Desnoyers
c1df1bd2c4 markers: auto enable tracepoints (new API : trace_mark_tp())
Impact: new API

Add a new API trace_mark_tp(), which declares a marker within a
tracepoint probe. When the marker is activated, the tracepoint is
automatically enabled.

No branch test is used at the marker site, because it would be a
duplicate of the branch already present in the tracepoint.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 09:01:29 +01:00
James Morris
2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells
d84f4f992c CRED: Inaugurate COW credentials
Inaugurate copy-on-write credentials management.  This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.

A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().

With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:

	struct cred *new = prepare_creds();
	int ret = blah(new);
	if (ret < 0) {
		abort_creds(new);
		return ret;
	}
	return commit_creds(new);

There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.

To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const.  The purpose of this is compile-time
discouragement of altering credentials through those pointers.  Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:

  (1) Its reference count may incremented and decremented.

  (2) The keyrings to which it points may be modified, but not replaced.

The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

 (1) execve().

     This now prepares and commits credentials in various places in the
     security code rather than altering the current creds directly.

 (2) Temporary credential overrides.

     do_coredump() and sys_faccessat() now prepare their own credentials and
     temporarily override the ones currently on the acting thread, whilst
     preventing interference from other threads by holding cred_replace_mutex
     on the thread being dumped.

     This will be replaced in a future patch by something that hands down the
     credentials directly to the functions being called, rather than altering
     the task's objective credentials.

 (3) LSM interface.

     A number of functions have been changed, added or removed:

     (*) security_capset_check(), ->capset_check()
     (*) security_capset_set(), ->capset_set()

     	 Removed in favour of security_capset().

     (*) security_capset(), ->capset()

     	 New.  This is passed a pointer to the new creds, a pointer to the old
     	 creds and the proposed capability sets.  It should fill in the new
     	 creds or return an error.  All pointers, barring the pointer to the
     	 new creds, are now const.

     (*) security_bprm_apply_creds(), ->bprm_apply_creds()

     	 Changed; now returns a value, which will cause the process to be
     	 killed if it's an error.

     (*) security_task_alloc(), ->task_alloc_security()

     	 Removed in favour of security_prepare_creds().

     (*) security_cred_free(), ->cred_free()

     	 New.  Free security data attached to cred->security.

     (*) security_prepare_creds(), ->cred_prepare()

     	 New. Duplicate any security data attached to cred->security.

     (*) security_commit_creds(), ->cred_commit()

     	 New. Apply any security effects for the upcoming installation of new
     	 security by commit_creds().

     (*) security_task_post_setuid(), ->task_post_setuid()

     	 Removed in favour of security_task_fix_setuid().

     (*) security_task_fix_setuid(), ->task_fix_setuid()

     	 Fix up the proposed new credentials for setuid().  This is used by
     	 cap_set_fix_setuid() to implicitly adjust capabilities in line with
     	 setuid() changes.  Changes are made to the new credentials, rather
     	 than the task itself as in security_task_post_setuid().

     (*) security_task_reparent_to_init(), ->task_reparent_to_init()

     	 Removed.  Instead the task being reparented to init is referred
     	 directly to init's credentials.

	 NOTE!  This results in the loss of some state: SELinux's osid no
	 longer records the sid of the thread that forked it.

     (*) security_key_alloc(), ->key_alloc()
     (*) security_key_permission(), ->key_permission()

     	 Changed.  These now take cred pointers rather than task pointers to
     	 refer to the security context.

 (4) sys_capset().

     This has been simplified and uses less locking.  The LSM functions it
     calls have been merged.

 (5) reparent_to_kthreadd().

     This gives the current thread the same credentials as init by simply using
     commit_thread() to point that way.

 (6) __sigqueue_alloc() and switch_uid()

     __sigqueue_alloc() can't stop the target task from changing its creds
     beneath it, so this function gets a reference to the currently applicable
     user_struct which it then passes into the sigqueue struct it returns if
     successful.

     switch_uid() is now called from commit_creds(), and possibly should be
     folded into that.  commit_creds() should take care of protecting
     __sigqueue_alloc().

 (7) [sg]et[ug]id() and co and [sg]et_current_groups.

     The set functions now all use prepare_creds(), commit_creds() and
     abort_creds() to build and check a new set of credentials before applying
     it.

     security_task_set[ug]id() is called inside the prepared section.  This
     guarantees that nothing else will affect the creds until we've finished.

     The calling of set_dumpable() has been moved into commit_creds().

     Much of the functionality of set_user() has been moved into
     commit_creds().

     The get functions all simply access the data directly.

 (8) security_task_prctl() and cap_task_prctl().

     security_task_prctl() has been modified to return -ENOSYS if it doesn't
     want to handle a function, or otherwise return the return value directly
     rather than through an argument.

     Additionally, cap_task_prctl() now prepares a new set of credentials, even
     if it doesn't end up using it.

 (9) Keyrings.

     A number of changes have been made to the keyrings code:

     (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
     	 all been dropped and built in to the credentials functions directly.
     	 They may want separating out again later.

     (b) key_alloc() and search_process_keyrings() now take a cred pointer
     	 rather than a task pointer to specify the security context.

     (c) copy_creds() gives a new thread within the same thread group a new
     	 thread keyring if its parent had one, otherwise it discards the thread
     	 keyring.

     (d) The authorisation key now points directly to the credentials to extend
     	 the search into rather pointing to the task that carries them.

     (e) Installing thread, process or session keyrings causes a new set of
     	 credentials to be created, even though it's not strictly necessary for
     	 process or session keyrings (they're shared).

(10) Usermode helper.

     The usermode helper code now carries a cred struct pointer in its
     subprocess_info struct instead of a new session keyring pointer.  This set
     of credentials is derived from init_cred and installed on the new process
     after it has been cloned.

     call_usermodehelper_setup() allocates the new credentials and
     call_usermodehelper_freeinfo() discards them if they haven't been used.  A
     special cred function (prepare_usermodeinfo_creds()) is provided
     specifically for call_usermodehelper_setup() to call.

     call_usermodehelper_setkeys() adjusts the credentials to sport the
     supplied keyring as the new session keyring.

(11) SELinux.

     SELinux has a number of changes, in addition to those to support the LSM
     interface changes mentioned above:

     (a) selinux_setprocattr() no longer does its check for whether the
     	 current ptracer can access processes with the new SID inside the lock
     	 that covers getting the ptracer's SID.  Whilst this lock ensures that
     	 the check is done with the ptracer pinned, the result is only valid
     	 until the lock is released, so there's no point doing it inside the
     	 lock.

(12) is_single_threaded().

     This function has been extracted from selinux_setprocattr() and put into
     a file of its own in the lib/ directory as join_session_keyring() now
     wants to use it too.

     The code in SELinux just checked to see whether a task shared mm_structs
     with other tasks (CLONE_VM), but that isn't good enough.  We really want
     to know if they're part of the same thread group (CLONE_THREAD).

(13) nfsd.

     The NFS server daemon now has to use the COW credentials to set the
     credentials it is going to use.  It really needs to pass the credentials
     down to the functions it calls, but it can't do that until other patches
     in this series have been applied.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:23 +11:00
Simon Arlott
02f5621042 Kconfig: SLUB is the default slab allocator
In 2007, a0acd82080 changed the default
slab allocator to SLUB, but the SLAB help text still says SLAB is the
default. This change fixes that.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-11-13 20:49:01 +02:00
Adrian Knoth
2fe401e386 sched: correct sched-rt-group.txt pathname in init/Kconfig
init/Kconfig directs the user to Documentation/sched-rt-group.txt, but
the file is actually in Documentation/scheduler/sched-rt-group.txt.

This patch corrects the pathname mentioned in init/Kconfig.

Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-13 09:32:29 +01:00
Frederic Weisbecker
7423907283 tracing/fastboot: Use the ring-buffer timestamp for initcall entries
Impact: Split the boot tracer entries in two parts: call and return

Now that we are using the sched tracer from the boot tracer, we want
to use the same timestamp than the ring-buffer to have consistent time
captures between sched events and initcall events.

So we get rid of the old time capture by the boot tracer and split the
initcall events in two parts: call and return. This way we have the
ring buffer timestamp of both.

An example trace:

[   27.904149584] calling  net_ns_init+0x0/0x1c0 @ 1
[   27.904429624] initcall net_ns_init+0x0/0x1c0 returned 0 after 0 msecs
[   27.904575926] calling  reboot_init+0x0/0x20 @ 1
[   27.904655399] initcall reboot_init+0x0/0x20 returned 0 after 0 msecs
[   27.904800228] calling  sysctl_init+0x0/0x30 @ 1
[   27.905142914] initcall sysctl_init+0x0/0x30 returned 0 after 0 msecs
[   27.905287211] calling  ksysfs_init+0x0/0xb0 @ 1
 ##### CPU 0 buffer started ####
            init-1     [000]    27.905395:      1:120:R   + [001]    11:115:S
 ##### CPU 1 buffer started ####
          <idle>-0     [001]    27.905425:      0:140:R ==> [001]    11:115:R
            init-1     [000]    27.905426:      1:120:D ==> [000]     0:140:R
          <idle>-0     [000]    27.905431:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.905451:      0:140:R ==> [000]     4:115:R
     ksoftirqd/0-4     [000]    27.905456:      4:115:S ==> [000]     0:140:R
           udevd-11    [001]    27.905458:     11:115:R   + [001]    14:115:R
          <idle>-0     [000]    27.905459:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.905462:      0:140:R ==> [000]     4:115:R
           udevd-11    [001]    27.905462:     11:115:R ==> [001]    14:115:R
     ksoftirqd/0-4     [000]    27.905467:      4:115:S ==> [000]     0:140:R
          <idle>-0     [000]    27.905470:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.905473:      0:140:R ==> [000]     4:115:R
     ksoftirqd/0-4     [000]    27.905476:      4:115:S ==> [000]     0:140:R
          <idle>-0     [000]    27.905479:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.905482:      0:140:R ==> [000]     4:115:R
     ksoftirqd/0-4     [000]    27.905486:      4:115:S ==> [000]     0:140:R
           udevd-14    [001]    27.905499:     14:120:X ==> [001]    11:115:R
           udevd-11    [001]    27.905506:     11:115:R   + [000]     1:120:D
          <idle>-0     [000]    27.905515:      0:140:R ==> [000]     1:120:R
           udevd-11    [001]    27.905517:     11:115:S ==> [001]     0:140:R
[   27.905557107] initcall ksysfs_init+0x0/0xb0 returned 0 after 3906 msecs
[   27.905705736] calling  init_jiffies_clocksource+0x0/0x10 @ 1
[   27.905779239] initcall init_jiffies_clocksource+0x0/0x10 returned 0 after 0 msecs
[   27.906769814] calling  pm_init+0x0/0x30 @ 1
[   27.906853627] initcall pm_init+0x0/0x30 returned 0 after 0 msecs
[   27.906997803] calling  pm_disk_init+0x0/0x20 @ 1
[   27.907076946] initcall pm_disk_init+0x0/0x20 returned 0 after 0 msecs
[   27.907222556] calling  swsusp_header_init+0x0/0x30 @ 1
[   27.907294325] initcall swsusp_header_init+0x0/0x30 returned 0 after 0 msecs
[   27.907439620] calling  stop_machine_init+0x0/0x50 @ 1
            init-1     [000]    27.907485:      1:120:R   + [000]     2:115:S
            init-1     [000]    27.907490:      1:120:D ==> [000]     2:115:R
        kthreadd-2     [000]    27.907507:      2:115:R   + [001]    15:115:R
          <idle>-0     [001]    27.907517:      0:140:R ==> [001]    15:115:R
        kthreadd-2     [000]    27.907517:      2:115:D ==> [000]     0:140:R
          <idle>-0     [000]    27.907521:      0:140:R   + [000]     4:115:S
          <idle>-0     [000]    27.907524:      0:140:R ==> [000]     4:115:R
           udevd-15    [001]    27.907527:     15:115:D   + [000]     2:115:D
     ksoftirqd/0-4     [000]    27.907537:      4:115:S ==> [000]     2:115:R
           udevd-15    [001]    27.907537:     15:115:D ==> [001]     0:140:R
        kthreadd-2     [000]    27.907546:      2:115:R   + [000]     1:120:D
        kthreadd-2     [000]    27.907550:      2:115:S ==> [000]     1:120:R
            init-1     [000]    27.907584:      1:120:R   + [000]    15:  0:D
            init-1     [000]    27.907589:      1:120:R   + [000]     2:115:S
            init-1     [000]    27.907593:      1:120:D ==> [000]    15:  0:R
           udevd-15    [000]    27.907601:     15:  0:S ==> [000]     2:115:R
 ##### CPU 0 buffer started ####
        kthreadd-2     [000]    27.907616:      2:115:R   + [001]    16:115:R
 ##### CPU 1 buffer started ####
          <idle>-0     [001]    27.907620:      0:140:R ==> [001]    16:115:R
        kthreadd-2     [000]    27.907621:      2:115:D ==> [000]     0:140:R
           udevd-16    [001]    27.907625:     16:115:D   + [000]     2:115:D
          <idle>-0     [000]    27.907628:      0:140:R   + [000]     4:115:S
           udevd-16    [001]    27.907629:     16:115:D ==> [001]     0:140:R
          <idle>-0     [000]    27.907631:      0:140:R ==> [000]     4:115:R
     ksoftirqd/0-4     [000]    27.907636:      4:115:S ==> [000]     2:115:R
        kthreadd-2     [000]    27.907644:      2:115:R   + [000]     1:120:D
        kthreadd-2     [000]    27.907647:      2:115:S ==> [000]     1:120:R
            init-1     [000]    27.907657:      1:120:R   + [001]    16:  0:D
          <idle>-0     [001]    27.907666:      0:140:R ==> [001]    16:  0:R
[   27.907703862] initcall stop_machine_init+0x0/0x50 returned 0 after 0 msecs
[   27.907850704] calling  filelock_init+0x0/0x30 @ 1
[   27.907926573] initcall filelock_init+0x0/0x30 returned 0 after 0 msecs
[   27.908071327] calling  init_script_binfmt+0x0/0x10 @ 1
[   27.908165195] initcall init_script_binfmt+0x0/0x10 returned 0 after 0 msecs
[   27.908309461] calling  init_elf_binfmt+0x0/0x10 @ 1

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 10:17:19 +01:00
Frederic Weisbecker
3f5ec13696 tracing/fastboot: move boot tracer structs and funcs into their own header.
Impact: Cleanups on the boot tracer and ftrace

This patch bring some cleanups about the boot tracer headers. The
functions and structures of this tracer have nothing related to ftrace
and should have so their own header file.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 10:17:18 +01:00
Frederic Weisbecker
71566a0d16 tracing/fastboot: Enable boot tracing only during initcalls
Impact: modify boot tracer

We used to disable the initcall tracing at a specified time (IE: end
of builtin initcalls). But we don't need it anymore. It will be
stopped when initcalls are finished.

However we want two things:

_Start this tracing only after pre-smp initcalls are finished.

_Since we are planning to trace sched_switches at the same time, we
want to enable them only during the initcall execution.

For this purpose, this patch introduce two functions to enable/disable
the sched_switch tracing during boot.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 17:14:02 +01:00
Huang Weiyi
d3f15800d5 init/do_mounts_md.c: remove duplicated #include
Removed duplicated #include <linux/delay.h> in init/do_mounts_md.c.

The same compile error ("error: implicit declaration of function
'msleep'") got fixed twice:

 - f8b77d3939 ("init/do_mounts_md.c:
   msleep compile fix")

 - 73b4a24f5f ("init/do_mounts_md.c must
   #include <linux/delay.h>")

by people adding the <linux/delay.h> include in two slightly different
places.  Andrew's quilt scripts happily ignore the fuzz, and will
re-apply the patch even though they had conflicts.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-01 10:35:51 -07:00
KAMEZAWA Hiroyuki
84ad6d7000 memcg: update menuconfig help text
page_cgroup is now allocated at boot and memmap doesn't includes pointer
for page_cgroup.  Fix the menu help text.

Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:46 -07:00
Alexey Dobriyan
f8b77d3939 init/do_mounts_md.c: msleep compile fix
init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:46 -07:00
Linus Torvalds
4403b406d4 Revert "Call init_workqueues before pre smp initcalls."
This reverts commit a802dd0eb5 by moving
the call to init_workqueues() back where it belongs - after SMP has been
initialized.

It also moves stop_machine_init() - which needs workqueues - to a later
phase using a core_initcall() instead of early_initcall().  That should
satisfy all ordering requirements, and was apparently the reason why
init_workqueues() was moved to be too early.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-25 19:53:38 -07:00
Linus Torvalds
5ed487bc2c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
  [PATCH] fs: add a sanity check in d_free
  [PATCH] i_version: remount support
  [patch] vfs: make security_inode_setattr() calling consistent
  [patch 1/3] FS_MBCACHE: don't needlessly make it built-in
  [PATCH] move executable checking into ->permission()
  [PATCH] fs/dcache.c: update comment of d_validate()
  [RFC PATCH] touch_mnt_namespace when the mount flags change
  [PATCH] reiserfs: add missing llseek method
  [PATCH] fix ->llseek for more directories
  [PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
  [PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
  [PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
  [PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
  [PATCH vfs-2.6 2/6] vfs: add d_ancestor()
  [PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
  [PATCH] get rid of on-stack dentry in udf
  [PATCH 2/2] anondev: switch to IDA
  [PATCH 1/2] anondev: init IDR statically
  [JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
  [PATCH] Optimise NFS readdir hack slightly.
  ...
2008-10-23 10:22:40 -07:00
Linus Torvalds
a3415dc34f Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (32 commits)
  PCI hotplug: fix logic in Compaq hotplug controller bus speed setup
  PCI: don't export linux/io.h from pci.h
  PCI: PCI_QUIRKS depends on PCI
  PCI hotplug: pciehp: poll data link layer link active
  PCI hotplug: pciehp: fix possible memory leak in pcie_init
  PCI: Workaround invalid P2P bridge bus numbers
  PCI Hotplug: fakephp: add duplicate slot name debugging
  PCI: Hotplug core: remove 'name'
  PCI: shcphp: remove 'name' parameter
  PCI: SGI Hotplug: stop managing bss_hotplug_slot->name
  PCI: rpaphp: kmalloc/kfree slot->name directly
  PCI: pciehp: remove 'name' parameter
  PCI: ibmphp: stop managing hotplug_slot->name
  PCI: fakephp: remove 'name' parameter
  PCI, PCI Hotplug: introduce slot_name helpers
  PCI: cpqphp: stop managing hotplug_slot->name
  PCI: cpci_hotplug: stop managing hotplug_slot->name
  PCI: acpiphp: remove 'name' parameter
  PCI: prevent duplicate slot names
  PCI Hotplug: serialize pci_hp_register and pci_hp_deregister
  ...
2008-10-23 10:16:53 -07:00
Linus Torvalds
a534487606 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  stop_machine: fix error code handling on multiple cpus
  stop_machine: use workqueues instead of kernel threads
  workqueue: introduce create_rt_workqueue
  Call init_workqueues before pre smp initcalls.
  Make panic= and panic_on_oops into core_params
  Make initcall_debug a core_param
  core_param() for genuinely core kernel parameters
  param: Fix duplicate module prefixes
  module: check kernel param length at compile time, not runtime
  Remove stop_machine during module load v2
  module: simplify load_module.
2008-10-23 10:00:14 -07:00
KAMEZAWA Hiroyuki
94b6da5ab8 memcg: fix page_cgroup allocation
page_cgroup_init() is called from mem_cgroup_init(). But at this
point, we cannot call alloc_bootmem().
(and this caused panic at boot.)

This patch moves page_cgroup_init() to init/main.c.

Time table is following:
==
  parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
  ....
  cgroup_init_early()  # "early" init of cgroup.
  ....
  setup_arch()         # memmap is allocated.
  ...
  page_cgroup_init();
  mem_init();   # we cannot call alloc_bootmem after this.
  ....
  cgroup_init() # mem_cgroup is initialized.
==

Before page_cgroup_init(), mem_map must be initialized. So,
I added page_cgroup_init() to init/main.c directly.

(*) maybe this is not very clean but
    - cgroup_init_early() is too early
    - in cgroup_init(), we have to use vmalloc instead of alloc_bootmem().
    use of vmalloc area in x86-32 is important and we should avoid very large
    vmalloc() in x86-32. So, we want to use alloc_bootmem() and added page_cgroup_init()
    directly to init/main.c

[akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
[akpm@linux-foundation.org: fix build]
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23 08:55:02 -07:00
Alexey Dobriyan
6de24f0ed0 [PATCH 1/2] anondev: init IDR statically
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 05:13:13 -04:00
Geert Uytterhoeven
61cfc7e442 PCI: PCI_QUIRKS depends on PCI
commit 3d13731024 ("PCI: allow quirks to be
compiled out") introduced CONFIG_PCI_QUIRKS, which now shows up in each
and every .config.  Fix this by making it depend on PCI.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-10-22 16:42:46 -07:00
Heiko Carstens
a802dd0eb5 Call init_workqueues before pre smp initcalls.
This allows to create workqueues from within the context of
a pre smp initcall (aka early_initcall).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-10-22 10:00:25 +11:00
Rusty Russell
d0ea3d7d28 Make initcall_debug a core_param
This is the one I really wanted: now it effects module loading, it
makes sense to be able to flip it after boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-22 10:00:24 +11:00
Linus Torvalds
a0bfb673dc Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (41 commits)
  PCI: fix pci_ioremap_bar() on s390
  PCI: fix AER capability check
  PCI: use pci_find_ext_capability everywhere
  PCI: remove #ifdef DEBUG around dev_dbg call
  PCI hotplug: fix get_##name return value problem
  PCI: document the pcie_aspm kernel parameter
  PCI: introduce an pci_ioremap(pdev, barnr) function
  powerpc/PCI: Add legacy PCI access via sysfs
  PCI: Add ability to mmap legacy_io on some platforms
  PCI: probing debug message uniformization
  PCI: support PCIe ARI capability
  PCI: centralize the capabilities code in probe.c
  PCI: centralize the capabilities code in pci-sysfs.c
  PCI: fix 64-vbit prefetchable memory resource BARs
  PCI: replace cfg space size (256/4096) by macros.
  PCI: use resource_size() everywhere.
  PCI: use same arg names in PCI_VDEVICE comment
  PCI hotplug: rpaphp: make debug var unique
  PCI: use %pF instead of print_fn_descriptor_symbol() in quirks.c
  PCI: fix hotplug get_##name return value problem
  ...
2008-10-20 13:40:47 -07:00
Linus Torvalds
92b29b86fe Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
  tracing/fastboot: improve help text
  tracing/stacktrace: improve help text
  tracing/fastboot: fix initcalls disposition in bootgraph.pl
  tracing/fastboot: fix bootgraph.pl initcall name regexp
  tracing/fastboot: fix issues and improve output of bootgraph.pl
  tracepoints: synchronize unregister static inline
  tracepoints: tracepoint_synchronize_unregister()
  ftrace: make ftrace_test_p6nop disassembler-friendly
  markers: fix synchronize marker unregister static inline
  tracing/fastboot: add better resolution to initcall debug/tracing
  trace: add build-time check to avoid overrunning hex buffer
  ftrace: fix hex output mode of ftrace
  tracing/fastboot: fix initcalls disposition in bootgraph.pl
  tracing/fastboot: fix printk format typo in boot tracer
  ftrace: return an error when setting a nonexistent tracer
  ftrace: make some tracers reentrant
  ring-buffer: make reentrant
  ring-buffer: move page indexes into page headers
  tracing/fastboot: only trace non-module initcalls
  ftrace: move pc counter in irqtrace
  ...

Manually fix conflicts:
 - init/main.c: initcall tracing
 - kernel/module.c: verbose level vs tracepoints
 - scripts/bootgraph.pl: fallout from cherry-picking commits.
2008-10-20 13:35:07 -07:00
Thomas Petazzoni
3d13731024 PCI: allow quirks to be compiled out
This patch adds the CONFIG_PCI_QUIRKS option which allows to remove all
the PCI quirks, which are not necessarily used on embedded systems when
PCI is working properly. As this is a size-reduction option, it depends
on CONFIG_EMBEDDED. It allows to save almost 12 kilobytes of kernel
code:

   text	   data	    bss	    dec	    hex	filename
1287806	 123596	 212992	1624394	 18c94a	vmlinux.old
1275854	 123596	 212992	1612442	 189a9a	vmlinux
 -11952       0       0  -11952   -2EB0 +/-

This patch has originally been written by Zwane Mwaikambo
<zwane@arm.linux.org.uk> and is part of the Linux Tiny project.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-10-20 10:53:40 -07:00
Matt Helsley
dc52ddc0e6 container freezer: implement freezer cgroup subsystem
This patch implements a new freezer subsystem in the control groups
framework.  It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.

The freezer subsystem in the container filesystem defines a file named
freezer.state.  Writing "FROZEN" to the state file will freeze all tasks
in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup.  Reading will return the current state.

* Examples of usage :

   # mkdir /containers/freezer
   # mount -t cgroup -ofreezer freezer  /containers
   # mkdir /containers/0
   # echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

   # cat /containers/0/freezer.state
   RUNNING

to freeze all tasks in the container :

   # echo FROZEN > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   FREEZING
   # cat /containers/0/freezer.state
   FROZEN

to unfreeze all tasks in the container :

   # echo RUNNING > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   RUNNING

This is the basic mechanism which should do the right thing for user space
task in a simple scenario.

It's important to note that freezing can be incomplete.  In that case we
return EBUSY.  This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time.  After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read.  The state will remain
"FREEZING" until one of these things happens:

	1) Userspace cancels the freezing operation by writing "RUNNING" to
		the freezer.state file
	2) Userspace retries the freezing operation by writing "FROZEN" to
		the freezer.state file (writing "FREEZING" is not legal
		and returns EIO)
	3) The tasks that blocked the cgroup from entering the "FROZEN"
		state disappear from the cgroup's set of tasks.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:34 -07:00
Nick Piggin
db64fe0225 mm: rewrite vmap layer
Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).

The biggest problem with vmap is actually vunmap.  Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache.  This is all done under a global lock.  As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
 This gives terrible quadratic scalability characteristics.

Another problem is that the entire vmap subsystem works under a single
lock.  It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.

This is a rewrite of vmap subsystem to solve those problems.  The existing
vmalloc API is implemented on top of the rewritten subsystem.

The TLB flushing problem is solved by using lazy TLB unmapping.  vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated.  So the addresses aren't allocated again until
a subsequent TLB flush.  A single TLB flush then can flush multiple
vunmaps from each CPU.

XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.

The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.

There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.

To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap.  Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).

As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages.  Different numbers
of tests were run in parallel on an 4 core, 2 socket opteron.  Results are
in nanoseconds per map+touch+unmap.

threads           vanilla         vmap rewrite
1                 14700           2900
2                 33600           3000
4                 49500           2800
8                 70631           2900

So with a 8 cores, the rewritten version is already 25x faster.

In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram...  along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system.  I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now.  vmap is pretty well blown off the
profiles.

Before:
1352059 total                                      0.1401
798784 _write_lock                              8320.6667 <- vmlist_lock
529313 default_idle                             1181.5022
 15242 smp_call_function                         15.8771  <- vmap tlb flushing
  2472 __get_vm_area_node                         1.9312  <- vmap
  1762 remove_vm_area                             4.5885  <- vunmap
   316 map_vm_area                                0.2297  <- vmap
   312 kfree                                      0.1950
   300 _spin_lock                                 3.1250
   252 sn_send_IPI_phys                           0.4375  <- tlb flushing
   238 vmap                                       0.8264  <- vmap
   216 find_lock_page                             0.5192
   196 find_next_bit                              0.3603
   136 sn2_send_IPI                               0.2024
   130 pio_phys_write_mmr                         2.0312
   118 unmap_kernel_range                         0.1229

After:
 78406 total                                      0.0081
 40053 default_idle                              89.4040
 33576 ia64_spinlock_contention                 349.7500
  1650 _spin_lock                                17.1875
   319 __reg_op                                   0.5538
   281 _atomic_dec_and_lock                       1.0977
   153 mutex_unlock                               1.5938
   123 iget_locked                                0.1671
   117 xfs_dir_lookup                             0.1662
   117 dput                                       0.1406
   114 xfs_iget_core                              0.0268
    92 xfs_da_hashname                            0.1917
    75 d_alloc                                    0.0670
    68 vmap_page_range                            0.0462 <- vmap
    58 kmem_cache_alloc                           0.0604
    57 memset                                     0.0540
    52 rb_next                                    0.1625
    50 __copy_user                                0.0208
    49 bitmap_find_free_region                    0.2188 <- vmap
    46 ia64_sn_udelay                             0.1106
    45 find_inode_fast                            0.1406
    42 memcmp                                     0.2188
    42 finish_task_switch                         0.1094
    42 __d_lookup                                 0.0410
    40 radix_tree_lookup_slot                     0.1250
    37 _spin_unlock_irqrestore                    0.3854
    36 xfs_bmapi                                  0.0050
    36 kmem_cache_free                            0.0256
    35 xfs_vn_getattr                             0.0322
    34 radix_tree_lookup                          0.1062
    33 __link_path_walk                           0.0035
    31 xfs_da_do_buf                              0.0091
    30 _xfs_buf_find                              0.0204
    28 find_get_page                              0.0875
    27 xfs_iread                                  0.0241
    27 __strncpy_from_user                        0.2812
    26 _xfs_buf_initialize                        0.0406
    24 _xfs_buf_lookup_pages                      0.0179
    24 vunmap_page_range                          0.0250 <- vunmap
    23 find_lock_page                             0.0799
    22 vm_map_ram                                 0.0087 <- vmap
    20 kfree                                      0.0125
    19 put_page                                   0.0330
    18 __kmalloc                                  0.0176
    17 xfs_da_node_lookup_int                     0.0086
    17 _read_lock                                 0.0885
    17 page_waitqueue                             0.0664

vmap has gone from being the top 5 on the profiles and flushing the crap
out of all TLBs, to using less than 1% of kernel time.

[akpm@linux-foundation.org: cleanups, section fix]
[akpm@linux-foundation.org: fix build on alpha]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:32 -07:00
Adrian Bunk
73b4a24f5f init/do_mounts_md.c must #include <linux/delay.h>
This patch fixes the following compile error caused by commit
589f800bb1 ("fastboot: make the raid
autodetect code wait for all devices to init"):

    CC      init/do_mounts_md.o
  init/do_mounts_md.c: In function 'autodetect_raid':
  init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'
  make[2]: *** [init/do_mounts_md.o] Error 1

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 13:35:34 -07:00
Thomas Petazzoni
ebf3f09c63 Configure out AIO support
This patchs adds the CONFIG_AIO option which allows to remove support
for asynchronous I/O operations, that are not necessarly used by
applications, particularly on embedded devices. As this is a
size-reduction option, it depends on CONFIG_EMBEDDED. It allows to
save ~7 kilobytes of kernel code/data:

   text	   data	    bss	    dec	    hex	filename
1115067	 119180	 217088	1451335	 162547	vmlinux
1108025	 119048	 217088	1444161	 160941	vmlinux.new
  -7042    -132       0   -7174   -1C06 +/-

This patch has been originally written by Matt Mackall
<mpm@selenic.com>, and is part of the Linux Tiny project.

[randy.dunlap@oracle.com: build fix]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:51 -07:00
Nye Liu
889d51a107 initramfs: add option to preserve mtime from initramfs cpio images
When unpacking the cpio into the initramfs, mtimes are not preserved by
default.  This patch adds an INITRAMFS_PRESERVE_MTIME option that allows
mtimes stored in the cpio image to be used when constructing the
initramfs.

For embedded applications that run exclusively out of the initramfs, this
is invaluable:

When building embedded application initramfs images, its nice to know when
the files were actually created during the build process - that makes it
easier to see what files were modified when so we can compare the files
that are being used on the image with the files used during the build
process.  This might help (for example) to determine if the target system
has all the updated files you expect to see w/o having to check MD5s etc.

In our environment, the whole system runs off the initramfs partition, and
seeing the modified times of the shared libraries (for example) helps us
find bugs that may have been introduced by the build system incorrectly
propogating outdated shared libraries into the image.

Similarly, many of the initializion/configuration files in /etc might be
dynamically built by the build system, and knowing when they were modified
helps us sanity check whether the target system has the "latest" files
etc.

Finally, we might use last modified times to determine whether a hot fix
should be applied or not to the running ramfs.

Signed-off-by: Nye Liu <nyet@nyet.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:31 -07:00
Geert Uytterhoeven
93fd85d005 identify_ramdisk_image(): correct typo about return value in comment
identify_ramdisk_image() returns 0 (not -1) if a gzipped ramdisk is found:

	if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
		printk(KERN_NOTICE
		       "RAMDISK: Compressed image found at block %d\n",
		       start_block);
		nblocks = 0;
		^^^^^^^^^^^
		goto done;
	}

	...

done:
	sys_lseek(fd, start_block * BLOCK_SIZE, 0);
	kfree(buf);
	return nblocks;
	^^^^^^^^^^^^^^

Hence correct the typo in the comment, which has existed since the
addition of compressed ramdisk support in 1.3.48.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:30 -07:00
Ingo Molnar
b2aaf8f74c Merge branch 'linus' into stackprotector
Conflicts:
	arch/x86/kernel/Makefile
	include/asm-x86/pda.h
2008-10-15 13:46:29 +02:00
Linus Torvalds
2d51b75370 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot:
  raid, fastboot: hide RAID autodetect option if MD is compiled as a module
  raid: make RAID autodetect default a KConfig option
  warning: fix init do_mounts_md c
  fastboot: make the RAID autostart code print a message just before waiting
  fastboot: make the raid autodetect code wait for all devices to init
  fastboot: Fix bootgraph.pl initcall name regexp
  fastboot: fix issues and improve output of bootgraph.pl
  Add a script to visualize the kernel boot process / time
2008-10-14 12:28:02 -07:00
Tim Bird
ca538f6bbe tracing/fastboot: add better resolution to initcall debug/tracing
Change the time resolution for initcall_debug to microseconds, from
milliseconds.  This is handy to determine which initcalls you want to work
on for faster booting.

One one of my test machines, over 90% of the initcalls are less than a
millisecond and (without this patch) these are all reported as 0 msecs.
Working on the 900 us ones is more important than the 4 us ones.

With 'quiet' on the kernel command line, this adds no significant overhead
to kernel boot time.

Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:39:27 +02:00
Frederic Weisbecker
097d036a2f tracing/fastboot: only trace non-module initcalls
At this time, only built-in initcalls interest us.
We can't really produce a relevant graph if we include
the modules initcall too.

I had good results after this patch (see svg in attachment).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:39:17 +02:00
Frederic Weisbecker
5601020feb tracing/fastboot: get the initcall name before it disappears
After some initcall traces, some initcall names may be inconsistent.
That's because these functions will disappear from the .init section
and also their name from the symbols table.

So we have to copy the name of the function in a buffer large enough
during the trace appending. It is not costly for the ring_buffer because
the number of initcall entries is commonly not really large.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:39:12 +02:00
Frederic Weisbecker
cb5ab74204 tracing/fastboot: change the printing of boot tracer according to bootgraph.pl
Change the boot tracer printing to make it parsable for
the scripts/bootgraph.pl script.

We have now to output two lines for each initcall, according to the
printk in do_one_initcall() in init/main.c
We need now the call's time and the return's time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:39:11 +02:00
Frédéric Weisbecker
3bf77af6e1 tracing/ftrace: launch boot tracing after pre-smp initcalls
Launch the boot tracing inside the initcall_debug area. Old printk
have not been removed to keep the old way of initcall tracing for
backward compatibility.

[ mingo@elte.hu: resolved conflicts ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:38:50 +02:00
Arjan van de Ven
aa5d9151f7 tracing/fastboot: add a script to visualize the kernel boot process / time
When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with the fastboot asynchronous
initcall level, it's very valuable to see which initcall gets run where
and when.

This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-14 10:38:46 +02:00
Steven Rostedt
68bf21aa15 ftrace: mcount call site on boot nops core
This is the infrastructure to the converting the mcount call sites
recorded by the __mcount_loc section into nops on boot. It also allows
for using these sites to enable tracing as normal. When the __mcount_loc
section is used, the "ftraced" kernel thread is disabled.

This uses the current infrastructure to record the mcount call sites
as well as convert them to nops. The mcount function is kept as a stub
on boot up and not converted to the ftrace_record_ip function. We use the
ftrace_record_ip to only record from the table.

This patch does not handle modules. That comes with a later patch.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:34:44 +02:00
Ingo Molnar
5f87f11218 tracing: clean up tracepoints kconfig structure
do not expose users to CONFIG_TRACEPOINTS - tracers can select it
just fine.

update ftrace to select CONFIG_TRACEPOINTS.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:33:32 +02:00
Ingo Molnar
fa340d9c05 tracing: disable tracepoints by default
while it's arguably low overhead, we dont enable new features by default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:32:56 +02:00
Mathieu Desnoyers
97e1c18e8d tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.

Taken from the documentation patch :

"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section).  When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."

Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".

We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.

Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
  tracepoint table. This will make sure not type mismatch happens due to
  connexion of a probe with the wrong type to a tracepoint declared with
  the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.

Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.

Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).

About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.

Quoting Hideo Aoki about Markers :

I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.

While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.

I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.

I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.

    The parameter of hackbench: every 50 from 50 to 800
    The number of CPUs of the server: 2, 4, and 8

Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.

Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.

* 2 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
      100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
      150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
      200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
      250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
      300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
      350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
      400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
      450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
      500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
      550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
      600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
      650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
      700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
      750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
      800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
--------------------------------------------------------------

* 4 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
      100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
      150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
      200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
      250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
      300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
      350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
      400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
      450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
      500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
      550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
      600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
      650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
      700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
      750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
      800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
--------------------------------------------------------------

* 8 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
      100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
      150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
      200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
      250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
      300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
      350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
      400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
      450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
      500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
      550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
      600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
      650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
      700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
      750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
      800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
--------------------------------------------------------------

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:28:28 +02:00
Linus Torvalds
20272c8994 Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc
* 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
  proc: remove kernel.maps_protect
  proc: remove now unneeded ADDBUF macro
  [PATCH] proc: show personality via /proc/pid/personality
  [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
  proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
  proc: make grab_header() static
  proc: remove unused get_dma_list()
  proc: remove dummy vmcore_open()
  proc: proc_sys_root tweak
  proc: fix return value of proc_reg_open() in "too late" case

Fixed up trivial conflict in removed file arch/sparc/include/asm/dma_32.h
2008-10-13 10:04:04 -07:00
Arjan van de Ven
a364092a41 raid: make RAID autodetect default a KConfig option
RAID autodetect has the side effect of requiring synchronisation
of all device drivers, which can make the boot several seconds longer
(I've measured 7 on one of my laptops).... even for systems that don't
have RAID setup for the root filesystem (the only FS where this matters).

This patch makes the default for autodetect a config option; either way
the user can always override via the kernel command line.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: NeilBrown <neilb@suse.de>
2008-10-12 08:25:02 -07:00
Ingo Molnar
82cbc11a41 warning: fix init do_mounts_md c
fix warning:

  init/do_mounts_md.c: In function ‘md_run_setup’:
  init/do_mounts_md.c:282: warning: ISO C90 forbids mixed declarations and code

also, use the opportunity to put the RAID autodetection code
into a separate function - this also solves a checkpatch style warning.

No code changed:

md5:
   aa36a35faef371b05f1974ad583bdbbd  do_mounts_md.o.before.asm
   aa36a35faef371b05f1974ad583bdbbd  do_mounts_md.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-12 08:24:34 -07:00
Arjan van de Ven
02c15def84 fastboot: make the RAID autostart code print a message just before waiting
As requested/suggested by Neil Brown: make the raid code print that it's
about to wait for probing to be done as well as give a suggestion on how
to disable the probing if the user doesn't use raid.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com
2008-10-12 08:23:53 -07:00
Arjan van de Ven
589f800bb1 fastboot: make the raid autodetect code wait for all devices to init
The raid autodetect code really needs to have all devices probed before
it can detect raid arrays; not doing so would give rather messy situations
where arrays would get detected as degraded while they shouldn't be etc.

This is in preparation of removing the "wait for everything to init"
code that makes everyone pay, not just raid users.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-12 08:23:04 -07:00
Arjan van de Ven
f9b9796ade Add a script to visualize the kernel boot process / time
When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with some of the initializing
going asynchronous soon, it's valuable to track/print which worker thread
is executing the initialization.

This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-12 08:07:20 -07:00
Alexey Dobriyan
53167a3ef2 proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-10 04:18:57 +04:00
Tejun Heo
55dc7db70a init: DEBUG_BLOCK_EXT_DEVT requires explicit root= param
DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers and root
device number set using rdev become meaningless.  Root devices should
be explicitly specified using textual names.  Warn about it if root
can't be found and DEBUG_BLOCK_EXT_DEVT is enabled.  Also, add warning
to the help text.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-10-09 08:56:11 +02:00
Linus Torvalds
96d746c68f Fix init/main.c to use regular printk with '%pF' for initcall fn
.. small detail, but the silly e1000e initcall warning debugging caused
me to look at this code.  Rather than gouge my eyes out with a spoon, I
just fixed it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-03 13:38:07 -07:00
Andi Kleen
9e94cd325b Move sysctl check into debugging section and don't make it default y
I noticed that sysctl_check.o was the largest object file in
a allnoconfig build in kernel/*.

  36243       0       0   36243    8d93 kernel/sysctl_check.o

This is because it was default y and && EMBEDDED. But I don't
really see a need for a non kernel developer to have their
sysctls checked all the time.

So move the Kconfig into the kernel debugging section and
also drop the default y and the EMBEDDED check.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-16 17:13:43 -07:00
Arjan van de Ven
59f9415ffb modules: extend initcall_debug functionality to the module loader
The kernel has this really nice facility where if you put "initcall_debug"
on the kernel commandline, it'll print which function it's going to
execute just before calling an initcall, and then after the call completes
it will

1) print if it had an error code

2) checks for a few simple bugs (like leaving irqs off)
and

3) print how long the init call took in milliseconds.

While trying to optimize the boot speed of my laptop, I have been loving
number 3 to figure out what to optimize...  ...  and then I wished that
the same thing was done for module loading.

This patch makes the module loader use this exact same functionality; it's
a logical extension in my view (since modules are just sort of late
binding initcalls anyway) and so far I've found it quite useful in finding
where things are too slow in my boot.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-08-12 17:52:54 +10:00
Robert P. J. Day
0b0de14433 Kconfig: Extend "menuconfig" for modules to simplify Kconfig file
Given that the init/Kconfig file uses a "menuconfig" directive for
modules already, might as well wrap all the submenu entries in an "if"
to toss all those dependencies.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-08-06 22:14:04 +02:00
Bartlomiej Zolnierkiewicz
b5b9309d34 remove unnecessary <linux/hdreg.h> includes
Following files don't need <linux/hdreg.h> at all:

- arch/mips/jazz/setup.c
- arch/sh/boards/mach-systemh/irq.c
- drivers/macintosh/mediabay.c
- drivers/scsi/hptiop.c
- drivers/usb/storage/freecom.c
- arch/powerpc/include/asm/ide.h
- init/main.c

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-08-05 18:16:58 +02:00
Linus Torvalds
e811603feb Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
  kbuild: scripts/ver_linux: don't set PATH
  Kconfig/init: change help text to match default value
  kbuild: genksyms: Include extern information in dumps
  kbuild: genksyms parser: fix the __attribute__ rule
  kbuild: scripts/genksyms/lex.l: add %option noinput
  kconfig: scripts/kconfig/zconf.l: add %option noinput
  kbuild: fix O=... build of um
2008-08-01 11:50:21 -07:00
Linus Torvalds
57b1494d2b Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  generic, x86: fix add iommu_num_pages helper function
  x86: remove stray <6> in BogoMIPS printk
  x86: move dma32_reserve_bootmem() after reserve_crashkernel()
2008-08-01 10:28:17 -07:00
jkacur
775a7229ac Kconfig/init: change help text to match default value
Change the "If unsure" message to match the default value.

Signed-off-by: John Kacur <jkacur at gmail dot com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-31 23:33:10 +02:00
Geert Uytterhoeven
bd673c7c3b initrd: cast initrd_start' to void *'
commit fb6624ebd9 (initrd: Fix virtual/physical
mix-up in overwrite test) introduced the compiler warning below on mips,
as its virt_to_page() doesn't cast the passed address to unsigned long
internally, unlike on most other architectures:

init/main.c: In function `start_kernel':
init/main.c:633: warning: passing argument 1 of `virt_to_phys' makes pointer from integer without a cast
init/main.c:636: warning: passing argument 1 of `virt_to_phys' makes pointer from integer without a cast

For now, kill the warning by explicitly casting initrd_start to `void *', as
that's the type it should really be.

Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30 09:41:45 -07:00
Ingo Molnar
35780c8ea7 Merge commit 'v2.6.27-rc1' into x86/urgent 2008-07-29 12:10:50 +02:00
Ingo Molnar
cb28a1bbdb Merge branch 'linus' into core/generic-dma-coherent
Conflicts:

	arch/x86/Kconfig

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-29 00:07:55 +02:00
Joe Perches
d7ba11d01c x86: remove stray <6> in BogoMIPS printk
Rabin Vincent noticed that there's a stray <6> in BogoMIPS printk:

> Remove the extra KERN_INFO which causes this:
> Calibrating delay loop... <6>179.40 BogoMIPS (lpj=897024)
> -	printk(KERN_INFO "%lu.%02lu BogoMIPS (lpj=%lu)\n",
> -			loops_per_jiffy/(500000/HZ),
> -			(loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
> +	printk("%lu.%02lu BogoMIPS (lpj=%lu)\n",
> +		loops_per_jiffy/(500000/HZ),
> +		(loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
>  }

How about just using KERN_CONT and leaving the whitespace
for a patch that does the entire file?

Reported-by: Rabin Vincent <rabin@rab.in>
2008-07-28 14:22:26 +02:00
Linus Torvalds
6948385cbd Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
  setlocalversion: do not describe if there is nothing to describe
  kconfig: fix typos: "Suport" -> "Support"
  kconfig: make defconfig is no longer chatty
  kconfig: make oldconfig is now less chatty
  kconfig: speed up all*config + randconfig
  kconfig: set all new symbols automatically
  kconfig: add diffconfig utility
  kbuild: remove Module.markers during mrproper
  kbuild: sparse needs CF not CHECKFLAGS
  kernel-doc: handle/strip __init
  vmlinux.lds: move __attribute__((__cold__)) functions back into final .text section
  init: fix URL of "The GNU Accounting Utilities"
  kbuild: add arch/$ARCH/include to search path
  kbuild: asm symlink support for arch/$ARCH/include
  kbuild: support arch/$ARCH/include for tags, cscope
  kbuild: prepare headers_* for arch/$ARCH/include
  kbuild: install all headers when arch is changed
  kbuild: make clean removes *.o.* as well
  kbuild: optimize headers_* targets
  kbuild: only one call for include/ in make headers_*
  ...
2008-07-27 09:59:59 -07:00
Adrian Bunk
f56f6d30c7 make init/do_mounts.c:root_device_name static
This patch makes the needlessly global root_device_name static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:12 -07:00
Eduard - Gabriel Munteanu
7babe8db99 Full conversion to early_initcall() interface, remove old interface
A previous patch added the early_initcall(), to allow a cleaner hooking of
pre-SMP initcalls.  Now we remove the older interface, converting all
existing users to the new one.

[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: build fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Eduard - Gabriel Munteanu
c2147a5092 Better interface for hooking early initcalls
Added early initcall (pre-SMP) support, using an identical interface to
that of regular initcalls.  Functions called from do_pre_smp_initcalls()
could be converted to use this cleaner interface.

This is required by CPU hotplug, because early users have to register
notifiers before going SMP.  One such CPU hotplug user is the relay
interface with buffer-only channels, which needs to register such a
notifier, to be usable in early code.  This in turn is used by kmemtrace.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Heikki Orsila
12d2b8f951 kconfig: fix typos: "Suport" -> "Support"
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-25 22:12:52 +02:00
S.Çağlar Onur
37a4c94074 init: fix URL of "The GNU Accounting Utilities"
Following patch corrects URL of "The GNU Accounting Utilities" in init/Kconfig.

Noticed by: Bart Van Assche" <bart.vanassche@gmail.com>

Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-25 22:12:36 +02:00
Adrian Bunk
3ae4eed34b proper pid{hash,map}_init() prototypes
This patch adds proper prototypes for pid{hash,map}_init() in
include/linux/pid_namespace.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:45 -07:00
Daniel Guilak
197dcffc8b init/version.c: define version_string only if CONFIG_KALLSYMS is not defined
int Version_* is only used with ksymoops, which is only needed (according
to README and Documentation/Changes) if CONFIG_KALLSYMS is NOT defined.
Therefore this patch defines version_string only if CONFIG_KALLSYMS is not
defined.

Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:29 -07:00
Daniel Guilak
277e2c6959 init/version.c: silence sparse warning by declaring the version string
Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:28 -07:00
Thomas Petazzoni
2d6ffcca62 inflate: refactor inflate malloc code
Inflate requires some dynamic memory allocation very early in the boot
process and this is provided with a set of four functions:
malloc/free/gzip_mark/gzip_release.

The old inflate code used a mark/release strategy rather than implement
free.  This new version instead keeps a count on the number of outstanding
allocations and when it hits zero, it resets the malloc arena.

This allows removing all the mark and release implementations and unifying
all the malloc/free implementations.

The architecture-dependent code must define two addresses:
 - free_mem_ptr, the address of the beginning of the area in which
   allocations should be made
 - free_mem_end_ptr, the address of the end of the area in which
   allocations should be made. If set to 0, then no check is made on
   the number of allocations, it just grows as much as needed

The architecture-dependent code can also provide an arch_decomp_wdog()
function call.  This function will be called several times during the
decompression process, and allow to notify the watchdog that the system is
still running.  If an architecture provides such a call, then it must
define ARCH_HAS_DECOMP_WDOG so that the generic inflate code calls
arch_decomp_wdog().

Work initially done by Matt Mackall, updated to a recent version of the
kernel and improved by me.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Mikael Starvik <mikael.starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:28 -07:00
Robert P. J. Day
cb345d7352 init/: delete hard-coded setting and testing of BUILD_CRAMDISK
There seems to be little point in explicitly setting, then testing the macro
BUILD_CRAMDISK within the context of a single source file.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Adrian Bunk
82c8253ac2 init/do_mounts.c should #include <linux/initrd.h>
Every file should include the headers containing the externs for its
global code (in this case for rd_doload).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:27 -07:00
Linus Torvalds
7f9dce3837 Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: hrtick_enabled() should use cpu_active()
  sched, x86: clean up hrtick implementation
  sched: fix build error, provide partition_sched_domains() unconditionally
  sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
  cpu hotplug: Make cpu_active_map synchronization dependency clear
  cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
  sched: rework of "prioritize non-migratable tasks over migratable ones"
  sched: reduce stack size in isolated_cpu_setup()
  Revert parts of "ftrace: do not trace scheduler functions"

Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).
2008-07-23 19:36:53 -07:00
Johannes Berg
baabaae981 make CONFIG_KMOD invisible
... as preparation for removing it completely, make it an
invisible bool defaulting to yes.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:28 +10:00
Denys Vlasenko
f7f5b67557 Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module.c and module.h conatains code for finding
exported symbols which are declared with EXPORT_UNUSED_SYMBOL,
and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set
and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway
(because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then).

This patch adds required #ifdefs.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:27 +10:00
Geert Uytterhoeven
fb6624ebd9 initrd: Fix virtual/physical mix-up in overwrite test
On recent kernels, I get the following error when using an initrd:

| initrd overwritten (0x00b78000 < 0x07668000) - disabling it.

My Amiga 4000 has 12 MiB of RAM at physical address 0x07400000 (virtual
0x00000000).
The initrd is located at the end of RAM: 0x00b78000 - 0x00c00000 (virtual).
The overwrite test compares the (virtual) initrd location to the (physical)
first available memory location, which fails.

This patch converts initrd_start to a page frame number, so it can safely be
compared with min_low_pfn.

Before the introduction of discontiguous memory support on m68k
(12d810c1b8), min_low_pfn was just left
untouched by the m68k-specific code (zero, I guess), and everything worked
fine.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-20 17:24:40 -07:00
Ingo Molnar
f6dc8ccaab Merge branch 'linus' into core/generic-dma-coherent
Conflicts:

	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 21:13:20 +02:00
Max Krasnyansky
e761b77252 cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
This is based on Linus' idea of creating cpu_active_map that prevents
scheduler load balancer from migrating tasks to the cpu that is going
down.

It allows us to simplify domain management code and avoid unecessary
domain rebuilds during cpu hotplug event handling.

Please ignore the cpusets part for now. It needs some more work in order
to avoid crazy lock nesting. Although I did simplfy and unify domain
reinitialization logic. We now simply call partition_sched_domains() in
all the cases. This means that we're using exact same code paths as in
cpusets case and hence the test below cover cpusets too.
Cpuset changes to make rebuild_sched_domains() callable from various
contexts are in the separate patch (right next after this one).

This not only boots but also easily handles
	while true; do make clean; make -j 8; done
and
	while true; do on-off-cpu 1; done
at the same time.
(on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).

Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
this on right now in gnome-terminal and things are moving just fine.

Also this is running with most of the debug features enabled (lockdep,
mutex, etc) no BUG_ONs or lockdep complaints so far.

I believe I addressed all of the Dmitry's comments for original Linus'
version. I changed both fair and rt balancer to mask out non-active cpus.
And replaced cpu_is_offline() with !cpu_active() in the main scheduler
code where it made sense (to me).

Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Cc: dmitry.adamushko@gmail.com
Cc: pj@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 13:22:25 +02:00
Linus Torvalds
9c1be0c471 Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6
* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6:
  UBIFS: include to compilation
  UBIFS: add new flash file system
  UBIFS: add brief documentation
  MAINTAINERS: add UBIFS section
  do_mounts: allow UBI root device name
  VFS: export sync_sb_inodes
  VFS: move inode_lock into sync_sb_inodes
2008-07-16 15:02:57 -07:00
Linus Torvalds
59190f4213 Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
  generic-ipi: more merge fallout
  generic-ipi: merge fix
  x86, visws: use mach-default/entry_arch.h
  x86, visws: fix generic-ipi build
  generic-ipi: fixlet
  generic-ipi: fix s390 build bug
  generic-ipi: fix linux-next tree build failure
  fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
  fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
  fix "smp_call_function: get rid of the unused nonatomic/retry argument"
  on_each_cpu(): kill unused 'retry' parameter
  smp_call_function: get rid of the unused nonatomic/retry argument
  sh: convert to generic helpers for IPI function calls
  parisc: convert to generic helpers for IPI function calls
  mips: convert to generic helpers for IPI function calls
  m32r: convert to generic helpers for IPI function calls
  arm: convert to generic helpers for IPI function calls
  alpha: convert to generic helpers for IPI function calls
  ia64: convert to generic helpers for IPI function calls
  powerpc: convert to generic helpers for IPI function calls
  ...

Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
2008-07-15 14:12:03 -07:00
Ingo Molnar
1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Ingo Molnar
6c9fcaf2ee Merge branch 'core/rcu' into core/rcu-for-linus 2008-07-15 21:10:12 +02:00
Adrian Hunter
2d62f48858 do_mounts: allow UBI root device name
Similarly to MTD devices, allow UBI devices.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-07-14 19:10:52 +03:00
Dmitry Baryshkov
ee7e5516be generic: per-device coherent dma allocator
Currently x86_32, sh and cris-v32 provide per-device coherent dma
memory allocator.

However their implementation is nearly identical. Refactor out
common code to be reused by them.

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-30 12:51:05 +02:00
Jens Axboe
3d44223327 Add generic helpers for arch IPI function calls
This adds kernel/smp.c which contains helpers for IPI function calls. In
addition to supporting the existing smp_call_function() in a more efficient
manner, it also adds a more scalable variant called smp_call_function_single()
for calling a given function on a single CPU only.

The core of this is based on the x86-64 patch from Nick Piggin, lots of
changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has
contributed lots of fixes and suggestions as well. Also thanks to
Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage
and getting rid of the data allocation fallback deadlock.

Acked-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:21:34 +02:00
Alok Kataria
f3f3149f35 x86: use cpu_khz for loops_per_jiffy calculation, cleanup
As suggested by Ingo, remove all references to tsc from init/calibrate.c

TSC is x86 specific, and using tsc in variable names in a generic file should
be avoided. lpj_tsc is now called lpj_fine, since it is related to fine tuning
of lpj value. Also tsc_rate_*  is called timer_rate_*

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24 13:53:46 +02:00
Alok Kataria
3da757daf8 x86: use cpu_khz for loops_per_jiffy calculation
On the x86 platform we can use the value of tsc_khz computed during tsc
calibration to calculate the loops_per_jiffy value. Its very important
to keep the error in lpj values to minimum as any error in that may
result in kernel panic in check_timer. In virtualization environment, On
a highly overloaded host the guest delay calibration may sometimes
result in errors beyond the ~50% that timer_irq_works can handle,
resulting in the guest panicking.

Does some formating changes to lpj_setup code to now have a single
printk to print the bogomips value.

We do this only for the boot processor because the AP's can have
different base frequencies or the BIOS might boot a AP at a different
frequency.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Daniel Hecht <dhecht@vmware.com>
Cc: Tim Mann <mann@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Cc: Sahil Rihan <srihan@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23 22:51:33 +02:00
Ingo Molnar
766d02786e Merge branch 'linus' into core/rcu 2008-06-16 11:23:36 +02:00
Ingo Molnar
4205942968 x86: fix the stackprotector canary of the boot CPU
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-26 16:15:32 +02:00
Ingo Molnar
9b5609fd77 stackprotector: include files
create <linux/stackprotector.h> for core kernel files to include.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-26 16:15:32 +02:00
Sam Ravnborg
73531905ed Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST
init/Kconfig contains a list of configs that are searched
for if 'make *config' are used with no .config present.
Extend this list to look at the config identified by
ARCH_DEFCONFIG.

With this change we now try the defconfig targets last.

This fixes a regression reported
by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2008-05-25 23:03:18 +02:00
Adrian Bunk
03de250a26 md: proper extern for mdp_major
This patch adds a proper extern for mdp_major in include/linux/raid/md.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:09 -07:00
Paul E. McKenney
4446a36ff8 rcu: add call_rcu_sched()
Fourth cut of patch to provide the call_rcu_sched().  This is again to
synchronize_sched() as call_rcu() is to synchronize_rcu().

Should be fine for experimental and -rt use, but not ready for inclusion.
With some luck, I will be able to tell Andrew to come out of hiding on
the next round.

Passes multi-day rcutorture sessions with concurrent CPU hotplugging.

Fixes since the first version include a bug that could result in
indefinite blocking (spotted by Gautham Shenoy), better resiliency
against CPU-hotplug operations, and other minor fixes.

Fixes since the second version include reworking grace-period detection
to avoid deadlocks that could happen when running concurrently with
CPU hotplug, adding Mathieu's fix to avoid the softlockup messages,
as well as Mathieu's fix to allow use earlier in boot.

Fixes since the third version include a wrong-CPU bug spotted by
Andrew, getting rid of the obsolete synchronize_kernel API that somehow
snuck back in, merging spin_unlock() and local_irq_restore() in a
few places, commenting the code that checks for quiescent states based
on interrupting from user-mode execution or the idle loop, removing
some inline attributes, and some code-style changes.

Known/suspected shortcomings:

o	I still do not entirely trust the sleep/wakeup logic.  Next step
	will be to use a private snapshot of the CPU online mask in
	rcu_sched_grace_period() -- if the CPU wasn't there at the start
	of the grace period, we don't need to hear from it.  And the
	bit about accounting for changes in online CPUs inside of
	rcu_sched_grace_period() is ugly anyway.

o	It might be good for rcu_sched_grace_period() to invoke
	resched_cpu() when a given CPU wasn't responding quickly,
	but resched_cpu() is declared static...

This patch also fixes a long-standing bug in the earlier preemptable-RCU
implementation of synchronize_rcu() that could result in loss of
concurrent external changes to a task's CPU affinity mask.  I still cannot
remember who reported this...

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-19 10:01:36 +02:00
Cyrill Gorcunov
a76bfd0da2 initcalls: Fix m68k build and possible buffer overflow
This patch fixes a build bug on m68k - gcc decides to emit a call to the
strlen library function, which we don't implement.

More importantly - my previous patch "init: don't lose initcall return
values" (commit e662e1cfd4) had introduced
potential buffer overflow by wrong calculation of string accumulator
size.

Use strlcat() instead, fixing both bugs.

Many thanks Andreas Schwab and Geert Uytterhoeven for helping
to catch and fix the bug.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-15 18:20:06 -07:00
Linus Torvalds
e0df154f45 Split up 'do_initcalls()' into two simpler functions
One function to just loop over the entries, one function to actually do
the call and the associated debugging code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-15 18:14:01 -07:00
Linus Torvalds
a442ac512f Clean up 'print_fn_descriptor_symbol()' types
Everybody wants to pass it a function pointer, and in fact, that is what
you _must_ pass it for it to make sense (since it knows that ia64 and
ppc64 use descriptors for function pointers and fetches the actual
address from there).

So don't make the argument be a 'unsigned long' and force everybody to
add a cast.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-15 17:50:37 -07:00
Kay Sievers
30f2f0eb4b block: do_mounts - accept root=<non-existant partition>
Some devices, like md, may create partitions only at first access,
so allow root= to be set to a valid non-existant partition of an
existing disk. This applies only to non-initramfs root mounting.

This fixes a regression from 2.6.24 which did allow this to happen and
broke some users machines :(

Acked-by: Neil Brown <neilb@suse.de>
Tested-by: Joao Luis Meloni Assirati <assirati@nonada.if.usp.br>
Cc: stable <stable@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-14 10:37:57 -07:00
Cyrill Gorcunov
e662e1cfd4 init: don't lose initcall return values
There is an ability to lose an initcall return value if it happened with irq
disabled or imbalanced preemption (and if we debug initcall).

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-13 08:02:25 -07:00
Rusty Russell
91e37a793b module: don't ignore vermagic string if module doesn't have modversions
Linus found a logic bug: we ignore the version number in a module's
vermagic string if we have CONFIG_MODVERSIONS set, but modversions
also lets through a module with no __versions section for modprobe
--force (with tainting, but still).

We should only ignore the start of the vermagic string if the module
actually *has* crcs to check.  Rather than (say) having an
entertaining hissy fit and creating a config option to work around the
buggy code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-09 07:45:18 -07:00
Stas Sergeev
e5e1d3cb20 pcspkr: fix dependancies
fix pcspkr dependancies: make the pcspkr platform
drivers to depend on a platform device, and
not the other way around.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
CC: Vojtech Pavlik <vojtech@suse.cz>
CC: Michael Opdenacker <michael-lists@free-electrons.com>
[fixed for 2.6.26-rc1 by tiwai]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-05-07 12:42:03 +02:00
Parag Warudkar
aac6abca85 sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHED
GROUP_SCHED is confirmed to cause unacceptable latencies, see:

   http://lkml.org/lkml/2008/5/2/370.

Mark it EXPERIMENTAL and default to no for now.

Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Peter Zijlstra
3e51f33fcc sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
this replaces the rq->clock stuff (and possibly cpu_clock()).

 - architectures that have an 'imperfect' hardware clock can set
   CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

 - the 'jiffie' window might be superfulous when we update tick_gtod
   before the __update_sched_clock() call in sched_clock_tick()

 - cpu_clock() might be implemented as:

     sched_clock_cpu(smp_processor_id())

   if the accuracy proves good enough - how far can TSC drift in a
   single jiffie when considering the filtering and idle hooks?

[ mingo@elte.hu: various fixes and cleanups ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Ingo Molnar
a5574cf65b sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK
add the HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select.

the next change utilizes it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Linus Torvalds
826e4506a0 Make forced module loading optional
The kernel module loader used to be much too happy to allow loading of
modules for the wrong kernel version by default.  For example, if you
had MODVERSIONS enabled, but tried to load a module with no version
info, it would happily load it and taint the kernel - whether it was
likely to actually work or not!

Generally, such forced module loading should be considered a really
really bad idea, so make it conditional on a new config option
(MODULE_FORCE_LOAD), and make it default to off.

If somebody really wants to force module loads, that's their problem,
but we should not encourage it.  Especially as it happened to me by
mistake (ie regular unversioned Fedora modules getting loaded) causing
lots of strange behavior.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-04 17:04:16 -07:00
Christoph Lameter
f6acb63508 slub: #ifdef simplification
If we make SLUB_DEBUG depend on SYSFS then we can simplify some
#ifdefs and avoid others.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-05-02 00:27:13 +03:00
Thomas Gleixner
3ac7fe5a4a infrastructure to debug (dynamic) objects
We can see an ever repeating problem pattern with objects of any kind in the
kernel:

1) freeing of active objects
2) reinitialization of active objects

Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore.  One problem spot are
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.

While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause.  This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.

The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.

Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.

The problems described above are not restricted to timers, but timers tend to
expose it usually in a full system crash.  Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.

Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.

The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.

The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list

Each operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which allows to prevent
the damage of the operation.  When the sanity check triggers a warning message
and a stack trace is printed.

The list of operations can be extended if the need arises.  For now it's
limited to the requirements of the first user (timers).

The core code enqueues the objects into hash buckets.  The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree.  Each bucket has it's own spinlock to avoid contention on a
global lock.

The debug code can be compiled in without being active.  The runtime overhead
is minimal and could be optimized by asm alternatives.  A kernel command line
option enables the debugging code.

Thanks to Ingo Molnar for review, suggestions and cleanup patches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Pavel Emelyanov
5cd204550b Deprecate find_task_by_pid()
There are some places that are known to operate on tasks'
global pids only:

* the rest_init() call (called on boot)
* the kgdb's getthread
* the create_kthread() (since the kthread is run in init ns)

So use the find_task_by_pid_ns(..., &init_pid_ns) there
and schedule the find_task_by_pid for removal.

[sukadev@us.ibm.com: Fix warning in kernel/pid.c]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:48 -07:00
Oleg Nesterov
fae5fa44f1 signals: fix /sbin/init protection from unwanted signals
The global init has a lot of long standing problems with the unhandled fatal
signals.

	- The "is_global_init(current)" check in get_signal_to_deliver()
	  protects only the main thread. Sub-thread can dequee the fatal
	  signal and shutdown the whole thread group except the main thread.
	  If it dequeues SIGSTOP /sbin/init will be stopped, this is not
	  right too. Note that we can't use is_global_init(->group_leader),
	  this breaks exec and this can't solve other problems we have.

	- Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT
	  on delivery. This breaks exec, has other bad implications, and this
	  is just wrong.

Introduce the new SIGNAL_UNKILLABLE flag to fix these problems.  It also helps
to solve some other problems addressed by the subsequent patches.

Currently we use this flag for the global init only, but it could also be used
by kthreads and (perhaps) by the sub-namespace inits.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Akinobu Mita
199f0ca514 idr: create idr_layer_cache at boot time
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionary at boot time rather than creating it on-demand when idr_init()
is called the first time.

This change also enables us to eliminate the check every time idr_init() is
called.

[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:25 -07:00
Holger Schurig
88f458e4b9 sysctl: allow embedded targets to disable sysctl_check.c
Disable sysctl_check.c for embedded targets. This saves about about 11 kB
in .text and another 11 kB in .data on a PXA255 embedded platform.

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:22 -07:00
Balbir Singh
cf475ad28a cgroups: add an owner to the mm_struct
Remove the mem_cgroup member from mm_struct and instead adds an owner.

This approach was suggested by Paul Menage.  The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined.  It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.

A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.

This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes.  The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.

I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.

This patch was tested on a powerpc box, it was compiled with both the
MM_OWNER config turned on and off.

After the thread group leader exits, it's moved to init_css_state by
cgroup_exit(), thus all future charges from runnings threads would be
redirected to the init_css_set's subsystem.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>,
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Serge E. Hallyn
08ce5f16ee cgroups: implement device whitelist
Implement a cgroup to track and enforce open and mknod restrictions on device
files.  A device cgroup associates a device access whitelist with each cgroup.
 A whitelist entry has 4 fields.  'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers.  Major
and minor are either an integer or * for all.  Access is a composition of r
(read), w (write), and m (mknod).

The root device cgroup starts with rwm to 'all'.  A child devcg gets a copy of
the parent.  Admins can then remove devices from the whitelist or add new
entries.  A child cgroup can never receive a device access which is denied its
parent.  However when a device access is removed from a parent it will not
also be removed from the child(ren).

An entry is added using devices.allow, and removed using
devices.deny.  For instance

	echo 'c 1:3 mr' > /cgroups/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null.  Doing

	echo a > /cgroups/1/devices.deny

will remove the default 'a *:* mrw' entry.

CAP_SYS_ADMIN is needed to change permissions or move another task to a new
cgroup.  A cgroup may not be granted more permissions than the cgroup's parent
has.  Any task can move itself between cgroups.  This won't be sufficient, but
we can decide the best way to adequately restrict movement later.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Looks-good-to: Pavel Emelyanov <xemul@openvz.org>
Cc: Daniel Hokka Zakrisson <daniel@hozac.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Paul Menage
418d7d875c CGroup API files: make CGROUP_DEBUG default to off
The cgroup debug subsystem isn't generally useful for users.  It should
default to "n".

Signed-off-by: Paul Menage <menage@google.com>
Cc: "Li Zefan" <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Thomas Petazzoni
3265e66b18 directly use kmalloc() and kfree() in init/initramfs.c
Instead of using the malloc() and free() wrappers needed by the
lib/inflate.c code for allocations, simply use kmalloc() and kfree() in the
initramfs code.  This is needed for a further lib/inflate.c-related cleanup
patch that will remove the malloc() and free() functions.

Take that opportunity to remove the useless kmalloc() return value
cast.

Based on work done by Matt Mackall.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Bjorn Helgaas
626adeb667 Simplify initcall_debug output
print_fn_descriptor_symbol() prints the address if we don't have a symbol, so
no need to print both.

Also, combine printing return value with elapsed time.  Changes this:

  Calling initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50()
  initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50() returned 1.
  initcall 0xc05b7a70 ran for 0 msecs: pci_mmcfg_late_insert_resources+0x0/0x50()
  initcall at 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50(): returned with error code 1

to this:

  calling  pci_mmcfg_late_insert_resources+0x0/0x50()
  initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned 1 after 0 msecs
  initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned with error code 1

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Adrian Bunk
f17a32e97e let LOG_BUF_SHIFT default to 17
16 kB is often no longer enough for a normal boot of an UP system.

And even less when people e.g. use suspend.

17 seems to be a more reasonable default for current kernels on current
hardware (it's just the default, anyone who is memory limited can still lower
it).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Harvey Harrison
d613c3e2d8 init: fix integer as NULL pointer warnings
init/do_mounts_rd.c:215:13: warning: Using plain integer as NULL pointer
init/do_mounts_md.c:136:45: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 17:29:18 -07:00
Linus Torvalds
e945e849e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc: video drivers: add facility level
  sparc: tcx.c make tcx_init and tcx_exit static
  sparc: ffb.c make ffb_init and ffb_exit static
  sparc: cg14.c make cg14_init and cg15_exit static
  sparc: bw2.c fix bw2_exit
  sparc64: Fix accidental syscall restart on child return from clone/fork/vfork.
  sparc64: Clean up handling of pt_regs trap type encoding.
  sparc: Remove old style signal frame support.
  sparc64: Kill bogus RT_ALIGNEDSZ macro from signal.c
  sparc: sunzilog.c remove unused argument
  sparc: fix drivers/video/tcx.c warning
  sparc64: Kill unused local ISA bus layer.
  input: Rewrite sparcspkr device probing.
  sparc64: Do not ignore 'pmu' device ranges.
  sparc64: Kill ISA_FLOPPY_WORKS code.
  sparc64: Kill CONFIG_SPARC32_COMPAT
  sparc64: Cleanups and corrections for arch/sparc64/Kconfig
  sparc64: Fix wedged irq regression.
2008-04-28 09:45:57 -07:00
Ingo Molnar
96fffeb4b4 make CC_OPTIMIZE_FOR_SIZE non-experimental
this option has been the default on a wide range of distributions
for a long time - time to make it non-experimental.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 09:42:46 -07:00
David S. Miller
09337f501e sparc64: Kill CONFIG_SPARC32_COMPAT
It's completely superfluous, CONFIG_COMPAT is sufficient.

What this used to be is an umbrella for enabling code shared
by all 32-bit compat binary support types.  But with the
removal of SunOS and Solaris support, the only one left is
Linux 32-bit ELF.

Update defconfig.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:19 -07:00
Benjamin Herrenschmidt
839ad62e75 [POWERPC] Use __weak macro for smp_setup_processor_id
Use the __weak macro instead of the longer __attribute__ ((weak)) form
in one place in init/main.c.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
--

 init/main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 20:57:33 +10:00
Benjamin Herrenschmidt
8c9843e57a [POWERPC] Add thread_info_cache_init() weak hook
Some architectures need to maintain a kmem cache for thread info
structures.  The next commit adds that to powerpc to fix an alignment
problem.

There is no good arch callback to use to initialize that cache
that I can find, so this adds a new one in the form of a weak
function whose default is empty.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 20:57:33 +10:00
Viktor Radnai
b9b158fe1c sched: better rt-group documentation
Viktor was nice enough to enhance the document based on my replies to
his questions on the subject.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:45:01 +02:00
Mike Travis
e0982e90cd init: move setup of nr_cpu_ids to as early as possible
Move the setting of nr_cpu_ids from sched_init() to start_kernel()
so that it's available as early as possible.

Note that an arch has the option of setting it even earlier if need be,
but it should not result in a different value than the setup_nr_cpu_ids()
function.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:59 +02:00
Mike Travis
321a8e9dcb cpumask: add CPU_MASK_ALL_PTR macro
* Add a static cpumask_t variable "CPU_MASK_ALL_PTR" to use as
    a pointer reference to CPU_MASK_ALL.  This reduces where possible
    the instances where CPU_MASK_ALL allocates and fills a large
    array on the stack.  Used only if NR_CPUS > BITS_PER_LONG.

  * Change init/main.c to use new set_cpus_allowed_ptr().

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:59 +02:00
Christoph Lameter
0f389ec630 slub: No need for per node slab counters if !SLUB_DEBUG
The per node counters are used mainly for showing data through the sysfs API.
If that API is not compiled in then there is no point in keeping track of this
data. Disable counters for the number of slabs and the number of total slabs
if !SLUB_DEBUG. Incrementing the per node counters is also accessing a
potentially contended cacheline so this could actually be a performance
benefit to embedded systems.

SLABINFO support is also affected. It now must depends on SLUB_DEBUG (which
is on by default).

Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab()
if the system is not compiled with NUMA support.

[penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-04-14 18:53:02 +03:00
Linus Torvalds
9a9e0d6855 ACPI: Remove ACPI_CUSTOM_DSDT_INITRD option
This essentially reverts commit 71fc47a9ad
("ACPI: basic initramfs DSDT override support"), because the code simply
isn't ready.

It did ugly things to the init sequence to populate the rootfs image
early, but that just ended up showing other problems with the whole
approach.  The fact is, the VFS layer simply isn't initialized this
early, and the relevant ACPI code should either run much later, or this
shouldn't be done at all.

For 2.6.25, we'll just pick the latter option.  We can revisit this
concept later if necessary.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Markus Gaugusch <dsdt@gaugusch.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-15 11:58:04 -07:00
Paul E. McKenney
21bbb39c37 rcu: move PREEMPT_RCU config option back under PREEMPT
The original preemptible-RCU patch put the choice between classic and
preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures
on machines not supporting CONFIG_PREEMPT.  This choice was therefore moved to
init/Kconfig, which worked, but placed the choice between classic and
preemptible RCU at the top level, a very obtuse choice indeed.

This patch changes from the Kconfig "choice" mechanism to a pair of booleans,
only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in
kernel/Kconfig.preempt, where one would expect it to be.  The other
(CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all
architectures, hopefully avoiding build breakage.  Thanks to Roman Zippel for
suggesting this approach.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-10 18:01:20 -07:00
Linus Torvalds
2c6f2db13a Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  debugfs: fix sparse warnings
  Driver core: Fix cleanup when failing device_add().
  driver core: Remove dpm_sysfs_remove() from error path of device_add()
  PM: fix new mutex-locking bug in the PM core
  PM: Do not acquire device semaphores upfront during suspend
  kobject: properly initialize ksets
  sysfs: CONFIG_SYSFS_DEPRECATED fix
  driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
2008-03-04 16:37:35 -08:00
Balbir Singh
00f0b8259e Memory controller: rename to Memory Resource Controller
Rename Memory Controller to Memory Resource Controller.  Reflect the same
changes in the CONFIG definition for the Memory Resource Controller.  Group
together the config options for Resource Counters and Memory Resource
Controller.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:12 -08:00
Alex Riesen
d9d4fcfe51 Fix "Malformed early option 'loglevel'"
Keith Mannthey said:

  The parameter hotadd_percent is setup right but there is a "Malformed
  early option 'numa'" message.

Rusty Russell said:

  This happens when the function registered with early_param() returns
  non-zero.  __setup() functions return 1 if OK, module_param() and
  early_param() return 0 or a -ve error code.

For instance:

Linux version 2.6.25-rc3-t (raa@steel) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #22 SMP PREEMPT Tue Feb 26
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Malformed early option 'loglevel'
127MB HIGHMEM available.
896MB LOWMEM available.

Command line:

BOOT_IMAGE=2.6.25-t ro root=809 ro console=ttyS0,57600n8 console=tty0 loglevel=5

Acked-by: Yinghai Lu <yhlu.kernel@gmai.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Keith Mannthey <kmannth@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04 16:35:10 -08:00
Ingo Molnar
d47846c586 sysfs: CONFIG_SYSFS_DEPRECATED fix
CONFIG_SYSFS_DEPRECATED=y changed its meaning recently and causes
regressions in working setups that had SYSFS_DEPRECATED disabled.

so rename it to SYSFS_DEPRECATED_V2 so that testers pick up the new
default via 'make oldconfig', even if their old .config's disabled
CONFIG_SYSFS_DEPRECATED ...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-03-04 14:47:04 -08:00
Greg Kroah-Hartman
024440d2ec driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
As things get moved into this config option, the hard date of 2006 does
not work anymore, so update the text to be more descriptive.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-03-04 14:47:04 -08:00
Andi Kleen
0835ab53ea cgroup memory controller: document huge memory/cache overhead in Kconfig
Document huge memory/cache overhead of memory controller in Kconfig

I was a little surprised that 2.6.25-rc* increased struct page for the
memory controller.  At least on many x86-64 machines it will not fit into a
single cache line now anymore and also costs considerable amounts of RAM.
At earlier review I remembered asking for a external data structure for
this.

It's also quite unobvious that a innocent looking Kconfig option with a
single line Kconfig description has such a negative effect.

This patch attempts to document these disadvantages at least so that users
configuring their kernel can make a informed decision.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:16 -08:00
Jan Blunck
6ac08c39a1 Use struct path in fs_struct
* Use struct path in fs_struct.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Peter Zijlstra
052f1dc7eb sched: rt-group: make rt groups scheduling configurable
Make the rt group scheduler compile time configurable.
Keep it experimental for now.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-13 15:45:40 +01:00
Sam Ravnborg
fab1e310d3 kbuild: fix make V=1
When make -s support were added to filechk to
combination created with make V=1 were not
covered.
Fix it by explicitly cover this case too.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Mike Frysinger <vapier@gentoo.org>
2008-02-11 17:43:54 +01:00
Thomas Gleixner
a03c2a48e0 x86: DEBUG_PAGEALLOC: enable after mem_init()
DEBUG_PAGEALLOC must not be enabled before mem_init(). Before this
point there is nothing to allocate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Ingo Molnar
166124fde9 brk: help text typo fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:09 +01:00
Mike Frysinger
d75f4c683f kbuild: silence CHK/UPD messages according to $(quiet)
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-09 10:43:58 +01:00
Yinghai Lu
f6f21c8146 Convert loglevel-related kernel boot parameters to early_param
So we can use them for the early console like console=uart8250 or
earlycon=uart8250 or early_printk

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:42 -08:00
Oleg Nesterov
430c623121 start the global /sbin/init with 0,0 special pids
As Eric pointed out, there is no problem with init starting with sid == pgid
== 0, and this was historical linux behavior changed in 2.6.18.

Remove kernel_init()->__set_special_pids(), this is unneeded and complicates
the rules for sys_setsid().

This change and the previous change in daemonize() mean that /sbin/init does
not need the special "session != 1" hack in sys_setsid() any longer. We can't
remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so
update the comment only.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
8520d7c7f8 teach set_special_pids() to use struct pid
Change set_special_pids() to work with struct pid, not pid_t from global name
space. This again speedups and imho cleanups the code, also a preparation for
the next patch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Pavel Emelyanov
74bd59bb39 namespaces: cleanup the code managed with PID_NS option
Just like with the user namespaces, move the namespace management code into
the separate .c file and mark the (already existing) PID_NS option as "depend
on NAMESPACES"

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Pavel Emelyanov
aee16ce73c namespaces: cleanup the code managed with the USER_NS option
Make the user_namespace.o compilation depend on this option and move the
init_user_ns into user.c file to make the kernel compile and work without the
namespaces support.  This make the user namespace code be organized similar to
other namespaces'.

Also mask the USER_NS option as "depend on NAMESPACES".

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Pavel Emelyanov
ae5e1b22f1 namespaces: move the IPC namespace under IPC_NS option
Currently the IPC namespace management code is spread over the ipc/*.c files.
I moved this code into ipc/namespace.c file which is compiled out when needed.

The linux/ipc_namespace.h file is used to store the prototypes of the
functions in namespace.c and the stubs for NAMESPACES=n case.  This is done
so, because the stub for copy_ipc_namespace requires the knowledge of the
CLONE_NEWIPC flag, which is in sched.h.  But the linux/ipc.h file itself in
included into many many .c files via the sys.h->sem.h sequence so adding the
sched.h into it will make all these .c depend on sched.h which is not that
good.  On the other hand the knowledge about the namespaces stuff is required
in 4 .c files only.

Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
msg.c and shm.c files.  It turned out that moving these functions into
namespaces.c is not that easy because they use many other calls and macros
from the original file.  Moving them would make this patch complicated.  On
the other hand all these functions can be consolidated, so I will send a
separate patch doing this a bit later.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Pavel Emelyanov
58bfdd6dee namespaces: move the UTS namespace under UTS_NS option
Currently all the namespace management code is in the kernel/utsname.c file,
so just compile it out and make stubs in the appropriate header.

The init namespace itself is in init/version.c and is in the kernel all the
time.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Pavel Emelyanov
c5289a6949 namespaces: add the NAMESPACES config option
The option is selectable if EMBEDDED is chosen only.  When the EMBEDDED is off
namespaces will be on.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:23 -08:00
Linus Torvalds
f0f1b3364a Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (112 commits)
  ACPI: fix build warning
  Revert "cpuidle: build fix for non-x86"
  ACPI: update intrd DSDT override console messages
  ACPI: update DSDT override documentation
  ACPI: Add "acpi_no_initrd_override" kernel parameter
  ACPI: its a directory not a folder....
  ACPI: misc cleanups
  ACPI: add missing prink prefix strings
  ACPI: cleanup acpi.h
  ACPICA: fix CONFIG_ACPI_DEBUG_FUNC_TRACE build
  ACPI: video: Ignore ACPI video devices that aren't present in hardware
  ACPI: video: reset brightness on resume
  ACPI: video: call ACPI notifier chain for ACPI video notifications
  ACPI: create notifier chain to get hotkey events to graphics driver
  ACPI: video: delete unused display switch on hotkey event code
  ACPI: video: create "brightness_switch_enabled" modparam
  cpuidle: Add a poll_idle method
  ACPI: cpuidle: Support C1 idle time accounting
  ACPI: enable MWAIT for C1 idle
  ACPI: idle: Fix acpi_safe_halt usages and interrupt enabling/disabling
  ...
2008-02-07 09:45:58 -08:00
Balbir Singh
8cdea7c054 Memory controller: cgroups setup
Setup the memory cgroup and add basic hooks and controls to integrate
and work with the cgroup.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: David Rientjes <rientjes@google.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:18 -08:00
Pavel Emelianov
e552b66170 Memory controller: resource counters
With fixes from David Rientjes <rientjes@google.com>

Introduce generic structures and routines for resource accounting.

Each resource accounting cgroup is supposed to aggregate it,
cgroup_subsystem_state and its resource-specific members within.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:18 -08:00
Markus Gaugusch
71fc47a9ad ACPI: basic initramfs DSDT override support
The basics of DSDT from initramfs. In case this option is selected,
populate_rootfs() is called a bit earlier to have the initramfs content
available during ACPI initialization.

This is a very similar path to the one available at
http://gaugusch.at/kernel.shtml but with some update in the
documentation, default set to No and the change of populate_rootfs() the
"Jeff Mahony way" (which avoids reading the initramfs twice).

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-06 22:07:41 -05:00
Linus Torvalds
3e6bdf473f Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  x86: fix deadlock, make pgd_lock irq-safe
  virtio: fix trivial build bug
  x86: fix mttr trimming
  x86: delay CPA self-test and repeat it
  x86: fix 64-bit sections
  generic: add __FINITDATA
  x86: remove suprious ifdefs from pageattr.c
  x86: mark the .rodata section also NX
  x86: fix iret exception recovery on 64-bit
  cpuidle: dubious one-bit signed bitfield in cpuidle.h
  x86: fix sparse warnings in powernow-k8.c
  x86: fix sparse error in traps_32.c
  x86: trivial sparse/checkpatch in quirks.c
  x86 ptrace: disallow null cs/ss
  MAINTAINERS: RDC R-321x SoC maintainer
  brk randomization: introduce CONFIG_COMPAT_BRK
  brk: check the lower bound properly
  x86: remove X2 workaround
  x86: make spurious fault handler aware of large mappings
  x86: make traps on entry code be debuggable in user space, 64-bit
2008-02-06 13:54:09 -08:00
Ingo Molnar
32a932332c brk randomization: introduce CONFIG_COMPAT_BRK
based on similar patch from: Pavel Machek <pavel@ucw.cz>

Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free
(but not obliged to) randomize the brk area.

Heap randomization breaks ancient binaries, so we keep COMPAT_BRK
enabled by default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-06 22:39:44 +01:00
Adrian Bunk
6c81c32f96 calibrate_delay() must be __cpuinit
calibrate_delay() must be __cpuinit, not __{dev,}init.

I've verified that this is correct for all users.

While doing the latter, I also did the following cleanups:
- remove pointless additional prototypes in C files
- ensure all users #include <linux/delay.h>

This fixes the following section mismatches with CONFIG_HOTPLUG=n,
CONFIG_HOTPLUG_CPU=y:

WARNING: vmlinux.o(.text+0x1128d): Section mismatch: reference to .init.text.1:calibrate_delay (between 'check_cx686_slop' and 'set_cx86_reorder')
WARNING: vmlinux.o(.text+0x25102): Section mismatch: reference to .init.text.1:calibrate_delay (between 'smp_callin' and 'cpu_coregroup_map')

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christian Zankel <chris@zankel.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:08 -08:00
Robert P. J. Day
b25b7819e5 Remove superfluous checks for CONFIG_BLK_DEV_INITRD from initramfs.c
Given that init/Makefile includes initramfs.c in the build only if
CONFIG_BLK_DEV_INITRD is defined, there seems to be no point checking for
it yet again.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:06 -08:00
Adrian Bunk
011e3fcd1e proper prototype for get_filesystem_list()
Ad a proper prototype for migration_init() in include/linux/fs.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:02 -08:00
Adrian Bunk
a1c9eea9e5 proper prototype for signals_init()
Add a proper prototype for signals_init() in include/linux/signal.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:02 -08:00
Andrew Morton
941e492bdb read_current_timer() cleanups
- All implementations can be __devinit

- The function prototypes were in asm/timex.h but they all must be the same,
  so create a single declaration in linux/timex.h.

- uninline the sparc64 version to match the other architectures

- Don't bother #defining ARCH_HAS_READ_CURRENT_TIMER to a particular value.

[ezk@cs.sunysb.edu: fix build]
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:02 -08:00
Matt Mackall
3729145821 slob: correct Kconfig description
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:19 -08:00
Matt Mackall
1e88328111 maps4: make page monitoring /proc file optional
Make /proc/ page monitoring configurable

This puts the following files under an embedded config option:

/proc/pid/clear_refs
/proc/pid/smaps
/proc/pid/pagemap
/proc/kpagecount
/proc/kpageflags

[akpm@linux-foundation.org: Kconfig fix]
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:17 -08:00
Davide Libenzi
f79c343e2e timerfd: un-break CONFIG_TIMERFD
Remove the broken status to CONFIG_TIMERFD.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Mathieu Desnoyers
125e564582 Move Kconfig.instrumentation to arch/Kconfig and init/Kconfig
Move the instrumentation Kconfig to

arch/Kconfig for architecture dependent options
  - oprofile
  - kprobes

and

init/Kconfig for architecture independent options
  - profiling
  - markers

Remove the "Instrumentation Support" menu. Everything moves to "General setup".
Delete the kernel/Kconfig.instrumentation file.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-03 08:58:08 +01:00
Mathieu Desnoyers
fb32e03fdc Create arch/Kconfig
Puts the content of arch/Kconfig in the "General setup" menu.

Linus:

> Should it come with a re-duplication of it's content into each
> architecture, which was the case previously ? The oprofile and kprobes
> menu entries were litteraly cut and pasted from one architecture to
> another. Should we put its content in init/Kconfig then ?

I don't think it's a good idea to go back to making it per-architecture,
although that extensive "depends on <list-of-archiectures-here>" might
indicate that there certainly is room for cleanup there.

And I don't think it's wrong keeping it in kernel/Kconfig.xyz per se, I
just think it's wrong to (a) lump the code together when it really doesn't
necessarily need to and (b) show it to users as some kind of choice that
is tied together (whether it then has common code or not).

On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like

        depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32

really shouldn't exist in a file like kernel/Kconfig.instrumentation.

It would be much better to do

        depends on ARCH_SUPPORTS_KPROBES

in that generic file, and then architectures that do support it would just
have a

        bool ARCH_SUPPORTS_KPROBES
                default y

in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...

Sam Ravnborg:

Stuff it into a new file: arch/Kconfig
We can then extend this file to include all the 'trailing'
Kconfig things that are anyway equal for all ARCHs.

But it should be kept clean - so if we introduce such a file
then we should use ARCH_HAS_whatever in the arch specific Kconfig
files to enable stuff that is not shared.

[...]

The above suggestion is actually not exactly the best way to do it...
First the naming..
A quick grep shows following usage today (in Kconfig files)
ARCH_HAS        51
ARCH_SUPPORTS   4
HAVE_ARCH       7

ARCH_HAS is the clear winner.

In the common Kconfig file do:

config FOO
        depends on ARCH_HAS_FOO
        bool "bla bla"

config ARCH_HAS_FOO
        def_bool n

In the arch specific Kconfig file in a suitable place do:

config SUITABLE_OPTION
        select ARCH_HAS_FOO

The naming of ARCH_HAS_ is fixed and shall be:
ARCH_HAS_<config option it will enable>

Only a single line added pr. architecture.
And we will end up with a (maybe even commented) list of trivial selects.

- Yet another update :

Moving to HAVE_* now.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-03 08:58:07 +01:00
Paul E. McKenney
095031052b RCU: add help text for "RCU implementation type"
This patch supplies help text for the "RCU implementation type"
kernel configuration choice.

Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31 22:45:22 +01:00
Ingo Molnar
12d6f21eac x86: do not PSE on CONFIG_DEBUG_PAGEALLOC=y
get more testing of the c_p_a() code done by not turning off
PSE on DEBUG_PAGEALLOC.

this simplifies the early pagetable setup code, and tests
the largepage-splitup code quite heavily.

In the end, all the largepages will be split up pretty quickly,
so there's no difference to how DEBUG_PAGEALLOC worked before.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:58 +01:00
Mike Travis
dd5af90a7f x86/non-x86: percpu, node ids, apic ids x86.git fixup
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:32 +01:00
Andi Kleen
ca74a6f84e x86: optimize lock prefix switching to run less frequently
On VMs implemented using JITs that cache translated code changing the lock
prefixes is a quite costly operation that forces the JIT to throw away and
retranslate a lot of code.

Previously a SMP kernel would rewrite the locks once for each CPU which
is quite unnecessary. This patch changes the code to never switch at boot in
 the normal case (SMP kernel booting with >1 CPU) or only once for SMP kernel
on UP.

This makes a significant difference in boot up performance on AMD SimNow!
Also I expect it to be a little faster on native systems too because a smp
switch does a lot of text_poke()s which each synchronize the pipeline.

v1->v2: Rename max_cpus
v1->v2: Fix off by one in UP check (Thomas Gleixner)

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:17 +01:00
travis@sgi.com
b32ef636a5 percpu: use a kconfig variable to signal arch specific percpu setup
The use of the __GENERIC_PERCPU is a bit problematic since arches
may want to run their own percpu setup while using the generic
percpu definitions. Replace it through a kconfig variable.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:32:51 +01:00
Roman Zippel
80daa56008 kconfig: use environment option
Use the environment option to provide the ARCH symbol
and the KERNELVERSION symbol.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28 23:14:39 +01:00
Yuichi Nakamura
1322b9def9 sh: syscall audit support.
Support syscall auditing..

Signed-off-by: Yuichi Nakamura <ynakam@hitachisoft.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-01-28 13:18:57 +09:00
Paul E. McKenney
e260be673a Preempt-RCU: implementation
This patch implements a new version of RCU which allows its read-side
critical sections to be preempted. It uses a set of counter pairs
to keep track of the read-side critical sections and flips them
when all tasks exit read-side critical section. The details
of this implementation can be found in this paper -

	http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf

and the article-

	http://lwn.net/Articles/253651/

This patch was developed as a part of the -rt kernel development and
meant to provide better latencies when read-side critical sections of
RCU don't disable preemption.  As a consequence of keeping track of RCU
readers, the readers have a slight overhead (optimizations in the paper).
This implementation co-exists with the "classic" RCU implementations
and can be switched to at compiler.

Also includes RCU tracing summarized in debugfs.

[ akpm@linux-foundation.org: build fixes on non-preempt architectures ]

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Reviewed-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:24 +01:00
Gautham R Shenoy
d221938c04 cpu-hotplug: refcount based cpu hotplug
This patch implements a Refcount + Waitqueue based model for
cpu-hotplug.

Now, a thread which wants to prevent cpu-hotplug, will bump up a global
refcount and the thread which wants to perform a cpu-hotplug operation
will block till the global refcount goes to zero.

The readers, if any, during an ongoing cpu-hotplug operation are blocked
until the cpu-hotplug operation is over.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:01 +01:00
Linus Torvalds
b47711bfbc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  selinux: make mls_compute_sid always polyinstantiate
  security/selinux: constify function pointer tables and fields
  security: add a secctx_to_secid() hook
  security: call security_file_permission from rw_verify_area
  security: remove security_sb_post_mountroot hook
  Security: remove security.h include from mm.h
  Security: remove security_file_mmap hook sparse-warnings (NULL as 0).
  Security: add get, set, and cloning of superblock security information
  security/selinux: Add missing "space"
2008-01-25 08:44:29 -08:00
Randy Dunlap
9148fe8767 sysfs: make SYSFS_DEPRECATED depend on SYSFS
Make SYSFS_DEPRECATED depend on SYSFS since files that check
CONFIG_SYSFS_DEPRECATED don't check for CONFIG_SYSFS first.
Also don't prompt user about SYSFS_DEPRECATED if SYSFS=n.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:41 -08:00
Kay Sievers
edfaa7c365 Driver core: convert block from raw kobjects to core devices
This moves the block devices to /sys/class/block. It will create a
flat list of all block devices, with the disks and partitions in one
directory. For compatibility /sys/block is created and contains symlinks
to the disks.

  /sys/class/block
  |-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
  |-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1
  |-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10
  |-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5
  |-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6
  |-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7
  |-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8
  |-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9
  `-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0

  /sys/block/
  |-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
  `-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:36 -08:00
H. Peter Anvin
bced95283e security: remove security_sb_post_mountroot hook
The security_sb_post_mountroot() hook is long-since obsolete, and is
fundamentally broken: it is never invoked if someone uses initramfs.
This is particularly damaging, because the existence of this hook has
been used as motivation for not using initramfs.

Stephen Smalley confirmed on 2007-07-19 that this hook was originally
used by SELinux but can now be safely removed:

     http://marc.info/?l=linux-kernel&m=118485683612916&w=2

Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-01-25 11:29:50 +11:00
Linus Torvalds
158a962422 Unify /proc/slabinfo configuration
Both SLUB and SLAB really did almost exactly the same thing for
/proc/slabinfo setup, using duplicate code and per-allocator #ifdef's.

This just creates a common CONFIG_SLABINFO that is enabled by both SLUB
and SLAB, and shares all the setup code.  Maybe SLOB will want this some
day too.

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-02 13:04:48 -08:00
Len Brown
982286d1b8 Pull bugzilla-9345 into release branch 2007-12-06 16:52:00 -05:00
Srivatsa Vaddagiri
d842de871c sched: cpu accounting controller (V2)
Commit cfb5285660 removed a useful feature for
us, which provided a cpu accounting resource controller.  This feature would be
useful if someone wants to group tasks only for accounting purpose and doesnt
really want to exercise any control over their cpu consumption.

The patch below reintroduces the feature. It is based on Paul Menage's
original patch (Commit 62d0df6406), with
these differences:

        - Removed load average information. I felt it needs more thought (esp
	  to deal with SMP and virtualized platforms) and can be added for
	  2.6.25 after more discussions.
        - Convert group cpu usage to be nanosecond accurate (as rest of the cfs
	  stats are) and invoke cpuacct_charge() from the respective scheduler
	  classes
	- Make accounting scalable on SMP systems by splitting the usage
	  counter to be per-cpu
	- Move the code from kernel/cpu_acct.c to kernel/sched.c (since the
	  code is not big enough to warrant a new file and also this rightly
	  needs to live inside the scheduler. Also things like accessing
	  rq->lock while reading cpu usage becomes easier if the code lived in
	  kernel/sched.c)

The patch also modifies the cpu controller not to provide the same accounting
information.

Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>

 Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran
 some simple tests like cpuspin (spin on the cpu), ran several tasks in
 the same group and timed them. Compared their time stamps with
 cpuacct.usage.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-12-02 20:04:49 +01:00
Rafael J. Wysocki
8baabde66c Freezer: Fix s2disk resume from initrd
Add appropriate freezer annotations to handle_initrd(), so that it's possible
to resume from disk from an initrd.

http://bugzilla.kernel.org/show_bug.cgi?id=9345

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Chris Friedhoff <chris@friedhoff.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-11-20 22:22:42 -05:00
Mike Frysinger
529a73fbae Blackfin arch: punt CONFIG_BFIN -- we already have CONFIG_BLACKFIN
Signed-off-by: Mike Frysinger <michael.frysinger@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
2007-11-23 14:28:44 +08:00
Eric W. Biederman
57d5f66b86 pidns: Place under CONFIG_EXPERIMENTAL
This is my trivial patch to swat innumerable little bugs with a single
blow.

After some intensive review (my apologies for not having gotten to this
sooner) what we have looks like a good base to build on with the current
pid namespace code but it is not complete, and it is still much to simple
to find issues where the kernel does the wrong thing outside of the initial
pid namespace.

Until the dust settles and we are certain we have the ABI and the
implementation is as correct as humanly possible let's keep process ID
namespaces behind CONFIG_EXPERIMENTAL.

Allowing us the option of fixing any ABI or other bugs we find as long as
they are minor.

Allowing users of the kernel to avoid those bugs simply by ensuring their
kernel does not have support for multiple pid namespaces.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Kir Kolyshkin <kir@swsoft.com>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:43 -08:00
Andrew Morton
cfb5285660 revert "Task Control Groups: example CPU accounting subsystem"
Revert 62d0df6406.

This was originally intended as a simple initial example of how to create a
control groups subsystem; it wasn't intended for mainline, but I didn't make
this clear enough to Andrew.

The CFS cgroup subsystem now has better functionality for the per-cgroup usage
accounting (based directly on CFS stats) than the "usage" status file in this
patch, and the "load" status file is rather simplistic - although having a
per-cgroup load average report would be a useful feature, I don't believe this
patch actually provides it.  If it gets into the final 2.6.24 we'd probably
have to support this interface for ever.

Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:40 -08:00
Adrian Bunk
e6fe6649b4 sched: proper prototype for kernel/sched.c:migration_init()
This patch adds a proper prototype for migration_init() in
include/linux/sched.h

Since there's no point in always returning 0 to a caller that doesn't check
the return value it also changes the function to return void.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-09 22:39:39 +01:00
Ingo Molnar
8ef93cf114 sched: mark CONFIG_FAIR_GROUP_SCHED as !EXPERIMENTAL
mark CONFIG_FAIR_GROUP_SCHED as !EXPERIMENTAL. All bugs have been
fixed and it's perfect ;-)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:51 +02:00
Al Viro
74c3cbe33b [PATCH] audit: watching subtrees
New kind of audit rule predicates: "object is visible in given subtree".
The part that can be sanely implemented, that is.  Limitations:
	* if you have hardlink from outside of tree, you'd better watch
it too (or just watch the object itself, obviously)
	* if you mount something under a watched tree, tell audit
that new chunk should be added to watched subtrees
	* if you umount something in a watched tree and it's still mounted
elsewhere, you will get matches on events happening there.  New command
tells audit to recalculate the trees, trimming such sources of false
positives.

Note that it's _not_ about path - if something mounted in several places
(multiple mount, bindings, different namespaces, etc.), the match does
_not_ depend on which one we are using for access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2007-10-21 02:37:45 -04:00
Simon Arlott
211fee8a82 spelling fixes: init/
Spelling fix in init/.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 01:28:29 +02:00
Robert P. J. Day
d8af7c6ab0 Drop the superfluous test for an old version of gcc.
The header file <linux/compiler.h> already enforces a suitably recent
version of gcc, so there's no point checking for that again.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19 23:13:32 +02:00
Srivatsa Vaddagiri
68318b8e0b Hook up group scheduler with control groups
Enable "cgroup" (formerly containers) based fair group scheduling.  This
will let administrator create arbitrary groups of tasks (using "cgroup"
pseudo filesystem) and control their cpu bandwidth usage.

[akpm@linux-foundation.org: fix cpp condition]
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:51 -07:00
Serge E. Hallyn
858d72ead4 cgroups: implement namespace tracking subsystem
When a task enters a new namespace via a clone() or unshare(), a new cgroup
is created and the task moves into it.

This version names cgroups which are automatically created using
cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or
cloned process.  (Thanks Pavel for the idea) This is safe because if the
process unshares again, it will create

	/cgroups/(...)/node_<pid>/node_<pid>

The only possibilities (AFAICT) for a -EEXIST on unshare are

	1. pid wraparound
	2. a process fails an unshare, then tries again.

Case 1 is unlikely enough that I ignore it (at least for now).  In case 2, the
node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare()
succeed.

Changelog:
	Name cloned cgroups as "node_<pid>".

[clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:37 -07:00
Paul Menage
006cb99200 Task Control Groups: simple task cgroup debug info subsystem
This example subsystem exports debugging information as an aid to diagnosing
refcount leaks, etc, in the cgroup framework.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:36 -07:00
Paul Menage
62d0df6406 Task Control Groups: example CPU accounting subsystem
This example demonstrates how to use the generic cgroup subsystem for a
simple resource tracker that counts, for the processes in a cgroup, the
total CPU time used and the %CPU used in the last complete 10 second interval.

Portions contributed by Balbir Singh <balbir@in.ibm.com>

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:36 -07:00
Paul Menage
8793d854ed Task Control Groups: make cpusets a client of cgroups
Remove the filesystem support logic from the cpusets system and makes cpusets
a cgroup subsystem

The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
passed through to the cgroup filesystem with the appropriate options to
emulate the old cpuset filesystem behaviour.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:36 -07:00
Paul Menage
ddbcc7e8e5 Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------

There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.

This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.

The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.

- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel

This patch:

Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:36 -07:00
Stephen Hemminger
c80544dc0b sparse pointer use of zero as null
Get rid of sparse related warnings from places that use integer as NULL
pointer.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Avi Kivity
e98c320291 Move PREEMPT_NOTIFIERS into an always-included Kconfig
Kconfig.preempt is not included on some archs (for example, m68k).  On those
archs, the Kconfig machinery complains that KVM selects an undefined symbol
PREEMPT_NOTIFIERS (which lives in Kconfig.preempt).

So move the offending symbol into a Kconfig file which is included by
everyone.

Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:55 -07:00
Linus Torvalds
821f3eff7c Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
  kbuild: introduce ccflags-y, asflags-y and ldflags-y
  kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP
  kbuild: enable use of AFLAGS and CFLAGS on commandline
  kbuild: enable 'make AFLAGS=...' to add additional options to AS
  kbuild: fix AFLAGS use in h8300 and m68knommu
  kbuild: check for wrong use of CFLAGS
  kbuild: enable 'make CFLAGS=...' to add additional options to CC
  kbuild: fix up CFLAGS usage
  kbuild: make modpost detect unterminated device id lists
  kbuild: call export_report from the Makefile
  kbuild: move Kai Germaschewski to CREDITS
  kconfig/menuconfig: distinguish between selected-by-another options and comments
  kconfig: tristate choices with mixed tristate and boolean values
  include/linux/Kbuild: remove duplicate entries
  kbuild: kill backward compatibility checks
  kbuild: kill EXTRA_ARFLAGS
  kbuild: fix documentation in makefiles.txt
  kbuild: call make once for all targets when O=.. is used
  kbuild: pass -g to assembler under CONFIG_DEBUG_INFO
  kbuild: update _shipped files for kconfig syntax cleanup
  ...

Fix up conflicts in arch/um/sys-{x86_64,i386}/Makefile manually.
2007-10-16 11:23:06 -07:00
Mel Gorman
ac0e5b7a6b remove PAGE_GROUP_BY_MOBILITY
Grouping pages by mobility can be disabled at compile-time. This was
considered undesirable by a number of people. However, in the current stack of
patches, it is not a simple case of just dropping the configurable patch as it
would cause merge conflicts.  This patch backs out the configuration option.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:43:00 -07:00
Mel Gorman
b92a6edd4b Add a configure option to group pages by mobility
The grouping mechanism has some memory overhead and a more complex allocation
path.  This patch allows the strategy to be disabled for small memory systems
or if it is known the workload is suffering because of the strategy.  It also
acts to show where the page groupings strategy interacts with the standard
buddy allocator.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
Randy Dunlap
bfe8df3d31 slow down printk during boot
Optionally add a boot delay after each kernel printk() call, crudely
measured in milliseconds, with a maximum delay of 10 seconds per printk.

Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.):
"lpj=loops_per_jiffy boot_delay=100"
to the kernel command line.

It has been useful in cases like "during boot, my machine just reboots or the
screen goes black" by slowing down printk, (and adding initcall_debug), we can
usually see the last thing that happened before the lights went out which is
usually a valuable clue.

[akpm@linux-foundation.org: not all architectures implement CONFIG_HZ]
[akpm@linux-foundation.org: fix lots of stuff]
[bunk@stusta.de: kernel/printk.c: make 2 variables static]
[heiko.carstens@de.ibm.com: fix slow down printk on boot compile error]
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:49 -07:00
Srivatsa Vaddagiri
fb615581c7 sched: group scheduler, fix coding style issues
Fix coding style issues reported by Randy Dunlap and others

Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:12 +02:00
Ingo Molnar
de8d585a12 sched: enable CONFIG_FAIR_GROUP_SCHED=y by default
enable CONFIG_FAIR_GROUP_SCHED=y by default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:09 +02:00
Ingo Molnar
7ed2be459b sched: fair-group sched, cleanups
fair-group sched, cleanups.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:09 +02:00
Srivatsa Vaddagiri
24e377a832 sched: add fair-user scheduler
Enable user-id based fair group scheduling. This is useful for anyone
who wants to test the group scheduler w/o having to enable
CONFIG_CGROUPS.

A separate scheduling group (i.e struct task_grp) is automatically created for 
every new user added to the system. Upon uid change for a task, it is made to 
move to the corresponding scheduling group.

A /proc tunable (/proc/root_user_share) is also provided to tune root
user's quota of cpu bandwidth.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:09 +02:00
Srivatsa Vaddagiri
9b5b77512d sched: clean up code under CONFIG_FAIR_GROUP_SCHED
With the view of supporting user-id based fair scheduling (and not just
container-based fair scheduling), this patch renames several functions
and makes them independent of whether they are being used for container
or user-id based fair scheduling.

Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating
less-sized array for tg->cfs_rq[] and tf->se[]).

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:09 +02:00
Srivatsa Vaddagiri
29f59db3a7 sched: group-scheduler core
Add interface to control cpu bandwidth allocation to task-groups.

(not yet configurable, due to missing CONFIG_CONTAINERS)

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15 17:00:07 +02:00
Sam Ravnborg
a0f97e06a4 kbuild: enable 'make CFLAGS=...' to add additional options to CC
The variable CFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
On top of that several people over time has asked for a way to
pass in additional flags to gcc.

This patch replace use of CFLAGS with KBUILD_CFLAGS all over the
tree and enabling one to use:
make CFLAGS=...
to specify additional gcc commandline options.

One usecase is when trying to find gcc bugs but other
use cases has been requested too.

Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k

Test was simple to do a defconfig build, apply the patch and check
that nothing got rebuild.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-10-14 22:21:35 +02:00
Andrew Morton
e42601973b disable sys_timerfd() for 2.6.23
There is still some confusion and disagreement over what this interface should
actually do.  So it is best that we disable it in 2.6.23 until we get that
fully sorted out.

(sys_timerfd() was present in 2.6.22 but it was apparently broken, so here we
assume that nobody is using it yet).

Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Davide Libenzi <davidel@xmailserver.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Nigel Cunningham
019ad4a0a6 Fix failure to resume from initrds
Commit 8314418629 (Freezer: make kernel
threads nonfreezable by default) breaks freezing when attempting to resume
from an initrd, because the init (which is freezeable) spins while waiting
for another thread to run /linuxrc, but doesn't check whether it has been
told to enter the refrigerator.  The original patch replaced a call to
try_to_freeze() with a call to yield().  I believe a simple reversion is
wrong because if !CONFIG_PM_SLEEP, try_to_freeze() is a noop.  It should
still yield.

Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:17 -07:00
Hugh Dickins
62e6f1e8bb fix maxcpus=1 oops in show_stat()
Alexey Dobriyan reports that maxcpus=1 is still broken in 2.6.23-rc4:
if CONFIG_HOTPLUG_CPU is not set, x86_64 bootup oopses in show_stat() -
for_each_possible_cpu accesses a per-cpu area which was never set up.

Alexey identified commit 61ec7567db
(ACPI: boot correctly with "nosmp" or "maxcpus=0") as the origin;
but it's not really to blame, just exposes a bug in 2.6.23-rc1's commit
8b3b295502 (Especially when !CONFIG_HOTPLUG_CPU,
avoid needlessy allocating resources for CPUs that can never become available).

rc1's test for max_cpus < 2 in start_kernel() wasn't working because
max_cpus was still NR_CPUS at that point: until rc4 moved the maxcpus
parsing earlier.  Now it sets cpu_possible_map to 1 before allocating
all possible per-cpu areas; then smp_init() expands cpu_possible_map
to cpu_present_map (0xf in my case) later on.

rc1's commit has good intentions, but expects cpu_present_map to be
limited by maxcpus, which is only the case on i386.  cpus_and(possible,
possible,present) might be good, but needs an audit of cpu_present_map
uses - there may well be assumptions that any cpu present is possible.

So stay safe for now and just revert those #ifndef CONFIG_HOTPLUG_CPU
optimizations in rc1's commit.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Len Brown <lenb@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-30 21:54:31 -07:00
Hugh Dickins
8134097717 fix maxcpus=N parsing
Commit 61ec7567db ('ACPI: boot correctly
with "nosmp" or "maxcpus=0"') broke 'maxcpus=' handling on x86[-64].

maxcpus=N is now having no effect on x86_64, and freezing bootup on i386
(because of inconsistency with the separate maxcpus parsing down in
arch/i386, I guess).  That's because early_param parsing is a little
different from __setup parsing, and needs the "=" omitted: then it seems
to work as the original commit intended (no mention of IO-APIC in
/proc/interrupts when maxcpus=0).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-27 10:27:48 -07:00
Len Brown
61ec7567db ACPI: boot correctly with "nosmp" or "maxcpus=0"
In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled.
However, in ACPI mode, these parameters didn't completely disable
the IO APIC initialization code and boot failed.

init/main.c:
	Disable the IO_APIC if "nosmp" or "maxcpus=0"
	undefine disable_ioapic_setup() when it doesn't apply.

i386:
	delete ioapic_setup(), it was a duplicate of parse_noapic()
	delete undefinition of disable_ioapic_setup()

x86_64:
	rename disable_ioapic_setup() to parse_noapic() to match i386
	define disable_ioapic_setup() in header to match i386

http://bugzilla.kernel.org/show_bug.cgi?id=1641

Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-08-21 00:33:35 -04:00
Al Boldi
ff0cfc66cd Kconfig: remove top level menu "Code maturity level options"
Remove the top level menu "Code maturity level options", and moves its
options into menu "General setup".

This makes Kconfig less cluttered and easier to setup.

Signed-off-by: Al Boldi <a1426z@gawab.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:43 -07:00
Adrian Bunk
448e3cee83 ANON_INODES shouldn't be user visible
There doesn't seem to be a good reason for ANON_INODES being
an user visible option.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:42 -07:00
Linus Torvalds
6a302358d8 Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6.23
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6.23:
  sh: Fix fs.h removal from mm.h regressions.
  sh: fix get_wchan() for SH kernels without framepointers
  sh: arch/sh/boot - fix shell usage
  rtc: rtc-sh: Correct sh_rtc_set_time() for some SH-3 parts.
  sh: remove support for sh7300 and solution engine 7300
  sh: Add sh to the CC_OPTIMIZE_FOR_SIZE dependencies.
  sh: Kill off virt_to_bus()/bus_to_virt().
  sh: sh-sci - fix SH7708 support
  sh: Restrict DSP support to specific CPUs.
  sh: Silence sq compile warning on sh4 nommu.
  sh: Kill the rest of the SE73180 cruft.
  sh: remove support for sh73180 and solution engine 73180
  sh: remove old broken pint code
  sh: Reclaim beginning of P3 space for vmalloc area.
  sh: Fix Dreamcast DMA issues.
  sh: Add kmap_coherent()/kunmap_coherent() interface for SH-4.
2007-07-30 21:54:37 -07:00
Al Viro
b0a5ab9315 initramfs: missing __init
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:11:56 -07:00
Paul Mundt
32582fa460 sh: Add sh to the CC_OPTIMIZE_FOR_SIZE dependencies.
Presently we only use this with CONFIG_EXPERIMENTAL, but it is
something that can be supported commonly.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-26 15:37:44 +09:00
Rafael J. Wysocki
8314418629 Freezer: make kernel threads nonfreezable by default
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves.  This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie.  to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear.  Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:02 -07:00
Christoph Lameter
a0acd82080 Make SLUB the default allocator
There are some reports that 2.6.22 has SLUB as the default. Not
true!

This will make SLUB the default for 2.6.23.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:02 -07:00
Jan Beulich
8b3b295502 adjust nosmp handling
Especially when !CONFIG_HOTPLUG_CPU, avoid needlessy allocating resources for
CPUs that can never become available.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:47 -07:00
Dave Jones
97842216b8 Allow softlockup to be runtime disabled
It's useful sometimes to disable the softlockup checker at boottime.
Especially if it triggers during a distro install.

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:47 -07:00
Cedric Le Goater
acce292c82 user namespace: add the framework
Basically, it will allow a process to unshare its user_struct table,
resetting at the same time its own user_struct and all the associated
accounting.

A new root user (uid == 0) is added to the user namespace upon creation.
Such root users have full privileges and it seems that theses privileges
should be controlled through some means (process capabilities ?)

The unshare is not included in this patch.

Changes since [try #4]:
	- Updated get_user_ns and put_user_ns to accept NULL, and
	  get_user_ns to return the namespace.

Changes since [try #3]:
	- moved struct user_namespace to files user_namespace.{c,h}

Changes since [try #2]:
	- removed struct user_namespace* argument from find_user()

Changes since [try #1]:
	- removed struct user_namespace* argument from find_user()
	- added a root_user per user namespace

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Andrew Morgan <agm@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:47 -07:00
Cedric Le Goater
7d69a1f4a7 remove CONFIG_UTS_NS and CONFIG_IPC_NS
CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
deactivate the unshare of the uts and ipc namespaces and do not improve
performance.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:47 -07:00
Pierre Ossman
cc1ed7542c init: wait for asynchronously scanned block devices
Some buses (e.g.  USB and MMC) do their scanning of devices in the
background, causing a race between them and prepare_namespace().  In order
to be able to use these buses without an initrd, we now wait for the device
specified in root= to actually show up.

If the device never shows up than we will hang in an infinite loop.  In
order to not mess with setups that reboot on panic, the feature must be
turned on via the command line option "rootwait".

[bunk@stusta.de: root_wait can become static]
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:45 -07:00
Jan Engelhardt
66da573320 Use menuconfig objects II - module menu
Change menuconfig objects from "menu, config" into "menuconfig" so that the
user can disable the whole feature without entering its menu first.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:40 -07:00
Paul Mundt
84a01c2f8e slob: sparsemem support
Currently slob is disabled if we're using sparsemem, due to an earlier
patch from Goto-san.  Slob and static sparsemem work without any trouble as
it is, and the only hiccup is a missing slab_is_available() in the case of
sparsemem extreme.  With this, we're rid of the last set of restrictions
for slob usage.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:36 -07:00
Yinghai Lu
18a8bd949d serial: convert early_uart to earlycon for 8250
Beacuse SERIAL_PORT_DFNS is removed from include/asm-i386/serial.h and
include/asm-x86_64/serial.h.  the serial8250_ports need to be probed late in
serial initializing stage.  the console_init=>serial8250_console_init=>
register_console=>serial8250_console_setup will return -ENDEV, and console
ttyS0 can not be enabled at that time.  need to wait till uart_add_one_port in
drivers/serial/serial_core.c to call register_console to get console ttyS0.
that is too late.

Make early_uart to use early_param, so uart console can be used earlier.  Make
it to be bootconsole with CON_BOOT flag, so can use console handover feature.
and it will switch to corresponding normal serial console automatically.

new command line will be:
	console=uart8250,io,0x3f8,9600n8
	console=uart8250,mmio,0xff5e0000,115200n8
or
	earlycon=uart8250,io,0x3f8,9600n8
	earlycon=uart8250,mmio,0xff5e0000,115200n8

it will print in very early stage:
	Early serial console at I/O port 0x3f8 (options '9600n8')
	console [uart0] enabled
later for console it will print:
	console handover: boot [uart0] -> real [ttyS0]

Signed-off-by: <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:35 -07:00
Jan Engelhardt
5f5c926e3c block/Kconfig already has its own "menuconfig" so remove these
"menu, endmenu" that did not get cleaned up in the block patch
[ http://lkml.org/lkml/2007/4/10/251 ]

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 13:43:28 +02:00
Ingo Molnar
1df21055e3 sched: add init_idle_bootup_task()
add the init_idle_bootup_task() callback to the bootup thread,
unused at the moment. (CFS will use it to switch the scheduling
class of the boot thread to the idle class)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09 18:51:58 +02:00
Sam Ravnborg
92080309df init/main: use __init_refok to fix section mismatch
Kill a special case in modpost by introducing the
__init_refok marker.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-05-19 09:11:58 +02:00
Christoph Lameter
9fbf09a09e SLUB: Remove depends on EXPERIMENTAL and !ARCH_USES_SLAB_PAGE_STRUCT
No arch sets ARCH_USES_SLAB_PAGE_STRUCT anymore.

Remove the experimental dependency as well since we want to have it as
a real alternative to SLAB.

It all comes down to killing a single line from init/Kconfig.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:03 -07:00
Nick Piggin
afc0cedbe9 slob: implement RCU freeing
The SLOB allocator should implement SLAB_DESTROY_BY_RCU correctly, because
even on UP, RCU freeing semantics are not equivalent to simply freeing
immediately.  This also allows SLOB to be used on SMP.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:02 -07:00
Davide Libenzi
e1ad7468c7 signal/timer/event: eventfd core
This is a very simple and light file descriptor, that can be used as event
wait/dispatch by userspace (both wait and dispatch) and by the kernel
(dispatch only).  It can be used instead of pipe(2) in all cases where those
would simply be used to signal events.  Their kernel overhead is much lower
than pipes, and they do not consume two fds.  When used in the kernel, it can
offer an fd-bridge to enable, for example, functionalities like KAIO or
syslets/threadlets to signal to an fd the completion of certain operations.
But more in general, an eventfd can be used by the kernel to signal readiness,
in a POSIX poll/select way, of interfaces that would otherwise be incompatible
with it.  The API is:

int eventfd(unsigned int count);

The eventfd API accepts an initial "count" parameter, and returns an eventfd
fd.  It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

The POLLIN flag is raised when the internal counter is greater than zero.

The POLLOUT flag is raised when at least a value of "1" can be written to the
internal counter.

The POLLERR flag is raised when an overflow in the counter value is detected.

The write(2) operation can never overflow the counter, since it blocks (unless
O_NONBLOCK is set, in which case -EAGAIN is returned).

But the eventfd_signal() function can do it, since it's supposed to not sleep
during its operation.

The read(2) function reads the __u64 counter value, and reset the internal
value to zero.  If the value read is equal to (__u64) -1, an overflow happened
on the internal counter (due to 2^64 eventfd_signal() posts that has never
been retired - unlickely, but possible).

The write(2) call writes an __u64 count value, and adds it to the current
counter.  The eventfd fd supports O_NONBLOCK also.

On the kernel side, we have:

struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);

The eventfd_fget() should be called to get a struct file* from an eventfd fd
(this is an fget() + check of f_op being an eventfd fops pointer).

The kernel can then call eventfd_signal() every time it wants to post an event
to userspace.  The eventfd_signal() function can be called from any context.
An eventfd() simple test and bench is available here:

http://www.xmailserver.org/eventfd-bench.c

This is the eventfd-based version of pipetest-4 (pipe(2) based):

http://www.xmailserver.org/pipetest-4.c

Not that performance matters much in the eventfd case, but eventfd-bench
shows almost as double as performance than pipetest-4.

[akpm@linux-foundation.org: fix i386 build]
[akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:36 -07:00
Davide Libenzi
b215e28399 signal/timer/event: timerfd core
This patch introduces a new system call for timers events delivered though
file descriptors.  This allows timer event to be used with standard POSIX
poll(2), select(2) and read(2).  As a consequence of supporting the Linux
f_op->poll subsystem, they can be used with epoll(2) too.

The system call is defined as:

int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
w/out going through the close/open cycle (same as signalfd).  If "ufd" is -1,
s new file descriptor will be created, otherwise the existing "ufd" will be
re-programmed.

The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME.  The time
specified in the "utmr->it_value" parameter is the expiry time for the timer.

If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
otherwise it's a relative time.

If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
tv_nsec == 0), this is the period at which the following ticks should be
generated.

The "utmr->it_interval" should be set to zero if only one tick is requested.
Setting the "utmr->it_value" to zero will disable the timer, or will create a
timerfd without the timer enabled.

The function returns the new (or same, in case "ufd" is a valid timerfd
descriptor) file, or -1 in case of error.

As stated before, the timerfd file descriptor supports poll(2), select(2) and
epoll(2).  When a timer event happened on the timerfd, a POLLIN mask will be
returned.

The read(2) call can be used, and it will return a u32 variable holding the
number of "ticks" that happened on the interface since the last call to
read(2).  The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
be returned if no ticks happened.

A quick test program, shows timerfd working correctly on my amd64 box:

http://www.xmailserver.org/timerfd-test.c

[akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:36 -07:00
Davide Libenzi
fba2afaaec signal/timer/event: signalfd core
This patch series implements the new signalfd() system call.

I took part of the original Linus code (and you know how badly it can be
broken :), and I added even more breakage ;) Signals are fetched from the same
signal queue used by the process, so signalfd will compete with standard
kernel delivery in dequeue_signal().  If you want to reliably fetch signals on
the signalfd file, you need to block them with sigprocmask(SIG_BLOCK).  This
seems to be working fine on my Dual Opteron machine.  I made a quick test
program for it:

http://www.xmailserver.org/signafd-test.c

The signalfd() system call implements signal delivery into a file descriptor
receiver.  The signalfd file descriptor if created with the following API:

int signalfd(int ufd, const sigset_t *mask, size_t masksize);

The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
to close/create cycle (Linus idea).  Use "ufd" == -1 if you want a brand new
signalfd file.

The "mask" allows to specify the signal mask of signals that we are interested
in.  The "masksize" parameter is the size of "mask".

The signalfd fd supports the poll(2) and read(2) system calls.  The poll(2)
will return POLLIN when signals are available to be dequeued.  As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can use
used together with epoll(2) too.

The read(2) system call will return a "struct signalfd_siginfo" structure in
the userspace supplied buffer.  The return value is the number of bytes copied
in the supplied buffer, or -1 in case of error.  The read(2) call can also
return 0, in case the sighand structure to which the signalfd was attached,
has been orphaned.  The O_NONBLOCK flag is also supported, and read(2) will
return -EAGAIN in case no signal is available.

If the size of the buffer passed to read(2) is lower than sizeof(struct
signalfd_siginfo), -EINVAL is returned.  A read from the signalfd can also
return -ERESTARTSYS in case a signal hits the process.  The format of the
struct signalfd_siginfo is, and the valid fields depends of the (->code &
__SI_MASK) value, in the same way a struct siginfo would:

struct signalfd_siginfo {
	__u32 signo;	/* si_signo */
	__s32 err;	/* si_errno */
	__s32 code;	/* si_code */
	__u32 pid;	/* si_pid */
	__u32 uid;	/* si_uid */
	__s32 fd;	/* si_fd */
	__u32 tid;	/* si_fd */
	__u32 band;	/* si_band */
	__u32 overrun;	/* si_overrun */
	__u32 trapno;	/* si_trapno */
	__s32 status;	/* si_status */
	__s32 svint;	/* si_int */
	__u64 svptr;	/* si_ptr */
	__u64 utime;	/* si_utime */
	__u64 stime;	/* si_stime */
	__u64 addr;	/* si_addr */
};

[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:36 -07:00
Davide Libenzi
5dc8bf8132 signal/timer/event fds: anonymous inode source
This patch add an anonymous inode source, to be used for files that need
and inode only in order to create a file*. We do not care of having an
inode for each file, and we do not even care of having different names in
the associated dentries (dentry names will be same for classes of file*).
This allow code reuse, and will be used by epoll, signalfd and timerfd
(and whatever else there'll be).

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:36 -07:00
Sukadev Bhattiprolu
0e29b24aa6 Explicitly set pgid and sid of init process
Explicitly set pgid and sid of init process to 1.

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: <containers@lists.osdl.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:35 -07:00
Christoph Lameter
d4751a2797 SLUB: SLUB_DEBUG must depend on SLUB
Otherwise people get asked about SLUB_DEBUG even if they have another
slab allocator enabled.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-10 09:26:53 -07:00
Linus Torvalds
9a9136e270 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
  sound: convert "sound" subdirectory to UTF-8
  MAINTAINERS: Add cxacru website/mailing list
  include files: convert "include" subdirectory to UTF-8
  general: convert "kernel" subdirectory to UTF-8
  documentation: convert the Documentation directory to UTF-8
  Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
  remove broken URLs from net drivers' output
  Magic number prefix consistency change to Documentation/magic-number.txt
  trivial: s/i_sem /i_mutex/
  fix file specification in comments
  drivers/base/platform.c: fix small typo in doc
  misc doc and kconfig typos
  Remove obsolete fat_cvf help text
  Fix occurrences of "the the "
  Fix minor typoes in kernel/module.c
  Kconfig: Remove reference to external mqueue library
  Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
  Correct comments in genrtc.c to refer to correct /proc file.
  Fix more "deprecated" spellos.
  Fix "deprecated" typoes.
  ...

Fix trivial comment conflict in kernel/relay.c.
2007-05-09 12:54:17 -07:00
Eric W. Biederman
73c279927f kthread: don't depend on work queues
Currently there is a circular reference between work queue initialization
and kthread initialization.  This prevents the kthread infrastructure from
initializing until after work queues have been initialized.

We want the properties of tasks created with kthread_create to be as close
as possible to the init_task and to not be contaminated by user processes.
The later we start our kthreadd that creates these tasks the harder it is
to avoid contamination from user processes and the more of a mess we have
to clean up because the defaults have changed on us.

So this patch modifies the kthread support to not use work queues but to
instead use a simple list of structures, and to have kthreadd start from
init_task immediately after our kernel thread that execs /sbin/init.

By being a true child of init_task we only have to change those process
settings that we want to have different from init_task, such as our process
name, the cpus that are allowed, blocking all signals and setting SIGCHLD
to SIG_IGN so that all of our children are reaped automatically.

By being a true child of init_task we also naturally get our ppid set to 0
and do not wind up as a child of PID == 1.  Ensuring that tasks generated
by kthread_create will not slow down the functioning of the wait family of
functions.

[akpm@linux-foundation.org: use interruptible sleeps]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:53 -07:00
Dave Gilbert
dd2a345f8f Display all possible partitions when the root filesystem failed to mount
Display all possible partitions when the root filesystem is not mounted.
This helps to track spell'o's and missing drivers.

Updated to work with newer kernels.

Example output:

VFS: Cannot open root device "foobar" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
0800    8388608 sda driver: sd
  0801     192748 sda1
  0802    8193150 sda2
0810    4194304 sdb driver: sd
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

[akpm@linux-foundation.org: cleanups, fix printk warnings]
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: Dave Gilbert <linux@treblig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Christoph Lameter
34013886ef Fix spellings of slab allocator section in init/Kconfig
Fix some of the spelling issues. Fix sentences. Discourage SLOB use
since SLUB can pack objects denser.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:46 -07:00
Christoph Lameter
41ecc55b8a SLUB: add CONFIG_SLUB_DEBUG
CONFIG_SLUB_DEBUG can be used to switch off the debugging and sysfs components
of SLUB.  Thus SLUB will be able to replace SLOB.  SLUB can arrange objects in
a denser way than SLOB and the code size should be minimal without debugging
and sysfs support.

Note that CONFIG_SLUB_DEBUG is materially different from CONFIG_SLAB_DEBUG.
CONFIG_SLAB_DEBUG is used to enable slab debugging in SLAB.  SLUB enables
debugging via a boot parameter.  SLUB debug code should always be present.

CONFIG_SLUB_DEBUG can be modified in the embedded config section.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Robert P. J. Day
b0e3765040 Kconfig: Remove reference to external mqueue library
Remove the reference to an external mqueue library since that was
merged into glibc in 2004.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:25:13 +02:00
David Sterba
3dde6ad8fc Fix trivial typos in Kconfig* files
Fix several typos in help text in Kconfig* files.

Signed-off-by: David Sterba <dave@jikos.cz>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:12:20 +02:00
Alistair John Strachan
794543a236 Move LOG_BUF_SHIFT to a more sensible place
Several people have observed that perhaps LOG_BUF_SHIFT should be in a more
obvious place than under DEBUG_KERNEL. Under some circumstances (such as the
PARISC architecture), DEBUG_KERNEL can increase kernel size, which is an
undesirable trade off for something as trivial as increasing the kernel log
buffer size.

Instead, move LOG_BUF_SHIFT into "General Setup", so that people are more
likely to be able to change it such a circumstance that the default buffer
size is insufficient.

Signed-off-by: Alistair John Strachan <s0348365@sms.ed.ac.uk>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
Ingo Molnar
8f0c45cdf8 enhance initcall_debug, measure latency
enhance the initcall_debug boot option:

 - measure the time the initcall took to execute and report
   it in units of milliseconds.

 - show the return code of initcalls (useful to see failures and
   to make sure that an initcall hung)

[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Adrian Bunk
46595390e9 init/do_mounts.c: proper prepare_namespace() prototype
Add a proper protype for prepare_namespace() in include/linux/init.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:00 -07:00
Rafael J. Wysocki
726162b5da freezer: remove PF_NOFREEZE from handle_initrd
Make handle_initrd() call try_to_freeze() in a suitable place instead of setting
PF_NOFREEZE for the current task.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:59 -07:00
Bryan Wu
1394f03221 blackfin architecture
This adds support for the Analog Devices Blackfin processor architecture, and
currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561
(Dual Core) devices, with a variety of development platforms including those
avaliable from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP,
BF561-EZKIT), and Bluetechnix!  Tinyboards.

The Blackfin architecture was jointly developed by Intel and Analog Devices
Inc.  (ADI) as the Micro Signal Architecture (MSA) core and introduced it in
December of 2000.  Since then ADI has put this core into its Blackfin
processor family of devices.  The Blackfin core has the advantages of a clean,
orthogonal,RISC-like microprocessor instruction set.  It combines a dual-MAC
(Multiply/Accumulate), state-of-the-art signal processing engine and
single-instruction, multiple-data (SIMD) multimedia capabilities into a single
instruction-set architecture.

The Blackfin architecture, including the instruction set, is described by the
ADSP-BF53x/BF56x Blackfin Processor Programming Reference
http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf

The Blackfin processor is already supported by major releases of gcc, and
there are binary and source rpms/tarballs for many architectures at:
http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete
documentation, including "getting started" guides available at:
http://docs.blackfin.uclinux.org/ which provides links to the sources and
patches you will need in order to set up a cross-compiling environment for
bfin-linux-uclibc

This patch, as well as the other patches (toolchain, distribution,
uClibc) are actively supported by Analog Devices Inc, at:
http://blackfin.uclinux.org/

We have tested this on LTP, and our test plan (including pass/fails) can
be found at:
http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel

[m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files]
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Aubrey Li <aubrey.li@analog.com>
Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:58 -07:00
Christoph Lameter
81819f0fc8 SLUB core
This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.

A. Management of object queues

   A particular concern was the complex management of the numerous object
   queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
   each allocating CPU and use objects from a slab directly instead of
   queueing them up.

B. Storage overhead of object queues

   SLAB Object queues exist per node, per CPU. The alien cache queue even
   has a queue array that contain a queue for each processor on each
   node. For very large systems the number of queues and the number of
   objects that may be caught in those queues grows exponentially. On our
   systems with 1k nodes / processors we have several gigabytes just tied up
   for storing references to objects for those queues  This does not include
   the objects that could be on those queues. One fears that the whole
   memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

   SLAB has overhead at the beginning of each slab. This means that data
   cannot be naturally aligned at the beginning of a slab block. SLUB keeps
   all meta data in the corresponding page_struct. Objects can be naturally
   aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
   boundaries and can fit tightly into a 4k page with no bytes left over.
   SLAB cannot do this.

D. SLAB has a complex cache reaper

   SLUB does not need a cache reaper for UP systems. On SMP systems
   the per CPU slab may be pushed back into partial list but that
   operation is simple and does not require an iteration over a list
   of objects. SLAB expires per CPU, shared and alien object queues
   during cache reaping which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

   SLUB pushes NUMA policy handling into the page allocator. This means that
   allocation is coarser (SLUB does interleave on a page level) but that
   situation was also present before 2.6.13. SLABs application of
   policies to individual slab objects allocated in SLAB is
   certainly a performance concern due to the frequent references to
   memory policies which may lead a sequence of objects to come from
   one node after another. SLUB will get a slab full of objects
   from one node and then will switch to the next.

F. Reduction of the size of partial slab lists

   SLAB has per node partial lists. This means that over time a large
   number of partial slabs may accumulate on those lists. These can
   only be reused if allocator occur on specific nodes. SLUB has a global
   pool of partial slabs and will consume slabs from that pool to
   decrease fragmentation.

G. Tunables

   SLAB has sophisticated tuning abilities for each slab cache. One can
   manipulate the queue sizes in detail. However, filling the queues still
   requires the uses of the spin lock to check out slabs. SLUB has a global
   parameter (min_slab_order) for tuning. Increasing the minimum slab
   order can decrease the locking overhead. The bigger the slab order the
   less motions of pages between per CPU and partial lists occur and the
   better SLUB will be scaling.

G. Slab merging

   We often have slab caches with similar parameters. SLUB detects those
   on boot up and merges them into the corresponding general caches. This
   leads to more effective memory use. About 50% of all caches can
   be eliminated through slab merging. This will also decrease
   slab fragmentation because partial allocated slabs can be filled
   up again. Slab merging can be switched off by specifying
   slub_nomerge on boot up.

   Note that merging can expose heretofore unknown bugs in the kernel
   because corrupted objects may now be placed differently and corrupt
   differing neighboring objects. Enable sanity checks to find those.

H. Diagnostics

   The current slab diagnostics are difficult to use and require a
   recompilation of the kernel. SLUB contains debugging code that
   is always available (but is kept out of the hot code paths).
   SLUB diagnostics can be enabled via the "slab_debug" option.
   Parameters can be specified to select a single or a group of
   slab caches for diagnostics. This means that the system is running
   with the usual performance and it is much more likely that
   race conditions can be reproduced.

I. Resiliency

   If basic sanity checks are on then SLUB is capable of detecting
   common error conditions and recover as best as possible to allow the
   system to continue.

J. Tracing

   Tracing can be enabled via the slab_debug=T,<slabcache> option
   during boot. SLUB will then protocol all actions on that slabcache
   and dump the object contents on free.

K. On demand DMA cache creation.

   Generally DMA caches are not needed. If a kmalloc is used with
   __GFP_DMA then just create this single slabcache that is needed.
   For systems that have no ZONE_DMA requirement the support is
   completely eliminated.

L. Performance increase

   Some benchmarks have shown speed improvements on kernbench in the
   range of 5-10%. The locking overhead of slub is based on the
   underlying base allocation size. If we can reliably allocate
   larger order pages then it is possible to increase slub
   performance much further. The anti-fragmentation patches may
   enable further performance increases.

Tested on:
i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

SLUB Boot options

slub_nomerge		Disable merging of slabs
slub_min_order=x	Require a minimum order for slab caches. This
			increases the managed chunk size and therefore
			reduces meta data and locking overhead.
slub_min_objects=x	Mininum objects per slab. Default is 8.
slub_max_order=x	Avoid generating slabs larger than order specified.
slub_debug		Enable all diagnostics for all caches
slub_debug=<options>	Enable selective options for all caches
slub_debug=<o>,<cache>	Enable selective options for a certain set of
			caches

Available Debug options
F		Double Free checking, sanity and resiliency
R		Red zoning
P		Object / padding poisoning
U		Track last free / alloc
T		Trace all allocs / frees (only use for individual slabs).

To use SLUB: Apply this patch and then select SLUB as the default slab
allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
Christoph Lameter
476f35348e Safer nr_node_ids and nr_node_ids determination and initial values
The nr_cpu_ids value is currently only calculated in smp_init.  However, it
may be needed before (SLUB needs it on kmem_cache_init!) and other kernel
components may also want to allocate dynamically sized per cpu array before
smp_init.  So move the determination of possible cpus into sched_init()
where we already loop over all possible cpus early in boot.

Also initialize both nr_node_ids and nr_cpu_ids with the highest value they
could take.  If we have accidental users before these values are determined
then the current valud of 0 may cause too small per cpu and per node arrays
to be allocated.  If it is set to the maximum possible then we only waste
some memory for early boot users.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:51 -07:00
Linus Torvalds
15700770ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (38 commits)
  kconfig: fix mconf segmentation fault
  kbuild: enable use of code from a different dir
  kconfig: error out if recursive dependencies are found
  kbuild: scripts/basic/fixdep segfault on pathological string-o-death
  kconfig: correct minor typo in Kconfig warning message.
  kconfig: fix path to modules.txt in Kconfig help
  usr/Kconfig: fix typo
  kernel-doc: alphabetically-sorted entries in index.html of 'htmldocs'
  kbuild: be more explicit on missing .config file
  kbuild: clarify the creation of the LOCALVERSION_AUTO string.
  kbuild: propagate errors from find in scripts/gen_initramfs_list.sh
  kconfig: refer to qt3 if we cannot find qt libraries
  kbuild: handle compressed cpio initramfs-es
  kbuild: ignore section mismatch warning for references from .paravirtprobe to .init.text
  kbuild: remove stale comment in modpost.c
  kbuild/mkuboot.sh: allow spaces in CROSS_COMPILE
  kbuild: fix make mrproper for Documentation/DocBook/man
  kbuild: remove kconfig binaries during make mrproper
  kconfig/menuconfig: do not hardcode '.config'
  kbuild: override build timestamp & version
  ...
2007-05-06 13:21:57 -07:00
Robert P. J. Day
6e5a5420b7 kbuild: clarify the creation of the LOCALVERSION_AUTO string.
Clarify the creation of the LOCALVERSION_AUTO string during kernel
configuration, and fix a couple typoes while we're there.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-05-02 20:58:11 +02:00
Sam Ravnborg
aae5f662a3 kbuild: whitelist section mismatch in init/main.c
In init/main.c we have a reference from rest_init() to .init.text
which is intentional.
Rename the function 'init' to 'kernel_init' to make it a
kernel wide unique symbol and whitelist the reference.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-05-02 20:58:07 +02:00
Jeremy Fitzhardinge
b6e3590f81 [PATCH] x86: Allow percpu variables to be page-aligned
Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
Ingo suggested KVM as well).

Because larger alignments can use more room, we increase the max per-cpu
memory to 64k rather than 32k: it's getting a little tight.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:12 +02:00
Dimitri Gorokhovik
f991633de6 [PATCH] initramfs should not depend on CONFIG_BLOCK
initramfs ended up depending on BLOCK:

  INITRAMFS_SOURCE <-- BLK_DEV_INITRD <-- BLOCK

This inhibits use of customized-initramfs-over-ramfs without block layer
(ramfs would still be enabled), useful in embedded applications.

Move BLK_DEV_INITRD out of 'drivers/block/Kconfig' and into 'init/Kconfig',
make it unconditional.

Signed-off-by: Dimitri Gorokhovik <dimitri.gorokhovik@free.fr>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-06 09:30:25 -08:00
Christoph Lameter
53b8a315b7 [PATCH] Convert highest_possible_processor_id to nr_cpu_ids
We frequently need the maximum number of possible processors in order to
allocate arrays for all processors.  So far this was done using
highest_possible_processor_id().  However, we do need the number of
processors not the highest id.  Moreover the number was so far dynamically
calculated on each invokation.  The number of possible processors does not
change when the system is running.  We can therefore calculate that number
once.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-20 17:10:13 -08:00
Andrew Morton
6168a702ab [PATCH] Declare init_irq_proc before we use it.
powerpc gets:

init/main.c: In function `do_basic_setup':
init/main.c:714: warning: implicit declaration of function `init_irq_proc'

but we cannot include linux/irq.h in generic code.

Fix it by moving the declaration into linux/interrupt.h instead.

And make sure all code that defines init_irq_proc() is including
linux/interrupt.h.

And nuke an ifdef-in-C

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-19 14:21:50 -08:00
Thomas Gleixner
906568c9c6 [PATCH] tick-management: core functionality
With Ingo Molnar <mingo@elte.hu>

The tick-management code is the first user of the clockevents layer.  It takes
clock event devices from the clock events core and uses them to provide the
periodic tick.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:13:59 -08:00
Linus Torvalds
414f827c46 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
  [PATCH] x86-64: Remove mk_pte_phys()
  [PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
  [PATCH] i386: fix 32-bit ioctls on x64_32
  [PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
  [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
  [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
  [PATCH] i386: Move mce_disabled to asm/mce.h
  [PATCH] i386: paravirt unhandled fallthrough
  [PATCH] x86_64: Wire up compat epoll_pwait
  [PATCH] x86: Don't require the vDSO for handling a.out signals
  [PATCH] i386: Fix Cyrix MediaGX detection
  [PATCH] i386: Fix warning in cpu initialization
  [PATCH] i386: Fix warning in microcode.c
  [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
  [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
  [PATCH] i386: Remove fastcall in paravirt.[ch]
  [PATCH] x86-64: Fix wrong gcc check in bitops.h
  [PATCH] x86-64: survive having no irq mapping for a vector
  [PATCH] i386: geode configuration fixes
  [PATCH] i386: add option to show more code in oops reports
  ...
2007-02-14 09:46:06 -08:00
Eric W. Biederman
77b14db502 [PATCH] sysctl: reimplement the sysctl proc support
With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.

For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.

The speed feels about the same, even though we can now cache the sysctl
dentries :(

We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files.  Making it possible to have /proc/sys vary
depending on the namespace you are in.  The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible.  By
simply being a cache different directories being visible depending on who you
are is trivial to implement.

[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:10:00 -08:00
Eric W. Biederman
a5494dcd8b [PATCH] sysctl: move SYSV IPC sysctls to their own file
This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
with special cases, and by keeping all of the ipc logic to together it makes
the code a little more readable.

[gcoady.lk@gmail.com: build fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Grant Coady <gcoady.lk@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
Eric W. Biederman
b04c3afb2b [PATCH] sysctl: move init_irq_proc into init/main where it belongs
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:58 -08:00
Tim Schmielau
cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Vivek Goyal
ee5bfa642a [PATCH] generic: Break init() in two parts to avoid MODPOST warnings
o init() is a non __init function in .text section but it calls many
  functions which are in .init.text section. Hence MODPOST generates lots
  of cross reference warnings on i386 if compiled with CONFIG_RELOCATABLE=y

WARNING: vmlinux - Section mismatch: reference to .init.text:smp_prepare_cpus from .text between 'init' (at offset 0xc0101049) and 'rest_init'
WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init from .text between 'init' (at offset 0xc010104e) and 'rest_init'
WARNING: vmlinux - Section mismatch: reference to .init.text:spawn_ksoftirqd from .text between 'init' (at offset 0xc0101053) and 'rest_init'

o This patch breaks down init() in two parts. One part which can go
  in .init.text section and can be freed and other part which has to
  be non __init(init_post()). Now init() calls init_post() and init_post()
  does not call any functions present in .init sections. Hence getting
  rid of warnings.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:22 +01:00
Alon Bar-Lev
30d7e0d466 [PATCH] Dynamic kernel command-line: common
Current implementation stores a static command-line buffer allocated to
COMMAND_LINE_SIZE size.  Most architectures stores two copies of this buffer,
one for future reference and one for parameter parsing.

Current kernel command-line size for most architecture is much too small for
module parameters, video settings, initramfs paramters and much more.  The
problem is that setting COMMAND_LINE_SIZE to a grater value, allocates static
buffers.

In order to allow a greater command-line size, these buffers should be
dynamically allocated or marked as init disposable buffers, so unused memory
can be released.

This patch renames the static saved_command_line variable into
boot_command_line adding __initdata attribute, so that it can be disposed
after initialization.  This rename is required so applications that use
saved_command_line will not be affected by this change.

It reintroduces saved_command_line as dynamically allocated buffer to match
the data in boot_command_line.

It also mark secondary command-line buffer as __initdata, and copies it to
dynamically allocated static_command_line buffer components may hold reference
to it after initialization.

This patch is for linux-2.6.20-rc4-mm1 and is divided to target each
architecture.  I could not check this in any architecture so please forgive me
if I got it wrong.

The per-architecture modification is very simple, use boot_command_line in
place of saved_command_line.  The common code is the change into dynamic
command-line.

This patch:

1. Rename saved_command_line into boot_command_line, mark as init
   disposable.

2. Add dynamic allocated saved_command_line.

3. Add dynamic allocated static_command_line.

4. During startup copy: boot_command_line into saved_command_line.  arch
   command_line into static_command_line.

5. Parse static_command_line and not arch command_line, so arch
   command_line may be freed.

Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:37 -08:00
Alexey Dobriyan
18f705f49a [PATCH] Move TASK_XACCT, TASK_IO_ACCOUNTING up in menus
Since they depends on TASKSTATS, it would be nice to move them closer to
another options depending on TASKSTATS.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 11:18:07 -08:00
Robert P. J. Day
842f968f3f [PATCH] Remove final reference to superfluous smp_commence()
Remove the last (and commented out) invocation of the obsolete
smp_commence() call.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 11:18:05 -08:00
Jean-Paul Saman
c33df4eaaf [PATCH] disable init/initramfs.c
The file init/initramfs.c is always compiled and linked in the kernel
vmlinux even when BLK_DEV_RAM and BLK_DEV_INITRD are disabled and the
system isn't using any form of an initramfs or initrd.  In this situation
the code is only used to unpack a (static) default initial rootfilesystem.
The current init/initramfs.c code.  usr/initramfs_data.o compiles to a size
of ~15 kbytes.  Disabling BLK_DEV_RAM and BLK_DEV_INTRD shrinks the kernel
code size with ~60 Kbytes.

This patch avoids compiling in the code and data for initramfs support if
CONFIG_BLK_DEV_INITRD is not defined.  Instead of the initramfs code and
data it uses a small routine in init/noinitramfs.c to setup an initial
static default environment for mounting a rootfilesystem later on in the
kernel initialisation process.  The new code is: 164 bytes of size.

The patch is separated in two parts:
1) doesn't compile initramfs code when CONFIG_BLK_DEV_INITRD is not set
2) changing all plaforms vmlinux.lds.S files to not reserve an area of
PAGE_SIZE when CONFIG_BLK_DEV_INITRD is not set.

[deweerdt@free.fr: warning fix]
Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:25 -08:00
Michael Neuling
0a7b35cb18 [PATCH] Add retain_initrd boot option
Add retain_initrd option to control freeing of initrd memory after
extraction.  By default, free memory as previously.

The first boot will need to hold a copy of the in memory fs for the second
boot.  This image can be large (much larger than the kernel), hence we can
save time when the memory loader is slow.  Also, it reduces the memory
footprint while extracting the first boot since you don't need another copy
of the fs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:24 -08:00
Linus Torvalds
c71551ad30 Don't put "linux_banner" in the .init section
It might save a few bytes after bootup, but it causes the string to be
linked in at the end of the final vmlinux image, which defeats the whole
point of doing all this, namely allowing some broken user-space binaries
to search for the kernel version string in the kernel binary.

So just remove the __init specifier.

Cc: Olaf Hering <olaf@aepfle.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Andrew Morton <akpm@osdl.org>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-11 18:18:04 -08:00
Vivek Goyal
88d20328cd [PATCH] i386: Convert some functions to __init to avoid MODPOST warnings
o Some functions which should have been in init sections as they are called
  only once. Put them in init sections. Otherwise MODPOST generates warning
  as these functions are placed in .text and they end up accessing something
  in init sections.

WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init
from .text between 'do_pre_smp_initcalls' (at offset 0xc01000d1) and
'run_init_process'

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-01-11 01:52:44 +01:00
Roman Zippel
3eb3c740f5 [PATCH] fix linux banner format string
Revert previous attempts at messing with the linux banner string and
simply use a separate format string for proc.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-10 09:33:59 -08:00
Ard van Breemen
c4a68306b9 [PATCH] start_kernel: test if irq's got enabled early, barf, and disable them again
The calls made by parse_parms to other initialization code might enable
interrupts again way too early.

Having interrupts on this early can make systems PANIC when they initialize
the IRQ controllers (which happens later in the code).  This patch detects
that irq's are enabled again, barfs about it and disables them again as a
safety net.

[akpm@osdl.org: cleanups]
Signed-off-by: Ard van Breemen <ard@telegraafnet.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:21 -08:00
Andrew Morton
ef129412b4 [PATCH] build compile.h earlier
compile.h is created super-late in the build.  But proc_misc.c want to include
it, and it's generally not sane to have a header file in include/linux be
created at the end of the build: it's either not present or, worse, wrong for
most of the build.

So the patch arranges for compile.h to be built at the start of the build
process.  It also consolidates the compile.h rules with those for version.h
and utsname.h, so they all get built together.

I hope.  My chances of having got this right are about 2%.

Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:51 -08:00
Yasunori Goto
561ccd3a97 [PATCH] handle SLOB with sparsemen
This is to disallow to make SLOB with SMP or SPARSEMEM.  This avoids latent
troubles of SLOB with SLAB_DESTROY_BY_RCU.  And fix compile error.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:49 -08:00
Paul Jackson
2aea4fb616 [PATCH] CONFIG_VM_EVENT_COUNTER comment decrustify
The VM event counters, enabled by CONFIG_VM_EVENT_COUNTERS, which provides
VM event counters in /proc/vmstat, has become more essential to
non-EMBEDDED kernel configurations than they were in the past.  Comments in
the code and the Kconfig configuration explanation were stale, downplaying
their role excessively.

Refresh those comments to correctly reflect the current role of VM event
counters.

Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:46 -08:00
Adrian Bunk
1f21782e63 Driver core: proper prototype for drivers/base/init.c:driver_init()
Add a prototype for driver_init() in include/linux/device.h.

Also remove a static function of the same name in drivers/acpi/ibm_acpi.c to
ibm_acpi_driver_init() to fix the namespace collision.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-20 10:56:45 -08:00
Jesper Juhl
979c6a1e49 Kconfig: fix spelling error in config KALLSYMS help text
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-By: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-12 19:25:11 +01:00
Linus Torvalds
8d610dd52d Make sure we populate the initroot filesystem late enough
We should not initialize rootfs before all the core initializers have
run.  So do it as a separate stage just before starting the regular
driver initializers.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11 12:12:04 -08:00
Linus Torvalds
8993780a6e Make SLES9 "get_kernel_version" work on the kernel binary again
As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary.  It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)

That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649ba:
"[PATCH] Fix linux banner utsname information").

This just restores "linux_banner" as a static string, which should fix
the version finding.  And /proc/version simply uses a different string.

To avoid wasting even that miniscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.

Cc: Andy Whitcroft <apw@shadowen.org>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Steve Fox <drfickle@us.ibm.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11 11:34:11 -08:00
Andrew Morton
7c3ab7381e [PATCH] io-accounting: core statistics
The present per-task IO accounting isn't very useful.  It simply counts the
number of bytes passed into read() and write().  So if a process reads 1MB
from an already-cached file, it is accused of having performed 1MB of I/O,
which is wrong.

(David Wright had some comments on the applicability of the present logical IO accounting:

  For billing purposes it is useless but for workload analysis it is very
  useful

  read_bytes/read_calls  average read request size
  write_bytes/write_calls average write request size

  read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
  write_bytes/write_blocks  ie logical/physical  guess since pdflush writes can
                                                be missed

  I often look for logical larger than physical to see filesystem cache
  problems.  And the bytes/cpusec can help find applications that are
  dominating the cache and causing slow interactive response from page cache
  contention.

  I want to find the IO intensive applications and make sure they are doing
  efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
  IO commands).

This patchset adds new accounting which tries to be more accurate.  We account
for three things:

reads:

  attempt to count the number of bytes which this process really did cause
  to be fetched from the storage layer.  Done at the submit_bio() level, so it
  is accurate for block-backed filesystems.  I also attempt to wire up NFS and
  CIFS.

writes:

  attempt to count the number of bytes which this process caused to be sent
  to the storage layer.  This is done at page-dirtying time.

  The big inaccuracy here is truncate.  If a process writes 1MB to a file
  and then deletes the file, it will in fact perform no writeout.  But it will
  have been accounted as having caused 1MB of write.

  So...

cancelled_writes:

  account the number of bytes which this process caused to not happen, by
  truncating pagecache.

  We _could_ just subtract this from the process's `write' accounting.  But
  that means that some processes would be reported to have done negative
  amounts of write IO, which is silly.

  So we just report the raw number and punt this decision up to userspace.

Now, we _could_ account for writes at the physical I/O level.  But

- This would require that we track memory-dirtying tasks at the per-page
  level (would require a new pointer in struct page).

- It would mean that IO statistics for a process are usually only available
  long after that process has exitted.  Which means that we probably cannot
  communicate this info via taskstats.

This patch:

Wire up the kernel-private data structures and the accessor functions to
manipulate them.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Sukadev Bhattiprolu
84d737866e [PATCH] add child reaper to pid_namespace
Add a per pid_namespace child-reaper.  This is needed so processes are reaped
within the same pid space and do not spill over to the parent pid space.  Its
also needed so containers preserve existing semantic that pid == 1 would reap
orphaned children.

This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:52 -08:00
Herbert Poetzl
a2ee8649ba [PATCH] Fix linux banner utsname information
utsname information is shown in the linux banner, which also is used for
/proc/version (which can have different utsname values inside a uts
namespaces).  this patch makes the varying data arguments and changes the
string to a format string, using those arguments.

Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:37 -08:00
Linus Torvalds
4522d58275 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
  [PATCH] x86-64: Export smp_call_function_single
  [PATCH] i386: Clean up smp_tune_scheduling()
  [PATCH] unwinder: move .eh_frame to RODATA
  [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
  [PATCH] x86-64: don't use set_irq_regs()
  [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
  [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
  [PATCH] i386: replace kmalloc+memset with kzalloc
  [PATCH] x86-64: remove remaining pc98 code
  [PATCH] x86-64: remove unused variable
  [PATCH] x86-64: Fix constraints in atomic_add_return()
  [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
  [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
  [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
  [PATCH] x86-64: Fix numaq build error
  [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
  [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
  [PATCH] x86-64: Clarify error message in GART code
  [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
  [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
  ...

Fixed conflict in include/linux/uaccess.h manually

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:59:11 -08:00
Andrew Morton
f1a60dbf68 [PATCH] gcc-4.1.0 is bust
Keith says

Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux), wait_hpet_tick is
optimized away to a never ending loop and the kernel hangs on boot in timer
setup.

0000001a <wait_hpet_tick>:
  1a:   55                      push   %ebp
  1b:   89 e5                   mov    %esp,%ebp
  1d:   eb fe                   jmp    1d <wait_hpet_tick+0x3>

This is not a problem with gcc 3.3.5.  Adding barrier() calls to
wait_hpet_tick does not help, making the variables volatile does.

And the consensus is that gcc-4.1.0 is busted.  Suse went and shipped
gcc-4.1.0 so we cannot ban it.  Add a warning.

Cc: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:44 -08:00
Arjan van de Ven
2e591bbc0d [PATCH] Make initramfs printk a warning on incorrect cpio type
It turns out that the "-c" option of cpio is highly unportable even between
distros let alone unix variants, and may actually make the wrong type of
cpio archive.  I just wasted quite some time on this, and the kernel can
detect this and warn about it (it's __init memory so it gets thrown away
and thus there is no runtime overhead)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:36 -08:00
Nigel Cunningham
7dfb71030f [PATCH] Add include/linux/freezer.h and move definitions from sched.h
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.

[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:27 -08:00
Rusty Russell
d7cd56111f [PATCH] i386: cpu_detect extraction
Both lhype and Xen want to call the core of the x86 cpu detect code before
calling start_kernel.

(extracted from larger patch)

AK: folded in start_kernel header patch

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:08 +01:00
Kay Sievers
88a22c985e CONFIG_SYSFS_DEPRECATED
Provide a way to support older versions of udev that are shipped in
older distros.  If this option is disabled, it will also turn off the
compatible symlinks in sysfs that older programs might rely on.

When in doubt, or if running a distro older than 2006, say Yes here.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-01 14:51:58 -08:00
Eric W. Biederman
13bb7e37e5 [PATCH] sysctl: Undeprecate sys_sysctl
The basic issue is that despite have been deprecated and warned about as a
very bad thing in the man pages since its inception there are a few real
users of sys_sysctl.  It was my assumption that because sysctl had been
deprecated for all of 2.6 there would be no user space users by this point,
so I initially gave sys_sysctl a very short deprecation period.

Now that I know there are a few real users the only sane way to proceed
with deprecation is to push the time limit out to a year or two work and
work with distributions that have big testing pools like fedora core to
find these last remaining users.

Which means that the sys_sysctl interface needs to be maintained in the
meantime.

Since I have provided a technical measure that allows us to add new sysctl
entries without reserving more binary numbers I believe that is enough to
fix the sys_sysctl binary interface maintenance problems, because there is
no longer a need to change the binary interface at all.

Since the sys_sysctl implementation needs to stay around for a while and
the worst of the maintenance issues that caused us to occasionally break
the ABI have been addressed I don't see any advantage in continuing with
the removal of sys_sysctl.

So instead of merely increasing the deprecation period this patch removes
the deprecation of sys_sysctl and modifies the kernel to compile the code
in by default.

With committing to maintain sys_sysctl we get all of the advantages of a
fast interface for anything that needs it.  Currently sys_sysctl is about
5x faster than /proc/sys, for the same string data.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 18:29:24 -08:00
Jan Beulich
690a973f48 [PATCH] x86-64: Speed up dwarf2 unwinder
This changes the dwarf2 unwinder to do a binary search for CIEs
instead of a linear work. The linker is unfortunately not
able to build a proper lookup table at link time, instead it creates
one at runtime as soon as the bootmem allocator is usable (so you'll continue
using the linear lookup for the first [hopefully] few calls).
The code should be ready to utilize a build-time created table once
a fixed linker becomes available.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:01 +02:00
Paolo 'Blaisorblade' Giarrusso
b2670eacfb [PATCH] uml: use DEFCONFIG_LIST to avoid reading host's config
This should make sure that, for UML, host's configuration files are not
considered, which avoids various pains to the user.  Our dependency are such
that the obtained Kconfig will be valid and will lead to successful
compilation - however they cannot prevent an user from disabling any boot
device, and if an option is not set in the read .config (say
/boot/config-XXX), with make menuconfig ARCH=um, it is not set.  This always
disables UBD and all console I/O channels, which leads to non-working UML
kernels, so this bothers users - especially now, since it will happen on
almost every machine (/boot/config-`uname -r` exists almost on every machine).
 It can be workarounded with make defconfig ARCH=um, but it is non-obvious and
can be avoided, so please _do_ merge this patch.

Given the existence of options, it could be interesting to implement
(additionally) "option required" - with it, Kconfig will refuse reading a
.config file (from wherever it comes) if the given option is not set.  With
this, one could mark with it the option characteristic of the given
architecture (it was an old proposal of Roman Zippel, when I pointed out our
problem):

config UML
	option required
	default y

However this should be further discussed:
*) for x86, it must support constructs like:

==arch/i386/Kconfig==
config 64BIT
	option required
	default n
where Kconfig must require that CONFIG_64BIT is disabled or not present in the
read .config.

*) do we want to do such checks only for the starting defconfig or also for
   .config? Which leads to:
*) I may want to port a x86_64 .config to x86 and viceversa, or even among more
   different archs. Should that be allowed, and in which measure (the user may
   force skipping the check for a .config or it is only given a warning by
   default)?

Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: <kbuild-devel@lists.sourceforge.net>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:26:36 -07:00
Dave Jones
038b0a6d8d Remove all inclusions of <linux/config.h>
kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-04 03:38:54 -04:00
NeilBrown
e8703fe1f5 [PATCH] md: remove MAX_MD_DEVS which is an arbitrary limit
Once upon a time we needed to fixed limit to the number of md devices,
probably because we preallocated some array.  This need no longer exists, but
we still have an arbitrary limit.

So remove MAX_MD_DEVS and allow as many devices as we can fit into the 'minor'
part of a device number.

Also remove some useless noise at init time (which reports MAX_MD_DEVS) and
remove MD_THREAD_NAME_MAX which hasn't been used for a while.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:18 -07:00
Cedric Le Goater
9ec52099e4 [PATCH] replace cad_pid by a struct pid
There are a few places in the kernel where the init task is signaled.  The
ctrl+alt+del sequence is one them.  It kills a task, usually init, using a
cached pid (cad_pid).

This patch replaces the pid_t by a struct pid to avoid pid wrap around
problem.  The struct pid is initialized at boot time in init() and can be
modified through systctl with

	/proc/sys/kernel/cad_pid

[ I haven't found any distro using it ? ]

It also introduces a small helper routine kill_cad_pid() which is used
where it seemed ok to use cad_pid instead of pid 1.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Arnd Bergmann
6760856791 [PATCH] introduce kernel_execve
The use of execve() in the kernel is dubious, since it relies on the
__KERNEL_SYSCALLS__ mechanism that stores the result in a global errno
variable.  As a first step of getting rid of this, change all users to a
global kernel_execve function that returns a proper error code.

This function is a terrible hack, and a later patch removes it again after the
kernel syscalls are gone.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:23 -07:00
Kirill Korotaev
25b21cb2f6 [PATCH] IPC namespace core
This patch set allows to unshare IPCs and have a private set of IPC objects
(sem, shm, msg) inside namespace.  Basically, it is another building block of
containers functionality.

This patch implements core IPC namespace changes:
- ipc_namespace structure
- new config option CONFIG_IPC_NS
- adds CLONE_NEWIPC flag
- unshare support

[clg@fr.ibm.com: small fix for unshare of ipc namespace]
[akpm@osdl.org: build fix]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:22 -07:00
Serge E. Hallyn
4865ecf131 [PATCH] namespaces: utsname: implement utsname namespaces
This patch defines the uts namespace and some manipulators.
Adds the uts namespace to task_struct, and initializes a
system-wide init namespace.

It leaves a #define for system_utsname so sysctl will compile.
This define will be removed in a separate patch.

[akpm@osdl.org: build fix, cleanup]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Jay Lan
9acc185351 [PATCH] csa: Extended system accounting over taskstats
Add extended system accounting handling over taskstats interface.  A
CONFIG_TASK_XACCT flag is created to enable the extended accounting code.

Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00
Randy Dunlap
0847062ad5 [PATCH] fix EMBEDDED + SYSCTL menu
SYSCTL should still depend on EMBEDDED.  This unbreaks the EMBEDDED menu
(from the recent SYSCTL_SYCALL menu option patch).

Fix typos in new SYSCTL_SYSCALL menu.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:25 -07:00
Ross Biro
f2443ab6c4 [PATCH] allow /proc/config.gz to be built as a module
The driver for /proc/config.gz consumes rather a lot of memory and it is in
fact possible to build it as a module.

In some ways this is a bit risky, because the .config which is used for
compiling kernel/configs.c isn't necessarily the same as the .config which was
used to build vmlinux.

But OTOH the potential memory savings are decent, and it'd be fairly dumb to
build your configs.o with a different .config.

Signed-off-by: Andrew Morton <akpm@google.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:20 -07:00
David Howells
9361401eb7 [PATCH] BLOCK: Make it possible to disable the block layer [try #6]
Make it possible to disable the block layer.  Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.

This patch does the following:

 (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
     support.

 (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
     an item that uses the block layer.  This includes:

     (*) Block I/O tracing.

     (*) Disk partition code.

     (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

     (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
     	 block layer to do scheduling.  Some drivers that use SCSI facilities -
     	 such as USB storage - end up disabled indirectly from this.

     (*) Various block-based device drivers, such as IDE and the old CDROM
     	 drivers.

     (*) MTD blockdev handling and FTL.

     (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
     	 taking a leaf out of JFFS2's book.

 (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
     linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
     however, still used in places, and so is still available.

 (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
     parts of linux/fs.h.

 (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

 (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

 (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
     is not enabled.

 (*) fs/no-block.c is created to hold out-of-line stubs and things that are
     required when CONFIG_BLOCK is not set:

     (*) Default blockdev file operations (to give error ENODEV on opening).

 (*) Makes some /proc changes:

     (*) /proc/devices does not list any blockdevs.

     (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

 (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

 (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
     given command other than Q_SYNC or if a special device is specified.

 (*) In init/do_mounts.c, no reference is made to the blockdev routines if
     CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.

 (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
     error ENOSYS by way of cond_syscall if so).

 (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
     CONFIG_BLOCK is not set, since they can't then happen.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:31 +02:00
Eric W. Biederman
b89a81712f [PATCH] sysctl: Allow /proc/sys without sys_sysctl
Since sys_sysctl is deprecated start allow it to be compiled out.  This
should catch any remaining user space code that cares, and paves the way
for further sysctl cleanups.

[akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:19 -07:00
Vivek Goyal
7e96287ddc [PATCH] kdump: introduce "reset_devices" command line option
Resetting the devices during driver initialization can be a costly
operation in terms of time (especially scsi devices).  This option can be
used by drivers to know that user forcibly wants the devices to be reset
during initialization.

This option can be useful while kernel is booting in unreliable
environment.  For ex.  during kdump boot where devices are in unknown
random state and BIOS execution has been skipped.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:17 -07:00
Linus Torvalds
b278240839 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
  [PATCH] Don't set calgary iommu as default y
  [PATCH] i386/x86-64: New Intel feature flags
  [PATCH] x86: Add a cumulative thermal throttle event counter.
  [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
  [PATCH] x86: Refactor thermal throttle processing
  [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
  [PATCH] Fix unwinder warning in traps.c
  [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
  [PATCH] x86: Move direct PCI scanning functions out of line
  [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
  [PATCH] Don't leak NT bit into next task
  [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
  [PATCH] Fix some broken white space in ia32_signal.c
  [PATCH] Initialize argument registers for 32bit signal handlers.
  [PATCH] Remove all traces of signal number conversion
  [PATCH] Don't synchronize time reading on single core AMD systems
  [PATCH] Remove outdated comment in x86-64 mmconfig code
  [PATCH] Use string instructions for Core2 copy/clear
  [PATCH] x86: - restore i8259A eoi status on resume
  [PATCH] i386: Split multi-line printk in oops output.
  ...
2006-09-26 13:07:55 -07:00
Andi Kleen
c9538ed492 [PATCH] Move unwind_init earlier
Needed for use of the unwinder in lockdep, because lockdep runs really
early too.

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Rusty Russell
33df0d19ea [PATCH] Allow early_param and identical __setup to exist
We currently assume that boot parameters which are handled by
early_param() will not overlap boot parameters handled by __setup: if
they do, behaviour is dependent on link order, usually meaning __setup
will not get called.

ACPI wants to use early_param("pci"), and pci uses __setup("pci="), so
we modify the core to let them coexist: "pci=noacpi" will now get
passed to both.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Greg Kroah-Hartman
d779249ed4 Driver Core: add ability for drivers to do a threaded probe
This adds the infrastructure for drivers to do a threaded probe, and
waits at init time for all currently outstanding probes to complete.

A new kernel thread will be created when the probe() function for the
driver is called, if the multithread_probe bit is set in the driver
saying it can support this kind of operation.

I have tested this with USB and PCI, and it works, and shaves off a lot
of time in the boot process, but there are issues with finding root boot
disks, and some USB drivers assume that this can never happen, so it is
currently not enabled for any bus type.  Individual drivers can enable
this right now if they wish, and bus authors can selectivly turn it on
as well, once they determine that their subsystem will work properly
with it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25 21:08:40 -07:00
Chuck Ebbert
ae81f9e379 [PATCH] Kconfig: move CONFIG_EMBEDDED options to submenu
Fix two problems with the CONFIG_EMBEDDED submenu:

(1) The menu was split in two by the rt_mutex patch, which moved
    half the items into the "General setup" menu.

(2) CONFIG_SYSCTL and CONFIG_UID16 were added to the main menu
    instead of the submenu.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-16 12:54:32 -07:00
Shailabh Nagar
6f44993fe1 [PATCH] per-task-delay-accounting: delay accounting usage of taskstats interface
Usage of taskstats interface by delay accounting.

Signed-off-by: Shailabh Nagar <nagar@us.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:56 -07:00
Shailabh Nagar
c757249af1 [PATCH] per-task-delay-accounting: taskstats interface
Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC
family), for getting statistics of tasks and thread groups during their
lifetime and when they exit.  The interface is intended for use by multiple
accounting packages though it is being created in the context of delay
accounting.

This patch creates the interface without populating the fields of the data
that is sent to the user in response to a command or upon the exit of a task.
Each accounting package interested in using taskstats has to provide an
additional patch to add its stats to the common structure.

[akpm@osdl.org: cleanups, Kconfig fix]
Signed-off-by: Shailabh Nagar <nagar@us.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:56 -07:00
Shailabh Nagar
ca74e92b46 [PATCH] per-task-delay-accounting: setup
Initialization code related to collection of per-task "delay" statistics which
measure how long it had to wait for cpu, sync block io, swapping etc.  The
collection of statistics and the interface are in other patches.  This patch
sets up the data structures and allows the statistics collection to be
disabled through a kernel boot parameter.

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:56 -07:00
Linus Torvalds
51bece910d Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild:
  kbuild: introduce utsrelease.h
  kbuild: explicit turn off gcc stack-protector
2006-07-03 21:26:12 -07:00
Ingo Molnar
243c7621aa [PATCH] lockdep: annotate genirq
Teach special (recursive) locking code to the lock validator.  Has no effect
on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:06 -07:00
Ingo Molnar
fbb9ce9530 [PATCH] lockdep: core
Do 'make oldconfig' and accept all the defaults for new config options -
reboot into the kernel and if everything goes well it should boot up fine and
you should have /proc/lockdep and /proc/lockdep_stats files.

Typically if the lock validator finds some problem it will print out
voluminous debug output that begins with "BUG: ..." and which syslog output
can be used by kernel developers to figure out the precise locking scenario.

What does the lock validator do?  It "observes" and maps all locking rules as
they occur dynamically (as triggered by the kernel's natural use of spinlocks,
rwlocks, mutexes and rwsems).  Whenever the lock validator subsystem detects a
new locking scenario, it validates this new rule against the existing set of
rules.  If this new rule is consistent with the existing set of rules then the
new rule is added transparently and the kernel continues as normal.  If the
new rule could create a deadlock scenario then this condition is printed out.

When determining validity of locking, all possible "deadlock scenarios" are
considered: assuming arbitrary number of CPUs, arbitrary irq context and task
context constellations, running arbitrary combinations of all the existing
locking scenarios.  In a typical system this means millions of separate
scenarios.  This is why we call it a "locking correctness" validator - for all
rules that are observed the lock validator proves it with mathematical
certainty that a deadlock could not occur (assuming that the lock validator
implementation itself is correct and its internal data structures are not
corrupted by some other kernel subsystem).  [see more details and conditionals
of this statement in include/linux/lockdep.h and
Documentation/lockdep-design.txt]

Furthermore, this "all possible scenarios" property of the validator also
enables the finding of complex, highly unlikely multi-CPU multi-context races
via single single-context rules, increasing the likelyhood of finding bugs
drastically.  In practical terms: the lock validator already found a bug in
the upstream kernel that could only occur on systems with 3 or more CPUs, and
which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
That bug was found and reported on a single-CPU system (!).  So in essence a
race will be found "piecemail-wise", triggering all the necessary components
for the race, without having to reproduce the race scenario itself!  In its
short existence the lock validator found and reported many bugs before they
actually caused a real deadlock.

To further increase the efficiency of the validator, the mapping is not per
"lock instance", but per "lock-class".  For example, all struct inode objects
in the kernel have inode->inotify_mutex.  If there are 10,000 inodes cached,
then there are 10,000 lock objects.  But ->inotify_mutex is a single "lock
type", and all locking activities that occur against ->inotify_mutex are
"unified" into this single lock-class.  The advantage of the lock-class
approach is that all historical ->inotify_mutex uses are mapped into a single
(and as narrow as possible) set of locking rules - regardless of how many
different tasks or inode structures it took to build this set of rules.  The
set of rules persist during the lifetime of the kernel.

To see the rough magnitude of checking that the lock validator does, here's a
portion of /proc/lockdep_stats, fresh after bootup:

 lock-classes:                            694 [max: 2048]
 direct dependencies:                  1598 [max: 8192]
 indirect dependencies:               17896
 all direct dependencies:             16206
 dependency chains:                    1910 [max: 8192]
 in-hardirq chains:                      17
 in-softirq chains:                     105
 in-process chains:                    1065
 stack-trace entries:                 38761 [max: 131072]
 combined max dependencies:         2033928
 hardirq-safe locks:                     24
 hardirq-unsafe locks:                  176
 softirq-safe locks:                     53
 softirq-unsafe locks:                  137
 irq-safe locks:                         59
 irq-unsafe locks:                      176

The lock validator has observed 1598 actual single-thread locking patterns,
and has validated all possible 2033928 distinct locking scenarios.

More details about the design of the lock validator can be found in
Documentation/lockdep-design.txt, which can also found at:

   http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt

[bunk@stusta.de: cleanups]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:03 -07:00
Ingo Molnar
9a11b49a80 [PATCH] lockdep: better lock debugging
Generic lock debugging:

 - generalized lock debugging framework. For example, a bug in one lock
   subsystem turns off debugging in all lock subsystems.

 - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
   the mutex/rtmutex debugging code: it caused way too much prototype
   hackery, and lockdep will give the same information anyway.

 - ability to do silent tests

 - check lock freeing in vfree too.

 - more finegrained debugging options, to allow distributions to
   turn off more expensive debugging features.

There's no separate 'held mutexes' list anymore - but there's a 'held locks'
stack within lockdep, which unifies deadlock detection across all lock
classes.  (this is independent of the lockdep validation stuff - lockdep first
checks whether we are holding a lock already)

Here are the current debugging options:

CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y

which do:

 config DEBUG_MUTEXES
          bool "Mutex debugging, basic checks"

 config DEBUG_LOCK_ALLOC
         bool "Detect incorrect freeing of live mutexes"

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:01 -07:00
Heiko Carstens
93e028148f [PATCH] lockdep: console_init after local_irq_enable()
s390's console_init must enable interrupts, but early_boot_irqs_on() gets
called later.  To avoid problems move console_init() after local_irq_enable().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:00 -07:00
john stultz
88fecaa27f [PATCH] time initialisation fix
We're not reay to take a timer interrupt until timekeeping_init() has run.
But time_init() will start the time interrupt and if it is called with
local interrupts enabled we'll immediately take an interrupt and die.

Fix that by running timekeeping_init() prior to time_init().

We don't know _why_ local interrupts got enabled on Jesse Brandeburg's
machine.  That's a separate as-yet-unsolved problem.  THe patch adds a little
bit of debugging to detect that.

This whole requirement that local interrupts be held off during early boot
keeps on biting us.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:26:58 -07:00
Sam Ravnborg
63104eec23 kbuild: introduce utsrelease.h
include/linux/version.h contained both actual KERNEL version
and UTS_RELEASE that contains a subset from git SHA1 for when
kernel was compiled as part of a git repository.
This had the unfortunate side-effect that all files including version.h
would be recompiled when some git changes was made due to changes SHA1.
Split it out so we keep independent parts in separate files.

Also update checkversion.pl script to no longer check for UTS_RELEASE.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-07-03 23:30:54 +02:00
Linus Torvalds
22a3e233ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Remove obsolete #include <linux/config.h>
  remove obsolete swsusp_encrypt
  arch/arm26/Kconfig typos
  Documentation/IPMI typos
  Kconfig: Typos in net/sched/Kconfig
  v9fs: do not include linux/version.h
  Documentation/DocBook/mtdnand.tmpl: typo fixes
  typo fixes: specfic -> specific
  typo fixes in Documentation/networking/pktgen.txt
  typo fixes: occuring -> occurring
  typo fixes: infomation -> information
  typo fixes: disadvantadge -> disadvantage
  typo fixes: aquire -> acquire
  typo fixes: mecanism -> mechanism
  typo fixes: bandwith -> bandwidth
  fix a typo in the RTC_CLASS help text
  smb is no longer maintained

Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30 15:39:30 -07:00
Adrian Bunk
dd673bca47 [PATCH] UML: fix the INIT_ENV_ARG_LIMIT dependencies
Fix the INIT_ENV_ARG_LIMIT dependencies to what seems to have been
intended.

Spotted by Jean-Luc Leger.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:37 -07:00
Andrew Morton
033ab7f8e5 [PATCH] add smp_setup_processor_id()
Presently, smp_processor_id() isn't necessarily set up until setup_arch().
But it's used in boot_cpu_init() and printk() and perhaps in other places,
prior to setup_arch() being called.

So provide a new smp_setup_processor_id() which is called before anything
else, wire it up for Voyager (which boots on a CPU other than #0, and broke).

Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:37 -07:00
Christoph Lameter
f8891e5e1f [PATCH] Light weight event counters
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic for i386 and x86_64.
 And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:36 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Linus Torvalds
602cada851 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits)
  [PATCH] devfs: Remove it from the feature_removal.txt file
  [PATCH] devfs: Last little devfs cleanups throughout the kernel tree.
  [PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV
  [PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
  [PATCH] devfs: Remove devfs_remove() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
  [PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree
  [PATCH] devfs: Remove devfs support from the sound subsystem
  [PATCH] devfs: Remove devfs support from the ide subsystem.
  [PATCH] devfs: Remove devfs support from the serial subsystem
  [PATCH] devfs: Remove devfs from the init code
  [PATCH] devfs: Remove devfs from the partition code
  ...
2006-06-29 14:19:21 -07:00
Ingo Molnar
23f78d4a03 [PATCH] pi-futex: rt mutex core
Core functions for the rt-mutex subsystem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:47 -07:00
Adrian Bunk
b6cd0b772d [PATCH] fs/buffer.c: cleanups
- add a proper prototype for the following global function:
  - buffer_init()

- make the following needlessly global function static:
  - end_buffer_async_write()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
Greg Kroah-Hartman
ff23eca3e8 [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
Also fixes up all files that #include it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman
bdaf852938 [PATCH] devfs: Remove devfs from the init code
This patch removes the devfs code from the init/ directory.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:05 -07:00
Linus Torvalds
2a2ed2db35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
  kbuild: trivial fixes in Makefile
  kbuild: adding symbols in Kconfig and defconfig to TAGS
  kbuild: replace abort() with exit(1)
  kbuild: support for %.symtypes files
  kbuild: fix silentoldconfig recursion
  kbuild: add option for stripping modules while installing them
  kbuild: kill some false positives from modpost
  kbuild: export-symbol usage report generator
  kbuild: fix make -rR breakage
  kbuild: append -dirty for updated but uncommited changes
  kbuild: append git revision for all untagged commits
  kbuild: fix module.symvers parsing in modpost
  kbuild: ignore make's built-in rules & variables
  kbuild: bugfix with initramfs
  kbuild: modpost build fix
  kbuild: check license compatibility when building modules
  kbuild: export-type enhancement to modpost.c
  kbuild: add dependency on kernel.release to the package targets
  kbuild: `make kernelrelease' speedup
  kconfig: KCONFIG_OVERWRITECONFIG
  ...
2006-06-26 11:05:15 -07:00
Linus Torvalds
81a07d7588 Merge branch 'x86-64'
* x86-64: (83 commits)
  [PATCH] x86_64: x86_64 stack usage debugging
  [PATCH] x86_64: (resend) x86_64 stack overflow debugging
  [PATCH] x86_64: msi_apic.c build fix
  [PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
  [PATCH] x86_64: Avoid broadcasting NMI IPIs
  [PATCH] x86_64: fix apic error on bootup
  [PATCH] x86_64: enlarge window for stack growth
  [PATCH] x86_64: Minor string functions optimizations
  [PATCH] x86_64: Move export symbols to their C functions
  [PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
  [PATCH] x86_64: Fix modular pc speaker
  [PATCH] x86_64: remove sys32_ni_syscall()
  [PATCH] x86_64: Do not use -ffunction-sections for modules
  [PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle
  [PATCH] x86_64: adjust kstack_depth_to_print default
  [PATCH] i386/x86-64: adjust /proc/interrupts column headings
  [PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels
  [PATCH] x86_64: Fix fast check in safe_smp_processor_id
  [PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
  [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
  ...

Manual resolve of trivial conflict in arch/i386/kernel/Makefile
2006-06-26 10:51:09 -07:00
Andi Kleen
c38bfdc85a [PATCH] x86_64: Move VM86 config into arch/i386/Kconfig
Architecture specific configs like this have no business at all
in init/Kconfig. This prevents it from being set on x86-64

Pointed out by H.Peter Anvin

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Jan Beulich
4552d5dc08 [PATCH] x86_64: reliable stack trace support
These are the generic bits needed to enable reliable stack traces based
on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will
enable x86-64 and i386 to make use of this.

Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities
for improvement.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:17 -07:00
H. Peter Anvin
2139a7fbf3 [PATCH] initramfs overwrite fix
This patch ensures that initramfs overwrites work correctly, even when dealing
with device nodes of different types.  Furthermore, when replacing a file
which already exists, we must make very certain that we truncate the existing
file.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Michael Neuling <mikey@neuling.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:40 -07:00
john stultz
ad596171ed [PATCH] Time: Use clocksource infrastructure for update_wall_time
Modify the update_wall_time function so it increments time using the
clocksource abstraction instead of jiffies.  Since the only clocksource driver
currently provided is the jiffies clocksource, this should result in no
functional change.  Additionally, a timekeeping_init and timekeeping_resume
function has been added to initialize and maintain some of the new timekeping
state.

[hirofumi@mail.parknet.co.jp: fixlet]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:20 -07:00
H. Peter Anvin
eab98702af [PATCH] Make sysctl obligatory except under CONFIG_EMBEDDED
Make makes sysctl non-optional unless EMBEDDED is set.  There are a number
of interfaces exposed via sysctl, enough that it has to be considered core
kernel functionality at this point.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:12 -07:00
Linus Torvalds
d9eaec9e29 Merge branch 'audit.b21' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b21' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (25 commits)
  [PATCH] make set_loginuid obey audit_enabled
  [PATCH] log more info for directory entry change events
  [PATCH] fix AUDIT_FILTER_PREPEND handling
  [PATCH] validate rule fields' types
  [PATCH] audit: path-based rules
  [PATCH] Audit of POSIX Message Queue Syscalls v.2
  [PATCH] fix se_sen audit filter
  [PATCH] deprecate AUDIT_POSSBILE
  [PATCH] inline more audit helpers
  [PATCH] proc_loginuid_write() uses simple_strtoul() on non-terminated array
  [PATCH] update of IPC audit record cleanup
  [PATCH] minor audit updates
  [PATCH] fix audit_krule_to_{rule,data} return values
  [PATCH] add filtering by ppid
  [PATCH] log ppid
  [PATCH] collect sid of those who send signals to auditd
  [PATCH] execve argument logging
  [PATCH] fix deadlocks in AUDIT_LIST/AUDIT_LIST_RULES
  [PATCH] audit_panic() is audit-internal
  [PATCH] inotify (5/5): update kernel documentation
  ...

Manual fixup of conflict in unclude/linux/inotify.h
2006-06-20 15:37:56 -07:00
Amy Griffis
f368c07d72 [PATCH] audit: path-based rules
In this implementation, audit registers inotify watches on the parent
directories of paths specified in audit rules.  When audit's inotify
event handler is called, it updates any affected rules based on the
filesystem event.  If the parent directory is renamed, removed, or its
filesystem is unmounted, audit removes all rules referencing that
inotify watch.

To keep things simple, this implementation limits location-based
auditing to the directory entries in an existing directory.  Given
a path-based rule for /foo/bar/passwd, the following table applies:

    passwd modified -- audit event logged
    passwd replaced -- audit event logged, rules list updated
    bar renamed     -- rule removed
    foo renamed     -- untracked, meaning that the rule now applies to
		       the new location

Audit users typically want to have many rules referencing filesystem
objects, which can significantly impact filtering performance.  This
patch also adds an inode-number-based rule hash to mitigate this
situation.

The patch is relative to the audit git tree:
http://kernel.org/git/?p=linux/kernel/git/viro/audit-current.git;a=summary
and uses the inotify kernel API:
http://lkml.org/lkml/2006/6/1/145

Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20 05:25:27 -04:00
Roman Zippel
face4374e2 kconfig: add defconfig_list/module option
This makes it possible to change two options which were hardcoded sofar.
1. Any symbol can now take the role of CONFIG_MODULES
2. The more useful option is to change the list of default file names,
   which kconfig uses to load the base configuration if .config isn't
   available.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-06-09 07:31:30 +02:00
Joern Engel
e9482b4374 [MTD] Allow alternate JFFS2 mount variant for root filesystem.
With this patch, "root=mtd3" and "root=mtd:foo" work for a JFFS2 rootfs.

Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
2006-05-30 14:25:46 +02:00
David Woodhouse
18594822fc Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-16 01:19:52 +01:00
Andy Whitcroft
be6e028b64 [PATCH] root mount failure: emit filesystems attempted
When we fail to mount from a valid root device list out the filesystems we
have tried to mount it with.  This gives the user vital diagnostics as to
what is missing from their kernel.

For example in the fragment below the kernel does not have CRAMFS compiled
into the kernel and yet appears to recognise it at the RAMDISK detect
stage.  Later the mount fails as we don't have the filesystem.

  RAMDISK: cramfs filesystem found at block 0
  RAMDISK: Loading 1604KiB [1 disk] into ram disk... done.
  XFS: bad magic number
  XFS: SB validate failed
  No filesystem could mount root, tried: reiserfs ext3 ext2 msdos vfat
    iso9660 jfs xfs
  Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-15 11:20:57 -07:00
Mark Huang
6a050da45b [PATCH] initramfs: fix CPIO hardlink check
Copy the filenames of hardlinks when inserting them into the hash, since
the "name" pointer may point to scratch space (name_buf).  Not doing so
results in corruption if the scratch space is later overwritten: the wrong
file may be hardlinked, or, if the scratch space contains garbage, the link
will fail and a 0-byte file will be created instead.

Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-15 11:20:55 -07:00
David Woodhouse
6f18a022fb Finally remove the obnoxious inter_module_xxx()
This was already a bad plan when I argued against adding it in the first
place. Good riddance.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-08 22:40:05 +01:00
Andrew Morton
f3a19cb45f [PATCH] silence initcall warnings
Suppress the initcall-return-value warnings unless initcall_debug was
specified.

They do find bugs, but they're extremely small ones and as Andi points out,
people get distressed.

Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01 18:17:43 -07:00
Andi Kleen
102e41fd9d [PATCH] i386: Move CONFIG_DOUBLEFAULT into arch/i386 where it belongs.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-18 10:39:20 -07:00
Randy Dunlap
e39632faa0 [PATCH] menu: relocate DOUBLEFAULT option
Move the DOUBLEFAULT option from the top-level menu to the EMBEDDED menu.
Only applicable to X86_32.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:33 -07:00
KAMEZAWA Hiroyuki
0a94502277 [PATCH] for_each_possible_cpu: fixes for generic part
replaces for_each_cpu with for_each_possible_cpu().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
Linus Torvalds
9ae21d1bb3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment
  Kconfig help: MTD_JEDECPROBE already supports Intel
  Remove ugly debugging stuff
  do_mounts.c: Minor ROOT_DEV comment cleanup
  BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c
  BUG_ON() Conversion in mm/mempool.c
  BUG_ON() Conversion in mm/memory.c
  BUG_ON() Conversion in kernel/fork.c
  BUG_ON() Conversion in ipc/sem.c
  BUG_ON() Conversion in fs/ext2/
  BUG_ON() Conversion in fs/hfs/
  BUG_ON() Conversion in fs/dcache.c
  BUG_ON() Conversion in fs/buffer.c
  BUG_ON() Conversion in input/serio/hp_sdc_mlc.c
  BUG_ON() Conversion in md/dm-table.c
  BUG_ON() Conversion in md/dm-path-selector.c
  BUG_ON() Conversion in drivers/isdn
  BUG_ON() Conversion in drivers/char
  BUG_ON() Conversion in drivers/mtd/
2006-03-26 09:41:18 -08:00
Jason Gunthorpe
33644c5e15 [PATCH] Fix typo causing bad mode of /initrd.image
I noticed that after boot with an initrd in 2.6.16 the rootfs had:

--w-r-xr-T    1 root     root      6241141 Jan  1  1970 initrd.image

Which is caused by a small typo:

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:58 -08:00
Andrew Morton
9a98e2f732 [PATCH] remove fixup_cpu_present_map()
Since the addition of boot_cpu_init(), fixup_cpu_present_map() has been a
no-op.  That's because fixup_cpu_present_map() won't touch cpu_present_map if
it has any bits set, and boot_cpu_init() sets a bit.

So remove fixup_cpu_present_map().

A consequence of this (actually of the boot_cpu_init() change) is that the
architecture _must_ populate cpu_present_map itself (probably in
smp_prepare_cpus()).  fixup_cpu_present_map() won't do it any more.

If the architecture doesn't do this, it'll only bring up a single CPU.

The other side effect (though less serious) is that smp_prepare_boot_cpu() no
longer needs to mark the boot cpu in the online and present maps -
boot_cpu_init() does that for everyone (to make early printks work).

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:57 -08:00
Florin Malita
5ac35783f4 do_mounts.c: Minor ROOT_DEV comment cleanup
The ROOT_DEV comment is no longer accurate, it now seems to be
initialized in init/do_mounts.c.

Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26 18:53:34 +02:00
Linus Torvalds
2e1ca21d46 Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild
* master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild: (46 commits)
  kbuild: remove obsoleted scripts/reference_* files
  kbuild: fix make help & make *pkg
  kconfig: fix time ordering of writes to .kconfig.d and include/linux/autoconf.h
  Kconfig: remove the CONFIG_CC_ALIGN_* options
  kbuild: add -fverbose-asm to i386 Makefile
  kbuild: clean-up genksyms
  kbuild: Lindent genksyms.c
  kbuild: fix genksyms build error
  kbuild: in makefile.txt note that Makefile is preferred name for kbuild files
  kbuild: replace PHONY with FORCE
  kbuild: Fix bug in crc symbol generating of kernel and modules
  kbuild: change kbuild to not rely on incorrect GNU make behavior
  kbuild: when warning symbols exported twice now tell user this is the problem
  kbuild: fix make dir/file.xx when asm symlink is missing
  kbuild: in the section mismatch check try harder to find symbols
  kbuild: fix section mismatch check for unwind on IA64
  kbuild: kill false positives from section mismatch warnings for powerpc
  kbuild: kill trailing whitespace in modpost & friends
  kbuild: small update of allnoconfig description
  kbuild: make namespace.pl CROSS_COMPILE happy
  ...

Trivial conflict in arch/ppc/boot/Makefile manually fixed up
2006-03-25 08:48:48 -08:00
Zdenek Pavlas
340e48e662 [PATCH] BLK_DEV_INITRD: do not require BLK_DEV_RAM=y
Initramfs initrd images do not need a ramdisk device, so remove this
restriction in Kconfig.  BLK_DEV_RAM=n saves about 13k on i386.  Also
without ramdisk device there's no need for "dry run", so initramfs unpacks
much faster.

People using cramfs, squashfs, or gzipped ext2/minix initrd images are
probably smart enough not to turn off ramdisk support by accident.

Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:57 -08:00
Adrian Bunk
77d47582c2 [PATCH] add a proper prototype for setup_arch()
This patch adds a proper prototype for setup_arch() in init.h.

This patch is based on a patch by Ben Dooks <ben-linux@fluff.org>.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
Andrew Morton
c1cda48af8 [PATCH] initcall failure reporting
We presently ignore the return values from initcalls.  But that can carry
useful debugging information.  So print it out if it's non-zero.

It turns out the -ENODEV happens quite a lot, due to built-in drivers which
have no hardware to drive.  So suppress that unless initcall_debug was
specified.

Also make the warning message more friendly by printing the name of the
initcall function.

Also drop the KERN_DEBUG from the initcall_debug message.  If we specified
inticall_debug then we obviously want to see the messages.

Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:53 -08:00
Rusty Russell
8d3b33f67f [PATCH] Remove MODULE_PARM
MODULE_PARM was actually breaking: recent gcc version optimize them out as
unused.  It's time to replace the last users, which are generally in the
most unloved drivers anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:52 -08:00
Theodore Ts'o
9b04c997b1 [PATCH] vfs: MS_VERBOSE should be MS_SILENT
The meaning of MS_VERBOSE is backwards; if the bit is set, it really means,
"don't be verbose".  This is confusing and counter-intuitive.

In addition, there is also no way to set the MS_VERBOSE flag in the
mount(8) program in util-linux, but interesting, it does define options
which would do the right thing if MS_SILENT were defined, which
unfortunately we do not:

#ifdef MS_SILENT
  { "quiet",    0, 0, MS_SILENT    },   /* be quiet  */
  { "loud",     0, 1, MS_SILENT    },   /* print out messages. */
#endif

So the obvious fix is to deprecate the use of MS_VERBOSE and replace it
with MS_SILENT.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:15 -08:00
Jens Axboe
b86ff981a8 [PATCH] relay: migrate from relayfs to a generic relay API
Original patch from Paul Mundt, sysfs parts removed by me since they
were broken.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-23 19:56:55 +01:00
Eric Dumazet
b73b459f72 [PATCH] __GENERIC_PER_CPU changes
Now CONFIG_DEBUG_INITDATA is in, initial percpu data
[__per_cpu_start,__per_cpu_end] can be declared as a redzone, and invalid
accesses after boot can be detected, at least for i386.

We can let non possible cpus percpu data point to this 'redzone' instead of
NULL .

NULL was not a good choice because part of [0..32768] memory may be
readable and invalid accesses may happen unnoticed.

If CONFIG_DEBUG_INITDATA is not defined, each non possible cpu points to
the initial percpu data (__per_cpu_offset[cpu] == 0), thus invalid accesses
wont be detected/crash.

This patch also moves __per_cpu_offset[] to read_mostly area to avoid false
sharing.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:17 -08:00
Eric Dumazet
63872f87a1 [PATCH] Only allocate percpu data for possible CPUs
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.

This patch saves ram, allocating num_possible_cpus() (instead of NR_CPUS)
instances.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <axboe@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:17 -08:00
Rafael J. Wysocki
6e1819d615 [PATCH] swsusp: userland interface
This patch introduces a user space interface for swsusp.

The interface is based on a special character device, called the snapshot
device, that allows user space processes to perform suspend and resume-related
operations with the help of some ioctls and the read()/write() functions.
 Additionally it allows these processes to allocate free swap pages from a
selected swap partition, called the resume partition, so that they know which
sectors of the resume partition are available to them.

The interface uses the same low-level system memory snapshot-handling
functions that are used by the built-it swap-writing/reading code of swsusp.

The interface documentation is included in the patch.

The patch assumes that the major and minor numbers of the snapshot device will
be 10 (ie.  misc device) and 231, the registration of which has already been
requested.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:07 -08:00
Stas Sergeev
44fd22992c [PATCH] Register the boot-cpu in the cpu maps earlier
Register the boot-cpu in the cpu maps earlier to allow the early printk to
work, and to fix an obscure deadlock at boot.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:05 -08:00
Adrian Bunk
8cab77a2f8 Kconfig: remove the CONFIG_CC_ALIGN_* options
I don't see any use case for the CONFIG_CC_ALIGN_* options:
- they are only available if EMBEDDED
- people using EMBEDDED will most likely also enable
  CC_OPTIMIZE_FOR_SIZE
- the default for -Os is to disable alignment

In case someone is doing performance comparisons and discovers that the
default settings gcc chooses aren't good, the only sane thing is to discuss
whether it makes sense to change this, not through offering options to change
this locally.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-03-12 23:35:17 +01:00
Heiko Carstens
02df360bf3 [PATCH] remove bogus comment from init/main.c
Remove bogus comment from init function which could lead to the assumption
that cpu_possible_map is setup in smp_prepare_cpus().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-10 08:13:12 -08:00
Haren Myneni
9c15e852a5 [PATCH] kexec: fix in free initrd when overlapped with crashkernel region
It is possible that the reserved crashkernel region can be overlapped with
initrd since the bootloader sets the initrd location.  When the initrd
region is freed, the second kernel memory will not be contiguous.  The
Kexec_load can cause an oops since there is no contiguous memory to write
the second kernel or this memory could be used in the first kernel itself
and may not be part of the dump.  For example, on powerpc, the initrd is
located at 36MB and the crashkernel starts at 32MB.  The kexec_load caused
panic since writing into non-allocated memory (after 36MB).  We could see
the similar issue even on other archs.

One possibility is to move the initrd outside of crashkernel region.  But,
the initrd region will be freed anyway before the system is up.  This patch
fixes this issue and frees only regions that are not part of crashkernel
memory in case overlaps.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-10 08:13:12 -08:00
Stephen Smalley
99f6d61bda [PATCH] selinux: require AUDIT
Make SELinux depend on AUDIT as it requires the basic audit support to log
permission denials at all.  Note that AUDITSYSCALL remains optional for
SELinux, although it can be useful in providing further information upon
denials.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-07 16:12:33 -08:00
Adrian Bunk
3636641bb2 [PATCH] don't allow users to set CONFIG_BROKEN=y
Do not allow people to create configurations with CONFIG_BROKEN=y.

The sole reason for CONFIG_BROKEN=y would be if you are working on fixing a
broken driver, but in this case editing the Kconfig file is trivial.

Never ever should a user enable CONFIG_BROKEN.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03 08:32:03 -08:00
Adrian Bunk
fd279197b1 [PATCH] build kernel/intermodule.c only when required
Build kernel/intermodule.c only when required.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-16 23:15:26 -08:00
Linus Torvalds
3f02d072d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-01-15 16:43:29 -08:00
Coywolf Qi Hunt
69c99ac17e [PATCH] abandon gcc 295x main.c tidy
After abandon-gcc-295x.patch, this relocates the error-out-early comment.

Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 18:27:12 -08:00
Jesper Juhl
92c3504e6e Spelling fix in init/Kconfig for the help of CONFIG_SWAP
Trivial spelling fix s/socalled/so called/

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-15 02:40:08 +01:00
Andi Kleen
4092bdebab [PATCH] i386: Move DOUBLEFAULT config to arch/i386/Kconfig
It has no business being elsewhere and x86-64 doesn't need/want it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 19:05:04 -08:00
Thomas Gleixner
c0a3132963 [PATCH] hrtimer: hrtimer core code
hrtimer subsystem core.  It is initialized at bootup and expired by the timer
interrupt, but is otherwise not utilized by any other subsystem yet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:37 -08:00
Adrian Bunk
2308acca65 [PATCH] "tiny-make-id16-support-optional" fixes
It seems the "make UID16 support optional" patch was checked when it
edited the -tiny tree some time ago, but it wasn't checked whether it
still matches the current situation when it was submitted for inclusion
in -mm. This patch fixes the following bugs:
- ARCH_S390X does no longer exist, nowadays this has to be expressed
  through (S390 && 64BIT)
- in five architecture specific Kconfig files the UID16 options
  weren't removed

Additionally, it changes the fragile negative dependencies of UID16 to
positive dependencies (new architectures are more likely to not require
UID16 support).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:23 -08:00
Matt Mackall
64ca9004b8 [PATCH] Make vm86 support optional
This adds an option to remove vm86 support under CONFIG_EMBEDDED.  Saves
about 5k.

This version eliminates most of the #ifdefs of the previous version and
instead uses function stubs in vm86.h.  Also, release_vm86_irqs is moved
from asm-i386/irq.h to a more appropriate home in vm86.h so that the stubs
can live together.

$ size vmlinux-baseline vmlinux-novm86
   text    data     bss     dec     hex filename
2920821  523232  190652 3634705  377611 vmlinux-baseline
2916268  523100  190492 3629860  376324 vmlinux-novm86

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:11 -08:00
Matt Mackall
708e9a794c [PATCH] tiny: Configure ELF core dump support
configurable support for ELF core dumps

   text    data     bss     dec     hex filename
3330172  529036  190556 4049764  3dcb64 vmlinux-baseline
3325552  528912  190556 4045020  3db8dc vmlinux-no-elf

add/remove: 0/8 grow/shrink: 0/0 up/down: 0/-4424 (-4424)
function                                     old     new   delta
fill_note                                     32       -     -32
maydump                                       58       -     -58
dump_seek                                     67       -     -67
writenote                                    180       -    -180
elf_dump_thread_status                       274       -    -274
fill_psinfo                                  308       -    -308
fill_prstatus                                466       -    -466
elf_core_dump                               3039       -   -3039

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:11 -08:00
Matt Mackall
e585e47031 [PATCH] tiny: Make *[ug]id16 support optional
Configurable 16-bit UID and friends support

This allows turning off the legacy 16 bit UID interfaces on embedded platforms.

   text    data     bss     dec     hex filename
3330172  529036  190556 4049764  3dcb64 vmlinux-baseline
3328268  529040  190556 4047864  3dc3f8 vmlinux

From: Adrian Bunk <bunk@stusta.de>

    UID16 was accidentially disabled for !EMBEDDED.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:11 -08:00
Matt Mackall
22c4e3084e [PATCH] tiny: Make x86 doublefault handling optional
This adds configurable support for doublefault reporting on x86

add/remove: 0/3 grow/shrink: 0/1 up/down: 0/-13048 (-13048)
function                                     old     new   delta
cpu_init                                     846     786     -60
doublefault_fn                               188       -    -188
doublefault_stack                           4096       -   -4096
doublefault_tss                             8704       -   -8704

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:11 -08:00
Andrew Morton
fd285bb54d [PATCH] Abandon gcc-2.95.x
There's one scsi driver which doesn't compile due to weird __VA_ARGS__ tricks
and the rather useful scsi/sd.c is currently getting an ICE.  None of the new
SAS code compiles, due to extensive use of anonymous unions.  The V4L guys are
very good at exploiting the gcc-2.95.x macro expansion bug (_why_ does each
driver need to implement its own debug macros?) and various people keep on
sneaking in anonymous unions, which are rather nice.

Plus anonymous unions are rather useful.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Paul Jackson
c417f0242e [PATCH] cpuset: remove test for null cpuset from alloc code path
Remove a couple of more lines of code from the cpuset hooks in the page
allocation code path.

There was a check for a NULL cpuset pointer in the routine
cpuset_update_task_memory_state() that was only needed during system boot,
after the memory subsystem was initialized, before the cpuset subsystem was
initialized, to catch a NULL task->cpuset pointer.

Add a cpuset_init_early() routine, just before the mem_init() call in
init/main.c, that sets up just enough of the init tasks cpuset structure to
render cpuset_update_task_memory_state() calls harmless.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:44 -08:00
Matt Mackall
10cef60295 [PATCH] slob: introduce the SLOB allocator
configurable replacement for slab allocator

This adds a CONFIG_SLAB option under CONFIG_EMBEDDED.  When CONFIG_SLAB is
disabled, the kernel falls back to using the 'SLOB' allocator.

SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer,
similar to the original Linux kmalloc allocator that SLAB replaced.  It's
signicantly smaller code and is more memory efficient.  But like all
similar allocators, it scales poorly and suffers from fragmentation more
than SLAB, so it's only appropriate for small systems.

It's been tested extensively in the Linux-tiny tree.  I've also
stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not
recommended).

Here's a comparison for otherwise identical builds, showing SLOB saving
nearly half a megabyte of RAM:

$ size vmlinux*
   text    data     bss     dec     hex filename
3336372  529360  190812 4056544  3de5e0 vmlinux-slab
3323208  527948  190684 4041840  3dac70 vmlinux-slob

$ size mm/{slab,slob}.o
   text    data     bss     dec     hex filename
  13221     752      48   14021    36c5 mm/slab.o
   1896      52       8    1956     7a4 mm/slob.o

/proc/meminfo:
                  SLAB          SLOB      delta
MemTotal:        27964 kB      27980 kB     +16 kB
MemFree:         24596 kB      25092 kB    +496 kB
Buffers:            36 kB         36 kB       0 kB
Cached:           1188 kB       1188 kB       0 kB
SwapCached:          0 kB          0 kB       0 kB
Active:            608 kB        600 kB      -8 kB
Inactive:          808 kB        812 kB      +4 kB
HighTotal:           0 kB          0 kB       0 kB
HighFree:            0 kB          0 kB       0 kB
LowTotal:        27964 kB      27980 kB     +16 kB
LowFree:         24596 kB      25092 kB    +496 kB
SwapTotal:           0 kB          0 kB       0 kB
SwapFree:            0 kB          0 kB       0 kB
Dirty:               4 kB         12 kB      +8 kB
Writeback:           0 kB          0 kB       0 kB
Mapped:            560 kB        556 kB      -4 kB
Slab:             1756 kB          0 kB   -1756 kB
CommitLimit:     13980 kB      13988 kB      +8 kB
Committed_AS:     4208 kB       4208 kB       0 kB
PageTables:         28 kB         28 kB       0 kB
VmallocTotal:  1007312 kB    1007312 kB       0 kB
VmallocUsed:        48 kB         48 kB       0 kB
VmallocChunk:  1007264 kB    1007264 kB       0 kB

(this work has been sponsored in part by CELF)

From: Ingo Molnar <mingo@elte.hu>

   Fix 32-bitness bugs in mm/slob.c.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:41 -08:00
NeilBrown
2604b703b6 [PATCH] md: remove personality numbering from md
md supports multiple different RAID level, each being implemented by a
'personality' (which is often in a separate module).

These personalities have fairly artificial 'numbers'.  The numbers
are use to:
 1- provide an index into an array where the various personalities
    are recorded
 2- identify the module (via an alias) which implements are particular
    personality.

Neither of these uses really justify the existence of personality numbers.
The array can be replaced by a linked list which is searched (array lookup
only happens very rarely).  Module identification can be done using an alias
based on level rather than 'personality' number.

The current 'raid5' modules support two level (4 and 5) but only one
personality.  This slight awkwardness (which was handled in the mapping from
level to personality) can be better handled by allowing raid5 to register 2
personalities.

With this change in place, the core md module does not need to have an
exhaustive list of all possible personalities, so other personalities can be
added independently.

This patch also moves the check for chunksize being non-zero into the ->run
routines for the personalities that need it, rather than having it in core-md.
 This has a side effect of allowing 'faulty' and 'linear' not to have a
chunk-size set.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:06 -08:00
Martin Schwidefsky
347a8dc3b8 [PATCH] s390: cleanup Kconfig
Sanitize some s390 Kconfig options.  We have ARCH_S390, ARCH_S390X,
ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT.  Replace these 6 options by
S390, 64BIT and COMPAT.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:53 -08:00
Arjan van de Ven
37b73c8281 [PATCH] x86/x86_64: mark rodata section read only: generic infrastructure
Generic prep-work for marking the .rodata section readonly:
* Align the rodata section at 4Kb boundary
* call the mark_rodata_ro() function when available

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:36 -08:00
David Howells
b0e15190ea [PATCH] NOMMU: Make SYSV IPC SHM use ramfs facilities on NOMMU
The attached patch makes the SYSV IPC shared memory facilities use the new
ramfs facilities on a no-MMU kernel.

The following changes are made:

 (1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to
     allow the IPC SHM facilities to commune with the tiny-shmem and shmem
     code.

 (2) ramfs files now need resizing using do_truncate() rather than by modifying
     the inode size directly (see shmem_file_setup()). This causes ramfs to
     attempt to bind a block of pages of sufficient size to the inode.

 (3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:32 -08:00
Linus Torvalds
db9edfd7e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
Trivial manual merge fixup for usb_find_interface clashes.
2006-01-04 18:44:12 -08:00
Linus Torvalds
25c862cc9e Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild 2006-01-04 16:36:52 -08:00
Greg Kroah-Hartman
712f47cea7 [PATCH] HOTPLUG: always enable the .config option, unless EMBEDDED
With modules, dynamic /dev, and uevents, people really want
CONFIG_HOTPLUG to be enabled in their kernels.  If not, they can still
disable it, but it is discouraged.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:08 -08:00
Kay Sievers
0296b22813 [PATCH] remove CONFIG_KOBJECT_UEVENT option
It makes zero sense to have hotplug, but not the netlink
events enabled today. Remove this option and merge the
kobject_uevent.h header into the kobject.h header file.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:07 -08:00
Andi Kleen
77d76ea310 [NET]: Small cleanup to socket initialization
sock_init can be done as a core_initcall instead of calling
it directly in init/main.c

Also I removed an out of date #ifdef.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:14 -08:00
Sam Ravnborg
0d54164331 kbuild: remove EXPERIMENTAL tag from Module versioning
Module versioning support has been stable for a loong time
so let's get rid of the EXPERIMENTAL tag.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-12-26 23:04:02 +01:00
David S. Miller
a9c9dff1bc [SPARC64]: Stop putting -finline-limit=XXX into CFLAGS
It was a stupid workaround for the "static inline" vs.
"extern inline" issues of long ago, and it is what causes
schedule() to be inlined like crazy into kernel/sched.c
when -Os is specified.

MIPS and S390 should probably do the same.

Now CC_OPTIMIZE_FOR_SIZE can be safely used on sparc64
once more.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-20 14:53:05 -08:00
Linus Torvalds
c45b4f1f1e Move size optimization option outside of EMBEDDED menu, mark it EXPERIMENTAL
Also, disable on sparc64 - a number of people report breakage.  Probably
a compiler bug, but it's quite possible that it tickles some latent
kernel problem too.

It still defaults to 'y' everywhere else (when enabled through
EXPERIMENTAL), and Dave Jones points out that Fedora (and RHEL4) has
been building with size optimizations for a long time on x86, x86-64,
ia64, s390, s390x, ppc32 and ppc64.  So it is really only moderately
experimental, but the sparc64 breakage certainly shows that it can
trigger "issues".

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-14 18:52:21 -08:00
Linus Torvalds
0910b444bc Expose "Optimize for size" option for everybody
Let's put my money where my mouth is.  Smaller code is almost always
faster, if only because a single I$ miss ends up leaving a lot of cycles
to make up for.  And system software - kernels in particular - are known
for taking more cache misses than most other kinds.

On my random config, this made the kernel about 10% smaller, and lmbench
seems to say that it's pretty uniformly faster too. Your milage may vary.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-13 11:39:05 -08:00
Adrian Bunk
83bab9a4bb [PATCH] allow KOBJECT_UEVENT=n only if EMBEDDED
KOBJECT_UEVENT=n seems to be a common pitfall for udev users in 2.6.14 .

-mm already contains a bigger patch removing this option that is IMHO
too big for being applied now to 2.6.15-rc.

This patch simply allows KOBJECT_UEVENT=n only if EMBEDDED.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-13 09:18:54 -08:00
Nick Piggin
5bfb5d690f [PATCH] sched: disable preempt in idle tasks
Run idle threads with preempt disabled.

Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?

Might fix the CPU hotplugging hang which Nigel Cunningham noted.

We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, then the CPU offlined.

After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep.  The CPU will continue executing
previous idle and have no chance to call play_dead.

By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.

From: alexs <ashepard@u.washington.edu>

  PPC build fix

From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>

  MIPS build fix

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:33 -08:00
Jens Axboe
3a65dfe8c0 [BLOCK] Move all core block layer code to new block/ directory
drivers/block/ is right now a mix of core and driver parts. Lets move
the core parts to a new top level directory. Al will move the fs/
related block parts to block/ next.

Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-04 08:43:35 +01:00
Linus Torvalds
1e4c85f97f Revert "i386: move apic init in init_IRQs"
Commit f2b36db692 causes a bootup hang on
at least one machine.  Revert for now until we understand why.  The old
code may be ugly, but it works.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-31 19:16:17 -08:00
Randy Dunlap
34ad92c238 [PATCH] clarify help text for INIT_ENV_ARG_LIMIT
Try to make the INIT_ENV_ARG_LIMIT help text more readable and
understandable.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:18 -08:00
Eric W. Biederman
f2b36db692 [PATCH] i386: move apic init in init_IRQs
All kinds of ugliness exists because we don't initialize
the apics during init_IRQs.
- We calibrate jiffies in non apic mode even when we are using apics.
- We have to have special code to initialize the apics when non-smp.
- The legacy i8259 must exist and be setup correctly, even
  when we won't use it past initialization.
- The kexec on panic code must restore the state of the io_apics.
- init/main.c needs a special case for !smp smp_init on x86

In addition to pure code movement I needed a couple
of non-obvious changes:
- Move setup_boot_APIC_clock into APIC_late_time_init for
  simplicity.
- Use cpu_khz to generate a better approximation of loops_per_jiffies
  so I can verify the timer interrupt is working.
- Call setup_apic_nmi_watchdog again after cpu_khz is initialized on
  the boot cpu.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:13 -08:00
Jan Beulich
0f3d2bd54f [PATCH] free initrd mem adjustment
Besides freeing initrd memory, also clear out the now dangling pointers to
it, to make sure accidental late use attempts can be detected.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:28 -07:00
Olof Johansson
ffdfc40976 [PATCH] Add rdinit parameter to pick early userspace init
Since early userspace was added, there's no way to override which init to
run from it.  Some people tack on an extra cpio archive with a link from
/init depending on what they want to run, but that's sometimes impractical.

Changing the "init=" to also override the early userspace isn't feasible,
since it is still used to indicate what init to run from disk when early
userspace has completed doing whatever it's doing (i.e.  load filesystem
modules and drivers).

Instead, introduce "rdinit=" and make it override the default "/init" if
specified.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:28 -07:00
Avery, Brian
c1d7ef70a7 [PATCH] Add warning `init=' to init/main.c
I passed init=/mylinuxrc to the kernel on the command line.  The kernel
silently dropped down to exec /sbin/init.  It turned out that /mylinuxrc
had improper permissions.  Without any warning message from the kernel that
something was wrong it took awhile to find the issue.  The patch below adds
a warning.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:23 -07:00
Ingo Molnar
8446f1d391 [PATCH] detect soft lockups
This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP.

When enabled then per-CPU watchdog threads are started, which try to run
once per second.  If they get delayed for more than 10 seconds then a
callback from the timer interrupt detects this condition and prints out a
warning message and a stack dump (once per lockup incident).  The feature
is otherwise non-intrusive, it doesnt try to unlock the box in any way, it
only gets the debug info out, automatically, and on all CPUs affected by
the lockup.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-Off-By: Matthias Urlichs <smurf@smurf.noris.de>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:17 -07:00
Linus Torvalds
ef88b7dba2 Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild 2005-09-06 00:35:51 -07:00
Rolf Eike Beer
13ae6d81b9 [PATCH] remove driverfs references from init/do_mounts.c
This patch is against 2.6.10, but still applies cleanly. It's just
s/driverfs/sysfs/ in this file.

Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-02 00:57:31 -07:00
Arnaldo Carvalho de Melo
20380731bc [NET]: Fix sparse warnings
Of this type, mostly:

CHECK   net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:01:32 -07:00
Ryan Anderson
aaebf43320 [PATCH] kbuild: automatically append a short string to the version based upon the git commit
If CONFIG_AUTO_LOCALVERSION is set, the user is using a git-based tree, and the
current HEAD is not referred to by any tags in .git/refs/tags/, append -g and
the first 8 characters of the commit to the version string.  This makes it
easier to use git-bisect, and/or to do a daily build, without trampling on your
older, working builds, or accidentally setting up conflicting sets of modules.

Signed-off-by: Ryan Anderson <ryan@michonline.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-08-10 21:11:23 +02:00
Sam Ravnborg
dbec486632 kconfig: move initramfs options to General Setup
Move initramfs options from Device Drivers | Block Drivers to General Setup
This is a more natural place for this option.

Furthermore separate out intramfs options to usr/Kconfig

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-08-10 20:44:50 +02:00
Andi Kleen
a940199f20 [PATCH] x86_64: Some cleanup in setup64.c
Minor cleanup.

Move things into their include files, remove obsolete includes, fix
indentation, remove obsolete special cases etc.

I also added the per cpu section to asm-generic/sections.h and fixed
init/main.c to use it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:45:58 -07:00
Randy Dunlap
d9fd8a6d44 [PATCH] kernel/cpuset.c: add kerneldoc, fix typos
Add kerneldoc to kernel/cpuset.c

Fix cpuset typos in init/Kconfig

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:06 -07:00
Jesper Juhl
f9f97bc014 [PATCH] kallsyms: clarify KALLSYMS_ALL help text
Clarify the KALLSYMS_ALL help text slightly.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
2005-07-26 10:16:12 +00:00
Sam Ravnborg
bd5bdd875b kbuild: "PREEMPT" in UTS_VERSION
From: Matt Mackall <mpm@selenic.com>

Add PREEMPT to UTS_VERSION where enabled as is done for SMP to make
preempt kernels easily identifiable.
Added SMP PREEMPT as comment in compile.h to force it to be
updated when they change (sam).

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-07-14 20:18:07 +00:00
Paolo 'Blaisorblade' Giarrusso
a2d2b5cb8e [PATCH] remove EXPORT_SYMBOL for root_dev
Remove ROOT_DEV after unexporting it in the previous patch, as requested time
ago by Christoph Hellwig.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-14 09:00:25 -07:00
Andrew Morton
d53d9f16ea [PATCH] name_to_dev_t warning fix
kernel/power/disk.c needs a declaration of name_to_dev_t() in scope.  mount.h
seems like an appropriate choice.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12 16:00:58 -07:00
David S. Miller
f7ceba360c [SPARC64]: Add syscall auditing support.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-10 19:29:45 -07:00
Jay Lan
f220ab2a51 [PATCH] Improper initrd failure message at boot time
On system boot up, there was an failure reported to boot.msg:

     <5>Trying to move old root to /initrd ... failed

According to initrd(4) man page, step #7 of BOOT-UP OPERATION
is described as below:
          7. If the normal root file has directory /initrd, device
          /dev/ram0 is moved from  /  to  /initrd.   Otherwise  if
          directory  /initrd  does  not  exist device /dev/ram0 is
          unmounted.

We got service calls from customers concerning about this failure message
at boot time.  Many systems do not have /initrd and thus the message can be
changed in the case of non-existing /initrd so that it does not sound like
a failure of the system.

Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-30 08:45:12 -07:00
Ingo Molnar
f340c0d1a3 [PATCH] Tweak idle thread setup semantics
This patch tweaks idle thread setup semantics a bit: instead of setting
NEED_RESCHED in init_idle(), we do an explicit schedule() before calling
into cpu_idle().

This patch, while having no negative side-effects, enables wider use of
cond_resched()s.  (which might happen in the stock kernel too, but it's
particulary important for voluntary-preempt)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28 14:56:51 -07:00
Domen Puncer
29a1d2d1bc [PATCH] init/do_mounts_initrd.c: fix sparse warning
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:58 -07:00
Venkatesh Pallipadi
8a9e1b0f56 [PATCH] Platform SMIs and their interferance with tsc based delay calibration
Issue:
Current tsc based delay_calibration can result in significant errors in
loops_per_jiffy count when the platform events like SMIs
(System Management Interrupts that are non-maskable) are present. This could
lead to potential kernel panic(). This issue is becoming more visible with 2.6
kernel (as default HZ is 1000) and on platforms with higher SMI handling
latencies. During the boot time, SMIs are mostly used by BIOS (for things
like legacy keyboard emulation).

Description:
The psuedocode for current delay calibration with tsc based delay looks like
(0) Estimate a value for loops_per_jiffy
(1) While (loops_per_jiffy estimate is accurate enough)
(2)   wait for jiffy transition (jiffy1)
(3)   Note down current tsc (tsc1)
(4)   loop until tsc becomes tsc1 + loops_per_jiffy
(5)   check whether jiffy changed since jiffy1 or not and refine
loops_per_jiffy estimate

Consider the following cases
Case 1:
If SMIs happen between (2) and (3) above, we can end up with a
loops_per_jiffy value that is too low. This results in shorted delays and
kernel can panic () during boot (Mostly at IOAPIC timer initialization
timer_irq_works() as we don't have enough timer interrupts in a specified
interval).

Case 2:
If SMIs happen between (3) and (4) above, then we can end up with a
loops_per_jiffy value that is too high. And with current i386 code, too
high lpj value (greater than 17M) can result in a overflow in
delay.c:__const_udelay() again resulting in shorter delay and panic().

Solution:
The patch below makes the calibration routine aware of asynchronous events
like SMIs. We increase the delay calibration time and also identify any
significant errors (greater than 12.5%) in the calibration and notify it to
user.

Patch below changes both i386 and x86-64 architectures to use this
new and improved calibrate_delay_direct() routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:08 -07:00
Christoph Lameter
e7c8d5c995 [PATCH] node local per-cpu-pages
This patch modifies the way pagesets in struct zone are managed.

Each zone has a per-cpu array of pagesets.  So any particular CPU has some
memory in each zone structure which belongs to itself.  Even if that CPU is
not local to that zone.

So the patch relocates the pagesets for each cpu to the node that is nearest
to the cpu instead of allocating the pagesets in the (possibly remote) target
zone.  This means that the operations to manage pages on remote zone can be
done with information available locally.

We play a macro trick so that non-NUMA pmachines avoid the additional
pointer chase on the page allocator fastpath.

AIM7 benchmark on a 32 CPU SGI Altix

w/o patches:
Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      484.68  100       484.6769     12.01      1.97   Fri Mar 25 11:01:42 2005
  100    27140.46   89       271.4046     21.44    148.71   Fri Mar 25 11:02:04 2005
  200    30792.02   82       153.9601     37.80    296.72   Fri Mar 25 11:02:42 2005
  300    32209.27   81       107.3642     54.21    451.34   Fri Mar 25 11:03:37 2005
  400    34962.83   78        87.4071     66.59    588.97   Fri Mar 25 11:04:44 2005
  500    31676.92   75        63.3538     91.87    742.71   Fri Mar 25 11:06:16 2005
  600    36032.69   73        60.0545     96.91    885.44   Fri Mar 25 11:07:54 2005
  700    35540.43   77        50.7720    114.63   1024.28   Fri Mar 25 11:09:49 2005
  800    33906.70   74        42.3834    137.32   1181.65   Fri Mar 25 11:12:06 2005
  900    34120.67   73        37.9119    153.51   1325.26   Fri Mar 25 11:14:41 2005
 1000    34802.37   74        34.8024    167.23   1465.26   Fri Mar 25 11:17:28 2005

with slab API changes and pageset patch:

Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      485.00  100       485.0000     12.00      1.96   Fri Mar 25 11:46:18 2005
  100    28000.96   89       280.0096     20.79    150.45   Fri Mar 25 11:46:39 2005
  200    32285.80   79       161.4290     36.05    293.37   Fri Mar 25 11:47:16 2005
  300    40424.15   84       134.7472     43.19    438.42   Fri Mar 25 11:47:59 2005
  400    39155.01   79        97.8875     59.46    590.05   Fri Mar 25 11:48:59 2005
  500    37881.25   82        75.7625     76.82    730.19   Fri Mar 25 11:50:16 2005
  600    39083.14   78        65.1386     89.35    872.79   Fri Mar 25 11:51:46 2005
  700    38627.83   77        55.1826    105.47   1022.46   Fri Mar 25 11:53:32 2005
  800    39631.94   78        49.5399    117.48   1169.94   Fri Mar 25 11:55:30 2005
  900    36903.70   79        41.0041    141.94   1310.78   Fri Mar 25 11:57:53 2005
 1000    36201.23   77        36.2012    160.77   1458.31   Fri Mar 25 12:00:34 2005

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Shai Fultheim <Shai@Scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:16 -07:00
David Woodhouse
1c3f45ab2f Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-06-02 16:39:11 +01:00
Paolo 'Blaisorblade' Giarrusso
34a1a63e37 [PATCH] uml: add modversions support
Actually, the real support was added by some earlier patches.  Now we simply
re-enable the config.  option.  I've actually tested it and it works well.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-28 16:46:12 -07:00
Chris Wright
804a6a49d8 Audit requires CONFIG_NET
Audit now actually requires netlink.  So make it depend on CONFIG_NET, 
and remove the inline dependencies on CONFIG_NET.

Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-05-11 10:52:45 +01:00
David Woodhouse
ea9c102cb0 Add CONFIG_AUDITSC and CONFIG_SECCOMP support for ppc32
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-05-08 15:56:09 +01:00
David Woodhouse
27b030d58c Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-05-03 08:14:09 +01:00
Jeff Dike
79d20b14a0 [AUDIT] Update UML audit-syscall-{entry,exit} calls to new prototypes
This patch is for -mm only.  It should probably be included in git-audit,
and should be forwarded to Linus iff git-audit is.

It updates the audit-syscall-{entry,exit} calls to current -mm.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-05-03 07:54:51 +01:00
Matt Mackall
d59745ce3e [PATCH] clean up kernel messages
Arrange for all kernel printks to be no-ops.  Only available if
CONFIG_EMBEDDED.

This patch saves about 375k on my laptop config and nearly 100k on minimal
configs.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:02 -07:00
Matt Mackall
c8538a7aa5 [PATCH] remove all kernel BUGs
This patch eliminates all kernel BUGs, trims about 35k off the typical
kernel, and makes the system slightly faster.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:01 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00