Commit Graph

129 Commits

Author SHA1 Message Date
Oleg Nesterov
486ccb05fd [PATCH] elf_core_dump: don't take tasklist_lock
do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep
in wait_for_completion(&mm->core_done) at this point, so we can use RCU
locks.

Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head).

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:14 -07:00
Kirill Korotaev
3b9b8ab65d [PATCH] Fix unserialized task->files changing
Fixed race on put_files_struct on exec with proc.  Restoring files on
current on error path may lead to proc having a pointer to already kfree-d
files_struct.

->files changing at exit.c and khtread.c are safe as exit_files() makes all
things under lock.

Found during OpenVZ stress testing.

[akpm@osdl.org: add export]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:12 -07:00
Linus Torvalds
b278240839 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
  [PATCH] Don't set calgary iommu as default y
  [PATCH] i386/x86-64: New Intel feature flags
  [PATCH] x86: Add a cumulative thermal throttle event counter.
  [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
  [PATCH] x86: Refactor thermal throttle processing
  [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
  [PATCH] Fix unwinder warning in traps.c
  [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
  [PATCH] x86: Move direct PCI scanning functions out of line
  [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
  [PATCH] Don't leak NT bit into next task
  [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
  [PATCH] Fix some broken white space in ia32_signal.c
  [PATCH] Initialize argument registers for 32bit signal handlers.
  [PATCH] Remove all traces of signal number conversion
  [PATCH] Don't synchronize time reading on single core AMD systems
  [PATCH] Remove outdated comment in x86-64 mmconfig code
  [PATCH] Use string instructions for Core2 copy/clear
  [PATCH] x86: - restore i8259A eoi status on resume
  [PATCH] i386: Split multi-line printk in oops output.
  ...
2006-09-26 13:07:55 -07:00
Andrew Morton
8d6b5eeea5 [PATCH] binfmt_elf: consistently use loff_t
As David Howells <dhowells@redhat.com> points out, binfmt_elf sometimes uses
off_t, sometimes uses loff_t.  Use loff_t throughout.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:53 -07:00
Andi Kleen
c16b63e09d [PATCH] i386/x86-64: Don't randomize stack top when no randomization personality is set
Based on patch from Frank van Maarseveen <frankvm@frankvm.com>, but
extended.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:28 +02:00
David Howells
b4cac1a022 [PATCH] FDPIC: Move roundup() into linux/kernel.h
Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's
generally useful.

[akpm@osdl.org: nuke all the other implementations]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:22 -07:00
Chuck Ebbert
ce51059be5 [PATCH] binfmt_elf: fix checks for bad address
Fix check for bad address; use macro instead of open-coding two checks.

Taken from RHEL4 kernel update.

From: Ernie Petrides <petrides@redhat.com>

  For background, the BAD_ADDR() macro should return TRUE if the address is
  TASK_SIZE, because that's the lowest address that is *not* valid for
  user-space mappings.  The macro was correct in binfmt_aout.c but was wrong
  for the "equal to" case in binfmt_elf.c.  There were two in-line validations
  of user-space addresses in binfmt_elf.c, which have been appropriately
  converted to use the corrected BAD_ADDR() macro in the patch you posted
  yesterday.  Note that the size checks against TASK_SIZE are okay as coded.

  The additional changes that I propose are below.  These are in the error
  paths for bad ELF entry addresses once load_elf_binary() has already
  committed to exec'ing the new image (following the tearing down of the
  task's original address space).

  The 1st hunk deals with the interp-side of the outer "if".  There were two
  problems here.  The printk() should be removed because this path can be
  triggered at will by a bogus interpreter image created and used by a
  malicious user.  Further, the error code should not be ENOEXEC, because that
  causes the loop in search_binary_handler() to continue trying other exec
  handlers (twice, in fact).  But it's too late for this to work correctly,
  because the user address space has already been torn down, and an exec()
  failure cannot be returned to the user code because the code no longer
  exists.  The only recovery is to force a SIGSEGV, but it's best to terminate
  the search loop immediately.  I somewhat arbitrarily chose EINVAL as a
  fallback error code, but any error returned by load_elf_interp() will
  override that (but this value will never be seen by user-space).

  The 2nd hunk deals with the non-interp-side of the outer "if".  There were
  two problems here as well.  The SIGSEGV needs to be forced, because a prior
  sigaction() syscall might have set the associated disposition to SIG_IGN.
  And the ENOEXEC should be changed to EINVAL as described above.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:26:59 -07:00
Jesper Juhl
785d55708c [PATCH] binflt_elf: remove more casts
Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Jesper Juhl
f4e5cc2c44 [PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless casts
Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless
casts of kmalloc() return values in the same file.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Miklos Szeredi
c89681ed7d [PATCH] remove steal_locks()
This patch removes the steal_locks() function.

steal_locks() doesn't work correctly with any filesystem that does it's own
lock management, including NFS, CIFS, etc.

In addition it has weird semantics on local filesystems in case tasks
sharing file-descriptor tables are doing POSIX locking operations in
parallel to execve().

The steal_locks() function has an effect on applications doing:

clone(CLONE_FILES)
  /* in child */
  lock
  execve
  lock

POSIX locks acquired before execve (by "child", "parent" or any further
task sharing files_struct) will after the execve be owned exclusively by
"child".

According to Chris Wright some LSB/LTP kind of suite triggers without the
stealing behavior, but there's no known real-world application that would
also fail.

Apps using NPTL are not affected, since all other threads are killed before
execve.

Apps using LinuxThreads are only affected if they

  - have multiple threads during exec (LinuxThreads doesn't kill other
    threads, the app may do it with pthread_kill_other_threads_np())
  - rely on POSIX locks being inherited across exec

Both conditions are documented, but not their interaction.

Apps using clone() natively are affected if they

  - use clone(CLONE_FILES)
  - rely on POSIX locks being inherited across exec

The above scenarios are unlikely, but possible.

If the patch is vetoed, there's a plan B, that involves mostly keeping the
weird stealing semantics, but changing the way lock ownership is handled so
that network and local filesystems work consistently.

That would add more complexity though, so this solution seems to be
preferred by most people.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:57 -07:00
Andi Kleen
913bd90601 [PATCH] x86_64: Increase the variability of the process stack on 64bit architectures
8MB is not really very random, use 1GB (or more with larger page sizes)
instead.

Also use the low bits of the random generator output now instead of
throwing them away.

Only enabled on x86-64 right now. Other architectures need to add
a suitable STACK_RND_MASK

Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:52 -08:00
Carsten Otte
5514854812 [PATCH] remove needless check in binfmt_elf.c
Local variable i is unsigned int and thus cannot be negative.

(akpm: unsigneds shouldn't be called `i'.  This value cannot possibly be
negative anyway).

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:01 -08:00
Oliver Neukum
11b0b5abb2 [PATCH] use kzalloc and kcalloc in core fs code
Signed-off-by: Oliver Neukum <oliver@neukum.name>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:00 -08:00
Suresh Siddha
5342fba541 [PATCH] x86_64: Check for bad elf entry address.
Fixes a local DOS on Intel systems that lead to an endless
recursive fault.  AMD machines don't seem to be affected.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26 09:53:30 -08:00
Arjan van de Ven
858119e159 [PATCH] Unlinline a bunch of other functions
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 18:27:06 -08:00
Jesper Juhl
74da6cd062 missing printk loglevel and tiny tiny whitespace change in binfmt_elf()
Patch adds a mising printk loglevel (I think KERN_WARNING is appropriate
here) in fs/binfmt_elf.c, and while I was there I made some tiny tiny tiny
adjustments to whitespacing in the neighborhood.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-11 01:51:26 +01:00
Jesper Juhl
792db3af38 [PATCH] fs/binfmt_elf: Remove unneeded kmalloc() return value casts
Remove unneeded casts of kmalloc() return value in binfmt_elf.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:02:01 -08:00
Matt Mackall
708e9a794c [PATCH] tiny: Configure ELF core dump support
configurable support for ELF core dumps

   text    data     bss     dec     hex filename
3330172  529036  190556 4049764  3dcb64 vmlinux-baseline
3325552  528912  190556 4045020  3db8dc vmlinux-no-elf

add/remove: 0/8 grow/shrink: 0/0 up/down: 0/-4424 (-4424)
function                                     old     new   delta
fill_note                                     32       -     -32
maydump                                       58       -     -58
dump_seek                                     67       -     -67
writenote                                    180       -    -180
elf_dump_thread_status                       274       -    -274
fill_psinfo                                  308       -    -308
fill_prstatus                                466       -    -466
elf_core_dump                               3039       -   -3039

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:11 -08:00
David Gibson
dda6ebde96 [PATCH] Fix handling of ELF segments with zero filesize
mmap() returns -EINVAL if given a zero length, and thus elf_map() in
binfmt_elf.c does likewise if it attempts to map a (page-aligned) ELF
segment with zero filesize.  Such a situation never arises with the default
linker scripts, but there's nothing inherently wrong with zero-filesize
(but non-zero memsize) ELF segments.  Custom linker scripts can generate
them, and the kernel should be able to map them; this patch makes it so.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:58 -08:00
Jesper Juhl
f99d49adf5 [PATCH] kfree cleanup: fs
This is the fs/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in fs/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:54:06 -08:00
Eric W. Biederman
a928972864 [PATCH] Don't uselessly export task_struct to userspace in core dumps
task_struct is an internal structure to the kernel with a lot of good
information, that is probably interesting in core dumps.  However there is
no way for user space to know what format that information is in making it
useless.

I grepped the GDB 6.3 source code and NT_TASKSTRUCT while defined is not
used anywhere else.  So I would be surprised if anyone notices it is
missing.

In addition exporting kernel pointers to all the interesting kernel data
structures sounds like the very definition of an information leak.  I
haven't a clue what someone with evil intentions could do with that
information, but in any attack against the kernel it looks like this is the
perfect tool for aiming that attack.

So since NT_TASKSTRUCT is useless as currently defined and is potentially
dangerous, let's just not export it.

(akpm: Daniel Jacobowitz <dan@debian.org> "would be amazed" if anything was
using NT_TASKSTRUCT).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:18 -08:00
Hugh Dickins
404351e67a [PATCH] mm: mm_init set_mm_counters
How is anon_rss initialized?  In dup_mmap, and by mm_alloc's memset; but
that's not so good if an mm_counter_t is a special type.  And how is rss
initialized?  By set_mm_counter, all over the place.  Come on, we just need to
initialize them both at once by set_mm_counter in mm_init (which follows the
memcpy when forking).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:38 -07:00
akpm@osdl.org
6de505173e [PATCH] binfmt_elf bss padding fix
Nir Tzachar <tzachar@cs.bgu.ac.il> points out that if an ELF file specifies a
zero-length bss at a whacky address, we cannot load that binary because
padzero() tries to zero out the end of the page at the whacky address, and
that may not be writeable.

See also http://bugzilla.kernel.org/show_bug.cgi?id=5411

So teach load_elf_binary() to skip the bss settng altogether if the elf file
has a zero-length bss segment.

Cc: Roland McGrath <roland@redhat.com>
Cc: Daniel Jacobowitz <dan@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-11 09:46:54 -07:00
Wolfgang Wander
1363c3cd86 [PATCH] Avoiding mmap fragmentation
Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
causes huge performance increases in thread creation.

The downside of this patch is that it does lead to fragmentation in the
mmap-ed areas (visible via /proc/self/maps), such that some applications
that work fine under 2.4 kernels quickly run out of memory on any 2.6
kernel.

The problem is twofold:

  1) the free_area_cache is used to continue a search for memory where
     the last search ended.  Before the change new areas were always
     searched from the base address on.

     So now new small areas are cluttering holes of all sizes
     throughout the whole mmap-able region whereas before small holes
     tended to close holes near the base leaving holes far from the base
     large and available for larger requests.

  2) the free_area_cache also is set to the location of the last
     munmap-ed area so in scenarios where we allocate e.g.  five regions of
     1K each, then free regions 4 2 3 in this order the next request for 1K
     will be placed in the position of the old region 3, whereas before we
     appended it to the still active region 1, placing it at the location
     of the old region 2.  Before we had 1 free region of 2K, now we only
     get two free regions of 1K -> fragmentation.

The patch addresses thes issues by introducing yet another cache descriptor
cached_hole_size that contains the largest known hole size below the
current free_area_cache.  If a new request comes in the size is compared
against the cached_hole_size and if the request can be filled with a hole
below free_area_cache the search is started from the base instead.

The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
(as expected) with thread creation, Ingo's test_str02 with 20000 threads
requires 0.7s system time.

Taking out Ingo's patch (un-patch available per request) by basically
deleting all mentions of free_area_cache from the kernel and starting the
search for new memory always at the respective bases we observe: leakme
terminates successfully with 11 distinctive hardly fragmented areas in
/proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
time for Ingo's test_str02 with 20000 threads.

Now - drumroll ;-) the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps and also thread creation seems
sufficiently fast with 0.71s for 20000 threads.

Signed-off-by: Wolfgang Wander <wwc@rentec.com>
Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:16 -07:00
Daniel Jacobowitz
5db92850d3 [PATCH] Fix large core dumps with a 32-bit off_t
The ELF core dump code has one use of off_t when writing out segments.
Some of the segments may be passed the 2GB limit of an off_t, even on a
32-bit system, so it's important to use loff_t instead.  This fixes a
corrupted core dump in the bigcore test in GDB's testsuite.

Signed-off-by: Daniel Jacobowitz <dan@codesourcery.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-16 09:02:59 -07:00
Greg Kroah-Hartman
a84a505956 [PATCH] fix Linux kernel ELF core dump privilege elevation
As reported by Paul Starzetz <ihaquer@isec.pl>

Reference: CAN-2005-1263

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-05-16 21:07:05 -07:00
Roland McGrath
18c8baff8f [PATCH] Fix error recovery path for arch_setup_additional_pages
If arch_setup_additional_pages fails, the error path will do some double-frees.
This fixes it.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-28 15:17:19 -07:00
Benjamin Herrenschmidt
547ee84cea [PATCH] ppc64: Improve mapping of vDSO
This patch reworks the way the ppc64 is mapped in user memory by the kernel
to make it more robust against possible collisions with executable
segments.  Instead of just whacking a VMA at 1Mb, I now use
get_unmapped_area() with a hint, and I moved the mapping of the vDSO to
after the mapping of the various ELF segments and of the interpreter, so
that conflicts get caught properly (it still has to be before
create_elf_tables since the later will fill the AT_SYSINFO_EHDR with the
proper address).

While I was at it, I also changed the 32 and 64 bits vDSO's to link at
their "natural" address of 1Mb instead of 0.  This is the address where
they are normally mapped in absence of conflict.  By doing so, it should be
possible to properly prelink one it's been verified to work on glibc.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:35 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00