Commit Graph

37 Commits

Author SHA1 Message Date
Ingo Molnar
3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Kirill A. Shutemov
dcddffd41d mm: do not pass mm_struct into handle_mm_fault
We always have vma->vm_mm around.

Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
David Hildenbrand
70ffdb9393 mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.

Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).

In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().

Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.

faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.

This patch is based on a patch from Thomas Gleixner.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:15 +02:00
Linus Torvalds
33692f2759 vm: add VM_FAULT_SIGSEGV handling support
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.

That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works.  However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV.  And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.

However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space.  And user space really
expected SIGSEGV, not SIGBUS.

To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it.  They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.

This is the mindless minimal patch to do this.  A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.

Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.

Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-29 10:51:32 -08:00
Johannes Weiner
759496ba64 arch: mm: pass userspace fault flag to generic fault handler
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
David Rientjes
c2d23f919b mm, oom: remove statically defined arch functions of same name
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily.  Inline the functions into the
pagefault handlers to clean the code up.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Shaohua Li
45cac65b0f readahead: fault retry breaks mmap file read random detection
.fault now can retry.  The retry can break state machine of .fault.  In
filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
try, since the page is in page cache now, ra->mmap_miss is decreased.  And
these are done in one fault, so we can't detect random mmap file access.

Add a new flag to indicate .fault is tried once.  In the second try, skip
ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

I only tested x86, didn't test other archs, but looks the change for other
archs is obvious, but who knows :)

Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:47 +09:00
Paul Mundt
90eed7d87b sh: Fix up recursive fault in oops with unset TTB.
Presently the oops code looks for the pgd either from the mm context or
the cached TTB value. There are presently cases where the TTB can be
unset or otherwise cleared by hardware, which we weren't handling,
resulting in recursive faults on the NULL pgd. In these cases we can
simply reload from swapper_pg_dir and continue on as normal.

Cc: stable@vger.kernel.org
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-07-25 13:11:13 +09:00
Paul Mundt
d8fd35fc58 sh64: Fix up vmalloc fault range check.
With the previous attempt reverted this switches to conditionalizing the
end address. Nominally VMALLOC_END, but extended for P3_ADDR_MAX in the
store queue case.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-18 20:01:16 +09:00
Paul Mundt
c3e0af9879 Revert "sh: Ensure fixmap and store queue space can co-exist."
This reverts commit 20e7c297ef.
With store queues enabled the area above P4SEG has special properties
from the MMU's point of view, which was causing fixmap failure. We'll
have to do something else to satisfy the vmalloc range check.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-18 19:30:05 +09:00
Paul Mundt
28080329ed sh: Enable shared page fault handler for _32/_64.
This moves the now generic _32 page fault handling code to a shared place
and adapts the _64 implementation to make use of it.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 15:33:28 +09:00
Paul Mundt
811d50cb43 sh: Move in the SH-5 TLB miss.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-01-28 13:18:50 +09:00
Paul Mundt
0f1a394ba6 sh: lockless UTLB miss fast-path.
With the refactored update_mmu_cache() introduced in older kernels,
there's no longer any need to take the page_table_lock in this path,
so simply drop it completely.

Without this, performance degradation is seen on SMP on heavily
threaded workloads that don't use the split ptlock, and ultimately
we have no reason to contend for the lock in the first place.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-11-19 13:05:18 +09:00
Paul Mundt
1c6b2ca5e0 sh: Kill off UTLB flush in fast-path.
The __do_page_fault() fast-path contains a UTLB flush in order to
force an ITLB reload, this isn't needed in practice as the ITLB is
auto-reloaded from the UTLB anyways, which is already displaced by
the manual 'ldtlb' in the update_mmu_cache() path.

This provides a measurable speed up in the TLB miss fast-path.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-11-19 13:00:32 +09:00
Serge E. Hallyn
b460cbc581 pid namespaces: define is_global_init() and is_container_init()
is_init() is an ambiguous name for the pid==1 check.  Split it into
is_global_init() and is_container_init().

A cgroup init has it's tsk->pid == 1.

A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns.  But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.

Changelog:

	2.6.22-rc4-mm2-pidns1:
	- Use 'init_pid_ns.child_reaper' to determine if a given task is the
	  global init (/sbin/init) process. This would improve performance
	  and remove dependence on the task_pid().

	2.6.21-mm2-pidns2:

	- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
	  ppc,avr32}/traps.c for the _exception() call to is_global_init().
	  This way, we kill only the cgroup if the cgroup's init has a
	  bug rather than force a kernel panic.

[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:37 -07:00
Will Schmidt
dcca2bde4f During VM oom condition, kill all threads in process group
We have had complaints where a threaded application is left in a bad state
after one of it's threads is killed when we hit a VM: out_of_memory
condition.

Killing just one of the process threads can leave the application in a bad
state, whereas killing the entire process group would allow for the
application to restart, or be otherwise handled, and makes it very obvious
that something has gone wrong.

This change allows the entire process group to be taken down, rather
than just the one thread.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:52 -07:00
Paul Mundt
06f862c8ce sh: Fix pgd mismatch from cached TTB in unhandled fault.
When reading the cached TTB value and extracting the pgd, we
accidentally applied a __va() to it and bumped it off in to bogus
space which ended up causing multiple faults in the error path.

Fix it up so unhandled faults don't do strange and highly unorthodox
things when oopsing.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-08-01 16:39:51 +09:00
Nick Piggin
83c54070ee mm: fault feedback #2
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer.  This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).

[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Paul Mundt
0630e45c88 sh: Check oops_may_print() in unhandled fault.
Only print out pgd/pte data in the oops path if oops_may_print()
holds true. Follows the i386 implementation.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-06-18 19:02:47 +09:00
Christoph Hellwig
fce692e798 sh: revert addition of page fault notifiers
Just at the time you added them on sh we're removing them from other
architectures. As there's no user yet this patch just removes them
completely. Once you actually have a kprobes patch it should follow
the direct call to kprobes_fault_handler model that powerpc, s390 and
sparc64 employ in 2.6.22-rc1 and that I'm updating other architectures
to.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-21 14:32:10 +09:00
Paul Mundt
b8947444a7 sh: Shut up compiler warnings in __do_page_fault().
GCC doesn't seem to be able to figure this one out for
itself, so just shut it up..

  CC      arch/sh/mm/fault.o
arch/sh/mm/fault.c: In function '__do_page_fault':
arch/sh/mm/fault.c:288: warning: 'ptl' may be used uninitialized in this function

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-14 09:18:34 +09:00
Paul Mundt
b118ca572d sh: Convert to common die chain.
This went in immediately after SH added the die chain notifiers,
so move over to that instead..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 10:55:38 +09:00
Paul Mundt
3a2e117e22 sh: Add die chain notifiers.
Add the atomic die chains in, kprobes needs these.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-07 02:11:57 +00:00
Paul Mundt
db2e1fa3f0 sh: Revert TLB miss fast-path changes that broke PTEA parts.
This ended up causing problems for older parts (particularly ones
using PTEA). Revert this for now, it can be added back in once it's
had some more testing.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-02-14 14:13:10 +09:00
Paul Mundt
afbfb52e47 sh: stacktrace/lockdep/irqflags tracing support.
Wire up all of the essentials for lockdep..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-06 10:45:40 +09:00
Paul Mundt
bca7c20764 sh: Get the PGD right in oops case with 64-bit PTEs.
Previously this was using a static pgd shift in the reporting
code, simply flip this to PGDIR_SHIFT which does the right
thing depending on varying PTE magnitudes on the SH-X2 MMU.

While we're at it, and since it's been recently added, use
get_TTB() for fetching the TTB, rather than the open coded
instructions.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-06 10:45:39 +09:00
Stuart Menefy
9b3a53ab76 sh: TLB miss fast-path optimizations.
Handle simple TLB miss faults which can be resolved completely
from the page table in assembler.

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-06 10:45:38 +09:00
Stuart Menefy
99a596f93b sh: pmd rework.
Remove extra bits from the pmd structure and store a kernel logical
address rather than a physical address. This allows it to be directly
dereferenced. Another piece of wierdness inherited from x86.

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-06 10:45:38 +09:00
Stuart Menefy
b5a1bcbee4 sh: Set up correct siginfo structures for page faults.
Remove the previous saving of fault codes into the thread_struct
as they are never used, and appeared to be inherited from x86.

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-12-06 10:45:38 +09:00
Sukadev Bhattiprolu
f400e198b2 [PATCH] pidspace: is_init()
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280).  It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().

Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.

Eric's original description:

	There are a lot of places in the kernel where we test for init
	because we give it special properties.  Most  significantly init
	must not die.  This results in code all over the kernel test
	->pid == 1.

	Introduce is_init to capture this case.

	With multiple pid spaces for all of the cases affected we are
	looking for only the first process on the system, not some other
	process that has pid == 1.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:12 -07:00
Jason Baron
df67b3daea [PATCH] make PROT_WRITE imply PROT_READ
Make PROT_WRITE imply PROT_READ for a number of architectures which don't
support write only in hardware.

While looking at this, I noticed that some architectures which do not
support write only mappings already take the exact same approach.  For
example, in arch/alpha/mm/fault.c:

"
        if (cause < 0) {
                if (!(vma->vm_flags & VM_EXEC))
                        goto bad_area;
        } else if (!cause) {
                /* Allow reads even for write-only mappings */
                if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                        goto bad_area;
        } else {
                if (!(vma->vm_flags & VM_WRITE))
                        goto bad_area;
        }
"

Thus, this patch brings other architectures which do not support write only
mappings in-line and consistent with the rest.  I've verified the patch on
ia64, x86_64 and x86.

Additional discussion:

Several architectures, including x86, can not support write-only mappings.
The pte for x86 reserves a single bit for protection and its two states are
read only or read/write.  Thus, write only is not supported in h/w.

Currently, if i 'mmap' a page write-only, the first read attempt on that page
creates a page fault and will SEGV.  That check is enforced in
arch/blah/mm/fault.c.  However, if i first write that page it will fault in
and the pte will be set to read/write.  Thus, any subsequent reads to the page
will succeed.  It is this inconsistency in behavior that this patch is
attempting to address.  Furthermore, if the page is swapped out, and then
brought back the first read will also cause a SEGV.  Thus, any arbitrary read
on a page can potentially result in a SEGV.

According to the SuSv3 spec, "if the application requests only PROT_WRITE, the
implementation may also allow read access." Also as mentioned, some
archtectures, such as alpha, shown above already take the approach that i am
suggesting.

The counter-argument to this raised by Arjan, is that the kernel is enforcing
the write only mapping the best it can given the h/w limitations.  This is
true, however Alan Cox, and myself would argue that the inconsitency in
behavior, that is applications can sometimes work/sometimes fails is highly
undesireable.  If you read through the thread, i think people, came to an
agreement on the last patch i posted, as nobody has objected to it...

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:05 -07:00
Paul Mundt
0f08f33808 sh: More cosmetic cleanups and trivial fixes.
Nothing exciting here, just trivial fixes..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-09-27 17:03:56 +09:00
Paul Mundt
f647d33f87 sh: Fix split ptlock for user mappings in __do_page_fault().
There was a bug that got introduced when the split ptlock changes
went in where mm could be unintialized for user mappings, this
fixes it up..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-09-27 15:30:24 +09:00
Paul Mundt
26ff6c11ef sh: page table alloc cleanups and page fault optimizations.
Cleanup of page table allocators, using generic folded PMD and PUD
helpers. TLB flushing operations are moved to a more sensible spot.

The page fault handler is also optimized slightly, we no longer waste
cycles on IRQ disabling for flushing of the page from the ITLB, since
we're already under CLI protection by the initial exception handler.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-09-27 15:13:36 +09:00
Paul Mundt
298476220d sh: Add control register barriers.
Currently when making changes to control registers, we
typically need some time for changes to take effect (8
nops, generally).  However, for sh4a we simply need to
do an icbi..

This is a simple patch for implementing a general purpose
ctrl_barrier() which functions as a control register write
barrier. There's some additional documentation in the patch
itself, but it's pretty self explanatory.

There were also some places where we were not doing the
barrier, which didn't seem to have any adverse effects on
legacy parts, but certainly did on sh4a. It's safer to have
the barrier in place for legacy parts as well in these cases,
though this does make flush_tlb_all() more expensive (by an
order of 8 nops).  We can ifdef around the flush_tlb_all()
case for now if it's clear that all legacy parts won't have
a problem with this.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-09-27 14:57:44 +09:00
Hugh Dickins
60ec558549 [PATCH] mm: i386 sh sh64 ready for split ptlock
Use pte_offset_map_lock, instead of pte_offset_map (or inappropriate
pte_offset_kernel) and mm-wide page_table_lock, in sundry arch places.

The i386 vm86 mark_screen_rdonly: yes, there was and is an assumption that the
screen fits inside the one page table, as indeed it does.

The sh __do_page_fault: which handles both kernel faults (without lock) and
user mm faults (locked - though it set_pte without locking before).

The sh64 flush_cache_range and helpers: which wrongly thought callers held
page_table_lock before (only its tlb_start_vma did, and no longer does so);
moved the flush loop down, and adjusted the large versus small range decision
to consider a range which spans page tables as large.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:41 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00