Emulation of xcr0 writes zero guest_xcr0_loaded variable so that
subsequent VM-entry reloads CPU's xcr0 with guests xcr0 value.
However, this is incorrect because guest_xcr0_loaded variable is
read to decide whether to reload hosts xcr0.
In case the vcpu thread is scheduled out after the guest_xcr0_loaded = 0
assignment, and scheduler decides to preload FPU:
switch_to
{
__switch_to
__math_state_restore
restore_fpu_checking
fpu_restore_checking
if (use_xsave())
fpu_xrstor_checking
xrstor64 with CPU's xcr0 == guests xcr0
Fix by properly restoring hosts xcr0 during emulation of xcr0 writes.
Analyzed-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Merge more incoming from Andrew Morton:
- Various fixes which were stalled or which I picked up recently
- A large rotorooting of the AIO code. Allegedly to improve
performance but I don't really have good performance numbers (I might
have lost the email) and I can't raise Kent today. I held this out
of 3.9 and we could give it another cycle if it's all too late/scary.
I ended up taking only the first two thirds of the AIO rotorooting. I
left the percpu parts and the batch completion for later. - Linus
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (33 commits)
aio: don't include aio.h in sched.h
aio: kill ki_retry
aio: kill ki_key
aio: give shared kioctx fields their own cachelines
aio: kill struct aio_ring_info
aio: kill batch allocation
aio: change reqs_active to include unreaped completions
aio: use cancellation list lazily
aio: use flush_dcache_page()
aio: make aio_read_evt() more efficient, convert to hrtimers
wait: add wait_event_hrtimeout()
aio: refcounting cleanup
aio: make aio_put_req() lockless
aio: do fget() after aio_get_req()
aio: dprintk() -> pr_debug()
aio: move private stuff out of aio.h
aio: add kiocb_cancel()
aio: kill return value of aio_complete()
char: add aio_{read,write} to /dev/{null,zero}
aio: remove retry-based AIO
...
Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
need this anymore - ki_retry was only called right after the kiocb was
initialized.
This also refactors and trims some duplicated code, as well as cleaning up
the refcounting/error handling a bit.
[akpm@linux-foundation.org: use fmode_t in aio_run_iocb()]
[akpm@linux-foundation.org: fix file_start_write/file_end_write tests]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ki_key wasn't actually used for anything previously - it was always 0.
Drop it to trim struct kiocb a bit.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct aio_ring_info was kind of odd, the only place it's used is where
it's embedded in struct kioctx - there's no real need for it.
The next patch rearranges struct kioctx and puts various things on their
own cachelines - getting rid of struct aio_ring_info now makes that
reordering a bit clearer.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously, allocating a kiocb required touching quite a few global
(well, per kioctx) cachelines... so batching up allocation to amortize
those was worthwhile. But we've gotten rid of some of those, and in
another couple of patches kiocb allocation won't require writing to any
shared cachelines, so that means we can just rip this code out.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The aio code tries really hard to avoid having to deal with the
completion ringbuffer overflowing. To do that, it has to keep track of
the number of outstanding kiocbs, and the number of completions
currently in the ringbuffer - and it's got to check that every time we
allocate a kiocb. Ouch.
But - we can improve this quite a bit if we just change reqs_active to
mean "number of outstanding requests and unreaped completions" - that
means kiocb allocation doesn't have to look at the ringbuffer, which is
a fairly significant win.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cancelling kiocbs requires adding them to a per kioctx linked list,
which is one of the few things we need to take the kioctx lock for in
the fast path. But most kiocbs can't be cancelled - so if we just do
this lazily, we can avoid quite a bit of locking overhead.
While we're at it, instead of using a flag bit switch to using ki_cancel
itself to indicate that a kiocb has been cancelled/completed. This lets
us get rid of ki_flags entirely.
[akpm@linux-foundation.org: remove buggy BUG()]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This wasn't causing problems before because it's not needed on x86, but
it is needed on other architectures.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously, aio_read_event() pulled a single completion off the
ringbuffer at a time, locking and unlocking each time. Change it to
pull off as many events as it can at a time, and copy them directly to
userspace.
This also fixes a bug where if copying the event to userspace failed,
we'd lose the event.
Also convert it to wait_event_interruptible_hrtimeout(), which
simplifies it quite a bit.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Analagous to wait_event_timeout() and friends, this adds
wait_event_hrtimeout() and wait_event_interruptible_hrtimeout().
Note that unlike the versions that use regular timers, these don't
return the amount of time remaining when they return - instead, they
return 0 or -ETIME if they timed out. because I was uncomfortable with
the semantics of doing it the other way (that I could get it right,
anyways).
If the timer expires, there's no real guarantee that expire_time -
current_time would be <= 0 - due to timer slack certainly, and I'm not
sure I want to know the implications of the different clock bases in
hrtimers.
If the timer does expire and the code calculates that the time remaining
is nonnegative, that could be even worse if the calling code then reuses
that timeout. Probably safer to just return 0 then, but I could imagine
weird bugs or at least unintended behaviour arising from that too.
I came to the conclusion that if other users end up actually needing the
amount of time remaining, the sanest thing to do would be to create a
version that uses absolute timeouts instead of relative.
[akpm@linux-foundation.org: fix description of `timeout' arg]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The usage of ctx->dead was fubar - it makes no sense to explicitly check
it all over the place, especially when we're already using RCU.
Now, ctx->dead only indicates whether we've dropped the initial
refcount. The new teardown sequence is:
set ctx->dead
hlist_del_rcu();
synchronize_rcu();
Now we know no system calls can take a new ref, and it's safe to drop
the initial ref:
put_ioctx();
We also need to ensure there are no more outstanding kiocbs. This was
done incorrectly - it was being done in kill_ctx(), and before dropping
the initial refcount. At this point, other syscalls may still be
submitting kiocbs!
Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after
kioctx->users has dropped to 0 and we know no more iocbs could be
submitted.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Freeing a kiocb needed to touch the kioctx for three things:
* Pull it off the reqs_active list
* Decrementing reqs_active
* Issuing a wakeup, if the kioctx was in the process of being freed.
This patch moves these to aio_complete(), for a couple reasons:
* aio_complete() already has to issue the wakeup, so if we drop the
kioctx refcount before aio_complete does its wakeup we don't have to
do it twice.
* aio_complete currently has to take the kioctx lock, so it makes sense
for it to pull the kiocb off the reqs_active list too.
* A later patch is going to change reqs_active to include unreaped
completions - this will mean allocating a kiocb doesn't have to look
at the ringbuffer. So taking the decrement of reqs_active out of
kiocb_free() is useful prep work for that patch.
This doesn't really affect cancellation, since existing (usb) code that
implements a cancel function still calls aio_complete() - we just have
to make sure that aio_complete does the necessary teardown for cancelled
kiocbs.
It does affect code paths where we free kiocbs that were never
submitted; they need to decrement reqs_active and pull the kiocb off the
reqs_active list. This occurs in two places: kiocb_batch_free(), which
is going away in a later patch, and the error path in io_submit_one.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
aio_get_req() will fail if we have the maximum number of requests
outstanding, which depending on the application may not be uncommon. So
avoid doing an unnecessary fget().
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minor refactoring, to get rid of some duplicated code
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nothing used the return value, and it probably wasn't possible to use it
safely for the locked versions (aio_complete(), aio_put_req()). Just
kill it.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These are handy for measuring the cost of the aio infrastructure with
operations that do very little and complete immediately.
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the retry-based AIO infrastructure now that nothing in tree
is using it.
We want to remove retry-based AIO because it is fundemantally unsafe.
It retries IO submission from a kernel thread that has only assumed the
mm of the submitting task. All other task_struct references in the IO
submission path will see the kernel thread, not the submitting task.
This design flaw means that nothing of any meaningful complexity can use
retry-based AIO.
This removes all the code and data associated with the retry machinery.
The most significant benefit of this is the removal of the locking
around the unused run list in the submission path.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: Zach Brown <zab@redhat.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the only in-tree user of aio retry. This will let us
remove the retry code from the aio core.
Removing retry is relatively easy as the USB gadget wasn't using it to
retry IOs at all. It always fully submitted the IO in the context of
the initial io_submit() call. It only used the AIO retry facility to
get the submitter's mm context for copying the result of a read back to
user space. This is easy to implement with use_mm() and a work struct,
much like kvm does with async_pf_execute() for get_user_pages().
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bunch of performance improvements and cleanups Zach Brown and I have
been working on. The code should be pretty solid at this point, though
it could of course use more review and testing.
The results in my testing are pretty impressive, particularly when an
ioctx is being shared between multiple threads. In my crappy synthetic
benchmark, with 4 threads submitting and one thread reaping completions,
I saw overhead in the aio code go from ~50% (mostly ioctx lock
contention) to low single digits. Performance with ioctx per thread
improved too, but I'd have to rerun those benchmarks.
The reason I've been focused on performance when the ioctx is shared is
that for a fair number of real world completions, userspace needs the
completions aggregated somehow - in practice people just end up
implementing this aggregation in userspace today, but if it's done right
we can do it much more efficiently in the kernel.
Performance wise, the end result of this patch series is that submitting
a kiocb writes to _no_ shared cachelines - the penalty for sharing an
ioctx is gone there. There's still going to be some cacheline
contention when we deliver the completions to the aio ringbuffer (at
least if you have interrupts being delivered on multiple cores, which
for high end stuff you do) but I have a couple more patches not in this
series that implement coalescing for that (by taking advantage of
interrupt coalescing). With that, there's basically no bottlenecks or
performance issues to speak of in the aio code.
This patch:
use_mm() is used in more places than just aio. There's no need to mention
callers when describing the function.
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use preferable function name which implies using a pseudo-random number
generator.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current kernel returns -EINVAL unless a given mmap length is
"almost" hugepage aligned. This is because in sys_mmap_pgoff() the
given length is passed to vm_mmap_pgoff() as it is without being aligned
with hugepage boundary.
This is a regression introduced in commit 40716e2924 ("hugetlbfs: fix
alignment of huge page requests"), where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.
To fix this, this patch partially reverts that commit, and adds
alignment code in caller side. And it also introduces hstate_sizelog()
in order to get proper hstate to specified hugepage size.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
[akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: <iceman_dvd@yahoo.com>
Cc: Steven Truelove <steven.truelove@utoronto.ca>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Register layout is the same, so just add the variant to the appropriate
places.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
That nameless-function-arguments thing drives me batty. Fix.
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This exports the amount of anonymous transparent hugepages for each
memcg via the new "rss_huge" stat in memory.stat. The units are in
bytes.
This is helpful to determine the hugepage utilization for individual
jobs on the system in comparison to rss and opportunities where
MADV_HUGEPAGE may be helpful.
The amount of anonymous transparent hugepages is also included in "rss"
for backwards compatibility.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use common help functions to free reserved pages.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I badly screwed up the merge in commit 6fa52ed33b ("Merge tag
'drivers-for-linus' of git://git.kernel.org/pub/.../arm-soc") by
incorrectly taking the arch/arm/mach-omap2/* data fully from the merge
target because the 'drivers-for-linus' branch seemed to be a proper
superset of the duplicate ARM commits.
That was bogus: commit ff931c821b ("ARM: OMAP: clocks: Delay clk inits
atleast until slab is initialized") only existed in head, and the
changes to arch/arm/mach-omap2/timer.c from that commit got list.
Re-doing the merge more carefully, I do think this part was the only
thing I screwed up. Knock wood.
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- make warning smp-safe
- result of atomic _unless_zero functions should be checked by caller
to avoid use-after-free error
- trivial whitespace fix.
Link: https://lkml.org/lkml/2013/4/12/391
Tested: compile x86, boot machine and run xfstests
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
[ Removed line-break, changed to use WARN_ON_ONCE() - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more vfs updates from Al Viro:
"A couple of fixes + getting rid of __blkdev_put() return value"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
proc: Use PDE attribute setting accessor functions
make blkdev_put() return void
block_device_operations->release() should return void
mtd_blktrans_ops->release() should return void
hfs: SMP race on directory close()
Pull parisc updates from Helge Deller:
"Main fixes and updates in this patch series are:
- we faced kernel stack corruptions because of multiple delivery of
interrupts
- added kernel stack overflow checks
- added possibility to use dedicated stacks for irq processing
- initial support for page sizes > 4k
- more information in /proc/interrupts (e.g. TLB flushes and number
of IPI calls)
- documented how the parisc gateway page works
- and of course quite some other smaller cleanups and fixes."
* 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: tlb flush counting fix for SMP and UP
parisc: more irq statistics in /proc/interrupts
parisc: implement irq stacks
parisc: add kernel stack overflow check
parisc: only re-enable interrupts if we need to schedule or deliver signals when returning to userspace
parisc: implement atomic64_dec_if_positive()
parisc: use long branch in fork_like macro
parisc: fix NATIVE set up in build
parisc: document the parisc gateway page
parisc: fix partly 16/64k PAGE_SIZE boot
parisc: Provide default implementation for dma_{alloc, free}_attrs
parisc: fix whitespace errors in arch/parisc/kernel/traps.c
parisc: remove the second argument of kmap_atomic
Implements SMP support in Xen on ARM.
Add support for machine reboot and power off via Xen hypercalls.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJRe+dxAAoJEIlPj0hw4a6QSMAP/RxMT+TmQopGYLjCT2ZP7K2T
qMsYzQdSwTc1Yw2ylkjd4Si1MdwtC8oqlmC9pxYXlKUBssX/kApp6tCfZuaWv1gs
warV1DjgLBxM4ypEGOt/cUYZHs7MRF7XiKyAsalFzlS0lCmoS3n2IEmK+pOBqtIT
zs3ObNN3jYHdkDfmY7r4+pglZa2SULGDtdUDh4iruS8S8qq28RJdvyGRtjYZa35E
jUgcC5YVfKYASdDgWQdgVtP/the4JD8aqiJVA3fOvbpc4pHHpReErA3VLnK2UPzE
pHyZ9J0meK0hBt4KB/s3c49v1RiruJ9aoXixhzzsn3iPerByFG/gT6G9emb7ADhm
sct9mTpsUxEGwZ2YwnY6TvJAqvPmn2bycZ//rcG0orBYNJrfWk+MwSUrox3Oj/B1
adWtnNngM/zr/vC/B/NyRiNx71SrESJWCtHSuVoHJ6BxG9S7289CmfeAcOAqqn4d
Vks1u8kGbQp36K6aSG8PDnp98SOHmDXoClSRtmYQdadci9DDkglvBgqlYhvi/8+z
wosBfVosbfiC2FyHXIrbDr0c2bXAH1rFVVGv7s4Fr818OM/9v+bJ9g3j3m8zKQXE
HAu0E5z91Y7/eDM9WFNF9v7r+beFuiXnr2w/WSCEF7mx2qToTR+iAuYGCnJ8D/Eo
2n9BHrsOIIjUXMelfckq
=U5Oc
-----END PGP SIGNATURE-----
Merge tag '3.9-rc3-smp-6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen
Pull ARM Xen SMP updates from Stefano Stabellini:
"This contains a bunch of Xen/ARM specific changes, including some
fixes, SMP support for Xen on ARM, and moving the xenvm machine from
mach-vexpress to mach-virt.
The non-Xen files that are touched are arch/arm/Kconfig, to select
ARM_PSCI on XEN, and arch/arm/boot/dts/Makefile, to build the xenvm
DTB if CONFIG_ARCH_VIRT.
Highlights:
- Move xenvm to mach-virt.
- Implement SMP support in Xen on ARM.
- Add support for machine reboot and power off via Xen hypercalls"
* tag '3.9-rc3-smp-6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen:
xen/arm: remove duplicated include from enlighten.c
xen/arm: use sched_op hypercalls for machine reboot and power off
xenvm: add a simple PSCI node and a second cpu
xen/arm: XEN selects ARM_PSCI
xen: move the xenvm machine to mach-virt
xen/arm: SMP support
xen/arm: implement HYPERVISOR_vcpu_op
xen/arm: actually pass a non-NULL percpu pointer to request_percpu_irq
- Some refactoring, cleanups and small improvements
from Sjur Brændeland. The improvements are mainly
about better supporting varios virtio properties
(such as virtio's config space, status and features).
I now see that I messed up while commiting one of Sjur's
patches and erroneously put myself as the author, as well
as letting a nasty typo sneak in. I will not fix this in
order to avoid rebasing the patches. Sjur - sorry!
- A new remoteproc driver for OMAP-L13x (technically a
DaVinci platform) from Robert Tivy.
- Extend OMAP support to OMAP5 as well, from Vincent Stehlé.
- Fix Kconfig VIRTUALIZATION dependency, from Suman Anna
(a non-critical fix which arrived late during the rc cycle).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRiPCwAAoJELLolMlTRIoMRoQP/ivAF3r9jPaMizky82T0mDs0
kUPz6sz0zwb412h/IVtCo0ChJ1Jv4rVgo5u4klzDS5wfOutRiwDFIa8ZQF484nRb
gTIFHtqy+xk82aHAR9PWi2G3QWJ9hplZ7m52aaOIG7E6BaY3EfWL7fnt5QGBAb/O
vcR3rrj7QNQcB963PQl7cYWSX966ipzfX1g7VxFk/Ah8m9rjQp0xZgg3a5svGmq7
5iQcdxiXm63RAfgN9kbZrxWjX5/7m1N1WOfK5CE1H2jnGObttNdhN5xr7Ky5TXyN
sVJvoVhIbylSzDVl/LH6v/V9T/is+VCZOPs+erXVGv2vGctNY5Cs+RAVPF4bz4UC
R9/LDRbdZtbvcKo0TiPAjsIN3t+Rg1EDBnjOImi5kN5bYASpfcoRE0hpicL51S01
HzdT5+k3Xo5RS0EWakITc1ecc+7kfMVjLjn59/Im+bWVGhv2Lzq1pCFQ+XAcfvL9
FrQCGYCn8QZFWBHgeDRzg1ysK4hDNPo1UkEbLTTXjFcaMLeMczSSRagCi4Fk2RjL
QYmtDzKbm7Tc03Yi6ac7A8I2lgeQRsXNENLxtTtONu9rNKf4O0Of86KCLhtUtDqb
6SUxn2ZvG+maXwyrqbIjak+CKphimOnOEEIxur3viuKLs4sM8yuZCzU+v56bPbdb
Pqcdo7VOhqAwTBoW3NUp
=IUfO
-----END PGP SIGNATURE-----
Merge tag 'remoteproc-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc
Pull remoteproc update from Ohad Ben-Cohen:
- Some refactoring, cleanups and small improvements from Sjur
Brændeland. The improvements are mainly about better supporting
varios virtio properties (such as virtio's config space, status and
features). I now see that I messed up while commiting one of Sjur's
patches and erroneously put myself as the author, as well as letting
a nasty typo sneak in. I will not fix this in order to avoid
rebasing the patches. Sjur - sorry!
- A new remoteproc driver for OMAP-L13x (technically a DaVinci
platform) from Robert Tivy.
- Extend OMAP support to OMAP5 as well, from Vincent Stehlé.
- Fix Kconfig VIRTUALIZATION dependency, from Suman Anna (a
non-critical fix which arrived late during the rc cycle).
* tag 'remoteproc-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
remoteproc: fix kconfig dependencies for VIRTIO
remoteproc/davinci: add a remoteproc driver for OMAP-L13x DSP
remoteproc: support default firmware name in rproc_alloc()
remoteproc/omap: support OMAP5 too
remoteproc: set vring addresses in resource table
remoteproc: support virtio config space.
remoteproc: perserve resource table data
remoteproc: calculate max_notifyid by counting vrings
remoteproc: code cleanup of resource parsing
remoteproc: parse STE-firmware and find resource table address
remoteproc: add find_loaded_rsc_table firmware ops
remoteproc: refactor rproc_elf_find_rsc_table()
- Make rpmsg process all pending messages instead of just
one, from Robert Tivy
- Fix Kconfig dependency on VIRTUALIZATION, from Suman.
Note: this was submitted late during the 3.9 rc cycle and it
seemed appropriate to wait with it for the merge window.
- Belated addition of an rpmsg entry to the MAINTAINERS file.
People seem to look for this.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRiOZyAAoJELLolMlTRIoM7ToQAIsuq4S/YWfvZGDmHg7wsyGn
DTtTEGo61mE+B0yJ0+fGw/b1LLjFrai9Ugr5zKyntwNWjuVJwtAtfvGcjoOSs3pK
CJcpAADp+y92Dl5/6iLZwwQSMb2/OD0dfltTyvehoj4GOyVTmXZRwqLcsClZUYL3
ISrxNaRKoEGyyGSUSRG2B6g1YvuoY39UFmFOXAMHitT0Oq2rgZR/RzFXmk/FUsPy
sTxmevaLuN65RufqO2qAprdIZDk4TZf+2YC2BAEdHHiuFhykZukUxLCRH9L4DLhw
j+EHWH8vLrfLCHb1FFR6Qj3R0mMsHiYHvfsdHIbRZGB9iJfszEMi53otGUdY3EF8
bj7aAv/aI71Mg+OC98TWcCCRzdFowjfMKVTSpDwp6hNj0MrYGbmrmuqpDPiq+cmo
axeUJWeytMEI/N1Lh+uJBMUa3Qa5j++iPur/vw2EZj/KWqGFTFACKagUp7fMT8QD
7qlQF/6Dfmh5WzI5U7RrryaP6h/hKAFNIzNoiQRGFl+eybLULm+/6NluTPQunNbo
7EgBpS+5ktpGu6fbDr8z+CUYrV4b6wXNinj7qjnGOwYJlQiyNaRtM3cuL34ZIP3q
g2p23stIkjOlCkDt25SOBz6vrLTPzybxCaL8ipSl3HAOXDAKxT+055IhAYeYCJQ1
op1V1BIAw6TTxhvL3del
=dDx5
-----END PGP SIGNATURE-----
Merge tag 'rpmsg-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg
Pull rpmsg changes from Ohad Ben-Cohen:
"A small pull request consisting of:
- Make rpmsg process all pending messages instead of just one, from
Robert Tivy
- Fix Kconfig dependency on VIRTUALIZATION, from Suman.
Note: this was submitted late during the 3.9 rc cycle and it seemed
appropriate to wait with it for the merge window.
- Belated addition of an rpmsg entry to the MAINTAINERS file. People
seem to look for this"
* tag 'rpmsg-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
rpmsg: fix kconfig dependencies for VIRTIO
MAINTAINERS: add rpmsg entry
rpmsg: process _all_ pending messages in rpmsg_recv_done
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRiONpAAoJELLolMlTRIoMChkP/RW1KNCJkM3iTFhgeSDkdFOn
4QXUCDlVUg8OqiUgCsdgW/CdwID8cRMBZ9Lif/RiNQlgG9I0RqiHPcrGsYaimOX0
16h6Fuoldolem2p7G5++zdwKsLRe4Yl06sXBsIHrUZMwj+TmpWzzZyIEKLg+qQAF
T2ku2nHJWtoPQGNrTYq8VUWz1tKv26w6jAJDZshzsbTinaMl20gUOynz1dsPjeIP
RH5ycmA9NVaNq89PIodbawG6o2ztMMa6QcokqmOF0p/TQC9ByHoFMoV/mPjPuvcm
2jpgPr2wjI7Ya7yoUpGVIL76OUadAZKzAojyOA3VWXYKyrTE7i/1tlOwuSU8x2Ky
mrlXfhJlFbVChO1jYV8QTnrdQlCnIFwlHsVF/XXw1NGpJoI3bYVgmpg9tmQ3vVo5
gjFukfGG34mVGiVfHFJIgh0ZCcVbyAeqlEN+/C5Zh2F5Kv/XYXUthJgy1LLFpYp7
rNLCX2AARp+aNAkZ2Gvumd0hUfI2BBxiPmMnEL8yitq759slYsbrSagTVD19LUKL
RwuRZ3gJV4oNnkttmnshrg7FK7o0WJ46UZ1T6RrVtrBbVti21N6uMcY43pu5FHyG
1PjYl0/IbUjL80K6yxpSWYlbhzlhCMkm7eRz//Sss3x5chqaRwwrmSN3h3rgv+4p
UkOjjySS1tfVV7F8O95i
=vZ99
-----END PGP SIGNATURE-----
Merge tag 'hwspinlock-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock
Pullhwspinlock update from Ohad Ben-Cohen:
"A single patch from Vincent extending OMAP's hwspinlock support to
OMAP5"
* tag 'hwspinlock-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock:
hwspinlock/omap: support OMAP5 as well
Pull libata maintainership change from Tejun Heo.
Tejun is taking over from Jeff, after many many years. Thanks Jeff.
* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: change maintainer
Default kernel stack size on parisc is 16k. During tests we found that the
kernel stack can easily grow beyond 13k, which leaves 3k left for irq
processing.
This patch adds the possibility to activate an additional stack of 16k per CPU
which is being used during irq processing. This implementation does not yet
uses this irq stack for the irq bh handler.
The assembler code for call_on_stack was heavily cleaned up by John
David Anglin.
CC: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Add the CONFIG_DEBUG_STACKOVERFLOW config option to enable checks to
detect kernel stack overflows.
Stack overflows can not be detected reliable since we do not want to
introduce too much overhead.
Instead, during irq processing in do_cpu_irq_mask() we check kernel
stack usage of the interrupted kernel process. Kernel threads can be
easily detected by checking the value of space register 7 (sr7) which
is zero when running inside the kernel.
Since THREAD_SIZE is 16k and PAGE_SIZE is 4k, reduce the alignment of
the init thread to the lower value (PAGE_SIZE) in the kernel
vmlinux.ld.S linker script.
Signed-off-by: Helge Deller <deller@gmx.de>