Add more comments on clear_tasks_mm_cpumask, plus adds a runtime check:
the function is only suitable for offlined CPUs, and if called
inappropriately, the kernel should scream aloud.
[akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols]
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kill_off_processes() might miss a valid process, this is because checking
for process->mm is not enough. Process' main thread may exit or detach
its mm via use_mm(), but other threads may still have a valid mm.
To catch this we use find_lock_task_mm(), which walks up all threads and
returns an appropriate task (with task lock held).
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so tasklist lock is not enough).
We can't use get_task_mm()/mmput() pair as mmput() might sleep, so let's
take the task lock while we care about its mm.
Note that we should also use find_lock_task_mm() to check all process'
threads for a valid mm, but for uml we'll do it in a separate patch.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Traversing the tasks requires holding tasklist_lock, otherwise it is
unsafe.
p.s. However, I'm not sure that calling os_kill_ptraced_process() in the
atomic context is correct. It seem to work, but please take a closer
look.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov found an interesting deadlock possibility:
> sysrq_showregs_othercpus() does smp_call_function(showacpu)
> and showacpu() show_stack()->decode_address(). Now suppose that IPI
> interrupts the task holding read_lock(tasklist).
To fix this, blackfin should not grab the write_ variant of the
tasklist lock, read_ one is enough.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch fixes two problems:
1. Working with task->mm w/o getting mm or grabing the task lock is
dangerous as ->mm might disappear (exit_mm() assigns NULL under
task_lock(), so tasklist lock is not enough).
We can't use get_task_mm()/mmput() pair as mmput() might sleep,
so we have to take the task lock while handle its mm.
2. Checking for process->mm is not enough because process' main
thread may exit or detach its mm via use_mm(), but other threads
may still have a valid mm.
To catch this we use find_lock_task_mm(), which walks up all
threads and returns an appropriate task (with task lock held).
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Checking for process->mm is not enough because process' main thread may
exit or detach its mm via use_mm(), but other threads may still have a
valid mm.
To fix this we would need to use find_lock_task_mm(), which would walk up
all threads and returns an appropriate task (with task lock held).
clear_tasks_mm_cpumask() has the issue fixed, so let's use it.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current CPU hotplug code has some task->mm handling issues:
1. Working with task->mm w/o getting mm or grabing the task lock is
dangerous as ->mm might disappear (exit_mm() assigns NULL under
task_lock(), so tasklist lock is not enough).
We can't use get_task_mm()/mmput() pair as mmput() might sleep,
so we must take the task lock while handle its mm.
2. Checking for process->mm is not enough because process' main
thread may exit or detach its mm via use_mm(), but other threads
may still have a valid mm.
To fix this we would need to use find_lock_task_mm(), which would
walk up all threads and returns an appropriate task (with task
lock held).
clear_tasks_mm_cpumask() has all the issues fixed, so let's use it.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Checking for process->mm is not enough because process' main thread may
exit or detach its mm via use_mm(), but other threads may still have a
valid mm.
To fix this we would need to use find_lock_task_mm(), which would walk up
all threads and returns an appropriate task (with task lock held).
clear_tasks_mm_cpumask() has this issue fixed, so let's use it.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many architectures clear tasks' mm_cpumask like this:
read_lock(&tasklist_lock);
for_each_process(p) {
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
}
read_unlock(&tasklist_lock);
Depending on the context, the code above may have several problems,
such as:
1. Working with task->mm w/o getting mm or grabing the task lock is
dangerous as ->mm might disappear (exit_mm() assigns NULL under
task_lock(), so tasklist lock is not enough).
2. Checking for process->mm is not enough because process' main
thread may exit or detach its mm via use_mm(), but other threads
may still have a valid mm.
This patch implements a small helper function that does things
correctly, i.e.:
1. We take the task's lock while whe handle its mm (we can't use
get_task_mm()/mmput() pair as mmput() might sleep);
2. To catch exited main thread case, we use find_lock_task_mm(),
which walks up all threads and returns an appropriate task
(with task lock held).
Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in
the new helper, instead we take the rcu read lock. We can do this
because the function is called after the cpu is taken down and marked
offline, so no new tasks will get this cpu set in their mm mask.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Child should wake up the parent from vfork() only after finishing all
operations with shared mm. There is no sense in using
CLONE_CHILD_CLEARTID together with CLONE_VFORK, but it looks more accurate
now.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, nonlinear mappings can not be distinguished from ordinary
mappings. This patch adds into /proc/pid/smaps line "Nonlinear: <size>
kB", where size is amount of nonlinear ptes in vma, this line appears only
if VM_NONLINEAR is set. This information may be useful not only for
checkpoint/restore project.
Requested by Pavel Emelyanov.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently smaps reports migration entries as "swap", as result "swap" can
appears in shared mapping.
This patch converts migration entries into pages and handles them as usual.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is an implementation of Andrew's proposal to extend the pagemap file
bits to report what is missing about tasks' working set.
The problem with the working set detection is multilateral. In the criu
(checkpoint/restore) project we dump the tasks' memory into image files
and to do it properly we need to detect which pages inside mappings are
really in use. The mincore syscall I though could help with this did not.
First, it doesn't report swapped pages, thus we cannot find out which
parts of anonymous mappings to dump. Next, it does report pages from page
cache as present even if they are not mapped, and it doesn't make that has
not been cow-ed.
Note, that issue with swap pages is critical -- we must dump swap pages to
image file. But the issues with file pages are optimization -- we can
take all file pages to image, this would be correct, but if we know that a
page is not mapped or not cow-ed, we can remove them from dump file. The
dump would still be self-consistent, though significantly smaller in size
(up to 10 times smaller on real apps).
Andrew noticed, that the proc pagemap file solved 2 of 3 above issues --
it reports whether a page is present or swapped and it doesn't report not
mapped page cache pages. But, it doesn't distinguish cow-ed file pages
from not cow-ed.
I would like to make the last unused bit in this file to report whether the
page mapped into respective pte is PageAnon or not.
[comment stolen from Pavel Emelyanov's v1 patch]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- use int fpr priority and nice, since task_nice()/task_prio() return that
- field 24: get_mm_rss() returns unsigned long
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pass "fd" directly, not via pointer -- one less memory read.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rcu_read_lock()/rcu_read_unlock() is nop for TINY_RCU, but is not a nop
for, say, PREEMPT_RCU.
proc_fill_cache() is called without RCU lock, there is no need to
lock/unlock on error path, simply jump out of the loop.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm_for_maps() is a simple wrapper for mm_access(), and the name is
misleading, so just remove it and use mm_access() directly.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to e268337dfe ("proc: clean up and fix /proc/<pid>/mem
handling"), move the check of permission to open(), this will simplify
read() code.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In embedded systems, sometimes the same program (busybox) is the cause of
multiple warnings. Outputting the pid with the program name in the
warning printk helps distinguish which instances of a program are using
the stack most.
This is a small patch, but useful.
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8f92054e7c ("CRED: Fix __task_cred()'s lockdep check and banner
comment"):
add the following validation condition:
task->exit_state >= 0
to permit the access if the target task is dead and therefore
unable to change its own credentials.
OK, but afaics currently this can only help wait_task_zombie() which calls
__task_cred() without rcu lock.
Remove this validation and change wait_task_zombie() to use task_uid()
instead. This means we do rcu_read_lock() only to shut up the lockdep,
but we already do the same in, say, wait_task_stopped().
task_is_dead() should die, task->exit_state != 0 means that this task has
passed exit_notify(), only do_wait-like code paths should use this.
Unfortunately, we can't kill task_is_dead() right now, it has already
acquired buggy users in drivers/staging. The fix already exists.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Warning(kernel/kmod.c:419): No description found for parameter 'depth'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we move call_usermodehelper_fns() to kmod.c file and EXPORT_SYMBOL it
we can avoid exporting all it's helper functions:
call_usermodehelper_setup
call_usermodehelper_setfns
call_usermodehelper_exec
And make all of them static to kmod.c
Since the optimizer will see all these as a single call site it will
inline them inside call_usermodehelper_fns(). So we loose the call to
_fns but gain 3 calls to the helpers. (Not that it matters)
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both kernel/sys.c && security/keys/request_key.c where inlining the exact
same code as call_usermodehelper_fns(); So simply convert these sites to
directly use call_usermodehelper_fns().
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
call_usermodehelper_freeinfo() is not used outside of kmod.c. So unexport
it, and make it static to kmod.c
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently FAT file-system maps the VFS "superblock" abstraction to the
FSINFO block. The FSINFO block contains non-essential data about the
amount of free clusters and the next free cluster. FAT file-system can
always find out this information by scanning the FAT table, but having it
in the FSINFO block may speed things up sometimes. So FAT file-system
relies on the VFS superblock write-out services to make sure the FSINFO
block is written out to the media from time to time.
The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds
and writes out all dirty superblock using the '->write_super()' call-back.
But the problem with this thread is that it wastes power by waking up the
system every 5 seconds no matter what. So we want to kill it completely
and thus, we need to make file-systems to stop using the '->write_super'
VFS service, and then remove it together with the kernel thread.
This patch switches the FAT FSINFO block management from
'->write_super()'/'->s_dirt' to 'fsinfo_inode'/'->write_inode'. Now,
instead of setting the 's_dirt' flag, we just mark the special
'fsinfo_inode' inode as dirty and let VFS invoke the '->write_inode'
call-back when needed, where we write-out the FSINFO block.
This patch also makes sure we do not mark the 'fsinfo_inode' inode as
dirty if we are not FAT32 (FAT16 and FAT12 do not have the FSINFO block)
or if we are in R/O mode.
As a bonus, we can also remove the '->sync_fs()' and '->write_super()' FAT
call-back function because they become unneeded.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Preparation for further changes. It touches few functions in fatent.c and
prevents them from marking the superblock as dirty unnecessarily often.
Namely, instead of marking it as dirty in the internal tight loops - do it
only once at the end of the functions. And instead of marking it as dirty
while holding the FAT table lock, do it outside the lock.
The reason for this patch is that marking the superblock as dirty will
soon become a little bit heavier operation, so it is cleaner to do this
only when it is necessary.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A preparation patch which introduces a 'mark_fsinfo_dirty()' helper
function which just sets the 's_dirt' flag to 1 so far. I'll add more
code to this helper later, so I do not mark it as 'inline'.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is patchset makes fatfs stop using the VFS '->write_super()' method
for writing out the FSINFO block.
The final goal is to get rid of the 'sync_supers()' kernel thread. This
kernel thread wakes up every 5 seconds (by default) and calls
'->write_super()' for all mounted file-systems. And the bad thing is that
this is done even if all the superblocks are clean. Moreover, some
file-systems do not even need this end they do not register the
'->write_super()' method at all (e.g., btrfs).
So 'sync_supers()' most often just generates useless wake-ups and wastes
power. I am trying to make all file-systems independent of
'->write_super()' and plan to remove 'sync_supers()' and '->write_super'
completely once there are no more users.
The '->write_supers()' method is mostly used by baroque file-systems like
hfs, udf, etc. Modern file-systems like btrfs and xfs do not use it.
This justifies removing this stuff from VFS completely and make every FS
self-manage own superblock.
Tested with xfstests.
This patch:
Preparation for further changes. It introduces a special inode
('fsinfo_inode') in FAT file-system which we'll later use for managing the
FSINFO block. Note, this there is already one special inode ('fat_inode')
which is used for managing the FAT tables.
Introduce new 'MSDOS_FSINFO_INO' constant for this special inode. It is
safe to do because FAT file-system does not store inode numbers on the
media but generates them run-time.
I've also cleaned up the comment to existing 'MSDOS_ROOT_INO' constant,
while on it.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PRINTK() macro isn't really used. Let's just remove it because it
is ugly and out of date.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two cases that the cache flush is needed to avoid data loss
against unexpected hang or power failure. One is sync file function (i.e.
nilfs_sync_file) and another is checkpointing ioctl.
This issues a cache flush request to device for such cases if barrier
mount option is enabled, and makes sure data really is on persistent
storage on their completion.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As described in commit 07d106d0a3 ("vfs: fix up ENOIOCTLCMD error
handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl
command which they don't understand. Doing so will result in -ENOTTY
being returned to userspace, which matches the behaviour of the compat
layer if it fails to translate an ioctl command.
This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of -EINVAL
when passed an unknown ioctl command.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The init/mount.o source files produce a number of sparse warnings of the
type:
warning: incorrect type in argument 1 (different address spaces)
expected char [noderef] <asn:1>*dev_name
got char *name
This is due to the syscalls expecting some of the arguments to be user
pointers but they are being passed as kernel pointers. This is harmless
but adds a lot of noise to a sparse build.
To limit the noise just disable the sparse checking in the relevant source
files, but still display a warning so that the user knows this has been
done.
Since the sparse checking has been disabled we can also remove the __user
__force casts that are scattered thru the source.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggest the shorter pr_<level> instead of printk(KERN_<LEVEL>.
Prefer to use pr_<level> over bare printks.
Prefer to use pr_warn over pr_warning.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Requires --strict option during invocation:
~/linux$ scripts/checkpatch --strict foo.patch
This tests for a bad habits of mine like this:
return 0 ;
Note that it does allow a special case of a bare semicolon
for empty loops:
while (foo())
;
Signed-off-by: Eric Nelson <eric.nelson@boundarydevices.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previous code was using optimizations which were developed to work well
even on narrow-word CPUs (by today's standards). But Linux runs only on
32-bit and wider CPUs. We can use that.
First: using 32x32->64 multiply and trivial 32-bit shift, we can correctly
divide by 10 much larger numbers, and thus we can print groups of 9 digits
instead of groups of 5 digits.
Next: there are two algorithms to print larger numbers. One is generic:
divide by 1000000000 and repeatedly print groups of (up to) 9 digits.
It's conceptually simple, but requires an (unsigned long long) /
1000000000 division.
Second algorithm splits 64-bit unsigned long long into 16-bit chunks,
manipulates them cleverly and generates groups of 4 decimal digits. It so
happens that it does NOT require long long division.
If long is > 32 bits, division of 64-bit values is relatively easy, and we
will use the first algorithm. If long long is > 64 bits (strange
architecture with VERY large long long), second algorithm can't be used,
and we again use the first one.
Else (if long is 32 bits and long long is 64 bits) we use second one.
And third: there is a simple optimization which takes fast path not only
for zero as was done before, but for all one-digit numbers.
In all tested cases new code is faster than old one, in many cases by 30%,
in few cases by more than 50% (for example, on x86-32, conversion of
12345678). Code growth is ~0 in 32-bit case and ~130 bytes in 64-bit
case.
This patch is based upon an original from Michal Nazarewicz.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The '%p' output of the kernel's vsprintf() uses spec.field_width to
determine how many digits to output based on 2 * sizeof(void*) so that all
digits of a pointer are shown. ie. a pointer will be output as
"001A2B3C" instead of "1A2B3C". However, if the '#' flag is used in the
format (%#p), then the code doesn't take into account the width of the
'0x' prefix and will end up outputing "0x1A2B3C" instead of "0x001A2B3C".
This patch reworks the "pointer()" format hook to include 2 characters for
the '0x' prefix if the '#' flag is included.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the module-wide pr_fmt() mechanism rather than open-coding "genirq: "
everywhere.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sethostname() and setdomainname() notify userspace on failure (without
modifying uts_kern_table). Change things so that we only notify userspace
on success, when uts_kern_table was actually modified.
Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: WANG Cong <amwang@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This driver uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so it
wasn't converted by 44c10138fd ("PCI: Change all drivers to use
pci_device->revision").
In one case, it even reads PCI revision ID without using it -- that code
is now removed...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: "Nandigama, Nagalakshmi" <Nagalakshmi.Nandigama@lsi.com>
Cc: Eric Moore <eric.moore@lsi.com>
Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the comment of allocate_resource(), the explanation of parameter max
and min is not correct.
Actually, these two parameters are used to specify the range of the
resource that will be allocated, not the min/max size that will be
allocated.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ULONG_MAX is often used to check for integer overflow when calculating
allocation size. While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.
This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Alex Elder <elder@dreamhost.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the new kmalloc_array() to the list of general-purpose memory
allocators in chapter 14.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit d065bd810b ("mm: retry page fault when blocking on disk
transfer") and commit 37b23e0525 ("x86,mm: make pagefault killable")
introduced changes into the x86 pagefault handler for making the page
fault handler retryable as well as killable.
These changes reduce the mmap_sem hold time, which is crucial during OOM
killer invocation.
Port these changes to um.
Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allocation may be large. The code is probing to see if it will
succeed and if not, it falls back to vmalloc(). We should suppress any
page-allocation failure messages when the fallback happens.
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>