linux_dsm_epyc7002/fs/autofs4/dev-ioctl.c

763 lines
18 KiB
C
Raw Normal View History

/*
* Copyright 2008 Red Hat, Inc. All rights reserved.
* Copyright 2008 Ian Kent <raven@themaw.net>
*
* This file is part of the Linux kernel and is made available under
* the terms of the GNU General Public License, version 2, or at your
* option, any later version, incorporated herein by reference.
*/
#include <linux/module.h>
#include <linux/vmalloc.h>
#include <linux/miscdevice.h>
#include <linux/init.h>
#include <linux/wait.h>
#include <linux/namei.h>
#include <linux/fcntl.h>
#include <linux/file.h>
#include <linux/fdtable.h>
#include <linux/sched.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
#include <linux/magic.h>
#include <linux/dcache.h>
#include <linux/uaccess.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 15:04:11 +07:00
#include <linux/slab.h>
#include "autofs_i.h"
/*
* This module implements an interface for routing autofs ioctl control
* commands via a miscellaneous device file.
*
* The alternate interface is needed because we need to be able open
* an ioctl file descriptor on an autofs mount that may be covered by
* another mount. This situation arises when starting automount(8)
* or other user space daemon which uses direct mounts or offset
* mounts (used for autofs lazy mount/umount of nested mount trees),
* which have been left busy at at service shutdown.
*/
#define AUTOFS_DEV_IOCTL_SIZE sizeof(struct autofs_dev_ioctl)
typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *,
struct autofs_dev_ioctl *);
static int check_name(const char *name)
{
if (!strchr(name, '/'))
return -EINVAL;
return 0;
}
/*
* Check a string doesn't overrun the chunk of
* memory we copied from user land.
*/
static int invalid_str(char *str, size_t size)
{
if (memchr(str, 0, size))
return 0;
return -EINVAL;
}
/*
* Check that the user compiled against correct version of autofs
* misc device code.
*
* As well as checking the version compatibility this always copies
* the kernel interface version out.
*/
static int check_dev_ioctl_version(int cmd, struct autofs_dev_ioctl *param)
{
int err = 0;
if ((AUTOFS_DEV_IOCTL_VERSION_MAJOR != param->ver_major) ||
(AUTOFS_DEV_IOCTL_VERSION_MINOR < param->ver_minor)) {
AUTOFS_WARN("ioctl control interface version mismatch: "
"kernel(%u.%u), user(%u.%u), cmd(%d)",
AUTOFS_DEV_IOCTL_VERSION_MAJOR,
AUTOFS_DEV_IOCTL_VERSION_MINOR,
param->ver_major, param->ver_minor, cmd);
err = -EINVAL;
}
/* Fill in the kernel version. */
param->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR;
param->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR;
return err;
}
/*
* Copy parameter control struct, including a possible path allocated
* at the end of the struct.
*/
static struct autofs_dev_ioctl *copy_dev_ioctl(struct autofs_dev_ioctl __user *in)
{
struct autofs_dev_ioctl tmp;
if (copy_from_user(&tmp, in, sizeof(tmp)))
return ERR_PTR(-EFAULT);
if (tmp.size < sizeof(tmp))
return ERR_PTR(-EINVAL);
return memdup_user(in, tmp.size);
}
static inline void free_dev_ioctl(struct autofs_dev_ioctl *param)
{
kfree(param);
return;
}
/*
* Check sanity of parameter control fields and if a path is present
* check that it is terminated and contains at least one "/".
*/
static int validate_dev_ioctl(int cmd, struct autofs_dev_ioctl *param)
{
int err;
err = check_dev_ioctl_version(cmd, param);
if (err) {
AUTOFS_WARN("invalid device control module version "
"supplied for cmd(0x%08x)", cmd);
goto out;
}
if (param->size > sizeof(*param)) {
err = invalid_str(param->path, param->size - sizeof(*param));
if (err) {
AUTOFS_WARN(
"path string terminator missing for cmd(0x%08x)",
cmd);
goto out;
}
err = check_name(param->path);
if (err) {
AUTOFS_WARN("invalid path supplied for cmd(0x%08x)",
cmd);
goto out;
}
}
err = 0;
out:
return err;
}
/*
* Get the autofs super block info struct from the file opened on
* the autofs mount point.
*/
static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f)
{
struct autofs_sb_info *sbi = NULL;
struct inode *inode;
if (f) {
inode = f->f_path.dentry->d_inode;
sbi = autofs4_sbi(inode->i_sb);
}
return sbi;
}
/* Return autofs module protocol version */
static int autofs_dev_ioctl_protover(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
param->protover.version = sbi->version;
return 0;
}
/* Return autofs module protocol sub version */
static int autofs_dev_ioctl_protosubver(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
param->protosubver.sub_version = sbi->sub_version;
return 0;
}
static int find_autofs_mount(const char *pathname,
struct path *res,
int test(struct path *path, void *data),
void *data)
{
struct path path;
int err = kern_path(pathname, 0, &path);
if (err)
return err;
err = -ENOENT;
while (path.dentry == path.mnt->mnt_root) {
if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) {
if (test(&path, data)) {
path_get(&path);
if (!err) /* already found some */
path_put(res);
*res = path;
err = 0;
}
}
if (!follow_up(&path))
break;
}
path_put(&path);
return err;
}
static int test_by_dev(struct path *path, void *p)
{
return path->dentry->d_sb->s_dev == *(dev_t *)p;
}
static int test_by_type(struct path *path, void *p)
{
struct autofs_info *ino = autofs4_dentry_ino(path->dentry);
return ino && ino->sbi->type & *(unsigned *)p;
}
static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file)
{
struct files_struct *files = current->files;
struct fdtable *fdt;
spin_lock(&files->file_lock);
fdt = files_fdtable(files);
BUG_ON(fdt->fd[fd] != NULL);
rcu_assign_pointer(fdt->fd[fd], file);
FD_SET(fd, fdt->close_on_exec);
spin_unlock(&files->file_lock);
}
/*
* Open a file descriptor on the autofs mount point corresponding
* to the given path and device number (aka. new_encode_dev(sb->s_dev)).
*/
static int autofs_dev_ioctl_open_mountpoint(const char *name, dev_t devid)
{
int err, fd;
fd = get_unused_fd();
if (likely(fd >= 0)) {
struct file *filp;
struct path path;
err = find_autofs_mount(name, &path, test_by_dev, &devid);
if (err)
goto out;
/*
* Find autofs super block that has the device number
* corresponding to the autofs fs we want to open.
*/
filp = dentry_open(path.dentry, path.mnt, O_RDONLY,
current_cred());
if (IS_ERR(filp)) {
err = PTR_ERR(filp);
goto out;
}
autofs_dev_ioctl_fd_install(fd, filp);
}
return fd;
out:
put_unused_fd(fd);
return err;
}
/* Open a file descriptor on an autofs mount point */
static int autofs_dev_ioctl_openmount(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
const char *path;
dev_t devid;
int err, fd;
/* param->path has already been checked */
if (!param->openmount.devid)
return -EINVAL;
param->ioctlfd = -1;
path = param->path;
devid = new_decode_dev(param->openmount.devid);
err = 0;
fd = autofs_dev_ioctl_open_mountpoint(path, devid);
if (unlikely(fd < 0)) {
err = fd;
goto out;
}
param->ioctlfd = fd;
out:
return err;
}
/* Close file descriptor allocated above (user can also use close(2)). */
static int autofs_dev_ioctl_closemount(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
return sys_close(param->ioctlfd);
}
/*
* Send "ready" status for an existing wait (either a mount or an expire
* request).
*/
static int autofs_dev_ioctl_ready(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
autofs_wqt_t token;
token = (autofs_wqt_t) param->ready.token;
return autofs4_wait_release(sbi, token, 0);
}
/*
* Send "fail" status for an existing wait (either a mount or an expire
* request).
*/
static int autofs_dev_ioctl_fail(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
autofs_wqt_t token;
int status;
token = (autofs_wqt_t) param->fail.token;
status = param->fail.status ? param->fail.status : -ENOENT;
return autofs4_wait_release(sbi, token, status);
}
/*
* Set the pipe fd for kernel communication to the daemon.
*
* Normally this is set at mount using an option but if we
* are reconnecting to a busy mount then we need to use this
* to tell the autofs mount about the new kernel pipe fd. In
* order to protect mounts against incorrectly setting the
* pipefd we also require that the autofs mount be catatonic.
*
* This also sets the process group id used to identify the
* controlling process (eg. the owning automount(8) daemon).
*/
static int autofs_dev_ioctl_setpipefd(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
int pipefd;
int err = 0;
if (param->setpipefd.pipefd == -1)
return -EINVAL;
pipefd = param->setpipefd.pipefd;
mutex_lock(&sbi->wq_mutex);
if (!sbi->catatonic) {
mutex_unlock(&sbi->wq_mutex);
return -EBUSY;
} else {
struct file *pipe = fget(pipefd);
if (!pipe) {
err = -EBADF;
goto out;
}
if (!pipe->f_op || !pipe->f_op->write) {
err = -EPIPE;
fput(pipe);
goto out;
}
sbi->oz_pgrp = task_pgrp_nr(current);
sbi->pipefd = pipefd;
sbi->pipe = pipe;
sbi->catatonic = 0;
}
out:
mutex_unlock(&sbi->wq_mutex);
return err;
}
/*
* Make the autofs mount point catatonic, no longer responsive to
* mount requests. Also closes the kernel pipe file descriptor.
*/
static int autofs_dev_ioctl_catatonic(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
autofs4_catatonic_mode(sbi);
return 0;
}
/* Set the autofs mount timeout */
static int autofs_dev_ioctl_timeout(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
unsigned long timeout;
timeout = param->timeout.timeout;
param->timeout.timeout = sbi->exp_timeout / HZ;
sbi->exp_timeout = timeout * HZ;
return 0;
}
/*
* Return the uid and gid of the last request for the mount
*
* When reconstructing an autofs mount tree with active mounts
* we need to re-connect to mounts that may have used the original
* process uid and gid (or string variations of them) for mount
* lookups within the map entry.
*/
static int autofs_dev_ioctl_requester(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
struct autofs_info *ino;
struct path path;
dev_t devid;
int err = -ENOENT;
if (param->size <= sizeof(*param)) {
err = -EINVAL;
goto out;
}
devid = sbi->sb->s_dev;
param->requester.uid = param->requester.gid = -1;
err = find_autofs_mount(param->path, &path, test_by_dev, &devid);
if (err)
goto out;
ino = autofs4_dentry_ino(path.dentry);
if (ino) {
err = 0;
autofs4_expire_wait(path.dentry);
spin_lock(&sbi->fs_lock);
param->requester.uid = ino->uid;
param->requester.gid = ino->gid;
spin_unlock(&sbi->fs_lock);
}
path_put(&path);
out:
return err;
}
/*
* Call repeatedly until it returns -EAGAIN, meaning there's nothing
* more that can be done.
*/
static int autofs_dev_ioctl_expire(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
struct vfsmount *mnt;
int how;
how = param->expire.how;
mnt = fp->f_path.mnt;
return autofs4_do_expire_multi(sbi->sb, mnt, sbi, how);
}
/* Check if autofs mount point is in use */
static int autofs_dev_ioctl_askumount(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
param->askumount.may_umount = 0;
if (may_umount(fp->f_path.mnt))
param->askumount.may_umount = 1;
return 0;
}
/*
* Check if the given path is a mountpoint.
*
* If we are supplied with the file descriptor of an autofs
* mount we're looking for a specific mount. In this case
* the path is considered a mountpoint if it is itself a
* mountpoint or contains a mount, such as a multi-mount
* without a root mount. In this case we return 1 if the
* path is a mount point and the super magic of the covering
* mount if there is one or 0 if it isn't a mountpoint.
*
* If we aren't supplied with a file descriptor then we
* lookup the nameidata of the path and check if it is the
* root of a mount. If a type is given we are looking for
* a particular autofs mount and if we don't find a match
* we return fail. If the located nameidata path is the
* root of a mount we return 1 along with the super magic
* of the mount or 0 otherwise.
*
* In both cases the the device number (as returned by
* new_encode_dev()) is also returned.
*/
static int autofs_dev_ioctl_ismountpoint(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
struct path path;
const char *name;
unsigned int type;
unsigned int devid, magic;
int err = -ENOENT;
if (param->size <= sizeof(*param)) {
err = -EINVAL;
goto out;
}
name = param->path;
type = param->ismountpoint.in.type;
param->ismountpoint.out.devid = devid = 0;
param->ismountpoint.out.magic = magic = 0;
if (!fp || param->ioctlfd == -1) {
if (autofs_type_any(type))
err = kern_path(name, LOOKUP_FOLLOW, &path);
else
err = find_autofs_mount(name, &path, test_by_type, &type);
if (err)
goto out;
devid = new_encode_dev(path.dentry->d_sb->s_dev);
err = 0;
if (path.mnt->mnt_root == path.dentry) {
err = 1;
magic = path.dentry->d_sb->s_magic;
}
} else {
dev_t dev = sbi->sb->s_dev;
err = find_autofs_mount(name, &path, test_by_dev, &dev);
if (err)
goto out;
devid = new_encode_dev(dev);
err = have_submounts(path.dentry);
Add a dentry op to allow processes to be held during pathwalk transit Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (*d_manage)(struct path *path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 01:45:26 +07:00
if (follow_down_one(&path))
magic = path.dentry->d_sb->s_magic;
}
param->ismountpoint.out.devid = devid;
param->ismountpoint.out.magic = magic;
path_put(&path);
out:
return err;
}
/*
* Our range of ioctl numbers isn't 0 based so we need to shift
* the array index by _IOC_NR(AUTOFS_CTL_IOC_FIRST) for the table
* lookup.
*/
#define cmd_idx(cmd) (cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST))
static ioctl_fn lookup_dev_ioctl(unsigned int cmd)
{
static struct {
int cmd;
ioctl_fn fn;
} _ioctls[] = {
{cmd_idx(AUTOFS_DEV_IOCTL_VERSION_CMD), NULL},
{cmd_idx(AUTOFS_DEV_IOCTL_PROTOVER_CMD),
autofs_dev_ioctl_protover},
{cmd_idx(AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD),
autofs_dev_ioctl_protosubver},
{cmd_idx(AUTOFS_DEV_IOCTL_OPENMOUNT_CMD),
autofs_dev_ioctl_openmount},
{cmd_idx(AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD),
autofs_dev_ioctl_closemount},
{cmd_idx(AUTOFS_DEV_IOCTL_READY_CMD),
autofs_dev_ioctl_ready},
{cmd_idx(AUTOFS_DEV_IOCTL_FAIL_CMD),
autofs_dev_ioctl_fail},
{cmd_idx(AUTOFS_DEV_IOCTL_SETPIPEFD_CMD),
autofs_dev_ioctl_setpipefd},
{cmd_idx(AUTOFS_DEV_IOCTL_CATATONIC_CMD),
autofs_dev_ioctl_catatonic},
{cmd_idx(AUTOFS_DEV_IOCTL_TIMEOUT_CMD),
autofs_dev_ioctl_timeout},
{cmd_idx(AUTOFS_DEV_IOCTL_REQUESTER_CMD),
autofs_dev_ioctl_requester},
{cmd_idx(AUTOFS_DEV_IOCTL_EXPIRE_CMD),
autofs_dev_ioctl_expire},
{cmd_idx(AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD),
autofs_dev_ioctl_askumount},
{cmd_idx(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD),
autofs_dev_ioctl_ismountpoint}
};
unsigned int idx = cmd_idx(cmd);
return (idx >= ARRAY_SIZE(_ioctls)) ? NULL : _ioctls[idx].fn;
}
/* ioctl dispatcher */
static int _autofs_dev_ioctl(unsigned int command, struct autofs_dev_ioctl __user *user)
{
struct autofs_dev_ioctl *param;
struct file *fp;
struct autofs_sb_info *sbi;
unsigned int cmd_first, cmd;
ioctl_fn fn = NULL;
int err = 0;
/* only root can play with this */
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
cmd_first = _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST);
cmd = _IOC_NR(command);
if (_IOC_TYPE(command) != _IOC_TYPE(AUTOFS_DEV_IOCTL_IOC_FIRST) ||
cmd - cmd_first >= AUTOFS_DEV_IOCTL_IOC_COUNT) {
return -ENOTTY;
}
/* Copy the parameters into kernel space. */
param = copy_dev_ioctl(user);
if (IS_ERR(param))
return PTR_ERR(param);
err = validate_dev_ioctl(command, param);
if (err)
goto out;
/* The validate routine above always sets the version */
if (cmd == AUTOFS_DEV_IOCTL_VERSION_CMD)
goto done;
fn = lookup_dev_ioctl(cmd);
if (!fn) {
AUTOFS_WARN("unknown command 0x%08x", command);
return -ENOTTY;
}
fp = NULL;
sbi = NULL;
/*
* For obvious reasons the openmount can't have a file
* descriptor yet. We don't take a reference to the
* file during close to allow for immediate release.
*/
if (cmd != AUTOFS_DEV_IOCTL_OPENMOUNT_CMD &&
cmd != AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD) {
fp = fget(param->ioctlfd);
if (!fp) {
if (cmd == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD)
goto cont;
err = -EBADF;
goto out;
}
if (!fp->f_op) {
err = -ENOTTY;
fput(fp);
goto out;
}
sbi = autofs_dev_ioctl_sbi(fp);
if (!sbi || sbi->magic != AUTOFS_SBI_MAGIC) {
err = -EINVAL;
fput(fp);
goto out;
}
/*
* Admin needs to be able to set the mount catatonic in
* order to be able to perform the re-open.
*/
if (!autofs4_oz_mode(sbi) &&
cmd != AUTOFS_DEV_IOCTL_CATATONIC_CMD) {
err = -EACCES;
fput(fp);
goto out;
}
}
cont:
err = fn(fp, sbi, param);
if (fp)
fput(fp);
done:
if (err >= 0 && copy_to_user(user, param, AUTOFS_DEV_IOCTL_SIZE))
err = -EFAULT;
out:
free_dev_ioctl(param);
return err;
}
static long autofs_dev_ioctl(struct file *file, uint command, ulong u)
{
int err;
err = _autofs_dev_ioctl(command, (struct autofs_dev_ioctl __user *) u);
return (long) err;
}
#ifdef CONFIG_COMPAT
static long autofs_dev_ioctl_compat(struct file *file, uint command, ulong u)
{
return (long) autofs_dev_ioctl(file, command, (ulong) compat_ptr(u));
}
#else
#define autofs_dev_ioctl_compat NULL
#endif
static const struct file_operations _dev_ioctl_fops = {
.unlocked_ioctl = autofs_dev_ioctl,
.compat_ioctl = autofs_dev_ioctl_compat,
.owner = THIS_MODULE,
llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
2010-08-15 23:52:59 +07:00
.llseek = noop_llseek,
};
static struct miscdevice _autofs_dev_ioctl_misc = {
driver core: add devname module aliases to allow module on-demand auto-loading This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-20 23:07:20 +07:00
.minor = AUTOFS_MINOR,
.name = AUTOFS_DEVICE_NAME,
.fops = &_dev_ioctl_fops
};
driver core: add devname module aliases to allow module on-demand auto-loading This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-20 23:07:20 +07:00
MODULE_ALIAS_MISCDEV(AUTOFS_MINOR);
MODULE_ALIAS("devname:autofs");
/* Register/deregister misc character device */
int autofs_dev_ioctl_init(void)
{
int r;
r = misc_register(&_autofs_dev_ioctl_misc);
if (r) {
AUTOFS_ERROR("misc_register failed for control device");
return r;
}
return 0;
}
void autofs_dev_ioctl_exit(void)
{
misc_deregister(&_autofs_dev_ioctl_misc);
return;
}