// SPDX-License-Identifier: GPL-2.0
/*
 *  inode.c - part of debugfs, a tiny little debug file system
 *
 *  Copyright (C) 2004,2019 Greg Kroah-Hartman <greg@kroah.com>
 *  Copyright (C) 2004 IBM Inc.
 *  Copyright (C) 2019 Linux Foundation <gregkh@linuxfoundation.org>
 *
 *  debugfs is for people to use instead of /proc or /sys.
 *  See ./Documentation/core-api/kernel-api.rst for more details.
 */

#define pr_fmt(fmt)	"debugfs: " fmt

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mount.h>
#include <linux/pagemap.h>
#include <linux/init.h>
#include <linux/kobject.h>
#include <linux/namei.h>
#include <linux/debugfs.h>
#include <linux/fsnotify.h>
#include <linux/string.h>
#include <linux/seq_file.h>
#include <linux/parser.h>
#include <linux/magic.h>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 15:04:11 +07:00
#include <linux/slab.h>

debugfs: prevent access to possibly dead file_operations at file open
Nothing prevents a dentry found by path lookup before a return of
__debugfs_remove() to actually get opened after that return. Now, after
the return of __debugfs_remove(), there are no guarantees whatsoever
regarding the memory the corresponding inode's file_operations object
had been kept in.
Since __debugfs_remove() is seldomly invoked, usually from module exit
handlers only, the race is hard to trigger and the impact is very low.
A discussion of the problem outlined above as well as a suggested
solution can be found in the (sub-)thread rooted at
http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk
("Yet another pipe related oops.")
Basically, Greg KH suggests to introduce an intermediate fops and
Al Viro points out that a pointer to the original ones may be stored in
->d_fsdata.
Follow this line of reasoning:
- Add SRCU as a reverse dependency of DEBUG_FS.
- Introduce a srcu_struct object for the debugfs subsystem.
- In debugfs_create_file(), store a pointer to the original
file_operations object in ->d_fsdata.
- Make debugfs_remove() and debugfs_remove_recursive() wait for a
SRCU grace period after the dentry has been delete()'d and before they
return to their callers.
- Introduce an intermediate file_operations object named
"debugfs_open_proxy_file_operations". Its ->open() function checks,
under the protection of a SRCU read lock, whether the dentry is still
alive, i.e. has not been d_delete()'d and if so, tries to acquire a
reference on the owning module.
On success, it sets the file object's ->f_op to the original
file_operations and forwards the ongoing open() call to the original
->open().
- For clarity, rename the former debugfs_file_operations to
debugfs_noop_file_operations -- they are in no way canonical.
The choice of SRCU over "normal" RCU is justified by the fact, that the
former may also be used to protect ->i_private data from going away
during the execution of a file's readers and writers which may (and do)
sleep.
Finally, introduce the fs/debugfs/internal.h header containing some
declarations internal to the debugfs implementation.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-22 20:11:13 +07:00
#include "internal.h"

#define DEBUGFS_DEFAULT_MODE	0700

static struct vfsmount *debugfs_mount;
static int debugfs_mount_count;
static bool debugfs_registered;

static struct inode *debugfs_get_inode(struct super_block *sb)
{
	struct inode *inode = new_inode(sb);
	if (inode) {
		inode->i_ino = get_next_ino();
		inode->i_atime = inode->i_mtime =
			inode->i_ctime = current_time(inode);
	}
	return inode;
}

struct debugfs_mount_opts {
	kuid_t uid;
	kgid_t gid;
	umode_t mode;
};

enum {
	Opt_uid,
	Opt_gid,
	Opt_mode,
	Opt_err
};

static const match_table_t tokens = {
	{Opt_uid, "uid=%u"},
	{Opt_gid, "gid=%u"},
	{Opt_mode, "mode=%o"},
	{Opt_err, NULL}
};

struct debugfs_fs_info {
	struct debugfs_mount_opts mount_opts;
};

static int debugfs_parse_options(char *data, struct debugfs_mount_opts *opts)
{
	substring_t args[MAX_OPT_ARGS];
	int option;
	int token;
	kuid_t uid;
	kgid_t gid;
	char *p;

	opts->mode = DEBUGFS_DEFAULT_MODE;

	while ((p = strsep(&data, ",")) != NULL) {
		if (!*p)
			continue;

		token = match_token(p, tokens, args);
		switch (token) {
		case Opt_uid:
			if (match_int(&args[0], &option))
				return -EINVAL;
			uid = make_kuid(current_user_ns(), option);
			if (!uid_valid(uid))
				return -EINVAL;
			opts->uid = uid;
			break;
		case Opt_gid:
			if (match_int(&args[0], &option))
				return -EINVAL;
			gid = make_kgid(current_user_ns(), option);
			if (!gid_valid(gid))
				return -EINVAL;
			opts->gid = gid;
			break;
		case Opt_mode:
			if (match_octal(&args[0], &option))
				return -EINVAL;
			opts->mode = option & S_IALLUGO;
			break;
		/*
		 * We might like to report bad mount options here;
		 * but traditionally debugfs has ignored all mount options
		 */
		}
	}

	return 0;
}
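
The option loop in debugfs_parse_options() can be tried outside the kernel. What follows is a minimal userspace sketch, not the kernel code: `parse_opts`, `parse_num`, `struct opts`, `DEFAULT_MODE` and `MODE_MASK` are hypothetical names, `strtoul()` stands in for `match_int()`/`match_octal()`, and plain integers stand in for `kuid_t`/`kgid_t`. Like the original, it resets the mode to its default on every call, skips empty tokens, and silently ignores options it does not recognise. Unlike the original, a malformed number after a known key is rejected rather than treated as an unknown option.

```c
#define _DEFAULT_SOURCE		/* for strsep() on glibc */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_MODE	0700u	/* mirrors DEBUGFS_DEFAULT_MODE */
#define MODE_MASK	07777u	/* same bits as the kernel's S_IALLUGO */

/* Hypothetical stand-in for struct debugfs_mount_opts. */
struct opts {
	unsigned int uid;
	unsigned int gid;
	unsigned int mode;
};

/* Parse an unsigned number in the given base; reject empty input or trailing junk. */
static int parse_num(const char *s, int base, unsigned int *out)
{
	char *end;
	unsigned long v;

	if (!*s)
		return -EINVAL;
	errno = 0;
	v = strtoul(s, &end, base);
	if (errno || *end != '\0')
		return -EINVAL;
	*out = (unsigned int)v;
	return 0;
}

/* Walk a comma-separated option string in place; strsep() modifies 'data'. */
static int parse_opts(char *data, struct opts *o)
{
	char *p;

	o->mode = DEFAULT_MODE;

	while ((p = strsep(&data, ",")) != NULL) {
		if (!*p)
			continue;	/* empty token, e.g. "a,,b" */
		if (!strncmp(p, "uid=", 4)) {
			if (parse_num(p + 4, 10, &o->uid))
				return -EINVAL;
		} else if (!strncmp(p, "gid=", 4)) {
			if (parse_num(p + 4, 10, &o->gid))
				return -EINVAL;
		} else if (!strncmp(p, "mode=", 5)) {
			unsigned int m;

			if (parse_num(p + 5, 8, &m))
				return -EINVAL;
			o->mode = m & MODE_MASK;
		}
		/* Anything else is ignored, as debugfs traditionally does. */
	}
	return 0;
}
```

Because strsep() writes NUL bytes into its input, the caller has to pass a writable buffer (an array, not a string literal).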

static int debugfs_apply_options(struct super_block *sb)
{
	struct debugfs_fs_info *fsi = sb->s_fs_info;
	struct inode *inode = d_inode(sb->s_root);
	struct debugfs_mount_opts *opts = &fsi->mount_opts;

	inode->i_mode &= ~S_IALLUGO;
	inode->i_mode |= opts->mode;

	inode->i_uid = opts->uid;
	inode->i_gid = opts->gid;

	return 0;
}

static int debugfs_remount(struct super_block *sb, int *flags, char *data)
{
	int err;
	struct debugfs_fs_info *fsi = sb->s_fs_info;

	sync_filesystem(sb);
	err = debugfs_parse_options(data, &fsi->mount_opts);
	if (err)
		goto fail;

	debugfs_apply_options(sb);

fail:
	return err;
}

static int debugfs_show_options(struct seq_file *m, struct dentry *root)
{
	struct debugfs_fs_info *fsi = root->d_sb->s_fs_info;
	struct debugfs_mount_opts *opts = &fsi->mount_opts;

	if (!uid_eq(opts->uid, GLOBAL_ROOT_UID))
		seq_printf(m, ",uid=%u",
			   from_kuid_munged(&init_user_ns, opts->uid));
	if (!gid_eq(opts->gid, GLOBAL_ROOT_GID))
		seq_printf(m, ",gid=%u",
			   from_kgid_munged(&init_user_ns, opts->gid));
	if (opts->mode != DEBUGFS_DEFAULT_MODE)
		seq_printf(m, ",mode=%o", opts->mode);

	return 0;
}

static void debugfs_free_inode(struct inode *inode)
{
	if (S_ISLNK(inode->i_mode))
		kfree(inode->i_link);
	free_inode_nonrcu(inode);
}

static const struct super_operations debugfs_super_operations = {
	.statfs		= simple_statfs,
	.remount_fs	= debugfs_remount,
	.show_options	= debugfs_show_options,
	.free_inode	= debugfs_free_inode,
};

debugfs: defer debugfs_fsdata allocation to first usage
Currently, __debugfs_create_file allocates one struct debugfs_fsdata
instance for every file created. However, there are potentially many
debugfs file around, most of which are never touched by userspace.
Thus, defer the allocations to the first usage, i.e. to the first
debugfs_file_get().
A dentry's ->d_fsdata starts out to point to the "real", user provided
fops. After a debugfs_fsdata instance has been allocated (and the real
fops pointer has been moved over into its ->real_fops member),
->d_fsdata is changed to point to it from then on. The two cases are
distinguished by setting BIT(0) for the real fops case.
struct debugfs_fsdata's foremost purpose is to track active users and to
make debugfs_remove() block until they are done. Since no debugfs_fsdata
instance means no active users, make debugfs_remove() return immediately
in this case.
Take care of possible races between debugfs_file_get() and
debugfs_remove(): either debugfs_remove() must see a debugfs_fsdata
instance and thus wait for possible active users or debugfs_file_get() must
see a dead dentry and return immediately.
Make a dentry's ->d_release(), i.e. debugfs_release_dentry(), check whether
->d_fsdata is actually a debugfs_fsdata instance before kfree()ing it.
Similarly, make debugfs_real_fops() check whether ->d_fsdata is actually
a debugfs_fsdata instance before returning it, otherwise emit a warning.
The set of possible error codes returned from debugfs_file_get() has grown
from -EIO to -EIO and -ENOMEM. Make open_proxy_open() and full_proxy_open()
pass the -ENOMEM onwards to their callers.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-31 06:15:54 +07:00

static void debugfs_release_dentry(struct dentry *dentry)
{
	void *fsd = dentry->d_fsdata;

	if (!((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT))
		kfree(dentry->d_fsdata);
}
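
The BIT(0) tagging described in the "defer debugfs_fsdata allocation" commit message works because a pointer to a suitably aligned object always has its lowest bit clear, so that bit is free to record which kind of object the pointer refers to. A minimal userspace sketch of the idea follows. The helper names (`tag_real_fops`, `is_real_fops`, `untag`, `release`) and the two struct types are hypothetical; only the choice of bit 0 mirrors `DEBUGFS_FSDATA_IS_REAL_FOPS_BIT`.

```c
#include <stdint.h>
#include <stdlib.h>

#define IS_REAL_FOPS_BIT ((uintptr_t)1)	/* bit 0 marks "points at real fops" */

/* Hypothetical stand-ins for struct file_operations and struct debugfs_fsdata. */
struct fops { int dummy; };
struct fsdata { const struct fops *real_fops; };

/* Store a fops pointer with bit 0 set; alignment guarantees the bit was 0. */
static void *tag_real_fops(const struct fops *f)
{
	return (void *)((uintptr_t)f | IS_REAL_FOPS_BIT);
}

/* Non-zero if the slot still holds a tagged fops pointer. */
static int is_real_fops(const void *d_fsdata)
{
	return ((uintptr_t)d_fsdata & IS_REAL_FOPS_BIT) != 0;
}

/* Recover the original fops pointer by clearing the tag bit. */
static const struct fops *untag(const void *d_fsdata)
{
	return (const struct fops *)((uintptr_t)d_fsdata & ~IS_REAL_FOPS_BIT);
}

/* Same decision as debugfs_release_dentry(): free only an untagged pointer. */
static int release(void *d_fsdata)
{
	if (!is_real_fops(d_fsdata)) {
		free(d_fsdata);	/* heap-allocated fsdata, or NULL */
		return 1;
	}
	return 0;		/* tagged fops pointer: nothing to free */
}
```

The tagged value must never be dereferenced directly; every reader has to test the bit first and strip it before use.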

static struct vfsmount *debugfs_automount(struct path *path)
{
	debugfs_automount_t f;
	f = (debugfs_automount_t)path->dentry->d_fsdata;
	return f(path->dentry, d_inode(path->dentry)->i_private);
}

static const struct dentry_operations debugfs_dops = {
	.d_delete = always_delete_dentry,
	.d_release = debugfs_release_dentry,
	.d_automount = debugfs_automount,
};

static int debug_fill_super(struct super_block *sb, void *data, int silent)
{
	static const struct tree_descr debug_files[] = {{""}};
	struct debugfs_fs_info *fsi;
	int err;

	fsi = kzalloc(sizeof(struct debugfs_fs_info), GFP_KERNEL);
	sb->s_fs_info = fsi;
	if (!fsi) {
		err = -ENOMEM;
		goto fail;
	}

	err = debugfs_parse_options(data, &fsi->mount_opts);
	if (err)
		goto fail;

	err = simple_fill_super(sb, DEBUGFS_MAGIC, debug_files);
	if (err)
		goto fail;

	sb->s_op = &debugfs_super_operations;
	sb->s_d_op = &debugfs_dops;

	debugfs_apply_options(sb);

	return 0;

fail:
	kfree(fsi);
	sb->s_fs_info = NULL;
	return err;
}

[PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 16:02:57 +07:00

static struct dentry *debug_mount(struct file_system_type *fs_type,
			int flags, const char *dev_name,
			void *data)
{
	return mount_single(fs_type, flags, data, debug_fill_super);
}

static struct file_system_type debug_fs_type = {
	.owner =	THIS_MODULE,
	.name =		"debugfs",
	.mount =	debug_mount,
	.kill_sb =	kill_litter_super,
};
MODULE_ALIAS_FS("debugfs");

/**
 * debugfs_lookup() - look up an existing debugfs file
 * @name: a pointer to a string containing the name of the file to look up.
 * @parent: a pointer to the parent dentry of the file.
 *
 * This function will return a pointer to a dentry if it succeeds. If the file
 * doesn't exist or an error occurs, %NULL will be returned. The returned
 * dentry must be passed to dput() when it is no longer needed.
 *
 * If debugfs is not enabled in the kernel, the value -%ENODEV will be
 * returned.
 */
struct dentry *debugfs_lookup(const char *name, struct dentry *parent)
{
	struct dentry *dentry;

	if (IS_ERR(parent))
		return NULL;

	if (!parent)
		parent = debugfs_mount->mnt_root;

	dentry = lookup_one_len_unlocked(name, parent, strlen(name));
	if (IS_ERR(dentry))
		return NULL;
	if (!d_really_is_positive(dentry)) {
		dput(dentry);
		return NULL;
	}
	return dentry;
}
EXPORT_SYMBOL_GPL(debugfs_lookup);

static struct dentry *start_creating(const char *name, struct dentry *parent)
{
	struct dentry *dentry;
	int error;

	pr_debug("creating file '%s'\n", name);

	if (IS_ERR(parent))
		return parent;

	error = simple_pin_fs(&debug_fs_type, &debugfs_mount,
			      &debugfs_mount_count);
	if (error) {
		pr_err("Unable to pin filesystem for file '%s'\n", name);
		return ERR_PTR(error);
	}

	/* If the parent is not specified, we create it in the root.
	 * We need the root dentry to do this, which is in the super
	 * block. A pointer to that is in the struct vfsmount that we
	 * have around.
	 */
	if (!parent)
		parent = debugfs_mount->mnt_root;

	inode_lock(d_inode(parent));
	dentry = lookup_one_len(name, parent, strlen(name));
	if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
		dput(dentry);
		pr_err("File '%s' already present!\n", name);
		dentry = ERR_PTR(-EEXIST);
	}

	if (IS_ERR(dentry)) {
		inode_unlock(d_inode(parent));
		simple_release_fs(&debugfs_mount, &debugfs_mount_count);
	}

	return dentry;
}

debugfs: return error values, not NULL
When an error happens, debugfs should return an error pointer value, not
NULL. This will prevent the totally theoretical error where a debugfs
call fails due to lack of memory, returning NULL, and that dentry value
is then passed to another debugfs call, which would end up succeeding,
creating a file at the root of the debugfs tree, but would then be
impossible to remove (because you can not remove the directory NULL).
So, to make everyone happy, always return errors, this makes the users
of debugfs much simpler (they do not have to ever check the return
value), and everyone can rest easy.
Reported-by: Gary R Hook <ghook@amd.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Michal Hocko <mhocko@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-23 17:28:14 +07:00

static struct dentry *failed_creating(struct dentry *dentry)
{
	inode_unlock(d_inode(dentry->d_parent));
	dput(dentry);
	simple_release_fs(&debugfs_mount, &debugfs_mount_count);
	return ERR_PTR(-ENOMEM);
}
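
The error-pointer convention that the "return error values, not NULL" commit adopts encodes a negative errno in the topmost addresses of the address space, so a single return value can carry either a valid pointer or an error. The helpers below are a userspace approximation of the kernel's `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()`, deliberately renamed in lower case so they are not mistaken for the kernel definitions. `struct node`, `create_dir` and `create_file` are hypothetical and exist only to show why a failed parent can be passed straight to the next call: the error propagates instead of the file landing in the root, which is what a NULL parent would cause.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* largest errno value reserved for error pointers */

static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
	/* Error pointers occupy the last MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a dentry: all we track is the parent link. */
struct node { struct node *parent; };

static struct node root_node;
static struct node dir_node;
static struct node file_node;

/* Pretend to create a directory under the root; optionally fail. */
static struct node *create_dir(int simulate_failure)
{
	if (simulate_failure)
		return err_ptr(-ENOMEM);
	dir_node.parent = &root_node;
	return &dir_node;
}

/* Same parent handling as start_creating(): errors pass through, NULL means root. */
static struct node *create_file(struct node *parent)
{
	if (is_err(parent))
		return parent;
	if (!parent)
		parent = &root_node;
	file_node.parent = parent;
	return &file_node;
}
```

With this convention a caller can write `create_file(create_dir(...))` and check only the final result, or not check at all if the file is purely diagnostic.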

static struct dentry *end_creating(struct dentry *dentry)
{
	inode_unlock(d_inode(dentry->d_parent));
	return dentry;
}

debugfs: prevent access to removed files' private data
Upon return of debugfs_remove()/debugfs_remove_recursive(), it might
still be attempted to access associated private file data through
previously opened struct file objects. If that data has been freed by
the caller of debugfs_remove*() in the meanwhile, the reading/writing
process would either encounter a fault or, if the memory address in
question has been reassigned again, unrelated data structures could get
overwritten.
However, since debugfs files are seldomly removed, usually from module
exit handlers only, the impact is very low.
Currently, there are ~1000 call sites of debugfs_create_file() spread
throughout the whole tree and touching all of those struct file_operations
in order to make them file removal aware by means of checking the result of
debugfs_use_file_start() from within their methods is unfeasible.
Instead, wrap the struct file_operations by a lifetime managing proxy at
file open:
- In debugfs_create_file(), the original fops handed in has got stashed
away in ->d_fsdata already.
- In debugfs_create_file(), install a proxy file_operations factory,
debugfs_full_proxy_file_operations, at ->i_fop.
This proxy factory has got an ->open() method only. It carries out some
lifetime checks and if successful, dynamically allocates and sets up a new
struct file_operations proxy at ->f_op. Afterwards, it forwards to the
->open() of the original struct file_operations in ->d_fsdata, if any.
The dynamically set up proxy at ->f_op has got a lifetime managing wrapper
set for each of the methods defined in the original struct file_operations
in ->d_fsdata.
Its ->release()er frees the proxy again and forwards to the original
->release(), if any.
In order not to mislead the VFS layer, it is strictly necessary to leave
those fields blank in the proxy that have been NULL in the original
struct file_operations also, i.e. aren't supported. This is why there is a
need for dynamically allocated proxies. The choice made not to allocate a
proxy instance for every dentry at file creation, but for every
struct file object instantiated thereof is justified by the expected usage
pattern of debugfs, namely that in general very few files get opened more
than once at a time.
The wrapper methods set in the struct file_operations implement lifetime
managing by means of the SRCU protection facilities already in place for
debugfs:
They set up a SRCU read side critical section and check whether the dentry
is still alive by means of debugfs_use_file_start(). If so, they forward
the call to the original struct file_operation stored in ->d_fsdata, still
under the protection of the SRCU read side critical section.
This SRCU read side critical section prevents any pending debugfs_remove()
and friends to return to their callers. Since a file's private data must
only be freed after the return of debugfs_remove(), the ongoing proxied
call is guarded against any file removal race.
If, on the other hand, the initial call to debugfs_use_file_start() detects
that the dentry is dead, the wrapper simply returns -EIO and does not
forward the call. Note that the ->poll() wrapper is special in that its
signature does not allow for the return of arbitrary -EXXX values and thus,
POLLHUP is returned here.
In order not to pollute debugfs with wrapper definitions that aren't ever
needed, I chose not to define a wrapper for every struct file_operations
method possible. Instead, a wrapper is defined only for the subset of
methods which are actually set by any debugfs users.
Currently, these are:
->llseek()
->read()
->write()
->unlocked_ioctl()
->poll()
The ->release() wrapper is special in that it does not protect the original
->release() in any way from dead files in order not to leak resources.
Thus, any ->release() handed to debugfs must implement file lifetime
management manually, if needed.
For only 33 out of a total of 434 releasers handed in to debugfs, it could
not be verified immediately whether they access data structures that might
have been freed upon a debugfs_remove() return in the meanwhile.
Export debugfs_use_file_start() and debugfs_use_file_finish() in order to
allow any ->release() to manually implement file lifetime management.
For a set of common cases of struct file_operations implemented by the
debugfs_core itself, future patches will incorporate file lifetime
management directly within those in order to allow for their unproxied
operation. Rename the original, non-proxying "debugfs_create_file()" to
"debugfs_create_file_unsafe()" and keep it for future internal use by
debugfs itself. Factor out code common to both into the new
__debugfs_create_file().
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-22 20:11:14 +07:00
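
The central constraint in the commit message above is that a proxy must leave a method NULL whenever the original leaves it NULL, which is why the proxy is built per open rather than shared. The following userspace sketch illustrates only that shape. Every name in it (`struct ops`, `struct file`, `proxy_open`, `proxy_release`, `real_read`) is hypothetical, and a plain `alive` flag stands in for the SRCU read-side section and the dentry liveness check that the real code performs.

```c
#include <errno.h>
#include <stdlib.h>

struct file;

/* Hypothetical two-method analogue of struct file_operations. */
struct ops {
	long (*read)(struct file *f);
	long (*write)(struct file *f);
};

struct file {
	const struct ops *real;	/* original methods handed in by the user */
	struct ops *proxy;	/* per-open proxy built by proxy_open() */
	int alive;		/* stand-in for "file has not been removed" */
};

/* Each wrapper checks liveness, then forwards to the original method. */
static long proxy_read(struct file *f)
{
	if (!f->alive)
		return -EIO;
	return f->real->read(f);
}

static long proxy_write(struct file *f)
{
	if (!f->alive)
		return -EIO;
	return f->real->write(f);
}

/* Build a proxy that wraps only the methods the original implements. */
static int proxy_open(struct file *f, const struct ops *real)
{
	struct ops *p = malloc(sizeof(*p));

	if (!p)
		return -ENOMEM;
	p->read = real->read ? proxy_read : NULL;
	p->write = real->write ? proxy_write : NULL;
	f->real = real;
	f->proxy = p;
	f->alive = 1;
	return 0;
}

/* Free the proxy; the real code also forwards to the original ->release(). */
static void proxy_release(struct file *f)
{
	free(f->proxy);
	f->proxy = NULL;
}

/* Example original method for a read-only file. */
static long real_read(struct file *f)
{
	(void)f;
	return 7;
}
```

A caller that inspects `proxy->write` to decide whether writing is supported sees NULL for a read-only original, exactly as it would without the proxy.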
|
|
|
static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
|
|
|
|
struct dentry *parent, void *data,
|
|
|
|
const struct file_operations *proxy_fops,
|
|
|
|
const struct file_operations *real_fops)
|
|
|
|
{
|
|
|
|
struct dentry *dentry;
|
|
|
|
struct inode *inode;
|
|
|
|
|
|
|
|
if (!(mode & S_IFMT))
|
|
|
|
mode |= S_IFREG;
|
|
|
|
BUG_ON(!S_ISREG(mode));
|
|
|
|
dentry = start_creating(name, parent);
|
|
|
|
|
debugfs: defer debugfs_fsdata allocation to first usage

Currently, __debugfs_create_file() allocates one struct debugfs_fsdata
instance for every file created. However, there are potentially many
debugfs files around, most of which are never touched by userspace.

Thus, defer the allocations to the first usage, i.e. to the first
debugfs_file_get().

A dentry's ->d_fsdata initially points to the "real", user-provided
fops. After a debugfs_fsdata instance has been allocated (and the real
fops pointer has been moved over into its ->real_fops member),
->d_fsdata is changed to point to it from then on. The two cases are
distinguished by setting BIT(0) for the real fops case.

struct debugfs_fsdata's foremost purpose is to track active users and to
make debugfs_remove() block until they are done. Since no debugfs_fsdata
instance means no active users, make debugfs_remove() return immediately
in this case.

Take care of possible races between debugfs_file_get() and
debugfs_remove(): either debugfs_remove() must see a debugfs_fsdata
instance and thus wait for possible active users, or debugfs_file_get()
must see a dead dentry and return immediately.

Make a dentry's ->d_release(), i.e. debugfs_release_dentry(), check whether
->d_fsdata is actually a debugfs_fsdata instance before kfree()ing it.

Similarly, make debugfs_real_fops() check whether ->d_fsdata is actually
a debugfs_fsdata instance before returning it, otherwise emit a warning.

The set of possible error codes returned from debugfs_file_get() has grown
from -EIO to -EIO and -ENOMEM. Make open_proxy_open() and full_proxy_open()
pass the -ENOMEM onwards to their callers.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-31 06:15:54 +07:00
	if (IS_ERR(dentry))
debugfs: return error values, not NULL

When an error happens, debugfs should return an error pointer value, not
NULL. This will prevent the totally theoretical error where a debugfs
call fails due to lack of memory, returning NULL, and that dentry value
is then passed to another debugfs call, which would end up succeeding,
creating a file at the root of the debugfs tree, but would then be
impossible to remove (because you cannot remove the directory NULL).

So, to make everyone happy, always return errors. This makes the users
of debugfs much simpler (they do not have to ever check the return
value), and everyone can rest easy.

Reported-by: Gary R Hook <ghook@amd.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Michal Hocko <mhocko@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-23 17:28:14 +07:00
		return dentry;
debugfs: prevent access to removed files' private data

Upon return of debugfs_remove()/debugfs_remove_recursive(), attempts might
still be made to access associated private file data through previously
opened struct file objects. If that data has been freed by the caller of
debugfs_remove*() in the meantime, the reading/writing process would
either encounter a fault or, if the memory address in question has been
reassigned again, unrelated data structures could get overwritten.

However, since debugfs files are seldom removed, usually from module
exit handlers only, the impact is very low.

Currently, there are ~1000 call sites of debugfs_create_file() spread
throughout the whole tree, and touching all of those struct file_operations
in order to make them file removal aware by means of checking the result of
debugfs_use_file_start() from within their methods is infeasible.

Instead, wrap the struct file_operations by a lifetime managing proxy at
file open:
- In debugfs_create_file(), the original fops handed in has already been
  stashed away in ->d_fsdata.
- In debugfs_create_file(), install a proxy file_operations factory,
  debugfs_full_proxy_file_operations, at ->i_fop.

This proxy factory has an ->open() method only. It carries out some
lifetime checks and, if successful, dynamically allocates and sets up a new
struct file_operations proxy at ->f_op. Afterwards, it forwards to the
->open() of the original struct file_operations in ->d_fsdata, if any.

The dynamically set up proxy at ->f_op has a lifetime managing wrapper
set for each of the methods defined in the original struct file_operations
in ->d_fsdata.

Its ->release()er frees the proxy again and forwards to the original
->release(), if any.

In order not to mislead the VFS layer, it is strictly necessary to leave
those fields blank in the proxy that are NULL in the original
struct file_operations, i.e. aren't supported. This is why there is a
need for dynamically allocated proxies. The choice made not to allocate a
proxy instance for every dentry at file creation, but for every
struct file object instantiated thereof is justified by the expected usage
pattern of debugfs, namely that in general very few files get opened more
than once at a time.

The wrapper methods set in the struct file_operations implement lifetime
management by means of the SRCU protection facilities already in place for
debugfs:
They set up an SRCU read side critical section and check whether the dentry
is still alive by means of debugfs_use_file_start(). If so, they forward
the call to the original struct file_operations stored in ->d_fsdata, still
under the protection of the SRCU read side critical section.
This SRCU read side critical section prevents any pending debugfs_remove()
and friends from returning to their callers. Since a file's private data
must only be freed after the return of debugfs_remove(), the ongoing proxied
call is guarded against any file removal race.

If, on the other hand, the initial call to debugfs_use_file_start() detects
that the dentry is dead, the wrapper simply returns -EIO and does not
forward the call. Note that the ->poll() wrapper is special in that its
signature does not allow for the return of arbitrary -EXXX values and thus,
POLLHUP is returned here.

In order not to pollute debugfs with wrapper definitions that aren't ever
needed, I chose not to define a wrapper for every struct file_operations
method possible. Instead, a wrapper is defined only for the subset of
methods which are actually set by any debugfs users.
Currently, these are:
  ->llseek()
  ->read()
  ->write()
  ->unlocked_ioctl()
  ->poll()

The ->release() wrapper is special in that it does not protect the original
->release() in any way from dead files in order not to leak resources.
Thus, any ->release() handed to debugfs must implement file lifetime
management manually, if needed.
For only 33 out of a total of 434 releasers handed in to debugfs, it could
not be verified immediately whether they access data structures that might
have been freed upon a debugfs_remove() return in the meantime.

Export debugfs_use_file_start() and debugfs_use_file_finish() in order to
allow any ->release() to manually implement file lifetime management.

For a set of common cases of struct file_operations implemented by the
debugfs core itself, future patches will incorporate file lifetime
management directly within those in order to allow for their unproxied
operation. Rename the original, non-proxying "debugfs_create_file()" to
"debugfs_create_file_unsafe()" and keep it for future internal use by
debugfs itself. Factor out code common to both into the new
__debugfs_create_file().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-22 20:11:14 +07:00
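The per-open proxy described in the commit message above can be sketched in userspace. This is a minimal illustration, not the kernel implementation: `struct fops`, `struct dentry`, `struct file`, `proxy_open()`, `proxy_release()`, `proxy_read()`, `proxy_write()` and `demo_read()` are hypothetical stand-ins, the `dead` flag replaces the real liveness check, and the SRCU read side critical section is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct file;

/* Hypothetical, cut-down stand-in for struct file_operations. */
struct fops {
	long (*read)(struct file *f);
	long (*write)(struct file *f);
};

struct dentry {
	bool dead;                    /* set once the file has been removed */
	const struct fops *real_fops; /* the fops handed in at creation */
};

struct file {
	struct dentry *dentry;
	struct fops *proxy;           /* per-open proxy, allocated at open */
};

/* Wrappers: refuse with -EIO if the dentry is dead, else forward. */
static long proxy_read(struct file *f)
{
	if (f->dentry->dead)
		return -EIO;
	return f->dentry->real_fops->read(f);
}

static long proxy_write(struct file *f)
{
	if (f->dentry->dead)
		return -EIO;
	return f->dentry->real_fops->write(f);
}

/* Build a proxy that only has wrappers for methods the real fops define. */
static int proxy_open(struct file *f)
{
	const struct fops *real = f->dentry->real_fops;
	struct fops *p;

	if (f->dentry->dead)
		return -EIO; /* placeholder error code chosen for this sketch */

	p = calloc(1, sizeof(*p)); /* all methods start out NULL */
	if (!p)
		return -ENOMEM;
	if (real->read)
		p->read = proxy_read;
	if (real->write)
		p->write = proxy_write;
	f->proxy = p;
	return 0;
}

/* Release always frees the proxy, even if the dentry is already dead. */
static void proxy_release(struct file *f)
{
	free(f->proxy);
	f->proxy = NULL;
}

/* Example "real" fops that only support read. */
static long demo_read(struct file *f)
{
	(void)f;
	return 42;
}

static const struct fops demo_fops = { .read = demo_read };
```

Because `demo_fops` has no `write` method, the proxy built by `proxy_open()` leaves `write` as NULL as well, which mirrors the requirement that unsupported methods stay blank in the proxy.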
	inode = debugfs_get_inode(dentry->d_sb);
	if (unlikely(!inode)) {
		pr_err("out of free dentries, can not create file '%s'\n",
		       name);
		return failed_creating(dentry);
	}
	inode->i_mode = mode;
	inode->i_private = data;

	inode->i_fop = proxy_fops;
	dentry->d_fsdata = (void *)((unsigned long)real_fops |
				DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
	d_instantiate(dentry, inode);
	fsnotify_create(d_inode(dentry->d_parent), dentry);
	return end_creating(dentry);
}
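The way __debugfs_create_file() tags ->d_fsdata with DEBUGFS_FSDATA_IS_REAL_FOPS_BIT can be illustrated in userspace. The following is a minimal sketch, not the kernel code: `struct fops`, `struct fsdata`, `IS_REAL_FOPS_BIT` and the helper functions are hypothetical names, and the sketch assumes both structures are at least 2-byte aligned so that bit 0 of their addresses is always clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct file_operations and struct debugfs_fsdata. */
struct fops {
	int (*open)(void);
};

struct fsdata {
	const struct fops *real_fops;
	int active_users;
};

/* Bit 0 set means: the pointer is still the raw, user-provided fops. */
#define IS_REAL_FOPS_BIT ((uintptr_t)1)

/* Tag a fops pointer, as done at file creation before any first use. */
static void *tag_real_fops(const struct fops *real_fops)
{
	/* Relies on the object being at least 2-byte aligned. */
	assert(((uintptr_t)real_fops & IS_REAL_FOPS_BIT) == 0);
	return (void *)((uintptr_t)real_fops | IS_REAL_FOPS_BIT);
}

/* True while no fsdata instance has been allocated yet. */
static bool is_real_fops(const void *d_fsdata)
{
	return ((uintptr_t)d_fsdata & IS_REAL_FOPS_BIT) != 0;
}

/* Strip the tag bit to recover the original fops pointer. */
static const struct fops *untag_real_fops(const void *d_fsdata)
{
	return (const struct fops *)((uintptr_t)d_fsdata & ~IS_REAL_FOPS_BIT);
}

/* Return the real fops regardless of which state d_fsdata is in. */
static const struct fops *real_fops_of(const void *d_fsdata)
{
	if (is_real_fops(d_fsdata))
		return untag_real_fops(d_fsdata);
	return ((const struct fsdata *)d_fsdata)->real_fops;
}
```

Storing the state in the low bit avoids a separate flag field, so a file that is never opened costs no extra allocation.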
/**
 * debugfs_create_file - create a file in the debugfs filesystem
 * @name: a pointer to a string containing the name of the file to create.
 * @mode: the permission that the file should have.
 * @parent: a pointer to the parent dentry for this file.  This should be a
 *          directory dentry if set.  If this parameter is NULL, then the
 *          file will be created in the root of the debugfs filesystem.
 * @data: a pointer to something that the caller will want to get to later
 *        on.  The inode.i_private pointer will point to this value on
 *        the open() call.
 * @fops: a pointer to a struct file_operations that should be used for
 *        this file.
 *
 * This is the basic "create a file" function for debugfs.  It allows for a
 * wide range of flexibility in creating a regular file (to create a
 * directory, use debugfs_create_dir() instead).
 *
 * This function will return a pointer to a dentry if it succeeds.  This
 * pointer must be passed to the debugfs_remove() function when the file is
 * to be removed (no automatic cleanup happens if your module is unloaded;
 * you are responsible here.)  If an error occurs, %ERR_PTR(-ERROR) will be
 * returned.
 *
 * If debugfs is not enabled in the kernel, %ERR_PTR(-ENODEV) will be
 * returned.
 */
struct dentry *debugfs_create_file(const char *name, umode_t mode,
				   struct dentry *parent, void *data,
				   const struct file_operations *fops)
{
	return __debugfs_create_file(name, mode, parent, data,
				fops ? &debugfs_full_proxy_file_operations :
					&debugfs_noop_file_operations,
				fops);
}
EXPORT_SYMBOL_GPL(debugfs_create_file);
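Since debugfs_create_file() reports failure through an error pointer rather than NULL, a caller that cares about the result has to test it with IS_ERR(), not with a NULL check. The sketch below models that convention in userspace; `err_ptr()`, `is_err()` and `ptr_err()` are hypothetical lowercase re-implementations of the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(), and `fake_create_file()` is an invented stand-in that fails with -ENODEV when its `fs_enabled` argument is false.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes occupy the top MAX_ERRNO values of the address space. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer value lies in the reserved error range. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value stored in an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical, minimal stand-in for struct dentry. */
struct dentry {
	const char *name;
};

/* Invented creator: returns an error pointer instead of NULL on failure. */
static struct dentry *fake_create_file(const char *name, bool fs_enabled)
{
	static struct dentry the_dentry;

	if (!fs_enabled)
		return err_ptr(-ENODEV);
	the_dentry.name = name;
	return &the_dentry;
}
```

Note that `is_err(NULL)` is false in this model, which is why a plain `if (!dentry)` test would miss every failure reported this way.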
|
debugfs: prevent access to possibly dead file_operations at file open
Nothing prevents a dentry found by path lookup before a return of
__debugfs_remove() to actually get opened after that return. Now, after
the return of __debugfs_remove(), there are no guarantees whatsoever
regarding the memory the corresponding inode's file_operations object
had been kept in.
Since __debugfs_remove() is seldomly invoked, usually from module exit
handlers only, the race is hard to trigger and the impact is very low.
A discussion of the problem outlined above as well as a suggested
solution can be found in the (sub-)thread rooted at
http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk
("Yet another pipe related oops.")
Basically, Greg KH suggests introducing an intermediate fops and
Al Viro points out that a pointer to the original ones may be stored in
->d_fsdata.
Follow this line of reasoning:
- Add SRCU as a reverse dependency of DEBUG_FS.
- Introduce a srcu_struct object for the debugfs subsystem.
- In debugfs_create_file(), store a pointer to the original
file_operations object in ->d_fsdata.
- Make debugfs_remove() and debugfs_remove_recursive() wait for an
SRCU grace period after the dentry has been d_delete()'d and before they
return to their callers.
- Introduce an intermediate file_operations object named
"debugfs_open_proxy_file_operations". Its ->open() function checks,
under the protection of an SRCU read lock, whether the dentry is still
alive, i.e. has not been d_delete()'d and if so, tries to acquire a
reference on the owning module.
On success, it sets the file object's ->f_op to the original
file_operations and forwards the ongoing open() call to the original
->open().
- For clarity, rename the former debugfs_file_operations to
debugfs_noop_file_operations -- they are in no way canonical.
The choice of SRCU over "normal" RCU is justified by the fact that the
former may also be used to protect ->i_private data from going away
during the execution of a file's readers and writers which may (and do)
sleep.
Finally, introduce the fs/debugfs/internal.h header containing some
declarations internal to the debugfs implementation.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-22 20:11:13 +07:00
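The open-time handover described in the commit message above can be modeled in plain userspace C. This is only a sketch: every `model_*` name is hypothetical, the `removed` flag stands in for the "dentry has been d_delete()'d" check, the SRCU read lock and the module reference are left out, and the -ENOENT error code is the model's own choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct model_file;

struct model_fops {
	int (*open)(struct model_file *f);
};

struct model_dentry {
	bool removed;                       /* set once the file was removed */
	const struct model_fops *real_fops; /* plays the role of ->d_fsdata */
};

struct model_file {
	struct model_dentry *dentry;
	const struct model_fops *f_op;
	int opened;
};

/* Proxy ->open(): verify liveness first, then hand over to the real fops. */
static int model_open_proxy(struct model_file *f)
{
	const struct model_fops *real = f->dentry->real_fops;

	/* The kernel performs this check inside an SRCU read-side section. */
	if (f->dentry->removed)
		return -ENOENT;

	f->f_op = real;                     /* later calls bypass the proxy */
	return real->open ? real->open(f) : 0;
}

static const struct model_fops model_proxy_fops = { .open = model_open_proxy };

static int model_real_open(struct model_file *f)
{
	f->opened = 1;
	return 0;
}

static const struct model_fops model_real_fops = { .open = model_real_open };

/* Opens one live and one removed file through the proxy. */
void model_open_proxy_demo(void)
{
	struct model_dentry live = { false, &model_real_fops };
	struct model_dentry dead = { true, &model_real_fops };
	struct model_file a = { &live, &model_proxy_fops, 0 };
	struct model_file b = { &dead, &model_proxy_fops, 0 };

	/* Live file: the open succeeds and ->f_op now points at the real fops. */
	assert(a.f_op->open(&a) == 0 && a.f_op == &model_real_fops && a.opened);
	/* Removed file: the open is refused and the real fops are never touched. */
	assert(b.f_op->open(&b) == -ENOENT && b.f_op == &model_proxy_fops);
}
```

The point of the design is that the proxy is the only file_operations object reachable from the inode, so a stale original can never be called for a file that was opened after removal.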

/**
 * debugfs_create_file_unsafe - create a file in the debugfs filesystem
 * @name: a pointer to a string containing the name of the file to create.
 * @mode: the permission that the file should have.
 * @parent: a pointer to the parent dentry for this file.  This should be a
 *          directory dentry if set.  If this parameter is NULL, then the
 *          file will be created in the root of the debugfs filesystem.
 * @data: a pointer to something that the caller will want to get to later
 *        on.  The inode.i_private pointer will point to this value on
 *        the open() call.
 * @fops: a pointer to a struct file_operations that should be used for
 *        this file.
 *
 * debugfs_create_file_unsafe() is completely analogous to
 * debugfs_create_file(), the only difference being that the fops
 * handed to it will not get protected against file removals by the
 * debugfs core.
 *
 * It is your responsibility to protect your struct file_operations
 * methods against file removals by means of debugfs_file_get()
 * and debugfs_file_put(). ->open() is still protected by
 * debugfs though.
 *
 * Any struct file_operations defined by means of
 * DEFINE_DEBUGFS_ATTRIBUTE() is protected against file removals and
 * thus, may be used here.
 */
debugfs: prevent access to removed files' private data
Upon return of debugfs_remove()/debugfs_remove_recursive(), it might
still be attempted to access associated private file data through
previously opened struct file objects. If that data has been freed by
the caller of debugfs_remove*() in the meanwhile, the reading/writing
process would either encounter a fault or, if the memory address in
question has been reassigned again, unrelated data structures could get
overwritten.
However, since debugfs files are seldom removed, usually from module
exit handlers only, the impact is very low.
Currently, there are ~1000 call sites of debugfs_create_file() spread
throughout the whole tree and touching all of those struct file_operations
in order to make them file removal aware by means of checking the result of
debugfs_use_file_start() from within their methods is unfeasible.
Instead, wrap the struct file_operations by a lifetime managing proxy at
file open:
- In debugfs_create_file(), the original fops handed in has got stashed
away in ->d_fsdata already.
- In debugfs_create_file(), install a proxy file_operations factory,
debugfs_full_proxy_file_operations, at ->i_fop.
This proxy factory has got an ->open() method only. It carries out some
lifetime checks and if successful, dynamically allocates and sets up a new
struct file_operations proxy at ->f_op. Afterwards, it forwards to the
->open() of the original struct file_operations in ->d_fsdata, if any.
The dynamically set up proxy at ->f_op has got a lifetime managing wrapper
set for each of the methods defined in the original struct file_operations
in ->d_fsdata.
Its ->release()er frees the proxy again and forwards to the original
->release(), if any.
In order not to mislead the VFS layer, it is strictly necessary to leave
those fields blank in the proxy that have been NULL in the original
struct file_operations also, i.e. aren't supported. This is why there is a
need for dynamically allocated proxies. The choice made not to allocate a
proxy instance for every dentry at file creation, but for every
struct file object instantiated thereof is justified by the expected usage
pattern of debugfs, namely that in general very few files get opened more
than once at a time.
The wrapper methods set in the struct file_operations implement lifetime
managing by means of the SRCU protection facilities already in place for
debugfs:
They set up an SRCU read side critical section and check whether the dentry
is still alive by means of debugfs_use_file_start(). If so, they forward
the call to the original struct file_operations stored in ->d_fsdata, still
under the protection of the SRCU read side critical section.
This SRCU read side critical section prevents any pending debugfs_remove()
and friends from returning to their callers. Since a file's private data must
only be freed after the return of debugfs_remove(), the ongoing proxied
call is guarded against any file removal race.
If, on the other hand, the initial call to debugfs_use_file_start() detects
that the dentry is dead, the wrapper simply returns -EIO and does not
forward the call. Note that the ->poll() wrapper is special in that its
signature does not allow for the return of arbitrary -EXXX values and thus,
POLLHUP is returned here.
In order not to pollute debugfs with wrapper definitions that aren't ever
needed, I chose not to define a wrapper for every struct file_operations
method possible. Instead, a wrapper is defined only for the subset of
methods which are actually set by any debugfs users.
Currently, these are:
->llseek()
->read()
->write()
->unlocked_ioctl()
->poll()
The ->release() wrapper is special in that it does not protect the original
->release() in any way from dead files in order not to leak resources.
Thus, any ->release() handed to debugfs must implement file lifetime
management manually, if needed.
For only 33 out of a total of 434 releasers handed in to debugfs, it could
not be verified immediately whether they access data structures that might
have been freed upon a debugfs_remove() return in the meanwhile.
Export debugfs_use_file_start() and debugfs_use_file_finish() in order to
allow any ->release() to manually implement file lifetime management.
For a set of common cases of struct file_operations implemented by the
debugfs_core itself, future patches will incorporate file lifetime
management directly within those in order to allow for their unproxied
operation. Rename the original, non-proxying "debugfs_create_file()" to
"debugfs_create_file_unsafe()" and keep it for future internal use by
debugfs itself. Factor out code common to both into the new
__debugfs_create_file().
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-22 20:11:14 +07:00
struct dentry *debugfs_create_file_unsafe(const char *name, umode_t mode,
				   struct dentry *parent, void *data,
				   const struct file_operations *fops)
{
	return __debugfs_create_file(name, mode, parent, data,
				fops ? &debugfs_open_proxy_file_operations :
					&debugfs_noop_file_operations,
				fops);
}
EXPORT_SYMBOL_GPL(debugfs_create_file_unsafe);
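In contrast to the open-only proxy used by debugfs_create_file_unsafe(), the full proxy that debugfs_create_file() installs wraps each method of the original file_operations. The following userspace sketch models that idea under simplifying assumptions: all `model_*` names are hypothetical, only two methods exist, and a plain `removed` flag replaces the SRCU-protected liveness check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_file;

/* Hypothetical, reduced file_operations with just two methods. */
struct model_fops {
	long (*read)(struct model_file *f);
	long (*write)(struct model_file *f, long value);
};

struct model_file {
	bool removed;                  /* stands in for "dentry is dead" */
	const struct model_fops *real; /* original fops, as kept in ->d_fsdata */
	struct model_fops *proxy;      /* per-open proxy, as installed at ->f_op */
	long value;
};

/* Each wrapper refuses to forward once the file has been removed. */
static long model_proxy_read(struct model_file *f)
{
	if (f->removed)
		return -EIO;
	return f->real->read(f);
}

static long model_proxy_write(struct model_file *f, long value)
{
	if (f->removed)
		return -EIO;
	return f->real->write(f, value);
}

/* Allocate a proxy and wrap only the methods the original provides. */
static int model_full_proxy_open(struct model_file *f)
{
	struct model_fops *proxy = calloc(1, sizeof(*proxy));

	if (!proxy)
		return -ENOMEM;
	if (f->real->read)
		proxy->read = model_proxy_read;
	if (f->real->write)
		proxy->write = model_proxy_write;
	f->proxy = proxy;
	return 0;
}

/* Release always runs, even for a removed file, so nothing leaks. */
static void model_full_proxy_release(struct model_file *f)
{
	free(f->proxy);
	f->proxy = NULL;
}

static long model_real_read(struct model_file *f)
{
	return f->value;
}

static const struct model_fops model_read_only_fops = { .read = model_real_read };

void model_full_proxy_demo(void)
{
	struct model_file f = { false, &model_read_only_fops, NULL, 42 };

	assert(model_full_proxy_open(&f) == 0);
	/* Unsupported methods stay NULL so callers are not misled. */
	assert(f.proxy->read != NULL && f.proxy->write == NULL);
	assert(f.proxy->read(&f) == 42);

	f.removed = true;
	assert(f.proxy->read(&f) == -EIO);

	model_full_proxy_release(&f);
	assert(f.proxy == NULL);
}
```

Allocating the proxy per open, rather than per dentry, mirrors the trade-off stated in the commit message: most debugfs files are opened at most once at a time.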

/**
 * debugfs_create_file_size - create a file in the debugfs filesystem
 * @name: a pointer to a string containing the name of the file to create.
 * @mode: the permission that the file should have.
 * @parent: a pointer to the parent dentry for this file.  This should be a
 *          directory dentry if set.  If this parameter is NULL, then the
 *          file will be created in the root of the debugfs filesystem.
 * @data: a pointer to something that the caller will want to get to later
 *        on.  The inode.i_private pointer will point to this value on
 *        the open() call.
 * @fops: a pointer to a struct file_operations that should be used for
 *        this file.
 * @file_size: initial file size
 *
 * This is the basic "create a file" function for debugfs.  It allows for a
 * wide range of flexibility in creating a file, or a directory (if you want
 * to create a directory, the debugfs_create_dir() function is
 * recommended to be used instead.)
 *
 * This function will return a pointer to a dentry if it succeeds.  This
 * pointer must be passed to the debugfs_remove() function when the file is
 * to be removed (no automatic cleanup happens if your module is unloaded,
debugfs: return error values, not NULL
When an error happens, debugfs should return an error pointer value, not
NULL. This will prevent the totally theoretical error where a debugfs
call fails due to lack of memory, returning NULL, and that dentry value
is then passed to another debugfs call, which would end up succeeding,
creating a file at the root of the debugfs tree, but would then be
impossible to remove (because you can not remove the directory NULL).
So, to make everyone happy, always return errors, this makes the users
of debugfs much simpler (they do not have to ever check the return
value), and everyone can rest easy.
Reported-by: Gary R Hook <ghook@amd.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Michal Hocko <mhocko@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-23 17:28:14 +07:00
 * you are responsible here.)  If an error occurs, %ERR_PTR(-ERROR) will be
 * returned.
 *
 * If debugfs is not enabled in the kernel, the value -%ENODEV will be
 * returned.
 */
struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
					struct dentry *parent, void *data,
					const struct file_operations *fops,
					loff_t file_size)
{
	struct dentry *de = debugfs_create_file(name, mode, parent, data, fops);

	/* debugfs_create_file() reports failure as an ERR_PTR(), never NULL */
	if (!IS_ERR(de))
		d_inode(de)->i_size = file_size;
	return de;
}
EXPORT_SYMBOL_GPL(debugfs_create_file_size);
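Because failures are reported as %ERR_PTR(-ERROR) rather than NULL, a plain NULL test cannot tell an error value from a valid dentry. The userspace sketch below re-creates the error-pointer encoding to show this. The `model_*` helpers are hypothetical re-implementations modeled on the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and `model_create()` is a made-up stand-in for a debugfs creation call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in the top 4095 values of the address space. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_dentry {
	long long size;
};

/* Hypothetical creation call: returns a dentry or an encoded error. */
static struct model_dentry *model_create(int fail)
{
	static struct model_dentry dentry;

	return fail ? model_err_ptr(-ENOMEM) : &dentry;
}

/* Only touch the result after ruling out an error pointer. */
static struct model_dentry *model_create_size(int fail, long long size)
{
	struct model_dentry *de = model_create(fail);

	if (!model_is_err(de))
		de->size = size;
	return de;
}

void model_err_ptr_demo(void)
{
	struct model_dentry *ok = model_create_size(0, 128);
	struct model_dentry *bad = model_create_size(1, 128);

	assert(!model_is_err(ok) && ok->size == 128);
	/* The error value is non-NULL, so "if (bad)" would wrongly pass. */
	assert(bad != NULL && model_is_err(bad));
	assert(model_ptr_err(bad) == -ENOMEM);
	/* NULL itself is not an error pointer. */
	assert(!model_is_err(NULL));
}
```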
/**
 * debugfs_create_dir - create a directory in the debugfs filesystem
 * @name: a pointer to a string containing the name of the directory to
 *        create.
 * @parent: a pointer to the parent dentry for this file.  This should be a
 *          directory dentry if set.  If this parameter is NULL, then the
 *          directory will be created in the root of the debugfs filesystem.
 *
 * This function creates a directory in debugfs with the given name.
 *
 * This function will return a pointer to a dentry if it succeeds.  This
 * pointer must be passed to the debugfs_remove() function when the file is
 * to be removed (no automatic cleanup happens if your module is unloaded,
 * you are responsible here.)  If an error occurs, %ERR_PTR(-ERROR) will be
 * returned.
 *
 * If debugfs is not enabled in the kernel, the value -%ENODEV will be
 * returned.
 */
struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
{
	struct dentry *dentry = start_creating(name, parent);
	struct inode *inode;

	if (IS_ERR(dentry))
		return dentry;

	inode = debugfs_get_inode(dentry->d_sb);
	if (unlikely(!inode)) {
		pr_err("out of free dentries, can not create directory '%s'\n",
		       name);
		return failed_creating(dentry);
	}

	inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
	inode->i_op = &simple_dir_inode_operations;
	inode->i_fop = &simple_dir_operations;

	/* directory inodes start off with i_nlink == 2 (for "." entry) */
	inc_nlink(inode);
	d_instantiate(dentry, inode);
	inc_nlink(d_inode(dentry->d_parent));
	fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
	return end_creating(dentry);
}
EXPORT_SYMBOL_GPL(debugfs_create_dir);
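The two inc_nlink() calls in debugfs_create_dir() follow the usual directory link-count convention. A small userspace model may make the bookkeeping easier to follow. It assumes that a freshly allocated inode starts with a link count of 1, and all `model_*` names are hypothetical.

```c
#include <assert.h>

struct model_inode {
	unsigned int nlink;
};

/* Assumption of this model: a fresh inode starts with one link. */
static void model_new_inode(struct model_inode *inode)
{
	inode->nlink = 1;
}

/* Mirrors the accounting performed when a directory is created. */
static void model_mkdir(struct model_inode *parent, struct model_inode *child)
{
	model_new_inode(child);
	child->nlink++;   /* the directory's own "." entry: now 2 */
	parent->nlink++;  /* the child's ".." entry points at the parent */
}

void model_nlink_demo(void)
{
	struct model_inode root = { 2 };  /* an empty directory has nlink == 2 */
	struct model_inode a, b;

	model_mkdir(&root, &a);
	assert(a.nlink == 2 && root.nlink == 3);

	model_mkdir(&root, &b);
	assert(b.nlink == 2 && root.nlink == 4);
}
```

In this model a directory's link count is always two plus its number of subdirectories.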
/**
 * debugfs_create_automount - create automount point in the debugfs filesystem
 * @name: a pointer to a string containing the name of the file to create.
 * @parent: a pointer to the parent dentry for this file.  This should be a
 *          directory dentry if set.  If this parameter is NULL, then the
 *          file will be created in the root of the debugfs filesystem.
 * @f: function to be called when pathname resolution steps on that one.
 * @data: opaque argument to pass to f().
 *
 * @f should return what ->d_automount() would.
 */
struct dentry *debugfs_create_automount(const char *name,
					struct dentry *parent,
					debugfs_automount_t f,
					void *data)
{
	struct dentry *dentry = start_creating(name, parent);
	struct inode *inode;

	if (IS_ERR(dentry))
		return dentry;

	inode = debugfs_get_inode(dentry->d_sb);
	if (unlikely(!inode)) {
		pr_err("out of free dentries, can not create automount '%s'\n",
		       name);
		return failed_creating(dentry);
	}

	make_empty_dir_inode(inode);
	inode->i_flags |= S_AUTOMOUNT;
	inode->i_private = data;
	dentry->d_fsdata = (void *)f;
	/* directory inodes start off with i_nlink == 2 (for "." entry) */
	inc_nlink(inode);
	d_instantiate(dentry, inode);
	inc_nlink(d_inode(dentry->d_parent));
	fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
	return end_creating(dentry);
}
EXPORT_SYMBOL(debugfs_create_automount);

/**
 * debugfs_create_symlink - create a symbolic link in the debugfs filesystem
 * @name: a pointer to a string containing the name of the symbolic link to
 *        create.
 * @parent: a pointer to the parent dentry for this symbolic link.  This
 *          should be a directory dentry if set.  If this parameter is NULL,
 *          then the symbolic link will be created in the root of the debugfs
 *          filesystem.
 * @target: a pointer to a string containing the path to the target of the
 *          symbolic link.
 *
 * This function creates a symbolic link with the given name in debugfs that
 * links to the given target path.
 *
 * This function will return a pointer to a dentry if it succeeds.  This
 * pointer must be passed to the debugfs_remove() function when the symbolic
 * link is to be removed (no automatic cleanup happens if your module is
 * unloaded, you are responsible here.)  If an error occurs, %ERR_PTR(-ERROR)
 * will be returned.
 *
 * If debugfs is not enabled in the kernel, the value -%ENODEV will be
 * returned.
 */
struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
				      const char *target)
{
	struct dentry *dentry;
	struct inode *inode;
	char *link = kstrdup(target, GFP_KERNEL);
	if (!link)
		return ERR_PTR(-ENOMEM);

	dentry = start_creating(name, parent);
	if (IS_ERR(dentry)) {
		kfree(link);
return dentry;
|
2015-01-26 02:02:31 +07:00
|
|
|
}
|
|
|
|
|
2015-01-26 02:36:18 +07:00
|
|
|
inode = debugfs_get_inode(dentry->d_sb);
|
2015-01-26 02:31:32 +07:00
|
|
|
if (unlikely(!inode)) {
|
2019-07-03 14:16:53 +07:00
|
|
|
pr_err("out of free dentries, can not create symlink '%s'\n",
|
|
|
|
name);
|
2015-01-26 02:02:31 +07:00
|
|
|
kfree(link);
|
2015-01-26 02:39:49 +07:00
|
|
|
return failed_creating(dentry);
|
2015-01-26 02:31:32 +07:00
|
|
|
}
|
2015-01-26 02:36:18 +07:00
|
|
|
inode->i_mode = S_IFLNK | S_IRWXUGO;
|
2015-05-02 21:27:18 +07:00
|
|
|
inode->i_op = &simple_symlink_inode_operations;
|
|
|
|
inode->i_link = link;
|
2015-01-26 02:31:32 +07:00
|
|
|
d_instantiate(dentry, inode);
|
2015-01-26 02:39:49 +07:00
|
|
|
return end_creating(dentry);
|
2007-02-13 18:13:54 +07:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(debugfs_create_symlink);
|
|
|
|
|
debugfs: implement per-file removal protection
Since commit 49d200deaa68 ("debugfs: prevent access to removed files'
private data"), accesses to a file's private data are protected from
concurrent removal by covering all file_operations with a SRCU read section
and synchronizing with those before returning from debugfs_remove() by means
of synchronize_srcu().
As pointed out by Johannes Berg, there are debugfs files with forever
blocking file_operations. Their corresponding SRCU read side sections would
block any debugfs_remove() forever as well, even unrelated ones. This
results in a livelock. Because a remover can't cancel any indefinite
blocking within foreign files, this is a problem.
Resolve this by introducing support for more granular protection on a
per-file basis.
This is implemented by introducing an 'active_users' refcount_t to the
per-file struct debugfs_fsdata state. At file creation time, it is set to
one and a debugfs_remove() will drop that initial reference. The new
debugfs_file_get() and debugfs_file_put(), intended to be used in place of
former debugfs_use_file_start() and debugfs_use_file_finish(), increment
and decrement it respectively. Once the count drops to zero,
debugfs_file_put() will signal a completion which is possibly being waited
for from debugfs_remove().
Thus, as long as there is a debugfs_file_get() not yet matched by a
corresponding debugfs_file_put() around, debugfs_remove() will block.
Actual users of debugfs_use_file_start() and -finish() will get converted
to the new debugfs_file_get() and debugfs_file_put() by followup patches.
Fixes: 49d200deaa68 ("debugfs: prevent access to removed files' private data")
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-31 06:15:48 +07:00
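The active_users scheme described in the commit message above can be sketched in userspace as follows. This is a simplified model under stated assumptions: C11 atomics stand in for refcount_t, a boolean flag stands in for the kernel completion, and every name (fsdata_model, file_get, file_put, remove_begin) is hypothetical rather than the kernel's own API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fsdata_model {
	atomic_int active_users;	/* starts at 1: the creation-time reference */
	atomic_bool drained;		/* stand-in for the kernel completion */
};

static void fsdata_init(struct fsdata_model *fsd)
{
	atomic_init(&fsd->active_users, 1);
	atomic_init(&fsd->drained, false);
}

/* Take a reference unless the count has already dropped to zero. */
static bool file_get(struct fsdata_model *fsd)
{
	int old = atomic_load(&fsd->active_users);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&fsd->active_users, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last one out signals the waiting remover. */
static void file_put(struct fsdata_model *fsd)
{
	if (atomic_fetch_sub(&fsd->active_users, 1) == 1)
		atomic_store(&fsd->drained, true);
}

/*
 * Drop the initial reference on behalf of the remover.
 * Returns true if users are still active, i.e. the remover would
 * have to wait for 'drained' before it may return.
 */
static bool remove_begin(struct fsdata_model *fsd)
{
	return atomic_fetch_sub(&fsd->active_users, 1) != 1;
}
```

Only the file being removed is affected: a user blocked in some other file holds a reference on that file's own counter and so can not stall this remover.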
|
|
|
static void __debugfs_remove_file(struct dentry *dentry, struct dentry *parent)
|
|
|
|
{
|
|
|
|
struct debugfs_fsdata *fsd;
|
|
|
|
|
|
|
|
simple_unlink(d_inode(parent), dentry);
|
|
|
|
d_delete(dentry);
|
debugfs: defer debugfs_fsdata allocation to first usage
Currently, __debugfs_create_file allocates one struct debugfs_fsdata
instance for every file created. However, there are potentially many
debugfs files around, most of which are never touched by userspace.
Thus, defer the allocations to the first usage, i.e. to the first
debugfs_file_get().
A dentry's ->d_fsdata starts out to point to the "real", user provided
fops. After a debugfs_fsdata instance has been allocated (and the real
fops pointer has been moved over into its ->real_fops member),
->d_fsdata is changed to point to it from then on. The two cases are
distinguished by setting BIT(0) for the real fops case.
struct debugfs_fsdata's foremost purpose is to track active users and to
make debugfs_remove() block until they are done. Since no debugfs_fsdata
instance means no active users, make debugfs_remove() return immediately
in this case.
Take care of possible races between debugfs_file_get() and
debugfs_remove(): either debugfs_remove() must see a debugfs_fsdata
instance and thus wait for possible active users or debugfs_file_get() must
see a dead dentry and return immediately.
Make a dentry's ->d_release(), i.e. debugfs_release_dentry(), check whether
->d_fsdata is actually a debugfs_fsdata instance before kfree()ing it.
Similarly, make debugfs_real_fops() check whether ->d_fsdata is actually
a debugfs_fsdata instance before returning it, otherwise emit a warning.
The set of possible error codes returned from debugfs_file_get() has grown
from -EIO to -EIO and -ENOMEM. Make open_proxy_open() and full_proxy_open()
pass the -ENOMEM onwards to their callers.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-31 06:15:54 +07:00
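The tagged-pointer trick described in the commit message above (bit 0 set means "still the real fops", bit 0 clear means "a debugfs_fsdata instance") can be sketched in userspace like this. It is a minimal model, not the kernel code: the struct and function names are hypothetical, the user count is a plain int for brevity, and the sketch assumes the tagged objects are at least 2-byte aligned so that bit 0 is free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define IS_REAL_FOPS_BIT ((uintptr_t)1)

struct fops_model { int id; };		/* hypothetical stand-in for file_operations */

struct fsdata_lazy {
	const struct fops_model *real_fops;
	int active_users;		/* simplified: not atomic in this sketch */
};

struct dentry_model { _Atomic uintptr_t d_fsdata; };

/* At creation time only the tagged fops pointer is stored. */
static void dentry_init(struct dentry_model *d, const struct fops_model *fops)
{
	atomic_init(&d->d_fsdata, (uintptr_t)fops | IS_REAL_FOPS_BIT);
}

static int holds_real_fops(struct dentry_model *d)
{
	return (atomic_load(&d->d_fsdata) & IS_REAL_FOPS_BIT) != 0;
}

/* First use allocates the per-file state and swaps it in. */
static struct fsdata_lazy *lazy_get(struct dentry_model *d)
{
	uintptr_t cur = atomic_load(&d->d_fsdata);

	if (cur & IS_REAL_FOPS_BIT) {
		struct fsdata_lazy *new_fsd = malloc(sizeof(*new_fsd));

		if (!new_fsd)
			return NULL;	/* the -ENOMEM case */
		new_fsd->real_fops =
			(const struct fops_model *)(cur & ~IS_REAL_FOPS_BIT);
		new_fsd->active_users = 1;
		if (atomic_compare_exchange_strong(&d->d_fsdata, &cur,
						   (uintptr_t)new_fsd))
			cur = (uintptr_t)new_fsd;
		else
			free(new_fsd);	/* lost the race; cur holds the winner */
	}

	struct fsdata_lazy *fsd = (struct fsdata_lazy *)cur;

	fsd->active_users++;
	return fsd;
}
```

A remover that still sees the tag bit knows no per-file state was ever allocated, hence no active users exist and it can return immediately.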
|
|
|
|
|
|
|
/*
|
|
|
|
* Paired with the closing smp_mb() implied by a successful
|
|
|
|
* cmpxchg() in debugfs_file_get(): either
|
|
|
|
* debugfs_file_get() must see a dead dentry or we must see a
|
|
|
|
* debugfs_fsdata instance at ->d_fsdata here (or both).
|
|
|
|
*/
|
|
|
|
smp_mb();
|
|
|
|
fsd = READ_ONCE(dentry->d_fsdata);
|
|
|
|
if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)
|
|
|
|
return;
|
2017-10-31 06:15:48 +07:00
|
|
|
if (!refcount_dec_and_test(&fsd->active_users))
|
|
|
|
wait_for_completion(&fsd->active_users_drained);
|
|
|
|
}
|
|
|
|
|
2011-02-07 21:00:27 +07:00
|
|
|
static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
|
2008-07-01 20:14:51 +07:00
|
|
|
{
|
|
|
|
int ret = 0;
|
|
|
|
|
2015-05-18 21:10:34 +07:00
|
|
|
if (simple_positive(dentry)) {
|
2015-02-22 10:05:11 +07:00
|
|
|
dget(dentry);
|
2017-10-31 06:15:48 +07:00
|
|
|
if (!d_is_reg(dentry)) {
|
|
|
|
if (d_is_dir(dentry))
|
|
|
|
ret = simple_rmdir(d_inode(parent), dentry);
|
|
|
|
else
|
|
|
|
simple_unlink(d_inode(parent), dentry);
|
|
|
|
if (!ret)
|
|
|
|
d_delete(dentry);
|
|
|
|
} else {
|
|
|
|
__debugfs_remove_file(dentry, parent);
|
|
|
|
}
|
2015-02-22 10:05:11 +07:00
|
|
|
dput(dentry);
|
2008-07-01 20:14:51 +07:00
|
|
|
}
|
2011-02-07 21:00:27 +07:00
|
|
|
return ret;
|
2008-07-01 20:14:51 +07:00
|
|
|
}
|
|
|
|
|
2005-04-17 05:20:36 +07:00
|
|
|
/**
|
|
|
|
* debugfs_remove - removes a file or directory from the debugfs filesystem
|
|
|
|
* @dentry: a pointer to the dentry of the file or directory to be
|
2015-09-08 00:03:15 +07:00
|
|
|
* removed. If this parameter is NULL or an error value, nothing
|
|
|
|
* will be done.
|
2005-04-17 05:20:36 +07:00
|
|
|
*
|
|
|
|
* This function removes a file or directory in debugfs that was previously
|
|
|
|
* created with a call to another debugfs function (like
|
2006-10-04 04:28:36 +07:00
|
|
|
* debugfs_create_file() or variants thereof.)
|
2005-04-17 05:20:36 +07:00
|
|
|
*
|
|
|
|
* This function is required to be called in order for the file to be
|
|
|
|
* removed, no automatic cleanup of files will happen when a module is
|
|
|
|
* removed, you are responsible here.
|
|
|
|
*/
|
|
|
|
void debugfs_remove(struct dentry *dentry)
|
|
|
|
{
|
|
|
|
struct dentry *parent;
|
2011-02-07 21:00:27 +07:00
|
|
|
int ret;
|
|
|
|
|
2012-05-23 20:13:07 +07:00
|
|
|
if (IS_ERR_OR_NULL(dentry))
|
2005-04-17 05:20:36 +07:00
|
|
|
return;
|
|
|
|
|
|
|
|
parent = dentry->d_parent;
|
2016-01-23 03:40:57 +07:00
|
|
|
inode_lock(d_inode(parent));
|
2011-02-07 21:00:27 +07:00
|
|
|
ret = __debugfs_remove(dentry, parent);
|
2016-01-23 03:40:57 +07:00
|
|
|
inode_unlock(d_inode(parent));
|
2011-02-07 21:00:27 +07:00
|
|
|
if (!ret)
|
|
|
|
simple_release_fs(&debugfs_mount, &debugfs_mount_count);
|
2008-07-01 20:14:51 +07:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(debugfs_remove);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* debugfs_remove_recursive - recursively removes a directory
|
2015-09-08 00:03:15 +07:00
|
|
|
* @dentry: a pointer to the dentry of the directory to be removed. If this
|
|
|
|
* parameter is NULL or an error value, nothing will be done.
|
2008-07-01 20:14:51 +07:00
|
|
|
*
|
|
|
|
* This function recursively removes a directory tree in debugfs that
|
|
|
|
* was previously created with a call to another debugfs function
|
|
|
|
* (like debugfs_create_file() or variants thereof.)
|
|
|
|
*
|
|
|
|
* This function is required to be called in order for the file to be
|
|
|
|
* removed, no automatic cleanup of files will happen when a module is
|
|
|
|
* removed, you are responsible here.
|
|
|
|
*/
|
|
|
|
void debugfs_remove_recursive(struct dentry *dentry)
|
|
|
|
{
|
debugfs: Fix corrupted loop in debugfs_remove_recursive
[ I'm currently running my tests on it now, and so far, after a few
hours it has yet to blow up. I'll run it for 24 hours, which it never
survived in the past. ]
The tracing code has a way to make directories within the debugfs file
system as well as deleting them using mkdir/rmdir in the instance
directory. This is very limited in functionality: for example, there are
no renames, and the parent directory "instance" can not be modified.
The tracing code creates the instance directory from the debugfs code
and then replaces the dentry->d_inode->i_op with its own to allow
for mkdir/rmdir to work.
When these are called, the dentry and inode locks need to be released
to call the instance creation and deletion code. That code has its own
accounting and locking to serialize everything to prevent multiple
users from causing harm. As the parent "instance" directory can not
be modified this simplifies things.
I created a stress test that creates several threads that randomly
create and delete directories thousands of times a second. The code
stood up to this test and I submitted it a while ago.
Recently I added a new test that adds readers to the mix. While the
instance directories were being added and deleted, readers would read
from these directories and even enable tracing within them. This test
was able to trigger a bug:
general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 3 PID: 17789 Comm: rmdir Tainted: G W 3.15.0-rc2-test+ #41
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000
RIP: 0010:[<ffffffff811ed5eb>] [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
RSP: 0018:ffff880077019df8 EFLAGS: 00010246
RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000
RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454
RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000
R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640
R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7
FS: 00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0
Stack:
ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640
0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000
00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28
Call Trace:
[<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de
[<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3
[<ffffffff81132f51>] ? do_rmdir+0xd1/0x139
[<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b
Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08
RIP [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
RSP <ffff880077019df8>
It took a while, but every time it triggered, it was always in the
same place:
list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) {
Where the child->d_u.d_child seemed to be corrupted. I added lots of
trace_printk()s to see what was wrong, and sure enough, it was always
the child's d_u.d_child field. I looked around to see what touches
it and noticed __dentry_kill(), which calls dentry_free():
  static void dentry_free(struct dentry *dentry)
  {
          /* if dentry was never visible to RCU, immediate free is OK */
          if (!(dentry->d_flags & DCACHE_RCUACCESS))
                  __d_free(&dentry->d_u.d_rcu);
          else
                  call_rcu(&dentry->d_u.d_rcu, __d_free);
  }
I also noticed that __dentry_kill() unlinks the child->d_u.d_child
under the parent->d_lock spin_lock.
Looking back at the loop in debugfs_remove_recursive() it never takes the
parent->d_lock to do the list walk. Adding more tracing, I was able to
prove this was the issue:
ftrace-t-15385 1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600
rmdir-15409 2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058
The dentry_kill freed ffff88006d573600 just as the remove recursive was walking
it.
In order to fix this, the list walk needs to be modified a bit to take
the parent->d_lock. The safe version is no longer necessary, as every
time we remove a child, the parent->d_lock must be released and the
list walk must start over. Each time a child is removed, even though it
may still be on the list, it should be skipped by the first check
in the loop:
  if (!debugfs_positive(child))
          continue;
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-10 01:06:07 +07:00
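The fix described in the commit message above (walk the child list under the parent lock, drop the lock to remove one child, then restart the walk and skip children that are already gone) can be sketched in userspace as follows. This is an illustration under simplifying assumptions: the lock is a boolean that only checks locking discipline, removal just clears a flag, and all names (node, dir, remove_child, remove_all) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
	bool positive;		/* false once the child has been removed */
};

struct dir {
	struct node *head;
	bool locked;		/* stand-in for parent->d_lock */
	int restarts;		/* how often the walk started over */
};

static void dir_lock(struct dir *d)   { assert(!d->locked); d->locked = true; }
static void dir_unlock(struct dir *d) { assert(d->locked);  d->locked = false; }

/* Removal must run without the list lock held. */
static void remove_child(struct dir *d, struct node *n)
{
	assert(!d->locked);
	n->positive = false;	/* the node may linger on the list for a while */
}

/* Remove every live child; returns the number of children removed. */
static int remove_all(struct dir *d)
{
	struct node *n;
	int removed = 0;

restart:
	dir_lock(d);
	for (n = d->head; n; n = n->next) {
		if (!n->positive)
			continue;	/* already removed, skip it */

		/* The list may change once the lock is dropped, so the
		 * iterator is abandoned and the walk starts over. */
		dir_unlock(d);
		remove_child(d, n);
		removed++;
		d->restarts++;
		goto restart;
	}
	dir_unlock(d);
	return removed;
}
```

Restarting costs extra passes over the list, but no pointer obtained under the lock is ever followed after the lock has been released, which is what the "safe" iterator could not guarantee.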
|
|
|
struct dentry *child, *parent;
|
2008-07-01 20:14:51 +07:00
|
|
|
|
2012-05-23 20:13:07 +07:00
|
|
|
if (IS_ERR_OR_NULL(dentry))
|
2008-07-01 20:14:51 +07:00
|
|
|
return;
|
|
|
|
|
|
|
|
parent = dentry;
|
2013-07-26 22:12:56 +07:00
|
|
|
down:
|
2016-01-23 03:40:57 +07:00
|
|
|
inode_lock(d_inode(parent));
|
2014-06-10 01:06:07 +07:00
|
|
|
loop:
|
|
|
|
/*
|
|
|
|
* The parent->d_subdirs is protected by the d_lock. Outside that
|
|
|
|
* lock, the child can be unlinked and set to be freed which can
|
|
|
|
* use the d_u.d_child as the rcu head and corrupt this list.
|
|
|
|
*/
|
|
|
|
spin_lock(&parent->d_lock);
|
2014-10-27 06:19:16 +07:00
|
|
|
list_for_each_entry(child, &parent->d_subdirs, d_child) {
|
2015-05-18 21:10:34 +07:00
|
|
|
if (!simple_positive(child))
|
2013-07-26 22:12:56 +07:00
|
|
|
continue;
|
2008-07-01 20:14:51 +07:00
|
|
|
|
2013-07-26 22:12:56 +07:00
|
|
|
/* perhaps simple_empty(child) makes more sense */
|
2008-07-01 20:14:51 +07:00
|
|
|
if (!list_empty(&child->d_subdirs)) {
|
2014-06-10 01:06:07 +07:00
|
|
|
spin_unlock(&parent->d_lock);
|
2016-01-23 03:40:57 +07:00
|
|
|
inode_unlock(d_inode(parent));
|
2008-07-01 20:14:51 +07:00
|
|
|
parent = child;
|
2013-07-26 22:12:56 +07:00
|
|
|
goto down;
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
debugfs: Fix corrupted loop in debugfs_remove_recursive
[ I'm currently running my tests on it now, and so far, after a few
hours it has yet to blow up. I'll run it for 24 hours which it never
succeeded in the past. ]
The tracing code has a way to make directories within the debugfs file
system as well as deleting them using mkdir/rmdir in the instance
directory. This is very limited in functionality, such as there is
no renames, and the parent directory "instance" can not be modified.
The tracing code creates the instance directory from the debugfs code
and then replaces the dentry->d_inode->i_op with its own to allow
for mkdir/rmdir to work.
When these are called, the d_entry and inode locks need to be released
to call the instance creation and deletion code. That code has its own
accounting and locking to serialize everything to prevent multiple
users from causing harm. As the parent "instance" directory can not
be modified this simplifies things.
I created a stress test that creates several threads that randomly
creates and deletes directories thousands of times a second. The code
stood up to this test and I submitted it a while ago.
Recently I added a new test that adds readers to the mix. While the
instance directories were being added and deleted, readers would read
from these directories and even enable tracing within them. This test
was able to trigger a bug:
general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 3 PID: 17789 Comm: rmdir Tainted: G W 3.15.0-rc2-test+ #41
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000
RIP: 0010:[<ffffffff811ed5eb>] [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
RSP: 0018:ffff880077019df8 EFLAGS: 00010246
RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000
RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454
RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000
R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640
R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7
FS: 00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0
Stack:
ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640
0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000
00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28
Call Trace:
[<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de
[<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3
[<ffffffff81132f51>] ? do_rmdir+0xd1/0x139
[<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b
Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08
RIP [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
RSP <ffff880077019df8>
It took a while, but every time it triggered, it was always in the
same place:
list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) {
Where the child->d_u.d_child seemed to be corrupted. I added lots of
trace_printk()s to see what was wrong, and sure enough, it was always
the child's d_u.d_child field. I looked around to see what touches
it and noticed that in __dentry_kill() which calls dentry_free():
static void dentry_free(struct dentry *dentry)
{
/* if dentry was never visible to RCU, immediate free is OK */
if (!(dentry->d_flags & DCACHE_RCUACCESS))
__d_free(&dentry->d_u.d_rcu);
else
call_rcu(&dentry->d_u.d_rcu, __d_free);
}
I also noticed that __dentry_kill() unlinks the child->d_u.d_child
under the parent->d_lock spin_lock.
Looking back at the loop in debugfs_remove_recursive() it never takes the
parent->d_lock to do the list walk. Adding more tracing, I was able to
prove this was the issue:
ftrace-t-15385 1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600
rmdir-15409 2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058
The dentry_kill freed ffff88006d573600 just as the remove recursive was walking
it.
In order to fix this, the list walk needs to be modified a bit to take
the parent->d_lock. The safe version is no longer necessary, as every
time we remove a child, the parent->d_lock must be released and the
list walk must start over. Each time a child is removed, even though it
may still be on the list, it should be skipped by the first check
in the loop:
if (!debugfs_positive(child))
continue;
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-10 01:06:07 +07:00
|
|
|
|
|
|
|
spin_unlock(&parent->d_lock);
|
|
|
|
|
2013-07-26 22:12:56 +07:00
|
|
|
if (!__debugfs_remove(child, parent))
|
|
|
|
simple_release_fs(&debugfs_mount, &debugfs_mount_count);
|
debugfs: Fix corrupted loop in debugfs_remove_recursive
[ I'm currently running my tests on it now, and so far, after a few
hours it has yet to blow up. I'll run it for 24 hours, which it has
never survived in the past. ]
The tracing code has a way to make directories within the debugfs file
system, as well as to delete them, using mkdir/rmdir in the instance
directory. This is very limited in functionality: for example, there are
no renames, and the parent directory "instance" can not be modified.
The tracing code creates the instance directory from the debugfs code
and then replaces the dentry->d_inode->i_op with its own to allow
for mkdir/rmdir to work.
When these are called, the dentry and inode locks need to be released
to call the instance creation and deletion code. That code has its own
accounting and locking to serialize everything to prevent multiple
users from causing harm. As the parent "instance" directory can not
be modified this simplifies things.
I created a stress test that creates several threads that randomly
create and delete directories thousands of times a second. The code
stood up to this test and I submitted it a while ago.
Recently I added a new test that adds readers to the mix. While the
instance directories were being added and deleted, readers would read
from these directories and even enable tracing within them. This test
was able to trigger a bug:
general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 3 PID: 17789 Comm: rmdir Tainted: G W 3.15.0-rc2-test+ #41
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000
RIP: 0010:[<ffffffff811ed5eb>] [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
RSP: 0018:ffff880077019df8 EFLAGS: 00010246
RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000
RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454
RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000
R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640
R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7
FS: 00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0
Stack:
ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640
0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000
00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28
Call Trace:
[<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de
[<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3
[<ffffffff81132f51>] ? do_rmdir+0xd1/0x139
[<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b
Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08
RIP [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
RSP <ffff880077019df8>
It took a while, but every time it triggered, it was always in the
same place:
list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) {
Where the child->d_u.d_child seemed to be corrupted. I added lots of
trace_printk()s to see what was wrong, and sure enough, it was always
the child's d_u.d_child field. I looked around to see what touches
it and noticed that in __dentry_kill() which calls dentry_free():
static void dentry_free(struct dentry *dentry)
{
/* if dentry was never visible to RCU, immediate free is OK */
if (!(dentry->d_flags & DCACHE_RCUACCESS))
__d_free(&dentry->d_u.d_rcu);
else
call_rcu(&dentry->d_u.d_rcu, __d_free);
}
I also noticed that __dentry_kill() unlinks the child->d_u.d_child
under the parent->d_lock spin_lock.
Looking back at the loop in debugfs_remove_recursive() it never takes the
parent->d_lock to do the list walk. Adding more tracing, I was able to
prove this was the issue:
ftrace-t-15385 1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600
rmdir-15409 2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058
The dentry_kill freed ffff88006d573600 just as the remove recursive was walking
it.
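The value next=dead000000100058 in the trace above appears consistent with a list-poison pointer that has been run through container_of(). A minimal sketch of that arithmetic follows, assuming the x86_64 definitions of that era (LIST_POISON1 = 0x00100100 plus a poison delta of 0xdead000000000000) and a d_u.d_child offset of 0xa8, which is read off the "sub $0xa8" bytes (48 2d a8 00 00 00) in the Code: line of the oops. The sketch_ names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86_64 values: the kernel builds LIST_POISON1 from the constant
 * 0x00100100 plus a per-architecture poison delta. */
#define SKETCH_POISON_DELTA  0xdead000000000000ULL
#define SKETCH_LIST_POISON1  (0x00100100ULL + SKETCH_POISON_DELTA)

/* Offset of d_u.d_child inside struct dentry in the crashing build, taken
 * from the disassembled Code: bytes. Other builds may use another offset. */
#define SKETCH_D_CHILD_OFFSET 0xa8ULL

/* What container_of() yields when the list pointer it is handed is the
 * poison value written by list_del(). */
static uint64_t sketch_entry_from_poisoned_next(void)
{
	return SKETCH_LIST_POISON1 - SKETCH_D_CHILD_OFFSET;
}

/* The faulting load adds the offset back before dereferencing. */
static uint64_t sketch_faulting_address(void)
{
	return sketch_entry_from_poisoned_next() + SKETCH_D_CHILD_OFFSET;
}
```

Under these assumptions the first helper reproduces the value seen in RDX and in the trace, and the second gives the non-canonical address whose dereference would raise the general protection fault.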
In order to fix this, the list walk needs to be modified a bit to take
the parent->d_lock. The safe version is no longer necessary, as every
time we remove a child, the parent->d_lock must be released and the
list walk must start over. Each time a child is removed, even though it
may still be on the list, it should be skipped by the first check
in the loop:
if (!debugfs_positive(child))
continue;
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-10 01:06:07 +07:00
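The locking pattern described in the commit message can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: sketch_node, sketch_parent, the lock_depth counter standing in for parent->d_lock, and the positive flag standing in for the debugfs_positive() check are all hypothetical names. The point it shows is that the walk holds the lock while following next pointers, drops it before removing a child, and then restarts from the head instead of trusting a saved next pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
	bool positive;          /* stand-in for the debugfs_positive() check */
};

struct sketch_parent {
	struct sketch_node *children;
	int lock_depth;         /* stand-in for parent->d_lock: 1 while held */
};

static void sketch_lock(struct sketch_parent *p)
{
	assert(p->lock_depth == 0);
	p->lock_depth = 1;
}

static void sketch_unlock(struct sketch_parent *p)
{
	assert(p->lock_depth == 1);
	p->lock_depth = 0;
}

/* Removal runs with the lock dropped. Here it only marks the node negative;
 * the node may stay on the list, as the commit message notes. */
static void sketch_remove_child(struct sketch_parent *p, struct sketch_node *c)
{
	assert(p->lock_depth == 0);
	c->positive = false;
}

/* Walk under the lock; after every removal, restart from the list head. */
static int sketch_remove_all(struct sketch_parent *p)
{
	struct sketch_node *child;
	int removed = 0;

loop:
	sketch_lock(p);
	for (child = p->children; child; child = child->next) {
		if (!child->positive)
			continue;       /* already removed on an earlier pass */
		sketch_unlock(p);
		sketch_remove_child(p, child);
		removed++;
		goto loop;              /* next pointer is no longer trusted */
	}
	sketch_unlock(p);
	return removed;
}

/* Builds a three-entry list with one already-negative entry and empties it. */
static int sketch_demo(void)
{
	struct sketch_node c = { NULL, true };
	struct sketch_node b = { &c, false };
	struct sketch_node a = { &b, true };
	struct sketch_parent p = { &a, 0 };

	return sketch_remove_all(&p);
}
```

Restarting costs a rescan of already-removed entries on each pass, which is the trade-off the commit accepts in exchange for never dereferencing a pointer that was read before the lock was released.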
|
|
|
|
|
|
|
/*
|
|
|
|
* The parent->d_lock protects the child from being unlinked
|
|
|
|
* from d_subdirs. When releasing the parent->d_lock we can
|
|
|
|
* no longer trust that the next pointer is valid.
|
|
|
|
* Restart the loop. We'll skip this one with the
|
2015-05-18 21:10:34 +07:00
|
|
|
* simple_positive() check.
|
2014-06-10 01:06:07 +07:00
|
|
|
*/
|
|
|
|
goto loop;
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
2014-06-10 01:06:07 +07:00
|
|
|
spin_unlock(&parent->d_lock);
|
2008-07-01 20:14:51 +07:00
|
|
|
|
2016-01-23 03:40:57 +07:00
|
|
|
inode_unlock(d_inode(parent));
|
2013-07-26 22:12:56 +07:00
|
|
|
child = parent;
|
|
|
|
parent = parent->d_parent;
|
2016-01-23 03:40:57 +07:00
|
|
|
inode_lock(d_inode(parent));
|
2013-07-26 22:12:56 +07:00
|
|
|
|
2014-06-10 01:06:07 +07:00
|
|
|
if (child != dentry)
|
|
|
|
/* go up */
|
|
|
|
goto loop;
|
2013-07-26 22:12:56 +07:00
|
|
|
|
|
|
|
if (!__debugfs_remove(child, parent))
|
|
|
|
simple_release_fs(&debugfs_mount, &debugfs_mount_count);
|
2016-01-23 03:40:57 +07:00
|
|
|
inode_unlock(d_inode(parent));
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
2008-07-01 20:14:51 +07:00
|
|
|
EXPORT_SYMBOL_GPL(debugfs_remove_recursive);
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2007-05-09 18:19:52 +07:00
|
|
|
/**
|
|
|
|
* debugfs_rename - rename a file/directory in the debugfs filesystem
|
|
|
|
* @old_dir: a pointer to the parent dentry for the renamed object. This
|
|
|
|
* should be a directory dentry.
|
|
|
|
* @old_dentry: dentry of an object to be renamed.
|
|
|
|
* @new_dir: a pointer to the parent dentry where the object should be
|
|
|
|
* moved. This should be a directory dentry.
|
|
|
|
* @new_name: a pointer to a string containing the target name.
|
|
|
|
*
|
|
|
|
* This function renames a file/directory in debugfs. The target must not
|
|
|
|
* exist for rename to succeed.
|
|
|
|
*
|
|
|
|
* This function will return a pointer to old_dentry (which is updated to
|
|
|
|
* reflect renaming) if it succeeds. If an error occurs, an ERR_PTR() value will be
|
|
|
|
* returned.
|
|
|
|
*
|
|
|
|
* If debugfs is not enabled in the kernel, the value -%ENODEV will be
|
|
|
|
* returned.
|
|
|
|
*/
|
|
|
|
struct dentry *debugfs_rename(struct dentry *old_dir, struct dentry *old_dentry,
|
|
|
|
struct dentry *new_dir, const char *new_name)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
struct dentry *dentry = NULL, *trap;
|
2017-07-08 01:51:19 +07:00
|
|
|
struct name_snapshot old_name;
|
2007-05-09 18:19:52 +07:00
|
|
|
|
2019-01-23 17:27:02 +07:00
|
|
|
if (IS_ERR(old_dir))
|
|
|
|
return old_dir;
|
|
|
|
if (IS_ERR(new_dir))
|
|
|
|
return new_dir;
|
|
|
|
if (IS_ERR_OR_NULL(old_dentry))
|
|
|
|
return old_dentry;
|
|
|
|
|
2007-05-09 18:19:52 +07:00
|
|
|
trap = lock_rename(new_dir, old_dir);
|
|
|
|
/* Source or destination directories don't exist? */
|
2015-03-18 05:25:59 +07:00
|
|
|
if (d_really_is_negative(old_dir) || d_really_is_negative(new_dir))
|
2007-05-09 18:19:52 +07:00
|
|
|
goto exit;
|
|
|
|
/* Source does not exist, cyclic rename, or mountpoint? */
|
2015-03-18 05:25:59 +07:00
|
|
|
if (d_really_is_negative(old_dentry) || old_dentry == trap ||
|
2007-05-09 18:19:52 +07:00
|
|
|
d_mountpoint(old_dentry))
|
|
|
|
goto exit;
|
|
|
|
dentry = lookup_one_len(new_name, new_dir, strlen(new_name));
|
|
|
|
/* Lookup failed, cyclic rename or target exists? */
|
2015-03-18 05:25:59 +07:00
|
|
|
if (IS_ERR(dentry) || dentry == trap || d_really_is_positive(dentry))
|
2007-05-09 18:19:52 +07:00
|
|
|
goto exit;
|
|
|
|
|
2017-07-08 01:51:19 +07:00
|
|
|
take_dentry_name_snapshot(&old_name, old_dentry);
|
2007-05-09 18:19:52 +07:00
|
|
|
|
2015-03-18 05:25:59 +07:00
|
|
|
error = simple_rename(d_inode(old_dir), old_dentry, d_inode(new_dir),
|
2016-09-27 16:03:57 +07:00
|
|
|
dentry, 0);
|
2007-05-09 18:19:52 +07:00
|
|
|
if (error) {
|
2017-07-08 01:51:19 +07:00
|
|
|
release_dentry_name_snapshot(&old_name);
|
2007-05-09 18:19:52 +07:00
|
|
|
goto exit;
|
|
|
|
}
|
|
|
|
d_move(old_dentry, dentry);
|
2019-04-27 00:21:24 +07:00
|
|
|
fsnotify_move(d_inode(old_dir), d_inode(new_dir), &old_name.name,
|
VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
my $fd;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-29 19:02:35 +07:00
|
|
|
d_is_dir(old_dentry),
|
2007-06-07 23:19:32 +07:00
|
|
|
NULL, old_dentry);
|
2017-07-08 01:51:19 +07:00
|
|
|
release_dentry_name_snapshot(&old_name);
|
2007-05-09 18:19:52 +07:00
|
|
|
unlock_rename(new_dir, old_dir);
|
|
|
|
dput(dentry);
|
|
|
|
return old_dentry;
|
|
|
|
exit:
|
|
|
|
if (dentry && !IS_ERR(dentry))
|
|
|
|
dput(dentry);
|
|
|
|
unlock_rename(new_dir, old_dir);
|
debugfs: return error values, not NULL
When an error happens, debugfs should return an error pointer value, not
NULL. This will prevent the totally theoretical error where a debugfs
call fails due to lack of memory, returning NULL, and that dentry value
is then passed to another debugfs call, which would end up succeeding,
creating a file at the root of the debugfs tree, but would then be
impossible to remove (because you can not remove the directory NULL).
So, to make everyone happy, always return errors. This makes the users
of debugfs much simpler (they do not have to ever check the return
value), and everyone can rest easy.
Reported-by: Gary R Hook <ghook@amd.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Michal Hocko <mhocko@kernel.org>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-23 17:28:14 +07:00
|
|
|
if (IS_ERR(dentry))
|
|
|
|
return dentry;
|
|
|
|
return ERR_PTR(-EINVAL);
|
2007-05-09 18:19:52 +07:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(debugfs_rename);
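debugfs_rename() reports failure through error pointers rather than NULL, and its early checks pass an incoming error pointer straight back to the caller. A minimal user-space sketch of that encoding follows. The sketch_ helpers are hypothetical stand-ins that mimic the kernel's ERR_PTR(), PTR_ERR(), IS_ERR() and IS_ERR_OR_NULL(); they assume, as the kernel does, that the top 4095 values of the address space are never valid pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* True when the pointer lies in the reserved top-of-address-space range. */
static inline bool sketch_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-SKETCH_MAX_ERRNO;
}

static inline bool sketch_is_err_or_null(const void *ptr)
{
	return !ptr || sketch_is_err(ptr);
}

static int sketch_object;       /* any ordinary, valid address */

/* Mirrors the shape of the early checks: an error handed in is handed
 * straight back, so callers can chain calls without testing each result. */
static void *sketch_passthrough(void *old_dir)
{
	if (sketch_is_err(old_dir))
		return old_dir;
	return &sketch_object;
}
```

A caller in this style only has to test the final result with sketch_is_err(), which is the simplification the "return error values, not NULL" change aims for.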
|
|
|
|
|
2009-03-23 05:10:44 +07:00
|
|
|
/**
|
|
|
|
* debugfs_initialized - Tells whether debugfs has been registered
|
|
|
|
*/
|
|
|
|
bool debugfs_initialized(void)
|
|
|
|
{
|
|
|
|
return debugfs_registered;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(debugfs_initialized);
|
|
|
|
|
2005-04-17 05:20:36 +07:00
|
|
|
static int __init debugfs_init(void)
|
|
|
|
{
|
|
|
|
int retval;
|
|
|
|
|
2015-05-14 05:35:41 +07:00
|
|
|
retval = sysfs_create_mount_point(kernel_kobj, "debug");
|
|
|
|
if (retval)
|
|
|
|
return retval;
|
2005-04-17 05:20:36 +07:00
|
|
|
|
|
|
|
retval = register_filesystem(&debug_fs_type);
|
|
|
|
if (retval)
|
2015-05-14 05:35:41 +07:00
|
|
|
sysfs_remove_mount_point(kernel_kobj, "debug");
|
2009-03-23 05:10:44 +07:00
|
|
|
else
|
|
|
|
debugfs_registered = true;
|
|
|
|
|
2005-04-17 05:20:36 +07:00
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
core_initcall(debugfs_init);
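debugfs_init() follows a two-step setup with rollback: the mount point is created first, and if registering the filesystem then fails, the mount point is removed again before the error is returned. A minimal user-space sketch of that control flow follows. Every sketch_ name is hypothetical, and the simulated_error parameter exists only so that the failure path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

static bool sketch_mount_point_present;
static bool sketch_registered;

/* Stand-in for creating the mount point; always succeeds here. */
static int sketch_create_mount_point(void)
{
	sketch_mount_point_present = true;
	return 0;
}

/* Stand-in for removing the mount point again. */
static void sketch_remove_mount_point(void)
{
	sketch_mount_point_present = false;
}

/* Stand-in for registering the filesystem; returns the injected error. */
static int sketch_register_fs(int simulated_error)
{
	return simulated_error;
}

static int sketch_init(int simulated_error)
{
	int retval;

	retval = sketch_create_mount_point();
	if (retval)
		return retval;

	retval = sketch_register_fs(simulated_error);
	if (retval)
		sketch_remove_mount_point();    /* undo step one on failure */
	else
		sketch_registered = true;       /* only set on full success */

	return retval;
}
```

Setting the registered flag only on the success branch is what lets a query such as debugfs_initialized() report accurately whether the filesystem is usable.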
|
|
|
|
|