This patch replaces uses of ablkcipher and blkcipher with skcipher,
and the long obsolete hash interface with shash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull misc vfs updates from Al Viro:
"All kinds of stuff. That probably should've been 5 or 6 separate
branches, but by the time I'd realized how large and mixed that bag
had become it had been too close to -final to play with rebasing.
Some fs/namei.c cleanups there, memdup_user_nul() introduction and
switching open-coded instances, burying long-dead code, whack-a-mole
of various kinds, several new helpers for ->llseek(), assorted
cleanups and fixes from various people, etc.
One piece probably deserves special mention - Neil's
lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
called without ->i_mutex and tries to avoid ever taking it. That, of
course, means that it's not useful for any directory modifications,
but things like getting inode attributes in nfds readdirplus are fine
with that. I really should've asked for moratorium on lookup-related
changes this cycle, but since I hadn't done that early enough... I
*am* asking for that for the coming cycle, though - I'm going to try
and get conversion of i_mutex to rwsem with ->lookup() done under lock
taken shared.
There will be a patch closer to the end of the window, along the lines
of the one Linus had posted last May - mechanical conversion of
->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
inode_is_locked()/inode_lock_nested(). To quote Linus back then:
-----
| This is an automated patch using
|
| sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
| sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
| sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
| sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
| sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
|
| with a very few manual fixups
-----
I'm going to send that once the ->i_mutex-affecting stuff in -next
gets mostly merged (or when Linus says he's about to stop taking
merges)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
nfsd: don't hold i_mutex over userspace upcalls
fs:affs:Replace time_t with time64_t
fs/9p: use fscache mutex rather than spinlock
proc: add a reschedule point in proc_readfd_common()
logfs: constify logfs_block_ops structures
fcntl: allow to set O_DIRECT flag on pipe
fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
fs: xattr: Use kvfree()
[s390] page_to_phys() always returns a multiple of PAGE_SIZE
nbd: use ->compat_ioctl()
fs: use block_device name vsprintf helper
lib/vsprintf: add %*pg format specifier
fs: use gendisk->disk_name where possible
poll: plug an unused argument to do_poll
amdkfd: don't open-code memdup_user()
cdrom: don't open-code memdup_user()
rsxx: don't open-code memdup_user()
mtip32xx: don't open-code memdup_user()
[um] mconsole: don't open-code memdup_user_nul()
[um] hostaudio: don't open-code memdup_user()
...
new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.
It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull trivial updates from Jiri Kosina:
"Trivial stuff from trivial tree that can be trivially summed up as:
- treewide drop of spurious unlikely() before IS_ERR() from Viresh
Kumar
- cosmetic fixes (that don't really affect basic functionality of the
driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek
- various comment / printk fixes and updates all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
bcache: Really show state of work pending bit
hwmon: applesmc: fix comment typos
Kconfig: remove comment about scsi_wait_scan module
class_find_device: fix reference to argument "match"
debugfs: document that debugfs_remove*() accepts NULL and error values
net: Drop unlikely before IS_ERR(_OR_NULL)
mm: Drop unlikely before IS_ERR(_OR_NULL)
fs: Drop unlikely before IS_ERR(_OR_NULL)
drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
UBI: Update comments to reflect UBI_METAONLY flag
pktcdvd: drop null test before destroy functions
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
is no need to do that again from its callers. Drop it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJV7v55AAoJENaSAD2qAscKAwoQAKkBJiRk4s8xmhHh++qgMxKr
JPOJqiwsV86KWMrxzT7xYO9IUr/D8zFo0m7mXydMeJvRZl1MJe9atnoZL9xRuvch
+ALh6i2c1o/tjuWze4ewMatxMiWffJ7W72p8dcihDiacDp0gxArd6zqaURjQRyns
nwk/vt41PerXv2dloFFsnC3XibojwS5/dltM8B98dzdP8lj9XYSHAhwhsa4/MUvF
pkv/PGuoDuuK0mbNaCEvjy2c40qjETlW0Hn/YkvJhBYYDR3awuXqCvNO+dZbhoym
NeHb7Jfvzkagfd8v5Us9aQ0lK1ABTjmRdW5zGIymCke5HbpzzkMMKfAJKBvx186I
M3eNnztc78lOjWOfqrfwOHM3/X1ELKiHgrZcG3KR+XX9J3fCmp7fL5EYL+JDf86H
2FPEU1RZue3oJUo3PYhCbEvWekUyHbG/18JHxRXAwVsUpcjeHprfvmn7hBN0sFPs
hom3Mcc1IMUuos00Mj3H6CxpD2xe2+77L6VngQ681zNIhkG2nnCLx5zNPUx7d+68
RjV/KKs17/YNGZKxuNVD//RYJqx58NyM2BwBQ7caVm3dPWHa5tLoAxJstFjhKwhZ
lon7mS4odEidSU4s1apEtoTeCjQMSAMgtQbTHUj+p+dNXx+CVdXXFaMqNvBHhCEI
5h01NR6QzfE8QGzLUcXf
=ejQ5
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
"Invalidate stale eCryptfs dcache entries caused by unlinked lower
inodes"
* tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Delete a check before the function call "key_put"
eCryptfs: Invalidate dcache entries when lower i_nlink is zero
The key_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around this call might not be needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Consider eCryptfs dcache entries to be stale when the corresponding
lower inode's i_nlink count is zero. This solves a problem caused by the
lower inode being directly modified, without going through the eCryptfs
mount, leaving stale eCryptfs dentries cached and the eCryptfs inode's
i_nlink count not being cleared.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Richard Weinberger <richard@nod.at>
Cc: stable@vger.kernel.org
This patch fix spelling typo inv various part of sources.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
The FITRIM ioctl has the same arguments on 32-bit and 64-bit
architectures, so we can add it to the list of compatible ioctls and
drop it from compat_ioctl method of various filesystems.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ted Ts'o <tytso@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (into void * passed by address by caller) and returns
the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and pointer to symlink body for normal symlinks. Stored pointer
is ignored in all cases except the last one.
Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().
b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is. In the cases when we used the symlink body
to free stuff, ->follow_link() now should store it as opaque pointer in addition
to returning it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The AIO interface is fairly complex because it tries to allow
filesystems to always work async and then wakeup a synchronous
caller through aio_complete. It turns out that basically no one
was doing this to avoid the complexity and context switches,
and we've already fixed up the remaining users and can now
get rid of this case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
eCryptfs can't be aware of what to expect when after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.
One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.
This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.
https://bugzilla.kernel.org/show_bug.cgi?id=93691https://launchpad.net/bugs/1305335
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Rocko <rockorequin@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: stable@vger.kernel.org # v2.6.36+: c43f7b8 eCryptfs: Handle ioctl calls with unlocked and compat functions
The patch 237fead619: "[PATCH] ecryptfs: fs/Makefile and
fs/Kconfig" from Oct 4, 2006, leads to the following static checker
warning:
fs/ecryptfs/crypto.c:846 ecryptfs_new_file_context()
error: off-by-one overflow 'crypt_stat->cipher' size 32. rl = '0-32'
There is a mismatch between the size of ecryptfs_crypt_stat.cipher
and ecryptfs_mount_crypt_stat.global_default_cipher_name causing the
copy of the cipher name to cause a off-by-one string copy error. This
fix ensures the space reserved for this string is the same size including
the trailing zero at the end throughout ecryptfs.
This fix avoids increasing the size of ecryptfs_crypt_stat.cipher
and also ecryptfs_parse_tag_70_packet_silly_stack.cipher_string and instead
reduces the of ECRYPTFS_MAX_CIPHER_NAME_SIZE to 31 and includes the + 1 for
the end of string terminator.
NOTE: An overflow is not possible in practice since the value copied
into global_default_cipher_name is validated by
ecryptfs_code_for_cipher_string() at mount time. None of the allowed
cipher strings are long enough to cause the potential buffer overflow
fixed by this patch.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[tyhicks: Added the NOTE about the overflow not being triggerable]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to it's original purpose.
Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead. Splitting this from the backing_dev_info
structure allows to remove lots of backing_dev_info instance that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device. It also removes the need for
the mtd_inodefs filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
- The filename decryption routines were, at times, writing a zero byte one
character past the end of the filename buffer
- The encrypted view feature attempted, and failed, to roll its own form of
enforcing a read-only mount instead of letting the VFS enforce it
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJUlGlWAAoJENaSAD2qAscKgdwQAJLCgt5xgoij8uH65mYVy6PE
++E8KggCHchhUcFh19EuuG++ENzwZedqKqbbRhmZt+svla08uPDO0jY+sYaOZzqB
HMzZyyL4Fw/DiVsK6LUMGfN2CS7tua9ReK0dj4tUc3jwBplnQnGrQlUPbgdPRJJp
XJDKHVtHGQPQC1dJAeProCJTg23jv5wXacly2I5VxZuh5DnDWjuo2KkzqGMKfadU
9bxOf5DjVwQ/X6apgkNxG/8Q6J8a+80K7SGs/aRELIuRL0+mIj5AX9ht+p5Z0XI+
E8jU4HM2biuqj8PtxK/WbyF1bHiREVyOLdneW+n3pcqGTLMZ3TLpDURkNd0cwZcd
AGoZtUZZuQ/xvzE9IrEj+KYeINnZzPQpyJu9jXXp1CsQJ8u/wUG9e0kcJrUI1dZ/
q26zhPaVhJ9x0go/BNOxzTUpOtuMWXttAVZGICthp74I5l+DgfSZ4J4CeQmFMMRs
kbiTg5h4qsrD4jnEU/BCscpCLoz4qlF6+q/+lspwkg7OwvhHc0qSMJn3UjZ/eYzb
ncDp6gc4Iju3RhWnVEZphA4ttbZJX7ahER4y0NdIG65Fa37AUEUxy0OmGfoyFOmC
KVtlHkl2hKoCB5+ujLzL7WimH33ogtVhtOJTZlXzS0lsdSfSxUWU4/OZ22jTy0hL
Bnx6uoNIPWsr6b5OJiA0
=jTyq
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Fixes for filename decryption and encrypted view plus a cleanup
- The filename decryption routines were, at times, writing a zero
byte one character past the end of the filename buffer
- The encrypted view feature attempted, and failed, to roll its own
form of enforcing a read-only mount instead of letting the VFS
enforce it"
* tag 'ecryptfs-3.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Remove buggy and unnecessary write in file name decode routine
eCryptfs: Remove unnecessary casts when parsing packet lengths
eCryptfs: Force RO mount when encrypted view is enabled
Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
end of the allocated buffer during encrypted filename decoding. This
fix corrects the issue by getting rid of the unnecessary 0 write when
the current bit offset is 2.
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Reported-by: Dmitry Chernenkov <dmitryc@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org # v2.6.29+: 51ca58d eCryptfs: Filename Encryption: Encoding and encryption functions
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
The elements in the data array are already unsigned chars and do not
need to be casted.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Add a simple read-only counter to super_block that indicates how deep this
is in the stack of filesystems. Previously ecryptfs was the only stackable
filesystem and it explicitly disallowed multiple layers of itself.
Overlayfs, however, can be stacked recursively and also may be stacked
on top of ecryptfs or vice versa.
To limit the kernel stack usage we must limit the depth of the
filesystem stack. Initially the limit is set to 2.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
The ecryptfs_encrypted_view mount option greatly changes the
functionality of an eCryptfs mount. Instead of encrypting and decrypting
lower files, it provides a unified view of the encrypted files in the
lower filesystem. The presence of the ecryptfs_encrypted_view mount
option is intended to force a read-only mount and modifying files is not
supported when the feature is in use. See the following commit for more
information:
e77a56d [PATCH] eCryptfs: Encrypted passthrough
This patch forces the mount to be read-only when the
ecryptfs_encrypted_view mount option is specified by setting the
MS_RDONLY flag on the superblock. Additionally, this patch removes some
broken logic in ecryptfs_open() that attempted to prevent modifications
of files when the encrypted view feature was in use. The check in
ecryptfs_open() was not sufficient to prevent file modifications using
system calls that do not operate on a file descriptor.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Priya Bansal <p.bansal@samsung.com>
Cc: stable@vger.kernel.org # v2.6.21+: e77a56d [PATCH] eCryptfs: Encrypted passthrough
Pull vfs updates from Al Viro:
"The big thing in this pile is Eric's unmount-on-rmdir series; we
finally have everything we need for that. The final piece of prereqs
is delayed mntput() - now filesystem shutdown always happens on
shallow stack.
Other than that, we have several new primitives for iov_iter (Matt
Wilcox, culled from his XIP-related series) pushing the conversion to
->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
cleanups and fixes (including the external name refcounting, which
gives consistent behaviour of d_move() wrt procfs symlinks for long
and short names alike) and assorted cleanups and fixes all over the
place.
This is just the first pile; there's a lot of stuff from various
people that ought to go in this window. Starting with
unionmount/overlayfs mess... ;-/"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
fs/file_table.c: Update alloc_file() comment
vfs: Deduplicate code shared by xattr system calls operating on paths
reiserfs: remove pointless forward declaration of struct nameidata
don't need that forward declaration of struct nameidata in dcache.h anymore
take dname_external() into fs/dcache.c
let path_init() failures treated the same way as subsequent link_path_walk()
fix misuses of f_count() in ppp and netlink
ncpfs: use list_for_each_entry() for d_subdirs walk
vfs: move getname() from callers to do_mount()
gfs2_atomic_open(): skip lookups on hashed dentry
[infiniband] remove pointless assignments
gadgetfs: saner API for gadgetfs_create_file()
f_fs: saner API for ffs_sb_create_file()
jfs: don't hash direct inode
[s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
ecryptfs: ->f_op is never NULL
android: ->f_op is never NULL
nouveau: __iomem misannotations
missing annotation in fs/file.c
fs: namespace: suppress 'may be used uninitialized' warnings
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJUNtXXAAoJENaSAD2qAscKx1UP/jqt4gm1AYFBpkwBhVRUssIQ
wHckk8QPasIdEGeKvyCKXl88sUDLSsJwf/mUpl8pBfKm64LokP2fmUBU9Pkf9hVU
lcuaFNIEmHh8p1IqcfaFbnZOjuuHc9M2ULQLmo5ShoTHNu6JYAP2zRBMfFrEdcMR
vKh+RCARa5jr1CdwTHX+dH5vJQIQXW/qgRK5G5Z6KeBI766jK2BvZxYHivUgLWuC
dV6K4RzvHHJYEVoXjCUhrgGepGwHlDoEgx/Y0GK9vbFPG38IrfSlN6fgvzV6mMYE
ien8FsuHPv5oiBmFM2byRmWpJtWUOViCbMaMmKqY5Ix21E0lUafA7ixH3nSpOHNZ
b29dxmHnEDomCJXLAF0NQUE84yTw6ITLp2FldUR2o+sidnJsDx/hph/KvmsK6d6P
sDfEN/DtzPluZoXKY0jrRtoAhi0citNTgKfrujmx6baBstxRp7AfwGcP4skXJq7w
wkg0Seo449CUaBJK9A4s7nIjMBQ6/3hjvF/NVcry1aY+/RyhJDz6uFrtnOEqW3So
6khwJOOcRkmLLXrk+DzvthDRCVNJhWM80cB5/UBBfVG2Kk3PHa86Jfp5Y4teWugx
zyoacEWiitKcK4Po8skZpLd2vUwny/cD0qS43LT+SMvgRsYZ4XdUnceoY8FkOnLr
ooIFKHCR/+2XVnAaC101
=mjS7
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs updates from Tyler Hicks:
"Minor code cleanups and a fix for when eCryptfs metadata is stored in
xattrs"
* tag 'ecryptfs-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: remove unneeded buggy code in ecryptfs_do_create()
ecryptfs: avoid to access NULL pointer when write metadata in xattr
ecryptfs: remove unnecessary break after goto
ecryptfs: Remove unnecessary include of syscall.h in keystore.c
fs/ecryptfs/messaging.c: remove null test before kfree
ecryptfs: Drop cast
Use %pd in eCryptFS
There is a bug in error handling of lock_parent() in ecryptfs_do_create():
lock_parent() acquries mutex even if dget_parent() fails, so mutex should be unlocked anyway.
But dget_parent() does not fail, so the patch just removes unneeded buggy code.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
There's no reason to include syscalls.h in keystore.c. Remove it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Fix checkpatch warning:
WARNING: kfree(NULL) is safe this check is probably not required
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: ecryptfs@vger.kernel.org
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
This patch does away with cast on void * and the if as it is unnecessary.
The following Coccinelle semantic patch was used for making the change:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
*((T *)e)
|
((T *)x)[...]
|
((T *)x)->f
|
- (T *)
e
)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Pull renameat2 system call from Miklos Szeredi:
"This adds a new syscall, renameat2(), which is the same as renameat()
but with a flags argument.
The purpose of extending rename is to add cross-rename, a symmetric
variant of rename, which exchanges the two files. This allows
interesting things, which were not possible before, for example
atomically replacing a directory tree with a symlink, etc... This
also allows overlayfs and friends to operate on whiteouts atomically.
Andy Lutomirski also suggested a "noreplace" flag, which disables the
overwriting behavior of rename.
These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
implemented for ext4 as an example and for testing"
* 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ext4: add cross rename support
ext4: rename: split out helper functions
ext4: rename: move EMLINK check up
ext4: rename: create ext4_renament structure for local vars
vfs: add cross-rename
vfs: lock_two_nondirectories: allow directory args
security: add flags to rename hooks
vfs: add RENAME_NOREPLACE flag
vfs: add renameat2 syscall
vfs: rename: use common code for dir and non-dir
vfs: rename: move d_move() up
vfs: add d_is_dir()
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page. As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently. At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.
Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate. Reclaim will check
for this flag before installing shadow pages.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add new renameat2 syscall, which is the same as renameat with an added
flags argument.
Pass flags to vfs_rename() and to i_op->rename() as well.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
If ecryptfs_readlink_lower() fails, buf remains an uninitialized
pointer and passing it nd_set_link() won't do anything good.
Fixed by switching ecryptfs_readlink_lower() to saner API - make it
return buf or ERR_PTR(...) and update callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use the new %pd printk() specifier in eCryptFS to replace passing of dentry
name or dentry name and name length * 2 with just passing the dentry.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: ecryptfs@vger.kernel.org
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.
[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When accessing the lower_file pointer located in private_data of
eCryptfs files, there is no need to check to see if the private_data
pointer has been initialized to a non-NULL value. The file->private_data
and file->private_data->lower_file pointers are always initialized to
non-NULL values in ecryptfs_open().
This change quiets a Smatch warning:
CHECK /var/scm/kernel/linux/fs/ecryptfs/file.c
fs/ecryptfs/file.c:321 ecryptfs_unlocked_ioctl() error: potential NULL dereference 'lower_file'.
fs/ecryptfs/file.c:335 ecryptfs_compat_ioctl() error: potential NULL dereference 'lower_file'.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Geyslan G. Bem <geyslan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We need to break delegations on any operation that changes the set of
links pointing to an inode. Start with unlink.
Such operations also hold the i_mutex on a parent directory. Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of a unresponsive NFS client. To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:
acquire locks
...
test for delegation; if found:
take reference on inode
release locks
wait for delegation break
drop reference on inode
retry
It is possible this could never terminate. (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.) But this seems very unlikely.
The initial test for a delegation happens after the lock on the target
inode is acquired, but the directory inode may have been acquired
further up the call stack. We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in the case it needs a delegation synchronously
broken.
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the underlying dentry doesn't have ->d_revalidate(), there's no need to
force dropping out of RCU mode. All we need for that is to make freeing
ecryptfs_dentry_info RCU-delayed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Shifting page->index on 32 bit systems was overflowing, causing
data corruption of > 4GB files. Fix this by casting it first.
https://launchpad.net/bugs/1243636
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Lars Duesing <lars.duesing@camelotsweb.de>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
In 'decrypt_pki_encrypted_session_key' function:
Initializes 'payload' pointer and releases it on exit.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@vger.kernel.org # v2.6.28+
It might be possible for two callers to race the mutex lock after the
NULL ctx check. Instead, move the lock above the check so there isn't
the possibility of leaking a crypto ctx. Additionally, report the full
algo name when failing.
Signed-off-by: Kees Cook <keescook@chromium.org>
[tyhicks: remove out label, which is no longer used]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
It doesn't make sense to check if an array is NULL. The compiler just
removes the check.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
- Remove redundant code by merging some encrypt and decrypt functions
- Get rid of a helper page allocation during page decryption by using in-place
decryption
- Better use of entire pages during page crypto operations
- Several code cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIbBAABCgAGBQJR3Z1JAAoJENaSAD2qAscK+jAP92NR3W17njuPFJBHCROsS481
tyhzy2W/LlNK1njnS6SRP3O3Icv3adiJRtMZePV7bpCH3yD/JoYZ6VHQBNV7msHW
VuEwE6eqCkKl3NLunhwd4+m5R9qijJYGfYEzTve/RNASDU7/LPcTUF8OQ5dIB0wA
J6IMGGZSpsFa8ymN01YzmUEmOUx1IBR2aYBT8Og4Ke117ywDYqxS0ghd1rb953sS
7H8wnNcijs1DLGe71SZnKMVCYwO32GWarxDBAa0KsabcyY4Yr43O3ov/CsIAAA7B
Q1Dn7KNNSOCu9G6fHrnuMOTncGnNPLhIe6Yc0PCZ7ykVstpzlNkKJ628IEonsJaJ
4bYc3bqq4KH7rqMxjA+1GoLehpJWJzqwfiFI1fWLlYMmO2ky126rJUgSNBHQe9+M
iWl+ZrYokSsNWBcUsIq7SJFaLIhWDNcb+Wl7RiTNBBwoBaZclrNuWKIyeWPhH+9/
+/K3LBaggujzVpE743wgJhY60sfdHZmaRAD9agEbcG773JePXBg9OkiUp/hKSe8s
UaGkfmwAlz8u6mR1eJCuFDCqwJKByyT4vObuOFroh7NgOHaQZghlnO4HwuOjzp6U
wTUiMVslFY9WAsEWxdDhaCXxB8IrjHz3YZGIt8PU2eT6ucQU+HLvkkxqtzSYm9/7
BBfryWZKwR7T50JdehI=
=mRpN
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.11-rc1-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs updates from Tyler Hicks:
"Code cleanups and improved buffer handling during page crypto
operations:
- Remove redundant code by merging some encrypt and decrypt functions
- Get rid of a helper page allocation during page decryption by using
in-place decryption
- Better use of entire pages during page crypto operations
- Several code cleanups"
* tag 'ecryptfs-3.11-rc1-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
Use ecryptfs_dentry_to_lower_path in a couple of places
eCryptfs: Make extent and scatterlist crypt function parameters similar
eCryptfs: Collapse crypt_page_offset() into crypt_extent()
eCryptfs: Merge ecryptfs_encrypt_extent() and ecryptfs_decrypt_extent()
eCryptfs: Combine page_offset crypto functions
eCryptfs: Combine encrypt_scatterlist() and decrypt_scatterlist()
eCryptfs: Decrypt pages in-place
eCryptfs: Accept one offset parameter in page offset crypto functions
eCryptfs: Simplify lower file offset calculation
eCryptfs: Read/write entire page during page IO
eCryptfs: Use entire helper page during page crypto operations
eCryptfs: Cocci spatch "memdup.spatch"
There are two places in ecryptfs that benefit from using
ecryptfs_dentry_to_lower_path() instead of separate calls to
ecryptfs_dentry_to_lower() and ecryptfs_dentry_to_lower_mnt(). Both
sites use fewer instructions and less stack (determined by examining
objdump output).
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
iterate_dir(): new helper, replacing vfs_readdir().
struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The 'dest' abbreviation is only used in crypt_scatterlist(), while all
other functions in crypto.c use 'dst' so dest_sg should be renamed to
dst_sg.
The crypt_stat parameter is typically the first parameter in internal
eCryptfs functions so crypt_stat and dst_page should be swapped in
crypt_extent().
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
crypt_page_offset() simply initialized the two scatterlists and called
crypt_scatterlist() so it is simple enough to move into the only
function that calls it.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
They are identical except if the src_page or dst_page index is used, so
they can be merged safely if page_index is conditionally assigned.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Combine ecryptfs_encrypt_page_offset() and
ecryptfs_decrypt_page_offset(). These two functions are functionally
identical so they can be safely merged if the caller can indicate
whether an encryption or decryption operation should occur.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
These two functions are identical except for a debug printk and whether
they call crypto_ablkcipher_encrypt() or crypto_ablkcipher_decrypt(), so
they can be safely merged if the caller can indicate if encryption or
decryption should occur.
The debug printk is useless so it is removed.
Two new #define's are created to indicate if an ENCRYPT or DECRYPT
operation is desired.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
When reading in a page, eCryptfs would allocate a helper page, fill it
with encrypted data from the lower filesytem, and then decrypt the data
from the encrypted page and store the result in the eCryptfs page cache
page.
The crypto API supports in-place crypto operations which means that the
allocation of the helper page is unnecessary when decrypting. This patch
gets rid of the unneeded page allocation by reading encrypted data from
the lower filesystem directly into the page cache page. The page cache
page is then decrypted in-place.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
There is no longer a need to accept different offset values for the
source and destination pages when encrypting/decrypting an extent in an
eCryptfs page. The two offsets can be collapsed into a single parameter.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Now that lower filesystem IO operations occur for complete
PAGE_CACHE_SIZE bytes, the calculation for converting an eCryptfs extent
index into a lower file offset can be simplified.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
When reading and writing encrypted pages, perform IO using the entire
page all at once rather than 4096 bytes at a time.
This only affects architectures where PAGE_CACHE_SIZE is larger than
4096 bytes.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
When encrypting eCryptfs pages and decrypting pages from the lower
filesystem, utilize the entire helper page rather than only the first
4096 bytes.
This only affects architectures where PAGE_CACHE_SIZE is larger than
4096 bytes.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Error out of ecryptfs_fsync() if filemap_write_and_wait() fails.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Paul Taysom <taysom@chromium.org>
Cc: Olof Johansson <olofj@chromium.org>
Cc: stable@vger.kernel.org # v3.6+
When msync is called on a memory mapped file, that
data is not flushed to the disk.
In Linux, msync calls fsync for the file. For ecryptfs,
fsync just calls the lower level file system's fsync.
Changed the ecryptfs fsync code to call filemap_write_and_wait
before calling the lower level fsync.
Addresses the problem described in http://crbug.com/239536
Signed-off-by: Paul Taysom <taysom@chromium.org>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@vger.kernel.org # v3.6+
available by moving to the ablkcipher crypto API. The improvement is more
apparent on faster storage devices. There's no noticeable change when hardware
crypto is not available.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCgAGBQJRjJFjAAoJENaSAD2qAscKuRgQAJkefyLPNBb4plXr3R+zJ1aM
jkcqkUJWamCxlmqHg/n9LmlxpSZiiWSWJM8iiq8zQhPE0PVXVOVhqvzogT1xwv75
xfT9xTuRi1v7UFaSGHtj3WoO2nscQ0pjZW/0CHgd/PZjz9y0iJ/l6ueoWCOz5L2i
3oJvx/W407qM+MSogWKx79i1B2jILdBdQH/7PZ+UJS3jWEo3rMWBbfCwbYhd4pUG
oVc+qFglNs+3HLdHVUmHPCerCL9qYAJIJmDrvupSOQ6DwdFaV8IysTgSEdFtLcfC
8Z6DUOPzXnvA/+y+NCCCUxg1CrkgYkNrefLKAq18atFu63zIZIHZyJBTJ5Q6vXVF
o1H8UcIOg/liGa6lXGf5b4ENNKvB0qMYQgiSrL0/FVVima4zGqUWkQLno4kQl1zx
FHB5imQ7F/EMcow/nTN3YQYC3N/iYIFAIRxf35SiGGsNhO2sEyIqYlRSyq2MNrRl
pLWNbhnRuhBUqcbqZDxq1oZ7624Ui4jnHHx7rl6Y3gfm8Xa1ZmQeY6rOadSZaRd7
+ZqFZi1jiHz1c0tVUO/3DsqABIhbr9Ee03vracN8bVTGV0ZmO0qneugFB9esoeDf
UnrU0Im8ilHu3OHAyc+UkRZuzThL9bLZEivwICQ+JDUJ1zsLYSQZEaGIt8hA0iDy
8Bu3gtfX2WxD4ak/AkFv
=3EhP
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs update from Tyler Hicks:
"Improve performance when AES-NI (and most likely other crypto
accelerators) is available by moving to the ablkcipher crypto API.
The improvement is more apparent on faster storage devices.
There's no noticeable change when hardware crypto is not available"
* tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Use the ablkcipher crypto API
Make the switch from the blkcipher kernel crypto interface to the
ablkcipher interface.
encrypt_scatterlist() and decrypt_scatterlist() now use the ablkcipher
interface but, from the eCryptfs standpoint, still treat the crypto
operation as a synchronous operation. They submit the async request and
then wait until the operation is finished before they return. Most of
the changes are contained inside those two functions.
Despite waiting for the completion of the crypto operation, the
ablkcipher interface provides performance increases in most cases when
used on AES-NI capable hardware.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Reviewed-by: Zeev Zilberman <zeev@annapurnaLabs.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Thieu Le <thieule@google.com>
Cc: Li Wang <dragonylffly@163.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@iki.fi>
Pull more vfs fixes from Al Viro:
"Regression fix from Geert + yet another open-coded kernel_read()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ecryptfs: don't open-code kernel_read()
xtensa simdisk: Fix proc_create_data() conversion fallout
Pull namespace bugfixes from Eric Biederman:
"This is three simple fixes against 3.9-rc1. I have tested each of
these fixes and verified they work correctly.
The userns oops in key_change_session_keyring and the BUG_ON triggered
by proc_ns_follow_link were found by Dave Jones.
I am including the enhancement for mount to only trigger requests of
filesystem modules here instead of delaying this for the 3.10 merge
window because it is both trivial and the kind of change that tends to
bit-rot if left untouched for two months."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: Use nd_jump_link in proc_ns_follow_link
fs: Limit sys_mount to only request filesystem modules (Part 2).
fs: Limit sys_mount to only request filesystem modules.
userns: Stop oopsing in key_change_session_keyring
The code cleanups fix up W=1 compiler warnings and some unnecessary checks. The
new Kconfig option, defaulting to N, allows the rarely used eCryptfs kernel to
userspace communication channel to be compiled out. This may be the first step
in it being eventually removed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCgAGBQJRN+Z7AAoJENaSAD2qAscKr00P/0/sNgen9e5zqe1q+CAj6hW0
ynzWY/ZNk905hU6tmYb/rHwt7DfaSmrZuzypZP9sGbu+q9RITLl65Hm9HGEJuvJA
fK0UHejcAMQmf+AGZiiMs0SB4B4z+eAzUQTWZsX22C1u+3zyI5xLs1NBquKwDyeq
5sNbcmzQYn4w04xag3yYVQEow0NeIjjuCUc8gNUPctDQldN9DdFTdwFTar5lvC0s
V4qPWqa61mS9xtegryWAw4DNKjUIrZZFFupWPqRYDVYK8N+RQRBL1RWGVRFCJ17j
Ho8yi2onPFGt2y/kW6MwsC41wWFk0Mxsfxf/ZaBMm3lpfYM8UbGQJ6+V9wQWOokU
kioUcTI0WvK999mRLxUNkXuVuNDv0OUysgtALy5bevfneWrfXxoSKq+MPbyNfC7+
mo2BCIyHLXn7BYhzPTU+XfksPfMneYUi5LWf4Km5XYXlZ8rwk3IKvJQFyVThEv8+
peVvwSwblUHaoQLnFhEVeu4olHO6AdVQtwr53HPgpMPaZj2/vaWQNA4+bu5HZHTG
wqBmdo4DH4jgd9D8xiMZMIJTik8j9aUmpntc4eR7RJEKSice4+X1fUXL4n4N4NfD
FkYjWCUZI6nkFUGhGDCokCjzZ3GTEzbe+4pNi3ycTnywcOXFSoq2Kx+tNzE4zXBs
FlWGJYrCub9UOLwoYV2C
=XwgS
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
"Minor code cleanups and new Kconfig option to disable /dev/ecryptfs
The code cleanups fix up W=1 compiler warnings and some unnecessary
checks. The new Kconfig option, defaulting to N, allows the rarely
used eCryptfs kernel to userspace communication channel to be compiled
out. This may be the first step in it being eventually removed."
Hmm. I'm not sure whether these should be called "fixes", and it
probably should have gone in the merge window. But I'll let it slide.
* tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: allow userspace messaging to be disabled
eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid()
ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check
eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check
eCryptfs: remove unneeded checks in virt_to_scatterlist()
eCryptfs: Fix -Wmissing-prototypes warnings
eCryptfs: Fix -Wunused-but-set-variable warnings
eCryptfs: initialize payload_len in keystore.c
When the userspace messaging (for the less common case of userspace key
wrap/unwrap via ecryptfsd) is not needed, allow eCryptfs to build with
it removed. This saves on kernel code size and reduces potential attack
surface by removing the /dev/ecryptfs node.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>