Commit Graph

228 Commits

Author SHA1 Message Date
Linus Torvalds
b26b5ef5ec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more misc uaccess and vfs updates from Al Viro:
 "The rest of the stuff from -next (more uaccess work) + assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  score: traps: Add missing include file to fix build error
  fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths
  fs/super.c: fix race between freeze_super() and thaw_super()
  overlayfs: Fix setting IOP_XATTR flag
  iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()
  blackfin: no access_ok() for __copy_{to,from}_user()
  arm64: don't zero in __copy_from_user{,_inatomic}
  arm: don't zero in __copy_from_user_inatomic()/__copy_from_user()
  arc: don't leak bits of kernel stack into coredump
  alpha: get rid of tail-zeroing in __copy_user()
2016-10-14 18:19:05 -07:00
Linus Torvalds
1a892b485f Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This update contains fixes to the "use mounter's permission to access
  underlying layers" area, and miscellaneous other fixes and cleanups.

  No new features this time"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: use vfs_get_link()
  vfs: add vfs_get_link() helper
  ovl: use generic_readlink
  ovl: explain error values when removing acl from workdir
  ovl: Fix info leak in ovl_lookup_temp()
  ovl: during copy up, switch to mounter's creds early
  ovl: lookup: do getxattr with mounter's permission
  ovl: copy_up_xattr(): use strnlen
2016-10-14 17:23:33 -07:00
Vivek Goyal
655042cc14 overlayfs: Fix setting IOP_XATTR flag
ovl_fill_super calls ovl_new_inode to create a root inode for the new
superblock before initializing sb->s_xattr.  This wrongly causes
IOP_XATTR to be cleared in i_opflags of the new inode, causing SELinux
to log the following message:

  SELinux: (dev overlay, type overlay) has no xattr support

Fix this by initializing sb->s_xattr and similar fields before calling
ovl_new_inode.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14 20:00:34 -04:00
Miklos Szeredi
7764235bec ovl: use vfs_get_link()
Resulting in a complete removal of a function basically implementing the
inverse of vfs_readlink().

As a bonus, now the proper security hook is also called.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-14 11:16:47 +02:00
Miklos Szeredi
78a3fa4f32 ovl: use generic_readlink
All filesystems that are backers for overlayfs would also use
generic_readlink().  Move this logic to the overlay itself, which is a nice
cleanup.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-14 11:16:46 +02:00
Miklos Szeredi
cb348edb6b ovl: explain error values when removing acl from workdir
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> 
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-14 11:16:46 +02:00
Linus Torvalds
101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Linus Torvalds
97d2116708 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr updates from Al Viro:
 "xattr stuff from Andreas

  This completes the switch to xattr_handler ->get()/->set() from
  ->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: Remove {get,set,remove}xattr inode operations
  xattr: Stop calling {get,set,remove}xattr inode operations
  vfs: Check for the IOP_XATTR flag in listxattr
  xattr: Add __vfs_{get,set,remove}xattr helpers
  libfs: Use IOP_XATTR flag for empty directory handling
  vfs: Use IOP_XATTR flag for bad-inode handling
  vfs: Add IOP_XATTR inode operations flag
  vfs: Move xattr_resolve_name to the front of fs/xattr.c
  ecryptfs: Switch to generic xattr handlers
  sockfs: Get rid of getxattr iop
  sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
  kernfs: Switch to generic xattr handlers
  hfs: Switch to generic xattr handlers
  jffs2: Remove jffs2_{get,set,remove}xattr macros
  xattr: Remove unnecessary NULL attribute name check
2016-10-10 17:11:50 -07:00
Linus Torvalds
abb5a14fa2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted misc bits and pieces.

  There are several single-topic branches left after this (rename2
  series from Miklos, current_time series from Deepa Dinamani, xattr
  series from Andreas, uaccess stuff from from me) and I'd prefer to
  send those separately"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
  proc: switch auxv to use of __mem_open()
  hpfs: support FIEMAP
  cifs: get rid of unused arguments of CIFSSMBWrite()
  posix_acl: uapi header split
  posix_acl: xattr representation cleanups
  fs/aio.c: eliminate redundant loads in put_aio_ring_file
  fs/internal.h: add const to ns_dentry_operations declaration
  compat: remove compat_printk()
  fs/buffer.c: make __getblk_slow() static
  proc: unsigned file descriptors
  fs/file: more unsigned file descriptors
  fs: compat: remove redundant check of nr_segs
  cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
  cifs: don't use memcpy() to copy struct iov_iter
  get rid of separate multipage fault-in primitives
  fs: Avoid premature clearing of capabilities
  fs: Give dentry to inode_change_ok() instead of inode
  fuse: Propagate dentry down to inode_change_ok()
  ceph: Propagate dentry down to inode_change_ok()
  xfs: Propagate dentry down to inode_change_ok()
  ...
2016-10-10 13:04:49 -07:00
Al Viro
e55f1d1d13 Merge remote-tracking branch 'jk/vfs' into work.misc 2016-10-08 11:06:08 -04:00
Andreas Gruenbacher
fd50ecaddf vfs: Remove {get,set,remove}xattr inode operations
These inode operations are no longer used; remove them.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-07 21:48:36 -04:00
Andreas Gruenbacher
5d6c31910b xattr: Add __vfs_{get,set,remove}xattr helpers
Right now, various places in the kernel check for the existence of
getxattr, setxattr, and removexattr inode operations and directly call
those operations.  Switch to helper functions and test for the IOP_XATTR
flag instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-07 20:10:44 -04:00
Linus Torvalds
a3443cda55 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:

  SELinux/LSM:
   - overlayfs support, necessary for container filesystems

  LSM:
   - finally remove the kernel_module_from_file hook

  Smack:
   - treat signal delivery as an 'append' operation

  TPM:
   - lots of bugfixes & updates

  Audit:
   - new audit data type: LSM_AUDIT_DATA_FILE

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (47 commits)
  Revert "tpm/tpm_crb: implement tpm crb idle state"
  Revert "tmp/tpm_crb: fix Intel PTT hw bug during idle state"
  Revert "tpm/tpm_crb: open code the crb_init into acpi_add"
  Revert "tmp/tpm_crb: implement runtime pm for tpm_crb"
  lsm,audit,selinux: Introduce a new audit data type LSM_AUDIT_DATA_FILE
  tmp/tpm_crb: implement runtime pm for tpm_crb
  tpm/tpm_crb: open code the crb_init into acpi_add
  tmp/tpm_crb: fix Intel PTT hw bug during idle state
  tpm/tpm_crb: implement tpm crb idle state
  tpm: add check for minimum buffer size in tpm_transmit()
  tpm: constify TPM 1.x header structures
  tpm/tpm_crb: fix the over 80 characters checkpatch warring
  tpm/tpm_crb: drop useless cpu_to_le32 when writing to registers
  tpm/tpm_crb: cache cmd_size register value.
  tmp/tpm_crb: drop include to platform_device
  tpm/tpm_tis: remove unused itpm variable
  tpm_crb: fix incorrect values of cmdReady and goIdle bits
  tpm_crb: refine the naming of constants
  tpm_crb: remove wmb()'s
  tpm_crb: fix crb_req_canceled behavior
  ...
2016-10-04 14:48:27 -07:00
Miklos Szeredi
2773bf00ae fs: rename "rename2" i_op to "rename"
Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-27 11:03:58 +02:00
Jan Kara
31051c85b5 fs: Give dentry to inode_change_ok() instead of inode
inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22 10:56:19 +02:00
Richard Weinberger
6a45b3628c ovl: Fix info leak in ovl_lookup_temp()
The function uses the memory address of a struct dentry as unique id.
While the address-based directory entry is only visible to root it is IMHO
still worth fixing since the temporary name does not have to be a kernel
address.  It can be any unique number.  Replace it by an atomic integer
which is allowed to wrap around.

Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v3.18+
Fixes: e9be9d5e76 ("overlay filesystem")
2016-09-21 16:37:07 +02:00
Vivek Goyal
8eac98b8be ovl: during copy up, switch to mounter's creds early
Now, we have the notion that copy up of a file is done with the creds
of mounter of overlay filesystem (as opposed to task). Right now before
we switch creds, we do some vfs_getattr() operations in the context of
task and that itself can fail. We should do that getattr() using the
creds of mounter instead.

So this patch switches to mounter's creds early during copy up process so
that even vfs_getattr() is done with mounter's creds.

Do not call revert_creds() unless we have already called
ovl_override_creds(). [Reported by Arnd Bergmann]

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-19 16:50:59 +02:00
Miklos Szeredi
2b6bc7f48d ovl: lookup: do getxattr with mounter's permission
The getxattr() in ovl_is_opaquedir() was missed when converting all
operations on underlying fs to be done under mounter's permission.

This patch fixes this by moving the ovl_override_creds()/revert_creds() out
from ovl_lookup_real() to ovl_lookup().

Also convert to using vfs_getxattr() instead of directly calling
i_op->getxattr().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-16 14:12:11 +02:00
Miklos Szeredi
8b326c61de ovl: copy_up_xattr(): use strnlen
Be defensive about what underlying fs provides us in the returned xattr
list buffer.  strlen() may overrun the buffer, so use strnlen() and WARN if
the contents are not properly null terminated.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-09-16 14:12:11 +02:00
Miklos Szeredi
c568d68341 locks: fix file locking on overlayfs
This patch allows flock, posix locks, ofd locks and leases to work
correctly on overlayfs.

Instead of using the underlying inode for storing lock context use the
overlay inode.  This allows locks to be persistent across copy-up.

This is done by introducing locks_inode() helper and using it instead of
file_inode() to get the inode in locking code.  For non-overlayfs the two
are equivalent, except for an extra pointer dereference in locks_inode().

Since lock operations are in "struct file_operations" we must also make
sure not to call underlying filesystem's lock operations.  Introcude a
super block flag MS_NOREMOTELOCK to this effect.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
2016-09-16 12:44:20 +02:00
Miklos Szeredi
e1ff3dd1ae ovl: fix workdir creation
Workdir creation fails in latest kernel.

Fix by allowing EOPNOTSUPP as a valid return value from
vfs_removexattr(XATTR_NAME_POSIX_ACL_*).  Upper filesystem may not support
ACL and still be perfectly able to support overlayfs.

Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: c11b9fdd6a ("ovl: remove posix_acl_default from workdir")
Cc: <stable@vger.kernel.org>
2016-09-05 13:55:20 +02:00
Miklos Szeredi
7cb35119d0 ovl: listxattr: use strnlen()
Be defensive about what underlying fs provides us in the returned xattr
list buffer.  If it's not properly null terminated, bail out with a warning
insead of BUG.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-09-01 11:12:00 +02:00
Andreas Gruenbacher
0eb45fc3bb ovl: Switch to generic_getxattr
Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
those same handlers for iop->getxattr as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:12:00 +02:00
Miklos Szeredi
ce31513a91 ovl: copyattr after setting POSIX ACL
Setting POSIX acl may also modify the file mode, so need to copy that up to
the overlay inode.

Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: d837a49bd5 ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:12:00 +02:00
Andreas Gruenbacher
0e585ccc13 ovl: Switch to generic_removexattr
Commit d837a49bd5 ("ovl: fix POSIX ACL setting") switches from
iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
ovl_removexattr to generic_removexattr as well.  As far as permission
checking goes, the same rules should apply in either case.

While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
this is not an iop->setxattr implementation and remove the unused inode
argument.

Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
order of handlers in ovl_xattr_handlers.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: d837a49bd5 ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:12:00 +02:00
Andreas Gruenbacher
0c97be22f9 ovl: Get rid of ovl_xattr_noacl_handlers array
Use an ordinary #ifdef to conditionally include the POSIX ACL handlers
in ovl_xattr_handlers, like the other filesystems do.  Flag the code
that is now only used conditionally with __maybe_unused.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:11:59 +02:00
Andreas Gruenbacher
fe2b759523 ovl: Fix OVL_XATTR_PREFIX
Make sure ovl_own_xattr_handler only matches attribute names starting
with "overlay.", not "overlayXXX".

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: d837a49bd5 ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:11:59 +02:00
Colin Ian King
fd36570a88 ovl: fix spelling mistake: "directries" -> "directories"
Trivial fix to spelling mistake in pr_err message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:11:59 +02:00
Miklos Szeredi
2a3a2a3f35 ovl: don't cache acl on overlay layer
Some operations (setxattr/chmod) can make the cached acl stale.  We either
need to clear overlay's acl cache for the affected inode or prevent acl
caching on the overlay altogether.  Preventing caching has the following
advantages:

 - no double caching, less memory used

 - overlay cache doesn't go stale when fs clears it's own cache

Possible disadvantage is performance loss.  If that becomes a problem
get_acl() can be optimized for overlayfs.

This patch disables caching by pre setting i_*acl to a value that

  - has bit 0 set, so is_uncached_acl() will return true

  - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it

The constant -3 was chosen for this purpose.

Fixes: 39a25b2b37 ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:11:59 +02:00
Miklos Szeredi
5201dc449e ovl: use cached acl on underlying layer
Instead of calling ->get_acl() directly, use get_acl() to get the cached
value.

We will have the acl cached on the underlying inode anyway, because we do
permission checking on the both the overlay and the underlying fs.

So, since we already have double caching, this improves performance without
any cost.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:11:59 +02:00
Miklos Szeredi
eea2fb4851 ovl: proper cleanup of workdir
When mounting overlayfs it needs a clean "work" directory under the
supplied workdir.

Previously the mount code removed this directory if it already existed and
created a new one.  If the removal failed (e.g. directory was not empty)
then it fell back to a read-only mount not using the workdir.

While this has never been reported, it is possible to get a non-empty
"work" dir from a previous mount of overlayfs in case of crash in the
middle of an operation using the work directory.

In this case the left over state should be discarded and the overlay
filesystem will be consistent, guaranteed by the atomicity of operations on
moving to/from the workdir to the upper layer.

This patch implements cleaning out any files left in workdir.  It is
implemented using real recursion for simplicity, but the depth is limited
to 2, because the worst case is that of a directory containing whiteouts
under "work".

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-09-01 11:11:59 +02:00
Miklos Szeredi
c11b9fdd6a ovl: remove posix_acl_default from workdir
Clear out posix acl xattrs on workdir and also reset the mode after
creation so that an inherited sgid bit is cleared.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-09-01 11:11:59 +02:00
Miklos Szeredi
38b256973e ovl: handle umask and posix_acl_default correctly on creation
Setting MS_POSIXACL in sb->s_flags has the side effect of passing mode to
create functions without masking against umask.

Another problem when creating over a whiteout is that the default posix acl
is not inherited from the parent dir (because the real parent dir at the
time of creation is the work directory).

Fix these problems by:

 a) If upper fs does not have MS_POSIXACL, then mask mode with umask.

 b) If creating over a whiteout, call posix_acl_create() to get the
 inherited acls.  After creation (but before moving to the final
 destination) set these acls on the created file.  posix_acl_create() also
 updates the file creation mode as appropriate.

Fixes: 39a25b2b37 ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01 11:11:59 +02:00
Vivek Goyal
2602625b7e security, overlayfs: Provide hook to correctly label newly created files
During a new file creation we need to make sure new file is created with the
right label. New file is created in upper/ so effectively file should get
label as if task had created file in upper/.

We switched to mounter's creds for actual file creation. Also if there is a
whiteout present, then file will be created in work/ dir first and then
renamed in upper. In none of the cases file will be labeled as we want it to
be.

This patch introduces a new hook dentry_create_files_as(), which determines
the label/context dentry will get if it had been created by task in upper
and modify passed set of creds appropriately. Caller makes use of these new
creds for file creation.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: fix whitespace issues found with checkpatch.pl]
[PM: changes to use stat->mode in ovl_create_or_link()]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-08-08 20:46:46 -04:00
Vivek Goyal
121ab822ef security,overlayfs: Provide security hook for copy up of xattrs for overlay file
Provide a security hook which is called when xattrs of a file are being
copied up. This hook is called once for each xattr and LSM can return
0 if the security module wants the xattr to be copied up, 1 if the
security module wants the xattr to be discarded on the copy, -EOPNOTSUPP
if the security module does not handle/manage the xattr, or a -errno
upon an error.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: whitespace cleanup for checkpatch.pl]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-08-08 20:42:13 -04:00
Vivek Goyal
d8ad8b4961 security, overlayfs: provide copy up security hook for unioned files
Provide a security hook to label new file correctly when a file is copied
up from lower layer to upper layer of a overlay/union mount.

This hook can prepare a new set of creds which are suitable for new file
creation during copy up. Caller will use new creds to create file and then
revert back to old creds and release new creds.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: whitespace cleanup to appease checkpatch.pl]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-08-08 20:06:53 -04:00
Miklos Szeredi
0956254a2d ovl: don't copy up opaqueness
When a copy up of a directory occurs which has the opaque xattr set, the
xattr remains in the upper directory. The immediate behavior with overlayfs
is that the upper directory is not treated as opaque, however after a
remount the opaque flag is used and upper directory is treated as opaque.
This causes files created in the lower layer to be hidden when using
multiple lower directories.

Fix by not copying up the opaque flag.

To reproduce:

 ----8<---------8<---------8<---------8<---------8<---------8<----
mkdir -p l/d/s u v w mnt
mount -t overlay overlay -olowerdir=l,upperdir=u,workdir=w mnt
rm -rf mnt/d/
mkdir -p mnt/d/n
umount mnt
mount -t overlay overlay -olowerdir=u:l,upperdir=v,workdir=w mnt
touch mnt/d/foo
umount mnt
mount -t overlay overlay -olowerdir=u:l,upperdir=v,workdir=w mnt
ls mnt/d
 ----8<---------8<---------8<---------8<---------8<---------8<----
 
output should be:  "foo  n"

Reported-by: Derek McGowan <dmcg@drizz.net>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=151291
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-08-08 15:08:49 +02:00
Miklos Szeredi
30c17ebfb2 ovl: simplify empty checking
The empty checking logic is duplicated in ovl_check_empty_and_clear() and
ovl_remove_and_whiteout(), except the condition for clearing whiteouts is
different:

ovl_check_empty_and_clear() checked for being upper

ovl_remove_and_whiteout() checked for merge OR lower

Move the intersection of those checks (upper AND merge) into
ovl_check_empty_and_clear() and simplify ovl_remove_and_whiteout().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:25 +02:00
Al Viro
29c42e80ba qstr: constify instances in overlayfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Miklos Szeredi
dbc816d05d ovl: clear nlink on rmdir
To make delete notification work on fa/inotify.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Miklos Szeredi
76bc8e2843 ovl: disallow overlayfs as upperdir
This does not work and does not make sense.  So instead of fixing it
(probably not hard) just disallow.

Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-07-29 12:05:24 +02:00
Miklos Szeredi
656189d207 ovl: fix warning
There's a superfluous newline in the warning message in ovl_d_real().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Wei Yongjun
5f215013a9 ovl: remove duplicated include from super.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Vivek Goyal
500cac3cce ovl: append MAY_READ when diluting write checks
Right now we remove MAY_WRITE/MAY_APPEND bits from mask if realfile is on
lower/. This is done as files on lower will never be written and will be
copied up. But to copy up a file, mounter should have MAY_READ permission
otherwise copy up will fail. So set MAY_READ in mask when MAY_WRITE is
reset.

Dan Walsh noticed this when he did access(lowerfile, W_OK) and it returned
True (context mounts) but when he tried to actually write to file, it
failed as mounter did not have permission on lower file.

[SzM] don't set MAY_READ if only MAY_APPEND is set without MAY_WRITE; this
won't trigger a copy-up.

Reported-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Vivek Goyal
e29841a0ab ovl: dilute permission checks on lower only if not special file
Right now if file is on lower/, we remove MAY_WRITE/MAY_APPEND bits from
mask as lower/ will never be written and file will be copied up. But this
is not true for special files. These files are not copied up and are opened
in place. So don't dilute the checks for these types of files.

Reported-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Miklos Szeredi
d837a49bd5 ovl: fix POSIX ACL setting
Setting POSIX ACL needs special handling:

1) Some permission checks are done by ->setxattr() which now uses mounter's
creds ("ovl: do operations on underlying file system in mounter's
context").  These permission checks need to be done with current cred as
well.

2) Setting ACL can fail for various reasons.  We do not need to copy up in
these cases.

In the mean time switch to using generic_setxattr.

[Arnd Bergmann] Fix link error without POSIX ACL. posix_acl_from_xattr()
doesn't have a 'static inline' implementation when CONFIG_FS_POSIX_ACL is
disabled, and I could not come up with an obvious way to do it.

This instead avoids the link error by defining two sets of ACL operations
and letting the compiler drop one of the two at compile time depending
on CONFIG_FS_POSIX_ACL. This avoids all references to the ACL code,
also leading to smaller code.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Miklos Szeredi
51f7e52dc9 ovl: share inode for hard link
Inode attributes are copied up to overlay inode (uid, gid, mode, atime,
mtime, ctime) so generic code using these fields works correcty.  If a hard
link is created in overlayfs separate inodes are allocated for each link.
If chmod/chown/etc. is performed on one of the links then the inode
belonging to the other ones won't be updated.

This patch attempts to fix this by sharing inodes for hard links.

Use inode hash (with real inode pointer as a key) to make sure overlay
inodes are shared for hard links on upper.  Hard links on lower are still
split (which is not user observable until the copy-up happens, see
Documentation/filesystems/overlayfs.txt under "Non-standard behavior").

The inode is only inserted in the hash if it is non-directoy and upper.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Miklos Szeredi
39b681f802 ovl: store real inode pointer in ->i_private
To get from overlay inode to real inode we currently use 'struct
ovl_entry', which has lifetime connected to overlay dentry.  This is okay,
since each overlay dentry had a new overlay inode allocated.

Following patch will break that assumption, so need to leave out ovl_entry.
This patch stores the real inode directly in i_private, with the lowest bit
used to indicate whether the inode is upper or lower.

Lifetime rules remain, using ovl_inode_real() must only be done while
caller holds ref on overlay dentry (and hence on real dentry), or within
RCU protected regions.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:24 +02:00
Miklos Szeredi
a999d7e161 ovl: permission: return ECHILD instead of ENOENT
The error is due to RCU and is temporary.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Miklos Szeredi
d719e8f268 ovl: update atime on upper
Fix atime update logic in overlayfs.

This patch adds an i_op->update_time() handler to overlayfs inodes.  This
forwards atime updates to the upper layer only.  No atime updates are done
on lower layers.

Remove implicit atime updates to underlying files and directories with
O_NOATIME.  Remove explicit atime update in ovl_readlink().

Clear atime related mnt flags from cloned upper mount.  This means atime
updates are controlled purely by overlayfs mount options.

Reported-by: Konstantin Khlebnikov <koct9i@gmail.com> 
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Miklos Szeredi
bb0d2b8ad2 ovl: fix sgid on directory
When creating directory in workdir, the group/sgid inheritance from the
parent dir was omitted completely.  Fix this by calling inode_init_owner()
on overlay inode and using the resulting uid/gid/mode to create the file.

Unfortunately the sgid bit can be stripped off due to umask, so need to
reset the mode in this case in workdir before moving the directory in
place.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Miklos Szeredi
9c630ebefe ovl: simplify permission checking
The fact that we always do permission checking on the overlay inode and
clear MAY_WRITE for checking access to the lower inode allows cruft to be
removed from ovl_permission().

1) "default_permissions" option effectively did generic_permission() on the
overlay inode with i_mode, i_uid and i_gid updated from underlying
filesystem.  This is what we do by default now.  It did the update using
vfs_getattr() but that's only needed if the underlying filesystem can
change (which is not allowed).  We may later introduce a "paranoia_mode"
that verifies that mode/uid/gid are not changed.

2) splitting out the IS_RDONLY() check from inode_permission() also becomes
unnecessary once we remove the MAY_WRITE from the lower inode check.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Vivek Goyal
754f8cb72b ovl: do not require mounter to have MAY_WRITE on lower
Now we have two levels of checks in ovl_permission(). overlay inode
is checked with the creds of task while underlying inode is checked
with the creds of mounter.

Looks like mounter does not have to have WRITE access to files on lower/.
So remove the MAY_WRITE from access mask for checks on underlying
lower inode.

This means task should still have the MAY_WRITE permission on lower
inode and mounter is not required to have MAY_WRITE.

It also solves the problem of read only NFS mounts being used as lower.
If __inode_permission(lower_inode, MAY_WRITE) is called on read only
NFS, it fails. By resetting MAY_WRITE, check succeeds and case of
read only NFS shold work with overlay without having to specify any
special mount options (default permission).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Vivek Goyal
1175b6b8d9 ovl: do operations on underlying file system in mounter's context
Given we are now doing checks both on overlay inode as well underlying
inode, we should be able to do checks and operations on underlying file
system using mounter's context.

So modify all operations to do checks/operations on underlying dentry/inode
in the context of mounter.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Vivek Goyal
c0ca3d70e8 ovl: modify ovl_permission() to do checks on two inodes
Right now ovl_permission() calls __inode_permission(realinode), to do
permission checks on real inode and no checks are done on overlay inode.

Modify it to do checks both on overlay inode as well as underlying inode.
Checks on overlay inode will be done with the creds of calling task while
checks on underlying inode will be done with the creds of mounter.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Vivek Goyal
39a25b2b37 ovl: define ->get_acl() for overlay inodes
Now we are planning to do DAC permission checks on overlay inode
itself. And to make it work, we will need to make sure we can get acls from
underlying inode. So define ->get_acl() for overlay inodes and this in turn
calls into underlying filesystem to get acls, if any.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Vivek Goyal
72e4848181 ovl: move some common code in a function
ovl_create_upper() and ovl_create_over_whiteout() seem to be sharing some
common code which can be moved into a separate function.  No functionality
change.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:23 +02:00
Andreas Gruenbacher
58ed4e70f2 ovl: store ovl_entry in inode->i_private for all inodes
Previously this was only done for directory inodes.  Doing so for all
inodes makes for a nice cleanup in ovl_permission at zero cost.

Inodes are not shared for hard links on the overlay, so this works fine.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:22 +02:00
Miklos Szeredi
eead4f2dc4 ovl: use generic_delete_inode
No point in keeping overlay inodes around since they will never be reused.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-29 12:05:22 +02:00
Miklos Szeredi
c1b2cc1a76 ovl: check mounter creds on underlying lookup
The hash salting changes meant that we can no longer reuse the hash in the
overlay dentry to look up the underlying dentry.

Instead of lookup_hash(), use lookup_one_len_unlocked() and swith to
mounter's creds (like we do for all other operations later in the series).

Now the lookup_hash() export introduced in 4.6 by 3c9fe8cdff ("vfs: add
lookup_hash() helper") is unused and can possibly be removed; its
usefulness negated by the hash salting and the idea that mounter's creds
should be used on operations on underlying filesystems.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8387ff2577 ("vfs: make the string hashes salt the hash")
2016-07-29 12:05:22 +02:00
Miklos Szeredi
1b91dbdd29 Merge branch 'd_real' into overlayfs-next 2016-07-27 11:36:03 +02:00
Maxim Patlasov
cfc9fde0b0 ovl: verify upper dentry in ovl_remove_and_whiteout()
The upper dentry may become stale before we call ovl_lock_rename_workdir.
For example, someone could (mistakenly or maliciously) manually unlink(2)
it directly from upperdir.

To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir
and and check if it matches the upper dentry.

Essentially, it is the same problem and similar solution as in
commit 11f3710417 ("ovl: verify upper dentry before unlink and rename").

Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-07-22 10:54:20 +02:00
Vivek Goyal
07a2daab49 ovl: Copy up underlying inode's ->i_mode to overlay inode
Right now when a new overlay inode is created, we initialize overlay
inode's ->i_mode from underlying inode ->i_mode but we retain only
file type bits (S_IFMT) and discard permission bits.

This patch changes it and retains permission bits too. This should allow
overlay to do permission checks on overlay inode itself in task context.

[SzM] It also fixes clearing suid/sgid bits on write.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
2016-07-04 16:49:48 +02:00
Miklos Szeredi
b99c2d9138 ovl: handle ATTR_KILL*
Before 4bacc9c923 ("overlayfs: Make f_path...") file->f_path pointed to
the underlying file, hence suid/sgid removal on write worked fine.

After that patch file->f_path pointed to the overlay file, and the file
mode bits weren't copied to overlay_inode->i_mode.  So the suid/sgid
removal simply stopped working.

The fix is to copy the mode bits, but then ovl_setattr() needs to clear
ATTR_MODE to avoid the BUG() in notify_change().  So do this first, then in
the next patch copy the mode.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
2016-07-04 16:49:48 +02:00
Vivek Goyal
e7c0b5991d ovl: warn instead of error if d_type is not supported
overlay needs underlying fs to support d_type. Recently I put in a
patch in to detect this condition and started failing mount if
underlying fs did not support d_type.

But this breaks existing configurations over kernel upgrade. Those who
are running docker (partially broken configuration) with xfs not
supporting d_type, are surprised that after kernel upgrade docker does
not run anymore.

https://github.com/docker/docker/issues/22937#issuecomment-229881315

So instead of erroring out, detect broken configuration and warn
about it. This should allow existing docker setups to continue
working after kernel upgrade.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 45aebeaf4f ("ovl: Ensure upper filesystem supports d_type")
Cc: <stable@vger.kernel.org> 4.6
2016-07-03 09:39:31 +02:00
Miklos Szeredi
2d902671ce vfs: merge .d_select_inode() into .d_real()
The two methods essentially do the same: find the real dentry/inode
belonging to an overlay dentry.  The difference is in the usage:

vfs_open() uses ->d_select_inode() and expects the function to perform
copy-up if necessary based on the open flags argument.

file_dentry() uses ->d_real() passing in the overlay dentry as well as the
underlying inode.

vfs_rename() uses ->d_select_inode() but passes zero flags.  ->d_real()
with a zero inode would have worked just as well here.

This patch merges the functionality of ->d_select_inode() into ->d_real()
by adding an 'open_flags' argument to the latter.

[Al Viro] Make the signature of d_real() match that of ->d_real() again.
And constify the inode argument, while we are at it.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-06-30 08:53:27 +02:00
Miklos Szeredi
03bea60409 ovl: get_write_access() in truncate
When truncating a file we should check write access on the underlying
inode.  And we should do so on the lower file as well (before copy-up) for
consistency.

Original patch and test case by Aihua Zhang.

 - - >o >o - - test.c - - >o >o - -
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int ret;

	ret = truncate(argv[0], 4096);
	if (ret != -1) {
		fprintf(stderr, "truncate(argv[0]) should have failed\n");
		return 1;
	}
	if (errno != ETXTBSY) {
		perror("truncate(argv[0])");
		return 1;
	}

	return 0;
}
 - - >o >o - - >o >o - - >o >o - -

Reported-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-06-29 16:03:55 +02:00
Miklos Szeredi
a4859d7594 ovl: fix dentry leak for default_permissions
When using the 'default_permissions' mount option, ovl_permission() on
non-directories was missing a dput(alias), resulting in "BUG Dentry still
in use".

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8d3095f4ad ("ovl: default permissions")
Cc: <stable@vger.kernel.org> # v4.5+
2016-06-29 08:26:59 +02:00
Miklos Szeredi
d0e13f5bbe ovl: fix uid/gid when creating over whiteout
Fix a regression when creating a file over a whiteout.  The new
file/directory needs to use the current fsuid/fsgid, not the ones from the
mounter's credentials.

The refcounting is a bit tricky: prepare_creds() sets an original refcount,
override_creds() gets one more, which revert_cred() drops.  So

  1) we need to expicitly put the mounter's credentials when overriding
     with the updated one

  2) we need to put the original ref to the updated creds (and this can
     safely be done before revert_creds(), since we'll still have the ref
     from override_creds()).

Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Fixes: 3fe6e52f06 ("ovl: override creds with the ones from the superblock mounter")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-06-15 14:18:59 +02:00
Miklos Szeredi
b581755b1c ovl: xattr filter fix
a) ovl_need_xattr_filter() is wrong, we can have multiple lower layers
overlaid, all of which (except the lowest one) honouring the
"trusted.overlay.opaque" xattr.  So need to filter everything except the
bottom and the pure-upper layer.

b) we no longer can assume that inode is attached to dentry in
get/setxattr.

This patch unconditionally filters private xattrs to fix both of the above.
Performance impact for get/removexattrs is likely in the noise.

For listxattrs it might be measurable in pathological cases, but I very
much hope nobody cares.  If they do, we'll fix it then.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: b96809173e ("security_d_instantiate(): move to the point prior to attaching dentry to inode")
2016-06-06 16:21:37 +02:00
Linus Torvalds
d102a56edb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Followups to the parallel lookup work:

   - update docs

   - restore killability of the places that used to take ->i_mutex
     killably now that we have down_write_killable() merged

   - Additionally, it turns out that I missed a prerequisite for
     security_d_instantiate() stuff - ->getxattr() wasn't the only thing
     that could be called before dentry is attached to inode; with smack
     we needed the same treatment applied to ->setxattr() as well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch ->setxattr() to passing dentry and inode separately
  switch xattr_handler->set() to passing dentry and inode separately
  restore killability of old mutex_lock_killable(&inode->i_mutex) users
  add down_write_killable_nested()
  update D/f/directory-locking
2016-05-27 17:14:05 -07:00
Al Viro
3767e255b3 switch ->setxattr() to passing dentry and inode separately
smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
we'd hashed the new dentry and attached it to inode, we need ->setxattr()
instances getting the inode as an explicit argument rather than obtaining
it from dentry.

Similar change for ->getxattr() had been done in commit ce23e64.  Unlike
->getxattr() (which is used by both selinux and smack instances of
->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
it got missed back then.

Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-27 20:09:16 -04:00
Linus Torvalds
0121a32201 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
 "The meat of this is a change to use the mounter's credentials for
  operations that require elevated privileges (such as whiteout
  creation).  This fixes behavior under user namespaces as well as being
  a nice cleanup"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: Do d_type check only if work dir creation was successful
  ovl: update documentation
  ovl: override creds with the ones from the superblock mounter
2016-05-27 16:44:39 -07:00
Vivek Goyal
21765194ce ovl: Do d_type check only if work dir creation was successful
d_type check requires successful creation of workdir as iterates
through work dir and expects work dir to be present in it. If that's
not the case, this check will always return d_type not supported even
if underlying filesystem might be supporting it.

So don't do this check if work dir creation failed in previous step.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-05-27 10:18:56 +02:00
Antonio Murdaca
3fe6e52f06 ovl: override creds with the ones from the superblock mounter
In user namespace the whiteout creation fails with -EPERM because the
current process isn't capable(CAP_SYS_ADMIN) when setting xattr.

A simple reproducer:

$ mkdir upper lower work merged lower/dir
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
$ unshare -m -p -f -U -r bash

Now as root in the user namespace:

\# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir
\# rm -fR merged/*

This ends up failing with -EPERM after the files in dir has been
correctly deleted:

unlinkat(4, "2", 0)                     = 0
unlinkat(4, "1", 0)                     = 0
unlinkat(4, "3", 0)                     = 0
close(4)                                = 0
unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not
permitted)

Interestingly, if you don't place files in merged/dir you can remove it,
meaning if upper/dir does not exist, creating the char device file works
properly in that same location.

This patch uses ovl_sb_creator_cred() to get the cred struct from the
superblock mounter and override the old cred with these new ones so that
the whiteout creation is possible because overlay is wrong in assuming that
the creds it will get with prepare_creds will be in the initial user
namespace.  The old cap_raise game is removed in favor of just overriding
the old cred struct.

This patch also drops from ovl_copy_up_one() the following two lines:

override_cred->fsuid = stat->uid;
override_cred->fsgid = stat->gid;

This is because the correct uid and gid are taken directly with the stat
struct and correctly set with ovl_set_attr().

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-05-27 08:55:26 +02:00
Al Viro
002354112f restore killability of old mutex_lock_killable(&inode->i_mutex) users
The ones that are taking it exclusive, that is...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-26 00:13:25 -04:00
Al Viro
0e0162bb8c Merge branch 'ovl-fixes' into for-linus
Backmerge to resolve a conflict in ovl_lookup_real();
"ovl_lookup_real(): use lookup_one_len_unlocked()" instead,
but it was too late in the cycle to rebase.
2016-05-17 02:17:59 -04:00
Miklos Szeredi
38b78a5f18 ovl: ignore permissions on underlying lookup
Generally permission checking is not necessary when overlayfs looks up a
dentry on one of the underlying layers, since search permission on base
directory was already checked in ovl_permission().

More specifically using lookup_one_len() causes a problem when the lower
directory lacks search permission for a specific user while the upper
directory does have search permission.  Since lookups are cached, this
causes inconsistency in behavior: success depends on who did the first
lookup.

So instead use lookup_hash() which doesn't do the permission check.

Reported-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-05-10 23:58:18 -04:00
Al Viro
9902af79c0 parallel lookups: actual switch to rwsem
ta-da!

The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.

lockdep side also might need more work

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:28 -04:00
Al Viro
b9e1d435fd ovl_lookup_real(): use lookup_one_len_unlocked()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:47:24 -04:00
Al Viro
84695ffee7 Merge getxattr prototype change into work.lookups
The rest of work.xattr stuff isn't needed for this branch
2016-05-02 19:45:47 -04:00
Al Viro
ce23e64013 ->getxattr(): pass dentry and inode as separate arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-11 00:48:00 -04:00
Miklos Szeredi
d101a12595 fs: add file_dentry()
This series fixes bugs in nfs and ext4 due to 4bacc9c923 ("overlayfs:
Make f_path always point to the overlay and f_inode to the underlay").

Regular files opened on overlayfs will result in the file being opened on
the underlying filesystem, while f_path points to the overlayfs
mount/dentry.

This confuses filesystems which get the dentry from struct file and assume
it's theirs.

Add a new helper, file_dentry() [*], to get the filesystem's own dentry
from the file.  This checks file->f_path.dentry->d_flags against
DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not
set (this is the common, non-overlayfs case).

In the uncommon case it will call into overlayfs's ->d_real() to get the
underlying dentry, matching file_inode(file).

The reason we need to check against the inode is that if the file is copied
up while being open, d_real() would return the upper dentry, while the open
file comes from the lower dentry.

[*] If possible, it's better simply to use file_inode() instead.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: <stable@vger.kernel.org> # v4.2
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Axtens <dja@axtens.net>
2016-03-26 16:14:37 -04:00
Miklos Szeredi
6986c012fa ovl: cleanup unused var in rename2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:46 +01:00
Miklos Szeredi
56656e960b ovl: rename is_merge to is_lowest
The 'is_merge' is an historical naming from when only a single lower layer
could exist.  With the introduction of multiple lower layers the meaning of
this flag was changed to mean only the "lowest layer" (while all lower
layers were being merged).

So now 'is_merge' is inaccurate and hence renaming to 'is_lowest'

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:46 +01:00
Sohom Bhattacharjee
f134f24465 ovl: fixed coding style warning
This patch fixes a newline warning found by the checkpatch.pl tool

Signed-off-by: Sohom-Bhattacharjee <soham.bhattacharjee15@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:45 +01:00
Vivek Goyal
45aebeaf4f ovl: Ensure upper filesystem supports d_type
In some instances xfs has been created with ftype=0 and there if a file
on lower fs is removed, overlay leaves a whiteout in upper fs but that
whiteout does not get filtered out and is visible to overlayfs users.

And reason it does not get filtered out because upper filesystem does
not report file type of whiteout as DT_CHR during iterate_dir().

So it seems to be a requirement that upper filesystem support d_type for
overlayfs to work properly. Do this check during mount and fail if d_type
is not supported.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:45 +01:00
David Howells
fb5bb2c3b7 ovl: Warn on copy up if a process has a R/O fd open to the lower file
Print a warning when overlayfs copies up a file if the process that
triggered the copy up has a R/O fd open to the lower file being copied up.

This can help catch applications that do things like the following:

	fd1 = open("foo", O_RDONLY);
	fd2 = open("foo", O_RDWR);

where they expect fd1 and fd2 to refer to the same file - which will no
longer be the case post-copy up.

With this patch, the following commands:

	bash 5</mnt/a/foo128
	6<>/mnt/a/foo128

assuming /mnt/a/foo128 to be an un-copied up file on an overlay will
produce the following warning in the kernel log:

	overlayfs: Copying up foo129, but open R/O on fd 5 which will cease
	to be coherent [pid=3818 bash]

This is enabled by setting:

	/sys/module/overlay/parameters/check_copy_up

to 1.

The warnings are ratelimited and are also limited to one warning per file -
assuming the copy up completes in each case.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:45 +01:00
Konstantin Khlebnikov
07f2af7bfd ovl: honor flag MS_SILENT at mount
This patch hides error about missing lowerdir if MS_SILENT is set.

We use mount(NULL, "/", "overlay", MS_SILENT, NULL) for testing support of
overlayfs: syscall returns -ENODEV if it's not supported. Otherwise kernel
automatically loads module and returns -EINVAL because lowerdir is missing.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:45 +01:00
Miklos Szeredi
11f3710417 ovl: verify upper dentry before unlink and rename
Unlink and rename in overlayfs checked the upper dentry for staleness by
verifying upper->d_parent against upperdir.  However the dentry can go
stale also by being unhashed, for example.

Expand the verification to actually look up the name again (under parent
lock) and check if it matches the upper dentry.  This matches what the VFS
does before passing the dentry to filesytem's unlink/rename methods, which
excludes any inconsistency caused by overlayfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:44 +01:00
Konstantin Khlebnikov
b81de061fa ovl: copy new uid/gid into overlayfs runtime inode
Overlayfs must update uid/gid after chown, otherwise functions
like inode_owner_or_capable() will check user against stale uid.
Catched by xfstests generic/087, it chowns file and calls utimes.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
2016-03-03 17:17:46 +01:00
Konstantin Khlebnikov
45d1173896 ovl: ignore lower entries when checking purity of non-directory entries
After rename file dentry still holds reference to lower dentry from
previous location. This doesn't matter for data access because data comes
from upper dentry. But this stale lower dentry taints dentry at new
location and turns it into non-pure upper. Such file leaves visible
whiteout entry after remove in directory which shouldn't have whiteouts at
all.

Overlayfs already tracks pureness of file location in oe->opaque.  This
patch just uses that for detecting actual path type.

Comment from Vivek Goyal's patch:

Here are the details of the problem. Do following.

$ mkdir upper lower work merged upper/dir/
$ touch lower/test
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=
work merged
$ mv merged/test merged/dir/
$ rm merged/dir/test
$ ls -l merged/dir/
/usr/bin/ls: cannot access merged/dir/test: No such file or directory
total 0
c????????? ? ? ? ?            ? test

Basic problem seems to be that once a file has been unlinked, a whiteout
has been left behind which was not needed and hence it becomes visible.

Whiteout is visible because parent dir is of not type MERGE, hence
od->is_real is set during ovl_dir_open(). And that means ovl_iterate()
passes on iterate handling directly to underlying fs. Underlying fs does
not know/filter whiteouts so it becomes visible to user.

Why did we leave a whiteout to begin with when we should not have.
ovl_do_remove() checks for OVL_TYPE_PURE_UPPER() and does not leave
whiteout if file is pure upper. In this case file is not found to be pure
upper hence whiteout is left.

So why file was not PURE_UPPER in this case? I think because dentry is
still carrying some leftover state which was valid before rename. For
example, od->numlower was set to 1 as it was a lower file. After rename,
this state is not valid anymore as there is no such file in lower.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Viktor Stanchev <me@viktorstanchev.com>
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109611
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
2016-03-03 17:17:45 +01:00
Rui Wang
ce9113bbcb ovl: fix getcwd() failure after unsuccessful rmdir
ovl_remove_upper() should do d_drop() only after it successfully
removes the dir, otherwise a subsequent getcwd() system call will
fail, breaking userspace programs.

This is to fix: https://bugzilla.kernel.org/show_bug.cgi?id=110491

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
Reviewed-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
2016-03-03 17:17:45 +01:00
Konstantin Khlebnikov
b5891cfab0 ovl: fix working on distributed fs as lower layer
This adds missing .d_select_inode into alternative dentry_operations.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Fixes: 7c03b5d45b ("ovl: allow distributed fs as lower layer")
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Reviewed-by: Nikolay Borisov <kernel@kyup.com>
Tested-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org> # 4.2+
2016-03-03 17:17:45 +01:00
Al Viro
5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Linus Torvalds
eae21770b4 Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:
 "I'm pretty much done for -rc1 now:

   - the rest of MM, basically

   - lib/ updates

   - checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit

   - cpu_mask simplifications

   - kexec, rapidio, MAINTAINERS etc, etc.

   - more dma-mapping cleanups/simplifications from hch"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
  MAINTAINERS: add/fix git URLs for various subsystems
  mm: memcontrol: add "sock" to cgroup2 memory.stat
  mm: memcontrol: basic memory statistics in cgroup2 memory controller
  mm: memcontrol: do not uncharge old page in page cache replacement
  Documentation: cgroup: add memory.swap.{current,max} description
  mm: free swap cache aggressively if memcg swap is full
  mm: vmscan: do not scan anon pages if memcg swap limit is hit
  swap.h: move memcg related stuff to the end of the file
  mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
  mm: vmscan: pass memcg to get_scan_count()
  mm: memcontrol: charge swap to cgroup2
  mm: memcontrol: clean up alloc, online, offline, free functions
  mm: memcontrol: flatten struct cg_proto
  mm: memcontrol: rein in the CONFIG space madness
  net: drop tcp_memcontrol.c
  mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
  mm: memcontrol: allow to disable kmem accounting for cgroup2
  mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
  mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
  mm: memcontrol: separate kmem code from legacy tcp accounting code
  ...
2016-01-21 12:32:08 -08:00
Linus Torvalds
e9f57ebcba Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This contains several bug fixes and a new mount option
  'default_permissions' that allows read-only exported NFS
  filesystems to be used as lower layer"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: check dentry positiveness in ovl_cleanup_whiteouts()
  ovl: setattr: check permissions before copy-up
  ovl: root: copy attr
  ovl: move super block magic number to magic.h
  ovl: use a minimal buffer in ovl_copy_xattr
  ovl: allow zero size xattr
  ovl: default permissions
2016-01-21 12:20:46 -08:00
Andrew Morton
e458bcd16f fs/overlayfs/super.c needs pagemap.h
i386 allmodconfig:

  In file included from fs/overlayfs/super.c:10:0:
  fs/overlayfs/super.c: In function 'ovl_fill_super':
  include/linux/fs.h:898:36: error: 'PAGE_CACHE_SIZE' undeclared (first use in this function)
   #define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
                                      ^
  fs/overlayfs/super.c:939:19: note: in expansion of macro 'MAX_LFS_FILESIZE'
    sb->s_maxbytes = MAX_LFS_FILESIZE;
                     ^
  include/linux/fs.h:898:36: note: each undeclared identifier is reported only once for each function it appears in
   #define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
                                      ^
  fs/overlayfs/super.c:939:19: note: in expansion of macro 'MAX_LFS_FILESIZE'
    sb->s_maxbytes = MAX_LFS_FILESIZE;
                     ^

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Al Viro
fceef393a5 switch ->get_link() to delayed_call, kill ->put_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30 13:01:03 -05:00
Konstantin Khlebnikov
84889d4933 ovl: check dentry positiveness in ovl_cleanup_whiteouts()
This patch fixes kernel crash at removing directory which contains
whiteouts from lower layers.

Cache of directory content passed as "list" contains entries from all
layers, including whiteouts from lower layers. So, lookup in upper dir
(moved into work at this stage) will return negative entry. Plus this
cache is filled long before and we can race with external removal.

Example:
 mkdir -p lower0/dir lower1/dir upper work overlay
 touch lower0/dir/a lower0/dir/b
 mknod lower1/dir/a c 0 0
 mount -t overlay none overlay -o lowerdir=lower1:lower0,upperdir=upper,workdir=work
 rm -fr overlay/dir

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org> # 3.18+
2015-12-11 16:30:49 +01:00