Commit Graph

45030 Commits

Author SHA1 Message Date
Trond Myklebust
79566ef018 NFS: Getattr doesn't require data sync semantics
When retrieving stat() information, NFS unfortunately does require us to
sync writes to disk in order to ensure that mtime and ctime are up to
date. However we shouldn't have to ensure that those writes are persisted.

Relaxing that requirement does mean that we may see an mtime/ctime change
if the server reboots and forces us to replay all writes.

The exception to this rule are pNFS clients that are required to send
layoutcommit, however that is dealt with by the call to pnfs_sync_inode()
in _nfs_revalidate_inode().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:06 -04:00
Trond Myklebust
651b0e7029 NFS: Do not aggressively cache file attributes in the case of O_DIRECT
A file that is open for O_DIRECT is by definition not obeying
close-to-open cache consistency semantics, so let's not cache
the attributes too aggressively either.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:06 -04:00
Trond Myklebust
be527494e0 NFS: Remove unused function nfs_revalidate_mapping_protected()
Clean up...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:05 -04:00
Trond Myklebust
f508d46ae4 NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin()
We're now waiting immediately after taking the locks, so waiting
in fsync() and write_begin() is either redundant or potentially
subject to livelock (if not holding the lock).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:05 -04:00
Trond Myklebust
f7b5c340ac NFS: Cleanup nfs_direct_complete()
There is only one caller that sets the "write" argument to true,
so just move the call to nfs_zap_mapping() and get rid of the
now redundant argument.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:04 -04:00
Trond Myklebust
a5864c999d NFS: Do not serialise O_DIRECT reads and writes
Allow dio requests to be scheduled in parallel, but ensuring that they
do not conflict with buffered I/O.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:04 -04:00
Trond Myklebust
18290650b1 NFS: Move buffered I/O locking into nfs_file_write()
Preparation for the patch that de-serialises O_DIRECT reads and
writes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:03 -04:00
Trond Myklebust
89698b24d2 NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:03 -04:00
Trond Myklebust
2f3c7d87a3 NFS: Remove racy size manipulations in O_DIRECT
On success, the RPC callbacks will ensure that we make the appropriate calls
to nfs_writeback_update_inode()

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:02 -04:00
Trond Myklebust
a5314a7492 NFS: Ensure we reset the write verifier 'committed' value on resend.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:02 -04:00
Trond Myklebust
8fc3c38627 NFS: Fix O_DIRECT verifier problems
We should not be interested in looking at the value of the stable field,
since that could take any value.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:01 -04:00
Trond Myklebust
6712007734 pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1
Cleanup...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:01 -04:00
Trond Myklebust
ac46bd374c pNFS: Ensure we layoutcommit before revalidating attributes
If we need to update the cached attributes, then we'd better make
sure that we also layoutcommit first. Otherwise, the server may have stale
attributes.

Prior to this patch, the revalidation code tried to "fix" this problem by
simply disabling attributes that would be affected by the layoutcommit.
That approach breaks nfs_writeback_check_extend(), leading to a file size
corruption.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:08:27 -04:00
Trond Myklebust
2e18d4d822 pNFS: Files and flexfiles always need to commit before layoutcommit
So ensure that we mark the layout for commit once the write is done,
and then ensure that the commit to ds is finished before sending
layoutcommit.

Note that by doing this, we're able to optimise away the commit
for the case of servers that don't need layoutcommit in order to
return updated attributes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:08:01 -04:00
Trond Myklebust
bc28e1c2e3 pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit()
Let's just have one place where we check ff_layout_need_layoutcommit().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 18:52:26 -04:00
Trond Myklebust
c001c87a63 pNFS/flexfiles: Fix layoutcommit after a commit to DS
We should always do a layoutcommit after commit to DS, except if
the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT.

Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 18:52:26 -04:00
Trond Myklebust
73e6c5d854 pNFS/files: Fix layoutcommit after a commit to DS
According to the errata
https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
we should always send layout commit after a commit to DS.

Fixes: bc7d4b8fd0 ("nfs/filelayout: set layoutcommit...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 18:52:25 -04:00
Linus Torvalds
0b295dd5b8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
 "This makes sure userspace filesystems are not broken by the parallel
  lookups and readdir feature"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: serialize dirops by default
2016-07-03 12:02:00 -07:00
Linus Torvalds
236bfd8ed8 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "This contains fixes for a dentry leak, a regression in 4.6 noticed by
  Docker users and missing write access checking in truncate"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: warn instead of error if d_type is not supported
  ovl: get_write_access() in truncate
  ovl: fix dentry leak for default_permissions
2016-07-03 11:57:09 -07:00
Vivek Goyal
e7c0b5991d ovl: warn instead of error if d_type is not supported
overlay needs underlying fs to support d_type. Recently I put in a
patch in to detect this condition and started failing mount if
underlying fs did not support d_type.

But this breaks existing configurations over kernel upgrade. Those who
are running docker (partially broken configuration) with xfs not
supporting d_type, are surprised that after kernel upgrade docker does
not run anymore.

https://github.com/docker/docker/issues/22937#issuecomment-229881315

So instead of erroring out, detect broken configuration and warn
about it. This should allow existing docker setups to continue
working after kernel upgrade.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 45aebeaf4f ("ovl: Ensure upper filesystem supports d_type")
Cc: <stable@vger.kernel.org> 4.6
2016-07-03 09:39:31 +02:00
Linus Torvalds
48c4565ed6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Tmpfs readdir throughput regression fix (this cycle) + some -stable
  fodder all over the place.

  One missing bit is Miklos' tonight locks.c fix - NFS folks had already
  grabbed that one by the time I woke up ;-)"

[ The locks.c fix came through the nfsd tree just moments ago ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  namespace: update event counter when umounting a deleted dentry
  9p: use file_dentry()
  ceph: fix d_obtain_alias() misuses
  lockless next_positive()
  libfs.c: new helper - next_positive()
  dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
2016-07-01 15:20:11 -07:00
Linus Torvalds
2728c57fda One fix for lockd soft lookups in an error path, and one fix for file
leases on overlayfs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXdr1UAAoJECebzXlCjuG+QlsQAJiZTmio6k9tupN5+iKsZNL3
 m919ooj8GYsXxlC0OTFfi09dUi1yeF8MEE3j9egk/+qxEtsdmKdEOIy0RVdcLSfd
 HeGjXgLh79hVcGxgyBP+pdax2XhZ3RVisg8F5gTw2GPo+FPFZfrEuO5h7ctn+t45
 MCQ+4yqYqzEhYnoPyo5XKh5Aj6wBWiaTzg3/jSe6uSuSfuBfyaMaBPq7l7ayGra/
 5El+tu61o/SrJ41N2EayWSj/bOFJE92LIuGOh8NdfANuuP70JhxlwgVSldah3CCQ
 6PymXAVcjw0+gJ00mokzKfTJW5FPfasxckMHaOvcFsSONy4rlmrwwqUr9C2AFzTE
 wQGIzibCDYOSI8uF+//Oe+dh8JWp2TF8rfJcmKLyJMIcCq/Xl6cNx1qrq/oSWvuk
 WKOv1otJrPeT31h/s5iLjr/E/Po1eX+d2ySJdvUHYu/5aZwFgWPnVXwJk0s9bLow
 auZU85tPnuz+tbS2pWEK+el7LMgDBdzraVRogMdH1c+m3+G9pr53EzmpYovkZ2X8
 duVJ2Leslyya347TnJAgEY47Fbeu26JaoeIChGVhKcEyCENlcqAJWaGVECrxvs3y
 p/Y2MYMkO8YrCz5wQXPiLFiG4rAc+jSIn1Q+vRGl2Pkel0y7AgJNNMFtANjMCSIO
 pg6BqUjOyKt8cXVy4UW7
 =K0mr
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux

Pull lockd/locks fixes from Bruce Fields:
 "One fix for lockd soft lookups in an error path, and one fix for file
  leases on overlayfs"

* tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux:
  locks: use file_inode()
  lockd: unregister notifier blocks if the service fails to come up completely
2016-07-01 15:18:49 -07:00
Linus Torvalds
f3683ccd12 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "1/ Two regression fixes since v4.6: one for the byte order of a sysfs
     attribute (bz121161) and another for QEMU 2.6's NVDIMM _DSM (ACPI
     Device Specific Method) implementation that gets tripped up by new
     auto-probing behavior in the NFIT driver.

  2/ A fix tagged for -stable that stops the kernel from
     clobbering/ignoring changes to the configuration of a 'pfn'
     instance ("struct page" driver).  For example changing the
     alignment from 2M to 1G may silently revert to 2M if that value is
     currently stored on media.

  3/ A fix from Eric for an xfstests failure in dax.  It is not
     currently tagged for -stable since it requires an 8-exabyte file
     system to trigger, and there appear to be no user visible side
     effects"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nfit: fix format interface code byte order
  dax: fix offset overflow in dax_io
  acpi, nfit: fix acpi_check_dsm() vs zero functions implemented
  libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
2016-07-01 15:15:03 -07:00
Miklos Szeredi
6343a21208 locks: use file_inode()
(Another one for the f_path debacle.)

ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.

The reason is that generic_add_lease() used filp->f_path.dentry->inode
while all the others use file_inode().  This makes a difference for files
opened on overlayfs since the former will point to the overlay inode the
latter to the underlying inode.

So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode.  When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.

Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-07-01 10:24:18 -04:00
Andrey Ulanov
e06b933e6d namespace: update event counter when umounting a deleted dentry
- m_start() in fs/namespace.c expects that ns->event is incremented each
  time a mount added or removed from ns->list.
- umount_tree() removes items from the list but does not increment event
  counter, expecting that it's done before the function is called.
- There are some codepaths that call umount_tree() without updating
  "event" counter. e.g. from __detach_mounts().
- When this happens m_start may reuse a cached mount structure that no
  longer belongs to ns->list (i.e. use after free which usually leads
  to infinite loop).

This change fixes the above problem by incrementing global event counter
before invoking umount_tree().

Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Ulanov <andreyu@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-30 23:28:30 -04:00
Miklos Szeredi
b403f0e37a 9p: use file_dentry()
v9fs may be used as lower layer of overlayfs and accessing f_path.dentry
can lead to a crash.  In this case it's a NULL pointer dereference in
p9_fid_create().

Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.

Reported-by: Alessio Igor Bogani <alessioigorbogani@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Alessio Igor Bogani <alessioigorbogani@gmail.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-30 23:28:09 -04:00
Scott Mayhew
cb7d224f82 lockd: unregister notifier blocks if the service fails to come up completely
If the lockd service fails to start up then we need to be sure that the
notifier blocks are not registered, otherwise a subsequent start of the
service could cause the same notifier to be registered twice, leading to
soft lockups.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 0751ddf77b "lockd: Register callbacks on the inetaddr_chain..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-30 16:35:07 -04:00
Tahsin Erdogan
7452495555 writeback: inode cgroup wb switch should not call ihold()
Asynchronous wb switching of inodes takes an additional ref count on an
inode to make sure inode remains valid until switchover is completed.

However, anyone calling ihold() must already have a ref count on inode,
but in this case inode->i_count may already be zero:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30
CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: writeback wb_workfn (flush-8:16)
 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000
 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2
 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003
Call Trace:
 [<ffffffff805990af>] dump_stack+0x4d/0x6e
 [<ffffffff80268702>] __warn+0xc2/0xe0
 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20
 [<ffffffff8035b4ab>] ihold+0x2b/0x30
 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180
 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0
 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530
 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0
 [<ffffffff8036a147>] wb_workfn+0xd7/0x280
 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0
 [<ffffffff8027bb09>] process_one_work+0x129/0x300
 [<ffffffff8027be06>] worker_thread+0x126/0x480
 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561
 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300
 [<ffffffff80280ff4>] kthread+0xc4/0xe0
 [<ffffffff80335578>] ? kfree+0xc8/0x100
 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40
 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70
---[ end trace aaefd2fd9f306bc4 ]---

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-30 13:58:41 -06:00
Trond Myklebust
8487c479e2 NFSv4: Allow retry of operations that used a returned delegation stateid
Fix up nfs4_do_handle_exception() so that it can check if the operation
that received the NFS4ERR_BAD_STATEID was using a defunct delegation.
Apply that to the case of SETATTR, which will currently return EIO
in some cases where this happens.

Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-30 15:29:57 -04:00
Trond Myklebust
ca857cc1d4 NFS/pnfs: Do not clobber existing pgio_done_cb in nfs4_proc_read_setup
If a pNFS client sets hdr->pgio_done_cb, then we should not overwrite that
in nfs4_proc_read_setup()

Fixes: 75bf47ebf6 ("pNFS/flexfile: Fix erroneous fall back to...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-30 15:29:57 -04:00
Trond Myklebust
5c6e5b60aa NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS
Chris Worley reports:
 RIP: 0010:[<ffffffffa0245f80>]  [<ffffffffa0245f80>] rpc_new_client+0x2a0/0x2e0 [sunrpc]
 RSP: 0018:ffff880158f6f548  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff880234f8bc00 RCX: 000000000000ea60
 RDX: 0000000000074cc0 RSI: 000000000000ea60 RDI: ffff880234f8bcf0
 RBP: ffff880158f6f588 R08: 000000000001ac80 R09: ffff880237003300
 R10: ffff880201171000 R11: ffffea0000d75200 R12: ffffffffa03afc60
 R13: ffff880230c18800 R14: 0000000000000000 R15: ffff880158f6f680
 FS:  00007f0e32673740(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000008 CR3: 0000000234886000 CR4: 00000000001406e0
 Stack:
  ffffffffa047a680 0000000000000000 ffff880158f6f598 ffff880158f6f680
  ffff880158f6f680 ffff880234d11d00 ffff88023357f800 ffff880158f6f7d0
  ffff880158f6f5b8 ffffffffa024660a ffff880158f6f5b8 ffffffffa02492ec
 Call Trace:
  [<ffffffffa024660a>] rpc_create_xprt+0x1a/0xb0 [sunrpc]
  [<ffffffffa02492ec>] ? xprt_create_transport+0x13c/0x240 [sunrpc]
  [<ffffffffa0246766>] rpc_create+0xc6/0x1a0 [sunrpc]
  [<ffffffffa038e695>] nfs_create_rpc_client+0xf5/0x140 [nfs]
  [<ffffffffa038f31a>] nfs_init_client+0x3a/0xd0 [nfs]
  [<ffffffffa038f22f>] nfs_get_client+0x25f/0x310 [nfs]
  [<ffffffffa025cef8>] ? rpc_ntop+0xe8/0x100 [sunrpc]
  [<ffffffffa047512c>] nfs3_set_ds_client+0xcc/0x100 [nfsv3]
  [<ffffffffa041fa10>] nfs4_pnfs_ds_connect+0x120/0x400 [nfsv4]
  [<ffffffffa03d41c7>] nfs4_ff_layout_prepare_ds+0xe7/0x330 [nfs_layout_flexfiles]
  [<ffffffffa03d1b1b>] ff_layout_pg_init_write+0xcb/0x280 [nfs_layout_flexfiles]
  [<ffffffffa03a14dc>] __nfs_pageio_add_request+0x12c/0x490 [nfs]
  [<ffffffffa03a1fa2>] nfs_pageio_add_request+0xc2/0x2a0 [nfs]
  [<ffffffffa03a0365>] ? nfs_pageio_init+0x75/0x120 [nfs]
  [<ffffffffa03a5b50>] nfs_do_writepage+0x120/0x270 [nfs]
  [<ffffffffa03a5d31>] nfs_writepage_locked+0x61/0xc0 [nfs]
  [<ffffffff813d4115>] ? __percpu_counter_add+0x55/0x70
  [<ffffffffa03a6a9f>] nfs_wb_single_page+0xef/0x1c0 [nfs]
  [<ffffffff811ca4a3>] ? __dec_zone_page_state+0x33/0x40
  [<ffffffffa0395b21>] nfs_launder_page+0x41/0x90 [nfs]
  [<ffffffff811baba0>] invalidate_inode_pages2_range+0x340/0x3a0
  [<ffffffff811bac17>] invalidate_inode_pages2+0x17/0x20
  [<ffffffffa039960e>] nfs_release+0x9e/0xb0 [nfs]
  [<ffffffffa0399570>] ? nfs_open+0x60/0x60 [nfs]
  [<ffffffffa0394dad>] nfs_file_release+0x3d/0x60 [nfs]
  [<ffffffff81226e6c>] __fput+0xdc/0x1e0
  [<ffffffff81226fbe>] ____fput+0xe/0x10
  [<ffffffff810bf2e4>] task_work_run+0xc4/0xe0
  [<ffffffff810a4188>] do_exit+0x2e8/0xb30
  [<ffffffff8102471c>] ? do_audit_syscall_entry+0x6c/0x70
  [<ffffffff811464e6>] ? __audit_syscall_exit+0x1e6/0x280
  [<ffffffff810a4a5f>] do_group_exit+0x3f/0xa0
  [<ffffffff810a4ad4>] SyS_exit_group+0x14/0x20
  [<ffffffff8179b76e>] system_call_fastpath+0x12/0x71

Which seems to be due to a call to utsname() when in a task exit context
in order to determine the hostname to set in rpc_new_client().

In reality, what we want here is not the hostname of the current task, but
the hostname that was used to set up the metadata server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-30 15:29:56 -04:00
Miklos Szeredi
5c672ab3f0 fuse: serialize dirops by default
Negotiate with userspace filesystems whether they support parallel readdir
and lookup.  Disable parallelism by default for fear of breaking fuse
filesystems.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9902af79c0 ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd ("fuse: switch to ->iterate_shared()")
2016-06-30 13:10:49 +02:00
Marek Vasut
f8608985f8 configfs: Remove ppos increment in configfs_write_bin_file
The simple_write_to_buffer() already increments the @ppos on success,
see fs/libfs.c simple_write_to_buffer() comment:

"
On success, the number of bytes written is returned and the offset @ppos
advanced by this number, or negative value is returned on error.
"

If the configfs_write_bin_file() is invoked with @count smaller than the
total length of the written binary file, it will be invoked multiple times.
Since configfs_write_bin_file() increments @ppos on success, after calling
simple_write_to_buffer(), the @ppos is incremented twice.

Subsequent invocation of configfs_write_bin_file() will result in the next
piece of data being written to the offset twice as long as the length of
the previous write, thus creating buffer with "holes" in it.

The simple testcase using DTO follows:
  $ mkdir /sys/kernel/config/device-tree/overlays/1
  $ dd bs=1 if=foo.dtbo of=/sys/kernel/config/device-tree/overlays/1/dtbo
Without this patch, the testcase will result in twice as big buffer in the
kernel, which is then passed to the cfs_overlay_item_dtbo_write() .

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
2016-06-30 11:28:55 +02:00
Linus Torvalds
e7bdea7750 NFS client bugfixes for Linux 4.7
Stable bugfixes:
 - Fix _cancel_empty_pagelist
 - Fix a double page unlock
 - Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
 - Fix another OPEN_DOWNGRADE bug
 
 Other bugfixes:
 - Ensure we handle delegation errors in nfs4_proc_layoutget()
 - Layout stateids start out as being invalid
 - Add sparse lock annotations for pnfs_find_alloc_layout
 - Handle bad delegation stateids in nfs4_layoutget_handle_exception
 - Fix up O_DIRECT results
 - Fix potential use after free of state in nfs4_do_reclaim.
 - Mark the layout stateid invalid when all segments are removed
 - Don't let readdirplus revalidate an inode that was marked as stale
 - Fix potential race in nfs_fhget()
 - Fix an unused variable warning
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXdDWkAAoJENfLVL+wpUDrpsUP/1F2zu12r/Zkv3ZEFhShcpQc
 2N1TRD9X7Lruod2pUD95qqjjdw+/vu3LjcyJljrasRaJENijvZ2GQhKkB7xPODlu
 qxZcmnQsH+WmpmJKcqAByAW1czcNGMoMHnt4tV0gG21NH+XUb92fgn+aGeIJDVrK
 Hcd9d8TfnFWO70ZgTUXW/hv0CXwu4MEJhN2JfF4lolbxUkmjLHHLoxSDDm0AdXGC
 EE8f0V9/7xurvOeLe5bQOQXfZPedBydsLNXa1ZacMGKgmBUoRNxJ5yCpPUtcTVBx
 HwbiY+WDFQ7MdKTzUQqqbnrIqKw8Hu4SugIV/vHRqR+Lhc6u29YGOqdU4d2G8IKW
 Nh8MBqS+dDefCkL3TJoE7MpjhP3EOO6HXnv5FjMZLOuu2X2o+Sz3+DkhYCq6pj/g
 fFh480vZfZYaTsfDf1ttvVN8kIvQ+1Uk3LK6aC2EVwgPrv+0OIRu36F0JQQimxOp
 EbYDlhVk7mzH/ZQ31GmPPbSIk+3sm2V58lqXnUMovoqPFPiN3xZDuBcnlyFrrzaI
 NjOvsdVxkOdHWbYZyBQzj16Vo651EYbAAUEwsud70N8C3aCgkTxCZ30Q0v+KqqxU
 pP5kz3zYUdkXQeHxE6T0iXG9fGcv/nGS21hTfT01YJCK7v67K8TYRNMrOEVURVgk
 LSD/CZJXJHVJn1Vr4F7o
 =IGNO
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Stable bugfixes:
   - Fix _cancel_empty_pagelist
   - Fix a double page unlock
   - Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
   - Fix another OPEN_DOWNGRADE bug

  Other bugfixes:
   - Ensure we handle delegation errors in nfs4_proc_layoutget()
   - Layout stateids start out as being invalid
   - Add sparse lock annotations for pnfs_find_alloc_layout
   - Handle bad delegation stateids in nfs4_layoutget_handle_exception
   - Fix up O_DIRECT results
   - Fix potential use after free of state in nfs4_do_reclaim.
   - Mark the layout stateid invalid when all segments are removed
   - Don't let readdirplus revalidate an inode that was marked as stale
   - Fix potential race in nfs_fhget()
   - Fix an unused variable warning"

* tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix another OPEN_DOWNGRADE bug
  make nfs_atomic_open() call d_drop() on all ->open_context() errors.
  NFS: Fix an unused variable warning
  NFS: Fix potential race in nfs_fhget()
  NFS: Don't let readdirplus revalidate an inode that was marked as stale
  NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed
  NFS: Fix a double page unlock
  pnfs_nfs: fix _cancel_empty_pagelist
  nfs4: Fix potential use after free of state in nfs4_do_reclaim.
  NFS: Fix up O_DIRECT results
  NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception
  NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout
  NFSv4.1/pnfs: Layout stateids start out as being invalid
  NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
2016-06-29 15:30:26 -07:00
Miklos Szeredi
03bea60409 ovl: get_write_access() in truncate
When truncating a file we should check write access on the underlying
inode.  And we should do so on the lower file as well (before copy-up) for
consistency.

Original patch and test case by Aihua Zhang.

 - - >o >o - - test.c - - >o >o - -
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int ret;

	ret = truncate(argv[0], 4096);
	if (ret != -1) {
		fprintf(stderr, "truncate(argv[0]) should have failed\n");
		return 1;
	}
	if (errno != ETXTBSY) {
		perror("truncate(argv[0])");
		return 1;
	}

	return 0;
}
 - - >o >o - - >o >o - - >o >o - -

Reported-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-06-29 16:03:55 +02:00
Miklos Szeredi
a4859d7594 ovl: fix dentry leak for default_permissions
When using the 'default_permissions' mount option, ovl_permission() on
non-directories was missing a dput(alias), resulting in "BUG Dentry still
in use".

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8d3095f4ad ("ovl: default permissions")
Cc: <stable@vger.kernel.org> # v4.5+
2016-06-29 08:26:59 +02:00
Trond Myklebust
e547f26283 NFS: Fix another OPEN_DOWNGRADE bug
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.

	fd0 = open(foo, RDRW)   -- should be open on the wire for "both"
	fd1 = open(foo, RDONLY)  -- should be open on the wire for "read"
	close(fd0) -- should trigger an open_downgrade
	read(fd1)
	close(fd1)

The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: cd9288ffae ("NFSv4: Fix another bug in the close/open_downgrade code")
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-28 16:55:34 -04:00
Eric Sandeen
023954351f dax: fix offset overflow in dax_io
This isn't functionally apparent for some reason, but
when we test io at extreme offsets at the end of the loff_t
rang, such as in fstests xfs/071, the calculation of
"max" in dax_io() can be wrong due to pos + size overflowing.

For example,

# xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file

enters dax_io with:

start 0x7ffffffffffff000
end   0x7ffffffffffff200

and the rounded up "size" variable is 0x1000.  This yields:

pos + size 0x8000000000000000 (overflows loff_t)
       end 0x7ffffffffffff200

Due to the overflow, the min() function picks the wrong
value for the "max" variable, and when we send (max - pos)
into i.e. copy_from_iter_pmem() it is also the wrong value.

This somehow(tm) gets magically absorbed without incident,
probably because iter->count is correct.  But it seems best
to fix it up properly by comparing the two values as
unsigned.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-27 12:18:44 -07:00
Linus Torvalds
fbe601f7a3 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Various small cifs/smb3 fixes, include some for stable, and some from
  the recent SMB3 test event"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  File names with trailing period or space need special case conversion
  Fix reconnect to not defer smb3 session reconnect long after socket reconnect
  cifs: check hash calculating succeeded
  cifs: dynamic allocation of ntlmssp blob
  cifs: use CIFS_MAX_DOMAINNAME_LEN when converting the domain name
  cifs: stuff the fl_owner into "pid" field in the lock request
2016-06-27 11:23:44 -07:00
Al Viro
d20cb71dbf make nfs_atomic_open() call d_drop() on all ->open_context() errors.
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed.  It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones.  Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors.  As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup().  On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).

Cc: stable@vger.kernel.org # v3.10+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-27 08:59:08 -04:00
Linus Torvalds
da2f6aba4a Merge branch 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes part 2 from Chris Mason:
 "This has one patch from Omar to bring iterate_shared back to btrfs.

  We have a tree of work we queue up for directory items and it doesn't
  lend itself well to shared access.  While we're cleaning it up, Omar
  has changed things to use an exclusive lock when there are delayed
  items"

* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
2016-06-25 08:53:38 -07:00
Linus Torvalds
b971712afc Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I have a two part pull this time because one of the patches Dave
  Sterba collected needed to be against v4.7-rc2 or higher (we used
  rc4).  I try to make my for-linus-xx branch testable on top of the
  last major so we can hand fixes to people on the list more easily, so
  I've split this pull in two.

  This first part has some fixes and two performance improvements that
  we've been testing for some time.

  Josef's two performance fixes are most notable.  The transid tracking
  patch makes a big improvement on pretty much every workload"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: Force stripesize to the value of sectorsize
  btrfs: fix disk_i_size update bug when fallocate() fails
  Btrfs: fix error handling in map_private_extent_buffer
  Btrfs: fix error return code in btrfs_init_test_fs()
  Btrfs: don't do nocow check unless we have to
  btrfs: fix deadlock in delayed_ref_async_start
  Btrfs: track transid for delayed ref flushing
2016-06-25 08:42:31 -07:00
Omar Sandoval
02dbfc99b4 Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Commit fe742fd4f9 ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.

This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.

Tested with xfstests and by doing a parallel kernel build:

	while make tinyconfig && make -j4 && git clean dqfx; do
		:
	done

along with a bunch of parallel finds in another shell:

	while true; do
		for ((i=0; i<4; i++)); do
			find . >/dev/null &
		done
		wait
	done

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-25 06:20:10 -07:00
Al Viro
b42b90d177 ceph: fix d_obtain_alias() misuses
on failure d_obtain_alias() will have done iput()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-24 23:49:03 -04:00
Linus Torvalds
086e3eb65e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "Two weeks worth of fixes here"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
  autofs: don't get stuck in a loop if vfs_write() returns an error
  mm/page_owner: avoid null pointer dereference
  tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
  fs/nilfs2: fix potential underflow in call to crc32_le
  oom, suspend: fix oom_reaper vs. oom_killer_disable race
  ocfs2: disable BUG assertions in reading blocks
  mm, compaction: abort free scanner if split fails
  mm: prevent KASAN false positives in kmemleak
  mm/hugetlb: clear compound_mapcount when freeing gigantic pages
  mm/swap.c: flush lru pvecs on compound page arrival
  memcg: css_alloc should return an ERR_PTR value on error
  memcg: mem_cgroup_migrate() may be called with irq disabled
  hugetlb: fix nr_pmds accounting with shared page tables
  Revert "mm: disable fault around on emulated access bit architecture"
  Revert "mm: make faultaround produce old ptes"
  mailmap: add Boris Brezillon's email
  mailmap: add Antoine Tenart's email
  mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
  mm: mempool: kasan: don't poot mempool objects in quarantine
  ...
2016-06-24 19:08:33 -07:00
Andrey Vagin
5a9294e5c5 autofs: don't get stuck in a loop if vfs_write() returns an error
__vfs_write() returns a negative value in a error case.

Link: http://lkml.kernel.org/r/20160616083108.6278.65815.stgit@pluto.themaw.net
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Torsten Hilbrich
63d2f95d63 fs/nilfs2: fix potential underflow in call to crc32_le
The value `bytes' comes from the filesystem which is about to be
mounted.  We cannot trust that the value is always in the range we
expect it to be.

Check its value before using it to calculate the length for the crc32_le
call.  It value must be larger (or equal) sumoff + 4.

This fixes a kernel bug when accidentially mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
s_bytes value of 1.  This caused an underflow when substracting sumoff +
4 (20) in the call to crc32_le.

  BUG: unable to handle kernel paging request at ffff88021e600000
  IP:  crc32_le+0x36/0x100
  ...
  Call Trace:
    nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
    nilfs_load_super_block+0x142/0x300 [nilfs2]
    init_nilfs+0x60/0x390 [nilfs2]
    nilfs_mount+0x302/0x520 [nilfs2]
    mount_fs+0x38/0x160
    vfs_kern_mount+0x67/0x110
    do_mount+0x269/0xe00
    SyS_mount+0x9f/0x100
    entry_SYSCALL_64_fastpath+0x16/0x71

Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Gang He
7186ee06b6 ocfs2: disable BUG assertions in reading blocks
According to some high-load testing, these two BUG assertions were
encountered, this led system panic.  Actually, there were some
discussions about removing these two BUG() assertions, it would not
bring any side effect.

Then, I did the the following changes,

1) use the existing macro CATCH_BH_JBD_RACES to wrap BUG() in the
   ocfs2_read_blocks_sync function like before.

2) disable the macro CATCH_BH_JBD_RACES in Makefile by default.

Link: http://lkml.kernel.org/r/1466574294-26863-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
f2db19719a jbd2: get rid of superfluous __GFP_REPEAT
jbd2_alloc is explicit about its allocation preferences wrt.  the
allocation size.  Sub page allocations go to the slab allocator and
larger are using either the page allocator or vmalloc.  This is all good
but the logic is unnecessarily complex.

1) as per Ted, the vmalloc fallback is a left-over:

 : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so
 : the code path that calls vmalloc() should never get called.  When we
 : conveted jbd2_alloc() to suppor sub-page size allocations in commit
 : d2eecb0393, there was an assumption that it could be called with a size
 : greater than PAGE_SIZE, but that's certaily not true today.

Moreover vmalloc allocation might even lead to a deadlock because the
callers expect GFP_NOFS context while vmalloc is GFP_KERNEL.

2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored
   since the flag was introduced.

Let's simplify the code flow and use the slab allocator for sub-page
requests and the page allocator for others.  Even though order > 0 is
not currently used as per above leave that option open.

Link: http://lkml.kernel.org/r/1464599699-30131-18-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Linus Torvalds
9c46a6df3b Fix missing server-side permission checks on setting NFS ACLs.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXba7TAAoJECebzXlCjuG+j+gP/18y6ot02Y5R2pI/O8nqoY3I
 WeBNOo1yD77wQ1SopiIbPL/ChxOh/OVlUzo9ikNtwm5l6Op8mLMxPYaDjaIpA5Nt
 FC/pAHibdTJA4ZjzenRhnEEFYbOQh0GssF/qMG30ySGPhx0eoonXi5/qYvjFyTBF
 BuDrpC4YHSNvqCZ/r0aD2bw79Skw8cBPdj+SUfK2r37WyuQ4Kade9NCmDYwSNxSx
 6cru5ztRQSE8Ni0le3U2wTlYhq8xrpP0bRdIzc/9EipdKVdsvfukonjnT+dwtDks
 72fwDoALAZq0iiIur7LKaUjkaZcKzHwe6LVsZEoiJ5aeI2a2FodLwoyXl4SntAR7
 027YEqe7Pc+KHGUYACVuNuCcJkEK5B3zRBBSNoskhkPaK/lJ7BMSXNNhIt248YE3
 HAl1vuf4PakCgh7qIsiUHB1EVs6FCcG8aKH1TmumvPD2udwabiYcKqd8soNu5ZWu
 ALi1vtD/8B1LEI8TacP5NIt8Pdr1AQ0kVDFWlZSiK3oE11DrHLiUgfvl2y7cokMa
 xzcNnoyEppaWNFJzYzQes8XO7Ti/DLJoCB8JnxMaWT1BfVhpEAs1LNl4AIHij5fO
 /PKNs4OusntvOmEvgKtxZpvqXaElgvXz7LMgzM2bmMGMVY+mq0+lpDbzAK91ijk0
 di8+ivIMayA60P5xV4dJ
 =TZ/R
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
 "Fix missing server-side permission checks on setting NFS ACLs"

* tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linux:
  nfsd: check permissions when setting ACLs
  posix_acl: Add set_posix_acl
2016-06-24 17:22:27 -07:00
Steve French
45e8a2583d File names with trailing period or space need special case conversion
POSIX allows files with trailing spaces or a trailing period but
SMB3 does not, so convert these using the normal Services For Mac
mapping as we do for other reserved characters such as
	: < > | ? *
This is similar to what Macs do for the same problem over SMB3.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
2016-06-24 12:05:52 -05:00
Steve French
4fcd1813e6 Fix reconnect to not defer smb3 session reconnect long after socket reconnect
Azure server blocks clients that open a socket and don't do anything on it.
In our reconnect scenarios, we can reconnect the tcp session and
detect the socket is available but we defer the negprot and SMB3 session
setup and tree connect reconnection until the next i/o is requested, but
this looks suspicous to some servers who expect SMB3 negprog and session
setup soon after a socket is created.

In the echo thread, reconnect SMB3 sessions and tree connections
that are disconnected.  A later patch will replay persistent (and
resilient) handle opens.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
2016-06-24 12:04:50 -05:00
Ben Hutchings
999653786d nfsd: check permissions when setting ACLs
Use set_posix_acl, which includes proper permission checks, instead of
calling ->set_acl directly.  Without this anyone may be able to grant
themselves permissions to a file by setting the ACL.

Lock the inode to make the new checks atomic with respect to set_acl.
(Also, nfsd was the only caller of set_acl not locking the inode, so I
suspect this may fix other races.)

This also simplifies the code, and ensures our ACLs are checked by
posix_acl_valid.

The permission checks and the inode locking were lost with commit
4ac7249e, which changed nfsd to use the set_acl inode operation directly
instead of going through xattr handlers.

Reported-by: David Sinquin <david@sinquin.eu>
[agreunba@redhat.com: use set_posix_acl]
Fixes: 4ac7249e
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-24 12:11:52 -04:00
Andreas Gruenbacher
485e71e8fb posix_acl: Add set_posix_acl
Factor out part of posix_acl_xattr_set into a common function that takes
a posix_acl, which nfsd can also call.

The prototype already exists in include/linux/posix_acl.h.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-06-24 12:11:34 -04:00
Trond Myklebust
1b982ea2ca NFS: Fix an unused variable warning
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
916ec34d0b NFS: Fix potential race in nfs_fhget()
If we don't set the mode correctly in nfs_init_locked(), then there is
potential for a race with a second call to nfs_fhget that will cause
inode aliasing.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
d8fdb47fae NFS: Don't let readdirplus revalidate an inode that was marked as stale
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
2d148c7e84 NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed
According to RFC5661, section 12.5.3. the layout stateid is no longer
valid once the client no longer holds any layout segments. Ensure that
we mark it invalid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
cbebaf897e NFS: Fix a double page unlock
Since commit 0bcbf039f6, nfs_readpage_release() has been used to
unlock the page in the read code.

Fixes: 0bcbf039f6 ("nfs: handle request add failure properly")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Weston Andros Adamson
5e3a98883e pnfs_nfs: fix _cancel_empty_pagelist
pnfs_generic_commit_cancel_empty_pagelist calls nfs_commitdata_release,
but that is wrong: nfs_commitdata_release puts the open context, something
that isn't valid until nfs_init_commit is called, which is never the case
when pnfs_generic_commit_cancel_empty_pagelist is called.

This was introduced in "nfs: avoid race that crashes nfs_init_commit".

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Oleg Drokin
cea7f829d3 nfs4: Fix potential use after free of state in nfs4_do_reclaim.
Commit e8d975e73e ("fixing infinite OPEN loop in 4.0 stateid recovery")
introduced access to state after it was just potentially freed by
nfs4_put_open_state leading to a random data corruption somewhere.

BUG: unable to handle kernel paging request at ffff88004941ee40
IP: [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm
CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000
RIP: 0010:[<ffffffff813baf01>]  [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
RSP: 0018:ffff88003ff4bd68  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8
RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88
RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98
R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00
FS:  0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0
Stack:
 ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68
 ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48
 ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40
Call Trace:
 [<ffffffff813baad5>] ? nfs4_do_reclaim+0x35/0x740
 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0
 [<ffffffff813bb7cd>] nfs4_run_state_manager+0x5ed/0xa40
 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740
 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740
 [<ffffffff810af0d1>] kthread+0x101/0x120
 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0
 [<ffffffff818843af>] ret_from_fork+0x1f/0x40
 [<ffffffff810aefd0>] ? kthread_create_on_node+0x250/0x250
Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff <f0> 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6
RIP  [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
 RSP <ffff88003ff4bd68>
CR2: ffff88004941ee40

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
d2a7de0b34 NFS: Fix up O_DIRECT results
if we read or wrote something, we must report it

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
dd1beb3d16 NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception
We must call nfs4_handle_exception() on BAD_STATEID errors. The only
exception is if the stateid argument turns out to be a layout stateid
that is declared invalid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
e5241e4388 NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
67a3b72146 NFSv4.1/pnfs: Layout stateids start out as being invalid
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Trond Myklebust
bc23676caf NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
nfs4_handle_exception() relies on the caller setting the 'inode' field
in the struct nfs4_exception argument when the error applies to a
delegation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-24 12:01:00 -04:00
Linus Torvalds
63c04ee7d3 This pull requests contains fixes for two critical bugs in UBI and UBIFS:
1. Fixes the possibility of losing data upon a power cut when UBI tries
    to recover from a write error.
 2. Fixes page migration on UBIFS. It turned out that the default page
    migration function is not suitable for UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXa5tEAAoJEEtJtSqsAOnW090P/RcQjIfVf2g3r8VRp38OQPbb
 MTd4sD/rnyt5Eq0QYUPWG5xcYK2BWI1PwpdB81JvW5hxnXPgG8DpVIxjzt/7Xgnp
 QheYe9tMfgYjDntz1rzGVa/uHSAldP9V4czgczrBW/0lwnRsZ6mLY1RA9Oz0hRdG
 cp53I8CSD0DPyqU0XkgzLkzVUstmySwQ5i46C0kQEnlRcytReOLgcjSrXXn+/Zih
 yZxhtDQSCKmQAfVmERggPXVHo8jFtVfej52ja7RFcMA2uXvXqljOBNCyLUYPdYka
 XdQEKsXRLl69ktFUXwZwPAYAW23I8+PMpsoljHDVc0hF25p8omp3D+7HE18SsMSv
 6RNnUwz+PDbiFApyoTu0SBgHN/OO9o6rjNNoRIInoKpk0NvWmrMQOo6BIFsX4yq1
 0dOVJiKXVoFuo75Yw9mOKdrV/Z5P1TvgdTBj6g03aUM9vcX1Gz6+1xKkvcXGgh02
 8qFDZdZ5L87TlpMkvtWO87Ir0ssrfjxpvxR8pPsxxqvxbfUuVmss4ILuh9AVSVk+
 d1zrz30+JZzTbIrky/7R31i6Bx2+reYdTKiPIkST9sF5WblUPSeyUoKq1OlNRYxj
 n+0Q8S5Tm/6AHXUOQFxurbXU+D7G7TaL/CsBeepvV/AqJb07+vBxUuGFH1rDbmLB
 r5dTfOXn3iNEmmNyrhgN
 =EDeX
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.7-rc5' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS fixes from Richard Weinberger:
 "This contains fixes for two critical bugs in UBI and UBIFS:

   - fix the possibility of losing data upon a power cut when UBI tries
     to recover from a write error

   - fix page migration on UBIFS.  It turned out that the default page
     migration function is not suitable for UBIFS"

* tag 'upstream-4.7-rc5' of git://git.infradead.org/linux-ubifs:
  UBIFS: Implement ->migratepage()
  mm: Export migrate_page_move_mapping and migrate_page_copy
  ubi: Make recover_peb power cut aware
  gpio: make library immune to error pointers
  gpio: make sure gpiod_to_irq() returns negative on NULL desc
  gpio: 104-idi-48: Fix missing spin_lock_init for ack_lock
2016-06-23 22:48:48 -07:00
Luis de Bethencourt
a6b6befbb2 cifs: check hash calculating succeeded
calc_lanman_hash() could return -ENOMEM or other errors, we should check
that everything went fine before using the calculated key.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-06-23 23:45:17 -05:00
Jerome Marchand
b8da344b74 cifs: dynamic allocation of ntlmssp blob
In sess_auth_rawntlmssp_authenticate(), the ntlmssp blob is allocated
statically and its size is an "empirical" 5*sizeof(struct
_AUTHENTICATE_MESSAGE) (320B on x86_64). I don't know where this value
comes from or if it was ever appropriate, but it is currently
insufficient: the user and domain name in UTF16 could take 1kB by
themselves. Because of that, build_ntlmssp_auth_blob() might corrupt
memory (out-of-bounds write). The size of ntlmssp_blob in
SMB2_sess_setup() is too small too (sizeof(struct _NEGOTIATE_MESSAGE)
+ 500).

This patch allocates the blob dynamically in
build_ntlmssp_auth_blob().

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2016-06-23 23:45:07 -05:00
Jerome Marchand
202d772ba0 cifs: use CIFS_MAX_DOMAINNAME_LEN when converting the domain name
Currently in build_ntlmssp_auth_blob(), when converting the domain
name to UTF16, CIFS_MAX_USERNAME_LEN limit is used. It should be
CIFS_MAX_DOMAINNAME_LEN. This patch fixes this.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-06-23 23:44:56 -05:00
Jeff Layton
3d22462ae9 cifs: stuff the fl_owner into "pid" field in the lock request
Right now, we send the tgid cross the wire. What we really want to send
though is a hashed fl_owner_t since samba treats this field as a generic
lockowner.

It turns out that because we enforce and release locks locally before
they are ever sent to the server, this patch makes no difference in
behavior. Still, setting OFD locks on the server using the process
pid seems wrong, so I think this patch still makes sense.

Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
2016-06-23 23:44:44 -05:00
Chandan Rajendra
b7f67055d2 Btrfs: Force stripesize to the value of sectorsize
Btrfs code currently assumes stripesize to be same as
sectorsize. However Btrfs-progs (until commit
df05c7ed455f519e6e15e46196392e4757257305) has been setting
btrfs_super_block->stripesize to a value of 4096.

This commit makes sure that the value of btrfs_super_block->stripesize
is a power of 2. Later, it unconditionally sets btrfs_root->stripesize
to sectorsize.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:42 -07:00
Wang Xiaoguang
c0d2f6104e btrfs: fix disk_i_size update bug when fallocate() fails
When doing truncate operation, btrfs_setsize() will first call
truncate_setsize() to set new inode->i_size, but if later
btrfs_truncate() fails, btrfs_setsize() will call
"i_size_write(inode, BTRFS_I(inode)->disk_i_size)" to reset the
inmemory inode size, now bug occurs. It's because for truncate
case btrfs_ordered_update_i_size() directly uses inode->i_size
to update BTRFS_I(inode)->disk_i_size, indeed we should use the
"offset" argument to update disk_i_size. Here is the call graph:
==>btrfs_truncate()
====>btrfs_truncate_inode_items()
======>btrfs_ordered_update_i_size(inode, last_size, NULL);
Here btrfs_ordered_update_i_size()'s offset argument is last_size.

And below test case can reveal this bug:

dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100
dev=$(losetup --show -f fs.img)
mkdir -p /mnt/mntpoint
mkfs.btrfs  -f $dev
mount $dev /mnt/mntpoint
cd /mnt/mntpoint

echo "workdir is: /mnt/mntpoint"
blocksize=$((128 * 1024))
dd if=/dev/zero of=testfile bs=$blocksize count=1
sync
count=$((17*1024*1024*1024/blocksize))
echo "file size is:" $((count*blocksize))
for ((i = 1; i <= $count; i++)); do
	i=$((i + 1))
	dst_offset=$((blocksize * i))
	xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\
		testfile > /dev/null
done
sync

truncate --size 0 testfile
ls -l testfile
du -sh testfile
exit

In this case, truncate operation will fail for enospc reason and
"du -sh testfile" returns value greater than 0, but testfile's
size is 0, we need to reflect correct inode->i_size.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:41 -07:00
Liu Bo
415b35a55b Btrfs: fix error handling in map_private_extent_buffer
map_private_extent_buffer() can return -EINVAL in two different cases,
1. when the requested contents span two pages if nodesize is larger
   than pagesize,
2. when it detects something insane.

The 2nd one used to be only a WARN_ON(1), and we decided to return a error
to callers, but we didn't fix up all its callers, which will be
addressed by this patch.

Without this, btrfs may end up with 'general protection', ie.
reading invalid memory.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:40 -07:00
Wei Yongjun
04e1b65af2 Btrfs: fix error return code in btrfs_init_test_fs()
Fix to return a negative error code from the kern_mount() error handling
case instead of 0(ret is set to 0 by register_filesystem), as done
elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:39 -07:00
Josef Bacik
c6887cd111 Btrfs: don't do nocow check unless we have to
Before we write into prealloc/nocow space we have to make sure that there are no
references to the extents we are writing into, which means checking the extent
tree and csum tree in the case of nocow.  So we don't want to do the nocow dance
unless we can't reserve data space, since it's a serious drag on performance.
With the following sequence

fallocate -l10737418240 /mnt/btrfs-test/file
cp --reflink /mnt/btrfs-test/file /mnt/btrfs-test/link
fio --name=randwrite --rw=randwrite --bs=4k --filename=/mnt/btrfs-test/file \
	--end_fsync=1

we get the worst case scenario where we have to fall back on to doing the check
anyway.

Without this patch
lat (usec): min=5, max=111598, avg=27.65, stdev=124.51
write: io=10240MB, bw=126876KB/s, iops=31718, runt= 82646msec

With this patch
lat (usec): min=3, max=91210, avg=14.09, stdev=110.62
write: io=10240MB, bw=212753KB/s, iops=53188, runt= 49286msec

We get twice the throughput, half of the runtime, and half of the average
latency.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
[ PAGE_CACHE_ removal related fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:57:14 -07:00
Chris Mason
0f873eca82 btrfs: fix deadlock in delayed_ref_async_start
"Btrfs: track transid for delayed ref flushing" was deadlocking on
btrfs_attach_transaction because its not safe to call from the async
delayed ref start code.  This commit brings back btrfs_join_transaction
instead and checks for a blocked commit.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:54:18 -07:00
Josef Bacik
31b9655f43 Btrfs: track transid for delayed ref flushing
Using the offwakecputime bpf script I noticed most of our time was spent waiting
on the delayed ref throttling.  This is what is supposed to happen, but
sometimes the transaction can commit and then we're waiting for throttling that
doesn't matter anymore.  So change this stuff to be a little smarter by tracking
the transid we were in when we initiated the throttling.  If the transaction we
get is different then we can just bail out.  This resulted in a 50% speedup in
my fs_mark test, and reduced the amount of time spent throttling by 60 seconds
over the entire run (which is about 30 minutes).  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:54:18 -07:00
Kirill A. Shutemov
4ac1c17b20 UBIFS: Implement ->migratepage()
During page migrations UBIFS might get confused
and the following assert triggers:
[  213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436)
[  213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008
[  213.490000] Hardware name: Allwinner sun4i/sun5i Families
[  213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14)
[  213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0)
[  213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50)
[  213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8)
[  213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290)
[  213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80)
[  213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0)
[  213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4)
[  213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0)
[  213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8)
[  213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274)
[  213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c)
[  213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0)
[  213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8)
[  213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48)
[  213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444)
[  213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614)
[  213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c)
[  213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34)

UBIFS is using PagePrivate() which can have different meanings across
filesystems. Therefore the generic page migration code cannot handle this
case correctly.
We have to implement our own migration function which basically does a
plain copy but also duplicates the page private flag.
UBIFS is not a block device filesystem and cannot use buffer_migrate_page().

Cc: stable@vger.kernel.org
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[rw: Massaged changelog, build fixes, etc...]
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Christoph Hellwig <hch@lst.de>
2016-06-23 00:29:53 +02:00
Linus Torvalds
f9020d1741 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fix from Eric Biederman:
 "This contains just a single small patch that fixes a tiny hole in the
  logic of allowing unprivileged mounting of proc and sysfs.

  In practice I don't think anyone is affected because having MNT_RDONLY
  clear in mnt->mnt_flags but MS_RDONLY set in sb->s_flags is very weird
  for a filesystem, and weirder for proc and sysfs.  However if it
  happens let's handle it correctly and then no one has to to worry
  about this crazy case"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  mnt: Account for MS_RDONLY in fs_fully_visible
2016-06-22 14:11:24 -07:00
Trond Myklebust
4f52b6bb8c NFS: Don't call COMMIT in ->releasepage()
While COMMIT has the potential to free up a lot of memory that is being
taken by unstable writes, it isn't guaranteed to free up this particular
page. Also, calling fsync() on the server is expensive and so we want to
do it in a more controlled fashion, rather than have it triggered at
random by the VM.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22 09:59:44 -04:00
Trond Myklebust
93761d9863 NFS: Don't hold the inode lock across fsync()
Commits are no longer required to be serialised.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22 09:59:43 -04:00
Trond Myklebust
811ed92ecc NFS: writepage of a single page should not be synchronous
It is almost always better to wait for more so that we can issue a
bulk commit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22 09:59:43 -04:00
Trond Myklebust
6b56a89833 NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer
filemap_datawrite() and friends already deal just fine with livelock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22 09:59:42 -04:00
Trond Myklebust
ca0daa277a NFS: Cache aggressively when file is open for writing
Unless the user is using file locking, we must assume close-to-open
cache consistency when the file is open for writing. Adjust the
caching algorithm so that it does not clear the cache on out-of-order
writes and/or attribute revalidations.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22 09:59:42 -04:00
Al Viro
ebaaa80e8f lockless next_positive()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-20 17:11:29 -04:00
Al Viro
4f42c1b5b9 libfs.c: new helper - next_positive()
Return nth positive child after given or NULL if there's
less than n left.  dcache_readdir() and dcache_dir_lseek()
switched to it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-20 17:11:28 -04:00
Al Viro
274f5b041d dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
Make sure that directory is locked shared in dcache_dir_lseek();
for dcache_readdir() it's already tru, and that's enough to make
simple_positive() stable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-20 17:11:26 -04:00
Linus Torvalds
67016f6cdf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A couple more of d_walk()/d_subdirs reordering fixes (stable fodder;
  ought to solve that crap for good) and a fix for a brown paperbag bug
  in d_alloc_parallel() (this cycle)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix idiotic braino in d_alloc_parallel()
  autofs races
  much milder d_walk() race
2016-06-20 10:41:51 -07:00
Chris J Arges
40f0fd372a ecryptfs: fix spelling mistakes
Noticed some minor spelling errors when looking through the code.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-06-20 10:02:35 -05:00
Wei Yuan
5f9f2c2abd eCryptfs: fix typos in comment
Signed-off-by: Weiyuan <weiyuan.wei@huawei.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-06-20 10:02:23 -05:00
Julia Lawall
c39341cf0d ecryptfs: drop null test before destroy functions
Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL)
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-06-20 10:02:22 -05:00
Al Viro
e7d6ef9790 fix idiotic braino in d_alloc_parallel()
Check for d_unhashed() while searching in in-lookup hash was absolutely
wrong.  Worse, it masked a deadlock on dget() done under bitlock that
nests inside ->d_lock.  Thanks to J. R. Okajima for spotting it.

Spotted-by: "J. R. Okajima" <hooanon05g@gmail.com>
Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-20 10:07:42 -04:00
Linus Torvalds
c3695331f3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes and a reiserfs fix from Jan Kara:
 "A couple of udf fixes (most notably a bug in parsing UDF partitions
  which led to inability to mount recent Windows installation media) and
  a reiserfs fix for handling kstrdup failure"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: check kstrdup failure
  udf: Use correct partition reference number for metadata
  udf: Use IS_ERR when loading metadata mirror file entry
  udf: Don't BUG on missing metadata partition descriptor
2016-06-19 07:05:14 -10:00
Linus Torvalds
607117a153 Driver core fixes for 4.7-rc4
Here are a small number of debugfs, ISA, and one driver core fix for 4.7-rc4.
 
 All of these resolve reported issues.  The ISA ones have spent the least
 amount of time in linux-next, sorry about that, I didn't realize they
 were regressions that needed to get in now (thanks to Thorsten for the
 prodding!) but they do all pass the 0-day bot tests.  The others have
 been in linux-next for a while now.
 
 Full details about them are in the shortlog below.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAldlbeQACgkQMUfUDdst+ymnFACfaWhEKA/84jwNNHiim92diJrY
 zYsAoLOmpBw68yL6qTSZbcWJF4Flb6Xk
 =N8M2
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are a small number of debugfs, ISA, and one driver core fix for
  4.7-rc4.

  All of these resolve reported issues.  The ISA ones have spent the
  least amount of time in linux-next, sorry about that, I didn't realize
  they were regressions that needed to get in now (thanks to Thorsten
  for the prodding!) but they do all pass the 0-day bot tests.  The
  others have been in linux-next for a while now.

  Full details about them are in the shortlog below"

* tag 'driver-core-4.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  isa: Dummy isa_register_driver should return error code
  isa: Call isa_bus_init before dependent ISA bus drivers register
  watchdog: ebc-c384_wdt: Allow build for X86_64
  iio: stx104: Allow build for X86_64
  gpio: Allow PC/104 devices on X86_64
  isa: Allow ISA-style drivers on modern systems
  base: make module_create_drivers_dir race-free
  debugfs: open_proxy_open(): avoid double fops release
  debugfs: full_proxy_open(): free proxy on ->open() failure
  kernel/kcov: unproxify debugfs file's fops
2016-06-18 06:04:01 -10:00
Linus Torvalds
4c6459f945 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "The most user visible change here is a fix for our recent superblock
  validation checks that were causing problems on non-4k pagesized
  systems"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize
  btrfs: remove build fixup for qgroup_account_snapshot
  btrfs: use new error message helper in qgroup_account_snapshot
  btrfs: avoid blocking open_ctree from cleaner_kthread
  Btrfs: don't BUG_ON() in btrfs_orphan_add
  btrfs: account for non-CoW'd blocks in btrfs_abort_transaction
  Btrfs: check if extent buffer is aligned to sectorsize
  btrfs: Use correct format specifier
2016-06-18 05:57:59 -10:00
Chandan Rajendra
dd5c93111d Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize
Older btrfs-progs/mkfs.btrfs sets 4096 as the stripesize. Hence
restricting stripesize to be equal to sectorsize would cause super block
validation to return an error on architectures where PAGE_SIZE is not
equal to 4096.

Hence as a workaround, this commit allows stripesize to be set to 4096
bytes.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:49 +02:00
David Sterba
89c5a5441d btrfs: remove build fixup for qgroup_account_snapshot
Introduced in 2c1984f244 ("btrfs: build fixup for
qgroup_account_snapshot") as temporary bisectability build fixup.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
David Sterba
f7af3934c2 btrfs: use new error message helper in qgroup_account_snapshot
We've renamed btrfs_std_error, this one is left from last merge.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Zygo Blaxell
90c711ab38 btrfs: avoid blocking open_ctree from cleaner_kthread
This fixes a problem introduced in commit 2f3165ecf1
"btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes".

open_ctree eventually calls btrfs_replay_log which in turn calls
btrfs_commit_super which tries to lock the cleaner_mutex, causing a
recursive mutex deadlock during mount.

Instead of playing whack-a-mole trying to keep up with all the
functions that may want to lock cleaner_mutex, put all the cleaner_mutex
lockers back where they were, and attack the problem more directly:
keep cleaner_kthread asleep until the filesystem is mounted.

When filesystems are mounted read-only and later remounted read-write,
open_ctree did not set fs_info->open and neither does anything else.
Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs
nor cleaner_kthread get confused by the common case of "/" filesystem
read-only mount followed by read-write remount.

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00