linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-04 16:16:46 +07:00

Author	SHA1	Message	Date
Al Viro	d4fb392f4c	kill aio_setup_single_vector() identical to import_single_range() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:27:10 -04:00
Al Viro	a96114fa1a	aio: simplify arguments of aio_setup_..._rw() We don't need req in either of those. We don't need nr_segs in caller. We don't really need len in caller either - iov_iter_count(&iter) will do. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:26:45 -04:00
Al Viro	4c185ce06d	aio: lift iov_iter_init() into aio_setup_..._rw() the only non-trivial detail is that we do it before rw_verify_area(), so we'd better cap the length ourselves in aio_setup_single_rw() case (for vectored case rw_copy_check_uvector() will do that for us). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:26:45 -04:00
Al Viro	ac15ac0669	lift iov_iter into {compat_,}do_readv_writev() get it closer to matching {compat_,}rw_copy_check_uvector(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:26:45 -04:00
Al Viro	c0fec3a98b	Merge branch 'iocb' into for-next	2015-04-11 22:24:41 -04:00
Andrew Elble	c1b8940b42	NFS: fix BUG() crash in notify_change() with patch to chown_common() We have observed a BUG() crash in fs/attr.c:notify_change(). The crash occurs during an rsync into a filesystem that is exported via NFS. 1.) fs/attr.c:notify_change() modifies the caller's version of attr. 2.) `6de0ec00ba` ("VFS: make notify_change pass ATTR_KILL_SID to setattr operations") introduced a BUG() restriction such that "no function will ever call notify_change() with both ATTR_MODE and ATTR_KILL_SID set". Under some circumstances though, it will have assisted in setting the caller's version of attr to this very combination. 3.) `27ac0ffeac` ("locks: break delegations on any attribute modification") introduced code to handle breaking delegations. This can result in notify_change() being re-called. attr _must_ be explicitly reset to avoid triggering the BUG() established in #2. 4.) The path that that triggers this is via fs/open.c:chmod_common(). The combination of attr flags set here and in the first call to notify_change() along with a later failed break_deleg_wait() results in notify_change() being called again via retry_deleg without resetting attr. Solution is to move retry_deleg in chmod_common() a bit further up to ensure attr is completely reset. There are other places where this seemingly could occur, such as fs/utimes.c:utimes_common(), but the attr flags are not initially set in such a way to trigger this. Fixes: `27ac0ffeac` ("locks: break delegations on any attribute modification") Reported-by: Eric Meddaugh <etmsys@rit.edu> Tested-by: Eric Meddaugh <etmsys@rit.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:24:34 -04:00
J. Bruce Fields	3d330dc175	dcache: return -ESTALE not -EBUSY on distributed fs race On a distributed filesystem it's possible for lookup to discover that a directory it just found is already cached elsewhere in the directory heirarchy. The dcache won't let us keep the directory in both places, so we have to move the dentry to the new location from the place we previously had it cached. If the parent has changed, then this requires all the same locks as we'd need to do a cross-directory rename. But we're already in lookup holding one parent's i_mutex, so it's too late to acquire those locks in the right order. The (unreliable) solution in __d_unalias is to trylock() the required locks and return -EBUSY if it fails. I see no particular reason for returning -EBUSY, and -ESTALE is already the result of some other lookup races on NFS. I think -ESTALE is the more helpful error return. It also allows us to take advantage of the logic Jeff Layton added in `c6a9428401` "vfs: fix renameat to retry on ESTALE errors" and ancestors, which hopefully resolves some of these errors before they're returned to userspace. I can reproduce these cases using NFS with: ssh root@$client ' mount -olookupcache=pos '$server':'$export' /mnt/ mkdir /mnt/TO mkdir /mnt/DIR touch /mnt/DIR/test.txt while true; do strace -e open cat /mnt/DIR/test.txt 2>&1 \| grep EBUSY done ' ssh root@$server ' while true; do mv $export/DIR $export/TO/DIR mv $export/TO/DIR $export/DIR done ' It also helps to add some other concurrent use of the directory on the client (e.g., "ls /mnt/TO"). And you can replace the server-side mv's by client-side mv's that are repeatedly killed. (If the client is interrupted while waiting for the RENAME response then it's left with a dentry that has to go under one parent or the other, but it doesn't yet know which.) Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:24:33 -04:00
Anton Altaparmakov	a632f55930	NTFS: Version 2.1.32 - Update file write from aio_write to write_iter. Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:24:33 -04:00
Al Viro	e5b811e38a	drop bogus check in file_open_root() For one thing, LOOKUP_DIRECTORY will be dealt with in do_last(). For another, name can be an empty string, but not NULL - no callers pass that and it would oops immediately if they would. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:24:32 -04:00
Al Viro	3f7036a071	switch security_inode_getattr() to struct path * Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:24:32 -04:00
Al Viro	9e7543e939	remove incorrect comment in lookup_one_len() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:24:30 -04:00
Al Viro	74eb8cc5a5	namei.c: fold do_path_lookup() into both callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:24:30 -04:00
Al Viro	fd2f7cb5bc	kill struct filename.separate just make const char iname[] the last member and compare name->name with name->iname instead of checking name->separate We need to make sure that out-of-line name doesn't end up allocated adjacent to struct filename refering to it; fortunately, it's easy to achieve - just allocate that struct filename with one byte in ->iname[], so that ->iname[0] will be inside the same object and thus have an address different from that of out-of-line name [spotted by Boqun Feng <boqun.feng@gmail.com>] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:21:24 -04:00
Christoph Hellwig	e2e40f2c1e	fs: move struct kiocb to fs.h struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-25 20:28:11 -04:00
Al Viro	6e8a1f8741	switch path_init() to struct filename Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-24 17:19:16 -04:00
Al Viro	668696dcbb	switch path_mountpoint() to struct filename Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-24 17:19:15 -04:00
Al Viro	5eb6b495c6	switch path_lookupat() to struct filename all callers were passing it ->name of some struct filename Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-24 17:19:15 -04:00
Al Viro	94b5d2621a	getname_flags(): clean up a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-24 17:19:14 -04:00
Linus Torvalds	4541c22605	Driver core fixes for 4.0-rc5 Here are two bugfixes for things reported. One regression in kernfs, and another issue fixed in the LZ4 code that was fixed in the "upstream" codebase that solves a reported kernel crash. Both have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlUOl6YACgkQMUfUDdst+ylRjQCfT46/we0U/S38/sTNYtElQXUI ZEsAnAwozBj6nIYOydxaKnA0BcujsSGz =unYd -----END PGP SIGNATURE----- Merge tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are two bugfixes for things reported. One regression in kernfs, and another issue fixed in the LZ4 code that was fixed in the "upstream" codebase that solves a reported kernel crash Both have been in linux-next for a while" * tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: LZ4 : fix the data abort issue kernfs: handle poll correctly on 'direct_read' files.	2015-03-22 12:07:47 -07:00
Linus Torvalds	521d474631	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Most of these are fixing extent reservation accounting, or corners with tree writeback during commit. Josef's set does add a test, which isn't strictly a fix, but it'll keep us from making this same mistake again" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix outstanding_extents accounting in DIO Btrfs: add sanity test for outstanding_extents accounting Btrfs: just free dummy extent buffers Btrfs: account merges/splits properly Btrfs: prepare block group cache before writing Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list) Btrfs: account for the correct number of extents for delalloc reservations Btrfs: fix merge delalloc logic Btrfs: fix comp_oper to get right order Btrfs: catch transaction abortion after waiting for it btrfs: fix sizeof format specifier in btrfs_check_super_valid()	2015-03-21 10:53:37 -07:00
Linus Torvalds	0d122f7430	Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux Pull nfsd bufix from Bruce Fields: "This is a fix for a crash easily triggered by 4.1 activity to a server built with CONFIG_NFSD_PNFS. There are some more bugfixes queued up that I intend to pass along next week, but this is the most critical" * 'for-4.0' of git://linux-nfs.org/~bfields/linux: Subject: nfsd: don't recursively call nfsd4_cb_layout_fail	2015-03-21 10:41:15 -07:00
Linus Torvalds	1e744c938d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "This fixes bugs in zero-copy splice to the fuse device" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: explicitly set /dev/fuse file's private_data fuse: set stolen page uptodate fuse: notify: don't move pages	2015-03-19 16:36:24 -07:00
Linus Torvalds	e409ac3550	Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This fixes minor issues with the multi-layer update in v4.0" * 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: upper fs should not be R/O ovl: check lowerdir amount for non-upper mount ovl: print error message for invalid mount options	2015-03-19 16:27:36 -07:00
Christoph Hellwig	133d558216	Subject: nfsd: don't recursively call nfsd4_cb_layout_fail Due to a merge error when creating `c5c707f9` ("nfsd: implement pNFS layout recalls"), we recursively call nfsd4_cb_layout_fail from itself, leading to stack overflows. Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: `c5c707f9` ("nfsd: implement pNFS layout recalls") Signed-off-by: J. Bruce Fields <bfields@redhat.com> --- fs/nfsd/nfs4layouts.c \| 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c index 3c1bfa1..1028a06 100644 --- a/fs/nfsd/nfs4layouts.c +++ b/fs/nfsd/nfs4layouts.c @@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid ls) rpc_ntop((struct sockaddr )&clp->cl_addr, addr_str, sizeof(addr_str)); - nfsd4_cb_layout_fail(ls); - printk(KERN_WARNING "nfsd: client %s failed to respond to layout recall. " " Fencing..\n", addr_str); -- 1.9.1	2015-03-19 15:49:27 -04:00
Tom Van Braeckel	94e4fe2cab	fuse: explicitly set /dev/fuse file's private_data The misc subsystem (which is used for /dev/fuse) initializes private_data to point to the misc device when a driver has registered a custom open file operation, and initializes it to NULL when a custom open file operation has not been provided. This subtle quirk is confusing, to the point where kernel code registers empty file open operations to have private_data point to the misc device structure. And it leads to bugs, where the addition or removal of a custom open file operation surprisingly changes the initial contents of a file's private_data structure. So to simplify things in the misc subsystem, a patch [1] has been proposed to always set the private_data to point to the misc device, instead of only doing this when a custom open file operation has been registered. But before this patch can be applied we need to modify drivers that make the assumption that a misc device file's private_data is initialized to NULL because they didn't register a custom open file operation, so they don't rely on this assumption anymore. FUSE uses private_data to store the fuse_conn and errors out if this is not initialized to NULL at mount time. Hence, we now set a file's private_data to NULL explicitly, to be independent of whatever value the misc subsystem initializes it to by default. [1] https://lkml.org/lkml/2014/12/4/939 Reported-by: Giedrius Statkevicius <giedriuswork@gmail.com> Reported-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-03-19 15:29:22 +01:00
hujianyang	71cbad7e69	ovl: upper fs should not be R/O After importing multi-lower layer support, users could mount a r/o partition as the left most lowerdir instead of using it as upperdir. And a r/o upperdir may cause an error like overlayfs: failed to create directory ./workdir/work during mount. This patch check the s_flags of upper fs and return an error if it is a r/o partition. The checking of upper_mnt->mnt_sb->s_flags can be removed now. This patch also remove /* FIXME: workdir is not needed for a R/O mount */ from ovl_fill_super() because: 1) for upper fs r/o case Setting a r/o partition as upper is prevented, no need to care about workdir in this case. 2) for "mount overlay -o ro" with a r/w upper fs case Users could remount overlayfs to r/w in this case, so workdir should not be omitted. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-03-18 10:29:48 +01:00
hujianyang	6be4506e34	ovl: check lowerdir amount for non-upper mount Recently multi-lower layer mount support allow upperdir and workdir to be omitted, then cause overlayfs can be mount with only one lowerdir directory. This action make no sense and have potential risk. This patch check the total number of lower directories to prevent mounting overlayfs with only one directory. Also, an error message is added to indicate lower directories exceed OVL_MAX_STACK limit. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-03-18 10:29:47 +01:00
hujianyang	bead55ef77	ovl: print error message for invalid mount options Overlayfs should print an error message if an incorrect mount option is caught like other filesystems. After this patch, improper option input could be clearly known. Reported-by: Fabian Sturm <fabian.sturm@aduu.de> Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-03-18 10:29:47 +01:00
Josef Bacik	e1cbbfa5f5	Btrfs: fix outstanding_extents accounting in DIO We are keeping track of how many extents we need to reserve properly based on the amount we want to write, but we were still incrementing outstanding_extents if we wrote less than what we requested. This isn't quite right since we will be limited to our max extent size. So instead lets do something horrible! Keep track of how many outstanding_extents we reserved, and decrement each time we allocate an extent. If we use our entire reserve make sure to jack up outstanding_extents on the inode so the accounting works out properly. Thanks, Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <jbacik@fb.com>	2015-03-17 16:36:35 -04:00
Josef Bacik	6a3891c551	Btrfs: add sanity test for outstanding_extents accounting I introduced a regression wrt outstanding_extents accounting. These are tricky areas that aren't easily covered by xfstests as we could change MAX_EXTENT_SIZE at any time. So add sanity tests to cover the various conditions that are tricky in order to make sure we don't introduce regressions in the future. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com>	2015-03-17 16:36:31 -04:00
Josef Bacik	bcb7e449ec	Btrfs: just free dummy extent buffers If we fail during our sanity tests we could get NULL deref's because we unload the module before the dummy extent buffers are free'd via RCU. So check for this case and just free the things directly. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com>	2015-03-17 16:30:18 -04:00
Josef Bacik	ba11721355	Btrfs: account merges/splits properly My fix Btrfs: fix merge delalloc logic only fixed half of the problems, it didn't fix the case where we have two large extents on either side and then join them together with a new small extent. We need to instead keep track of how many extents we have accounted for with each side of the new extent, and then see how many extents we need for the new large extent. If they match then we know we need to keep our reservation, otherwise we need to drop our reservation. This shows up with a case like this [BTRFS_MAX_EXTENT_SIZE+4K][4K HOLE][BTRFS_MAX_EXTENT_SIZE+4K] Previously the logic would have said that the number extents required for the new size (3) is larger than the number of extents required for the largest side (2) therefore we need to keep our reservation. But this isn't the case, since both sides require a reservation of 2 which leads to 4 for the whole range currently reserved, but we only need 3, so we need to drop one of the reservations. The same problem existed for splits, we'd think we only need 3 extents when creating the hole but in reality we need 4. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com>	2015-03-17 16:28:21 -04:00
Kirill A. Shutemov	ab676b7d6f	pagemap: do not leak physical addresses to non-privileged userspace As pointed by recent post[1] on exploiting DRAM physical imperfection, /proc/PID/pagemap exposes sensitive information which can be used to do attacks. This disallows anybody without CAP_SYS_ADMIN to read the pagemap. [1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html [ Eventually we might want to do anything more finegrained, but for now this is the simple model. - Linus ] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Seaborn <mseaborn@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-03-17 09:31:30 -07:00
Josef Bacik	dcdf7f6ddb	Btrfs: prepare block group cache before writing Writing the block group cache will modify the extent tree quite a bit because it truncates the old space cache and pre-allocates new stuff. To try and cut down on the churn lets do the setup dance first, then later on hopefully we can avoid looping with newly dirtied roots. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com>	2015-03-17 10:56:55 -04:00
NeilBrown	7cff4b1836	kernfs: handle poll correctly on 'direct_read' files. Kernfs supports two styles of read: direct_read and seqfile_read. The latter supports 'poll' correctly thanks to the update of '->event' in kernfs_seq_show. The former does not as '->event' is never updated on a read. So add an appropriate update in kernfs_file_direct_read(). This was noticed because some 'md' sysfs attributes were recently changed to use direct reads. Reported-by: Prakash Punnoor <prakash@punnoor.de> Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Fixes: `750f199ee8` Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-16 21:51:20 +01:00
Jeff Layton	a9b1b455c5	locks: fix generic_delete_lease tracepoint to use victim pointer It's possible that "fl" won't point at a valid lock at this point, so use "victim" instead which is either a valid lock or NULL. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>	2015-03-14 09:45:35 -04:00
Josef Bacik	ea526d1899	Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list) Dave could hit this assert consistently running btrfs/078. This is because when we update the block groups we could truncate the free space, which would try to delete the csums for that range and dirty the csum root. For this to happen we have to have already written out the csum root so it's kind of hard to hit this case. This patch fixes this by changing the logic to only write the dirty block groups if the dirty_cowonly_roots list is empty. This will get us the same effect as before since we add the extent root last, and will cover the case that we dirty some other root again but not the extent root. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-03-13 13:47:04 -07:00
Josef Bacik	6a41dd0922	Btrfs: account for the correct number of extents for delalloc reservations Direct IO can easily pass in an buffer that is greater than BTRFS_MAX_EXTENT_SIZE, so take this into account when reserving extents in the delalloc reservation code. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-03-13 13:46:59 -07:00
Josef Bacik	8461a3de77	Btrfs: fix merge delalloc logic My patch to properly count outstanding extents wrt MAX_EXTENT_SIZE introduced a regression when re-dirtying already dirty areas. We have logic in split to make sure we are taking the largest space into account but didn't have it for merge, so it was sometimes making us think we were turning a tiny extent into a huge extent, when in reality we already had a huge extent and needed to use the other side in our logic. This fixes the regression that was reported by a user on list. Thanks, Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-03-13 13:46:59 -07:00
Liu Bo	48da5f0a4c	Btrfs: fix comp_oper to get right order Case (oper1->seq > oper2->seq) should differ with case (oper1->seq < oper2->seq). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-03-13 13:46:59 -07:00
Liu Bo	b4924a0fa1	Btrfs: catch transaction abortion after waiting for it This problem is uncovered by a test case: http://patchwork.ozlabs.org/patch/244297. Fsync() can report success when it actually doesn't. When we have several threads running fsync() at the same tiem and in one fsync() we get a transaction abortion due to some problems(in the test case it's disk failures), and other fsync()s may return successfully which makes userspace programs think that data is now safely flushed into disk. It's because that after fsyncs() fail btrfs_sync_log() due to disk failures, they get to try btrfs_commit_transaction() where it finds that there is already a transaction being committed, and they'll just call wait_for_commit() and return. Note that we actually check "trans->aborted" in btrfs_end_transaction, but it's likely that the error message is still not yet throwed out and only after wait_for_commit() we're sure whether the transaction is committed successfully. This add the necessary check and it now passes the test. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-03-13 13:38:23 -07:00
Fabian Frederick	d22071293f	btrfs: fix sizeof format specifier in btrfs_check_super_valid() This patch fixes mips compilation warning: fs/btrfs/disk-io.c: In function 'btrfs_check_super_valid': fs/btrfs/disk-io.c:3927:21: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat] Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Chris Mason <clm@fb.com>	2015-03-13 13:38:22 -07:00
Christoph Hellwig	04b2fa9f8f	fs: split generic and aio kiocb Most callers in the kernel want to perform synchronous file I/O, but still have to bloat the stack with a full struct kiocb. Split out the parts needed in filesystem code from those in the aio code, and only allocate those needed to pass down argument on the stack. The aio code embedds the generic iocb in the one it allocates and can easily get back to it by using container_of. Also add a ->ki_complete method to struct kiocb, this is used to call into the aio code and thus removes the dependency on aio for filesystems impementing asynchronous operations. It will also allow other callers to substitute their own completion callback. We also add a new ->ki_flags field to work around the nasty layering violation recently introduced in commit 5e33f6 ("usb: gadget: ffs: add eventfd notification about ffs events"). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-13 12:10:27 -04:00
Christoph Hellwig	599bd19bdc	fs: don't allow to complete sync iocbs through aio_complete The AIO interface is fairly complex because it tries to allow filesystems to always work async and then wakeup a synchronous caller through aio_complete. It turns out that basically no one was doing this to avoid the complexity and context switches, and we've already fixed up the remaining users and can now get rid of this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-13 12:10:22 -04:00
Christoph Hellwig	9d5722b777	fuse: handle synchronous iocbs internally Based on a patch from Maxim Patlasov <MPatlasov@parallels.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-13 12:10:15 -04:00
Christoph Hellwig	66ee59af63	fs: remove ki_nbytes There is no need to pass the total request length in the kiocb, as we already get passed in through the iov_iter argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-12 23:50:23 -04:00
Suzuki K. Poulose	b3c1030d50	fanotify: fix event filtering with FAN_ONDIR set With FAN_ONDIR set, the user can end up getting events, which it hasn't marked. This was revealed with fanotify04 testcase failure on Linux-4.0-rc1, and is a regression from 3.19, revealed with `66ba93c0d7` ("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask"). # /opt/ltp/testcases/bin/fanotify04 [ ... ] fanotify04 7 TPASS : event generated properly for type 100000 fanotify04 8 TFAIL : fanotify04.c:147: got unexpected event 30 fanotify04 9 TPASS : No event as expected The testcase sets the adds the following marks : FAN_OPEN \| FAN_ONDIR for a fanotify on a dir. Then does an open(), followed by close() of the directory and expects to see an event FAN_OPEN(0x20). However, the fanotify returns (FAN_OPEN\|FAN_CLOSE_NOWRITE(0x10)). This happens due to the flaw in the check for event_mask in fanotify_should_send_event() which does: if (event_mask & marks_mask & ~marks_ignored_mask) return true; where, event_mask == (FAN_ONDIR \| FAN_CLOSE_NOWRITE), marks_mask == (FAN_ONDIR \| FAN_OPEN), marks_ignored_mask == 0 Fix this by masking the outgoing events to the user, as we already take care of FAN_ONDIR and FAN_EVENT_ON_CHILD. Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-03-12 18:46:08 -07:00
Ryusuke Konishi	283ee1482f	nilfs2: fix deadlock of segment constructor during recovery According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck during recovery at mount time. The code path that caused the deadlock was as follows: nilfs_fill_super() load_nilfs() nilfs_salvage_orphan_logs() * Do roll-forwarding, attach segment constructor for recovery, and kick it. nilfs_segctor_thread() nilfs_segctor_thread_construct() * A lock is held with nilfs_transaction_lock() nilfs_segctor_do_construct() nilfs_segctor_drop_written_files() iput() iput_final() write_inode_now() writeback_single_inode() __writeback_single_inode() do_writepages() nilfs_writepage() nilfs_construct_dsync_segment() nilfs_transaction_lock() --> deadlock This can happen if commit `7ef3ff2fea` ("nilfs2: fix deadlock of segment constructor over I_SYNC flag") is applied and roll-forward recovery was performed at mount time. The roll-forward recovery can happen if datasync write is done and the file system crashes immediately after that. For instance, we can reproduce the issue with the following steps: < nilfs2 is mounted on /nilfs (device: /dev/sdb1) > # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k count=1 && reboot -nfh < the system will immediately reboot > # mount -t nilfs2 /dev/sdb1 /nilfs The deadlock occurs because iput() can run segment constructor through writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags. The above commit changed segment constructor so that it calls iput() asynchronously for inodes with i_nlink == 0, but that change was imperfect. This fixes the another deadlock by deferring iput() in segment constructor even for the case that mount is not finished, that is, for the case that MS_ACTIVE flag is not set. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: Yuxuan Shui <yshuiv7@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-03-12 18:46:08 -07:00
Mark Fasheh	18d585f0f2	ocfs2: make append_dio an incompat feature It turns out that making this feature ro_compat isn't quite enough to prevent accidental corruption on mount from older kernels. Ocfs2 (like other file systems) will process orphaned inodes even when the user mounts in 'ro' mode. So for the case of a filesystem not knowing the append_dio feature, mounting the filesystem could result in orphaned-for-dio files being deleted, which we clearly don't want. So instead, turn this into an incompat flag. Btw, this is kind of my fault - initially I asked that we add a flag to cover the feature and even suggested that we use an ro flag. It wasn't until I was looking through our commits for v4.0-rc1 that I realized we actually want this to be incompat. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-03-12 18:46:07 -07:00
Linus Torvalds	84399bb075	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Outside of misc fixes, Filipe has a few fsync corners and we're pulling in one more of Josef's fixes from production use here" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. Btrfs: fix data loss in the fast fsync path Btrfs: remove extra run_delayed_refs in update_cowonly_root Btrfs: incremental send, don't rename a directory too soon btrfs: fix lost return value due to variable shadowing Btrfs: do not ignore errors from btrfs_lookup_xattr in do_setxattr Btrfs: fix off-by-one logic error in btrfs_realloc_node Btrfs: add missing inode update when punching hole Btrfs: abort the transaction if we fail to update the free space cache inode Btrfs: fix fsync race leading to ordered extent memory leaks	2015-03-06 13:52:54 -08:00

1 2 3 4 5 ...

39725 Commits