Commit Graph

53419 Commits

Author SHA1 Message Date
David Howells
a9e5b73288 vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion
In do_mount() when the MS_* flags are being converted to MNT_* flags,
MS_RDONLY got accidentally convered to SB_RDONLY.

Undo this change.

Fixes: e462ec50cb ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20 09:59:33 -07:00
David Howells
660625922b afs: Fix server record deletion
AFS server records get removed from the net->fs_servers tree when
they're deleted, but not from the net->fs_addresses{4,6} lists, which
can lead to an oops in afs_find_server() when a server record has been
removed, for instance during rmmod.

Fix this by deleting the record from the by-address lists before posting
it for RCU destruction.

The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record
alive, so the oops would only happen when a fileserver eventually gets
bored and stops pinging or if the module gets rmmod'd and a call comes
in from the fileserver during the window between the server records
being destroyed and the socket being closed.

The oops looks something like:

  BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
  ...
  Workqueue: kafsd afs_process_async_call [kafs]
  RIP: 0010:afs_find_server+0x271/0x36f [kafs]
  ...
  Call Trace:
   afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
   afs_deliver_to_call+0x1ee/0x5e8 [kafs]
   afs_process_async_call+0x5b/0xd0 [kafs]
   process_one_work+0x2c2/0x504
   worker_thread+0x1d4/0x2ac
   kthread+0x11f/0x127
   ret_from_fork+0x24/0x30

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20 09:59:33 -07:00
Linus Torvalds
b9abdcfd10 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Assorted fixes.

  Some of that is only a matter with fault injection (broken handling of
  small allocation failure in various mount-related places), but the
  last one is a root-triggerable stack overflow, and combined with
  userns it gets really nasty ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Don't leak MNT_INTERNAL away from internal mounts
  mm,vmscan: Allow preallocating memory for register_shrinker().
  rpc_pipefs: fix double-dput()
  orangefs_kill_sb(): deal with allocation failures
  jffs2_kill_sb(): deal with failed allocations
  hypfs_kill_super(): deal with failed allocations
2018-04-20 09:15:14 -07:00
Linus Torvalds
43f70c9601 Minor cleanups and a bug fix to completely ignore unencrypted filenames
in the lower filesystem when filename encryption is enabled at the
 eCryptfs layer.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJa2LwFAAoJENaSAD2qAscKt00P+wSOy/iUn7wd3L/mtjN+B34l
 8MaBNoruoZ87pPPHm4ZjB85zk5fSzwQl/jKgS8y6dRA1bxcwsUYRMooTIsiOBFeg
 c1BEWfJVL8uq5AtmZ+czwUD2+DNn/U6S+XejXKr/bQyzOOgUtAeX8IPERPzGqJkB
 dXbz21NLBpICxslIHTTHCpCWodm8B9SJSrBeKd1WIad1d+ITjB6fnyHyIlEyFxlk
 yxkXLVZskR/yihIQ8JugJ4Li0nWz4KcdAvS2fbSIG5zgkAtLS9ypPAAQrfCf9pCA
 BYc4Q2Ai425tgVqAwK2uHXOzy7/WsPL4yrIXt/04vTra1XcK9cv2EdojYi1cK04W
 /a6xIWzFtp7HYEhuHZiVdvSTRXEOF9IYUXTw/gavE0KvMZ36k2FU9bLo4WKmx/91
 atr+9p4vE6uA932YY54Wit++IjpLGN7FzXYxVVoACVz7HJ+sESqytPiLVX5iufux
 /xrrbmlEpLEeJa6/F1Yboo9R7aclrMsxEz4Y+Txkk/zmjzP8wYYiYodOCJ23pauY
 5MJruAxLtlcmAzW7FbES/Id+6Rfj42BuzXi3TRceiUv8NWmDTk3y205o94CfJW5U
 mxEiFgb7Nxr1UsL66rZGmpnwcfJY4FFEyuXRGEmATBquEGJPTqhku9HKUkmORa7I
 mYJzK27VX7dE2lAzuMpr
 =bwS0
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
 "Minor cleanups and a bug fix to completely ignore unencrypted
  filenames in the lower filesystem when filename encryption is enabled
  at the eCryptfs layer"

* tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: don't pass up plaintext names when using filename encryption
  ecryptfs: fix spelling mistake: "cadidate" -> "candidate"
  ecryptfs: lookup: Don't check if mount_crypt_stat is NULL
2018-04-20 09:08:37 -07:00
Linus Torvalds
0d9cf33b4a \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlrXYi4ACgkQnJ2qBz9k
 QNmp5Af9HhwGDuTgY+OcVEeZzXQ30BbFQNP4cby5LZ3Wmffio8T97c2DnlDDjr2E
 l8Biq5HTm3Gg5x7WEuopyrS6L4on76RDyr1Ohf34mOoIcxdcWQ7XpiyA8jeP9uLn
 9IJSPYPbA2RsQk4yr8GBRlVQh/udJfSG0vQOQ5cPylgICLsfxuMIC294vFfcgDt1
 wLOy24INlJYbyECMIQKrleyoZQF37Cv7v12KPU0w3eLtuNX0rdI4gwpOS4eEb27G
 2KrAsRYCb5KyFl3mRXBeCPNytiJ3ffFgvFC1Z57GVWVmNnadj/Jctw3zFirsMC2d
 j7IUA4XI/T+1Gql5rCOXcSgeJ0LRZA==
 =KQzD
 -----END PGP SIGNATURE-----

Merge tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

 - isofs memory leak fix

 - two fsnotify fixes of event mask handling

 - udf fix of UTF-16 handling

 - couple other smaller cleanups

* tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fix leak of UTF-16 surrogates into encoded strings
  fs: ext2: Adding new return type vm_fault_t
  isofs: fix potential memory leak in mount option parsing
  MAINTAINERS: add an entry for FSNOTIFY infrastructure
  fsnotify: fix typo in a comment about mark->g_list
  fsnotify: fix ignore mask logic in send_to_group()
  isofs compress: Remove VLA usage
  fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
  fanotify: fix logic of events on child
2018-04-20 09:01:26 -07:00
Al Viro
16a34adb93 Don't leak MNT_INTERNAL away from internal mounts
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies.  As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.

Cc: stable@kernel.org
Reported-by: Alexander Aring <aring@mojatatu.com>
Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-19 23:52:15 -04:00
Jan Kara
44f06ba829 udf: Fix leak of UTF-16 surrogates into encoded strings
OSTA UDF specification does not mention whether the CS0 charset in case
of two bytes per character encoding should be treated in UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way but on systems such as Windows which work in UTF-16
internally, filenames would be treated as being in UTF-16 effectively.
In Linux it is more difficult to handle characters outside of Base
Multilingual plane (beyond 0xffff) as NLS framework works with 2-byte
characters only. Just make sure we don't leak UTF-16 surrogates into the
resulting string when loading names from the filesystem for now.

CC: stable@vger.kernel.org # >= v4.6
Reported-by: Mingye Wang <arthur200126@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-04-18 16:34:55 +02:00
Tyler Hicks
e86281e700 eCryptfs: don't pass up plaintext names when using filename encryption
Both ecryptfs_filldir() and ecryptfs_readlink_lower() use
ecryptfs_decode_and_decrypt_filename() to translate lower filenames to
upper filenames. The function correctly passes up lower filenames,
unchanged, when filename encryption isn't in use. However, it was also
passing up lower filenames when the filename wasn't encrypted or
when decryption failed. Since 88ae4ab980, eCryptfs refuses to lookup
lower plaintext names when filename encryption is enabled so this
resulted in a situation where userspace would see lower plaintext
filenames in calls to getdents(2) but then not be able to lookup those
filenames.

An example of this can be seen when enabling filename encryption on an
eCryptfs mount at the root directory of an Ext4 filesystem:

$ ls -1i /lower
12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
11 lost+found
$ ls -1i /upper
ls: cannot access '/upper/lost+found': No such file or directory
 ? lost+found
12 test

With this change, the lower lost+found dentry is ignored:

$ ls -1i /lower
12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
11 lost+found
$ ls -1i /upper
12 test

Additionally, some potentially noisy error/info messages in the related
code paths are turned into debug messages so that the logs can't be
easily filled.

Fixes: 88ae4ab980 ("ecryptfs_lookup(): try either only encrypted or plaintext name")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2018-04-16 18:51:22 +00:00
Souptick Joarder
0685693811 fs: ext2: Adding new return type vm_fault_t
Use new return type vm_fault_t for page_mkwrite,
pfn_mkwrite and fault handler.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-04-16 09:52:24 +02:00
Chengguang Xu
4f34a5130a isofs: fix potential memory leak in mount option parsing
When specifying string type mount option (e.g., iocharset)
several times in a mount, current option parsing may
cause memory leak. Hence, call kfree for previous one
in this case. Meanwhile, check memory allocation result
for it.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-04-16 09:47:41 +02:00
Yan, Zheng
ffdeec7aa4 ceph: always update atime/mtime/ctime for new inode
For new inode, atime/mtime/ctime are uninitialized.  Don't compare
against them.

Cc: stable@kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16 09:38:40 +02:00
Tetsuo Handa
8e04944f0e mm,vmscan: Allow preallocating memory for register_shrinker().
syzbot is catching so many bugs triggered by commit 9ee332d99e
("sget(): handle failures of register_shrinker()"). That commit expected
that calling kill_sb() from deactivate_locked_super() without successful
fill_super() is safe, but the reality was different; some callers assign
attributes which are needed for kill_sb() after sget() succeeds.

For example, [1] is a report where sb->s_mode (which seems to be either
FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
assigned unless sget() succeeds. But it does not worth complicate sget()
so that register_shrinker() failure path can safely call
kill_block_super() via kill_sb(). Making alloc_super() fail if memory
allocation for register_shrinker() failed is much simpler. Let's avoid
calling deactivate_locked_super() from sget_userns() by preallocating
memory for the shrinker and making register_shrinker() in sget_userns()
never fail.

[1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-16 02:06:47 -04:00
Al Viro
659038428c orangefs_kill_sb(): deal with allocation failures
orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
oops in that case.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-15 23:49:12 -04:00
Al Viro
c66b23c284 jffs2_kill_sb(): deal with failed allocations
jffs2_fill_super() might fail to allocate jffs2_sb_info;
jffs2_kill_sb() must survive that.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-15 23:49:05 -04:00
Linus Torvalds
e37563bb6c for-4.17-part2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlrTQ4sACgkQxWXV+ddt
 WDti3Q/+MAeqsLTjvre2RQ3ka5hNyCuVftUIBmcP3YfJbt+xZYQyaewW4Xkfi3cm
 cbJE+zehzf5ag+RJhxk3OwFTvNfLGIO9asWs3b08NGUi6VzwL0/8B/iOdZPuHSAV
 TrecQIBE2Tp+xax9cQEnxav34D4dUtXNaDweGjp1MIIUkDneQP/I0vlTu7vafBgX
 UVxP6riL/MCs7sjTHGIPs0lv8L/fgdmo+dk5SnNuIPTOcFTQXgVrtHjw9IvbKWd4
 aq+sbNWoSrhXUfllbFg/wZqDe9tWn9E2f6m/H0ThSoNdxusSVgacOjFRYh20NKLW
 WGB8Amd/ItGtJwJ1CIypa7VX2U11UAi0XT7BeiK82rUNEJ6moRqFOXG861gRLoTZ
 SpH8uWO+e+CogfXob1KCndn5lot4AM2ZTkCqfrjpM35Nul72PZdne0CxNlmiRupY
 Fdt5GB+sg8plcMaRiYr++BbbHP5tggX1MrhLGEbx2XBs2eRdn+2Lv2I1Ig/U4NUb
 Vf+xk/tFLKGOTSZlbv7SV7ekXxG/3+7gAuL7A1XMETZCwBF4L3hwyW7CgkEBb3PC
 TqX8BwMaRpyp/FgW/QL6edjXZ3a64VaHIfqRPNks3lWWcCHVbzyiVPfEjx+AEJ/0
 abx6jXTJhLUBPuPxEgb5rsSv62RoxPoYHqCErrG95XwnZ8rSCts=
 =I0y0
 -----END PGP SIGNATURE-----

Merge tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs updates from David Sterba:
 "We have queued a few more fixes (error handling, log replay,
  softlockup) and the rest is SPDX updates that touche almost all files
  so the diffstat is long"

* tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Only check first key for committed tree blocks
  btrfs: add SPDX header to Kconfig
  btrfs: replace GPL boilerplate by SPDX -- sources
  btrfs: replace GPL boilerplate by SPDX -- headers
  Btrfs: fix loss of prealloc extents past i_size after fsync log replay
  Btrfs: clean up resources during umount after trans is aborted
  btrfs: Fix possible softlock on single core machines
  Btrfs: bail out on error during replay_dir_deletes
  Btrfs: fix NULL pointer dereference in log_dir_items
2018-04-15 18:08:35 -07:00
Linus Torvalds
09c9b0eaa0 SMB3 fixes, a few for stable, and some important cleanup work from Ronnie of the smb3 transport code
-----BEGIN PGP SIGNATURE-----
 
 iQGwBAABCAAaBQJa0mDhExxzbWZyZW5jaEBnbWFpbC5jb20ACgkQiiy9cAdyT1HA
 EQv+KxZAFONG/YGBIdoyJevnSobzZSTYgsrEmox28rIvWSFr6Xkt9K6WN4lki+Td
 nacByZBhyeKPp7CxdgjmiNunbGfBZJsWk5LwtgdE2jOLjM9CFq8Z7m2qmkXQ4IkJ
 EoH+NWpCUvEe9nj8oaTdSelR2OHH63SG5E5dK29eOB59ixcVe7qKUG29SpJks9qd
 mCdZyhYyuzh469CKGFeQ8qfWopBH+QFBs4SrpQ9d4liNtZ3W8zTsQ3ezX069AGlU
 Ef2xVTT1DtlZ4/6SiXBKlXGH3jjUg67vEGHzT3ZJwfLVNhTLEpKM/hsPiduSnjS8
 JqdVRdFLc2lQzr4HknhNR9yuE5wT5tZM4iZOjNDwT6Sef2Mt0QIBuYI6kb4mv6Qg
 aCVv7uXUMdvSrTk0mYiO1dXYozGUC+uFhJCuaiLax583o3ihp+/INDosQyqTbavZ
 amRBZpWeNysSJwwJu21RdJiM74+fasR/CDy11U94ssxplg1qE72M8DShK7kXtUIy
 g9ZW
 =Ah5z
 -----END PGP SIGNATURE-----

Merge tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "SMB3 fixes, a few for stable, and some important cleanup work from
  Ronnie of the smb3 transport code"

* tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: change validate_buf to validate_iov
  cifs: remove rfc1002 hardcoded constants from cifs_discard_remaining_data()
  cifs: Change SMB2_open to return an iov for the error parameter
  cifs: add resp_buf_size to the mid_q_entry structure
  smb3.11: replace a 4 with server->vals->header_preamble_size
  cifs: replace a 4 with server->vals->header_preamble_size
  cifs: add pdu_size to the TCP_Server_Info structure
  SMB311: Improve checking of negotiate security contexts
  SMB3: Fix length checking of SMB3.11 negotiate request
  CIFS: add ONCE flag for cifs_dbg type
  cifs: Use ULL suffix for 64-bit constant
  SMB3: Log at least once if tree connect fails during reconnect
  cifs: smb2pdu: Fix potential NULL pointer dereference
2018-04-15 18:06:22 -07:00
Linus Torvalds
18b7fd1c93 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - various hotfixes

 - kexec_file updates and feature work

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  kernel/kexec_file.c: move purgatories sha256 to common code
  kernel/kexec_file.c: allow archs to set purgatory load address
  kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
  kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: split up __kexec_load_puragory
  kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
  kernel/kexec_file.c: search symbols in read-only kexec_purgatory
  kernel/kexec_file.c: make purgatory_info->ehdr const
  kernel/kexec_file.c: remove checks in kexec_purgatory_load
  include/linux/kexec.h: silence compile warnings
  kexec_file, x86: move re-factored code to generic side
  x86: kexec_file: clean up prepare_elf64_headers()
  x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
  x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
  x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
  kexec_file,x86,powerpc: factor out kexec_file_ops functions
  kexec_file: make use of purgatory optional
  proc: revalidate misc dentries
  mm, slab: reschedule cache_reap() on the same CPU
  ...
2018-04-14 08:50:50 -07:00
Alexey Dobriyan
1da4d377f9 proc: revalidate misc dentries
If module removes proc directory while another process pins it by
chdir'ing to it, then subsequent recreation of proc entry and all
entries down the tree will not be visible to any process until pinning
process unchdir from directory and unpins everything.

Steps to reproduce:

	proc_mkdir("aaa", NULL);
	proc_create("aaa/bbb", ...);

		chdir("/proc/aaa");

	remove_proc_entry("aaa/bbb", NULL);
	remove_proc_entry("aaa", NULL);

	proc_mkdir("aaa", NULL);
	# inaccessible because "aaa" dentry still points
	# to the original "aaa".
	proc_create("aaa/bbb", ...);

Fix is to implement ->d_revalidate and ->d_delete.

Link: http://lkml.kernel.org/r/20180312201938.GA4871@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
Linus Torvalds
48023102b7 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "In addition to bug fixes and cleanups there are two new features from
  Amir:

   - Consistent inode number support for the case when layers are not
     all on the same filesystem (feature is dubbed "xino").

   - Optimize overlayfs file handle decoding. This one touches the
     exportfs interface to allow detecting the disconnected directory
     case"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: update documentation w.r.t "xino" feature
  ovl: add support for "xino" mount and config options
  ovl: consistent d_ino for non-samefs with xino
  ovl: consistent i_ino for non-samefs with xino
  ovl: constant st_ino for non-samefs with xino
  ovl: allocate anon bdev per unique lower fs
  ovl: factor out ovl_map_dev_ino() helper
  ovl: cleanup ovl_update_time()
  ovl: add WARN_ON() for non-dir redirect cases
  ovl: cleanup setting OVL_INDEX
  ovl: set d->is_dir and d->opaque for last path element
  ovl: Do not check for redirect if this is last layer
  ovl: lookup in inode cache first when decoding lower file handle
  ovl: do not try to reconnect a disconnected origin dentry
  ovl: disambiguate ovl_encode_fh()
  ovl: set lower layer st_dev only if setting lower st_ino
  ovl: fix lookup with middle layer opaque dir and absolute path redirects
  ovl: Set d->last properly during lookup
  ovl: set i_ino to the value of st_ino for NFS export
2018-04-13 16:55:41 -07:00
Qu Wenruo
5d41be6f70 btrfs: Only check first key for committed tree blocks
When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
kernel warning due to first key verification:

[ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.523830] Modules linked in:
[ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.527101] Call Trace:
[ 4239.527251]  read_tree_block+0x42/0x70
[ 4239.527434]  read_node_slot+0xd2/0x110
[ 4239.527632]  push_leaf_right+0xad/0x1b0
[ 4239.527809]  split_leaf+0x4ea/0x700
[ 4239.527988]  ? leaf_space_used+0xbc/0xe0
[ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
[ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
[ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
[ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
[ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
[ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
[ 4239.529445]  ? start_transaction+0xa8/0x4f0
[ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
[ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
[ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
[ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
[ 4239.532098]  generic_file_direct_write+0xcb/0x170
[ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
[ 4239.532491]  aio_write+0xe2/0x180
[ 4239.532669]  ? lock_acquire+0xac/0x1e0
[ 4239.532839]  ? __might_fault+0x3e/0x90
[ 4239.533032]  do_io_submit+0x594/0x860
[ 4239.533223]  ? do_io_submit+0x594/0x860
[ 4239.533398]  SyS_io_submit+0x10/0x20
[ 4239.533560]  ? SyS_io_submit+0x10/0x20
[ 4239.533729]  do_syscall_64+0x75/0x1d0
[ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 4239.534182] RIP: 0033:0x7f8519741697

The problem here is, at btree_read_extent_buffer_pages() we don't have
acquired read/write lock on that extent buffer, only basic info like
level/bytenr is reliable.

So race condition leads to such false alert.

However in current call site, it's impossible to acquire proper lock
without race window.
To fix the problem, we only verify first key for committed tree blocks
(whose generation is no larger than fs_info->last_trans_committed), so
the content of such tree blocks will not change and there is no need to
get read/write lock.

Reported-by: Nikolay Borisov <nborisov@suse.com>
Fixes: 581c176041 ("btrfs: Validate child tree block's level and first key")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-13 16:16:15 +02:00
Amir Goldstein
92183a4289 fsnotify: fix ignore mask logic in send_to_group()
The ignore mask logic in send_to_group() does not match the logic
in fanotify_should_send_event(). In the latter, a vfsmount mark ignore
mask precedes an inode mark mask and in the former, it does not.

That difference may cause events to be sent to fanotify backend for no
reason. Fix the logic in send_to_group() to match that of
fanotify_should_send_event().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-04-13 15:52:49 +02:00
Ronnie Sahlberg
c1596ff524 cifs: change validate_buf to validate_iov
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 20:32:55 -05:00
Ronnie Sahlberg
05432e2938 cifs: remove rfc1002 hardcoded constants from cifs_discard_remaining_data()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 20:32:53 -05:00
Ronnie Sahlberg
91cb74f514 cifs: Change SMB2_open to return an iov for the error parameter
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 20:32:50 -05:00
Ronnie Sahlberg
e19b2bc079 cifs: add resp_buf_size to the mid_q_entry structure
and get rid of some more calls to get_rfc1002_length()

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 20:32:48 -05:00
Steve French
0d4b46ba7d smb3.11: replace a 4 with server->vals->header_preamble_size
More cleanup of use of hardcoded 4 byte RFC1001 field size

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-12 20:32:13 -05:00
Ronnie Sahlberg
9fdd2e0034 cifs: replace a 4 with server->vals->header_preamble_size
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 17:12:22 -05:00
Ronnie Sahlberg
2e96467d9e cifs: add pdu_size to the TCP_Server_Info structure
and get rid of some get_rfc1002_length() in smb2

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 17:06:33 -05:00
Steve French
5100d8a3fe SMB311: Improve checking of negotiate security contexts
SMB3.11 crypto and hash contexts were not being checked strictly enough.
Add parsing and validity checking for the security contexts in the SMB3.11
negotiate response.

Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 16:54:06 -05:00
Steve French
136ff1b4b6 SMB3: Fix length checking of SMB3.11 negotiate request
The length checking for SMB3.11 negotiate request includes
"negotiate contexts" which caused a buffer validation problem
and a confusing warning message on SMB3.11 mount e.g.:

     SMB2 server sent bad RFC1001 len 236 not 170

Fix the length checking for SMB3.11 negotiate to account for
the new negotiate context so that we don't log a warning on
SMB3.11 mount by default but do log warnings if lengths returned
by the server are incorrect.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 16:52:38 -05:00
Linus Torvalds
80aa76bcd3 Changes since last update:
- Cleanup unnecessary function call parameters
 - Fix a use-after-free bug when aborting logging intents
 - Refactor filestreams state data to avoid use-after-free bug
 - Fix incorrect removal of cow extents when truncating extended
   attributes.
 - Refactor open-coded __set_page_dirty in favor of using vfs function.
 - Fix a deadlock when fstrim and fs shutdown race.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJazZ/HAAoJEPh/dxk0SrTrGeYP/Asis7MZ3TWfzsPHwK6EoH+w
 q1VnKVMqjSRnE8DYmx8w3tMqny2qg9klLkT1SRRz9Htr5CER5XqIX6ZYlQpVKx2R
 ycS+I1l/V/kqEmAgDgSK3DZ5uMgKHfo0w7GbKFDg69YDztN3yNBpDfUZ4VqA/Ua2
 ADaeMYkJh6oBB5yZAWGyAJEM5TieqQppqyc+WLhbORyjEFreUTTmLqbzLPnceSih
 rQLg0noDwZK0XqbwndYXGTNKKoQtiJalnZP18DhH4zOr+FH03i/gmlU3w+ANl3eX
 IEE0TR2EkNHLZeduz7xT7ZHUOo0TaGObBK8CJFSojibQ/HooVjLcFQResd3q0coN
 WTkbTuxHwMqk2IujKRTqli/saENhvFrOrm/nYTFPw9+3GpRt0iLrXPSBbeMjmsZG
 XntdimPEHywjYrdW10VRH+6E6tvQiC/tl3abBuXdaEOJs1KZmPYNt42EF0ZQ5Xs5
 IeDOhPLuuyUwRf12RVA9WS6xGdMR0+foMqncXZNcAzxQeAVfUNEtASNMTNIOO2H5
 xD34/1ooFJnwT755VLT9U/qUd5CHtWO3AkH+9RVCpKWTaEOSY+gV6ZgmINv9npur
 Vw2xnZwxysegVlu76uct/b/sy/8J3OKYIoSYwbt6Zxi0M1WF9zanJ9pKCsBYdL6A
 xWoMr0X1yFGcYJozqwJU
 =uGz0
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
 "Most of these are code cleanups, but there are a couple of notable
  use-after-free bug fixes.

  This series has been run through a full xfstests run over the week and
  through a quick xfstests run against this morning's master, with no
  major failures reported.

   - clean up unnecessary function call parameters

   - fix a use-after-free bug when aborting logging intents

   - refactor filestreams state data to avoid use-after-free bug

   - fix incorrect removal of cow extents when truncating extended
     attributes.

   - refactor open-coded __set_page_dirty in favor of using vfs
     function.

   - fix a deadlock when fstrim and fs shutdown race"

* tag 'xfs-4.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  Force log to disk before reading the AGF during a fstrim
  Export __set_page_dirty
  xfs: only cancel cow blocks when truncating the data fork
  xfs: non-scrub - remove unused function parameters
  xfs: remove filestream item xfs_inode reference
  xfs: fix intent use-after-free on abort
  xfs: Remove "committed" argument of xfs_dir_ialloc
2018-04-12 13:28:22 -07:00
Linus Torvalds
4ac1800f81 We decided to request the latest three patches to be merged into this
merge window while it's still open.
 
 1. The first patch adds a new function to lockref: lockref_put_not_zero
 2. The second patch fixes GFS2's glock dump code so it uses the new lockref
    function. This fixes a problem whereby lock dumps could miss glocks.
 3. I made a minor patch to update some comments and fix the lock ordering
    text in our gfs2-glocks.txt Documentation file.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaz6pdAAoJENeLYdPf93o71wMH/0cEo34xWiScRM07EgLmZZ3q
 YXMvpTvrwK+9i2u8anxiX1smezHeS+7jPrYOG8AGu3IZvKYGTDOwoIY9pxESy5gs
 1Rf60s6pPE/dkTSqPaNNuBxPrM1yVyRWOPx04LxC5BCXhsS/6U2RS9ElxGDe7Nyq
 P66z1wfm63+erDR7mKSuOL3Ejtglj2EPcrAupaBlRS0wjdUQ9ORyrZBpT6JMOWqd
 HWjchrzWVAqx+iyLHlKZjTyPHsPaUBaj1fuv/Vcgu5sJmEJ9mF4s/GQTdwIzi8ip
 ByD7MfilyrT7dxRm1uw8OJ7TvqNeaCtxsyNGGBOlSx81s/pk5Vhs8bevnczNvi8=
 =jWsi
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.17.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull more gfs2 updates from Bob Peterson:
 "We decided to request the latest three patches to be merged into this
  merge window while it's still open.

   - The first patch adds a new function to lockref:
     lockref_put_not_zero

   - The second patch fixes GFS2's glock dump code so it uses the new
     lockref function. This fixes a problem whereby lock dumps could
     miss glocks.

   - I made a minor patch to update some comments and fix the lock
     ordering text in our gfs2-glocks.txt Documentation file"

* tag 'gfs2-4.17.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Minor improvements to comments and documentation
  gfs2: Stop using rhashtable_walk_peek
  lockref: Add lockref_put_not_zero
2018-04-12 13:00:44 -07:00
Linus Torvalds
a1bf4c7da6 NFS client updates for Linux 4.17
Stable bugfixes:
 - xprtrdma: Fix corner cases when handling device removal # v4.12+
 - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+
 
 Features:
 - New sunrpc tracepoint for RPC pings
 - Finer grained NFSv4 attribute checking
 - Don't unnecessarily return NFS v4 delegations
 
 Other bugfixes and cleanups:
 - Several other small NFSoRDMA cleanups
 - Improvements to the sunrpc RTT measurements
 - A few sunrpc tracepoint cleanups
 - Various fixes for NFS v4 lock notifications
 - Various sunrpc and NFS v4 XDR encoding cleanups
 - Switch to the ida_simple API
 - Fix NFSv4.1 exclusive create
 - Forget acl cache after setattr operation
 - Don't advance the nfs_entry readdir cookie if xdr decoding fails
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlrNG1IACgkQ18tUv7Cl
 QOvotw//fQoUgQ/AOJGlZo/4ws2mGJN3dfwwKM8xYOnHaxppOYubZRHwvswK8d22
 +XR/Q6IVbUxI3mJluv1L0d9CJT06s3c9CO90McIJbk4CWihGP19bNIY4JiPlzrbv
 4FDiyOvMBej2UXbHX5EzKj0srxyBoEVf3iUAIa6DaHi3c6EIUo6fP3d2eRNJStqd
 WMyZs+nqr2W9biyClxntT7l/Sk+o+4I7M3Oo9pjjS+PiePYdaMrL5T1kPeHaJshF
 GMGXkbvVdqpDRiXX84R9+2/nuSiA15eEnaR94UNvs84oLR3qob3ZhxhudqFdSPrX
 RS6E7m34gY/EaQm/wbB26PZm+3jHd4Pqm5SKLbyFfoCmG6oMwBvXNRJZas1DFaHM
 CMOECvfAr6kixVLkAN0MNQ2Ku/FuJ52OLP1dRLmxsblocnhEPujc6RSz6Ju/v3a0
 adbpmJMA2IoSGgXMu3g1VGnjHfMj7ZmjtpigXVvlcUqQGCL7t4ngh23cpeTQeJ76
 bMwSHUQu18NbmtJjBTE+PIm7mdCrpQD7ZuOPWpK62zxLYUnnv7nm75m84DrDru7d
 XAmrCmdUJNrVWQs6BAtCXgO4PZ6xNGLosb0xTQXTAQYftc+DRJ9SW/VGc0Mp1L9m
 0G0iz++b8cy4Pih5UCDJcCkpjCIvHLcn72zn1kbufWqG3xr2koc=
 =IlWo
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - xprtrdma: Fix corner cases when handling device removal # v4.12+
   - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+

  Features:
   - New sunrpc tracepoint for RPC pings
   - Finer grained NFSv4 attribute checking
   - Don't unnecessarily return NFS v4 delegations

  Other bugfixes and cleanups:
   - Several other small NFSoRDMA cleanups
   - Improvements to the sunrpc RTT measurements
   - A few sunrpc tracepoint cleanups
   - Various fixes for NFS v4 lock notifications
   - Various sunrpc and NFS v4 XDR encoding cleanups
   - Switch to the ida_simple API
   - Fix NFSv4.1 exclusive create
   - Forget acl cache after setattr operation
   - Don't advance the nfs_entry readdir cookie if xdr decoding fails"

* tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits)
  NFS: advance nfs_entry cookie only after decoding completes successfully
  NFSv3/acl: forget acl cache after setattr
  NFSv4.1: Fix exclusive create
  NFSv4: Declare the size up to date after it was set.
  nfs: Use ida_simple API
  NFSv4: Fix the nfs_inode_set_delegation() arguments
  NFSv4: Clean up CB_GETATTR encoding
  NFSv4: Don't ask for attributes when ACCESS is protected by a delegation
  NFSv4: Add a helper to encode/decode struct timespec
  NFSv4: Clean up encode_attrs
  NFSv4; Clean up XDR encoding of type bitmap4
  NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group
  SUNRPC: Add a helper for encoding opaque data inline
  SUNRPC: Add helpers for decoding opaque and string types
  NFSv4: Ignore change attribute invalidations if we hold a delegation
  NFS: More fine grained attribute tracking
  NFS: Don't force unnecessary cache invalidation in nfs_update_inode()
  NFS: Don't redirty the attribute cache in nfs_wcc_update_inode()
  NFS: Don't force a revalidation of all attributes if change is missing
  NFS: Convert NFS_INO_INVALID flags to unsigned long
  ...
2018-04-12 12:55:50 -07:00
Linus Torvalds
7214dd4ea9 Merge branch 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs thaw updates from Al Viro:
 "An ancient series that has fallen through the cracks in the previous
  cycle"

* 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  buffer.c: call thaw_super during emergency thaw
  vfs: factor sb iteration out of do_emergency_remount
2018-04-12 12:28:32 -07:00
Linus Torvalds
19e8a2f875 Merge branch 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull AFS updates from Al Viro:
 "The AFS series posted by dhowells depended upon lookup_one_len()
  rework; now that prereq is in the mainline, that series had been
  rebased on top of it and got some exposure and testing..."

* 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs: Do better accretion of small writes on newly created content
  afs: Add stats for data transfer operations
  afs: Trace protocol errors
  afs: Locally edit directory data for mkdir/create/unlink/...
  afs: Adjust the directory XDR structures
  afs: Split the directory content defs into a header
  afs: Fix directory handling
  afs: Split the dynroot stuff out and give it its own ops tables
  afs: Keep track of invalid-before version for dentry coherency
  afs: Rearrange status mapping
  afs: Make it possible to get the data version in readpage
  afs: Init inode before accessing cache
  afs: Introduce a statistics proc file
  afs: Dump bad status record
  afs: Implement @cell substitution handling
  afs: Implement @sys substitution handling
  afs: Prospectively look up extra files when doing a single lookup
  afs: Don't over-increment the cell usage count when pinning it
  afs: Fix checker warnings
  vfs: Remove the const from dir_context::actor
2018-04-12 11:59:06 -07:00
Bob Peterson
3e7aafc39c GFS2: Minor improvements to comments and documentation
This patch simply fixes some comments and the gfs2-glocks.txt file:
Places where i_rwsem was called i_mutex, and adding i_rw_mutex.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-12 10:07:51 -07:00
Andreas Gruenbacher
3fd5d3ad35 gfs2: Stop using rhashtable_walk_peek
Function rhashtable_walk_peek is problematic because there is no
guarantee that the glock previously returned still exists; when that key
is deleted, rhashtable_walk_peek can end up returning a different key,
which will cause an inconsistent glock dump.  Fix this by keeping track
of the current glock in the seq file iterator functions instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-12 09:41:19 -07:00
David Sterba
852eb3aeea btrfs: add SPDX header to Kconfig
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12 16:29:55 +02:00
David Sterba
c1d7c514f7 btrfs: replace GPL boilerplate by SPDX -- sources
Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12 16:29:51 +02:00
David Sterba
9888c3402c btrfs: replace GPL boilerplate by SPDX -- headers
Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.

Unify the include protection macros to match the file names.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12 16:29:46 +02:00
Filipe Manana
471d557afe Btrfs: fix loss of prealloc extents past i_size after fsync log replay
Currently if we allocate extents beyond an inode's i_size (through the
fallocate system call) and then fsync the file, we log the extents but
after a power failure we replay them and then immediately drop them.
This behaviour happens since about 2009, commit c71bf099ab ("Btrfs:
Avoid orphan inodes cleanup while replaying log"), because it marks
the inode as an orphan instead of dropping any extents beyond i_size
before replaying logged extents, so after the log replay, and while
the mount operation is still ongoing, we find the inode marked as an
orphan and then perform a truncation (drop extents beyond the inode's
i_size). Because the processing of orphan inodes is still done
right after replaying the log and before the mount operation finishes,
the intention of that commit does not make any sense (at least as
of today). However reverting that behaviour is not enough, because
we can not simply discard all extents beyond i_size and then replay
logged extents, because we risk dropping extents beyond i_size created
in past transactions, for example:

  add prealloc extent beyond i_size
  fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
  transaction commit
  add another prealloc extent beyond i_size
  fsync - triggers the fast fsync path
  power failure

In that scenario, we would drop the first extent and then replay the
second one. To fix this just make sure that all prealloc extents
beyond i_size are logged, and if we find too many (which is far from
a common case), fallback to a full transaction commit (like we do when
logging regular extents in the fast fsync path).

Trivial reproducer:

 $ mkfs.btrfs -f /dev/sdb
 $ mount /dev/sdb /mnt
 $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
 $ sync
 $ xfs_io -c "falloc -k 256K 1M" /mnt/foo
 $ xfs_io -c "fsync" /mnt/foo
 <power failure>

 # mount to replay log
 $ mount /dev/sdb /mnt
 # at this point the file only has one extent, at offset 0, size 256K

A test case for fstests follows soon, covering multiple scenarios that
involve adding prealloc extents with previous shrinking truncates and
without such truncates.

Fixes: c71bf099ab ("Btrfs: Avoid orphan inodes cleanup while replaying log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12 14:50:36 +02:00
Liu Bo
af72273381 Btrfs: clean up resources during umount after trans is aborted
Currently if some fatal errors occur, like all IO get -EIO, resources
would be cleaned up when
a) transaction is being committed or
b) BTRFS_FS_STATE_ERROR is set

However, in some rare cases, resources may be left alone after transaction
gets aborted and umount may run into some ASSERT(), e.g.
ASSERT(list_empty(&block_group->dirty_list));

For case a), in btrfs_commit_transaciton(), there're several places at the
beginning where we just call btrfs_end_transaction() without cleaning up
resources.  For case b), it is possible that the trans handle doesn't have
any dirty stuff, then only trans hanlde is marked as aborted while
BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.

This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
all resources won't stay in memory after umount.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12 14:49:47 +02:00
Amir Goldstein
795939a93e ovl: add support for "xino" mount and config options
With mount option "xino=on", mounter declares that there are enough
free high bits in underlying fs to hold the layer fsid.
If overlayfs does encounter underlying inodes using the high xino
bits reserved for layer fsid, a warning will be emitted and the original
inode number will be used.

The mount option name "xino" goes after a similar meaning mount option
of aufs, but in overlayfs case, the mapping is stateless.

An example for a use case of "xino=on" is when upper/lower is on an xfs
filesystem. xfs uses 64bit inode numbers, but it currently never uses the
upper 8bit for inode numbers exposed via stat(2) and that is not likely to
change in the future without user opting-in for a new xfs feature. The
actual number of unused upper bit is much larger and determined by the xfs
filesystem geometry (64 - agno_log - agblklog - inopblog). That means
that for all practical purpose, there are enough unused bits in xfs
inode numbers for more than OVL_MAX_STACK unique fsid's.

Another use case of "xino=on" is when upper/lower is on tmpfs. tmpfs inode
numbers are allocated sequentially since boot, so they will practially
never use the high inode number bits.

For compatibility with applications that expect 32bit inodes, the feature
can be disabled with "xino=off". The option "xino=auto" automatically
detects underlying filesystem that use 32bit inodes and enables the
feature. The Kconfig option OVERLAY_FS_XINO_AUTO and module parameter of
the same name, determine if the default mode for overlayfs mount is
"xino=auto" or "xino=off".

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein
adbf4f7ea8 ovl: consistent d_ino for non-samefs with xino
When overlay layers are not all on the same fs, but all inode numbers
of underlying fs do not use the high 'xino' bits, overlay st_ino values
are constant and persistent.

In that case, relax non-samefs constraint for consistent d_ino and always
iterate non-merge dir using ovl_fill_real() actor so we can remap lower
inode numbers to unique lower fs range.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein
12574a9f4c ovl: consistent i_ino for non-samefs with xino
When overlay layers are not all on the same fs, but all inode numbers
of underlying fs do not use the high 'xino' bits, overlay st_ino values
are constant and persistent.

In that case, set i_ino value to the same value as st_ino for nfsd
readdirplus validator.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein
e487d889b7 ovl: constant st_ino for non-samefs with xino
On 64bit systems, when overlay layers are not all on the same fs, but
all inode numbers of underlying fs are not using the high bits, use the
high bits to partition the overlay st_ino address space.  The high bits
hold the fsid (upper fsid is 0).  This way overlay inode numbers are unique
and all inodes use overlay st_dev.  Inode numbers are also persistent
for a given layer configuration.

Currently, our only indication for available high ino bits is from a
filesystem that supports file handles and uses the default encode_fh()
operation, which encodes a 32bit inode number.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein
5148626b80 ovl: allocate anon bdev per unique lower fs
Instead of allocating an anonymous bdev per lower layer, allocate
one anonymous bdev per every unique lower fs that is different than
upper fs.

Every unique lower fs is assigned an fsid > 0 and the number of
unique lower fs are stored in ofs->numlowerfs.

The assigned fsid is stored in the lower layer struct and will be
used also for inode number multiplexing.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Amir Goldstein
da309e8c05 ovl: factor out ovl_map_dev_ino() helper
A helper for ovl_getattr() to map the values of st_dev and st_ino
according to constant st_ino rules.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Miklos Szeredi
8f35cf51cd ovl: cleanup ovl_update_time()
No need to mess with an alias, the upperdentry can be retrieved directly
from the overlay inode.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:50 +02:00
Miklos Szeredi
3a291774d1 ovl: add WARN_ON() for non-dir redirect cases
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12 12:04:49 +02:00