linux_dsm_epyc7002/fs
Dave Chinner 666d644cd7 xfs: don't free EFIs before the EFDs are committed
Filesystems are occasionally being shut down with this error:

xfs_trans_ail_delete_bulk: attempting to delete a log item that is
not in the AIL.

It was diagnosed to be related to the EFI/EFD commit order when the
EFI and EFD are in different checkpoints and the EFD is committed
before the EFI here:

http://oss.sgi.com/archives/xfs/2013-01/msg00082.html

The real problem is that a single bit cannot fully describe the
states that the EFI/EFD processing can be in. These completion
states are:

EFI			EFI in AIL	EFD		Result
committed/unpinned	Yes		committed	OK
committed/pinned	No		committed	Shutdown
uncommitted		No		committed	Shutdown


Note that the "result" field is what should happen, not what does
happen. The current logic is broken and handles the first two cases
correctly by luck.  That is, the code will free the EFI if the
XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
inverted logic "works" because if both EFI and EFD are committed,
then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
bit, and the second frees the EFI item. Hence as long as
xfs_efi_item_committed() has been called, everything appears to be
fine.

It is the third case where the logic fails - where
xfs_efd_item_committed() is called before xfs_efi_item_committed(),
and that results in the EFI being freed before it has been
committed. That is the bug that triggered the shutdown, and hence
keeping track of whether the EFI has been committed or not is
insufficient to correctly order the EFI/EFD operations w.r.t. the
AIL.

What we really want is this: the EFI is always placed into the
AIL before the last reference goes away. The only way to guarantee
that is that the EFI is not freed until after it has been unpinned
*and* the EFD has been committed. That is, restructure the logic so
that the only case that can occur is the first case.

This can be done easily by replacing the XFS_EFI_COMMITTED with an
EFI reference count. The EFI is initialised with it's own count, and
that is not released until it is unpinned. However, there is a
complication to this method - the high level EFI/EFD code in
xfs_bmap_finish() does not hold direct references to the EFI
structure, and runs a transaction commit between the EFI and EFD
processing. Hence the EFI can be freed even before the EFD is
created using such a method.

Further, log recovery uses the AIL for tracking EFI/EFDs that need
to be recovered, but it uses the AIL *differently* to the EFI
transaction commit. Hence log recovery never pins or unpins EFIs, so
we can't drop the EFI reference count indirectly to free the EFI.

However, this doesn't prevent us from using a reference count here.
There is a 1:1 relationship between EFIs and EFDs, so when we
initialise the EFI we can take a reference count for the EFD as
well. This solves the xfs_bmap_finish() issue - the EFI will never
be freed until the EFD is processed. In terms of log recovery,
during the committing of the EFD we can look for the
XFS_EFI_RECOVERED bit being set and drop the EFI reference as well,
thereby ensuring everything works correctly there as well.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-05 13:25:35 -05:00
..
9p Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-03-03 13:23:03 -08:00
adfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
affs hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
afs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
autofs4 autofs4 - autofs4_catatonic_mode(): remove redundant null check on kfree() 2013-03-01 12:04:39 -08:00
befs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
bfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-03-03 13:13:20 -08:00
cachefiles FS-Cache: Mark cancellation of in-progress operation 2012-12-20 22:34:00 +00:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2013-02-28 17:43:09 -08:00
cifs Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 2013-03-01 12:05:13 -08:00
coda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
configfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
cramfs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
debugfs Merge 3.9-rc4 into driver-core-next 2013-01-17 19:48:18 -08:00
devpts userns: Allow the userns root to mount of devpts 2013-01-26 22:22:21 -08:00
dlm hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ecryptfs hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
efs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
exofs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
exportfs hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
ext3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-03-03 13:23:03 -08:00
f2fs more file_inode() open-coded instances 2013-02-27 16:59:05 -05:00
fat hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
freevxfs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
fscache hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
fuse more file_inode() open-coded instances 2013-02-27 16:59:05 -05:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
hfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
hfsplus hfsplus: fix issue with unzeroed unused b-tree nodes 2013-02-27 19:10:10 -08:00
hostfs hostfs: directory methods have no business in non-directory inode_operations 2013-02-22 23:31:37 -05:00
hpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
hppfs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
hugetlbfs hugetlb_file_setup(): use d_alloc_pseudo() 2013-02-26 02:45:52 -05:00
isofs fs: encode_fh: return FILEID_INVALID if invalid fid_type 2013-02-26 02:46:10 -05:00
jbd jbd: don't wake kjournald unnecessarily 2013-01-14 22:50:45 +01:00
jbd2 jbd2: fix ERR_PTR dereference in jbd2__journal_start 2013-03-02 17:08:46 -05:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
lockd Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux 2013-02-28 18:02:55 -08:00
logfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
minix new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
nfs NFS client bugfixes for Linux 3.9 2013-03-02 16:46:07 -08:00
nfs_common nfs_common: Update the translation between nfsv3 acls linux posix acls 2013-02-13 06:15:14 -08:00
nfsd Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux 2013-02-28 18:02:55 -08:00
nilfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
nls
notify hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ntfs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
ocfs2 hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
omfs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
openpromfs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
proc fs/proc/vmcore.c: put if tests in the top of the while loop to reduce duplication 2013-02-27 19:10:11 -08:00
pstore A few fixes to reduce places where pstore might hang 2013-02-21 09:38:18 -08:00
qnx4 new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
qnx6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
quota
ramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
reiserfs fs: encode_fh: return FILEID_INVALID if invalid fid_type 2013-02-26 02:46:10 -05:00
romfs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
squashfs new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
sysfs hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
sysv new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
ubifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
udf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
ufs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
xfs xfs: don't free EFIs before the EFDs are committed 2013-04-05 13:25:35 -05:00
aio.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
anon_inodes.c get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero 2013-02-26 02:46:11 -05:00
attr.c
bad_inode.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
binfmt_aout.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
binfmt_elf_fdpic.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
binfmt_elf.c ImgTec Meta architecture changes for v3.9-rc1 2013-03-03 12:06:09 -08:00
binfmt_em86.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_flat.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
binfmt_misc.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
binfmt_script.c exec: do not leave bprm->interp on stack 2012-12-20 17:40:19 -08:00
binfmt_som.c
bio-integrity.c
bio.c block: add missing block_bio_complete() tracepoint 2013-01-14 15:00:36 +01:00
block_dev.c Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block 2013-02-28 12:52:24 -08:00
buffer.c Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block 2013-02-28 12:52:24 -08:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
compat.c switch timerfd compat syscalls to COMPAT_SYSCALL_DEFINE 2013-02-03 15:09:25 -05:00
coredump.c coredump: remove redundant defines for dumpable states 2013-02-27 19:10:11 -08:00
coredump.h
dcache.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
dcookies.c
direct-io.c fs: Fix possible use-after-free with AIO 2013-02-22 23:31:36 -05:00
drop_caches.c
eventfd.c fs, eventfd: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
eventpoll.c epoll: prevent missed events on EPOLL_CTL_MOD 2013-01-02 09:16:43 -08:00
exec.c coredump: remove redundant defines for dumpable states 2013-02-27 19:10:11 -08:00
fcntl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
fhandle.c Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux 2012-12-20 14:04:11 -08:00
fifo.c
file_table.c cache the value of file_inode() in struct file 2013-03-01 19:48:30 -05:00
file.c locking: Various static lock initializer fixes 2013-02-19 08:42:45 +01:00
filesystems.c
fs_struct.c constify path_get/path_put and fs_struct.c stuff 2013-03-01 23:51:07 -05:00
fs-writeback.c 2 writeback fixes 2013-02-28 13:21:44 -08:00
generic_acl.c
inode.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
internal.h constify path_get/path_put and fs_struct.c stuff 2013-03-01 23:51:07 -05:00
ioctl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
ioprio.c
Kconfig fuse: Move CUSE Kconfig entry from fs/Kconfig into fs/fuse/Kconfig 2013-01-17 13:08:45 +01:00
Kconfig.binfmt
libfs.c vfs: drop vmtruncate 2012-12-20 18:46:29 -05:00
locks.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
Makefile
mbcache.c
mount.h
mpage.c
namei.c constify path_get/path_put and fs_struct.c stuff 2013-03-01 23:51:07 -05:00
namespace.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
no-block.c
open.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-03-03 13:23:03 -08:00
pipe.c fs: Preserve error code in get_empty_filp(), part 2 2013-02-22 23:31:32 -05:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-03-02 08:34:06 -08:00
read_write.h
readdir.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
select.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
seq_file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-03-03 13:23:03 -08:00
signalfd.c fs, epoll: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
splice.c export kernel_write(), convert open-coded instances 2013-02-26 02:46:11 -05:00
stack.c
stat.c switch vfs_getattr() to struct path 2013-02-26 02:46:08 -05:00
statfs.c vfs: fix user_statfs to retry once on ESTALE errors 2012-12-20 18:50:07 -05:00
super.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
sync.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
timerfd.c compat: restore timerfd settime and gettime compat syscalls 2013-03-02 09:35:13 -05:00
utimes.c vfs: allow utimensat() calls to retry once on an ESTALE error 2012-12-20 18:50:08 -05:00
xattr_acl.c
xattr.c vfs: make lremovexattr retry once on ESTALE error 2012-12-20 18:50:11 -05:00