linux_dsm_epyc7002/fs/nilfs2
Andreas Rohner 31ccb1f7ba nilfs2: fix race condition that causes file system corruption
There is a race condition between nilfs_dirty_inode() and
nilfs_set_file_dirty().

When a file is opened, nilfs_dirty_inode() is called to update the
access timestamp in the inode.  It calls __nilfs_mark_inode_dirty() in a
separate transaction.  __nilfs_mark_inode_dirty() caches the ifile
buffer_head in the i_bh field of the inode info structure and marks it
as dirty.

After data is written to the file in another transaction,
nilfs_set_file_dirty() is called, which adds the inode to the
ns_dirty_files list.

Segment construction then calls nilfs_segctor_collect_dirty_files(),
which walks the ns_dirty_files list and checks the i_bh field of each
inode.  If i_bh already holds a cached buffer_head, that buffer is not
marked as dirty again.

Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, a segment construction that writes out the ifile can
occur between the two.  If this happens, the inode is not yet on the
ns_dirty_files list, but its ifile block is marked as dirty, so the
block is written out and becomes clean.

In the next segment construction, the data for the file is written out
and nilfs_bmap_propagate() updates the b-tree.  Eventually the bmap root
is written into the i_bh block, which is no longer dirty because it was
already written out by the earlier segment construction.

As a result the bmap update can be lost, which leads to file system
corruption.  Either the virtual block address points to an unallocated
DAT block, or the DAT entry will be reused for something different.
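
The sequence described above can be replayed in a small user-space toy
model.  This is only a sketch, not kernel code: every toy_* name, the
integer "bmap root", and the redirty_cached switch are hypothetical and
exist only for illustration.  The redirty_cached = true path shows one
plausible remedy (re-marking the cached buffer as dirty while collecting
dirty files); it is an assumption, since this message does not spell out
the actual change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names below are hypothetical, not nilfs2 symbols. */
struct toy_bh {                 /* stands in for the ifile buffer_head */
    bool dirty;
    int  mem_root;              /* bmap root held in the buffer */
    int  disk_root;             /* bmap root last written to disk */
};

struct toy_inode {
    struct toy_bh *i_bh;        /* cached ifile buffer, NULL if none */
    bool on_dirty_files;        /* is the inode on ns_dirty_files? */
};

/* Models nilfs_dirty_inode(): cache the ifile buffer and mark it dirty. */
static void toy_dirty_inode(struct toy_inode *ino, struct toy_bh *bh)
{
    ino->i_bh = bh;
    bh->dirty = true;
}

/* Models nilfs_set_file_dirty(): queue the inode for the next segment. */
static void toy_set_file_dirty(struct toy_inode *ino)
{
    ino->on_dirty_files = true;
}

/* Models one segment construction over a single inode. */
static void toy_construct_segment(struct toy_inode *ino, struct toy_bh *bh,
                                  int new_root, bool redirty_cached)
{
    if (ino->on_dirty_files) {
        if (ino->i_bh == NULL) {
            ino->i_bh = bh;             /* load and dirty the ifile block */
            bh->dirty = true;
        } else if (redirty_cached) {
            ino->i_bh->dirty = true;    /* assumed remedy: always redirty */
        }
        ino->i_bh->mem_root = new_root; /* bmap root lands in i_bh */
        ino->on_dirty_files = false;
    }
    if (bh->dirty) {                    /* only dirty blocks reach disk */
        bh->disk_root = bh->mem_root;
        bh->dirty = false;
    }
}

/* Replays the race; returns true if the bmap update never reached disk. */
bool toy_run_race(bool redirty_cached)
{
    struct toy_bh bh = { .dirty = false, .mem_root = 0, .disk_root = 0 };
    struct toy_inode ino = { .i_bh = NULL, .on_dirty_files = false };

    toy_dirty_inode(&ino, &bh);                          /* file is opened */
    toy_construct_segment(&ino, &bh, 0, redirty_cached); /* in between */
    assert(!bh.dirty && !ino.on_dirty_files);            /* block now clean */
    toy_set_file_dirty(&ino);                            /* data is written */
    toy_construct_segment(&ino, &bh, 42, redirty_cached);
    return bh.disk_root != bh.mem_root;
}
```

In this model, toy_run_race(false) reports a lost update because the
second construction finds a cached but clean buffer, while
toy_run_race(true) does not.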

The error can remain undetected for a long time.  A typical error
message would be one of the "bad btree" errors or a warning that a DAT
entry could not be found.

This bug can be reproduced reliably by a simple benchmark that creates
and overwrites millions of 4k files.
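
The benchmark itself is not included here.  A minimal sketch of a
workload with the same shape, assuming plain POSIX file I/O, might look
like the following.  The names toy_write_4k() and
toy_create_and_overwrite(), the f<N> file naming, and the pass count are
hypothetical, and interrupted writes are simply treated as errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TOY_FILE_SIZE 4096      /* "4k files" from the description */

/* Create or truncate path and fill it with one 4 KiB buffer. */
static int toy_write_4k(const char *path, unsigned char fill)
{
    unsigned char buf[TOY_FILE_SIZE];
    size_t done = 0;
    int fd;

    memset(buf, fill, sizeof(buf));
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    while (done < sizeof(buf)) {
        ssize_t n = write(fd, buf + done, sizeof(buf) - done);
        if (n <= 0) {           /* error or no progress: give up */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}

/*
 * Pass 0 creates nfiles files under dir; each later pass overwrites
 * all of them.  Returns 0 on success and -1 on the first error.
 */
int toy_create_and_overwrite(const char *dir, long nfiles, int passes)
{
    char path[4096];

    for (int pass = 0; pass <= passes; pass++) {
        for (long i = 0; i < nfiles; i++) {
            int len = snprintf(path, sizeof(path), "%s/f%ld", dir, i);

            if (len < 0 || (size_t)len >= sizeof(path))
                return -1;
            if (toy_write_4k(path, (unsigned char)pass) != 0)
                return -1;
        }
    }
    return 0;
}
```

To approximate the scale mentioned above, such a function would be
called with nfiles in the millions on a directory inside a mounted
nilfs2 volume; whether this exact pattern triggers the race is not
something this sketch can guarantee.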

Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 16:10:03 -08:00
alloc.c nilfs2: use i_blocksize() 2017-02-27 18:43:46 -08:00
alloc.h nilfs2: avoid bare use of 'unsigned' 2016-05-23 17:04:14 -07:00
bmap.c nilfs2: hide function name argument from nilfs_error() 2016-08-02 19:35:16 -04:00
bmap.h nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
btnode.c fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
btnode.h fs: have submit_bh users pass in op and flags separately 2016-06-07 13:41:38 -06:00
btree.c mm, pagevec: remove cold parameter for pagevecs 2017-11-15 18:21:06 -08:00
btree.h nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
cpfile.c nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
cpfile.h nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
dat.c nilfs2: reduce bare use of printk() with nilfs_msg() 2016-08-02 19:35:17 -04:00
dat.h nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
dir.c fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
direct.c nilfs2: reduce bare use of printk() with nilfs_msg() 2016-08-02 19:35:17 -04:00
direct.h nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
export.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
file.c mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
gcinode.c nilfs2: emit error message when I/O error is detected 2016-08-02 19:35:19 -04:00
ifile.c nilfs2: replace nilfs_warning() with nilfs_msg() 2016-08-02 19:35:18 -04:00
ifile.h nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
inode.c VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) 2017-07-17 08:45:34 +01:00
ioctl.c fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
Kconfig fs/nilfs2: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:39:04 -08:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mdt.c VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) 2017-07-17 08:45:34 +01:00
mdt.h nilfs2: avoid bare use of 'unsigned' 2016-05-23 17:04:14 -07:00
namei.c vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
nilfs.h nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
page.c mm, pagevec: remove cold parameter for pagevecs 2017-11-15 18:21:06 -08:00
page.h nilfs2: avoid bare use of 'unsigned' 2016-05-23 17:04:14 -07:00
recovery.c nilfs2: reduce bare use of printk() with nilfs_msg() 2016-08-02 19:35:17 -04:00
segbuf.c block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
segbuf.h nilfs2: avoid bare use of 'unsigned' 2016-05-23 17:04:14 -07:00
segment.c nilfs2: fix race condition that causes file system corruption 2017-11-17 16:10:03 -08:00
segment.h fs/nilfs2: convert timers to use timer_setup() 2017-11-17 16:10:03 -08:00
sufile.c nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
sufile.h nilfs2: move ioctl interface and disk layout to uapi separately 2016-08-02 19:35:21 -04:00
super.c VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) 2017-07-17 08:45:34 +01:00
sysfs.c nilfs2: fix misuse of a semaphore in sysfs code 2016-08-02 19:35:20 -04:00
sysfs.h nilfs2: clean trailing semicolons in macros 2016-05-23 17:04:14 -07:00
the_nilfs.c nilfs2: reduce bare use of printk() with nilfs_msg() 2016-08-02 19:35:17 -04:00
the_nilfs.h nilfs2: fix misuse of a semaphore in sysfs code 2016-08-02 19:35:20 -04:00