linux_dsm_epyc7002/fs
Nicholas Piggin d53c3dfb23 mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race
Reading and modifying current->mm and current->active_mm and switching
mm should be done with irqs off, to prevent races seeing an intermediate
state.

This is similar to commit 38cf307c1f ("mm: fix kthread_use_mm() vs TLB
invalidate"). At exec-time when the new mm is activated, the old one
should usually be single-threaded and no longer used, unless something
else is holding an mm_users reference (which may be possible).

Absent other mm_users, there is also a race with preemption and lazy tlb
switching. Consider the kernel_execve case where the current thread is
using a lazy tlb active mm:

  call_usermodehelper()
    kernel_execve()
      old_mm = current->mm;
      active_mm = current->active_mm;
      *** preempt *** -------------------->  schedule()
                                               prev->active_mm = NULL;
                                               mmdrop(prev active_mm);
                                             ...
                      <--------------------  schedule()
      current->mm = mm;
      current->active_mm = mm;
      if (!old_mm)
          mmdrop(active_mm);

If we switch back to the kernel thread from a different mm, there is a
double free of the old active_mm, and a missing free of the new one.

Closing this race only requires interrupts to be disabled while ->mm
and ->active_mm are being switched, but the TLB problem requires also
holding interrupts off over activate_mm. Unfortunately not all archs
can do that yet, e.g., arm defers the switch if irqs are disabled and
expects finish_arch_post_lock_switch() to be called to complete the
flush; um takes a blocking lock in activate_mm().

So as a first step, disable interrupts across the mm/active_mm updates
to close the lazy tlb preempt race, and provide an arch option to
extend that to activate_mm which allows architectures doing IPI based
TLB shootdowns to close the second race.

This is a bit ugly, but it is a compromise in the interest of fixing the
bug and backporting the fix before all architectures are converted.
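
The refcount imbalance described above can be reproduced with a small
user-space model. This is only a sketch: struct mm, struct task,
preempt_point(), exec_switch_mm() and run_scenario() are hypothetical
stand-ins, not kernel code. The irqs_off flag models "no preemption
inside the window", and the starting counts are assumptions chosen to
make the imbalance visible.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for mm_struct and task_struct (hypothetical). */
struct mm { int count; };                    /* models mm_count */
struct task { struct mm *mm, *active_mm; };

static void mmgrab(struct mm *m) { m->count++; }
static void mmdrop(struct mm *m) { m->count--; }

/*
 * Model a kernel thread (t->mm == NULL) being preempted by a task that
 * uses `other`, then being switched back in. With irqs_off set the
 * preemption cannot happen at this point, so the function does nothing.
 */
static void preempt_point(struct task *t, struct mm *other, int irqs_off)
{
    if (irqs_off || t->mm)
        return;
    assert(t->active_mm != NULL);
    mmdrop(t->active_mm);   /* switch away: lazy reference released */
    t->active_mm = other;   /* switch back: borrow the other task's mm */
    mmgrab(other);
}

/* Model the ->mm / ->active_mm update performed at exec time. */
static void exec_switch_mm(struct task *t, struct mm *new_mm,
                           struct mm *other, int irqs_off)
{
    struct mm *old_mm = t->mm;
    struct mm *active_mm = t->active_mm;

    preempt_point(t, other, irqs_off);  /* the race window */

    t->mm = new_mm;
    t->active_mm = new_mm;
    if (!old_mm)
        mmdrop(active_mm);  /* drops the reference sampled before the window */
}

/* Run one scenario and report the final counts of the two old mms. */
static void run_scenario(int irqs_off, int *lazy_count, int *other_count)
{
    struct mm lazy = { 2 };     /* owner's reference + lazy tlb reference */
    struct mm other = { 1 };    /* owner's reference only */
    struct mm new_mm = { 1 };
    struct task kthread = { NULL, &lazy };

    exec_switch_mm(&kthread, &new_mm, &other, irqs_off);
    *lazy_count = lazy.count;
    *other_count = other.count;
}
```

Calling run_scenario(0, ...) leaves the lazy mm one reference short and
the other mm with one reference too many, matching the double free and
missing free described above. Calling run_scenario(1, ...) leaves both
at their owner's single reference.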

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200914045219.3736466-2-npiggin@gmail.com
2020-09-16 12:24:31 +10:00
9p
adfs
affs
afs afs: Fix NULL deref in afs_dynroot_depopulate() 2020-08-21 10:56:40 -07:00
autofs fs: autofs: delete repeated words in comments 2020-08-14 19:56:56 -07:00
befs
bfs
btrfs for-5.9-tag 2020-08-13 12:26:18 -07:00
cachefiles
ceph ceph: handle zero-length feature mask in session messages 2020-08-05 17:47:07 +02:00
cifs 3 small cifs/smb3 fixes, one for stable fixing mkdir path with idsfromsid mount option 2020-08-15 08:31:39 -07:00
coda
configfs
cramfs
crypto mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
debugfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
devpts
dlm dlm for 5.9 2020-08-06 19:44:25 -07:00
ecryptfs mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
efivarfs
efs
erofs Changes since last update: 2020-08-06 19:22:51 -07:00
exfat exfat: retain 'VolumeFlags' properly 2020-08-12 08:31:13 +09:00
exportfs
ext2
ext4 Improvements to ext4's block allocator performance for very large file 2020-08-21 11:03:38 -07:00
f2fs f2fs-for-5.9-rc1 2020-08-10 18:33:22 -07:00
fat fat: fix fat_ra_init() for data clusters == 0 2020-08-12 10:58:01 -07:00
freevxfs
fscache
fuse virtio: fixes, features 2020-08-11 14:34:17 -07:00
gfs2 Changes in gfs2: 2020-08-10 18:22:43 -07:00
hfs
hfsplus
hostfs
hpfs
hugetlbfs hugetlbfs: prevent filesystem stacking of hugetlbfs 2020-08-12 10:57:56 -07:00
iomap iomap: fall back to buffered writes for invalidation failures 2020-08-05 09:24:16 -07:00
isofs Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
jbd2 jbd2: clean up checksum verification in do_one_pass() 2020-08-19 12:04:35 -04:00
jffs2 This pull request contains changes for JFFS2, UBI and UBIFS 2020-08-10 18:20:04 -07:00
jfs
kernfs
lockd
minix fs/minix: remove expected error message in block_to_path() 2020-08-12 10:58:00 -07:00
nfs NFS client updates for Linux 5.9 2020-08-15 08:26:55 -07:00
nfs_common
nfsd Highlights: 2020-08-09 13:58:04 -07:00
nilfs2 nilfs2: use a more common logging style 2020-08-12 10:58:01 -07:00
nls
notify
ntfs ntfs: fix ntfs_test_inode and ntfs_init_locked_inode function type 2020-08-07 11:33:21 -07:00
ocfs2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 21:14:30 -07:00
omfs
openpromfs
orangefs orangefs: remove unnecessary assignment to variable ret 2020-08-04 15:01:58 -04:00
overlayfs Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
proc mm, oom: make the calculation of oom badness more accurate 2020-08-12 10:57:56 -07:00
pstore
qnx4
qnx6
quota 2020-08-06 19:28:26 -07:00
ramfs
reiserfs 2020-08-06 19:28:26 -07:00
romfs romfs: fix uninitialized memory leak in romfs_dev_read() 2020-08-21 09:52:53 -07:00
squashfs squashfs: avoid bio_alloc() failure with 1Mbyte blocks 2020-08-21 09:52:53 -07:00
sysfs
sysv
tracefs
ubifs This pull request contains changes for JFFS2, UBI and UBIFS 2020-08-10 18:20:04 -07:00
udf 2020-08-06 19:28:26 -07:00
ufs fs/ufs: avoid potential u32 multiplication overflow 2020-08-12 10:58:01 -07:00
unicode
vboxsf
verity
xfs Fixes for 5.9-rc1: 2020-08-13 12:22:19 -07:00
zonefs zonefs: add zone-capacity support 2020-08-11 17:42:24 +09:00
aio.c mm: remove unnecessary wrapper function do_mmap_pgoff() 2020-08-07 11:33:27 -07:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
binfmt_elf.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c for-5.9/io_uring-20200802 2020-08-03 13:01:22 -07:00
buffer.c Improvements to ext4's block allocator performance for very large file 2020-08-21 11:03:38 -07:00
char_dev.c
compat_binfmt_elf.c
compat.c
coredump.c coredump: add %f for executable filename 2020-08-12 10:58:01 -07:00
d_path.c
dax.c
dcache.c
dcookies.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c do_epoll_ctl(): clean the failure exits up a bit 2020-08-22 18:25:52 -04:00
exec.c mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race 2020-09-16 12:24:31 +10:00
fcntl.c
fhandle.c
file_table.c
file.c Merge branch 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 09:40:34 -07:00
filesystems.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c
fsopen.c
init.c init: add an init_dup helper 2020-08-04 21:02:38 -04:00
inode.c
internal.h Merge branch 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 09:40:34 -07:00
io_uring.c io_uring: kill extra iovec=NULL in import_iovec() 2020-08-20 05:36:19 -06:00
io-wq.c
io-wq.h
ioctl.c
Kconfig tmpfs: support 64-bit inums per-sb 2020-08-07 11:33:24 -07:00
Kconfig.binfmt
libfs.c
locks.c Highlights: 2020-08-09 13:58:04 -07:00
Makefile
mbcache.c
mount.h
mpage.c
namei.c exec: restore EACCES of S_ISDIR execve() 2020-08-14 19:56:56 -07:00
namespace.c Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 21:03:25 -07:00
no-block.c
nsfs.c
open.c exec: move S_ISREG() check earlier 2020-08-12 10:58:01 -07:00
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c
select.c
seq_file.c
signalfd.c fs/signalfd.c: fix inconsistent return codes for signalfd4 2020-08-12 10:58:01 -07:00
splice.c
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c
userfaultfd.c A set of locking fixes and updates: 2020-08-10 19:07:44 -07:00
utimes.c
xattr.c