linux_dsm_epyc7002/mm
Ilya Lipnitskiy ec3e06e06f mm: fix race by making init_zero_pfn() early_initcall
commit e720e7d0e983bf05de80b231bccc39f1487f0f16 upstream.

There are code paths that rely on zero_pfn being fully initialized
before core_initcall.  For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve, which
causes a page fault with a subsequent mmput.  If zero_pfn is not
initialized by then, the mm may not get cleaned up properly, resulting
in an error:

  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1

Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:

  1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
       kobject_uevent_env+0x7e4/0x7ec
       kset_register+0x68/0x88
       bus_register+0xdc/0x34c
       subsys_virtual_register+0x34/0x78
       wq_sysfs_init+0x1c/0x4c
       do_one_initcall+0x50/0x1a8
       kernel_init_freeable+0x230/0x2c8
       kernel_init+0x10/0x100
       ret_from_kernel_thread+0x14/0x1c

  2. kobject_uevent_env() calls call_usermodehelper_exec() which executes
     kernel_execve asynchronously.

  3. Memory allocations in kernel_execve cause a page fault, bumping the
     MM_ANONPAGES RSS counter of the mm:
       add_mm_counter_fast+0xb4/0xc0
       handle_mm_fault+0x6e4/0xea0
       __get_user_pages.part.78+0x190/0x37c
       __get_user_pages_remote+0x128/0x360
       get_arg_page+0x34/0xa0
       copy_string_kernel+0x194/0x2a4
       kernel_execve+0x11c/0x298
       call_usermodehelper_exec_async+0x114/0x194

  4. In case zero_pfn has not been initialized yet, zap_pte_range does
     not decrement the MM_ANONPAGES RSS counter and the BUG message is
     triggered shortly afterwards when __mmdrop checks the RSS counters:
       __mmdrop+0x98/0x1d0
       free_bprm+0x44/0x118
       kernel_execve+0x160/0x1d8
       call_usermodehelper_exec_async+0x114/0x194
       ret_from_kernel_thread+0x14/0x1c
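
Steps 3 and 4 above amount to an increment that is never matched by a
decrement.  The following is a minimal userspace sketch of that
imbalance, not kernel code: struct mm_model, fault_in_anon_page(),
zap_page(), check_counters() and exec_and_drop() are hypothetical names,
and the zero_pfn_ready flag is an assumed stand-in for whatever
zap_pte_range actually decides about the page.

```c
#include <stdio.h>

/* Hypothetical userspace model; names echo the kernel but are not its API. */
enum { MM_FILEPAGES, MM_ANONPAGES, NR_MM_COUNTERS };

struct mm_model {
	long rss[NR_MM_COUNTERS];
};

/* Step 3: the fault path accounts one anonymous page. */
static void fault_in_anon_page(struct mm_model *mm)
{
	mm->rss[MM_ANONPAGES]++;
}

/*
 * Step 4: teardown.  ASSUMPTION: the decrement happens only when
 * zero_pfn was already initialized; the real decision logic in
 * zap_pte_range is not modelled here.
 */
static void zap_page(struct mm_model *mm, int zero_pfn_ready)
{
	if (zero_pfn_ready)
		mm->rss[MM_ANONPAGES]--;
}

/* Final check: report every nonzero counter, return how many were bad. */
static int check_counters(const struct mm_model *mm)
{
	static const char *const names[NR_MM_COUNTERS] = {
		"MM_FILEPAGES", "MM_ANONPAGES",
	};
	int bad = 0;

	for (int i = 0; i < NR_MM_COUNTERS; i++) {
		if (mm->rss[i] != 0) {
			printf("BUG: Bad rss-counter state type:%s val:%ld\n",
			       names[i], mm->rss[i]);
			bad++;
		}
	}
	return bad;
}

/* One fault followed by teardown and the final counter check. */
static int exec_and_drop(int zero_pfn_ready)
{
	struct mm_model mm = { { 0 } };

	fault_in_anon_page(&mm);
	zap_page(&mm, zero_pfn_ready);
	return check_counters(&mm);
}
```

In this model, exec_and_drop(0) leaves MM_ANONPAGES at 1 and reports one
bad counter, while exec_and_drop(1) reports none.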

To avoid races such as the one described above, run init_zero_pfn() at
early_initcall level.  Depending on the architecture, ZERO_PAGE is
either constant or gets initialized even earlier, at paging_init, so
there is no issue with initializing zero_pfn earlier.
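
The ordering argument can be illustrated with a small userspace model
of initcall levels.  This is only a sketch under stated assumptions:
enum level, struct initcall, boot() and core_user() are hypothetical
names, table order stands in for link order within a level, and the
consumer is assumed to run before init_zero_pfn() when both share the
core level.  The value 5120 is the PFN from the MT7621 report above.

```c
#include <stdio.h>

/* Hypothetical stand-ins for initcall levels; not the kernel's macros. */
enum level { LEVEL_EARLY, LEVEL_CORE, LEVEL_COUNT };

#define ZERO_PAGE_PFN 5120UL	/* PFN reported on the MT7621 device */

static unsigned long zero_pfn;		/* 0 until init_zero_pfn() runs */
static unsigned long pfn_seen_by_user;

static void init_zero_pfn(void)
{
	zero_pfn = ZERO_PAGE_PFN;
}

/* Stands in for a core_initcall such as wq_sysfs_init() that ends up
 * depending on zero_pfn. */
static void core_user(void)
{
	pfn_seen_by_user = zero_pfn;
}

struct initcall {
	enum level level;
	void (*fn)(void);
};

/*
 * Run all calls level by level; within a level, in table order.
 * Returns the zero_pfn value the core-level consumer observed.
 */
static unsigned long boot(enum level zero_pfn_level)
{
	struct initcall calls[] = {
		{ LEVEL_CORE, core_user },
		{ zero_pfn_level, init_zero_pfn },
	};

	zero_pfn = 0;
	pfn_seen_by_user = 0;

	for (int lvl = 0; lvl < LEVEL_COUNT; lvl++)
		for (size_t i = 0; i < sizeof(calls) / sizeof(calls[0]); i++)
			if (calls[i].level == (enum level)lvl)
				calls[i].fn();

	return pfn_seen_by_user;
}
```

With init_zero_pfn() registered at the core level, boot(LEVEL_CORE)
returns 0 because the consumer ran first; registered at the early
level, boot(LEVEL_EARLY) returns 5120 regardless of the order within
the core level.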

Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-07 15:00:10 +02:00
..
kasan kasan: fix incorrect arguments passing in kasan_add_zero_shadow 2021-01-27 11:55:23 +01:00
backing-dev.c
balloon_compaction.c
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c mm, compaction: make fast_isolate_freepages() stay within zone 2021-03-04 11:38:38 +01:00
debug_page_ref.c
debug_vm_pgtable.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() 2021-02-10 09:29:21 +01:00
frame_vector.c
frontswap.c
gup_benchmark.c
gup.c mm/gup: combine put_compound_head() and unpin_user_page() 2020-12-30 11:53:54 +01:00
highmem.c
hmm.c
huge_memory.c mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument 2021-03-30 14:31:47 +02:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hugetlb.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hwpoison-inject.c
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-30 11:53:54 +01:00
internal.h
interval_tree.c
ioremap.c
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
Kconfig.debug
khugepaged.c mm,thp,shmem: make khugepaged obey tmpfs mount flags 2021-03-04 11:38:20 +01:00
kmemleak.c
ksm.c
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c
madvise.c mm/madvise: replace ptrace attach requirement for process_madvise 2021-03-17 17:06:37 +01:00
Makefile
mapping_dirty_helpers.c
memblock.c memblock: do not start bottom-up allocations with kernel_end 2021-02-10 09:29:15 +01:00
memcontrol.c mm/memcg: fix 5.10 backport of splitting page memcg 2021-03-30 14:32:07 +02:00
memfd.c
memory_hotplug.c arm64: mte: Map hotplugged memory as Normal Tagged 2021-03-17 17:06:28 +01:00
memory-failure.c mm: fix memory_failure() handling of dax-namespace metadata 2021-03-04 11:38:21 +01:00
memory.c mm: fix race by making init_zero_pfn() early_initcall 2021-04-07 15:00:10 +02:00
mempolicy.c
mempool.c
memremap.c mm: fix memory_failure() handling of dax-namespace metadata 2021-03-04 11:38:21 +01:00
memtest.c
migrate.c mm: fix numa stats for thp migration 2021-01-27 11:55:14 +01:00
mincore.c
mlock.c
mm_init.c
mmap.c mm/mmap.c: fix mmap return value when vma is merged after call_mmap() 2020-12-06 10:19:07 -08:00
mmu_gather.c
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-30 14:32:06 +02:00
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c
oom_kill.c
page_alloc.c mm/memcg: set memcg when splitting page 2021-03-30 14:31:47 +02:00
page_counter.c
page_ext.c
page_idle.c
page_io.c swap: fix swapfile read/write offset 2021-03-07 12:34:15 +01:00
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_vma_mapped.c
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-12 20:18:22 +01:00
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c
readahead.c
rmap.c mm/rmap: always do TTU_IGNORE_ACCESS 2020-12-30 11:53:55 +01:00
rodata_test.c
shmem.c
shuffle.c
shuffle.h
slab_common.c mm: memcontrol: fix slub memory accounting 2021-03-04 11:38:19 +01:00
slab.c
slab.h mm: memcg/slab: fix obj_cgroup_charge() return value handling 2020-12-06 10:19:07 -08:00
slob.c
slub.c Revert "mm, slub: consider rest of partial list if acquire_slab() fails" 2021-03-17 17:06:13 +01:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c
swap_slots.c
swap_state.c
swap.c
swapfile.c swap: fix swapfile read/write offset 2021-03-07 12:34:15 +01:00
truncate.c
usercopy.c
userfaultfd.c
util.c
vmacache.c
vmalloc.c mm/vmalloc.c: fix potential memory leak 2021-01-19 18:27:21 +01:00
vmpressure.c
vmscan.c mm/vmscan: restore zone_reclaim_mode ABI 2021-03-04 11:38:38 +01:00
vmstat.c
workingset.c
z3fold.c z3fold: prevent reclaim/free race for headless pages 2021-03-30 14:31:54 +02:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: account the number of compacted pages correctly 2021-03-07 12:34:15 +01:00
zswap.c