linux_dsm_epyc7002/mm
Waiman Long 5458985533 mm: memcg/slab: properly set up gfp flags for objcg pointer array
[ Upstream commit 41eb5df1cbc9b302fc263ad7c9f38cfc38b4df61 ]

Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4.

Since the merging of the new slab memory controller in v5.9, the page
structure stores a pointer to objcg pointer array for slab pages.  When
the slab has no used objects, it can be freed in free_slab() which will
call kfree() to free the objcg pointer array in
memcg_alloc_page_obj_cgroups().  If it happens that the objcg pointer
array is the last used object in its slab, that slab may then be freed
which may caused kfree() to be called again.

With the right workload, the slab cache may be set up in a way that allows
the recursive kfree() calling loop to nest deep enough to cause a kernel
stack overflow and panic the system.  In fact, we have a reproducer that
can cause kernel stack overflow on a s390 system involving kmalloc-rcl-256
and kmalloc-rcl-128 slabs with the following kfree() loop recursively
called 74 times:

  [ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560 [ 285.520740]
[<000000000ec43466>] __free_slab+0xc6/0x228 [ 285.520741]
[<000000000ec41fc2>] __slab_free+0x3c2/0x3e0 [ 285.520742]
[<000000000ec432fc>] kfree+0x4bc/0x560 : While investigating this issue, I
also found an issue on the allocation side.  If the objcg pointer array
happen to come from the same slab or a circular dependency linkage is
formed with multiple slabs, those affected slabs can never be freed again.

This patch series addresses these two issues by introducing a new set of
kmalloc-cg-<n> caches split from kmalloc-<n> caches.  The new set will
only contain non-reclaimable and non-dma objects that are accounted in
memory cgroups whereas the old set are now for unaccounted objects only.
By making this split, all the objcg pointer arrays will come from the
kmalloc-<n> caches, but those caches will never hold any objcg pointer
array.  As a result, deeply nested kfree() call and the unfreeable slab
problems are now gone.

This patch (of 4):

Since the merging of the new slab memory controller in v5.9, the page
structure may store a pointer to obj_cgroup pointer array for slab pages.
Currently, only the __GFP_ACCOUNT bit is masked off.  However, the array
is not readily reclaimable and doesn't need to come from the DMA buffer.
So those GFP bits should be masked off as well.

Do the flag bit clearing at memcg_alloc_page_obj_cgroups() to make sure
that it is consistently applied no matter where it is called.

Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com
Fixes: 286e04b8ed ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:14 +02:00
..
kasan kasan: fix incorrect arguments passing in kasan_add_zero_shadow 2021-01-27 11:55:23 +01:00
backing-dev.c
balloon_compaction.c
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c mm, compaction: make fast_isolate_freepages() stay within zone 2021-03-04 11:38:38 +01:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage() 2021-07-14 16:56:13 +02:00
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() 2021-02-10 09:29:21 +01:00
frame_vector.c
frontswap.c
gup_benchmark.c mm/gup_benchmark: take the mmap lock around GUP 2020-10-18 09:27:09 -07:00
gup.c mm/gup: fix try_grab_compound_head() race with split_huge_page() 2021-07-14 16:55:42 +02:00
highmem.c
hmm.c
huge_memory.c mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split 2021-06-30 08:47:27 -04:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hugetlb.c mm, futex: fix shared futex pgoff on shmem huge page 2021-06-30 08:47:29 -04:00
hwpoison-inject.c
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-30 11:53:54 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:27 -04:00
interval_tree.c
ioremap.c
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
Kconfig.debug
khugepaged.c khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate() 2021-05-19 10:13:07 +02:00
kmemleak.c
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:13:07 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c
madvise.c mm/madvise: replace ptrace attach requirement for process_madvise 2021-03-17 17:06:37 +01:00
Makefile
mapping_dirty_helpers.c
memblock.c memblock: do not start bottom-up allocations with kernel_end 2021-02-10 09:29:15 +01:00
memcontrol.c mm: memcg/slab: properly set up gfp flags for objcg pointer array 2021-07-14 16:56:14 +02:00
memfd.c
memory_hotplug.c arm64: mte: Map hotplugged memory as Normal Tagged 2021-03-17 17:06:28 +01:00
memory-failure.c mm/memory-failure: make sure wait for page writeback in memory_failure 2021-06-23 14:42:40 +02:00
memory.c swap: fix do_swap_page() race with swapoff 2021-07-14 16:56:13 +02:00
mempolicy.c mm: mempolicy: fix potential pte_unmap_unlock pte error 2020-11-02 12:14:19 -08:00
mempool.c
memremap.c mm: fix memory_failure() handling of dax-namespace metadata 2021-03-04 11:38:21 +01:00
memtest.c
migrate.c mm, thp: use head page in __migration_entry_wait() 2021-06-30 08:47:26 -04:00
mincore.c
mlock.c
mm_init.c
mmap.c mm/mmap.c: fix mmap return value when vma is merged after call_mmap() 2020-12-06 10:19:07 -08:00
mmu_gather.c
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-30 14:32:06 +02:00
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c mm: remove alloc_vm_area 2020-10-18 09:27:10 -07:00
oom_kill.c
page_alloc.c mm/page_alloc: fix counting of free pages after take off from buddy 2021-06-10 13:39:27 +02:00
page_counter.c
page_ext.c
page_idle.c
page_io.c swap: fix swapfile read/write offset 2021-03-07 12:34:15 +01:00
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:29 -04:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-12 20:18:22 +01:00
pagewalk.c
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-km.c
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-vm.c
percpu.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
pgalloc-track.h
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:26 -04:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c mm: ptdump: fix build failure 2021-04-21 13:00:57 +02:00
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm/thp: fix page_address_in_vma() on file THP tails 2021-06-30 08:47:27 -04:00
rodata_test.c
shmem.c mm/shmem: fix shmem_swapin() race with swapoff 2021-07-14 16:56:14 +02:00
shuffle.c
shuffle.h
slab_common.c mm/slub: fix redzoning for small allocations 2021-06-23 14:42:54 +02:00
slab.c mm/sl?b.c: remove ctor argument from kmem_cache_flags 2021-05-14 09:50:45 +02:00
slab.h mm: memcg/slab: properly set up gfp flags for objcg pointer array 2021-07-14 16:56:14 +02:00
slob.c
slub.c mm/slub.c: include swab.h 2021-06-23 14:42:54 +02:00
sparse-vmemmap.c
sparse.c mm/sparse: add the missing sparse_buffer_fini() in error branch 2021-05-14 09:50:45 +02:00
swap_cgroup.c
swap_slots.c
swap_state.c
swap.c
swapfile.c mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare 2021-06-23 14:42:53 +02:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:27 -04:00
usercopy.c
userfaultfd.c
util.c
vmacache.c
vmalloc.c mm/vmalloc.c: fix potential memory leak 2021-01-19 18:27:21 +01:00
vmpressure.c
vmscan.c mm/vmscan: restore zone_reclaim_mode ABI 2021-03-04 11:38:38 +01:00
vmstat.c
workingset.c XArray updates for 5.9 2020-10-20 14:39:37 -07:00
z3fold.c z3fold: prevent reclaim/free race for headless pages 2021-03-30 14:31:54 +02:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: account the number of compacted pages correctly 2021-03-07 12:34:15 +01:00
zswap.c