linux_dsm_epyc7002/mm
Hugh Dickins 59927fb984 memcg: free mem_cgroup by RCU to fix oops
After fixing the GPF in mem_cgroup_lru_del_list(), three times one
machine running a similar load (moving and removing memcgs while
swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving
memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone():
this is where a struct mem_cgroup is first accessed after being chosen
by mem_cgroup_iter().

Just what protects a struct mem_cgroup from being freed, in between
mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget()
fails once css->refcnt is zero with CSS_REMOVED set in flags, yes: but
what if that memory is freed and reused for something else, which sets
"refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is
left at zero but flags are cleared.

It's tempting to move the css_tryget() into css_get_next(), to make it
really "get" the css, but I don't think that actually solves anything:
the same difficulty in moving from css_id found to stable css remains.

But we already have rcu_read_lock() around the two, so it's easily fixed
if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup.

However, a big struct mem_cgroup is allocated with vzalloc() instead of
kzalloc(), and we're not allowed to vfree() at interrupt time: there
doesn't appear to be a general vfree_rcu() to help with this, so roll
our own using schedule_work().  The compiler decently removes
vfree_work() and vfree_rcu() when the config doesn't need them.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15 17:03:03 -07:00
..
backing-dev.c
bootmem.c
bounce.c
cleancache.c
compaction.c
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c
filemap.c
fremap.c
highmem.c
huge_memory.c mm: thp: fix BUG on mm->nr_ptes 2012-03-05 15:49:43 -08:00
hugetlb.c
hwpoison-inject.c
init-mm.c
internal.h
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c memcg: free mem_cgroup by RCU to fix oops 2012-03-15 17:03:03 -07:00
memory_hotplug.c
memory-failure.c
memory.c
mempolicy.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mempool.c
migrate.c
mincore.c
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c mm: fix find_vma_prev 2012-03-06 16:48:03 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c
page_alloc.c
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
page-writeback.c
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
prio_tree.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
shmem.c
slab.c
slob.c
slub.c
sparse-vmemmap.c
sparse.c
swap_state.c
swap.c
swapfile.c
thrash.c
truncate.c
util.c
vmalloc.c
vmscan.c
vmstat.c