linux_dsm_epyc7002/mm
Glauber Costa 6d42c232bd memcg: also test for skip accounting at the page allocation level
The memory we used to hold the memcg arrays is currently accounted to
the current memcg.  But that creates a problem, because that memory can
only be freed after the last user is gone.  Our only way to know which
is the last user, is to hook up to freeing time, but the fact that we
still have some in flight kmallocs will prevent freeing to happen.  I
believe therefore to be just easier to account this memory as global
overhead.

This patch (of 2):

Disabling accounting is only relevant for some specific memcg internal
allocations.  Therefore we would initially not have such check at
memcg_kmem_newpage_charge, since direct calls to the page allocator that
are marked with GFP_KMEMCG only happen outside memcg core.  We are
mostly concerned with cache allocations and by having this test at
memcg_kmem_get_cache we are already able to relay the allocation to the
root cache and bypass the memcg caches altogether.

There is one exception, though: the SLUB allocator does not create large
order caches, but rather service large kmallocs directly from the page
allocator.  Therefore, the following sequence, when backed by the SLUB
allocator:

	memcg_stop_kmem_account();
	kmalloc(<large_number>)
	memcg_resume_kmem_account();

would effectively ignore the fact that we should skip accounting, since
it will drive us directly to this function without passing through the
cache selector memcg_kmem_get_cache.  Such large allocations are
extremely rare but can happen, for instance, for the cache arrays.

This was never a problem in practice, because we weren't skipping
accounting for the cache arrays.  All the allocations we were skipping
were fairly small.  However, the fact that we were not skipping those
allocations are a problem and can prevent the memcgs from going away.
As we fix that, we need to make sure that the fix will also work with
the SLUB allocator.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Reported-by: Michal Hocko <mhocko@suze.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:21 -07:00
..
backing-dev.c drivers: avoid format string in dev_set_name 2013-07-03 16:07:41 -07:00
balloon_compaction.c
bootmem.c mm: kill free_all_bootmem_node() 2013-07-03 16:07:39 -07:00
bounce.c
cleancache.c
compaction.c
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c
filemap.c
fremap.c
frontswap.c
highmem.c
huge_memory.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-07-04 10:29:23 -07:00
hugetlb_cgroup.c
hugetlb.c mm: correctly update zone->managed_pages 2013-07-03 16:07:33 -07:00
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
Kconfig mm: soft-dirty bits for user memory changes tracking 2013-07-03 16:07:26 -07:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c memcg: also test for skip accounting at the page allocation level 2013-07-09 10:33:21 -07:00
memory_hotplug.c Merge branch 'akpm' (updates from Andrew Morton) 2013-07-03 17:12:13 -07:00
memory-failure.c mm/memory-failure.c: fix memory leak in successful soft offlining 2013-07-03 16:07:31 -07:00
memory.c mm: kill global variable num_physpages 2013-07-03 16:07:38 -07:00
mempolicy.c
mempool.c
migrate.c
mincore.c
mlock.c
mm_init.c mm: tune vm_committed_as percpu_counter batching size 2013-07-03 16:07:32 -07:00
mmap.c mm: use vma_pages() to replace (vm_end - vm_start) >> PAGE_SHIFT 2013-07-03 16:07:26 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c mm: mremap: validate input before taking lock 2013-07-09 10:33:20 -07:00
msync.c
nobootmem.c mm: concentrate modification of totalram_pages into the mm core 2013-07-03 16:07:33 -07:00
nommu.c mm: kill global variable num_physpages 2013-07-03 16:07:38 -07:00
oom_kill.c
page_alloc.c mm: remove duplicated call of get_pfn_range_for_nid 2013-07-09 10:33:20 -07:00
page_cgroup.c
page_io.c mm: remove compressed copy from zram in-memory 2013-07-03 16:07:26 -07:00
page_isolation.c
page-writeback.c
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm: remove lru parameter from __lru_cache_add and lru_cache_add_lru 2013-07-03 16:07:31 -07:00
shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2013-07-03 14:04:58 -07:00
slab_common.c
slab.c
slab.h
slob.c
slub.c
sparse-vmemmap.c
sparse.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-07-04 11:40:58 -07:00
swap_state.c
swap.c mm: remove lru parameter from __lru_cache_add and lru_cache_add_lru 2013-07-03 16:07:31 -07:00
swapfile.c swap: discard while swapping only if SWAP_FLAG_DISCARD_PAGES 2013-07-03 16:07:32 -07:00
truncate.c
util.c
vmalloc.c mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead of show_numa_info 2013-07-09 10:33:21 -07:00
vmpressure.c
vmscan.c mm: remove lru parameter from __lru_cache_add and lru_cache_add_lru 2013-07-03 16:07:31 -07:00
vmstat.c