linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-19 09:08:46 +07:00

Johannes Weiner 8a931f8013 mm: memcontrol: recursive memory.low protection	2020-04-02 09:35:28 -07:00

Right now, the effective protection of any given cgroup is capped by its own explicit memory.low setting, regardless of what the parent says. The reasons for this are mostly historical and ease of implementation: to make delegation of memory.low safe, effective protection is the min() of all memory.low up the tree.

Unfortunately, this limitation makes it impossible to protect an entire subtree from another without forcing the user to make explicit protection allocations all the way to the leaf cgroups - something that is highly undesirable in real life scenarios.

Consider memory in a data center host. At the cgroup top level, we have a distinction between system management software and the actual workload the system is executing. Both branches are further subdivided into individual services, job components etc.

We want to protect the workload as a whole from the system management software, but that doesn't mean we want to protect and prioritize individual workloads wrt each other. Their memory demand can vary over time, and we'd want the VM to simply cache the hottest data within the workload subtree. Yet, the current memory.low limitations force us to allocate a fixed amount of protection to each workload component in order to get protection from system management software in general. This results in very inefficient resource distribution.

Another concern with mandating downward allocation is that, as the complexity of the cgroup tree grows, it gets harder for the lower levels to be informed about decisions made at the host-level. Consider a container inside a namespace that in turn creates its own nested tree of cgroups to run multiple workloads. It'd be extremely difficult to configure memory.low parameters in those leaf cgroups that on one hand balance pressure among siblings as the container desires, while also reflecting the host-level protection from e.g. rpm upgrades, that lie beyond one or more delegation and namespacing points in the tree.

It's highly unusual from a cgroup interface POV that nested levels have to be aware of and reflect decisions made at higher levels for them to be effective.

To enable such use cases and scale configurability for complex trees, this patch implements a resource inheritance model for memory that is similar to how the CPU and the IO controller implement work-conserving resource allocations: a share of a resource allocated to a subtree always applies to the entire subtree recursively, while allowing, but not mandating, children to further specify distribution rules.

That means that if protection is explicitly allocated among siblings, those configured shares are being followed during page reclaim just like they are now. However, if the memory.low set at a higher level is not fully claimed by the children in that subtree, the "floating" remainder is applied to each cgroup in the tree in proportion to its size. Since reclaim pressure is applied in proportion to size as well, each child in that tree gets the same boost, and the effect is neutral among siblings - with respect to each other, they behave as if no memory control was enabled at all, and the VM simply balances the memory demands optimally within the subtree. But collectively those cgroups enjoy a boost over the cgroups in neighboring trees.

E.g. a leaf cgroup with a memory.low setting of 0 no longer means that it's not getting a share of the hierarchically assigned resource, just that it doesn't claim a fixed amount of it to protect from its siblings.

This allows us to recursively protect one subtree (workload) from another (system management), while letting subgroups compete freely among each other - without having to assign fixed shares to each leaf, and without nested groups having to echo higher-level settings.

The floating protection composes naturally with fixed protection. Consider the following example tree:

        A          A:  low = 2G
       / \         A1: low = 1G
      A1 A2        A2: low = 0G

As outside pressure is applied to this tree, A1 will enjoy a fixed protection from A2 of 1G, but the remaining, unclaimed 1G from A is split evenly among A1 and A2, coming out to 1.5G and 0.5G.

There is a slight risk of regressing theoretical setups where the top-level cgroups don't know about the true budgeting and set bogusly high "bypass" values that are meaningfully allocated down the tree. Such setups would rely on unclaimed protection to be discarded, and distributing it would change the intended behavior. Be safe and hide the new behavior behind a mount option, 'memory_recursiveprot'.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Link: http://lkml.kernel.org/r/20200227195606.46212-4-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
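
The arithmetic behind the A/A1/A2 example in the commit message can be checked with a small model. This is a minimal sketch, not the kernel code: the function name, its parameters, and the `recursive` flag (standing in for the 'memory_recursiveprot' mount option) are hypothetical, and the usage figures are assumptions, since the commit message does not state them. They are chosen so that A1 and A2 each hold the same amount of unprotected memory, which is the case in which the unclaimed remainder splits evenly. All values are in MiB.

```python
def effective_protection(usage, parent_usage, setting,
                         parent_effective, siblings_protected,
                         recursive=True):
    """Simplified model of one child's effective memory.low protection (MiB)."""
    # A cgroup cannot protect more memory than it actually uses.
    protected = min(usage, setting)

    # Children claim more than the parent passes down: scale claims proportionally.
    if siblings_protected > parent_effective:
        return protected * parent_effective // siblings_protected

    ep = protected
    if not recursive:
        # Old behavior: unclaimed parent protection is discarded.
        return ep

    # New behavior: hand out the unclaimed remainder in proportion to
    # each child's unprotected usage.
    if parent_effective > siblings_protected and usage > protected:
        unclaimed = parent_effective - siblings_protected
        ep += unclaimed * (usage - protected) // (parent_usage - siblings_protected)
    return ep


# Example tree from the commit message: A low = 2G, A1 low = 1G, A2 low = 0G.
# Assumed usage (not from the source): A1 = 3G, A2 = 2G.
children = {"A1": (3072, 1024), "A2": (2048, 0)}   # name -> (usage, memory.low)
parent_effective = 2048                            # A's memory.low
parent_usage = sum(u for u, _ in children.values())
siblings_protected = sum(min(u, low) for u, low in children.values())

results = {
    name: effective_protection(u, parent_usage, low,
                               parent_effective, siblings_protected)
    for name, (u, low) in children.items()
}
print(results)  # {'A1': 1536, 'A2': 512}, i.e. 1.5G and 0.5G
```

With `recursive=False` the same inputs yield 1024 for A1 and 0 for A2, which corresponds to the pre-patch behavior the commit message describes, where unclaimed protection from A is not passed down.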
..
kasan	RISC-V Patches for the 5.6 Merge Window, Part 1	2020-01-31 11:23:29 -08:00
backing-dev.c	memcg: fix a crash in wb_workfn when a device disappears	2020-01-31 10:30:36 -08:00
balloon_compaction.c	mm/balloon_compaction: suppress allocation warnings	2019-09-04 07:42:01 -04:00
cleancache.c	Driver Core and debugfs changes for 5.3-rc1	2019-07-12 12:24:03 -07:00
cma_debug.c	mm/cma_debug.c: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops	2019-12-01 12:59:09 -08:00
cma.c	mm/cma.c: switch to bitmap_zalloc() for cma bitmap allocation	2019-12-01 12:59:09 -08:00
cma.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
compaction.c	mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()	2019-10-14 15:04:01 -07:00
debug_page_ref.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
debug.c	mm: dump_page(): additional diagnostics for huge pinned pages	2020-04-02 09:35:27 -07:00
dmapool.c	mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options	2019-07-12 11:05:46 -07:00
early_ioremap.c	mm/early_ioremap.c: use %pa to print resource_size_t variables	2020-01-31 10:30:38 -08:00
fadvise.c	fs: Export generic_fadvise()	2019-08-30 22:43:58 -07:00
failslab.c	mm/failslab.c: by default, do not fail allocations with direct reclaim only	2019-07-12 11:05:43 -07:00
filemap.c	mm/filemap.c: rewrite pagecache_get_page documentation	2020-04-02 09:35:27 -07:00
frame_vector.c	mm: untag user pointers in get_vaddr_frames	2019-09-25 17:51:41 -07:00
frontswap.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482	2019-06-19 17:09:52 +02:00
gup_benchmark.c	mm/gup_benchmark: support pin_user_pages() and related calls	2020-04-02 09:35:27 -07:00
gup.c	mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path	2020-04-02 09:35:27 -07:00
highmem.c	mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions	2019-12-10 10:12:55 +01:00
hmm.c	mm: pagewalk: add 'depth' parameter to pte_hole	2020-02-04 03:05:25 +00:00
huge_memory.c	mm/gup: track FOLL_PIN pages	2020-04-02 09:35:27 -07:00
hugetlb_cgroup.c	hugetlb_cgroup: fix illegal access to memory	2020-03-29 09:47:05 -07:00
hugetlb.c	mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages	2020-04-02 09:35:27 -07:00
hwpoison-inject.c	mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops	2019-12-01 12:59:09 -08:00
init-mm.c	mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch	2019-10-19 06:32:32 -04:00
internal.h	mm: swap: make page_evictable() inline	2020-04-02 09:35:27 -07:00
interval_tree.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248	2019-06-19 17:09:08 +02:00
Kconfig	mm/Kconfig: fix trivial help text punctuation	2019-12-01 12:59:10 -08:00
Kconfig.debug	mm: add generic ptdump	2020-02-04 03:05:25 +00:00
khugepaged.c	mm/thp: flush file for !is_shmem PageDirty() case in collapse_file()	2019-12-01 12:59:09 -08:00
kmemleak-test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333	2019-06-05 17:37:06 +02:00
kmemleak.c	mm/kmemleak.c: use address-of operator on section symbols	2020-04-02 09:35:26 -07:00
ksm.c	* PPC secure guest support	2019-12-04 11:08:30 -08:00
list_lru.c	mm: memcg/slab: use mem_cgroup_from_obj()	2020-04-02 09:35:28 -07:00
maccess.c	uaccess: Add strict non-pagefault kernel-space read function	2019-11-02 12:39:12 -07:00
madvise.c	mm: do not allow MADV_PAGEOUT for CoW pages	2020-03-21 18:56:06 -07:00
Makefile	mm/Makefile: disable KCSAN for kmemleak	2020-04-02 09:35:26 -07:00
mapping_dirty_helpers.c	mm: Add write-protect and clean utilities for address space ranges	2019-11-06 13:03:36 +01:00
memblock.c	memblock: Use __func__ in remaining memblock_dbg() call sites	2020-01-31 10:30:38 -08:00
memcontrol.c	mm: memcontrol: recursive memory.low protection	2020-04-02 09:35:28 -07:00
memfd.c	mm: page cache: store only head pages in i_pages	2019-09-24 15:54:08 -07:00
memory_hotplug.c	mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled	2020-03-06 07:06:09 -06:00
memory-failure.c	mm/memory-failure.c: use page_shift() in add_to_kill()	2019-12-01 12:59:04 -08:00
memory.c	mm: avoid data corruption on CoW fault into PFN-mapped VMA	2020-03-06 07:06:09 -06:00
mempolicy.c	mm/mempolicy.c: fix out of bounds write in mpol_parse_str()	2020-01-31 10:30:36 -08:00
mempool.c	docs/core-api/mm: fix return value descriptions in mm/	2019-03-05 21:07:20 -08:00
memremap.c	mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()	2020-02-04 03:05:23 +00:00
memtest.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
migrate.c	mm: pagewalk: add 'depth' parameter to pte_hole	2020-02-04 03:05:25 +00:00
mincore.c	mm: pagewalk: add 'depth' parameter to pte_hole	2020-02-04 03:05:25 +00:00
mlock.c	mm: untag user pointers passed to memory syscalls	2019-09-25 17:51:41 -07:00
mm_init.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mmap.c	mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()	2020-02-20 10:03:14 +00:00
mmu_context.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
mmu_gather.c	asm-generic/tlb: provide MMU_GATHER_TABLE_FREE	2020-02-04 03:05:26 +00:00
mmu_notifier.c	mm/mmu_notifier: silence PROVE_RCU_LIST warnings	2020-03-21 18:56:06 -07:00
mmzone.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mprotect.c	mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa	2020-03-06 07:06:09 -06:00
mremap.c	mm/mremap: Add comment explaining the untagging behaviour of mremap()	2020-03-26 11:28:57 +00:00
msync.c	mm: untag user pointers passed to memory syscalls	2019-09-25 17:51:41 -07:00
nommu.c	x86/mm: split vmalloc_sync_all()	2020-03-21 18:56:06 -07:00
oom_kill.c	mm, oom: dump stack of victim when reaping failed	2020-01-31 10:30:38 -08:00
page_alloc.c	mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page()	2020-04-02 09:35:28 -07:00
page_counter.c	mm: memcontrol: fix memory.low proportional distribution	2020-04-02 09:35:28 -07:00
page_ext.c	mm, page_owner: fix off-by-one error in __set_page_owner_handle()	2019-10-14 15:04:00 -07:00
page_idle.c	mm/page_idle.c: fix oops because end_pfn is larger than max_pfn	2019-06-29 16:43:45 +08:00
page_io.c	fs: Enable bmap() function to properly return errors	2020-02-03 08:05:37 -05:00
page_isolation.c	mm/page_isolation: fix potential warning from user	2020-01-31 10:30:39 -08:00
page_owner.c	mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo	2019-10-19 06:32:31 -04:00
page_poison.c	mm/page_poison.c: fix a typo in a comment	2019-09-24 15:54:08 -07:00
page_vma_mapped.c	mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page	2020-01-31 10:30:38 -08:00
page-writeback.c	mm/gup/writeback: add callbacks for inaccessible pages	2020-04-02 09:35:27 -07:00
pagewalk.c	x86: mm: avoid allocating struct mm_struct on the stack	2020-02-04 03:05:25 +00:00
percpu-internal.h	percpu: convert chunk hints to be based on pcpu_block_md	2019-03-13 12:25:31 -07:00
percpu-km.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-stats.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-vm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu.c	bitmap: genericize percpu bitmap region iterators	2020-01-20 16:40:56 +01:00
pgtable-generic.c	asm-generic/mm: stub out p{4,u}d_clear_bad() if __PAGETABLE_P{4,U}D_FOLDED	2019-12-01 06:29:19 -08:00
process_vm_access.c	mm, tree-wide: rename put_user_page() to unpin_user_page()	2020-01-31 10:30:38 -08:00
ptdump.c	x86: mm: avoid allocating struct mm_struct on the stack	2020-02-04 03:05:25 +00:00
readahead.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
rmap.c	mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages	2020-04-02 09:35:27 -07:00
rodata_test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
shmem.c	tmpfs: deny and force are not huge mount options	2020-02-18 15:07:30 -05:00
shuffle.c	mm: fix -Wmissing-prototypes warnings	2019-10-07 15:47:19 -07:00
shuffle.h	mm: maintain randomization of page free lists	2019-05-14 19:52:48 -07:00
slab_common.c	mm, memcg: fix build error around the usage of kmem_caches	2020-04-02 09:35:28 -07:00
slab.c	mm, debug_pagealloc: don't rely on static keys too early	2020-01-13 18:19:02 -08:00
slab.h	mm: kmem: rename (__)memcg_kmem_(un)charge_memcg() to __memcg_kmem_(un)charge()	2020-04-02 09:35:28 -07:00
slob.c	mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)	2019-10-07 15:47:20 -07:00
slub.c	slub: relocate freelist pointer to middle of object	2020-04-02 09:35:26 -07:00
sparse-vmemmap.c	mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()	2019-07-18 17:08:07 -07:00
sparse.c	mm/sparse: fix kernel crash with pfn_section_valid check	2020-03-29 09:47:06 -07:00
swap_cgroup.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
swap_slots.c	mm/swap_slots.c: assign|reset cache slot by value directly	2020-04-02 09:35:27 -07:00
swap_state.c	mm/swap_state.c: use the same way to count page in [add_to|delete_from]_swap_cache	2020-04-02 09:35:28 -07:00
swap.c	mm: swap: use smp_mb__after_atomic() to order LRU bit set	2020-04-02 09:35:28 -07:00
swapfile.c	mm/swapfile: fix data races in try_to_unuse()	2020-04-02 09:35:27 -07:00
truncate.c	mm/thp: allow dropping THP from page cache	2019-10-19 06:32:33 -04:00
usercopy.c	usercopy: Avoid HIGHMEM pfn warning	2019-09-17 15:20:17 -07:00
userfaultfd.c	mm: fix typos in comments when calling __SetPageUptodate()	2019-12-01 12:59:10 -08:00
util.c	mm/mmap.c: rb_parent is not necessary in __vma_link_list()	2019-12-01 06:29:19 -08:00
vmacache.c	mm: get rid of vmacache_flush_all() entirely	2018-09-13 15:18:04 -10:00
vmalloc.c	x86/mm: split vmalloc_sync_all()	2020-03-21 18:56:06 -07:00
vmpressure.c	mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()	2019-10-07 15:47:19 -07:00
vmscan.c	mm: swap: make page_evictable() inline	2020-04-02 09:35:27 -07:00
vmstat.c	mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting	2020-04-02 09:35:27 -07:00
workingset.c	mm: vmscan: detect file thrashing at the reclaim root	2019-12-01 12:59:07 -08:00
z3fold.c	mm/z3fold.c: do not include rwlock.h directly	2020-03-06 07:06:09 -06:00
zbud.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
zpool.c	zpool: add malloc_support_movable to zpool_driver	2019-09-24 15:54:12 -07:00
zsmalloc.c	mm/zsmalloc.c: fix the migrated zspage statistics.	2020-01-04 13:55:09 -08:00
zswap.c	zswap: potential NULL dereference on error in init_zswap()	2020-01-31 10:30:39 -08:00