Some applications that run on HPC clusters are designed around the availability of RAM, and the overcommit ratio is fine-tuned to get the maximum usage of memory without swapping. With growing memory, the 1%-of-all-RAM grain provided by overcommit_ratio has become too coarse for these workloads (on a 2TB machine it represents no less than 20GB).

This patch adds the new overcommit_kbytes sysctl variable that allows a much finer grain.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

The Linux kernel supports the following overcommit handling modes

0   -   Heuristic overcommit handling. Obvious overcommits of
        address space are refused. Used for a typical system. It
        ensures a seriously wild allocation fails while allowing
        overcommit to reduce swap usage. root is allowed to
        allocate slightly more memory in this mode. This is the
        default.

1   -   Always overcommit. Appropriate for some scientific
        applications. Classic example is code using sparse arrays
        and just relying on the virtual memory consisting almost
        entirely of zero pages.

2   -   Don't overcommit. The total address space commit
        for the system is not permitted to exceed swap + a
        configurable amount (default is 50%) of physical RAM.
        Depending on the amount you use, in most situations
        this means a process will not be killed while accessing
        pages but will receive errors on memory allocation as
        appropriate.

        Useful for applications that want to guarantee their
        memory allocations will be available in the future
        without having to initialize every page.

The overcommit policy is set via the sysctl `vm.overcommit_memory'.

The overcommit amount can be set via `vm.overcommit_ratio' (percentage)
or `vm.overcommit_kbytes' (absolute value).

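As a rough illustration of how the two knobs feed the mode 2 limit, the
sketch below computes "swap + a configurable amount of physical RAM" in kB.
The function name and parameters are hypothetical, and two simplifications
are assumed: a non-zero overcommit_kbytes value takes precedence over the
ratio, and anything else the kernel may subtract from RAM (for example huge
pages) is ignored.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50, overcommit_kbytes=0):
    """Approximate the mode 2 commit limit in kB (illustrative only).

    Assumption: a non-zero overcommit_kbytes wins over overcommit_ratio.
    Assumption: no other adjustments (e.g. huge pages) are applied.
    """
    if overcommit_kbytes:
        allowed = overcommit_kbytes                 # absolute value in kB
    else:
        allowed = ram_kb * overcommit_ratio // 100  # percentage of RAM
    return allowed + swap_kb


if __name__ == "__main__":
    two_tib_kb = 2 * 1024**3  # a 2 TiB machine, expressed in kB
    # Smallest step available through the ratio: 1% of RAM.
    step_kb = commit_limit_kb(two_tib_kb, 0, overcommit_ratio=1)
    print(step_kb)  # 21474836 kB, roughly 20 GiB per 1% step
    # The same machine tuned in kB instead, here 1 GiB of RAM plus no swap.
    print(commit_limit_kb(two_tib_kb, 0, overcommit_kbytes=1024**2))
```

The 1% step on the 2 TiB example is why an absolute value in kB allows much
finer tuning than the percentage does.
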
The current overcommit limit and amount committed are viewable in
/proc/meminfo as CommitLimit and Committed_AS respectively.

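A minimal sketch of reading those two fields follows. The helper names are
hypothetical, the sample text is made up for illustration (it is not output
from a real machine), and the parser assumes the usual "Name:  value kB"
layout of /proc/meminfo.

```python
def parse_commit_status(meminfo_text):
    """Return CommitLimit and Committed_AS (in kB) from meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in ("CommitLimit", "Committed_AS"):
            fields[name] = int(rest.split()[0])  # first token is the kB value
    return fields


def read_commit_status(path="/proc/meminfo"):
    """Read the live values; only meaningful on a Linux system."""
    with open(path) as f:
        return parse_commit_status(f.read())


# Illustrative sample only; the numbers are invented.
SAMPLE = """MemTotal:       16384000 kB
CommitLimit:    10289148 kB
Committed_AS:    5123456 kB
"""

if __name__ == "__main__":
    status = parse_commit_status(SAMPLE)
    # Headroom left before mode 2 would start refusing allocations.
    print(status["CommitLimit"] - status["Committed_AS"])
```

On a live system, calling read_commit_status() instead of parsing the sample
gives the current values.
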
Gotchas
-------

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much, but it is a corner case if you really care.

In mode 2 the MAP_NORESERVE flag is ignored.

How It Works
------------

The overcommit is based on the following rules

For a file backed map
        SHARED or READ-only     -       0 cost (the file is the map not swap)
        PRIVATE WRITABLE        -       size of mapping per instance

For an anonymous or /dev/zero map
        SHARED                  -       size of mapping
        PRIVATE READ-only       -       0 cost (but of little use)
        PRIVATE WRITABLE        -       size of mapping per instance

Additional accounting
        Pages made writable copies by mmap
        shmfs memory drawn from the same pool

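The per-mapping rules above can be restated as a small lookup. This is only
a sketch of the table, not kernel code: the function name and its boolean
parameters are hypothetical, and the "additional accounting" items are not
modelled.

```python
def commit_cost(size, file_backed, shared, writable):
    """Return the commit charge for one mapping, following the rules table.

    size        -- length of the mapping (any unit; returned in the same unit)
    file_backed -- True for a file backed map, False for anonymous or /dev/zero
    shared      -- True for SHARED, False for PRIVATE
    writable    -- True for WRITABLE, False for READ-only
    """
    if file_backed:
        # SHARED or READ-only: the file itself backs the pages, not swap.
        if shared or not writable:
            return 0
        return size          # PRIVATE WRITABLE: charged per instance
    if shared:
        return size          # anonymous SHARED: size of mapping
    if not writable:
        return 0             # anonymous PRIVATE READ-only
    return size              # anonymous PRIVATE WRITABLE: per instance


if __name__ == "__main__":
    mib = 1024 * 1024
    print(commit_cost(64 * mib, file_backed=True, shared=True, writable=True))
    print(commit_cost(64 * mib, file_backed=False, shared=False, writable=True))
```

"Per instance" means each process holding such a mapping is charged
separately, so two processes with the same private writable mapping are
charged twice the size between them.
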
Status
------

o       We account mmap memory mappings
o       We account mprotect changes in commit
o       We account mremap changes in size
o       We account brk
o       We account munmap
o       We report the commit status in /proc
o       Account and check on fork
o       Review stack handling/building on exec
o       SHMfs accounting
o       Implement actual limit enforcement

To Do
-----
o       Account ptrace pages (this is hard)