linux_dsm_epyc7002/mm
David Gibson b45b5bd65f [PATCH] hugepage: Strict page reservation for hugepage inodes
These days, hugepages are demand-allocated at first fault time.  There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.

A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations.  In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available.  In particular, this defeats such
a program which will detect a mapping failure and adjust its hugepage usage
downward accordingly.

The patch below addresses this problem, by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instatiated.  MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation)

This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.

This patch causes no regressions on the libhugetblfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
..
bootmem.c [PATCH] FRV: Clean up bootmem allocator's page freeing algorithm 2006-01-06 08:33:26 -08:00
fadvise.c [PATCH] fadvise: return ESPIPE on FIFO/pipe 2006-01-08 20:14:00 -08:00
filemap_xip.c [PATCH] replace inode_update_time with file_update_time 2006-01-10 08:01:30 -08:00
filemap.c [PATCH] mm: make __put_page internal 2006-03-22 07:54:01 -08:00
filemap.h [PATCH] xip: reduce code duplication 2005-06-24 00:06:41 -07:00
fremap.c VM: add common helper function to create the page tables 2005-11-29 14:03:14 -08:00
highmem.c [PATCH] gfp_t: the rest 2005-10-28 08:16:51 -07:00
hugetlb.c [PATCH] hugepage: Strict page reservation for hugepage inodes 2006-03-22 07:54:03 -08:00
internal.h [PATCH] remove set_page_count() outside mm/ 2006-03-22 07:54:02 -08:00
Kconfig [PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support 2006-01-08 20:12:41 -08:00
madvise.c [PATCH] madvise MADV_DONTFORK/MADV_DOFORK 2006-02-14 16:09:34 -08:00
Makefile [PATCH] slob: introduce the SLOB allocator 2006-01-08 20:13:41 -08:00
memory_hotplug.c [PATCH] memory hotadd: pgdat->node_present_pages fix 2006-03-09 19:47:38 -08:00
memory.c [PATCH] mm: more CONFIG_DEBUG_VM 2006-03-22 07:54:02 -08:00
mempolicy.c [PATCH] mm: kill kmem_cache_t usage 2006-03-22 07:53:58 -08:00
mempool.c [PATCH] mm: kill kmem_cache_t usage 2006-03-22 07:53:58 -08:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
mmap.c [PATCH] remove VM_DONTCOPY bogosities 2006-03-22 07:54:01 -08:00
mprotect.c [PATCH] Enable mprotect on huge pages 2006-03-22 07:54:03 -08:00
mremap.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
msync.c [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
nommu.c [PATCH] mm: nommu use compound pages 2006-03-22 07:54:01 -08:00
oom_kill.c [PATCH] out_of_memory() locking fix 2006-03-02 08:33:07 -08:00
page_alloc.c [PATCH] mm: prep_zero_page() in irq is a bug 2006-03-22 07:54:02 -08:00
page_io.c [PATCH] mm: split page table lock 2005-10-29 21:40:42 -07:00
page-writeback.c [PATCH] mm: dirty_exceeded speedup 2006-01-18 19:20:17 -08:00
pdflush.c [PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap 2006-01-08 20:12:41 -08:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c [PATCH] readahead: fix initial window size calculation 2006-03-22 07:54:03 -08:00
rmap.c [PATCH] mm: more CONFIG_DEBUG_VM 2006-03-22 07:54:02 -08:00
shmem.c [PATCH] shmem: inline to avoid warning 2006-03-22 07:54:02 -08:00
slab.c [PATCH] mm: nommu use compound pages 2006-03-22 07:54:01 -08:00
slob.c [PATCH] SLOB=y && SMP=y fix 2006-02-08 07:52:58 -08:00
sparse.c [PATCH] Change maxaligned_in_smp alignemnt macros to internodealigned_in_smp macros 2006-01-08 20:13:38 -08:00
swap_state.c [PATCH] Direct Migration V9: Avoid writeback / page_migrate() method 2006-02-01 08:53:17 -08:00
swap.c [PATCH] mm: less atomic ops 2006-03-22 07:53:57 -08:00
swapfile.c [PATCH] Direct Migration V9: remove_from_swap() to remove swap ptes 2006-02-01 08:53:16 -08:00
thrash.c [PATCH] temporarily disable swap token on memory pressure 2005-11-28 14:42:25 -08:00
tiny-shmem.c [PATCH] do_truncate() call fix in tiny-shmem.c 2006-01-12 09:08:49 -08:00
truncate.c [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
util.c [PATCH] slob: introduce mm/util.c for shared functions 2006-01-08 20:13:41 -08:00
vmalloc.c [PATCH] kernel-doc: fix warnings in vmalloc.c 2005-11-07 07:53:56 -08:00
vmscan.c [PATCH] vmscan: emove obsolete checks from shrink_list() and fix unlikely in refill_inactive_zone() 2006-03-22 07:54:02 -08:00