linux_dsm_epyc7002/mm
Minchan Kim 791b48b642 mm: vmscan: scan until it finds eligible pages
Although there are a ton of free swap and anonymous LRU page in elgible
zones, OOM happened.

  balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
  CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
   oom_kill_process+0x21d/0x3f0
   out_of_memory+0xd8/0x390
   __alloc_pages_slowpath+0xbc1/0xc50
   __alloc_pages_nodemask+0x1a5/0x1c0
   pte_alloc_one+0x20/0x50
   __pte_alloc+0x1e/0x110
   __handle_mm_fault+0x919/0x960
   handle_mm_fault+0x77/0x120
   __do_page_fault+0x27a/0x550
   trace_do_page_fault+0x43/0x150
   do_async_page_fault+0x2c/0x90
   async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:424716 inactive_anon:65314 isolated_anon:0
   active_file:52 inactive_file:46 isolated_file:0
   unevictable:0 dirty:27 writeback:0 unstable:0
   slab_reclaimable:3967 slab_unreclaimable:4125
   mapped:133 shmem:43 pagetables:1674 bounce:0
   free:4637 free_pcp:225 free_cma:0
  Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
  DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 992 992 1952
  DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 959
  Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
  DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
  Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
  378 total pagecache pages
  17 pages in swap cache
  Swap cache stats: add 17325, delete 17302, find 0/27
  Free swap  = 978940kB
  Total swap = 1048572kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  12629 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned
  [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
  [  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
  [  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd

With investigation, skipping page of isolate_lru_pages makes reclaim
void because it returns zero nr_taken easily so LRU shrinking is
effectively nothing and just increases priority aggressively.  Finally,
OOM happens.

The problem is that get_scan_count determines nr_to_scan with eligible
zones so although priority drops to zero, it couldn't reclaim any pages
if the LRU contains mostly ineligible pages.

get_scan_count:

        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
	size = size >> sc->priority;

Assumes sc->priority is 0 and LRU list is as follows.

	N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(Ie, small eligible pages are in the head of LRU but others are
 almost ineligible pages)

In that case, size becomes 4 so VM want to scan 4 pages but 4 pages from
tail of the LRU are not eligible pages.  If get_scan_count counts
skipped pages, it doesn't reclaim any pages remained after scanning 4
pages so it ends up OOM happening.

This patch makes isolate_lru_pages try to scan pages until it encounters
eligible zones's pages.

[akpm@linux-foundation.org: clean up mind-bending `for' statement.  Tweak comment text]
Fixes: 3db65812d6 ("Revert "mm, vmscan: account for skipped pages as a partial scan"")
Link: http://lkml.kernel.org/r/1494457232-27401-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12 15:57:16 -07:00
..
kasan Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-10 10:30:46 -07:00
backing-dev.c bdi: Drop 'parent' argument from bdi_register[_va]() 2017-04-20 12:09:55 -06:00
balloon_compaction.c mm: balloon: use general non-lru movable page feature 2016-07-26 16:19:19 -07:00
bootmem.c mm/bootmem.c: cosmetic improvement of code readability 2017-02-22 16:41:29 -08:00
cleancache.c
cma_debug.c cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
cma.c cma: Introduce cma_for_each_area 2017-04-18 20:41:12 +02:00
cma.h cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
compaction.c mm, compaction: finish whole pageblock to reduce fragmentation 2017-05-08 17:15:10 -07:00
debug_page_ref.c
debug.c mm, debug: print raw struct page data in __dump_page() 2016-12-12 18:55:08 -08:00
dmapool.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
early_ioremap.c
fadvise.c mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED 2016-12-20 09:48:46 -08:00
failslab.c
filemap.c Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-09 09:01:21 -07:00
frame_vector.c treewide: use kv[mz]alloc* rather than opencoded variants 2017-05-08 17:15:13 -07:00
frontswap.c mm, frontswap: convert frontswap_enabled to static key 2016-07-26 16:19:19 -07:00
gup.c mm/gup.c: fix access_ok() argument type 2017-05-03 15:52:12 -07:00
highmem.c
huge_memory.c mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required 2017-05-08 17:15:15 -07:00
hugetlb_cgroup.c
hugetlb.c mm/hugetlb.c: don't call region_abort if region_chg fails 2017-03-31 17:13:30 -07:00
hwpoison-inject.c mm: hwpoison: call shake_page() unconditionally 2017-05-03 15:52:12 -07:00
init-mm.c mm: Add a user_ns owner to mm_struct and fix ptrace permission checks 2016-11-22 11:49:48 -06:00
internal.h mm, compaction: finish whole pageblock to reduce fragmentation 2017-05-08 17:15:10 -07:00
interval_tree.c
Kconfig mm: remove AVR32 arch special handling in mm/Kconfig 2017-05-01 09:36:31 +02:00
Kconfig.debug mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
khugepaged.c mm, thp: copying user pages must schedule on collapse 2017-05-12 15:57:16 -07:00
kmemcheck.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
kmemleak-test.c
kmemleak.c mm: fix section name for .data..ro_after_init 2017-03-31 17:13:30 -07:00
ksm.c mm: make rmap_one boolean function 2017-05-03 15:52:10 -07:00
list_lru.c mm/list_lru.c: avoid error-path NULL pointer deref 2016-10-27 18:43:42 -07:00
maccess.c
madvise.c mm/madvise: move up the behavior parameter validation 2017-05-03 15:52:11 -07:00
Makefile mm: add arch-independent testcases for RODATA 2017-02-27 18:43:48 -08:00
memblock.c memblock: add memblock_cap_memory_range() 2017-04-05 18:26:50 +01:00
memcontrol.c hwpoison, memcg: forcibly uncharge LRU pages 2017-05-12 15:57:15 -07:00
memory_hotplug.c mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx 2017-05-03 15:52:09 -07:00
memory-failure.c hwpoison, memcg: forcibly uncharge LRU pages 2017-05-12 15:57:15 -07:00
memory.c Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc 2017-04-02 10:33:48 -04:00
mempolicy.c mm/mempolicy.c: fix error handling in set_mempolicy and mbind. 2017-04-08 10:57:55 -07:00
mempool.c Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" 2016-07-28 16:07:41 -07:00
memtest.c
migrate.c mm: make rmap_one boolean function 2017-05-03 15:52:10 -07:00
mincore.c mm: remove shmem_mapping() shmem_zero_setup() duplicates 2017-02-24 17:46:56 -08:00
mlock.c mm: make try_to_munlock() return void 2017-05-03 15:52:10 -07:00
mm_init.c
mmap.c mm/mmap: replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff 2017-05-03 15:52:10 -07:00
mmu_context.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
mmu_notifier.c mm: Use static initialization for "srcu" 2017-04-18 11:38:22 -07:00
mmzone.c mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist() 2017-02-22 16:41:29 -08:00
mprotect.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
mremap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
msync.c
nobootmem.c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping 2016-10-11 15:06:33 -07:00
nommu.c mm, vmalloc: use __GFP_HIGHMEM implicitly 2017-05-08 17:15:13 -07:00
oom_kill.c oom: improve oom disable handling 2017-05-03 15:52:10 -07:00
page_alloc.c mm: introduce memalloc_noreclaim_{save,restore} 2017-05-08 17:15:15 -07:00
page_counter.c
page_ext.c mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
page_idle.c mm: make rmap_one boolean function 2017-05-03 15:52:10 -07:00
page_io.c writeback: add wbc_to_write_flags() 2016-11-02 10:24:03 -06:00
page_isolation.c mm, page_alloc: count movable pages when stealing from pageblock 2017-05-08 17:15:10 -07:00
page_owner.c mm/page_owner: don't define fields on struct page_ext by hard-coding 2016-10-07 18:46:27 -07:00
page_poison.c mm: enable page poisoning early at boot 2017-05-03 15:52:10 -07:00
page_vma_mapped.c mm: fix page_vma_mapped_walk() for ksm pages 2017-04-08 00:47:48 -07:00
page-writeback.c Add GETFSMAP support; some performance improvements for very large 2017-05-08 11:30:05 -07:00
pagewalk.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
percpu-km.c
percpu-vm.c percpu: remove unused chunk_alloc parameter from pcpu_get_pages() 2017-03-06 15:56:55 -05:00
percpu.c Merge branch 'sched/core' into locking/core 2017-04-04 11:31:12 +02:00
pgtable-generic.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
process_vm_access.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
quicklist.c
readahead.c mm: don't cap request size based on read-ahead setting 2016-12-12 18:55:08 -08:00
rmap.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-10 10:30:46 -07:00
rodata_test.c mm: remove rodata_test_data export, add pr_fmt 2017-05-03 15:52:09 -07:00
shmem.c Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
slab_common.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
slab.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-10 10:30:46 -07:00
slab.h mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
slob.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
slub.c mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU 2017-04-18 11:42:36 -07:00
sparse-vmemmap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
sparse.c mm/sparse: refine usemap_size() a little 2017-05-03 15:52:09 -07:00
swap_cgroup.c mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() 2017-04-08 00:47:49 -07:00
swap_slots.c mm, swap: use kvzalloc to allocate some swap data structures 2017-05-08 17:15:13 -07:00
swap_state.c mm, swap: use kvzalloc to allocate some swap data structures 2017-05-08 17:15:13 -07:00
swap.c mm: move MADV_FREE pages into LRU_INACTIVE_FILE list 2017-05-03 15:52:08 -07:00
swapfile.c mm, swap: use kvzalloc to allocate some swap data structures 2017-05-08 17:15:13 -07:00
truncate.c mm: fix data corruption due to stale mmap reads 2017-05-12 15:57:15 -07:00
usercopy.c mm/usercopy: Drop extra is_vmalloc_or_module() check 2017-04-05 12:30:18 -07:00
userfaultfd.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
util.c mm, vmalloc: fix vmalloc users tracking properly 2017-05-12 15:57:15 -07:00
vmacache.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
vmalloc.c mm, vmalloc: fix vmalloc users tracking properly 2017-05-12 15:57:15 -07:00
vmpressure.c mm: vmpressure: fix sending wrong events on underflow 2017-02-24 17:46:56 -08:00
vmscan.c mm: vmscan: scan until it finds eligible pages 2017-05-12 15:57:16 -07:00
vmstat.c mm, vmstat: Remove spurious WARN() during zoneinfo print 2017-05-12 15:57:15 -07:00
workingset.c mm: memcontrol: use node page state naming scheme for memcg 2017-05-03 15:52:11 -07:00
z3fold.c z3fold: fix page locking in z3fold_alloc() 2017-04-13 18:24:20 -07:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: expand class bit 2017-04-13 18:24:21 -07:00
zswap.c zswap: don't param_set_charp while holding spinlock 2017-02-27 18:43:45 -08:00