linux_dsm_epyc7002/mm
Mel Gorman 074c238177 mm: numa: slow PTE scan rate if migration failures occur
Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226

  Across the board the 4.0-rc1 numbers are much slower, and the degradation
  is far worse when using the large memory footprint configs. Perf points
  straight at the cause - this is from 4.0-rc1 on the "-o bhash=101073" config:

   -   56.07%    56.07%  [kernel]            [k] default_send_IPI_mask_sequence_phys
      - default_send_IPI_mask_sequence_phys
         - 99.99% physflat_send_IPI_mask
            - 99.37% native_send_call_func_ipi
                 smp_call_function_many
               - native_flush_tlb_others
                  - 99.85% flush_tlb_page
                       ptep_clear_flush
                       try_to_unmap_one
                       rmap_walk
                       try_to_unmap
                       migrate_pages
                       migrate_misplaced_page
                     - handle_mm_fault
                        - 99.73% __do_page_fault
                             trace_do_page_fault
                             do_async_page_fault
                           + async_page_fault
              0.63% native_send_call_func_single_ipi
                 generic_exec_single
                 smp_call_function_single

This is showing excessive migration activity even though excessive
migrations are meant to get throttled.  Normally, the scan rate is tuned
on a per-task basis depending on the locality of faults.  However, if
migrations fail for any reason then the PTE scanner may scan faster if
the faults continue to be remote.  This means there is higher system CPU
overhead and fault trapping at exactly the time we know that migrations
cannot happen.  This patch tracks when migration failures occur and
slows the PTE scanner.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25 16:20:31 -07:00
..
kasan kasan, module, vmalloc: rework shadow allocation for modules 2015-03-12 18:46:08 -07:00
backing-dev.c Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
balloon_compaction.c mm/balloon_compaction: fix deflation when compaction is disabled 2014-10-29 16:33:15 -07:00
bootmem.c mem-hotplug: reset node managed pages when hot-adding a new pgdat 2014-11-13 16:17:06 -08:00
cleancache.c mm: fix cleancache debugfs directory path 2015-01-20 14:08:31 +01:00
cma.c mm: cma: fix CMA aligned offset calculation 2015-03-12 18:46:07 -07:00
compaction.c mm: page_alloc: add kasan hooks on alloc and free paths 2015-02-13 21:21:41 -08:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: account pmd page tables to the process 2015-02-11 17:06:04 -08:00
dmapool.c mm/dmapool.c: fixed a brace coding style issue 2014-10-09 22:26:00 -04:00
early_ioremap.c mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
fadvise.c vfs: remove get_xip_mem 2015-02-16 17:56:03 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c dax,ext2: replace XIP read and write with DAX I/O 2015-02-16 17:56:03 -08:00
frontswap.c mm/frontswap.c: fix the condition in BUG_ON 2014-12-10 17:41:08 -08:00
gup.c Tighten rules for ACCESS_ONCE 2015-02-14 10:54:28 -08:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c mm: numa: slow PTE scan rate if migration failures occur 2015-03-25 16:20:31 -07:00
hugetlb_cgroup.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
hugetlb.c mm, hugetlb: close race when setting PageTail for gigantic pages 2015-03-12 18:46:07 -07:00
hwpoison-inject.c mm/hwpoison-inject.c: remove unnecessary null test before debugfs_remove_recursive 2014-08-06 18:01:19 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm/internal.h: don't split printk call in two 2015-02-12 18:54:10 -08:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
Kconfig Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2015-02-19 10:36:45 -08:00
Kconfig.debug mm/debug_pagealloc: remove obsolete Kconfig options 2015-01-08 15:10:52 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c kmemleak: disable kasan instrumentation for kmemleak 2015-02-13 21:21:41 -08:00
ksm.c mm: remove rest usage of VM_NONLINEAR and pte_file() 2015-02-10 14:30:31 -08:00
list_lru.c memcg: reparent list_lrus and free kmemcg_id on css offline 2015-02-12 18:54:10 -08:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c vfs: remove get_xip_mem 2015-02-16 17:56:03 -08:00
Makefile move iov_iter.c from mm/ to lib/ 2015-02-17 22:22:17 -05:00
memblock.c mm/memblock.c: refactor functions to set/clear MEMBLOCK_HOTPLUG 2014-12-13 12:42:46 -08:00
memcontrol.c memcg: disable hierarchy support if bound to the legacy cgroup hierarchy 2015-03-12 18:46:08 -07:00
memory_hotplug.c mm/memory hotplug: postpone the reset of obsolete pgdat 2015-03-25 16:20:30 -07:00
memory-failure.c mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page() 2015-02-12 18:54:11 -08:00
memory.c mm: numa: slow PTE scan rate if migration failures occur 2015-03-25 16:20:31 -07:00
mempolicy.c mm: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:38 -08:00
mempool.c mm/mempool.c: update the kmemleak stack trace for mempool allocations 2014-06-06 16:08:17 -07:00
migrate.c mm: convert p[te|md]_mknonnuma and remaining page table manipulations 2015-02-12 18:54:08 -08:00
mincore.c mincore: apply page table walker on do_mincore() 2015-02-11 17:06:06 -08:00
mlock.c mm: reorder can_do_mlock to fix audit denial 2015-03-12 18:46:08 -07:00
mm_init.c mm/mm_init.c: mark mminit_loglevel __meminitdata 2015-02-12 18:54:11 -08:00
mmap.c mm: fix anon_vma->degree underflow in anon_vma endless growing prevention 2015-03-25 16:20:30 -07:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c mmu_notifier: add the callback for mmu_notifier_invalidate_range() 2014-11-13 13:46:09 +11:00
mmzone.c mm: microoptimize zonelist operations 2015-02-11 17:06:02 -08:00
mprotect.c mm: numa: preserve PTE write permissions across a NUMA hinting fault 2015-03-25 16:20:31 -07:00
mremap.c mm: remove rest usage of VM_NONLINEAR and pte_file() 2015-02-10 14:30:31 -08:00
msync.c mm: remove rest usage of VM_NONLINEAR and pte_file() 2015-02-10 14:30:31 -08:00
nobootmem.c mem-hotplug: reset node managed pages when hot-adding a new pgdat 2014-11-13 16:17:06 -08:00
nommu.c mm/nommu.c: export symbol max_mapnr 2015-03-12 18:46:08 -07:00
oom_kill.c mm: account pmd page tables to the process 2015-02-11 17:06:04 -08:00
page_alloc.c mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled 2015-03-12 18:46:07 -07:00
page_counter.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
page_ext.c mm/page_owner: keep track of page owners 2014-12-13 12:42:48 -08:00
page_io.c new helper: iov_iter_bvec() 2015-01-29 00:13:11 -05:00
page_isolation.c mm/page_alloc.c: call kernel_map_pages in unset_migrateype_isolate 2015-03-25 16:20:30 -07:00
page_owner.c mm/page_owner.c: remove unnecessary stack_trace field 2015-02-11 17:06:07 -08:00
page-writeback.c Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
pagewalk.c mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers 2015-03-25 16:20:30 -07:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:37 -08:00
pgtable-generic.c mm: convert p[te|md]_mknonnuma and remaining page table manipulations 2015-02-12 18:54:08 -08:00
process_vm_access.c mm: gup: use get_user_pages_unlocked 2015-02-11 17:06:05 -08:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info 2015-01-20 14:03:04 -07:00
rmap.c mm: fix anon_vma->degree underflow in anon_vma endless growing prevention 2015-03-25 16:20:30 -07:00
shmem.c mm: shmem: check for mapping owner before dereferencing 2015-02-23 10:00:11 -08:00
slab_common.c mm: slub: add kernel address sanitizer support for slub allocator 2015-02-13 21:21:41 -08:00
slab.c slub: make dead caches discard free slabs immediately 2015-02-12 18:54:10 -08:00
slab.h slub: make dead caches discard free slabs immediately 2015-02-12 18:54:10 -08:00
slob.c slub: make dead caches discard free slabs immediately 2015-02-12 18:54:10 -08:00
slub.c mm/slub: fix lockups on PREEMPT && !SMP kernels 2015-03-25 16:20:30 -07:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:35:54 -07:00
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
swap.c Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
swapfile.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
truncate.c fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info 2015-01-20 14:03:04 -07:00
util.c mm/util: add kstrdup_const 2015-02-13 21:21:35 -08:00
vmacache.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
vmalloc.c kasan, module, vmalloc: rework shadow allocation for modules 2015-03-12 18:46:08 -07:00
vmpressure.c mm/vmpressure.c: fix race in vmpressure_work_fn() 2014-12-02 17:32:07 -08:00
vmscan.c Merge branch 'akpm' (patches from Andrew) 2015-02-12 18:54:28 -08:00
vmstat.c vmstat: Reduce time interval to stat update on idle cpu 2015-02-11 17:06:07 -08:00
workingset.c list_lru: add helpers to isolate items 2015-02-12 18:54:10 -08:00
zbud.c mm/zpool: add name argument to create zpool 2015-02-12 18:54:12 -08:00
zpool.c mm/zpool: add name argument to create zpool 2015-02-12 18:54:12 -08:00
zsmalloc.c mm/zsmalloc: add statistics support 2015-02-12 18:54:12 -08:00
zswap.c mm/zpool: add name argument to create zpool 2015-02-12 18:54:12 -08:00