linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-26 13:40:54 +07:00

Vlastimil Babka 0ec3b74c7f mm: putback_lru_page: remove unnecessary call to page_lru_base_type()

The goal of this patch series is to improve performance of munlock() of large mlocked memory areas on systems without THP. This is motivated by reported very long times of crash recovery of processes with such areas, where munlock() can take several seconds. See http://lwn.net/Articles/548108/

The work was driven by a simple benchmark (to be included in mmtests) that mmaps() e.g. 56GB with MAP_LOCKED | MAP_POPULATE and measures the time of munlock(). Profiling was performed by attaching operf --pid to the process and sending a signal to trigger the munlock() part and then notify back the monitoring wrapper to stop operf, so that only munlock() appears in the profile.

The profiles have shown that CPU time is spent mostly by atomic operations and repeated locking per single pages. This series aims to reduce both, starting from simpler to more complex changes.

Patch 1 performs a simple cleanup in putback_lru_page() so that page lru base type is not determined without being actually needed.

Patch 2 removes an unnecessary call to lru_add_drain() which drains the per-cpu pagevec after each munlocked page is put there.

Patch 3 changes munlock_vma_range() to use an on-stack pagevec for isolating multiple non-THP pages under a single lru_lock instead of locking and processing each page separately.

Patch 4 changes the NR_MLOCK accounting to be called only once per the pvec introduced by previous patch.

Patch 5 uses the introduced pagevec to batch also the work of putback_lru_page when possible, bypassing the per-cpu pvec and associated overhead.

Patch 6 removes a redundant get_page/put_page pair which saves costly atomic operations.

Patch 7 avoids calling follow_page_mask() on each individual page, and obtains multiple page references under a single page table lock where possible.

Measurements were made using 3.11-rc3 as a baseline. The first set of measurements shows the possibly ideal conditions where batching should help the most. All memory is allocated from a single NUMA node and THP is disabled.

timedmunlock
                       3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3
                              0                1                2                3                4                5                6                7
Elapsed min     3.38 (   0.00%)  3.39 (  -0.13%)  3.00 (  11.33%)  2.70 (  20.20%)  2.67 (  21.11%)  2.37 (  29.88%)  2.20 (  34.91%)  1.91 (  43.59%)
Elapsed mean    3.39 (   0.00%)  3.40 (  -0.23%)  3.01 (  11.33%)  2.70 (  20.26%)  2.67 (  21.21%)  2.38 (  29.88%)  2.21 (  34.93%)  1.92 (  43.46%)
Elapsed stddev  0.01 (   0.00%)  0.01 ( -43.09%)  0.01 (  15.42%)  0.01 (  23.42%)  0.00 (  89.78%)  0.01 (  -7.15%)  0.00 (  76.69%)  0.02 ( -91.77%)
Elapsed max     3.41 (   0.00%)  3.43 (  -0.52%)  3.03 (  11.29%)  2.72 (  20.16%)  2.67 (  21.63%)  2.40 (  29.50%)  2.21 (  35.21%)  1.96 (  42.39%)
Elapsed range   0.03 (   0.00%)  0.04 ( -51.16%)  0.02 (   6.27%)  0.02 (  14.67%)  0.00 (  88.90%)  0.03 ( -19.18%)  0.01 (  73.70%)  0.06 (-113.35%)

The second set of measurements simulates the worst possible conditions for batching by using numactl --interleave, so that there is in fact only one page per pagevec. Even in this case the series seems to improve performance thanks to reduced atomic operations and removal of lru_add_drain().

timedmunlock
                       3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3
                              0                1                2                3                4                5                6                7
Elapsed min     4.00 (   0.00%)  4.04 (  -0.93%)  3.87 (   3.37%)  3.72 (   6.94%)  3.81 (   4.72%)  3.69 (   7.82%)  3.64 (   8.92%)  3.41 (  14.81%)
Elapsed mean    4.17 (   0.00%)  4.15 (   0.51%)  4.03 (   3.49%)  3.89 (   6.84%)  3.86 (   7.48%)  3.89 (   6.69%)  3.70 (  11.27%)  3.48 (  16.59%)
Elapsed stddev  0.16 (   0.00%)  0.08 (  50.76%)  0.10 (  41.58%)  0.16 (   4.59%)  0.05 (  72.38%)  0.19 ( -12.91%)  0.05 (  68.09%)  0.06 (  66.03%)
Elapsed max     4.34 (   0.00%)  4.32 (   0.56%)  4.19 (   3.62%)  4.12 (   5.15%)  3.91 (   9.88%)  4.12 (   5.25%)  3.80 (  12.58%)  3.56 (  18.08%)
Elapsed range   0.34 (   0.00%)  0.28 (  17.91%)  0.32 (   6.45%)  0.40 ( -15.73%)  0.10 (  70.06%)  0.43 ( -24.84%)  0.15 (  55.32%)  0.15 (  56.16%)

For completeness, a third set of measurements shows the situation where THP is enabled and allocations are again done on a single NUMA node. Here munlock() is already very fast thanks to huge pages, and this series does not compromise that performance. It seems that the removal of call to lru_add_drain() still helps a bit.

timedmunlock
                       3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3         3.11-rc3
                              0                1                2                3                4                5                6                7
Elapsed min     0.01 (   0.00%)  0.01 (  -0.11%)  0.01 (   6.59%)  0.01 (   5.41%)  0.01 (   5.45%)  0.01 (   5.03%)  0.01 (   6.08%)  0.01 (   5.20%)
Elapsed mean    0.01 (   0.00%)  0.01 (  -0.27%)  0.01 (   6.39%)  0.01 (   5.30%)  0.01 (   5.32%)  0.01 (   5.03%)  0.01 (   5.97%)  0.01 (   5.22%)
Elapsed stddev  0.00 (   0.00%)  0.00 (  -9.59%)  0.00 (  10.77%)  0.00 (   3.24%)  0.00 (  24.42%)  0.00 (  31.86%)  0.00 (  -7.46%)  0.00 (   6.11%)
Elapsed max     0.01 (   0.00%)  0.01 (  -0.01%)  0.01 (   6.83%)  0.01 (   5.42%)  0.01 (   5.79%)  0.01 (   5.53%)  0.01 (   6.08%)  0.01 (   5.26%)
Elapsed range   0.00 (   0.00%)  0.00 (   7.30%)  0.00 (  24.38%)  0.00 (   6.10%)  0.00 (  30.79%)  0.00 (  42.52%)  0.00 (   6.11%)  0.00 (  10.07%)

This patch (of 7):

In putback_lru_page() since commit c53954a092 ("mm: remove lru parameter from __lru_cache_add and lru_cache_add_lru") it is no longer needed to determine lru list via page_lru_base_type().

This patch replaces it with simple flag is_unevictable which says that the page was put on the unevictable list. This is the only information that matters in subsequent tests.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2013-09-11 15:57:57 -07:00
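
The benchmark described in the commit message above (mmap() with MAP_LOCKED | MAP_POPULATE, then timing munlock()) might look roughly like the following C sketch. The function name time_munlock and the use of CLOCK_MONOTONIC are assumptions made for illustration; the actual mmtests benchmark is not part of this listing, and the signal-driven operf profiling step is left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_LOCKED, MAP_POPULATE, MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patch series): map `bytes` of
 * anonymous memory that is locked and pre-faulted, then time only the
 * munlock() call.
 *
 * Returns the elapsed time in seconds, or -1.0 if the mapping or the
 * munlock() fails (for example when RLIMIT_MEMLOCK is too low for the
 * requested size, or when `bytes` is zero).
 */
double time_munlock(size_t bytes)
{
    struct timespec start, end;
    void *area;

    area = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED | MAP_POPULATE,
                -1, 0);
    if (area == MAP_FAILED)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (munlock(area, bytes) != 0) {
        munmap(area, bytes);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    munmap(area, bytes);

    /* Convert the timespec difference to seconds. */
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_munlock(56UL << 30) would approximate the 56GB case from the commit message on a 64-bit system, provided RLIMIT_MEMLOCK and physical memory allow it. Much smaller sizes are more practical for a quick check, though they will not reproduce the multi-second timings reported in the tables.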
backing-dev.c	backing-dev: convert class code to use dev_groups	2013-08-19 21:22:34 -07:00
balloon_compaction.c	mm: introduce a common interface for balloon pages mobility	2012-12-11 17:22:26 -08:00
bootmem.c	mm: kill free_all_bootmem_node()	2013-07-03 16:07:39 -07:00
bounce.c	Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block	2013-05-08 10:13:35 -07:00
cleancache.c	mm: cleancache: clean up cleancache_enabled	2013-04-30 17:04:01 -07:00
compaction.c	mm: compaction: do not compact pgdat for order-0	2013-09-11 15:57:55 -07:00
debug-pagealloc.c	mm, x86: Remove debug_pagealloc_enabled	2011-12-06 09:24:07 +01:00
dmapool.c	dmapool: make DMAPOOL_DEBUG detect corruption of free marker	2012-12-11 17:22:24 -08:00
fadvise.c	teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long	2013-03-03 22:46:22 -05:00
failslab.c	switch debugfs to umode_t	2012-01-03 22:54:56 -05:00
filemap_xip.c	lift sb_start_write() out of ->write()	2013-04-09 14:12:56 -04:00
filemap.c	direct-io: Handle O_(D)SYNC AIO	2013-09-04 09:23:46 -04:00
fremap.c	mm: save soft-dirty bits on file pages	2013-08-13 17:57:48 -07:00
frontswap.c	frontswap: fix incorrect zeroing and allocation size for frontswap_map	2013-06-12 16:29:46 -07:00
highmem.c	Some nice cleanups, and even a patch my wife did as a "live" demo for	2012-12-20 08:37:05 -08:00
huge_memory.c	mm/huge_memory.c: fix potential NULL pointer dereference	2013-09-11 15:57:19 -07:00
hugetlb_cgroup.c	cgroup: pass around cgroup_subsys_state instead of cgroup in file methods	2013-08-08 20:11:24 -04:00
hugetlb.c	mm: prepare to remove /proc/sys/vm/hugepages_treat_as_movable	2013-09-11 15:57:49 -07:00
hwpoison-inject.c	memcg: rename config variables	2012-07-31 18:42:43 -07:00
init-mm.c	atomic: use <linux/atomic.h>	2011-07-26 16:49:47 -07:00
internal.h	mm: remove unused __put_page()	2013-07-09 10:33:22 -07:00
interval_tree.c	mm: add CONFIG_DEBUG_VM_RB build option	2012-10-09 16:22:42 +09:00
Kconfig	Merge remote-tracking branch 'origin/next' into kvm-ppc-next	2013-08-29 00:41:59 +02:00
Kconfig.debug	mm: more intensive memory corruption debugging	2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c	kmemleak: remove memset by using kzalloc	2011-01-27 18:31:51 +00:00
kmemleak.c	mm: replace strict_strtoul() with kstrtoul()	2013-09-11 15:57:11 -07:00
ksm.c	mm: replace strict_strtoul() with kstrtoul()	2013-09-11 15:57:11 -07:00
maccess.c	mm: Map most files to use export.h instead of module.h	2011-10-31 09:20:12 -04:00
madvise.c	mm/madvise.c: fix coding-style errors	2013-09-11 15:57:00 -07:00
Makefile	zswap: add to mm/	2013-07-10 18:11:34 -07:00
memblock.c	memblock, numa: binary search node id	2013-09-11 15:57:51 -07:00
memcontrol.c	kmemcg: don't allocate extra memory for root memcg_cache_params	2013-09-11 15:57:53 -07:00
memory_hotplug.c	mm: memory-hotplug: enable memory hotplug to handle hugepage	2013-09-11 15:57:48 -07:00
memory-failure.c	mm: soft-offline: use migrate_pages() instead of migrate_huge_page()	2013-09-11 15:57:47 -07:00
memory.c	mm: migrate: add hugepage migration code to move_pages()	2013-09-11 15:57:48 -07:00
mempolicy.c	mbind: add BUG_ON(!vma) in new_vma_page()	2013-09-11 15:57:50 -07:00
mempool.c	mempool: add @gfp_mask to mempool_create_node()	2012-06-25 11:53:47 +02:00
migrate.c	mm: migrate: check movability of hugepage in unmap_and_move_huge_page()	2013-09-11 15:57:49 -07:00
mincore.c	swap: make each swap partition have one address_space	2013-02-23 17:50:17 -08:00
mlock.c	Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs"	2013-03-28 17:45:51 -07:00
mm_init.c	mm: tune vm_committed_as percpu_counter batching size	2013-07-03 16:07:32 -07:00
mmap.c	mm: track vma changes with VM_SOFTDIRTY bit	2013-09-11 15:57:56 -07:00
mmu_context.c	mm: remove old aio use_mm() comment	2013-05-07 18:38:27 -07:00
mmu_notifier.c	treewide: relase -> release	2013-06-28 14:34:33 +02:00
mmzone.c	mm: rename page struct field helpers	2013-02-23 17:50:18 -08:00
mprotect.c	mm/mprotect.c: coding-style cleanups	2012-12-18 15:02:15 -08:00
mremap.c	mm: move_ptes -- Set soft dirty bit depending on pte type	2013-08-27 09:36:17 -07:00
msync.c
nobootmem.c	mm: concentrate modification of totalram_pages into the mm core	2013-07-03 16:07:33 -07:00
nommu.c	mm: remove free_area_cache	2013-07-10 18:11:34 -07:00
oom_kill.c	mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR().	2013-07-15 11:25:05 +09:30
page_alloc.c	mm: page_alloc: fix comment get_page_from_freelist	2013-09-11 15:57:56 -07:00
page_cgroup.c	memcontrol: use N_MEMORY instead N_HIGH_MEMORY	2012-12-12 17:38:32 -08:00
page_io.c	mm: remove compressed copy from zram in-memory	2013-07-03 16:07:26 -07:00
page_isolation.c	mm: memory-hotplug: enable memory hotplug to handle hugepage	2013-09-11 15:57:48 -07:00
page-writeback.c	mm: revert "page-writeback.c: subtract min_free_kbytes from dirtyable memory"	2013-09-11 15:57:23 -07:00
pagewalk.c	mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas	2013-05-24 16:22:53 -07:00
percpu-km.c
percpu-vm.c	mm: fix kernel-doc warnings	2012-06-20 14:39:36 -07:00
percpu.c	mm, percpu: Make sure percpu_alloc early parameter has an argument	2012-12-02 06:23:04 -08:00
pgtable-generic.c	mm: move pgtable related functions to right place	2013-09-11 15:57:30 -07:00
process_vm_access.c	Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys	2013-03-12 11:05:45 -07:00
quicklist.c	mm: delete various needless include <linux/module.h>	2011-10-31 09:20:11 -04:00
readahead.c	readahead: make context readahead more conservative	2013-09-11 15:57:39 -07:00
rmap.c	s390/mm: implement software referenced bits	2013-08-29 13:20:11 +02:00
shmem.c	shm_mnt is as longterm as it gets, TYVM...	2013-09-03 22:50:27 -04:00
slab_common.c	Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux	2013-07-14 15:14:29 -07:00
slab.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
slab.h	memcg: check that kmem_cache has memcg_params before accessing it	2013-08-28 19:26:38 -07:00
slob.c	Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux	2013-07-14 15:14:29 -07:00
slub.c	mm: replace strict_strtoul() with kstrtoul()	2013-09-11 15:57:11 -07:00
sparse-vmemmap.c	sparse-vmemmap: specify vmemmap population range in bytes	2013-04-29 15:54:35 -07:00
sparse.c	mm/sparse.c: put clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE	2013-07-09 10:33:22 -07:00
swap_state.c	swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion	2013-06-12 16:29:45 -07:00
swap.c	mm: fix aio performance regression for database caused by THP	2013-09-11 15:57:55 -07:00
swapfile.c	swap: make cluster allocation per-cpu	2013-09-11 15:57:17 -07:00
truncate.c	mm: teach truncate_inode_pages_range() to handle non page aligned ranges	2013-05-27 23:32:35 -04:00
util.c	swap: clean-up #ifdef in page_mapping()	2013-09-11 15:57:31 -07:00
vmalloc.c	mm, vmalloc: use well-defined find_last_bit() func	2013-09-11 15:57:34 -07:00
vmpressure.c	Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2013-09-03 18:25:03 -07:00
vmscan.c	mm: putback_lru_page: remove unnecessary call to page_lru_base_type()	2013-09-11 15:57:57 -07:00
vmstat.c	vmstat: use this_cpu() to avoid irqon/off sequence in refresh_cpu_vm_stats	2013-09-11 15:57:31 -07:00
zbud.c	mm/zbud: fix some trivial typos in comments	2013-09-11 15:57:35 -07:00
zswap.c	mm/zswap.c: get swapper address_space by using macro	2013-09-11 15:57:08 -07:00