linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-22 10:44:43 +07:00

History

Mel Gorman 1378447598 sched/numa: Stagger NUMA balancing scan periods for new threads Threads share an address space and each can change the protections of the same address space to trap NUMA faults. This is redundant and potentially counter-productive as any thread doing the update will suffice. Potentially only one thread is required but that thread may be idle or it may not have any locality concerns and pick an unsuitable scan rate. This patch uses independent scan period but they are staggered based on the number of address space users when the thread is created. The intent is that threads will avoid scanning at the same time and have a chance to adapt their scan rate later if necessary. This reduces the total scan activity early in the lifetime of the threads. The different in headline performance across a range of machines and workloads is marginal but the system CPU usage is reduced as well as overall scan activity. The following is the time reported by NAS Parallel Benchmark using unbound openmp threads and a D size class: 4.17.0-rc1 4.17.0-rc1 vanilla stagger-v1r1 Time bt.D 442.77 ( 0.00%) 419.70 ( 5.21%) Time cg.D 171.90 ( 0.00%) 180.85 ( -5.21%) Time ep.D 33.10 ( 0.00%) 32.90 ( 0.60%) Time is.D 9.59 ( 0.00%) 9.42 ( 1.77%) Time lu.D 306.75 ( 0.00%) 304.65 ( 0.68%) Time mg.D 54.56 ( 0.00%) 52.38 ( 4.00%) Time sp.D 1020.03 ( 0.00%) 903.77 ( 11.40%) Time ua.D 400.58 ( 0.00%) 386.49 ( 3.52%) Note it's not a universal win but we have no prior knowledge of which thread matters but the number of threads created often exceeds the size of the node when the threads are not bound. However, there is a reducation of overall system CPU usage: 4.17.0-rc1 4.17.0-rc1 vanilla stagger-v1r1 sys-time-bt.D 48.78 ( 0.00%) 48.22 ( 1.15%) sys-time-cg.D 25.31 ( 0.00%) 26.63 ( -5.22%) sys-time-ep.D 1.65 ( 0.00%) 0.62 ( 62.42%) sys-time-is.D 40.05 ( 0.00%) 24.45 ( 38.95%) sys-time-lu.D 37.55 ( 0.00%) 29.02 ( 22.72%) sys-time-mg.D 47.52 ( 0.00%) 34.92 ( 26.52%) sys-time-sp.D 119.01 ( 0.00%) 109.05 ( 8.37%) sys-time-ua.D 51.52 ( 0.00%) 45.13 ( 12.40%) NUMA scan activity is also reduced: NUMA alloc local 1042828 1342670 NUMA base PTE updates 140481138 93577468 NUMA huge PMD updates 272171 180766 NUMA page range updates 279832690 186129660 NUMA hint faults 1395972 1193897 NUMA hint local faults 877925 855053 NUMA hint local percent 62 71 NUMA pages migrated 12057909 9158023 Similar observations are made for other thread-intensive workloads. System CPU usage is lower even though the headline gains in performance tend to be small. For example, specjbb 2005 shows almost no difference in performance but scan activity is reduced by a third on a 4-socket box. I didn't find a workload (thread intensive or otherwise) that suffered badly. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20180504154109.mvrha2qo5wdl65vr@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>		2018-05-14 09:12:24 +02:00
..
autogroup.c	sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]	2018-05-05 08:34:42 +02:00
autogroup.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
clock.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
completion.c	sched/completions: Use bool in try_wait_for_completion()	2018-03-09 08:00:18 +01:00
core.c	sched/numa: Stagger NUMA balancing scan periods for new threads	2018-05-14 09:12:24 +02:00
cpuacct.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpudeadline.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpudeadline.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpufreq_schedutil.c	cpufreq: schedutil: Avoid using invalid next_freq	2018-05-09 12:21:17 +02:00
cpufreq.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpupri.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpupri.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cputime.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
deadline.c	sched/core: Simplify helpers for rq clock update skip requests	2018-04-05 09:20:46 +02:00
debug.c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-04-02 11:49:41 -07:00
fair.c	sched/numa: Stagger NUMA balancing scan periods for new threads	2018-05-14 09:12:24 +02:00
features.h	sched/fair: Update util_est only on util_avg updates	2018-03-20 08:11:09 +01:00
idle.c	sched: idle: Select idle state before stopping the tick	2018-04-09 11:54:07 +02:00
isolation.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
loadavg.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
Makefile	sched/idle: Merge kernel/sched/idle.c and kernel/sched/idle_task.c	2018-03-04 12:39:33 +01:00
membarrier.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
rt.c	sched/core: Simplify helpers for rq clock update skip requests	2018-04-05 09:20:46 +02:00
sched-pelt.h
sched.h	sched/numa: Stagger NUMA balancing scan periods for new threads	2018-05-14 09:12:24 +02:00
stats.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
stats.h	sched: Clean up and harmonize the coding style of the scheduler code base	2018-03-03 15:50:21 +01:00
stop_task.c	sched: Clean up and harmonize the coding style of the scheduler code base	2018-03-03 15:50:21 +01:00
swait.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
topology.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
wait_bit.c	sched/wait: Improve __var_waitqueue() code generation	2018-03-20 08:23:25 +01:00
wait.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00