linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-25 10:30:54 +07:00


Mel Gorman 52262ee567 sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression

The following XFS commit:

  8ab39f11d9 ("xfs: prevent CIL push holdoff in log recovery")

changed the logic from using bound workqueues to using unbound workqueues. Functionally this makes sense, but it was observed at the time that dbench performance dropped quite a lot and CPU migrations increased.

The current pattern of task migration is straightforward. With XFS, an IO issuer delegates work to xlog_cil_push_work() on an unbound kworker. This runs on a nearby CPU. On completion, dbench wakes up on its old CPU because that CPU is still idle, so no migration occurs. dbench then queues the real IO on the blk_mq_requeue_work() work item. That item runs on a bound kworker, which is forced to run on the same CPU as dbench. When the IO completes, the bound kworker wakes dbench. Because the kworker is a bound, but real, task, the CPU is not considered idle, and select_idle_sibling() migrates dbench to a new CPU. dbench may ping-pong between two CPUs for a while, but ultimately it starts a round-robin of all CPUs sharing the same LLC. High-frequency migration on each IO completion has poor performance overall, with negative implications for both communication costs and power management. mpstat confirmed that, at low thread counts, all CPUs sharing an LLC have a low level of activity.

Note that even if the CIL patch were reverted, there would still be migrations, but the impact would be less noticeable. It turns out that the scheduler, XFS, blk-mq and workqueues each made sensible decisions individually, but in combination the overall effect was sub-optimal.

This patch special-cases the IO issue/completion pattern. It allows a bound kworker waker and a task wakee to stack on the same CPU if there is a strong chance they are directly related. The expectation is that the kworker is likely to go back to sleep shortly. This is not guaranteed, as the IO could be queued asynchronously. However, there is a very strong relationship between the task and the kworker in this case, which justifies stacking on the same CPU instead of migrating. There should be few concerns about kworker starvation, because the special case applies only when the kworker is the waker.

DBench on XFS
MMTests config: io-dbench4-async modified to run on a fresh XFS filesystem

UMA machine with 8 cores sharing LLC
                          5.5.0-rc7              5.5.0-rc7
                  tipsched-20200124           kworkerstack
Amean     1        22.63 (   0.00%)       20.54 *   9.23%*
Amean     2        25.56 (   0.00%)       23.40 *   8.44%*
Amean     4        28.63 (   0.00%)       27.85 *   2.70%*
Amean     8        37.66 (   0.00%)       37.68 (  -0.05%)
Amean     64      469.47 (   0.00%)      468.26 (   0.26%)
Stddev    1         1.00 (   0.00%)        0.72 (  28.12%)
Stddev    2         1.62 (   0.00%)        1.97 ( -21.54%)
Stddev    4         2.53 (   0.00%)        3.58 ( -41.19%)
Stddev    8         5.30 (   0.00%)        5.20 (   1.92%)
Stddev    64       86.36 (   0.00%)       94.53 (  -9.46%)

NUMA machine, 48 CPUs total, 24 CPUs share cache
                          5.5.0-rc7              5.5.0-rc7
                  tipsched-20200124      kworkerstack-v1r2
Amean     1        58.69 (   0.00%)       30.21 *  48.53%*
Amean     2        60.90 (   0.00%)       35.29 *  42.05%*
Amean     4        66.77 (   0.00%)       46.55 *  30.28%*
Amean     8        81.41 (   0.00%)       68.46 *  15.91%*
Amean     16      113.29 (   0.00%)      107.79 *   4.85%*
Amean     32      199.10 (   0.00%)      198.22 *   0.44%*
Amean     64      478.99 (   0.00%)      477.06 *   0.40%*
Amean     128    1345.26 (   0.00%)     1372.64 *  -2.04%*
Stddev    1         2.64 (   0.00%)        4.17 ( -58.08%)
Stddev    2         4.35 (   0.00%)        5.38 ( -23.73%)
Stddev    4         6.77 (   0.00%)        6.56 (   3.00%)
Stddev    8        11.61 (   0.00%)       10.91 (   6.04%)
Stddev    16       18.63 (   0.00%)       19.19 (  -3.01%)
Stddev    32       38.71 (   0.00%)       38.30 (   1.06%)
Stddev    64      100.28 (   0.00%)       91.24 (   9.02%)
Stddev    128     186.87 (   0.00%)      160.34 (  14.20%)

Dbench has been modified to report the time to complete a single "load file". This is a more meaningful metric for dbench than a throughput metric, because the benchmark makes many different system calls that are not throughput-related.

For a single client, the patch shows a 9.23% (UMA) and a 48.53% (NUMA) reduction in the time to process a load file. The difference is partially explained by the number of CPUs sharing an LLC. In a separate run, the patch almost eliminated task migrations at low client counts. In case people take issue with the metric used for the benchmark, this is a comparison of the throughputs as reported by dbench on the NUMA machine.

dbench4 Throughput (misleading but traditional)
                          5.5.0-rc7              5.5.0-rc7
                  tipsched-20200124      kworkerstack-v1r2
Hmean     1       321.41 (   0.00%)      617.82 *  92.22%*
Hmean     2       622.87 (   0.00%)     1066.80 *  71.27%*
Hmean     4      1134.56 (   0.00%)     1623.74 *  43.12%*
Hmean     8      1869.96 (   0.00%)     2212.67 *  18.33%*
Hmean     16     2673.11 (   0.00%)     2806.13 *   4.98%*
Hmean     32     3032.74 (   0.00%)     3039.54 (   0.22%)
Hmean     64     2514.25 (   0.00%)     2498.96 *  -0.61%*
Hmean     128    1778.49 (   0.00%)     1746.05 *  -1.82%*

Note that this is somewhat specific to XFS. ext4 shows no performance difference, as it does not rely on kworkers in the same way. No major problem was observed running other workloads on different machines, although not all tests have completed yet.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200128154006.GD3466@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2020-02-10 11:24:37 +01:00
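The wake-up special case described in the commit message above can be sketched as a small user-space model. This is not the code in kernel/sched/fair.c. The struct, its fields and pick_wake_cpu() are hypothetical. Reading "a strong chance they are directly related" as three conditions is also an assumption:

- the waker is a per-CPU kthread,
- it is running on the wakee's previous CPU, and
- at most one task is runnable there.

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of what is visible when a wake-up happens.
 * Field names are illustrative only; they are not kernel identifiers.
 */
struct wake_ctx {
	bool waker_is_per_cpu_kthread;     /* e.g. a bound kworker */
	int waker_cpu;                     /* CPU the waker is running on */
	int wakee_prev_cpu;                /* CPU the wakee last ran on */
	unsigned int waker_cpu_nr_running; /* runnable tasks on waker_cpu */
};

/*
 * Pick a CPU for the wakee. idle_sibling_cpu stands in for whatever CPU
 * the normal idle-sibling search across the LLC would have returned.
 */
int pick_wake_cpu(const struct wake_ctx *c, int idle_sibling_cpu)
{
	/*
	 * Assumed special case: a per-CPU kthread on the wakee's previous
	 * CPU is waking it (e.g. an IO completion) and nothing else is
	 * runnable there. The kthread is expected to sleep again shortly,
	 * so stack the wakee on that CPU instead of migrating it.
	 */
	if (c->waker_is_per_cpu_kthread &&
	    c->wakee_prev_cpu == c->waker_cpu &&
	    c->waker_cpu_nr_running <= 1)
		return c->wakee_prev_cpu;

	/* Any other waker: keep the idle-sibling result, which may migrate. */
	return idle_sibling_cpu;
}
```

Consider a bound kworker on CPU 3 that wakes a task which last ran on CPU 3, with nothing else runnable there. In that case pick_wake_cpu() returns 3. The same wake-up from an ordinary task returns whichever CPU the idle search chose. Restricting the shortcut to per-CPU kthread wakers mirrors the argument in the commit message: starvation concerns stay small because only the kworker-as-waker case is affected.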
bpf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2019-12-22 09:54:33 -08:00
cgroup	Merge branch 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2019-11-25 19:23:46 -08:00
configs
debug	kdb: Tweak escape handling for vi users	2019-10-28 12:08:29 +00:00
dma	lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr	2019-12-04 19:44:13 -08:00
events	perf/core: Add SRCU annotation for pmus list walk	2019-12-17 13:32:46 +01:00
gcov	um: Enable CONFIG_CONSTRUCTORS	2019-09-15 21:37:13 +02:00
irq	irqchip updates for Linux 5.5	2019-11-20 14:16:34 +01:00
livepatch	New tracing features:	2019-11-27 11:42:01 -08:00
locking	Revert "locking/mutex: Complain upon mutex API misuse in IRQ contexts"	2019-12-11 00:27:43 +01:00
power	Additional power management updates for 5.5-rc1	2019-12-04 10:48:09 -08:00
printk	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-12-03 09:29:50 -08:00
rcu	Merge branches 'doc.2019.10.29a', 'fixes.2019.10.30a', 'nohz.2019.10.28a', 'replace.2019.10.30a', 'torture.2019.10.05a' and 'lkmm.2019.10.05a' into HEAD	2019-10-30 08:47:13 -07:00
sched	sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression	2020-02-10 11:24:37 +01:00
time	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-12-03 12:20:25 -08:00
trace	tracing: Fix endianness bug in histogram trigger	2019-12-21 16:08:59 -05:00
.gitignore	Provide in-kernel headers to make extending kernel easier	2019-04-29 16:48:03 +02:00
acct.c	acct_on(): don't mess with freeze protection	2019-04-04 21:04:13 -04:00
async.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
audit_fsnotify.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00
audit_tree.c	fsnotify: switch send_to_group() and ->handle_event to const struct qstr *	2019-04-26 13:51:03 -04:00
audit_watch.c	audit_get_nd(): don't unlock parent too early	2019-11-10 11:56:55 -05:00
audit.c	audit: remove redundant condition check in kauditd_thread()	2019-10-25 11:48:14 -04:00
audit.h	audit/stable-5.3 PR 20190702	2019-07-08 18:55:42 -07:00
auditfilter.c	audit/stable-5.3 PR 20190702	2019-07-08 18:55:42 -07:00
auditsc.c	Revert "bpf: Emit audit messages upon successful prog load and unload"	2019-11-23 09:56:02 -08:00
backtracetest.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
bounds.c
capability.c	LSM: add SafeSetID module that gates setid calls	2019-01-25 11:22:43 -08:00
compat.c	y2038: itimer: compat handling to itimer.c	2019-11-15 14:38:30 +01:00
configs.c	kernel/configs: Replace GPL boilerplate code with SPDX identifier	2019-07-30 18:34:15 +02:00
context_tracking.c	context_tracking: Rename context_tracking_is_enabled() => context_tracking_enabled()	2019-10-29 10:01:12 +01:00
cpu_pm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282	2019-06-05 17:36:37 +02:00
cpu.c	cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order	2019-12-17 13:32:50 +01:00
crash_core.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230	2019-06-19 17:09:06 +02:00
crash_dump.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
cred.c	Merge branch 'access-creds'	2019-07-25 08:36:29 -07:00
delayacct.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25	2019-05-21 11:52:39 +02:00
dma.c
elfcore.c	kernel/elfcore.c: include proper prototypes	2019-09-25 17:51:39 -07:00
exec_domain.c
exit.c	Pipework for general notification queue	2019-11-30 14:12:13 -08:00
extable.c	bpf: Add support for BTF pointers to x86 JIT	2019-10-17 16:44:36 +02:00
fail_function.c	fail_function: no need to check return value of debugfs_create functions	2019-06-03 15:49:06 +02:00
fork.c	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-12-03 12:20:25 -08:00
freezer.c	Revert "libata, freezer: avoid block device removal while system is frozen"	2019-10-06 09:11:37 -06:00
futex.c	futex: Prevent exit livelock	2019-11-20 09:40:38 +01:00
gen_kheaders.sh	kheaders: explain why include/config/autoconf.h is excluded from md5sum	2019-11-11 20:10:01 +09:00
groups.c
hung_task.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
iomem.c	mm/nvdimm: add is_ioremap_addr and use that to check ioremap address	2019-07-12 11:05:40 -07:00
irq_work.c	irq_work: Fix IRQ_WORK_BUSY bit clearing	2019-11-15 10:48:37 +01:00
jump_label.c	jump_label: Don't warn on __exit jump entries	2019-08-29 15:10:10 +01:00
kallsyms.c	kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol	2019-08-27 16:19:56 +01:00
kcmp.c
Kconfig.freezer	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
Kconfig.hz	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
Kconfig.locks	sched/rt, locking: Use CONFIG_PREEMPTION	2019-12-08 14:37:36 +01:00
Kconfig.preempt	sched/Kconfig: Fix spelling mistake in user-visible help text	2019-11-12 11:35:32 +01:00
kcov.c	kcov: remote coverage support	2019-12-04 19:44:14 -08:00
kexec_core.c	kexec: bail out upon SIGKILL when allocating memory.	2019-09-25 17:51:40 -07:00
kexec_elf.c	kexec_elf: support 32 bit ELF files	2019-09-06 23:58:44 +02:00
kexec_file.c	kexec: Fix pointer-to-int-cast warnings	2019-11-01 21:42:58 +01:00
kexec_internal.h
kexec.c	kexec_load: Disable at runtime if the kernel is locked down	2019-08-19 21:54:15 -07:00
kheaders.c	kheaders: Move from proc to sysfs	2019-05-24 20:16:01 +02:00
kmod.c
kprobes.c	Tracing updates:	2019-09-20 11:19:48 -07:00
ksysfs.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 170	2019-05-30 11:26:39 -07:00
kthread.c	kthread: make __kthread_queue_delayed_work static	2019-10-16 09:20:58 -07:00
latencytop.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
Makefile	Kbuild updates for v5.5	2019-12-02 17:35:04 -08:00
module_signature.c	MODSIGN: Export module signature definitions	2019-08-05 18:39:56 -04:00
module_signing.c	MODSIGN: Export module signature definitions	2019-08-05 18:39:56 -04:00
module-internal.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
module.c	This contains 3 changes:	2019-12-11 12:22:38 -08:00
notifier.c	kernel/notifier.c: remove blocking_notifier_chain_cond_register()	2019-12-04 19:44:12 -08:00
nsproxy.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
padata.c	padata: remove cpu_index from the parallel_queue	2019-09-13 21:15:41 +10:00
panic.c	locking/refcount: Remove unused 'refcount_error_report()' function	2019-11-25 09:15:42 +01:00
params.c	lockdown: Lock down module params that specify hardware parameters (eg. ioport)	2019-08-19 21:54:16 -07:00
pid_namespace.c	fork: extend clone3() to support setting a PID	2019-11-15 23:49:22 +01:00
pid.c	fork: extend clone3() to support setting a PID	2019-11-15 23:49:22 +01:00
profile.c	kernel/profile.c: use cpumask_available to check for NULL cpumask	2019-12-04 19:44:12 -08:00
ptrace.c	ptrace: add PTRACE_GET_SYSCALL_INFO request	2019-07-16 19:23:24 -07:00
range.c
reboot.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
relay.c	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-03-12 13:27:20 -07:00
resource.c	mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range()	2019-09-24 15:54:09 -07:00
rseq.c	signal: Remove task parameter from force_sig	2019-05-27 09:36:28 -05:00
seccomp.c	seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE	2019-10-10 14:45:51 -07:00
signal.c	cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()	2019-10-11 08:39:57 -07:00
smp.c	smp: Warn on function calls from softirq context	2019-07-20 11:27:16 +02:00
smpboot.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
smpboot.h
softirq.c	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-07-08 11:01:13 -07:00
stackleak.c	stackleak: Mark stackleak_track_stack() as notrace	2018-12-05 19:31:44 -08:00
stacktrace.c	stacktrace: Get rid of unneeded '!!' pattern	2019-11-11 10:30:59 +01:00
stop_machine.c	stop_machine: Make stop_cpus() static	2020-01-17 10:19:21 +01:00
sys_ni.c	y2038: allow disabling time32 system calls	2019-11-15 14:38:30 +01:00
sys.c	kernel/sys.c: avoid copying possible padding bytes in copy_to_user	2019-12-04 19:44:12 -08:00
sysctl_binary.c	sysctl: Remove the sysctl system call	2019-11-26 13:03:56 -06:00
sysctl-test.c	kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()	2019-09-30 17:35:01 -06:00
sysctl.c	kernel: sysctl: make drop_caches write-only	2019-12-01 12:59:07 -08:00
task_work.c
taskstats.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00
test_kprobes.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25	2019-05-21 11:52:39 +02:00
torture.c	torture: Remove exporting of internal functions	2019-08-01 14:30:22 -07:00
tracepoint.c	The main changes in this release include:	2019-07-18 11:51:00 -07:00
tsacct.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00
ucount.c	proc/sysctl: add shared variables for range check	2019-07-18 17:08:07 -07:00
uid16.c
uid16.h
umh.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
up.c	smp: Remove smp_call_function() and on_each_cpu() return values	2019-06-23 14:26:26 +02:00
user_namespace.c	Keyrings namespacing	2019-07-08 19:36:47 -07:00
user-return-notifier.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
user.c	Keyrings namespacing	2019-07-08 19:36:47 -07:00
utsname_sysctl.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
utsname.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
watchdog_hld.c	kernel/watchdog_hld.c: hard lockup message should end with a newline	2019-04-19 09:46:05 -07:00
watchdog.c	watchdog: Remove soft_lockup_hrtimer_cnt and related code	2020-01-17 10:19:19 +01:00
workqueue_internal.h	sched/core, workqueues: Distangle worker accounting from rq lock	2019-04-16 16:55:15 +02:00
workqueue.c	Linux 5.5-rc3	2019-12-25 10:41:37 +01:00