linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-05 10:26:42 +07:00

Vladimir Davydov 685207963b sched: Move h_load calculation to task_h_load()

The bad thing about update_h_load(), which computes hierarchical load factor for task groups, is that it is called for each task group in the system before every load balancer run, and since rebalance can be triggered very often, this function can eat really a lot of cpu time if there are many cpu cgroups in the system.

Although the situation was improved significantly by commit `a35b646` ('sched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies'), the problem still can arise under some kinds of loads, e.g. when cpus are switching from idle to busy and back very frequently.

For instance, when I start 1000 of processes that wake up every millisecond on my 8 cpus host, 'top' and 'perf top' show:

    Cpu(s): 17.8%us, 24.3%sy, 0.0%ni, 57.9%id, 0.0%wa, 0.0%hi, 0.0%si
    Events: 243K cycles
      7.57%  [kernel]      [k] __schedule
      7.08%  [kernel]      [k] timerqueue_add
      6.13%  libc-2.12.so  [.] usleep

Then if I create 10000 idle cpu cgroups (no processes in them), cpu usage increases significantly although the 'wakers' are still executing in the root cpu cgroup:

    Cpu(s): 19.1%us, 48.7%sy, 0.0%ni, 31.6%id, 0.0%wa, 0.0%hi, 0.7%si
    Events: 230K cycles
      24.56%  [kernel]  [k] tg_load_down
       5.76%  [kernel]  [k] __schedule

This happens because this particular kind of load triggers 'new idle' rebalance very frequently, which requires calling update_h_load(), which, in turn, calls tg_load_down() for every idle cpu cgroup even though it is absolutely useless, because idle cpu cgroups have no tasks to pull.

This patch tries to improve the situation by making h_load calculation proceed only when h_load is really necessary. To achieve this, it substitutes update_h_load() with update_cfs_rq_h_load(), which computes h_load only for a given cfs_rq and all its ascendants, and makes the load balancer call this function whenever it considers if a task should be pulled, i.e. it moves h_load calculations directly to task_h_load().

For h_load of the same cfs_rq not to be updated multiple times (in case several tasks in the same cgroup are considered during the same balance run), the patch keeps the time of the last h_load update for each cfs_rq and breaks calculation when it finds h_load to be uptodate.

The benefit of it is that h_load is computed only for those cfs_rq's, which really need it, in particular all idle task groups are skipped. Although this, in fact, moves h_load calculation under rq lock, it should not affect latency much, because the amount of work done under rq lock while trying to pull tasks is limited by sched_nr_migrate.

After the patch applied with the setup described above (1000 wakers in the root cgroup and 10000 idle cgroups), I get:

    Cpu(s): 16.9%us, 24.8%sy, 0.0%ni, 58.4%id, 0.0%wa, 0.0%hi, 0.0%si
    Events: 242K cycles
      7.57%  [kernel]      [k] __schedule
      6.70%  [kernel]      [k] timerqueue_add
      5.93%  libc-2.12.so  [.] usleep

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1373896159-1278-1-git-send-email-vdavydov@parallels.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2013-07-23 12:18:41 +02:00
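The lazy, timestamp-guarded h_load update described in the commit message can be illustrated with a small user-space sketch. This is not the kernel code from the patch: `struct toy_cfs_rq`, `toy_h_load()`, the `se_load` field and the `toy_h_load_updates` counter are hypothetical names invented for illustration, the recursion stands in for whatever walk the real update_cfs_rq_h_load() performs, and the scaling formula (parent h_load times the group's share of the parent's load) is an assumed simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-CPU cfs_rq of a task group. */
struct toy_cfs_rq {
    struct toy_cfs_rq *parent;        /* NULL for the root runqueue */
    unsigned long load;               /* total load queued on this runqueue */
    unsigned long se_load;            /* load this group contributes to its parent */
    unsigned long h_load;             /* cached hierarchical load */
    unsigned long last_h_load_update; /* time of last update, 0 = never */
};

/* Counts how many runqueues were actually recomputed (illustration only). */
unsigned long toy_h_load_updates;

/*
 * Return the hierarchical load of rq, recomputing it only for rq and its
 * ancestors, and only if the cached value is older than `now`.
 * `now` must be nonzero because 0 marks "never computed" in this sketch.
 */
unsigned long toy_h_load(struct toy_cfs_rq *rq, unsigned long now)
{
    assert(rq != NULL && now != 0);

    /* Already up to date for this balance run: stop the walk here. */
    if (rq->last_h_load_update == now)
        return rq->h_load;

    toy_h_load_updates++;

    if (rq->parent == NULL) {
        /* The root's hierarchical load is simply its own load. */
        rq->h_load = rq->load;
    } else {
        /* Scale the parent's h_load by this group's share of the parent. */
        unsigned long parent_h = toy_h_load(rq->parent, now);
        rq->h_load = parent_h * rq->se_load / (rq->parent->load + 1);
    }

    rq->last_h_load_update = now;
    return rq->h_load;
}
```

In this sketch, a group that is never asked for its h_load (an idle cgroup with no tasks to pull) is never visited, and repeated queries for the same group within one timestamp cost a single comparison. For example, with a root load of 1023 and a child whose `se_load` is 512, `toy_h_load()` returns 1023 * 512 / 1024 = 511 for the child and touches only the child and the root.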
..
cpu	idle: Enable interrupts in the weak arch_cpu_idle() implementation	2013-06-14 23:01:05 +02:00
debug	kgdb/sysrq: fix inconstistent help message of sysrq key	2013-04-30 17:04:10 -07:00
events	perf: Update perf_event_type documentation	2013-07-23 12:17:08 +02:00
gcov	kernel/gcov: remove depends on CONFIG_EXPERIMENTAL	2013-01-11 11:39:33 -08:00
irq	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2013-07-13 15:37:30 -07:00
power	Merge branch 'akpm' (updates from Andrew Morton)	2013-07-03 17:12:13 -07:00
sched	sched: Move h_load calculation to task_h_load()	2013-07-23 12:18:41 +02:00
time	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
trace	The majority of the changes here are cleanups for the large changes that	2013-07-11 09:02:09 -07:00
.gitignore	kernel/hz.bc: ignore.	2013-04-22 07:09:06 -07:00
acct.c	fs: Fix hang with BSD accounting on frozen filesystem	2013-05-04 14:57:58 -04:00
async.c	async: rename and redefine async_func_ptr	2013-03-12 13:59:14 -07:00
audit_tree.c	kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()	2013-06-12 16:29:46 -07:00
audit_watch.c	audit: catch possible NULL audit buffers	2013-01-11 14:54:55 -08:00
audit.c	audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE	2013-06-12 16:29:45 -07:00
audit.h	audit: fix mq_open and mq_unlink to add the MQ root as a hidden parent audit_names record	2013-07-09 10:33:19 -07:00
auditfilter.c	audit: Fix decimal constant description	2013-07-09 10:33:19 -07:00
auditsc.c	audit: fix mq_open and mq_unlink to add the MQ root as a hidden parent audit_names record	2013-07-09 10:33:19 -07:00
backtracetest.c
bounds.c
capability.c	Add file_ns_capable() helper function for open-time capability checking	2013-04-14 10:06:31 -07:00
cgroup_freezer.c	cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()	2012-11-19 08:13:38 -08:00
cgroup.c	cgroup: we can use simple_lookup() now	2013-07-14 17:50:23 +04:00
compat.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal	2013-05-01 07:21:43 -07:00
configs.c	proc: Supply PDE attribute setting accessor functions	2013-05-01 17:29:18 -04:00
context_tracking.c	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2013-06-20 08:18:35 -10:00
cpu_pm.c
cpu.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
cpuset.c	Merge branch 'for-3.11-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2013-07-02 20:04:25 -07:00
crash_dump.c
cred.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2012-12-18 10:55:28 -08:00
delayacct.c	cputime: Use accessors to read task cputime stats	2013-01-27 19:23:31 +01:00
dma.c
elfcore.c
exec_domain.c
exit.c	ptrace: revert "Prepare to fix racy accesses on task breakpoints"	2013-07-09 10:33:26 -07:00
extable.c	extable: Flip the sorting message	2013-04-15 13:25:16 +02:00
fork.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
freezer.c	freezer: skip waking up tasks with PF_FREEZER_SKIP set	2013-05-12 14:16:22 +02:00
futex_compat.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal	2013-02-23 18:50:11 -08:00
futex.c	futex: Use freezable blocking call	2013-06-25 23:11:19 +02:00
groups.c
hrtimer.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
hung_task.c
irq_work.c	Merge branch 'nohz/printk-v8' into irq/core	2013-02-05 00:48:46 +01:00
itimer.c
jump_label.c
kallsyms.c	kernel: kallsyms: memory override issue, need check destination buffer length	2013-04-15 15:17:26 +09:30
kcmp.c	kcmp: include linux/ptrace.h	2012-12-20 17:40:19 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks	locking: Fix copy/paste errors of "ARCH_INLINE_*_UNLOCK_BH"	2013-05-28 08:50:00 +02:00
Kconfig.preempt
kexec.c	kexec: Use min() and min_t() to simplify logic	2013-04-30 17:04:07 -07:00
kmod.c	usermodehelper: kill the sub_info->path[0] check	2013-07-03 16:08:02 -07:00
kprobes.c	kprobes/x86: Call out into INT3 handler directly instead of using notifier	2013-07-23 10:12:57 +02:00
ksysfs.c	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2012-12-11 18:10:49 -08:00
kthread.c	kthread: implement probe_kthread_data()	2013-04-30 17:04:02 -07:00
latencytop.c
lglock.c
lockdep_internals.h
lockdep_proc.c	lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name()	2012-10-24 12:39:09 +02:00
lockdep_states.h
lockdep.c	lockdep: remove task argument from debug_check_no_locks_held	2013-05-12 14:16:21 +02:00
Makefile	reboot: move shutdown/reboot related functions to kernel/reboot.c	2013-07-09 10:33:29 -07:00
modsign_certificate.S	CONFIG_SYMBOL_PREFIX: cleanup.	2013-03-15 15:09:43 +10:30
modsign_pubkey.c	keys: use keyring_alloc() to create module signing keyring	2012-12-20 17:40:21 -08:00
module_signing.c	MODSIGN: Don't use enum-type bitfields in module signature info block	2012-12-05 11:27:24 +10:30
module-internal.h	MODSIGN: Move the magic string to the end of a module and eliminate the search	2012-10-19 17:30:40 -07:00
module.c	Nothing interesting. Except the most embarrassing bugfix ever. But let's	2013-07-10 14:51:41 -07:00
mutex-debug.c
mutex-debug.h
mutex.c	mutex: Move ww_mutex definitions to ww_mutex.h	2013-07-12 12:07:46 +02:00
mutex.h
notifier.c
nsproxy.c	proc: Split the namespace stuff out into linux/proc_ns.h	2013-05-01 17:29:39 -04:00
padata.c	padata: use __this_cpu_read per-cpu helper	2012-12-06 17:16:23 +08:00
panic.c	The majority of the changes here are cleanups for the large changes that	2013-07-11 09:02:09 -07:00
params.c	There is no /sys/parameters	2013-07-02 15:38:19 +09:30
pid_namespace.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2013-05-01 17:51:54 -07:00
pid.c	kernel/pid.c: move statement	2013-07-03 16:08:05 -07:00
posix-cpu-timers.c	posix_timers: fix racy timer delta caching on task exit	2013-07-03 16:54:42 +02:00
posix-timers.c	posix-timers: Remove unused variable	2013-04-18 12:51:19 +02:00
printk.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
profile.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
ptrace.c	ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child)	2013-07-09 10:33:26 -07:00
range.c	range: Do not add new blank slot with add_range_with_merge	2013-06-18 11:32:10 -05:00
rcu.h	rcu: Provide RCU CPU stall warnings for tiny RCU	2013-01-28 22:06:21 -08:00
rcupdate.c	Merge branches 'cbnum.2013.06.10a', 'doc.2013.06.10a', 'fixes.2013.06.10a', 'srcu.2013.06.10a' and 'tiny.2013.06.10a' into HEAD	2013-06-10 13:46:44 -07:00
rcutiny_plugin.h	rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs	2013-06-10 13:45:53 -07:00
rcutiny.c	rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs	2013-06-10 13:45:53 -07:00
rcutorture.c	rcu: delete __cpuinit usage from all rcu files	2013-07-14 19:36:58 -04:00
rcutree_plugin.h	rcu: delete __cpuinit usage from all rcu files	2013-07-14 19:36:58 -04:00
rcutree_trace.c	rcutrace: single_open() leaks	2013-05-05 00:16:35 -04:00
rcutree.c	rcu: delete __cpuinit usage from all rcu files	2013-07-14 19:36:58 -04:00
rcutree.h	rcu: delete __cpuinit usage from all rcu files	2013-07-14 19:36:58 -04:00
reboot.c	reboot: move arch/x86 reboot= handling to generic kernel	2013-07-09 10:33:29 -07:00
relay.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
res_counter.c	res_counter: return amount of charges after res_counter_uncharge()	2012-12-18 15:02:12 -08:00
resource.c	kernel/resource.c: remove the unneeded assignment in function __find_resource	2013-07-03 16:08:06 -07:00
rtmutex_common.h
rtmutex-debug.c	sched/rt: Move rt specific bits into new header file	2013-02-07 20:51:08 +01:00
rtmutex-debug.h
rtmutex-tester.c	locking/rtmutex/tester: Set correct permissions on sysfs files	2013-04-10 14:48:37 +02:00
rtmutex.c	rtmutex: Document rt_mutex_adjust_prio_chain()	2013-05-28 09:23:52 +02:00
rtmutex.h
rwsem.c	Revert "rw_semaphore: remove up/down_read_non_owner"	2013-03-23 15:53:52 -07:00
seccomp.c	seccomp: allow BPF_XOR based ALU instructions.	2013-03-26 11:07:19 +11:00
semaphore.c	semaphore: use `bool' type for semaphore_waiter's up	2013-04-30 17:04:08 -07:00
signal.c	sigtimedwait: use freezable blocking call	2013-05-12 14:16:23 +02:00
smp.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
smpboot.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
smpboot.h
softirq.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
spinlock.c
srcu.c	srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock()	2013-02-07 15:19:36 -08:00
stacktrace.c
stop_machine.c	stop_machine: Mark per cpu stopper enabled early	2013-02-26 22:25:17 +01:00
sys_ni.c	unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE	2013-05-09 13:46:38 -04:00
sys.c	reboot: move shutdown/reboot related functions to kernel/reboot.c	2013-07-09 10:33:29 -07:00
sysctl_binary.c	kernel: remove unnecessary head file	2013-06-26 18:01:46 +09:00
sysctl.c	Merge branch 'linus' into timers/urgent	2013-07-12 12:34:42 +02:00
task_work.c	task_work: task_work_add() should not succeed after exit_task_work()	2012-09-13 16:47:34 +02:00
taskstats.c	taskstats: cgroupstats_user_cmd() may leak on error	2012-10-06 03:05:31 +09:00
test_kprobes.c	kernel/: rename random32() to prandom_u32()	2013-04-29 18:28:42 -07:00
time.c	sched: Rename sched.c as sched/core.c in comments and Documentation	2013-06-19 12:58:42 +02:00
timeconst.bc	kernel: Replace timeconst.pl with a bc script	2013-02-16 23:17:25 +01:00
timer.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00
tracepoint.c	Tracing updates for Linux 3.10	2013-04-29 13:55:38 -07:00
tsacct.c	cputime: Use accessors to read task cputime stats	2013-01-27 19:23:31 +01:00
uid16.c	make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect	2013-03-03 22:58:33 -05:00
up.c
user_namespace.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2013-05-01 17:51:54 -07:00
user-return-notifier.c	hlist: drop the node parameter from iterators	2013-02-27 19:10:24 -08:00
user.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2013-05-01 17:51:54 -07:00
utsname_sysctl.c	kernel/utsname_sysctl.c: put get/get_uts() into CONFIG_PROC_SYSCTL code block	2013-02-27 19:10:22 -08:00
utsname.c	proc: Split the namespace stuff out into linux/proc_ns.h	2013-05-01 17:29:39 -04:00
wait.c	Add wait_on_atomic_t() and wake_up_atomic_t()	2013-05-15 13:50:38 +01:00
watchdog.c	watchdog: Boot-disable by default on full dynticks	2013-06-20 15:46:32 +02:00
workqueue_internal.h	sched: Rename sched.c as sched/core.c in comments and Documentation	2013-06-19 12:58:42 +02:00
workqueue.c	kernel: delete __cpuinit usage from all core kernel files	2013-07-14 19:36:59 -04:00