linux_dsm_epyc7002/kernel
Qais Yousef 13685c4a08 sched/uclamp: Add a new sysctl to control RT default boost value
RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE, the maximum
value.

This is also referred to as 'the default boost value of RT tasks'.

See commit 1a00d99997 ("sched/uclamp: Set default clamps for RT tasks").

On battery-powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.

For example, on mobile devices, where the big.LITTLE architecture is
dominant, the performance of the little cores varies across SoCs, and
on high-end ones the big cores could be too power hungry.

Given the diversity of SoCs, the new knob allows manufacturers to tune
for the best performance/power trade-off for RT tasks on the particular
hardware they run on.

They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.

The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.

Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which are created by the Android framework.

To let system admins and device integrators control the default
behavior globally, introduce the new
sysctl_sched_uclamp_util_min_rt_default to change the default boost
value of the RT tasks.

I anticipate this to be mostly in the form of modifying the init script
of a particular device.
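
As a rough illustration of the same tuning done from a small helper
program instead of an init script, the sketch below writes a new default
boost value to the knob. The procfs path in RT_DEFAULT_BOOST_PATH is an
assumption (this message names only the kernel variable), and
write_rt_default_boost() is a hypothetical helper, not part of the patch.
The path is passed in as a parameter so the helper can be exercised
against an ordinary file.

```c
#include <errno.h>
#include <stdio.h>

/* SCHED_CAPACITY_SCALE: the maximum clamp value described above. */
#define CAPACITY_SCALE 1024u

/* Assumed procfs location of the new knob; not stated in this message. */
#define RT_DEFAULT_BOOST_PATH \
	"/proc/sys/kernel/sched_util_clamp_min_rt_default"

/*
 * Hypothetical helper: write 'value' (0..1024) as decimal text to 'path'.
 * Returns 0 on success or a negative errno-style code on failure.
 */
int write_rt_default_boost(const char *path, unsigned int value)
{
	FILE *f;
	int rc = 0;

	/* Reject out-of-range values before touching the file. */
	if (value > CAPACITY_SCALE)
		return -EINVAL;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%u\n", value) < 0)
		rc = -EIO;

	/* A failed write may only surface when the stream is flushed. */
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;

	return rc;
}
```

Under these assumptions, write_rt_default_boost(RT_DEFAULT_BOOST_PATH, 0)
would remove the default boost for RT tasks; writing to the real knob
is expected to require root privileges.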

To avoid polluting the fast path with unnecessary code, the approach
taken is to do the update synchronously by traversing all the existing
tasks in the system. This could race with a concurrent fork(), which is
dealt with by introducing the sched_post_fork() function, which ensures
that the racy fork gets the right update applied.
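
The update scheme described above can be sketched as a single-threaded
userspace toy model. It is not the kernel code: struct toy_task, the
user_defined flag, and every function name here are hypothetical, and
the model leaves out all locking and memory ordering. It only shows why
re-applying the default after fork covers a child that was copied from
its parent before the traversal reached it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task state involved in the update. */
struct toy_task {
	bool is_rt;              /* task uses an RT scheduling policy */
	bool user_defined;       /* task requested its own uclamp.min */
	unsigned int uclamp_min; /* currently requested uclamp.min */
};

/* Toy counterpart of the sysctl value; starts at the maximum (1024). */
static unsigned int rt_default_min = 1024;

/* Apply the current default to one task unless it opted out. */
static void apply_rt_default(struct toy_task *p)
{
	if (!p->is_rt || p->user_defined)
		return;
	p->uclamp_min = rt_default_min;
}

/* Slow path: store the new default, then walk every existing task. */
void toy_set_rt_default(struct toy_task *tasks, size_t n, unsigned int value)
{
	rt_default_min = value;
	for (size_t i = 0; i < n; i++)
		apply_rt_default(&tasks[i]);
}

/*
 * Post-fork hook: a child that missed the walk above still carries the
 * parent's old value, so the default is applied to it once more.
 */
void toy_post_fork(struct toy_task *child)
{
	apply_rt_default(child);
}
```

The design point this is meant to show is that the cost sits in the
rare sysctl write and in fork, not in the scheduler fast path.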

Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high-end
mobile devices because the biggest core[s] are 'huge' and power hungry.

With this patch the RT task can be controlled to run anywhere by
default, and it no longer forces the frequency to the maximum all the
time. Yet any task that really needs to be boosted can easily escape
this default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via the sched_setattr() syscall.
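
A minimal sketch of how a task might request its own uclamp.min through
sched_setattr() follows. The structure is a local copy of the kernel's
sched_attr layout under a different name (to avoid clashing with libc
headers that may declare it), and the flag values mirror the kernel UAPI
constants as I understand them; both should be checked against the
headers of the target kernel. fill_boost_request() and
request_rt_boost() are hypothetical helpers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() declaration */
#endif
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copy of the kernel's sched_attr layout (assumed, 56 bytes). */
struct uclamp_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Assumed to match the kernel's SCHED_FLAG_* values. */
#define UCLAMP_FLAG_KEEP_POLICY 0x08
#define UCLAMP_FLAG_KEEP_PARAMS 0x10
#define UCLAMP_FLAG_UTIL_MIN    0x20

/* Build a request that changes only uclamp.min, keeping policy/params. */
void fill_boost_request(struct uclamp_sched_attr *attr, uint32_t util_min)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_flags = UCLAMP_FLAG_KEEP_POLICY |
			    UCLAMP_FLAG_KEEP_PARAMS |
			    UCLAMP_FLAG_UTIL_MIN;
	attr->sched_util_min = util_min;
}

/* Issue the raw syscall; pid 0 means the calling thread. */
long request_rt_boost(pid_t pid, uint32_t util_min)
{
	struct uclamp_sched_attr attr;

	fill_boost_request(&attr, util_min);
	return syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

A task that needs full performance could call request_rt_boost(0, 1024)
early in its setup. The call returns -1 with errno set when it fails,
for example on a kernel built without uclamp support.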

[1] 804d402fb6: ("sched/rt: Make RT capacity-aware")

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com
2020-07-29 13:51:47 +02:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-10 18:16:22 -07:00
cgroup cgroup: fix cgroup_sk_alloc() for sk_clone_lock() 2020-07-07 13:34:11 -07:00
configs
debug kgdb: enable arch to support XML packet. 2020-07-09 20:09:28 -07:00
dma dma-pool: do not allocate pool memory from CMA 2020-07-14 15:46:32 +02:00
events Merge branch 'akpm' (patches from Andrew) 2020-06-09 09:54:46 -07:00
gcov treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
irq genirq/affinity: Handle affinity setting on inactive interrupts correctly 2020-07-17 23:30:43 +02:00
kcsan kcsan: Support distinguishing volatile accesses 2020-06-11 20:04:01 +02:00
livepatch
locking The X86 entry, exception and interrupt code rework 2020-06-13 10:05:47 -07:00
power treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
printk Revert "kernel/printk: add kmsg SEEK_CUR handling" 2020-06-21 20:47:20 -07:00
rcu A single fix for a printk format warning in RCU. 2020-07-05 12:21:28 -07:00
sched sched/uclamp: Add a new sysctl to control RT default boost value 2020-07-29 13:51:47 +02:00
time sched: nohz: stop passing around unused "ticks" parameter. 2020-07-22 10:22:04 +02:00
trace Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-06-25 18:27:40 -07:00
.gitignore
acct.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
async.c
audit_fsnotify.c
audit_tree.c
audit_watch.c
audit.c
audit.h
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
compat.c
configs.c
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c
cpu.c
crash_core.c
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
extable.c
fail_function.c
fork.c sched/uclamp: Add a new sysctl to control RT default boost value 2020-07-29 13:51:47 +02:00
freezer.c
futex.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-06-06 23:42:01 +09:00
groups.c
hung_task.c kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected 2020-06-08 11:05:56 -07:00
iomem.c
irq_work.c
jump_label.c
kallsyms.c kallsyms: Refactor kallsyms_show_value() to take cred 2020-07-08 15:59:57 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: check kcov_softirq in kcov_remote_stop() 2020-06-10 19:14:17 -07:00
kexec_core.c
kexec_elf.c
kexec_file.c kexec: do not verify the signature without the lockdown or mandatory signature 2020-06-26 00:27:36 -07:00
kexec_internal.h
kexec.c
kheaders.c
kmod.c
kprobes.c kprobes: Do not expose probe addresses to non-CAP_SYSLOG 2020-07-08 16:00:22 -07:00
ksysfs.c
kthread.c Merge branch 'sched/urgent' 2020-07-08 11:38:59 +02:00
latencytop.c
Makefile Notifications over pipes + Keyring notifications 2020-06-13 09:56:21 -07:00
module_signature.c
module_signing.c
module-internal.h
module.c Refactor kallsyms_show_value() users for correct cred 2020-07-09 13:09:30 -07:00
notifier.c
nsproxy.c nsproxy: restore EINVAL for non-namespace file descriptor 2020-06-17 00:33:12 +02:00
padata.c padata: upgrade smp_mb__after_atomic to smp_mb in padata_do_serial 2020-06-18 17:09:54 +10:00
panic.c bug: Annotate WARN/BUG/stackfail as noinstr safe 2020-06-11 15:14:36 +02:00
params.c
pid_namespace.c
pid.c
profile.c
ptrace.c
range.c
reboot.c
relay.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
resource.c
rseq.c
scs.c
seccomp.c
signal.c task_work: teach task_work_add() to do signal_wake_up() 2020-06-30 12:18:08 -06:00
smp.c smp: Fix a potential usage of stale nr_cpus 2020-07-22 10:22:04 +02:00
smpboot.c
smpboot.h
softirq.c x86/entry: Clarify irq_{enter,exit}_rcu() 2020-06-11 15:15:24 +02:00
stackleak.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c Add additional LSM hooks for SafeSetID 2020-06-14 11:39:31 -07:00
sysctl_binary.c
sysctl-test.c
sysctl.c sched/uclamp: Add a new sysctl to control RT default boost value 2020-07-29 13:51:47 +02:00
task_work.c task_work: teach task_work_add() to do signal_wake_up() 2020-06-30 12:18:08 -06:00
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user_namespace.c
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
utsname_sysctl.c
utsname.c
watch_queue.c Notifications over pipes + Keyring notifications 2020-06-13 09:56:21 -07:00
watchdog_hld.c
watchdog.c kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases 2020-06-08 11:05:56 -07:00
workqueue_internal.h
workqueue.c maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault 2020-06-17 10:57:41 -07:00