linux_dsm_epyc7002/kernel/sched
Frederic Weisbecker 62188451f0 cputime: Avoid multiplication overflow on utime scaling
We scale stime, utime values based on rtime (sum_exec_runtime
converted to jiffies). During scaling we multiply rtime * utime,
which seems fine since both values are converted to u64, but it
is not.

Let's assume HZ is 1000 (a 1 ms tick). A process consists of 64
threads that run for 1 day, each using 100% of a CPU in user
space, on a machine with 64 CPUs.

The process rtime (equal to utime here) will be
64 * 24 * 60 * 60 * 1000 jiffies, which is 0x149970000. The
product rtime * utime is 0x1a855771100000000, which does not fit
in 64 bits.

The result of the overflow is that the utime value visible in
user space (prev_utime in the kernel) stalls, even though the
application still consumes a lot of CPU time.

One way to solve this is to perform the multiplication on stime
instead of utime. It's easy to make utime grow fast, for example
with a CPU-bound thread in user space, whereas we assume that
doing so with stime is much harder. In most cases a task
shouldn't spend much time in kernel space, since it tends to
sleep while waiting for long-running jobs to complete. IO is the
typical example of that.

Hence scaling the cputime by performing the multiplication on
stime instead of utime should considerably reduce the chances of
an overflow on most workloads.

This is largely inspired by a patch from Stanislaw Gruszka:
http://lkml.kernel.org/r/20130107113144.GA7544@redhat.com

Inspired-by: Stanislaw Gruszka <sgruszka@redhat.com>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1359217182-25184-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-27 14:04:44 +01:00
auto_group.c Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" 2012-12-11 10:23:45 +01:00
auto_group.h Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" 2012-12-11 10:23:45 +01:00
clock.c sched: Move all scheduler bits into kernel/sched/ 2011-11-17 12:20:22 +01:00
core.c wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task 2013-01-22 10:08:17 -08:00
cpupri.c sched: Fix minor code style issues 2012-07-26 11:47:00 +02:00
cpupri.h sched: Move all scheduler bits into kernel/sched/ 2011-11-17 12:20:22 +01:00
cputime.c cputime: Avoid multiplication overflow on utime scaling 2013-01-27 14:04:44 +01:00
debug.c sched: Replace update_shares weight distribution with per-entity computation 2012-10-24 10:27:28 +02:00
fair.c sched/fair: Set se->vruntime directly in place_entity() 2013-01-24 18:06:11 +01:00
features.h Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
idle_task.c sched/nohz: Rewrite and fix load-avg computation -- again 2012-07-05 20:58:13 +02:00
Makefile sched: Move cputime code to its own file 2012-08-20 13:05:17 +02:00
rt.c sched/rt: Avoid updating RT entry timeout twice within one tick period 2013-01-25 08:31:54 +01:00
sched.h Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
stats.c sched: Remove sched_switch 2012-01-27 13:28:53 +01:00
stats.h Merge branch 'sched/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into cputime-tip 2011-12-19 19:23:15 +01:00
stop_task.c sched: Fix migration thread runtime bogosity 2012-08-13 18:41:55 +02:00