linux_dsm_epyc7002/include
Peter Zijlstra 10e2f1acd0 sched/core: Rewrite and improve select_idle_siblings()
select_idle_siblings() is a known pain point for a number of
workloads; it either does too much or not enough and sometimes just
does plain wrong.

This rewrite attempts to address a number of issues (but sadly not
all).

The current code does an unconditional sched_domain iteration; with
the intent of finding an idle core (on SMT hardware). The problems
which this patch tries to address are:

 - its pointless to look for idle cores if the machine is real busy;
   at which point you're just wasting cycles.

 - it's behaviour is inconsistent between SMT and !SMT hardware in
   that !SMT hardware ends up doing a scan for any idle CPU in the LLC
   domain, while SMT hardware does a scan for idle cores and if that
   fails, falls back to a scan for idle threads on the 'target' core.

The new code replaces the sched_domain scan with 3 explicit scans:

 1) search for an idle core in the LLC
 2) search for an idle CPU in the LLC
 3) search for an idle thread in the 'target' core

where 1 and 3 are conditional on SMT support and 1 and 2 have runtime
heuristics to skip the step.

Step 1) is conditional on sd_llc_shared->has_idle_cores; when a cpu
goes idle and sd_llc_shared->has_idle_cores is false, we scan all SMT
siblings of the CPU going idle. Similarly, we clear
sd_llc_shared->has_idle_cores when we fail to find an idle core.

Step 2) tracks the average cost of the scan and compares this to the
average idle time guestimate for the CPU doing the wakeup. There is a
significant fudge factor involved to deal with the variability of the
averages. Esp. hackbench was sensitive to this.

Step 3) is unconditional; we assume (also per step 1) that scanning
all SMT siblings in a core is 'cheap'.

With this; SMT systems gain step 2, which cures a few benchmarks --
notably one from Facebook.

One 'feature' of the sched_domain iteration, which we preserve in the
new code, is that it would start scanning from the 'target' CPU,
instead of scanning the cpumask in cpu id order. This avoids multiple
CPUs in the LLC scanning for idle to gang up and find the same CPU
quite as much. The down side is that tasks can end up hopping across
the LLC for no apparent reason.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 11:03:09 +02:00
..
acpi treewide: replace obsolete _refok by __ref 2016-08-02 17:31:41 -04:00
asm-generic Merge branch 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-09-14 09:35:05 -07:00
clocksource
crypto A number of improvements for the /dev/random driver; the most 2016-07-27 15:11:55 -07:00
drm Merge branch 'drm-next-4.8' of git://people.freedesktop.org/~agd5f/linux into drm-next 2016-08-08 16:45:33 +10:00
dt-bindings ARM: DT updates for v4.8 2016-08-01 18:37:45 -04:00
keys
kvm KVM/ARM Changes for v4.8 - Take 2 2016-08-04 13:59:56 +02:00
linux sched/core: Rewrite and improve select_idle_siblings() 2016-09-30 11:03:09 +02:00
math-emu
media [media] cec: rename cec_devnode fhs_lock to just lock 2016-08-22 13:09:06 -03:00
memory
misc
net Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2016-09-22 02:56:23 -04:00
pcmcia
ras
rdma IB/core: Use memdup_user() rather than duplicating its implementation 2016-08-23 12:40:13 -04:00
rxrpc
scsi Merge remote-tracking branch 'mkp-scsi/4.8/scsi-fixes' into fixes 2016-08-19 07:41:12 -07:00
soc ARM: SoC driver updates for v4.8 2016-08-01 18:36:01 -04:00
sound Merge tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux 2016-08-01 21:44:08 -04:00
target
trace Luiz Capitulino noticed that the tick_stop tracepoint wasn't being parsed 2016-08-09 10:34:09 -07:00
uapi include/uapi/linux/ipx.h: fix conflicting defitions with glibc netipx/ipx.h 2016-08-22 16:25:15 -07:00
video
xen xen: change the type of xen_vcpu_id to uint32_t 2016-08-24 18:17:27 +01:00
Kbuild