linux_dsm_epyc7002/arch
Chris Metcalf e658a6f14d tile: avoid using clocksource_cyc2ns with absolute cycle count
For large values of "mult" and long uptimes, the intermediate
result of "cycles * mult" can overflow 64 bits.  For example,
the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
we have mult = 853, and after 208.5 days, we overflow 64 bits.

Since clocksource_cyc2ns() is intended to be used for relative
cycle counts, not absolute cycle counts, performance is more
importance than accepting a wider range of cycle values.  So,
just use mult_frac() directly in tile's sched_clock().

Commit 4cecf6d401 ("sched, x86: Avoid unnecessary overflow
in sched_clock") by Salman Qazi results in essentially the same
generated code for x86 as this change does for tile.  In fact,
a follow-on change by Salman introduced mult_frac() and switched
to using it, so the C code was largely identical at that point too.

Peter Zijlstra then added mul_u64_u32_shr() and switched x86
to use it.  This is, in principle, better; by optimizing the
64x64->64 multiplies to be 32x32->64 multiplies we can potentially
save some time.  However, the compiler piplines the 64x64->64
multiplies pretty well, and the conditional branch in the generic
mul_u64_u32_shr() causes some bubbles in execution, with the
result that it's pretty much a wash.  If tilegx provided its own
implementation of mul_u64_u32_shr() without the conditional branch,
we could potentially save 3 cycles, but that seems like small gain
for a fair amount of additional build scaffolding; no other platform
currently provides a mul_u64_u32_shr() override, and tile doesn't
currently have an <asm/div64.h> header to put the override in.

Additionally, gcc currently has an optimization bug that prevents
it from recognizing the opportunity to use a 32x32->64 multiply,
and so the result would be no better than the existing mult_frac()
until such time as the compiler is fixed.

For now, just using mult_frac() seems like the right answer.

Cc: stable@kernel.org [v3.4+]
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
2016-11-23 15:28:54 -05:00
..
alpha Merge branch 'gup_flag-cleanups' 2016-10-19 08:39:47 -07:00
arc ARC fixes for 4.9-rc5 2016-11-11 16:51:50 -08:00
arm Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm 2016-11-20 10:27:39 -08:00
arm64 ARM: SoC fixes for v4.9-rc 2016-11-19 18:40:47 -08:00
avr32 Merge branch 'akpm' (patches from Andrew) 2016-10-07 21:38:00 -07:00
blackfin Merge branch 'gup_flag-cleanups' 2016-10-19 08:39:47 -07:00
c6x nmi_backtrace: generate one-line reports for idle cpus 2016-10-07 18:46:30 -07:00
cris cris/arch-v32: cryptocop: print a hex number after a 0x prefix 2016-10-27 18:43:43 -07:00
frv Merge branch 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-11 23:38:39 -07:00
h8300 h8300: fix syscall restarting 2016-10-27 18:43:42 -07:00
hexagon nmi_backtrace: generate one-line reports for idle cpus 2016-10-07 18:46:30 -07:00
ia64 Merge branch 'gup_flag-cleanups' 2016-10-19 08:39:47 -07:00
m32r mm: replace access_process_vm() write parameter with gup_flags 2016-10-19 08:31:25 -07:00
m68k Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-10-14 14:26:58 -07:00
metag Metag architecture fixes for v4.9-rc1 2016-10-14 11:11:39 -07:00
microblaze Merge branch 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-11 23:38:39 -07:00
mips One NULL pointer dereference, and two fixes for regressions introduced 2016-11-04 13:08:05 -07:00
mn10300 Merge branch 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-11 23:38:39 -07:00
nios2 nios2: fix timer initcall return value 2016-11-11 08:45:08 -08:00
openrisc openrisc: Define __ro_after_init to avoid crash 2016-11-06 08:01:12 -08:00
parisc parisc: Ignore the pkey system calls for now 2016-11-02 23:07:14 +01:00
powerpc powerpc fixes for 4.9 #5 2016-11-19 11:21:59 -08:00
s390 Merge branch 'maybe-uninitialized' (patches from Arnd) 2016-11-11 10:03:01 -08:00
score Merge branch 'gup_flag-cleanups' 2016-10-19 08:39:47 -07:00
sh Minor changes to improve J2 support and match Kconfig expectations of 2016-10-19 11:21:06 -07:00
sparc sparc: drop duplicate header scatterlist.h 2016-11-19 10:43:07 -05:00
tile tile: avoid using clocksource_cyc2ns with absolute cycle count 2016-11-23 15:28:54 -05:00
um nmi_backtrace: generate one-line reports for idle cpus 2016-10-07 18:46:30 -07:00
unicore32 unicore32: use simpler API for random address requests 2016-10-11 15:06:32 -07:00
x86 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-11-23 08:09:21 -08:00
xtensa xtensa: wire up new pkey_{mprotect,alloc,free} syscalls 2016-11-14 12:31:49 -08:00
.gitignore
Kconfig This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00