mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-28 11:18:45 +07:00
133fd9f5cd
Previous code was using optimizations which were developed to work well even on narrow-word CPUs (by today's standards). But Linux runs only on 32-bit and wider CPUs. We can use that.

First: using 32x32->64 multiply and trivial 32-bit shift, we can correctly divide by 10 much larger numbers, and thus we can print groups of 9 digits instead of groups of 5 digits.

Next: there are two algorithms to print larger numbers. One is generic: divide by 1000000000 and repeatedly print groups of (up to) 9 digits. It's conceptually simple, but requires an (unsigned long long) / 1000000000 division.

Second algorithm splits 64-bit unsigned long long into 16-bit chunks, manipulates them cleverly and generates groups of 4 decimal digits. It so happens that it does NOT require long long division.

If long is > 32 bits, division of 64-bit values is relatively easy, and we will use the first algorithm. If long long is > 64 bits (strange architecture with VERY large long long), second algorithm can't be used, and we again use the first one.

Else (if long is 32 bits and long long is 64 bits) we use second one.

And third: there is a simple optimization which takes fast path not only for zero as was done before, but for all one-digit numbers.

In all tested cases new code is faster than old one, in many cases by 30%, in few cases by more than 50% (for example, on x86-32, conversion of 12345678). Code growth is ~0 in 32-bit case and ~130 bytes in 64-bit case.

This patch is based upon an original from Michal Nazarewicz.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
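
The first (generic) algorithm from the commit message can be illustrated roughly as follows. This is a minimal user-space sketch, not the kernel's vsprintf code: the names `u64_to_dec`, `put_group` and `put_leading` are hypothetical, digits are written front to back for readability, and plain `%` and `/` stand in for the multiply-and-shift division the commit describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write exactly `width` digits of v, zero-padded. */
static char *put_group(char *p, uint32_t v, int width)
{
    for (int i = width - 1; i >= 0; i--) {
        p[i] = (char)('0' + v % 10);
        v /= 10;
    }
    return p + width;
}

/* Hypothetical helper: write v without leading zeros. */
static char *put_leading(char *p, uint32_t v)
{
    char tmp[10];
    int n = 0;

    if (v < 10) {               /* fast path for all one-digit numbers */
        *p++ = (char)('0' + v);
        return p;
    }
    while (v != 0) {            /* collect digits, least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    }
    while (n > 0)               /* emit them most significant first */
        *p++ = tmp[--n];
    return p;
}

/*
 * Convert n to decimal in buf (at least 21 bytes) and return the length.
 * Assumption: a 64-bit division by 1000000000 is acceptable, which is the
 * case the commit message describes for longs wider than 32 bits.
 */
size_t u64_to_dec(uint64_t n, char *buf)
{
    uint32_t groups[3];         /* a 64-bit value has at most 3 groups of 9 */
    int count = 0;
    char *p = buf;

    do {
        groups[count++] = (uint32_t)(n % 1000000000u);
        n /= 1000000000u;
    } while (n != 0);

    p = put_leading(p, groups[--count]);    /* top group: no padding */
    while (count > 0)
        p = put_group(p, groups[--count], 9);
    *p = '\0';
    return (size_t)(p - buf);
}
```

Each group below the top one is padded to exactly 9 digits, so 1000000000 comes out as "1" followed by nine zeros rather than "10".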
37 lines
939 B
C
#ifndef __ASM_GENERIC_BITS_PER_LONG
#define __ASM_GENERIC_BITS_PER_LONG

/*
 * There seems to be no way of detecting this automatically from user
 * space, so 64 bit architectures should override this in their
 * bitsperlong.h. In particular, an architecture that supports
 * both 32 and 64 bit user space must not rely on CONFIG_64BIT
 * to decide it, but rather check a compiler provided macro.
 */
#ifndef __BITS_PER_LONG
#define __BITS_PER_LONG 32
#endif

#ifdef __KERNEL__

#ifdef CONFIG_64BIT
#define BITS_PER_LONG 64
#else
#define BITS_PER_LONG 32
#endif /* CONFIG_64BIT */

/*
 * FIXME: The check currently breaks x86-64 build, so it's
 * temporarily disabled. Please fix x86-64 and reenable
 */
#if 0 && BITS_PER_LONG != __BITS_PER_LONG
#error Inconsistent word size. Check asm/bitsperlong.h
#endif

#ifndef BITS_PER_LONG_LONG
#define BITS_PER_LONG_LONG 64
#endif

#endif /* __KERNEL__ */
#endif /* __ASM_GENERIC_BITS_PER_LONG */
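
The header comment asks architectures to derive the user-space word size from a compiler-provided macro rather than from CONFIG_64BIT. One way that might look is sketched below. It is an illustration only, not taken from any architecture's bitsperlong.h: `MY_BITS_PER_LONG` and `bits_per_long` are hypothetical names, `__SIZEOF_LONG__` and `__LP64__` are assumed to be predefined as GCC and Clang do, and `_Static_assert` (C11) stands in for the disabled `#error` consistency check.

```c
#include <limits.h>

/* Hypothetical stand-in for an architecture's __BITS_PER_LONG override. */
#if defined(__SIZEOF_LONG__)
#define MY_BITS_PER_LONG (__SIZEOF_LONG__ * 8)  /* GCC/Clang predefine this */
#elif defined(__LP64__)
#define MY_BITS_PER_LONG 64
#else
#define MY_BITS_PER_LONG 32     /* same default as the generic header */
#endif

/* Compile-time consistency check, similar in spirit to the disabled #error. */
_Static_assert(MY_BITS_PER_LONG == sizeof(long) * CHAR_BIT,
               "Inconsistent word size");

/* Small accessor so the value can be inspected at run time. */
int bits_per_long(void)
{
    return MY_BITS_PER_LONG;
}
```

Because the check runs at compile time, a mismatch between the macro and the real width of `long` stops the build instead of surfacing later as a subtle bug.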