linux_dsm_epyc7002/include
Denys Vlasenko 133fd9f5cd vsprintf: further optimize decimal conversion
Previous code was using optimizations which were developed to work well
even on narrow-word CPUs (by today's standards).  But Linux runs only on
32-bit and wider CPUs.  We can use that.

First: using 32x32->64 multiply and trivial 32-bit shift, we can correctly
divide by 10 much larger numbers, and thus we can print groups of 9 digits
instead of groups of 5 digits.

Next: there are two algorithms to print larger numbers.  One is generic:
divide by 1000000000 and repeatedly print groups of (up to) 9 digits.
It's conceptually simple, but requires an (unsigned long long) /
1000000000 division.

Second algorithm splits 64-bit unsigned long long into 16-bit chunks,
manipulates them cleverly and generates groups of 4 decimal digits.  It so
happens that it does NOT require long long division.

If long is > 32 bits, division of 64-bit values is relatively easy, and we
will use the first algorithm.  If long long is > 64 bits (strange
architecture with VERY large long long), second algorithm can't be used,
and we again use the first one.

Else (if long is 32 bits and long long is 64 bits) we use second one.

And third: there is a simple optimization which takes fast path not only
for zero as was done before, but for all one-digit numbers.

In all tested cases new code is faster than old one, in many cases by 30%,
in few cases by more than 50% (for example, on x86-32, conversion of
12345678).  Code growth is ~0 in 32-bit case and ~130 bytes in 64-bit
case.

This patch is based upon an original from Michal Nazarewicz.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
..
acpi ACPI: Add _PLD support 2012-05-11 17:03:12 -07:00
asm-generic vsprintf: further optimize decimal conversion 2012-05-31 17:49:27 -07:00
crypto
drm introduce SIZE_MAX 2012-05-31 17:49:26 -07:00
keys KEYS: Permit in-place link replacement in keyring list 2012-05-11 10:56:56 +01:00
linux introduce SIZE_MAX 2012-05-31 17:49:26 -07:00
math-emu
media [media] patch for Asus My Cinema PS3-100 (1043:48cd) 2012-05-20 16:05:02 -03:00
memory
misc
mtd UBI: amend commentaries WRT dtype 2012-05-20 20:25:59 +03:00
net memcg: decrement static keys at real destroy time 2012-05-29 16:22:28 -07:00
pcmcia
rdma Merge branches 'core', 'cxgb4', 'ipath', 'iser', 'lockdep', 'mlx4', 'nes', 'ocrdma', 'qib' and 'raw-qp' into for-linus 2012-05-21 09:00:47 -07:00
rxrpc
scsi isci update for 3.5 2012-05-21 12:17:30 +01:00
sound ASoC: Last minute updates 2012-05-22 02:58:55 +02:00
target target: Add MI_REPORT_TARGET_PGS ext. header + implict_trans_secs attribute 2012-05-17 00:45:58 -07:00
trace mm: vmscan: remove reclaim_mode_t 2012-05-29 16:22:19 -07:00
video arm-soc: First batch of cleanups 2012-05-22 09:23:24 -07:00
xen xen: do not map the same GSI twice in PVHVM guests. 2012-05-21 14:11:36 -04:00
Kbuild