mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-04 17:06:48 +07:00
1ee3630a3e
ARCH_USE_BUILTIN_BSWAP will use __builtin_bswap16(), __builtin_bswap32() and __builtin_bswap64() where available. This allows better instruction scheduling. On pre-R2 processors, 32-bit and 64-bit swaps will instead be performed by calls to the __bswapsi2() and __bswapdi2() functions respectively, so we add these, too.

For a 4.2 kernel with GCC 4.9 this yields the following kernel sizes:

       text    data     bss      dec     hex  filename
    3996071  155804   88992  4240867  40b5e3  vmlinux  ip22 baseline
    3985687  159900   88992  4234579  409d53  vmlinux  ip22 + bswap patch
    6913157  378552  251024  7542733  7317cd  vmlinux  ip27 baseline
    6878581  378552  251024  7508157  7290bd  vmlinux  ip27 + bswap patch
    5773777  268752  187424  6229953  5f0fc1  vmlinux  malta baseline
    5773401  268752  187424  6229577  5f0e49  vmlinux  malta + bswap patch

Presumably the smaller code yields a better cache hit rate and thus better performance, compensating for the extra function call, but this still needs to be benchmarked.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
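The size table in the commit message can be reduced to per-board deltas. The C sketch below copies its figures from that table. The `size_row` struct and the `total_size()`, `dec_delta()` and `text_delta()` helpers are illustrative names only and are not part of the kernel.

```c
#include <stddef.h>

/* One row of the `size vmlinux` output: section sizes in bytes. */
struct size_row {
	const char *board;
	long text;
	long data;
	long bss;
};

/* Figures copied from the commit message's size table (baseline rows). */
static const struct size_row baseline[] = {
	{ "ip22",  3996071, 155804,  88992 },
	{ "ip27",  6913157, 378552, 251024 },
	{ "malta", 5773777, 268752, 187424 },
};

/* Same boards with the bswap patch applied. */
static const struct size_row patched[] = {
	{ "ip22",  3985687, 159900,  88992 },
	{ "ip27",  6878581, 378552, 251024 },
	{ "malta", 5773401, 268752, 187424 },
};

/* The "dec" column is the sum of text, data and bss. */
long total_size(const struct size_row *r)
{
	return r->text + r->data + r->bss;
}

/* Change in total image size for board i; negative means smaller. */
long dec_delta(size_t i)
{
	return total_size(&patched[i]) - total_size(&baseline[i]);
}

/* Change in the text (code) section alone for board i. */
long text_delta(size_t i)
{
	return patched[i].text - baseline[i].text;
}
```

The results differ by board:

- **ip22:** shrinks by 6288 bytes overall. Text goes down by 10384 bytes while data goes up by 4096 bytes.
- **ip27:** shrinks by 34576 bytes, all of it in text.
- **malta:** shrinks by 376 bytes, all of it in text.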
ashldi3.c
ashrdi3.c
bitops.c
bswapdi.c
bswapsi.c
cmpdi2.c
csum_partial.S
delay.c
dump_tlb.c
iomap-pci.c
iomap.c
libgcc.h
lshrdi3.c
Makefile
memcpy.S
memset.S
mips-atomic.c
r3k_dump_tlb.c
strlen_user.S
strncpy_user.S
strnlen_user.S
ucmpdi2.c
uncached.c
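The commit message describes __bswapsi2() and __bswapdi2() as the out-of-line helpers used for 32-bit and 64-bit byte swaps on pre-R2 processors, and the listing includes bswapsi.c and bswapdi.c. The following is a minimal sketch of what such helpers compute, assuming a plain shift-and-mask byte reversal. It is not the kernel's code. The names `sketch_bswapsi2()` and `sketch_bswapdi2()` are hypothetical stand-ins, chosen so the example does not clash with libgcc's own `__bswapsi2()`/`__bswapdi2()` symbols.

```c
#include <stdint.h>

/* Hypothetical stand-in for __bswapsi2(): reverse the 4 bytes of a
 * 32-bit value by masking each byte and shifting it to its mirror slot. */
uint32_t sketch_bswapsi2(uint32_t u)
{
	return ((u & 0x000000ffu) << 24) |
	       ((u & 0x0000ff00u) <<  8) |
	       ((u & 0x00ff0000u) >>  8) |
	       ((u & 0xff000000u) >> 24);
}

/* Hypothetical stand-in for __bswapdi2(): the same idea for 8 bytes. */
uint64_t sketch_bswapdi2(uint64_t u)
{
	return ((u & 0x00000000000000ffull) << 56) |
	       ((u & 0x000000000000ff00ull) << 40) |
	       ((u & 0x0000000000ff0000ull) << 24) |
	       ((u & 0x00000000ff000000ull) <<  8) |
	       ((u & 0x000000ff00000000ull) >>  8) |
	       ((u & 0x0000ff0000000000ull) >> 24) |
	       ((u & 0x00ff000000000000ull) >> 40) |
	       ((u & 0xff00000000000000ull) >> 56);
}
```

GCC and Clang provide `__builtin_bswap32()` and `__builtin_bswap64()`, and both sketches should agree with them. For example, 0x11223344 becomes 0x44332211. On a target without a byte-swap instruction, the compiler may lower those builtins to calls into helpers of this kind. That is the extra function call the commit message weighs against the code-size savings.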