linux_dsm_epyc7002/arch
Ard Biesheuvel efdb25efc7 arm64/lib: improve CRC32 performance for deep pipelines
Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small sized loads on the common path.

Instead, use a branchless code path involving overlapping 16 byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.
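
The split described above can be sketched in portable C. This is only an illustration of the control flow (head of length % 32 bytes first, then 32 bytes per iteration); it does not reproduce the overlapping 16 byte loads or the CRC32 instructions used by the actual asm routine. The names crc32_le_bytes and crc32_le_split are hypothetical, and the bitwise CRC is an assumed stand-in for the hardware instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reference: bitwise CRC-32 (reflected polynomial 0xEDB88320), one byte
 * at a time. Like the kernel's crc32_le(), it applies no initial or
 * final inversion; the caller supplies the seed.
 */
static uint32_t crc32_le_bytes(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Hypothetical helper: same result as crc32_le_bytes(), but structured
 * the way the commit message describes the new code path.
 */
static uint32_t crc32_le_split(uint32_t crc, const unsigned char *p, size_t len)
{
    size_t head = len % 32;     /* leading 0..31 bytes */

    /* The asm handles this part without branches; here it is a plain call. */
    crc = crc32_le_bytes(crc, p, head);
    p += head;
    len -= head;

    while (len) {               /* len is now a multiple of 32 */
        crc = crc32_le_bytes(crc, p, 32);
        p += 32;
        len -= 32;
    }
    return crc;
}

/* Self-check: the split version must agree with the byte-wise reference. */
static void crc32_split_self_check(void)
{
    static const unsigned char msg[] = "123456789";

    assert(crc32_le_split(0, msg, 9) == crc32_le_bytes(0, msg, 9));
}
```

Handling the odd-sized head first means the main loop never needs a tail check, which is what lets the real routine keep the common path free of small loads.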

Tested using the following test program:

  #include <stdlib.h>

  extern unsigned int crc32_le(unsigned int, unsigned char const *, size_t);

  int main(void)
  {
    static const unsigned char buf[4096];

    srand(20181126);

    for (int i = 0; i < 100 * 1000 * 1000; i++)
      crc32_le(0, buf, rand() % 1024);

    return 0;
  }

On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly. On Cortex-A72 however, the performance improves from

  $ time ./crc32

  real  0m10.149s
  user  0m10.149s
  sys   0m0.000s

to

  $ time ./crc32

  real  0m7.915s
  user  0m7.915s
  sys   0m0.000s

Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:58:04 +00:00
alpha TTY/Serial fixes for 4.20-rc2 2018-11-10 13:32:14 -06:00
arc mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
arm Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-11-18 10:45:09 -08:00
arm64 arm64/lib: improve CRC32 performance for deep pipelines 2018-11-30 13:58:04 +00:00
c6x c6x changes for 4.20 2018-10-31 15:39:25 -07:00
csky csky: dtb Kbuild fixup patches for linux-4.20-rc1 2018-11-01 09:04:30 -07:00
h8300 mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
hexagon mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
ia64 memblock: stop using implicit alignment to SMP_CACHE_BYTES 2018-10-31 08:54:16 -07:00
m68k s390 updates for 4.20-rc2 2018-11-09 06:30:44 -06:00
microblaze s390 updates for 4.20-rc2 2018-11-09 06:30:44 -06:00
mips MIPS: Fix `dma_alloc_coherent' returning a non-coherent allocation 2018-11-05 10:08:13 -08:00
nds32 s390 updates for 4.20-rc2 2018-11-09 06:30:44 -06:00
nios2 mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
openrisc mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
parisc Merge branch 'parisc-4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2018-11-14 13:42:41 -06:00
powerpc powerpc/64: Fix kernel stack 16-byte alignment 2018-11-15 14:48:43 +11:00
riscv RISC-V: Silence some module warnings on 32-bit 2018-11-12 18:12:24 -08:00
s390 s390 updates for 4.20-rc2 2018-11-09 06:30:44 -06:00
sh mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sparc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc 2018-11-01 09:07:04 -07:00
um for-linus-20181109 2018-11-09 16:31:51 -06:00
unicore32 memblock: stop using implicit alignment to SMP_CACHE_BYTES 2018-10-31 08:54:16 -07:00
x86 perf/x86/intel/uncore: Support CoffeeLake 8th CBOX 2018-11-12 05:03:26 +01:00
xtensa Xtensa fixes for v4.20-rc3 2018-11-16 10:10:27 -06:00
.gitignore
Kconfig New gcc plugin: stackleak 2018-11-01 11:46:27 -07:00