linux_dsm_epyc7002/arch/x86/crypto
Ard Biesheuvel bf93113d46 crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
[ Upstream commit 86ad60a65f29dd862a11c22bb4b5be28d6c5cef1 ]

The XTS asm helper arrangement is a bit odd: the 8-way stride helper
consists of back-to-back calls to the 4-way core transforms, which
are called indirectly, based on a boolean that indicates whether we
are performing encryption or decryption.
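A minimal sketch of the old dispatch shape, assuming toy stand-ins. The names `toy_enc4`, `toy_dec4`, `crypt8_old` and `crypt_old` are hypothetical, and the "transform" is a plain XOR, not AES-XTS (no tweak is modelled). It only shows an 8-way helper built from two back-to-back 4-way calls, with the core picked through a function pointer based on a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLK 16 /* AES block size in bytes */

/* Hypothetical placeholders for the 4-way core transforms (four blocks,
 * 64 bytes per call). NOT AES-XTS: both just XOR with a constant so that
 * only the call structure is illustrated. */
static void toy_enc4(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < 4 * BLK; i++)
		dst[i] = src[i] ^ 0x5a;
}

static void toy_dec4(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < 4 * BLK; i++)
		dst[i] = src[i] ^ 0x5a;
}

/* Old shape (sketch): the 8-way helper is two back-to-back 4-way calls,
 * and the core is chosen at run time from a boolean, so both calls are
 * made through a function pointer. */
static void crypt8_old(uint8_t *dst, const uint8_t *src, bool enc)
{
	void (*core4)(uint8_t *, const uint8_t *) = enc ? toy_enc4 : toy_dec4;

	core4(dst, src);
	core4(dst + 4 * BLK, src + 4 * BLK);
}

/* Caller walks the buffer in 8-block (128-byte) strides. */
static void crypt_old(uint8_t *dst, const uint8_t *src, size_t nbytes, bool enc)
{
	assert(nbytes % (8 * BLK) == 0);
	for (size_t off = 0; off < nbytes; off += 8 * BLK)
		crypt8_old(dst + off, src + off, enc);
}
```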

Given how costly indirect calls are on x86, let's switch to direct
calls, and given how the 8-way stride doesn't really add anything
substantial, use a 4-way stride instead, and make the asm core
routine deal with any multiple of 4 blocks. Since 512-byte sectors
or 4 KB blocks are the typical quantities XTS operates on, increase
the stride exported to the glue helper to 512 bytes as well.
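The new arrangement can be sketched as follows, again under stated assumptions. All names here are hypothetical: `toy_step4`, `core_enc`, `core_dec`, `glue_crypt`, `glue_calls` and `GLUE_STRIDE`. The transform is an XOR placeholder, not AES-XTS. Inputs are assumed to be a multiple of 4 blocks, so partial-block handling is not modelled.

Each core entry point loops over any multiple of 4 blocks with a direct call and no boolean. The glue side hands it at most 512 bytes per iteration, which leaves one call through a pointer per chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK 16          /* AES block size in bytes */
#define GLUE_STRIDE 512 /* bytes handed to the core per glue iteration */

/* Hypothetical placeholder for one 4-block step; NOT AES-XTS. */
static void toy_step4(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < 4 * BLK; i++)
		dst[i] = src[i] ^ 0x5a;
}

/* Separate encrypt/decrypt entry points: each accepts any multiple of
 * 4 blocks and calls its step directly (no function pointer). */
static void core_enc(uint8_t *dst, const uint8_t *src, size_t nbytes)
{
	assert(nbytes % (4 * BLK) == 0);
	for (size_t off = 0; off < nbytes; off += 4 * BLK)
		toy_step4(dst + off, src + off);
}

static void core_dec(uint8_t *dst, const uint8_t *src, size_t nbytes)
{
	assert(nbytes % (4 * BLK) == 0);
	for (size_t off = 0; off < nbytes; off += 4 * BLK)
		toy_step4(dst + off, src + off);
}

typedef void (*core_fn)(uint8_t *, const uint8_t *, size_t);

static unsigned long glue_calls; /* calls made through fn below */

/* Glue side (sketch): one call through the pointer per chunk of at most
 * GLUE_STRIDE bytes. */
static void glue_crypt(core_fn fn, uint8_t *dst, const uint8_t *src,
		       size_t nbytes)
{
	while (nbytes > 0) {
		size_t n = nbytes < GLUE_STRIDE ? nbytes : GLUE_STRIDE;

		fn(dst, src, n);
		glue_calls++;
		dst += n;
		src += n;
		nbytes -= n;
	}
}
```

With a 512-byte stride, each glue iteration covers 32 blocks, which is eight 4-block steps inside the core.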

As a result, the number of indirect calls is reduced from 3 per 64 bytes
of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
when operating on 1 KB blocks (measured on an Intel(R) Core(TM) i7-8650U CPU).
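To make the call-count arithmetic concrete, here is a small sketch that applies the two quoted rates: 3 indirect calls per 64 bytes before the change and 1 per 512 bytes after. The helper names are hypothetical. Sizes are assumed to be a multiple of 512 bytes so both counts come out exact.

```c
#include <assert.h>
#include <stddef.h>

/* Rates taken from the text above; nbytes must be a multiple of 512. */
static size_t old_indirect_calls(size_t nbytes)
{
	assert(nbytes % 512 == 0);
	return 3 * (nbytes / 64); /* 3 indirect calls per 64 bytes */
}

static size_t new_indirect_calls(size_t nbytes)
{
	assert(nbytes % 512 == 0);
	return nbytes / 512; /* 1 indirect call per 512 bytes */
}
```

For the 1 KB case this gives 48 indirect calls before and 2 after, a 24-fold reduction. The 65% figure is a measured throughput result and does not follow from the call count alone.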

Fixes: 9697fa39ef ("x86/retpoline/crypto: Convert crypto assembler indirect jumps")
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-20 10:43:43 +01:00
.gitignore
aegis128-aesni-asm.S
aegis128-aesni-glue.c
aes_ctrby8_avx-x86_64.S crypto: x86 - Remove include/asm/inst.h 2020-07-16 21:49:07 +10:00
aes_glue.c
aesni-intel_asm.S crypto: x86/aes-ni-xts - use direct calls to and 4-way stride 2021-03-20 10:43:43 +01:00
aesni-intel_avx-x86_64.S crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg 2021-03-20 10:43:43 +01:00
aesni-intel_glue.c crypto: x86/aes-ni-xts - use direct calls to and 4-way stride 2021-03-20 10:43:43 +01:00
blake2s-core.S
blake2s-glue.c crypto: algapi - Remove skbuff.h inclusion 2020-08-20 14:04:28 +10:00
blowfish_glue.c
blowfish-x86_64-asm_64.S
camellia_aesni_avx2_glue.c
camellia_aesni_avx_glue.c
camellia_glue.c
camellia-aesni-avx2-asm_64.S
camellia-aesni-avx-asm_64.S
camellia-x86_64-asm_64.S
cast5_avx_glue.c
cast5-avx-x86_64-asm_64.S
cast6_avx_glue.c
cast6-avx-x86_64-asm_64.S
chacha_glue.c crypto: algapi - Remove skbuff.h inclusion 2020-08-20 14:04:28 +10:00
chacha-avx2-x86_64.S
chacha-avx512vl-x86_64.S
chacha-ssse3-x86_64.S crypto: x86/chacha-sse3 - use unaligned loads for state array 2020-07-16 21:49:04 +10:00
crc32-pclmul_asm.S crypto: x86 - Remove include/asm/inst.h 2020-07-16 21:49:07 +10:00
crc32-pclmul_glue.c
crc32c-intel_glue.c crypto: x86/crc32c-intel - Use CRC32 mnemonic 2020-08-21 14:45:28 +10:00
crc32c-pcl-intel-asm_64.S crypto: x86/crc32c - fix building with clang ias 2020-07-23 17:34:16 +10:00
crct10dif-pcl-asm_64.S
crct10dif-pclmul_glue.c
curve25519-x86_64.c crypto: curve25519-x86_64 - Use XORL r32,32 2020-09-11 14:39:13 +10:00
des3_ede_glue.c
des3_ede-asm_64.S
ghash-clmulni-intel_asm.S crypto: x86 - Remove include/asm/inst.h 2020-07-16 21:49:07 +10:00
ghash-clmulni-intel_glue.c
glue_helper-asm-avx2.S
glue_helper-asm-avx.S
glue_helper.c
Makefile
nh-avx2-x86_64.S
nh-sse2-x86_64.S
nhpoly1305-avx2-glue.c crypto: algapi - Remove skbuff.h inclusion 2020-08-20 14:04:28 +10:00
nhpoly1305-sse2-glue.c crypto: algapi - Remove skbuff.h inclusion 2020-08-20 14:04:28 +10:00
poly1305_glue.c crypto: x86/poly1305 - add back a needed assignment 2020-10-24 09:38:32 +11:00
poly1305-x86_64-cryptogams.pl crypto: poly1305-x86_64 - Use XORL r32,32 2020-09-11 14:39:13 +10:00
serpent_avx2_glue.c
serpent_avx_glue.c
serpent_sse2_glue.c
serpent-avx2-asm_64.S
serpent-avx-x86_64-asm_64.S
serpent-sse2-i586-asm_32.S
serpent-sse2-x86_64-asm_64.S
sha1_avx2_x86_64_asm.S
sha1_ni_asm.S
sha1_ssse3_asm.S
sha1_ssse3_glue.c
sha256_ni_asm.S
sha256_ssse3_glue.c
sha256-avx2-asm.S
sha256-avx-asm.S
sha256-ssse3-asm.S
sha512_ssse3_glue.c
sha512-avx2-asm.S
sha512-avx-asm.S
sha512-ssse3-asm.S
twofish_avx_glue.c
twofish_glue_3way.c
twofish_glue.c
twofish-avx-x86_64-asm_64.S
twofish-i586-asm_32.S
twofish-x86_64-asm_64-3way.S
twofish-x86_64-asm_64.S