linux_dsm_epyc7002/arch/x86/crypto
Jussi Kivilinna 0b95ec56ae crypto: camellia - add assembler implementation for x86_64
Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
functions are provided. First set is regular 'one-block at time' encrypt/decrypt
functions. Second is 'two-block at time' functions that gain performance increase
on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way
functions with in-order CPUs.

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results:

AMD Phenom II 1055T (fam:16, model:10):

camellia-asm vs camellia_generic:
128bit key:                                             (lrw:256bit)    (xts:256bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     1.27x   1.22x   1.30x   1.42x   1.30x   1.34x   1.19x   1.05x   1.23x   1.24x
64B     1.74x   1.79x   1.43x   1.87x   1.81x   1.87x   1.48x   1.38x   1.55x   1.62x
256B    1.90x   1.87x   1.43x   1.94x   1.94x   1.95x   1.63x   1.62x   1.67x   1.70x
1024B   1.96x   1.93x   1.43x   1.95x   1.98x   2.01x   1.67x   1.69x   1.74x   1.80x
8192B   1.96x   1.96x   1.39x   1.93x   2.01x   2.03x   1.72x   1.64x   1.71x   1.76x

256bit key:                                             (lrw:384bit)    (xts:512bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     1.23x   1.23x   1.33x   1.39x   1.34x   1.38x   1.04x   1.18x   1.21x   1.29x
64B     1.72x   1.69x   1.42x   1.78x   1.81x   1.89x   1.57x   1.52x   1.56x   1.65x
256B    1.85x   1.88x   1.42x   1.86x   1.93x   1.96x   1.69x   1.65x   1.70x   1.75x
1024B   1.88x   1.86x   1.45x   1.95x   1.96x   1.95x   1.77x   1.71x   1.77x   1.78x
8192B   1.91x   1.86x   1.42x   1.91x   2.03x   1.98x   1.73x   1.71x   1.78x   1.76x

camellia-asm vs aes-asm (8kB block):
         128bit  256bit
ecb-enc  1.15x   1.22x
ecb-dec  1.16x   1.16x
cbc-enc  0.85x   0.90x
cbc-dec  1.20x   1.23x
ctr-enc  1.28x   1.30x
ctr-dec  1.27x   1.28x
lrw-enc  1.12x   1.16x
lrw-dec  1.08x   1.10x
xts-enc  1.11x   1.15x
xts-dec  1.14x   1.15x

Intel Core2 T8100 (fam:6, model:23, step:6):

camellia-asm vs camellia_generic:
128bit key:                                             (lrw:256bit)    (xts:256bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     1.10x   1.12x   1.14x   1.16x   1.16x   1.15x   1.02x   1.02x   1.08x   1.08x
64B     1.61x   1.60x   1.17x   1.68x   1.67x   1.66x   1.43x   1.42x   1.44x   1.42x
256B    1.65x   1.73x   1.17x   1.77x   1.81x   1.80x   1.54x   1.53x   1.58x   1.54x
1024B   1.76x   1.74x   1.18x   1.80x   1.85x   1.85x   1.60x   1.59x   1.65x   1.60x
8192B   1.77x   1.75x   1.19x   1.81x   1.85x   1.86x   1.63x   1.61x   1.66x   1.62x

256bit key:                                             (lrw:384bit)    (xts:512bit)
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B     1.10x   1.07x   1.13x   1.16x   1.11x   1.16x   1.03x   1.02x   1.08x   1.07x
64B     1.61x   1.62x   1.15x   1.66x   1.63x   1.68x   1.47x   1.46x   1.47x   1.44x
256B    1.71x   1.70x   1.16x   1.75x   1.69x   1.79x   1.58x   1.57x   1.59x   1.55x
1024B   1.78x   1.72x   1.17x   1.75x   1.80x   1.80x   1.63x   1.62x   1.65x   1.62x
8192B   1.76x   1.73x   1.17x   1.78x   1.80x   1.81x   1.64x   1.62x   1.68x   1.64x

camellia-asm vs aes-asm (8kB block):
         128bit  256bit
ecb-enc  1.17x   1.21x
ecb-dec  1.17x   1.20x
cbc-enc  0.80x   0.82x
cbc-dec  1.22x   1.24x
ctr-enc  1.25x   1.26x
ctr-dec  1.25x   1.26x
lrw-enc  1.14x   1.18x
lrw-dec  1.13x   1.17x
xts-enc  1.14x   1.18x
xts-dec  1.14x   1.17x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-14 17:25:56 +08:00
..
aes_glue.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
aes-i586-asm_32.S crypto: aes - Export x86 AES encrypt/decrypt functions 2009-02-18 16:48:05 +08:00
aes-x86_64-asm_64.S crypto: aes - Export x86 AES encrypt/decrypt functions 2009-02-18 16:48:05 +08:00
aesni-intel_asm.S crypto: aesni-intel - fixed problem with packets that are not multiple of 64bytes 2011-03-27 10:29:39 +08:00
aesni-intel_glue.c x86: fix up files really needing to include module.h 2011-10-31 19:30:36 -04:00
blowfish_glue.c crypto: blowfish-x86_64 - set alignmask to zero 2012-02-25 17:20:24 +08:00
blowfish-x86_64-asm_64.S crypto: blowfish-x86_64 - improve x86_64 blowfish 4-way performance 2011-10-21 14:23:07 +02:00
camellia_glue.c crypto: camellia - add assembler implementation for x86_64 2012-03-14 17:25:56 +08:00
camellia-x86_64-asm_64.S crypto: camellia - add assembler implementation for x86_64 2012-03-14 17:25:56 +08:00
crc32c-intel.c crypto: crc32c-intel - Switch to shash 2008-12-25 11:01:37 +11:00
fpu.c crypto: aesni-intel - Merge with fpu.ko 2011-05-16 15:12:47 +10:00
ghash-clmulni-intel_asm.S crypto: ghash-clmulni-intel - Put proper .data section in place 2009-11-23 20:19:47 +08:00
ghash-clmulni-intel_glue.c crypto: ghash-intel - Fix set but not used in ghash_async_setkey() 2011-06-30 07:43:42 +08:00
Makefile crypto: camellia - add assembler implementation for x86_64 2012-03-14 17:25:56 +08:00
salsa20_glue.c [CRYPTO] salsa20: Add x86-64 assembly version 2008-01-11 08:16:57 +11:00
salsa20-i586-asm_32.S [CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version) 2008-01-11 08:16:57 +11:00
salsa20-x86_64-asm_64.S [CRYPTO] salsa20: Add x86-64 assembly version 2008-01-11 08:16:57 +11:00
serpent_sse2_glue.c crypto: serpent-sse2 - combine ablk_*_init functions 2012-02-25 17:20:23 +08:00
serpent-sse2-i586-asm_32.S crypto: serpent-sse2 - change transpose_4x4 to only use integer instructions 2012-01-13 16:38:40 +11:00
serpent-sse2-x86_64-asm_64.S crypto: serpent-sse2 - change transpose_4x4 to only use integer instructions 2012-01-13 16:38:40 +11:00
sha1_ssse3_asm.S crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 2011-08-10 19:00:29 +08:00
sha1_ssse3_glue.c crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 2011-08-10 19:00:29 +08:00
twofish_glue_3way.c crypto: twofish-x86_64-3way - use crypto_[un]register_algs 2012-02-25 17:20:22 +08:00
twofish_glue.c crypto: twofish-x86_64/i586 - set alignmask to zero 2012-02-25 17:20:24 +08:00
twofish-i586-asm_32.S crypto: twofish-x86-asm - make assembler functions use twofish_ctx instead of crypto_tfm 2011-10-21 14:23:08 +02:00
twofish-x86_64-asm_64-3way.S crypto: twofish - add 3-way parallel x86_64 assembler implemention 2011-10-21 14:23:08 +02:00
twofish-x86_64-asm_64.S crypto: twofish-x86-asm - make assembler functions use twofish_ctx instead of crypto_tfm 2011-10-21 14:23:08 +02:00