Patch e68410ebf6 ("crypto: x86/sha512_ssse3 - move SHA-384/512
SSSE3 implementation to base layer") changed the prototypes of the
core asm SHA-512 implementations so that they are compatible with
the prototype used by the base layer.
However, in one instance, the register that was used for passing the
input buffer was reused as a scratch register later on in the code,
and since the input buffer param changed places with the digest param
-which needs to be written back before the function returns- this
resulted in the scratch register to be dereferenced in a memory write
operation, causing a GPF.
Fix this by changing the scratch register to use the same register as
the input buffer param again.
Fixes: e68410ebf6 ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer")
Reported-By: Bobby Powers <bobbypowers@gmail.com>
Tested-By: Bobby Powers <bobbypowers@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto update from Herbert Xu:
"Here is the crypto update for 4.1:
New interfaces:
- user-space interface for AEAD
- user-space interface for RNG (i.e., pseudo RNG)
New hashes:
- ARMv8 SHA1/256
- ARMv8 AES
- ARMv8 GHASH
- ARM assembler and NEON SHA256
- MIPS OCTEON SHA1/256/512
- MIPS img-hash SHA1/256 and MD5
- Power 8 VMX AES/CBC/CTR/GHASH
- PPC assembler AES, SHA1/256 and MD5
- Broadcom IPROC RNG driver
Cleanups/fixes:
- prevent internal helper algos from being exposed to user-space
- merge common code from assembly/C SHA implementations
- misc fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (169 commits)
crypto: arm - workaround for building with old binutils
crypto: arm/sha256 - avoid sha256 code on ARMv7-M
crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer
crypto: x86/sha256_ssse3 - move SHA-224/256 SSSE3 implementation to base layer
crypto: x86/sha1_ssse3 - move SHA-1 SSSE3 implementation to base layer
crypto: arm64/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer
crypto: arm64/sha1-ce - move SHA-1 ARMv8 implementation to base layer
crypto: arm/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer
crypto: arm/sha256 - move SHA-224/256 ASM/NEON implementation to base layer
crypto: arm/sha1-ce - move SHA-1 ARMv8 implementation to base layer
crypto: arm/sha1_neon - move SHA-1 NEON implementation to base layer
crypto: arm/sha1 - move SHA-1 ARM asm implementation to base layer
crypto: sha512-generic - move to generic glue implementation
crypto: sha256-generic - move to generic glue implementation
crypto: sha1-generic - move to generic glue implementation
crypto: sha512 - implement base layer for SHA-512
crypto: sha256 - implement base layer for SHA-256
crypto: sha1 - implement base layer for SHA-1
crypto: api - remove instance when test failed
crypto: api - Move alg ref count init to crypto_check_alg
...
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer. It also changes the
prototypes of the core asm functions to be compatible with the base
prototype
void (sha512_block_fn)(struct sha256_state *sst, u8 const *src, int blocks)
so that they can be passed to the base layer directly.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer. It also changes the
prototypes of the core asm functions to be compatible with the base
prototype
void (sha256_block_fn)(struct sha256_state *sst, u8 const *src, int blocks)
so that they can be passed to the base layer directly.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There is no reason to use MOVQ to load a non-negative immediate
constant value into a 64-bit register. MOVL does the same, since
the upper 32 bits are zero-extended by the CPU.
This makes the code a bit smaller, while leaving functionality
unchanged.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-8-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Flag all Multi buffer SHA1 helper ciphers as internal ciphers
to prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all Twofish AVX helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all Serpent SSE2 helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all Serpent AVX helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all Serpent AVX2 helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all CAST6 helper ciphers as internal ciphers to prevent them
from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all AVX Camellia helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all CAST5 helper ciphers as internal ciphers to prevent them
from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all AES-NI Camellia helper ciphers as internal ciphers to
prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all ash clmulni helper ciphers as internal ciphers to prevent them
from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all AES-NI helper ciphers as internal ciphers to prevent them from
being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
type T;
identifier f;
@@
static T f (...) { ... }
@@
identifier r.f;
declarer name EXPORT_SYMBOL_GPL;
@@
-EXPORT_SYMBOL_GPL(f);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.
The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.
In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.
Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.
Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.
[1] http://www.chronox.de/libkcapi.html
CC: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Changed the __driver-gcm-aes-aesni to be a proper aead algorithm.
This required a valid setkey and setauthsize functions to be added and also
some changes to make sure that math context is not corrupted when the alg is
used directly.
Note that the __driver-gcm-aes-aesni should not be used directly by modules
that can use it in interrupt context as we don't have a good fallback mechanism
in this case.
Signed-off-by: Adrian Hoban <adrian.hoban@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
this patch fixes following sparse warning:
sha1_mb_mgr_init_avx2.c:59:31: warning: constant 0xF76543210 is so big it is long
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto update from Herbert Xu:
"Here is the crypto update for 3.20:
- Added 192/256-bit key support to aesni GCM.
- Added MIPS OCTEON MD5 support.
- Fixed hwrng starvation and race conditions.
- Added note that memzero_explicit is not a subsitute for memset.
- Added user-space interface for crypto_rng.
- Misc fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
crypto: tcrypt - do not allocate iv on stack for aead speed tests
crypto: testmgr - limit IV copy length in aead tests
crypto: tcrypt - fix buflen reminder calculation
crypto: testmgr - mark rfc4106(gcm(aes)) as fips_allowed
crypto: caam - fix resource clean-up on error path for caam_jr_init
crypto: caam - pair irq map and dispose in the same function
crypto: ccp - terminate ccp_support array with empty element
crypto: caam - remove unused local variable
crypto: caam - remove dead code
crypto: caam - don't emit ICV check failures to dmesg
hwrng: virtio - drop extra empty line
crypto: replace scatterwalk_sg_next with sg_next
crypto: atmel - Free memory in error path
crypto: doc - remove colons in comments
crypto: seqiv - Ensure that IV size is at least 8 bytes
crypto: cts - Weed out non-CBC algorithms
MAINTAINERS: add linux-crypto to hw random
crypto: cts - Remove bogus use of seqiv
crypto: qat - don't need qat_auth_state struct
crypto: algif_rng - fix sparse non static symbol warning
...
These patches fix the RFC4106 implementation in the aesni-intel
module so it supports 192 & 256 bit keys.
Since the AVX support that was added to this module also only
supports 128 bit keys, and this patch only affects the SSE
implementation, changes were also made to use the SSE version
if key sizes other than 128 are specified.
RFC4106 specifies that 192 & 256 bit keys must be supported (section
8.4).
Also, this should fix Strongswan issue 341 where the aesni module
needs to be unloaded if 256 bit keys are used:
http://wiki.strongswan.org/issues/341
This patch has been tested with Sandy Bridge and Haswell processors.
With 128 bit keys and input buffers > 512 bytes a slight performance
degradation was noticed (~1%). For input buffers of less than 512
bytes there was no performance impact. Compared to 128 bit keys,
256 bit key size performance is approx. .5 cycles per byte slower
on Sandy Bridge, and .37 cycles per byte slower on Haswell (vs.
SSE code).
This patch has also been tested with StrongSwan IPSec connections
where it worked correctly.
I created this diff from a git clone of crypto-2.6.git.
Any questions, please feel free to contact me.
Signed-off-by: Timothy McCaffrey <timothy.mccaffrey@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This module implements variations of "des3_ede" only. Drop the bogus
module aliases for "des".
Cc: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with "crypto-". This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.
Even though commit 5d26a105b5 added those aliases for a vast amount of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.
Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.
Fixes: 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes this allyesconfig target build error with older
binutils.
LD arch/x86/crypto/built-in.o
ld: arch/x86/crypto/sha-mb/built-in.o: No such file: No such file or directory
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Vinson Lee <vlee@twitter.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The "by8" counter mode optimization is broken for 128 bit keys with
input data longer than 128 bytes. It uses the wrong key material for
en- and decryption.
The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
in case we're handling more than 128 bytes of input data -- they won't
get reloaded after the initial load. They must therefore be (a) loaded
on the first iteration and (b) be preserved for the latter ones. The
implementation for 128 bit keys does not comply with (a) nor (b).
Fix this by bringing the implementation back to its original source
and correctly load the key registers and preserve their values by
*not* re-using the registers for other purposes.
Kudos to James for reporting the issue and providing a test case
showing the discrepancies.
Reported-by: James Yonan <james@openvpn.net>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Memset on a local variable may be removed when it is called just before the
variable goes out of scope. Using memzero_explicit defeats this
optimization. A simplified version of the semantic patch that makes this
change is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
type T;
@@
{
... when any
T x[...];
... when any
when exists
- memset
+ memzero_explicit
(x,
-0,
...)
... when != x
when strict
}
// </smpl>
This change was suggested by Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This adds the module loading prefix "crypto-" to the template lookup
as well.
For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
includes the "crypto-" prefix at every level, correctly rejecting "vfat":
net-pf-38
algif-hash
crypto-vfat(blowfish)
crypto-vfat(blowfish)-all
crypto-vfat
Reported-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This can't be NULL and we dereferenced it earlier. Smatch used to
ignore these things where the pointer was obviously non-NULL but I've
found that sometimes the intention was to check something else so we
were maybe missing bugs.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This prefixes all crypto module loading with "crypto-" so we never run
the risk of exposing module auto-loading to userspace via a crypto API,
as demonstrated by Mathias Krause:
https://lkml.org/lkml/2013/3/4/70
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CPP identifier 'HAS_PCBC' is defined when the Kconfig
option CRYPTO_PCBC is set as 'y' or 'm', and is further
used in two ifdef blocks to conditionally compile source
code. This indirection hides the actual Kconfig dependency
and complicates readability. Moreover, it's inconsistent
with the rest of the ifdef blocks in the file, which
directly reference Kconfig options.
This patch removes 'HAS_PCBC' and replaces its occurrences
with the actual dependency on 'CRYPTO_PCBC' being set as
'y' or 'm'.
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commit 7da4b29d49.
Now, that the issue is fixed, we can re-enable the code.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The defines for xkey3, xkey6 and xkey9 are not used in the code. They're
probably left overs from merging the three source files for 128, 192 and
256 bit AES. They can safely be removed.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The "by8" CTR AVX implementation fails to propperly handle counter
overflows. That was the reason it got disabled in commit 7da4b29d49
("crypto: aesni - disable "by8" AVX CTR optimization").
Fix the overflow handling by incrementing the counter block as a double
quad word, i.e. a 128 bit, and testing for overflows afterwards. We need
to use VPTEST to do so as VPADD* does not set the flags itself and
silently drops the carry bit.
As this change adds branches to the hot path, minor performance
regressions might be a side effect. But, OTOH, we now have a conforming
implementation -- the preferable goal.
A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical
numbers for the old and this version with differences within the noise
range. A dm-crypt test with the fixed version gave even slightly better
results for this version. So the performance impact might not be as big
as expected.
Tested-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The "by8" implementation introduced in commit 22cddcc7df ("crypto: aes
- AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it
handles counter block overflows differently. It only accounts the right
most 32 bit as a counter -- not the whole block as all other
implementations do. This makes it fail the cryptomgr test #4 that
specifically tests this corner case.
As we're quite late in the release cycle, just disable the "by8" variant
for now.
Reported-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several SHA1 jobs to the
multi-buffer algorithm. It also contains the flush routine to that's
called by the crypto daemon to complete the job when no new jobs arrive
before the deadline of maximum latency of a SHA1 crypto job.
The SHA1 multi-buffer crypto algorithm is defined and initialized in
this patch.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch introduces the assembly routines to do SHA1 computation on
buffers belonging to serveral jobs at once. The assembly routines are
optimized with AVX2 instructions that have 8 data lanes and using AVX2
registers.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch introduces the routines used to submit and flush buffers
belonging to SHA1 crypto jobs to the SHA1 multibuffer algorithm. It is
implemented mostly in assembly optimized with AVX2 instructions.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch introduces the data structures and prototypes of functions
needed for computing SHA1 hash using multi-buffer. Included are the
structures of the multi-buffer SHA1 job, job scheduler in C and x86
assembly.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto update from Herbert Xu:
- CTR(AES) optimisation on x86_64 using "by8" AVX.
- arm64 support to ccp
- Intel QAT crypto driver
- Qualcomm crypto engine driver
- x86-64 assembly optimisation for 3DES
- CTR(3DES) speed test
- move FIPS panic from module.c so that it only triggers on crypto
modules
- SP800-90A Deterministic Random Bit Generator (drbg).
- more test vectors for ghash.
- tweak self tests to catch partial block bugs.
- misc fixes.
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (94 commits)
crypto: drbg - fix failure of generating multiple of 2**16 bytes
crypto: ccp - Do not sign extend input data to CCP
crypto: testmgr - add missing spaces to drbg error strings
crypto: atmel-tdes - Switch to managed version of kzalloc
crypto: atmel-sha - Switch to managed version of kzalloc
crypto: testmgr - use chunks smaller than algo block size in chunk tests
crypto: qat - Fixed SKU1 dev issue
crypto: qat - Use hweight for bit counting
crypto: qat - Updated print outputs
crypto: qat - change ae_num to ae_id
crypto: qat - change slice->regions to slice->region
crypto: qat - use min_t macro
crypto: qat - remove unnecessary parentheses
crypto: qat - remove unneeded header
crypto: qat - checkpatch blank lines
crypto: qat - remove unnecessary return codes
crypto: Resolve shadow warnings
crypto: ccp - Remove "select OF" from Kconfig
crypto: caam - fix DECO RSR polling
crypto: qce - Let 'DEV_QCE' depend on both HAS_DMA and HAS_IOMEM
...
Byte-to-bit-count computation is only partly converted to big-endian and is
mixing in CPU-endian values. Problem was noticed by sparce with warning:
CHECK arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17: expected restricted __be64 <noident>
arch/x86/crypto/sha512_ssse3_glue.c:144:17: got unsigned long long
Cc: <stable@vger.kernel.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch introduces "by8" AES CTR mode AVX optimization inspired by
Intel Optimized IPSEC Cryptograhpic library. For additional information,
please see:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972
The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
aes_ctr_enc_256_avx_by8() are adapted from
Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
are enabled in a platform, the glue code in AESNI module overrieds the
existing "by4" CTR mode en/decryption with the "by8"
AES CTR mode en/decryption.
On a Haswell desktop, with turbo disabled and all cpus running
at maximum frequency, the "by8" CTR mode optimization
shows better performance results across data & key sizes
as measured by tcrypt.
The average performance improvement of the "by8" version over the "by4"
version is as follows:
For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.
A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
optimization shows the following results:
tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
---------------------------------------------------------------------------
testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 bytes)
testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 bytes)
tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
---------------------------------------------------------------------------
testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 539 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1153 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8458 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 353 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 360 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 512 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1277 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 8745 cycles (8192 bytes)
testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 348 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 335 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 451 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1030 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6611 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 354 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 346 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 488 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1154 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8390 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 357 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 362 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 515 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1284 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 8681 cycles (8192 bytes)
crypto: Incorporate feed back to AES CTR mode optimization patch
Specifically, the following:
a) alignment around main loop in aes_ctrby8_avx_x86_64.S
b) .rodata around data constants used in the assembely code.
c) the use of CONFIG_AVX in the glue code.
d) fix up white space.
e) informational message for "by8" AES CTR mode optimization
f) "by8" AES CTR mode optimization can be simply enabled
if the platform supports both AES and AVX features. The
optimization works superbly on Sandybridge as well.
Testing on Haswell shows no performance change since the last.
Testing on Sandybridge shows that the "by8" AES CTR mode optimization
greatly improves performance.
tcrypt log with "by4" AES CTR mode optimization on Sandybridge
--------------------------------------------------------------
testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 408 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 707 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1864 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 12813 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 395 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 432 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 780 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 2132 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 15765 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 416 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 438 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 842 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 2383 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 16945 cycles (8192 bytes)
testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 389 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 409 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 704 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1865 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 12783 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 409 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 434 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 792 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 2151 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 15804 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 421 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 444 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 840 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 2394 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 16928 cycles (8192 bytes)
tcrypt log with "by8" AES CTR mode optimization on Sandybridge
--------------------------------------------------------------
testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 401 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 522 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1136 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7046 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 394 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 418 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 559 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1263 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9072 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 408 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 428 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1385 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 9224 cycles (8192 bytes)
testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 390 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 402 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 530 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1135 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7079 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 414 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 417 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 572 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1312 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9073 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 415 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 454 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 598 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1407 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 9288 cycles (8192 bytes)
crypto: Fix redundant checks
a) Fix the redundant check for cpu_has_aes
b) Fix the key length check when invoking the CTR mode "by8"
encryptor/decryptor.
crypto: fix typo in AES ctr mode transform
Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com>
Reviewed-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There's no need for the K_table to be made of 64-bit words. For some
reason, the original authors didn't fully reduce the values modulo the
CRC32C polynomial, and so had some 33-bit values in there. They can
all be reduced to 32 bits.
Doing that cuts the table size in half. Since the code depends on both
pclmulq and crc32, SSE 4.1 is obviously present, so we can use pmovzxdq
to fetch it in the correct format.
This adds (measured on Ivy Bridge) 1 cycle per main loop iteration
(CRC of up to 3K bytes), less than 0.2%. The hope is that the reduced
D-cache footprint will make up the loss in other code.
Two other related fixes:
* K_table is read-only, so belongs in .rodata, and
* There's no need for more than 8-byte alignment
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The internal key isn't actually in big-endian format so let's switch
to u128 which also happens to allow us to remove a sparse warning.
Based on suggestion by Ard Biesheuvel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>