linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-23 02:25:02 +07:00

Author	SHA1	Message	Date
Stephan Mueller	4dda66f62e	crypto: twofish_avx - mark Twofish AVX helper ciphers Flag all Twofish AVX helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:11 +08:00
Stephan Mueller	748be1f1bf	crypto: serpent_sse2 - mark Serpent SSE2 helper ciphers Flag all Serpent SSE2 helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:10 +08:00
Stephan Mueller	65aed53941	crypto: serpent_avx - mark Serpent AVX helper ciphers Flag all Serpent AVX helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:10 +08:00
Stephan Mueller	f82419acd8	crypto: serpent_avx2 - mark Serpent AVX2 helper ciphers Flag all Serpent AVX2 helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:09 +08:00
Stephan Mueller	e69b8a46ca	crypto: cast6_avx - mark CAST6 helper ciphers Flag all CAST6 helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:09 +08:00
Stephan Mueller	7d2c31dd70	crypto: camellia_aesni_avx - mark AVX Camellia helper ciphers Flag all AVX Camellia helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:08 +08:00
Stephan Mueller	680574e8b3	crypto: cast5_avx - mark CAST5 helper ciphers Flag all CAST5 helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:08 +08:00
Stephan Mueller	a62356a978	crypto: camellia_aesni_avx2 - mark AES-NI Camellia helper ciphers Flag all AES-NI Camellia helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:07 +08:00
Stephan Mueller	6a9b52b7fa	crypto: clmulni - mark ghash clmulni helper ciphers Flag all ash clmulni helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:06 +08:00
Stephan Mueller	eabdc320ec	crypto: aesni - mark AES-NI helper ciphers Flag all AES-NI helper ciphers as internal ciphers to prevent them from being called by normal users. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-31 21:21:05 +08:00
Ameen Ali	c42e9902f3	crypto: sha1-mb - Syntax error fixing a syntax-error . Signed-off-by: Ameen Ali <AmeenAli023@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-16 21:47:58 +11:00
Julia Lawall	05713ba905	crypto: don't export static symbol The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ type T; identifier f; @@ static T f (...) { ... } @@ identifier r.f; declarer name EXPORT_SYMBOL_GPL; @@ -EXPORT_SYMBOL_GPL(f); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-13 21:37:15 +11:00
Stephan Mueller	ccfe8c3f7e	crypto: aesni - fix memory usage in GCM decryption The kernel crypto API logic requires the caller to provide the length of (ciphertext \|\| authentication tag) as cryptlen for the AEAD decryption operation. Thus, the cipher implementation must calculate the size of the plaintext output itself and cannot simply use cryptlen. The RFC4106 GCM decryption operation tries to overwrite cryptlen memory in req->dst. As the destination buffer for decryption only needs to hold the plaintext memory but cryptlen references the input buffer holding (ciphertext \|\| authentication tag), the assumption of the destination buffer length in RFC4106 GCM operation leads to a too large size. This patch simply uses the already calculated plaintext size. In addition, this patch fixes the offset calculation of the AAD buffer pointer: as mentioned before, cryptlen already includes the size of the tag. Thus, the tag does not need to be added. With the addition, the AAD will be written beyond the already allocated buffer. Note, this fixes a kernel crash that can be triggered from user space via AF_ALG(aead) -- simply use the libkcapi test application from [1] and update it to use rfc4106-gcm-aes. Using [1], the changes were tested using CAVS vectors to demonstrate that the crypto operation still delivers the right results. [1] http://www.chronox.de/libkcapi.html CC: Tadeusz Struk <tadeusz.struk@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-13 21:32:21 +11:00
Tadeusz Struk	81e397d937	crypto: aesni - make driver-gcm-aes-aesni helper a proper aead alg Changed the __driver-gcm-aes-aesni to be a proper aead algorithm. This required a valid setkey and setauthsize functions to be added and also some changes to make sure that math context is not corrupted when the alg is used directly. Note that the __driver-gcm-aes-aesni should not be used directly by modules that can use it in interrupt context as we don't have a good fallback mechanism in this case. Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-02-28 23:31:35 +13:00
Lad, Prabhakar	66c046b407	crypto: sha-mb - Fix big integer constant sparse warning this patch fixes following sparse warning: sha1_mb_mgr_init_avx2.c:59:31: warning: constant 0xF76543210 is so big it is long Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-02-27 22:48:49 +13:00
Linus Torvalds	fee5429e02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 3.20: - Added 192/256-bit key support to aesni GCM. - Added MIPS OCTEON MD5 support. - Fixed hwrng starvation and race conditions. - Added note that memzero_explicit is not a subsitute for memset. - Added user-space interface for crypto_rng. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits) crypto: tcrypt - do not allocate iv on stack for aead speed tests crypto: testmgr - limit IV copy length in aead tests crypto: tcrypt - fix buflen reminder calculation crypto: testmgr - mark rfc4106(gcm(aes)) as fips_allowed crypto: caam - fix resource clean-up on error path for caam_jr_init crypto: caam - pair irq map and dispose in the same function crypto: ccp - terminate ccp_support array with empty element crypto: caam - remove unused local variable crypto: caam - remove dead code crypto: caam - don't emit ICV check failures to dmesg hwrng: virtio - drop extra empty line crypto: replace scatterwalk_sg_next with sg_next crypto: atmel - Free memory in error path crypto: doc - remove colons in comments crypto: seqiv - Ensure that IV size is at least 8 bytes crypto: cts - Weed out non-CBC algorithms MAINTAINERS: add linux-crypto to hw random crypto: cts - Remove bogus use of seqiv crypto: qat - don't need qat_auth_state struct crypto: algif_rng - fix sparse non static symbol warning ...	2015-02-14 09:47:01 -08:00
Timothy McCaffrey	e31ac32d3b	crypto: aesni - Add support for 192 & 256 bit keys to AESNI RFC4106 These patches fix the RFC4106 implementation in the aesni-intel module so it supports 192 & 256 bit keys. Since the AVX support that was added to this module also only supports 128 bit keys, and this patch only affects the SSE implementation, changes were also made to use the SSE version if key sizes other than 128 are specified. RFC4106 specifies that 192 & 256 bit keys must be supported (section 8.4). Also, this should fix Strongswan issue 341 where the aesni module needs to be unloaded if 256 bit keys are used: http://wiki.strongswan.org/issues/341 This patch has been tested with Sandy Bridge and Haswell processors. With 128 bit keys and input buffers > 512 bytes a slight performance degradation was noticed (~1%). For input buffers of less than 512 bytes there was no performance impact. Compared to 128 bit keys, 256 bit key size performance is approx. .5 cycles per byte slower on Sandy Bridge, and .37 cycles per byte slower on Haswell (vs. SSE code). This patch has also been tested with StrongSwan IPSec connections where it worked correctly. I created this diff from a git clone of crypto-2.6.git. Any questions, please feel free to contact me. Signed-off-by: Timothy McCaffrey <timothy.mccaffrey@unisys.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-01-14 21:56:51 +11:00
Mathias Krause	d8219f52a7	crypto: x86/des3_ede - drop bogus module aliases This module implements variations of "des3_ede" only. Drop the bogus module aliases for "des". Cc: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-01-13 22:30:52 +11:00
Mathias Krause	3e14dcf7cb	crypto: add missing crypto module aliases Commit `5d26a105b5` ("crypto: prefix module autoloading with "crypto-"") changed the automatic module loading when requesting crypto algorithms to prefix all module requests with "crypto-". This requires all crypto modules to have a crypto specific module alias even if their file name would otherwise match the requested crypto algorithm. Even though commit `5d26a105b5` added those aliases for a vast amount of modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO annotations to those files to make them get loaded automatically, again. This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work with kernels v3.18 and below. Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former won't work for crypto modules any more. Fixes: `5d26a105b5` ("crypto: prefix module autoloading with "crypto-"") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-01-13 22:29:11 +11:00
Vinson Lee	0b8c960cf6	crypto: sha-mb - Add avx2_supported check. This patch fixes this allyesconfig target build error with older binutils. LD arch/x86/crypto/built-in.o ld: arch/x86/crypto/sha-mb/built-in.o: No such file: No such file or directory Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Vinson Lee <vlee@twitter.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-01-05 21:35:02 +11:00
Mathias Krause	0b1e95b2fa	crypto: aesni - fix "by8" variant for 128 bit keys The "by8" counter mode optimization is broken for 128 bit keys with input data longer than 128 bytes. It uses the wrong key material for en- and decryption. The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved in case we're handling more than 128 bytes of input data -- they won't get reloaded after the initial load. They must therefore be (a) loaded on the first iteration and (b) be preserved for the latter ones. The implementation for 128 bit keys does not comply with (a) nor (b). Fix this by bringing the implementation back to its original source and correctly load the key registers and preserve their values by not re-using the registers for other purposes. Kudos to James for reporting the issue and providing a test case showing the discrepancies. Reported-by: James Yonan <james@openvpn.net> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.18 Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-01-05 21:35:02 +11:00
Julia Lawall	a6326ba025	crypto: sha - replace memset by memzero_explicit Memset on a local variable may be removed when it is called just before the variable goes out of scope. Using memzero_explicit defeats this optimization. A simplified version of the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; type T; @@ { ... when any T x[...]; ... when any when exists - memset + memzero_explicit (x, -0, ...) ... when != x when strict } // </smpl> This change was suggested by Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-12-02 22:55:49 +08:00
Kees Cook	4943ba16bb	crypto: include crypto- module prefix in template This adds the module loading prefix "crypto-" to the template lookup as well. For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly includes the "crypto-" prefix at every level, correctly rejecting "vfat": net-pf-38 algif-hash crypto-vfat(blowfish) crypto-vfat(blowfish)-all crypto-vfat Reported-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-11-26 20:06:30 +08:00
Dan Carpenter	5d1b3c98ec	crypto: sha-mb - remove a bogus NULL check This can't be NULL and we dereferenced it earlier. Smatch used to ignore these things where the pointer was obviously non-NULL but I've found that sometimes the intention was to check something else so we were maybe missing bugs. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-11-25 22:50:43 +08:00
Kees Cook	5d26a105b5	crypto: prefix module autoloading with "crypto-" This prefixes all crypto module loading with "crypto-" so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-11-24 22:43:57 +08:00
Valentin Rothberg	304576a776	crypto: aesni - remove unnecessary #define The CPP identifier 'HAS_PCBC' is defined when the Kconfig option CRYPTO_PCBC is set as 'y' or 'm', and is further used in two ifdef blocks to conditionally compile source code. This indirection hides the actual Kconfig dependency and complicates readability. Moreover, it's inconsistent with the rest of the ifdef blocks in the file, which directly reference Kconfig options. This patch removes 'HAS_PCBC' and replaces its occurrences with the actual dependency on 'CRYPTO_PCBC' being set as 'y' or 'm'. Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-11-06 23:14:59 +08:00
Mathias Krause	5cfed7b335	Revert "crypto: aesni - disable "by8" AVX CTR optimization" This reverts commit `7da4b29d49`. Now, that the issue is fixed, we can re-enable the code. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-10-02 14:40:28 +08:00
Herbert Xu	9561dccb45	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Merging the crypto tree for 3.17 to pull in the "by8" AVX CTR revert.	2014-10-02 14:37:20 +08:00
Mathias Krause	e3b3bb5ac1	crypto: aesni - remove unused defines in "by8" variant The defines for xkey3, xkey6 and xkey9 are not used in the code. They're probably left overs from merging the three source files for 128, 192 and 256 bit AES. They can safely be removed. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-10-02 14:35:03 +08:00
Mathias Krause	80dca4734b	crypto: aesni - fix counter overflow handling in "by8" variant The "by8" CTR AVX implementation fails to propperly handle counter overflows. That was the reason it got disabled in commit `7da4b29d49` ("crypto: aesni - disable "by8" AVX CTR optimization"). Fix the overflow handling by incrementing the counter block as a double quad word, i.e. a 128 bit, and testing for overflows afterwards. We need to use VPTEST to do so as VPADD* does not set the flags itself and silently drops the carry bit. As this change adds branches to the hot path, minor performance regressions might be a side effect. But, OTOH, we now have a conforming implementation -- the preferable goal. A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical numbers for the old and this version with differences within the noise range. A dm-crypt test with the fixed version gave even slightly better results for this version. So the performance impact might not be as big as expected. Tested-by: Romain Francoise <romain@orebokech.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-10-02 14:35:03 +08:00
Mathias Krause	7da4b29d49	crypto: aesni - disable "by8" AVX CTR optimization The "by8" implementation introduced in commit `22cddcc7df` ("crypto: aes - AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it handles counter block overflows differently. It only accounts the right most 32 bit as a counter -- not the whole block as all other implementations do. This makes it fail the cryptomgr test #4 that specifically tests this corner case. As we're quite late in the release cycle, just disable the "by8" variant for now. Reported-by: Romain Francoise <romain@orebokech.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-09-24 21:15:31 +08:00
Fengguang Wu	4c1948fc47	crypto: sha-mb - sha1_mb_alg_state can be static CC: Tim Chen <tim.c.chen@linux.intel.com> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-08-26 14:40:52 +08:00
Tim Chen	ad61e042e9	crypto: sha-mb - SHA1 multibuffer job manager and glue code This patch introduces the multi-buffer job manager which is responsible for submitting scatter-gather buffers from several SHA1 jobs to the multi-buffer algorithm. It also contains the flush routine to that's called by the crypto daemon to complete the job when no new jobs arrive before the deadline of maximum latency of a SHA1 crypto job. The SHA1 multi-buffer crypto algorithm is defined and initialized in this patch. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-08-25 20:32:30 +08:00
Tim Chen	12d2513d5f	crypto: sha-mb - SHA1 multibuffer crypto computation (x8 AVX2) This patch introduces the assembly routines to do SHA1 computation on buffers belonging to serveral jobs at once. The assembly routines are optimized with AVX2 instructions that have 8 data lanes and using AVX2 registers. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-08-25 20:32:29 +08:00
Tim Chen	2249cbb53e	crypto: sha-mb - SHA1 multibuffer submit and flush routines for AVX2 This patch introduces the routines used to submit and flush buffers belonging to SHA1 crypto jobs to the SHA1 multibuffer algorithm. It is implemented mostly in assembly optimized with AVX2 instructions. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-08-25 20:32:28 +08:00
Tim Chen	1161777823	crypto: sha-mb - SHA1 multibuffer algorithm data structures This patch introduces the data structures and prototypes of functions needed for computing SHA1 hash using multi-buffer. Included are the structures of the multi-buffer SHA1 job, job scheduler in C and x86 assembly. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-08-25 20:32:26 +08:00
Linus Torvalds	3e7a716a92	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - CTR(AES) optimisation on x86_64 using "by8" AVX. - arm64 support to ccp - Intel QAT crypto driver - Qualcomm crypto engine driver - x86-64 assembly optimisation for 3DES - CTR(3DES) speed test - move FIPS panic from module.c so that it only triggers on crypto modules - SP800-90A Deterministic Random Bit Generator (drbg). - more test vectors for ghash. - tweak self tests to catch partial block bugs. - misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (94 commits) crypto: drbg - fix failure of generating multiple of 2**16 bytes crypto: ccp - Do not sign extend input data to CCP crypto: testmgr - add missing spaces to drbg error strings crypto: atmel-tdes - Switch to managed version of kzalloc crypto: atmel-sha - Switch to managed version of kzalloc crypto: testmgr - use chunks smaller than algo block size in chunk tests crypto: qat - Fixed SKU1 dev issue crypto: qat - Use hweight for bit counting crypto: qat - Updated print outputs crypto: qat - change ae_num to ae_id crypto: qat - change slice->regions to slice->region crypto: qat - use min_t macro crypto: qat - remove unnecessary parentheses crypto: qat - remove unneeded header crypto: qat - checkpatch blank lines crypto: qat - remove unnecessary return codes crypto: Resolve shadow warnings crypto: ccp - Remove "select OF" from Kconfig crypto: caam - fix DECO RSR polling crypto: qce - Let 'DEV_QCE' depend on both HAS_DMA and HAS_IOMEM ...	2014-08-04 09:52:51 -07:00
Jussi Kivilinna	cfe82d4f45	crypto: sha512_ssse3 - fix byte count to bit count conversion Byte-to-bit-count computation is only partly converted to big-endian and is mixing in CPU-endian values. Problem was noticed by sparce with warning: CHECK arch/x86/crypto/sha512_ssse3_glue.c arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types) arch/x86/crypto/sha512_ssse3_glue.c:144:17: expected restricted __be64 <noident> arch/x86/crypto/sha512_ssse3_glue.c:144:17: got unsigned long long Cc: <stable@vger.kernel.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-06-25 21:55:02 +08:00
Jussi Kivilinna	5e50d43d65	crypto: des3_ede-x86_64 - fix parse warning Patch fixes following sparse warning: CHECK arch/x86/crypto/des3_ede_glue.c arch/x86/crypto/des3_ede_glue.c:308:52: warning: restricted __be64 degrades to integer arch/x86/crypto/des3_ede_glue.c:309:52: warning: restricted __be64 degrades to integer arch/x86/crypto/des3_ede_glue.c:310:52: warning: restricted __be64 degrades to integer arch/x86/crypto/des3_ede_glue.c:326:44: warning: restricted __be64 degrades to integer Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-06-25 21:38:43 +08:00
chandramouli narayanan	22cddcc7df	crypto: aes - AES CTR x86_64 "by8" AVX optimization This patch introduces "by8" AES CTR mode AVX optimization inspired by Intel Optimized IPSEC Cryptograhpic library. For additional information, please see: http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972 The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and aes_ctr_enc_256_avx_by8() are adapted from Intel Optimized IPSEC Cryptographic library. When both AES and AVX features are enabled in a platform, the glue code in AESNI module overrieds the existing "by4" CTR mode en/decryption with the "by8" AES CTR mode en/decryption. On a Haswell desktop, with turbo disabled and all cpus running at maximum frequency, the "by8" CTR mode optimization shows better performance results across data & key sizes as measured by tcrypt. The average performance improvement of the "by8" version over the "by4" version is as follows: For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement. For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement. For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement. A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8" optimization shows the following results: tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop: --------------------------------------------------------------------------- testing speed of __ctr-aes-aesni encryption test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 bytes) testing speed of __ctr-aes-aesni decryption test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 bytes) tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop: --------------------------------------------------------------------------- testing speed of __ctr-aes-aesni encryption test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 539 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 1153 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 8458 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 353 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 360 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 512 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 1277 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 8745 cycles (8192 bytes) testing speed of __ctr-aes-aesni decryption test 0 (128 bit key, 16 byte blocks): 1 operation in 348 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 335 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 451 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1030 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 6611 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 354 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 346 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 488 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 1154 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 8390 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 357 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 362 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 515 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 1284 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 8681 cycles (8192 bytes) crypto: Incorporate feed back to AES CTR mode optimization patch Specifically, the following: a) alignment around main loop in aes_ctrby8_avx_x86_64.S b) .rodata around data constants used in the assembely code. c) the use of CONFIG_AVX in the glue code. d) fix up white space. e) informational message for "by8" AES CTR mode optimization f) "by8" AES CTR mode optimization can be simply enabled if the platform supports both AES and AVX features. The optimization works superbly on Sandybridge as well. Testing on Haswell shows no performance change since the last. Testing on Sandybridge shows that the "by8" AES CTR mode optimization greatly improves performance. tcrypt log with "by4" AES CTR mode optimization on Sandybridge -------------------------------------------------------------- testing speed of __ctr-aes-aesni encryption test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 408 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 707 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1864 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 12813 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 395 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 432 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 780 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 2132 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 15765 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 416 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 438 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 842 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 2383 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 16945 cycles (8192 bytes) testing speed of __ctr-aes-aesni decryption test 0 (128 bit key, 16 byte blocks): 1 operation in 389 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 409 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 704 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1865 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 12783 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 409 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 434 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 792 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 2151 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 15804 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 421 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 444 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 840 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 2394 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 16928 cycles (8192 bytes) tcrypt log with "by8" AES CTR mode optimization on Sandybridge -------------------------------------------------------------- testing speed of __ctr-aes-aesni encryption test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 401 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 522 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1136 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 7046 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 394 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 418 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 559 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 1263 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 9072 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 408 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 428 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 1385 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 9224 cycles (8192 bytes) testing speed of __ctr-aes-aesni decryption test 0 (128 bit key, 16 byte blocks): 1 operation in 390 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 402 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 530 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1135 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 7079 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 414 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 417 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 572 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 1312 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 9073 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 415 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 454 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 598 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 1407 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 9288 cycles (8192 bytes) crypto: Fix redundant checks a) Fix the redundant check for cpu_has_aes b) Fix the key length check when invoking the CTR mode "by8" encryptor/decryptor. crypto: fix typo in AES ctr mode transform Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com> Reviewed-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-06-20 21:27:58 +08:00
Jussi Kivilinna	6574e6c64e	crypto: des_3des - add x86-64 assembly implementation Patch adds x86_64 assembly implementation of Triple DES EDE cipher algorithm. Two assembly implementations are provided. First is regular 'one-block at time' encrypt/decrypt function. Second is 'three-blocks at time' function that gains performance increase on out-of-order CPUs. tcrypt test results: Intel Core i5-4570: des3_ede-asm vs des3_ede-generic: size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec 16B 1.21x 1.22x 1.27x 1.36x 1.25x 1.25x 64B 1.98x 1.96x 1.23x 2.04x 2.01x 2.00x 256B 2.34x 2.37x 1.21x 2.40x 2.38x 2.39x 1024B 2.50x 2.47x 1.22x 2.51x 2.52x 2.51x 8192B 2.51x 2.53x 1.21x 2.56x 2.54x 2.55x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-06-20 21:27:58 +08:00
George Spelvin	473946e674	crypto: crc32c-pclmul - Shrink K_table to 32-bit words There's no need for the K_table to be made of 64-bit words. For some reason, the original authors didn't fully reduce the values modulo the CRC32C polynomial, and so had some 33-bit values in there. They can all be reduced to 32 bits. Doing that cuts the table size in half. Since the code depends on both pclmulq and crc32, SSE 4.1 is obviously present, so we can use pmovzxdq to fetch it in the correct format. This adds (measured on Ivy Bridge) 1 cycle per main loop iteration (CRC of up to 3K bytes), less than 0.2%. The hope is that the reduced D-cache footprint will make up the loss in other code. Two other related fixes: * K_table is read-only, so belongs in .rodata, and * There's no need for more than 8-byte alignment Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: George Spelvin <linux@horizon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-06-20 21:27:57 +08:00
Herbert Xu	0ea481466d	crypto: ghash-clmulni-intel - Use u128 instead of be128 for internal key The internal key isn't actually in big-endian format so let's switch to u128 which also happens to allow us to remove a sparse warning. Based on suggestion by Ard Biesheuvel. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2014-04-04 21:06:14 +08:00
Ard Biesheuvel	8ceee72808	crypto: ghash-clmulni-intel - use C implementation for setkey() The GHASH setkey() function uses SSE registers but fails to call kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and then having to deal with the restriction that they cannot be called from interrupt context, move the setkey() implementation to the C domain. Note that setkey() does not use any particular SSE features and is not expected to become a performance bottleneck. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Fixes: `0e1227d356` (crypto: ghash - Add PCLMULQDQ accelerated implementation) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-04-01 17:22:47 +08:00
Mathias Krause	37b2894717	crypto: x86/sha1 - reduce size of the AVX2 asm implementation There is really no need to page align sha1_transform_avx2. The default alignment is just fine. This is not the hot code but only the entry point, after all. Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-03-25 20:25:43 +08:00
Mathias Krause	6c8c17cc7a	crypto: x86/sha1 - fix stack alignment of AVX2 variant The AVX2 implementation might waste up to a page of stack memory because of a wrong alignment calculation. This will, in the worst case, increase the stack usage of sha1_transform_avx2() alone to 5.4 kB -- way to big for a kernel function. Even worse, it might also allocate less bytes than needed if the stack pointer is already aligned bacause in that case the 'sub %rbx, %rsp' is effectively moving the stack pointer upwards, not downwards. Fix those issues by changing and simplifying the alignment calculation to use a 32 byte alignment, the alignment really needed. Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-03-25 20:25:43 +08:00
Mathias Krause	6ca5afb8c2	crypto: x86/sha1 - re-enable the AVX variant Commit `7c1da8d0d0` "crypto: sha - SHA1 transform x86_64 AVX2" accidentally disabled the AVX variant by making the avx_usable() test not only fail in case the CPU doesn't support AVX or OSXSAVE but also if it doesn't support AVX2. Fix that regression by splitting up the AVX/AVX2 test into two functions. Also test for the BMI1 extension in the avx2_usable() test as the AVX2 implementation not only makes use of BMI2 but also BMI1 instructions. Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-03-25 20:25:42 +08:00
chandramouli narayanan	7c1da8d0d0	crypto: sha - SHA1 transform x86_64 AVX2 This git patch adds x86_64 AVX2 optimization of SHA1 transform to crypto support. The patch has been tested with 3.14.0-rc1 kernel. On a Haswell desktop, with turbo disabled and all cpus running at maximum frequency, tcrypt shows AVX2 performance improvement from 3% for 256 bytes update to 16% for 1024 bytes update over AVX implementation. This patch adds sha1_avx2_transform(), the glue, build and configuration changes needed for AVX2 optimization of SHA1 transform to crypto support. sha1-ssse3 is one module which adds the necessary optimization support (SSSE3/AVX/AVX2) for the low-level SHA1 transform function. With better optimization support, transform function is overridden as the case may be. In the case of AVX2, due to performance reasons across datablock sizes, the AVX or AVX2 transform function is used at run-time as it suits best. The Makefile change therefore appends the necessary objects to the linkage. Due to this, the patch merely appends AVX2 transform to the existing build mix and Kconfig support and leaves the configuration build support as is. Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com> Reviewed-by: Marek Vasut <marex@denx.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-03-21 21:54:30 +08:00
Dan Carpenter	b3bd5869fd	crypto: remove a duplicate checks in __cbc_decrypt() We checked "nbytes < bsize" before so it can't happen here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Acked-by: Johannes Götzfried <johannes.goetzfried@cs.fau.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-02-27 05:56:54 +08:00
Tim Chen	79ba451d66	crypto: aesni - fix build on x86 (32bit) We rename aesni-intel_avx.S to aesni-intel_avx-x86_64.S to indicate that it is only used by x86_64 architecture. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-01-15 11:36:34 +08:00
Andy Shevchenko	8610d7bf60	crypto: aesni - fix build on x86 (32bit) It seems commit `d764593a` "crypto: aesni - AVX and AVX2 version of AESNI-GCM encode and decode" breaks a build on x86_32 since it's designed only for x86_64. This patch makes a compilation unit conditional to CONFIG_64BIT and functions usage to CONFIG_X86_64. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-12-31 19:47:46 +08:00
Tim Chen	d764593af9	crypto: aesni - AVX and AVX2 version of AESNI-GCM encode and decode We have added AVX and AVX2 routines that optimize AESNI-GCM encode/decode. These routines are optimized for encrypt and decrypt of large buffers. In tests we have seen up to 6% speedup for 1K, 11% speedup for 2K and 18% speedup for 8K buffer over the existing SSE version. These routines should provide even better speedup for future Intel x86_64 cpus. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-12-20 20:06:24 +08:00
Daniel Borkmann	fed286110f	crypto: arch - use crypto_memneq instead of memcmp Replace remaining occurences (just as we did in crypto/) under arch/*/crypto/ that make use of memcmp() for comparing keys or authentication tags for usage with crypto_memneq(). It can simply be used as a drop-in replacement for the normal memcmp(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: James Yonan <james@openvpn.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-12-20 20:06:24 +08:00
Oliver Neukum	16c0c4e165	crypto: sha256_ssse3 - also test for BMI2 The AVX2 implementation also uses BMI2 instructions, but doesn't test for their availability. The assumption that AVX2 and BMI2 always go together is false. Some Haswells have AVX2 but not BMI2. Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-10-07 14:17:10 +08:00
Ard Biesheuvel	801201aa25	crypto: move x86 to the generic version of ablk_helper Move all users of ablk_helper under x86/ to the generic version and delete the x86 specific version. Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-09-24 06:02:24 +10:00
Jussi Kivilinna	58497204aa	crypto: x86 - restore avx2_supported check Commit `3d387ef08c` (Revert "crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher") reverted too much as it removed the 'assembler supports AVX2' check and therefore disabled remaining AVX2 implementations of Camellia and Serpent. Patch restores the check and enables these implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-09-13 21:43:52 +10:00
Jussi Kivilinna	7d444909a2	crypto: sha256_ssse3 - use correct module alias for sha224 Commit `a710f761f` (crypto: sha256_ssse3 - add sha224 support) attempted to add MODULE_ALIAS for SHA-224, but it ended up being "sha384", probably because mix-up with previous commit `340991e30` (crypto: sha512_ssse3 - add sha384 support). Patch corrects module alias to "sha224". Reported-by: Pierre-Mayeul Badaire <pierre-mayeul.badaire@m4x.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-09-13 21:43:52 +10:00
Herbert Xu	68411521cc	Reinstate "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework" This patch reinstates commits `67822649d7` `39761214ee` `0b95a7f857` `31d939625a` `2d31e518a4` Now that module softdeps are in the kernel we can use that to resolve the boot issue which cause the revert. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-09-07 12:56:26 +10:00
Herbert Xu	eeca9fad52	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux Merge upstream tree in order to reinstate crct10dif.	2013-09-07 12:53:35 +10:00
Julia Lawall	2a128b4b74	crypto: camellia-x86-64 - replace commas by semicolons and adjust code alignment Adjust alignment and replace commas by semicolons in automatically generated code. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-08-21 21:08:32 +10:00
Andi Kleen	f22d08111a	crypto: make tables used from assembler __visible Tables used from assembler should be marked __visible to let the compiler know. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-08-14 20:42:03 +10:00
Linus Torvalds	b48a97be8e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This push fixes a memory corruption issue in caam, as well as reverting the new optimised crct10dif implementation as it breaks boot on initrd systems. Hopefully crct10dif will be reinstated once the supporting code is added so that it doesn't break boot" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: Revert "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework" crypto: caam - Fixed the memory out of bound overwrite issue	2013-07-24 11:05:18 -07:00
Herbert Xu	e70308ec0e	Revert "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework" This reverts commits `67822649d7` `39761214ee` `0b95a7f857` `31d939625a` `2d31e518a4` Unfortunately this change broke boot on some systems that used an initrd which does not include the newly created crct10dif modules. As these modules are required by sd_mod under certain configurations this is a serious problem. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-07-24 17:04:16 +10:00
Linus Torvalds	b2c311075d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - Do not idle omap device between crypto operations in one session. - Added sha224/sha384 shims for SSSE3. - More optimisations for camellia-aesni-avx2. - Removed defunct blowfish/twofish AVX2 implementations. - Added unaligned buffer self-tests. - Added PCLMULQDQ optimisation for CRCT10DIF. - Added support for Freescale's DCP co-processor - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (44 commits) crypto: testmgr - test hash implementations with unaligned buffers crypto: testmgr - test AEADs with unaligned buffers crypto: testmgr - test skciphers with unaligned buffers crypto: testmgr - check that entries in alg_test_descs are in correct order Revert "crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher" Revert "crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher" crypto: camellia-aesni-avx2 - tune assembly code for more performance hwrng: bcm2835 - fix MODULE_LICENSE tag hwrng: nomadik - use clk_prepare_enable() crypto: picoxcell - replace strict_strtoul() with kstrtoul() crypto: dcp - Staticize local symbols crypto: dcp - Use NULL instead of 0 crypto: dcp - Use devm_* APIs crypto: dcp - Remove redundant platform_set_drvdata() hwrng: use platform_{get,set}_drvdata() crypto: omap-aes - Don't idle/start AES device between Encrypt operations crypto: crct10dif - Use PTR_RET crypto: ux500 - Cocci spatch "resource_size.spatch" crypto: sha256_ssse3 - add sha224 support crypto: sha512_ssse3 - add sha384 support ...	2013-07-05 12:12:33 -07:00
Linus Torvalds	92616ee654	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes an unaligned crash in XTS mode when using aseni_intel" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni_intel - fix accessing of unaligned memory	2013-06-21 06:28:39 -10:00
Herbert Xu	02c0241b60	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto Merge crypto to resolve conflict in crypto/Kconfig.	2013-06-21 15:13:27 +08:00
Jussi Kivilinna	99f42f937a	Revert "crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher" This reverts commit `cf1521a1a5`. Instruction (vpgatherdd) that this implementation relied on turned out to be slow performer on real hardware (i5-4570). The previous 8-way twofish/AVX implementation is therefore faster and this implementation should be removed. Converting this implementation to use the same method as in twofish/AVX for table look-ups would give additional ~3% speed up vs twofish/AVX, but would hardly be worth of the added code and binary size. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-06-21 14:44:29 +08:00
Jussi Kivilinna	3d387ef08c	Revert "crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher" This reverts commit `6048801070`. Instruction (vpgatherdd) that this implementation relied on turned out to be slow performer on real hardware (i5-4570). The previous 4-way blowfish implementation is therefore faster and this implementation should be removed. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-06-21 14:44:28 +08:00
Jussi Kivilinna	acfffdb803	crypto: camellia-aesni-avx2 - tune assembly code for more performance Add implementation tuned for more performance on real hardware. Changes are mostly around the part mixing 128-bit extract and insert instructions and AES-NI instructions. Also 'vpbroadcastb' instructions have been change to 'vpshufb with zero mask'. Tests on Intel Core i5-4570: tcrypt ECB results, old-AVX2 vs new-AVX2: size 128bit key 256bit key enc dec enc dec 256 1.00x 1.00x 1.00x 1.00x 1k 1.08x 1.09x 1.05x 1.06x 8k 1.06x 1.06x 1.06x 1.06x tcrypt ECB results, AVX vs new-AVX2: size 128bit key 256bit key enc dec enc dec 256 1.00x 1.00x 1.00x 1.00x 1k 1.51x 1.50x 1.52x 1.50x 8k 1.47x 1.48x 1.48x 1.48x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-06-21 14:44:23 +08:00
Jussi Kivilinna	fe6510b5d6	crypto: aesni_intel - fix accessing of unaligned memory The new XTS code for aesni_intel uses input buffers directly as memory operands for pxor instructions, which causes crash if those buffers are not aligned to 16 bytes. Patch changes XTS code to handle unaligned memory correctly, by loading memory with movdqu instead. Reported-by: Dave Jones <davej@redhat.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-06-13 14:57:42 +08:00
Linus Torvalds	484b002e28	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: - Three EFI-related fixes - Two early memory initialization fixes - build fix for older binutils - fix for an eager FPU performance regression -- currently we don't allow the use of the FPU at interrupt time at all in eager mode, which is clearly wrong. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Allow FPU to be used at interrupt time even with eagerfpu x86, crc32-pclmul: Fix build with older binutils x86-64, init: Fix a possible wraparound bug in switchover in head_64.S x86, range: fix missing merge during add range x86, efi: initial the local variable of DataSize to zero efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse efivarfs: Never return ENOENT from firmware again	2013-05-31 09:44:10 +09:00
Jan Beulich	2baad6121e	x86, crc32-pclmul: Fix build with older binutils binutils prior to 2.18 (e.g. the ones found on SLE10) don't support assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ in the same file should be used. This requires making the helper macros capable of recognizing 32-bit general purpose register operands. [ hpa: tagging for stable as it is a low risk build fix ] Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com Cc: Alexander Boyko <alexander_boyko@xyratex.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-05-30 16:36:23 -07:00
Jussi Kivilinna	a710f761fc	crypto: sha256_ssse3 - add sha224 support Add sha224 implementation to sha256_ssse3 module. This also fixes sha256_ssse3 module autoloading issue when 'sha224' is used before 'sha256'. Previously in such case, just sha256_generic was loaded and not sha256_ssse3 (since it did not provide sha224). Now if 'sha256' was used after 'sha224' usage, sha256_ssse3 would remain unloaded. Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-28 15:43:05 +08:00
Jussi Kivilinna	340991e30c	crypto: sha512_ssse3 - add sha384 support Add sha384 implementation to sha512_ssse3 module. This also fixes sha512_ssse3 module autoloading issue when 'sha384' is used before 'sha512'. Previously in such case, just sha512_generic was loaded and not sha512_ssse3 (since it did not provide sha384). Now if 'sha512' was used after 'sha384' usage, sha512_ssse3 would remain unloaded. For example, this happens with tcrypt testing module since it tests 'sha384' before 'sha512'. Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-28 15:43:05 +08:00
Jussi Kivilinna	de614e561b	crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations The _XFER stack element size was set too small, 8 bytes, when it needs to be 16 bytes. As _XFER is the last stack element used by these implementations, the 16 byte stores with 'movdqa' corrupt the stack where the value of register %r12 is temporarily stored. As these implementations align the stack pointer to 16 bytes, this corruption did not happen every time. Patch corrects this issue. Reported-by: Julian Wollrath <jwollrath@web.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Tested-by: Julian Wollrath <jwollrath@web.de> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-28 13:46:47 +08:00
Tim Chen	0b95a7f857	crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform Glue code that plugs the PCLMULQDQ accelerated CRC T10 DIF hash into the crypto framework. The config CRYPTO_CRCT10DIF_PCLMUL should be turned on to enable the feature. The crc_t10dif crypto library function will use this faster algorithm when crct10dif_pclmul module is loaded. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-24 17:55:27 +08:00
Tim Chen	31d939625a	crypto: crct10dif - Accelerated CRC T10 DIF computation with PCLMULQDQ instruction This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ instructions. Details discussing the implementation can be found in the paper: "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction" http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-05-20 20:11:06 +08:00
Jussi Kivilinna	f3f935a76a	crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher Patch adds AVX2/AES-NI/x86-64 implementation of Camellia cipher, requiring 32 parallel blocks for input (512 bytes). Compared to AVX implementation, this version is extended to use the 256-bit wide YMM registers. For AES-NI instructions data is split to two 128-bit registers and merged afterwards. Even with this additional handling, performance should be higher compared to the AES-NI/AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:07 +08:00
Jussi Kivilinna	56d76c96a9	crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher Patch adds AVX2/x86-64 implementation of Serpent cipher, requiring 16 parallel blocks for input (256 bytes). Implementation is based on the AVX implementation and extends to use the 256-bit wide YMM registers. Since serpent does not use table look-ups, this implementation should be close to two times faster than the AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:07 +08:00
Jussi Kivilinna	cf1521a1a5	crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher Patch adds AVX2/x86-64 implementation of Twofish cipher, requiring 16 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Implementation also uses 256-bit wide YMM registers, which should give additional speed up compared to the AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:05 +08:00
Jussi Kivilinna	6048801070	crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher Patch adds AVX2/x86-64 implementation of Blowfish cipher, requiring 32 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:09:04 +08:00
Jussi Kivilinna	c456a9cd1a	crypto: aesni_intel - add more optimized XTS mode for x86-64 Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack usage and boost for speed. tcrypt results, with Intel i5-2450M: 256-bit key enc dec 16B 0.98x 0.99x 64B 0.64x 0.63x 256B 1.29x 1.32x 1024B 1.54x 1.58x 8192B 1.57x 1.60x 512-bit key enc dec 16B 0.98x 0.99x 64B 0.60x 0.59x 256B 1.24x 1.25x 1024B 1.39x 1.42x 8192B 1.38x 1.42x I chose not to optimize smaller than block size of 256 bytes, since XTS is practically always used with data blocks of size 512 bytes. This is why performance is reduced in tcrypt for 64 byte long blocks. Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:53 +08:00
Jussi Kivilinna	b5c5b072dc	crypto: x86/camellia-aesni-avx - add more optimized XTS code Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage and small boost for speed. tcrypt results, with Intel i5-2450M: enc dec 16B 1.10x 1.01x 64B 0.82x 0.77x 256B 1.14x 1.10x 1024B 1.17x 1.16x 8192B 1.10x 1.11x Since XTS is practically always used with data blocks of size 512 bytes or more, I chose to not make use of camellia-2way for block sized smaller than 256 bytes. This causes slower result in tcrypt for 64 bytes. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:52 +08:00
Jussi Kivilinna	70177286e1	crypto: cast6-avx: use new optimized XTS code Change cast6-avx to use the new XTS code, for smaller stack usage and small boost to performance. tcrypt results, with Intel i5-2450M: enc dec 16B 1.01x 1.01x 64B 1.01x 1.00x 256B 1.09x 1.02x 1024B 1.08x 1.06x 8192B 1.08x 1.07x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:52 +08:00
Jussi Kivilinna	18be45270a	crypto: x86/twofish-avx - use optimized XTS code Change twofish-avx to use the new XTS code, for smaller stack usage and small boost to performance. tcrypt results, with Intel i5-2450M: enc dec 16B 1.03x 1.02x 64B 0.91x 0.91x 256B 1.10x 1.09x 1024B 1.12x 1.11x 8192B 1.12x 1.11x Since XTS is practically always used with data blocks of size 512 bytes or more, I chose to not make use of twofish-3way for block sized smaller than 128 bytes. This causes slower result in tcrypt for 64 bytes. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:51 +08:00
Jussi Kivilinna	a05248ed2d	crypto: x86 - add more optimized XTS-mode for serpent-avx This patch adds AVX optimized XTS-mode helper functions/macros and converts serpent-avx to use the new facilities. Benefits are slightly improved speed and reduced stack usage as use of temporary IV-array is avoided. tcrypt results, with Intel i5-2450M: enc dec 16B 1.00x 1.00x 64B 1.00x 1.00x 256B 1.04x 1.06x 1024B 1.09x 1.09x 8192B 1.10x 1.09x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:51 +08:00
Sandy Wu	57ae1b0532	crypto: crc32-pclmul - Use gas macro for pclmulqdq Occurs when CONFIG_CRYPTO_CRC32C_INTEL=y and CONFIG_CRYPTO_CRC32C_INTEL=y. Older versions of bintuils do not support the pclmulqdq instruction. The PCLMULQDQ gas macro is used instead. Signed-off-by: Sandy Wu <sandyw@twitter.com> Cc: stable@vger.kernel.org # 3.8+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:44 +08:00
Tim Chen	87de4579f9	crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA512 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:01:42 +08:00
Tim Chen	5663535b69	crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX2 RORX instruction. Provides SHA512 x86_64 assembly routine optimized with SSE, AVX and AVX2's RORX instructions. Speedup of 70% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:00:58 +08:00
Tim Chen	e01d69cb01	crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions. Provides SHA512 x86_64 assembly routine optimized with SSE and AVX instructions. Speedup of 60% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:00:58 +08:00
Tim Chen	bf215cee23	crypto: sha512 - Optimized SHA512 x86_64 assembly routine using Supplemental SSE3 instructions. Provides SHA512 x86_64 assembly routine optimized with SSSE3 instructions. Speedup of 40% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:00:58 +08:00
Tim Chen	8275d1aa64	crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA256 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-25 21:00:57 +08:00
Tim Chen	d34a460092	crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions Provides SHA256 x86_64 assembly routine optimized with SSE, AVX and AVX2's RORX instructions. Speedup of 70% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-03 09:06:32 +08:00
Tim Chen	ec2b4c851f	crypto: sha256 - Optimized sha256 x86_64 assembly routine with AVX instructions. Provides SHA256 x86_64 assembly routine optimized with SSE and AVX instructions. Speedup of 60% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-03 09:06:32 +08:00
Tim Chen	46d208a2bd	crypto: sha256 - Optimized sha256 x86_64 assembly routine using Supplemental SSE3 instructions. Provides SHA256 x86_64 assembly routine optimized with SSSE3 instructions. Speedup of 40% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-03 09:06:31 +08:00
Jussi Kivilinna	873b9cafa8	crypto: x86 - build AVX block cipher implementations only if assembler supports AVX instructions These modules require AVX support in assembler, so add new check to Makefile for this. Other option would be to use CONFIG_AS_AVX inside source files, but that would result dummy/empty/no-fuctionality modules being created. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-03 09:06:30 +08:00
Jussi Kivilinna	eca1726997	crypto: x86/crc32-pclmul - assembly clean-ups: use ENTRY/ENDPROC Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-04-03 09:06:29 +08:00
Tim Chen	918731fa28	crypto: crc32c - Update the links to the white papers on CRC32C calculations with PCLMULQDQ instructions. Herbert, The following patch update the stale link to the CRC32C white paper that was referenced. Tim Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-03-10 16:46:43 +08:00
Herbert Xu	ca81a1a1b8	crypto: crc32c - Kill pointless CRYPTO_CRC32C_X86_64 option This bool option can never be set to anything other than y. So let's just kill it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-02-26 17:52:15 +08:00
Linus Torvalds	32dc43e40a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 3.9: - Added accelerated implementation of crc32 using pclmulqdq. - Added test vector for fcrypt. - Added support for OMAP4/AM33XX cipher and hash. - Fixed loose crypto_user input checks. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (43 commits) crypto: user - ensure user supplied strings are nul-terminated crypto: user - fix empty string test in report API crypto: user - fix info leaks in report API crypto: caam - Added property fsl,sec-era in SEC4.0 device tree binding. crypto: use ERR_CAST crypto: atmel-aes - adjust duplicate test crypto: crc32-pclmul - Kill warning on x86-32 crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_* crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: aesni-intel - add ENDPROC statements for assembler functions crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets crypto: testmgr - add test vector for fcrypt ...	2013-02-25 15:56:15 -08:00
Herbert Xu	7983627657	crypto: crc32-pclmul - Kill warning on x86-32 This patch removes a gratuitous warning on x86-32: arch/x86/crypto/crc32-pclmul_asm.S:87:2: warning: #warning Using 32bit code support [-Wcpp] Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 18:05:02 +11:00
Jussi Kivilinna	d3f5188dfe	crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:51 +11:00
Jussi Kivilinna	ac9d55dd42	crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:51 +11:00
Jussi Kivilinna	2dcfd44dee	crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:50 +11:00
Jussi Kivilinna	044438082c	crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_* Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:50 +11:00
Jussi Kivilinna	b05d3f3756	crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions Signed-off-by: Jussi Kivilinna <jussi.kivilinn@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:49 +11:00
Jussi Kivilinna	698a5abbb0	crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:49 +11:00
Jussi Kivilinna	1985fecf01	crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:49 +11:00
Jussi Kivilinna	e17e209ea4	crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:48 +11:00
Jussi Kivilinna	59990684b0	crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:48 +11:00
Jussi Kivilinna	5186e395fe	crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:48 +11:00
Jussi Kivilinna	8309b745bb	crypto: aesni-intel - add ENDPROC statements for assembler functions Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:47 +11:00
Jussi Kivilinna	3f29974383	crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:47 +11:00
Alexander Boyko	78c37d191d	crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation This patch adds crc32 algorithms to shash crypto api. One is wrapper to gerneric crc32_le function. Second is crc32 pclmulqdq implementation. It use hardware provided PCLMULQDQ instruction to accelerate the CRC32 disposal. This instruction present from Intel Westmere and AMD Bulldozer CPUs. For intel core i5 I got 450MB/s for table implementation and 2100MB/s for pclmulqdq implementation. Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2013-01-20 10:16:45 +11:00
Jussi Kivilinna	0024dc5371	crypto: aesni-intel - remove rfc3686(ctr(aes)), utilize rfc3686 from ctr-module instead rfc3686 in CTR module is now able of using asynchronous ctr(aes) from aesni-intel, so rfc3686(ctr(aes)) in aesni-intel is no longer needed. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-01-08 07:04:47 +01:00
Linus Torvalds	1ed55eac3b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - Added aesni/avx/x86_64 implementations for camellia. - Optimised AVX code for cast5/serpent/twofish/cast6. - Fixed vmac bug with unaligned input. - Allow compression algorithms in FIPS mode. - Optimised crc32c implementation for Intel. - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (32 commits) crypto: caam - Updated SEC-4.0 device tree binding for ERA information. crypto: testmgr - remove superfluous initializers for xts(aes) crypto: testmgr - allow compression algs in fips mode crypto: testmgr - add larger crc32c test vector to test FPU path in crc32c_intel crypto: testmgr - clean alg_test_null entries in alg_test_descs[] crypto: testmgr - remove fips_allowed flag from camellia-aesni null-tests crypto: cast5/cast6 - move lookup tables to shared module padata: use __this_cpu_read per-cpu helper crypto: s5p-sss - Fix compilation error crypto: picoxcell - Add terminating entry for platform_device_id table crypto: omap-aes - select BLKCIPHER2 crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file crypto: tcrypt - add async speed test for camellia cipher crypto: tegra-aes - fix error-valued pointer dereference crypto: tegra - fix missing unlock on error case crypto: cast5/avx - avoid using temporary stack buffers crypto: serpent/avx - avoid using temporary stack buffers crypto: twofish/avx - avoid using temporary stack buffers crypto: cast6/avx - avoid using temporary stack buffers ...	2012-12-15 12:35:19 -08:00
Jussi Kivilinna	044ab52578	crypto: cast5/cast6 - move lookup tables to shared module CAST5 and CAST6 both use same lookup tables, which can be moved shared module 'cast_common'. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-12-06 17:16:26 +08:00
Jussi Kivilinna	d9b1d2e7e1	crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher This patch adds AES-NI/AVX/x86_64 assembler implementation of Camellia block cipher. Implementation process data in sixteen block chunks, which are byte-sliced and AES SubBytes is reused for Camellia s-box with help of pre- and post-filtering. Patch has been tested with tcrypt and automated filesystem tests. tcrypt test results: Intel Core i5-2450M: camellia-aesni-avx vs camellia-asm-x86_64-2way: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.98x 0.96x 0.99x 0.96x 0.96x 0.95x 0.95x 0.94x 0.97x 0.98x 64B 0.99x 0.98x 1.00x 0.98x 0.98x 0.99x 0.98x 0.93x 0.99x 0.98x 256B 2.28x 2.28x 1.01x 2.29x 2.25x 2.24x 1.96x 1.97x 1.91x 1.90x 1024B 2.57x 2.56x 1.00x 2.57x 2.51x 2.53x 2.19x 2.17x 2.19x 2.22x 8192B 2.49x 2.49x 1.00x 2.53x 2.48x 2.49x 2.17x 2.17x 2.22x 2.22x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 0.98x 0.99x 0.97x 0.97x 0.96x 0.97x 0.98x 0.98x 0.99x 64B 1.00x 1.00x 1.01x 0.99x 0.98x 0.99x 0.99x 0.99x 0.99x 0.99x 256B 2.37x 2.37x 1.01x 2.39x 2.35x 2.33x 2.10x 2.11x 1.99x 2.02x 1024B 2.58x 2.60x 1.00x 2.58x 2.56x 2.56x 2.28x 2.29x 2.28x 2.29x 8192B 2.50x 2.52x 1.00x 2.56x 2.51x 2.51x 2.24x 2.25x 2.26x 2.29x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-11-09 17:32:32 +08:00
Jussi Kivilinna	cf582cceda	crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file Prepare camellia-x86_64 functions to be reused from AVX/AESNI implementation module. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-11-09 17:32:31 +08:00
Jussi Kivilinna	c12ab20b16	crypto: cast5/avx - avoid using temporary stack buffers Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.5% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~5%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-24 21:10:55 +08:00
Jussi Kivilinna	facd416fbc	crypto: serpent/avx - avoid using temporary stack buffers Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.5% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~3%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-24 21:10:55 +08:00
Jussi Kivilinna	8435a3c300	crypto: twofish/avx - avoid using temporary stack buffers Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.2% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~3%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-24 21:10:55 +08:00
Jussi Kivilinna	cba1cce054	crypto: cast6/avx - avoid using temporary stack buffers Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.5% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~2%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-24 21:10:54 +08:00
Jussi Kivilinna	58990986f1	crypto: x86/glue_helper - use le128 instead of u128 for CTR mode 'u128' currently used for CTR mode is on little-endian 'long long' swapped and would require extra swap operations by SSE/AVX code. Use of le128 instead of u128 allows IV calculations to be done with vector registers easier. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-24 21:10:54 +08:00
Jussi Kivilinna	32bec973a8	crypto: aesni - fix XTS mode on x86-32, add wrapper function for asmlinkage aesni_enc() Calling convention for internal functions and 'asmlinkage' functions is different on x86-32. Therefore do not directly cast aesni_enc as XTS tweak function, but use wrapper function in between. Fixes crash with "XTS + aesni_intel + x86-32" combination. Cc: stable@vger.kernel.org Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-18 14:01:33 -07:00
Tim Chen	6a8ce1ef39	crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction This patch adds the crc_pcl function that calculates CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature. This will provide speedup over using CRC32 instruction only. The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and kernel_fpu_end and incur some overhead. So the new crc_pcl function is only invoked for buffer size of 512 bytes or more. Larger sized buffers will expect to see greater speedup. This feature is best used coupled with eager_fpu which reduces the kernel_fpu_begin/end overhead. For buffer size of 1K the speedup is around 1.6x and for buffer size greater than 4K, the speedup is around 3x compared to original implementation in crc32c-intel module. Test was performed on Sandy Bridge based platform with constant frequency set for cpu. A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-15 22:18:24 +08:00
Tim Chen	35b80920d4	crypto: crc32c - Rename crc32c-intel.c to crc32c-intel_glue.c This patch renames the crc32c-intel.c file to crc32c-intel_glue.c file in preparation for linking with the new crc32c-pcl-intel-asm.S file, which contains optimized crc32c calculation based on PCLMULQDQ instruction. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-15 22:18:22 +08:00
Jussi Kivilinna	c9f97a27ce	crypto: x86/glue_helper - fix storing of new IV in CBC encryption Glue_helper incorrectly XORs new IV over old IV at end of CBC encryption function when it should store. This causes CBC encryption to give incorrect output on multi-page encryption requests. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-10-04 17:45:29 +08:00
Jussi Kivilinna	200429cc63	crypto: cast5/avx - fix storing of new IV in CBC encryption cast5/avx incorrectly XORs new IV over old IV at end of CBC encryption function when it should store. This causes CBC encryption to give incorrect output on multi-page encryption requests. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-09-27 15:50:40 +08:00
Jussi Kivilinna	1ffb72a39a	crypto: camellia-x86_64 - fix sparse warnings (constant is so big) Fix "constant 0xXXXXXXXXXXXXXXXX is so big it's unsigned long" sparse warnings. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-09-07 04:17:05 +08:00
Jussi Kivilinna	c09220e1bc	crypto: cast6-avx - tune assembler code for more performance Patch replaces 'movb' instructions with 'movzbl' to break false register dependencies, interleaves instructions better for out-of-order scheduling and merges constant 16-bit rotation with round-key variable rotation. tcrypt ECB results: Intel Core i5-2450M: size old-vs-new new-vs-generic old-vs-generic enc dec enc dec enc dec 256 1.13x 1.19x 2.05x 2.17x 1.82x 1.82x 1k 1.18x 1.21x 2.26x 2.33x 1.93x 1.93x 8k 1.19x 1.19x 2.32x 2.33x 1.95x 1.95x [v2] - Do instruction interleaving another way to avoid adding new FPU<=>CPU register moves as these cause performance drop on Bulldozer. - Improvements to round-key variable rotation handling. - Further interleaving improvements for better out-of-order scheduling. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-09-07 04:17:05 +08:00
Jussi Kivilinna	ddaea7869d	crypto: cast5-avx - tune assembler code for more performance Patch replaces 'movb' instructions with 'movzbl' to break false register dependencies, interleaves instructions better for out-of-order scheduling and merges constant 16-bit rotation with round-key variable rotation. tcrypt ECB results (128bit key): Intel Core i5-2450M: size old-vs-new new-vs-generic old-vs-generic enc dec enc dec enc dec 256 1.18x 1.18x 2.45x 2.47x 2.08x 2.10x 1k 1.20x 1.20x 2.73x 2.73x 2.28x 2.28x 8k 1.20x 1.19x 2.73x 2.73x 2.28x 2.29x [v2] - Do instruction interleaving another way to avoid adding new FPU<=>CPU register moves as these cause performance drop on Bulldozer. - Improvements to round-key variable rotation handling. - Further interleaving improvements for better out-of-order scheduling. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-09-07 04:17:04 +08:00
Jussi Kivilinna	f94a73f8dd	crypto: twofish-avx - tune assembler code for more performance Patch replaces 'movb' instructions with 'movzbl' to break false register dependencies and interleaves instructions better for out-of-order scheduling. Tested on Intel Core i5-2450M and AMD FX-8100. tcrypt ECB results: Intel Core i5-2450M: size old-vs-new new-vs-3way old-vs-3way enc dec enc dec enc dec 256 1.12x 1.13x 1.36x 1.37x 1.21x 1.22x 1k 1.14x 1.14x 1.48x 1.49x 1.29x 1.31x 8k 1.14x 1.14x 1.50x 1.52x 1.32x 1.33x AMD FX-8100: size old-vs-new new-vs-3way old-vs-3way enc dec enc dec enc dec 256 1.10x 1.11x 1.01x 1.01x 0.92x 0.91x 1k 1.11x 1.12x 1.08x 1.07x 0.97x 0.96x 8k 1.11x 1.13x 1.10x 1.08x 0.99x 0.97x [v2] - Do instruction interleaving another way to avoid adding new FPU<=>CPU register moves as these cause performance drop on Bulldozer. - Further interleaving improvements for better out-of-order scheduling. Tested-by: Borislav Petkov <bp@alien8.de> Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-09-07 04:17:04 +08:00
Jussi Kivilinna	023af60825	crypto: aesni_intel - improve lrw and xts performance by utilizing parallel AES-NI hardware pipelines Use parallel LRW and XTS encryption facilities to better utilize AES-NI hardware pipelines and gain extra performance. Tcrypt benchmark results (async), old vs new ratios: Intel Core i5-2450M CPU (fam: 6, model: 42, step: 7) aes:128bit lrw:256bit xts:256bit size lrw-enc lrw-dec xts-dec xts-dec 16B 0.99x 1.00x 1.22x 1.19x 64B 1.38x 1.50x 1.58x 1.61x 256B 2.04x 2.02x 2.27x 2.29x 1024B 2.56x 2.54x 2.89x 2.92x 8192B 2.85x 2.99x 3.40x 3.23x aes:192bit lrw:320bit xts:384bit size lrw-enc lrw-dec xts-dec xts-dec 16B 1.08x 1.08x 1.16x 1.17x 64B 1.48x 1.54x 1.59x 1.65x 256B 2.18x 2.17x 2.29x 2.28x 1024B 2.67x 2.67x 2.87x 3.05x 8192B 2.93x 2.84x 3.28x 3.33x aes:256bit lrw:348bit xts:512bit size lrw-enc lrw-dec xts-dec xts-dec 16B 1.07x 1.07x 1.18x 1.19x 64B 1.56x 1.56x 1.70x 1.71x 256B 2.22x 2.24x 2.46x 2.46x 1024B 2.76x 2.77x 3.13x 3.05x 8192B 2.99x 3.05x 3.40x 3.30x Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Reviewed-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-08-20 16:28:10 +08:00
Johannes Goetzfried	4ea1277d30	crypto: cast6 - add x86_64/avx assembler implementation This patch adds a x86_64/avx assembler implementation of the Cast6 block cipher. The implementation processes eight blocks in parallel (two 4 block chunk AVX operations). The table-lookups are done in general-purpose registers. For small blocksizes the functions from the generic module are called. A good performance increase is provided for blocksizes greater or equal to 128B. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: Intel Core i5-2500 CPU (fam:6, model:42, step:7) cast6-avx-x86_64 vs. cast6-generic 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 1.00x 1.01x 1.01x 0.99x 0.97x 0.98x 1.01x 0.96x 0.98x 64B 0.98x 0.99x 1.02x 1.01x 0.99x 1.00x 1.01x 0.99x 1.00x 0.99x 256B 1.77x 1.84x 0.99x 1.85x 1.77x 1.77x 1.70x 1.74x 1.69x 1.72x 1024B 1.93x 1.95x 0.99x 1.96x 1.93x 1.93x 1.84x 1.85x 1.89x 1.87x 8192B 1.91x 1.95x 0.99x 1.97x 1.95x 1.91x 1.86x 1.87x 1.93x 1.90x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 0.99x 1.02x 1.01x 0.98x 0.99x 1.00x 1.00x 0.98x 0.98x 64B 0.98x 0.99x 1.01x 1.00x 1.00x 1.00x 1.01x 1.01x 0.97x 1.00x 256B 1.77x 1.83x 1.00x 1.86x 1.79x 1.78x 1.70x 1.76x 1.71x 1.69x 1024B 1.92x 1.95x 0.99x 1.96x 1.93x 1.93x 1.83x 1.86x 1.89x 1.87x 8192B 1.94x 1.95x 0.99x 1.97x 1.95x 1.95x 1.87x 1.87x 1.93x 1.91x Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-08-01 17:47:30 +08:00
Johannes Goetzfried	4d6d6a2c85	crypto: cast5 - add x86_64/avx assembler implementation This patch adds a x86_64/avx assembler implementation of the Cast5 block cipher. The implementation processes sixteen blocks in parallel (four 4 block chunk AVX operations). The table-lookups are done in general-purpose registers. For small blocksizes the functions from the generic module are called. A good performance increase is provided for blocksizes greater or equal to 128B. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: Intel Core i5-2500 CPU (fam:6, model:42, step:7) cast5-avx-x86_64 vs. cast5-generic 64bit key: size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec 16B 0.99x 0.99x 1.00x 1.00x 1.02x 1.01x 64B 1.00x 1.00x 0.98x 1.00x 1.01x 1.02x 256B 2.03x 2.01x 0.95x 2.11x 2.12x 2.13x 1024B 2.30x 2.24x 0.95x 2.29x 2.35x 2.35x 8192B 2.31x 2.27x 0.95x 2.31x 2.39x 2.39x 128bit key: size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec 16B 0.99x 0.99x 1.00x 1.00x 1.01x 1.01x 64B 1.00x 1.00x 0.98x 1.01x 1.02x 1.01x 256B 2.17x 2.13x 0.96x 2.19x 2.19x 2.19x 1024B 2.29x 2.32x 0.95x 2.34x 2.37x 2.38x 8192B 2.35x 2.32x 0.95x 2.35x 2.39x 2.39x Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-08-01 17:47:30 +08:00
Jussi Kivilinna	7af6c24568	crypto: arch/x86 - cleanup - remove unneeded crypto_alg.cra_list initializations Initialization of cra_list is currently mixed, most ciphers initialize this field and most shashes do not. Initialization however is not needed at all since cra_list is initialized/overwritten in __crypto_register_alg() with list_add(). Therefore perform cleanup to remove all unneeded initializations of this field in 'arch/x86/crypto/'. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-08-01 17:47:27 +08:00
Johannes Goetzfried	a43478863b	crypto: twofish-avx - remove useless instruction The register %rdx is written, but never read till the end of the encryption routine. Therefore let's delete the useless instruction. Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-07-11 11:08:30 +08:00
Milan Broz	bf084d8f6e	crypto: aesni-intel - fix wrong kfree pointer kfree(new_key_mem) in rfc4106_set_key() should be called on malloced pointer, not on aligned one, otherwise it can cause invalid pointer on free. (Seen at least once when running tcrypt tests with debug kernel.) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-07-11 11:06:13 +08:00
Jussi Kivilinna	70ef2601fe	crypto: move arch/x86/include/asm/aes.h to arch/x86/include/asm/crypto/ Move AES header to the new asm/crypto directory. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:03 +08:00
Jussi Kivilinna	d4af0e9d6e	crypto: move arch/x86/include/asm/serpent-{sse2\|avx}.h to arch/x86/include/asm/crypto/ Move serpent crypto headers to the new asm/crypto/ directory. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:02 +08:00
Jussi Kivilinna	a7378d4e55	crypto: twofish-avx - remove duplicated glue code and use shared glue code from glue_helper Now that shared glue code is available, convert twofish-avx to use it. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:02 +08:00
Jussi Kivilinna	414cb5e7cc	crypto: twofish-x86_64-3way - remove duplicated glue code and use shared glue code from glue_helper Now that shared glue code is available, convert twofish-x86_64-3way to use it. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:02 +08:00
Jussi Kivilinna	964263afdc	crypto: camellia-x86_64 - remove duplicated glue code and use shared glue code from glue_helper Now that shared glue code is available, convert camellia-x86_64 to use it. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:02 +08:00
Jussi Kivilinna	1d0debbd46	crypto: serpent-avx: remove duplicated glue code and use shared glue code from glue_helper Now that shared glue code is available, convert serpent-avx to use it. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:01 +08:00
Jussi Kivilinna	596d875052	crypto: serpent-sse2 - split generic glue code to new helper module Now that serpent-sse2 glue code has been made generic, it can be split to separate module. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:01 +08:00
Jussi Kivilinna	e81792fbc2	crypto: serpent-sse2 - prepare serpent-sse2 glue code into generic x86 glue code for 128bit block ciphers Block cipher implementations in arch/x86/crypto/ contain common glue code that is currently duplicated in each module (camellia-x86_64, twofish-x86_64-3way, twofish-avx, serpent-sse2 and serpent-avx). This patch prepares serpent-sse2 glue into generic glue code for all 128bit block ciphers to use in arch/x86/crypto. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:01 +08:00
Jussi Kivilinna	a9629d7142	crypto: aes_ni - change to use shared ablk_* functions Remove duplicate ablk_* functions and make use of ablk_helper module instead. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:01 +08:00
Jussi Kivilinna	30a0400882	crypto: twofish-avx - change to use shared ablk_* functions Remove duplicate ablk_* functions and make use of ablk_helper module instead. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:01 +08:00
Jussi Kivilinna	ffaf915632	crypto: ablk_helper - move ablk_* functions from serpent-sse2/avx glue code to shared module Move ablk-* functions to separate module to share common code between cipher implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-06-27 14:42:00 +08:00

1 2 3 4 5 ...

349 Commits