linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-24 14:47:31 +07:00

Author	SHA1	Message	Date
Ard Biesheuvel	6d6254d728	crypto: arm64/ghash-ce - add non-SIMD scalar fallback The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar C code that can be invoked in that case. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:16 +08:00
Ard Biesheuvel	5d3d9c8bda	crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers The PMULL based CRC32 implementation already contains code based on the separate, optional CRC32 instructions to fallback to when operating on small quantities of data. We can expose these routines directly on systems that lack the 64x64 PMULL instructions but do implement the CRC32 ones, which makes the driver that is based solely on those CRC32 instructions redundant. So remove it. Note that this aligns arm64 with ARM, whose accelerated CRC32 driver also combines the CRC32 extension based and the PMULL based versions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Matthias Brugger <mbrugger@suse.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-02-11 17:50:38 +08:00
Ard Biesheuvel	12fcd92305	crypto: arm64/aes - replace scalar fallback with plain NEON fallback The new bitsliced NEON implementation of AES uses a fallback in two places: CBC encryption (which is strictly sequential, whereas this driver can only operate efficiently on 8 blocks at a time), and the XTS tweak generation, which involves encrypting a single AES block with a different key schedule. The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback, given that it is faster than scalar on low end cores (which is what the NEON implementations target, since high end cores have dedicated instructions for AES), and shows similar behavior in terms of D-cache footprint and sensitivity to cache timing attacks. So switch the fallback handling to the plain NEON driver. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-02-03 18:16:20 +08:00
Ard Biesheuvel	1abee99eaf	crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64 This is a reimplementation of the NEON version of the bit-sliced AES algorithm. This code is heavily based on Andy Polyakov's OpenSSL version for ARM, which is also available in the kernel. This is an alternative for the existing NEON implementation for arm64 authored by me, which suffers from poor performance due to its reliance on the pathologically slow four register variant of the tbl/tbx NEON instruction. This version is about ~30% () faster than the generic C code, but only in cases where the input can be 8x interleaved (this is a fundamental property of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR are implemented. (The significance of ECB is that it could potentially be used by other chaining modes) Measured on Cortex-A57. Note that this is still an order of magnitude slower than the implementations that use the dedicated AES instructions introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-01-13 00:26:51 +08:00
Ard Biesheuvel	bed593c0e8	crypto: arm64/aes - add scalar implementation This adds a scalar implementation of AES, based on the precomputed tables that are exposed by the generic AES code. Since rotates are cheap on arm64, this implementation only uses the 4 core tables (of 1 KB each), and avoids the prerotated ones, reducing the D-cache footprint by 75%. On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster than the generic C code. (Note that this is still >13x slower than the code that uses the optional ARMv8 Crypto Extensions, which manages <1 cycles per byte.) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-01-13 00:26:49 +08:00
Ard Biesheuvel	b7171ce9eb	crypto: arm64/chacha20 - implement NEON version based on SSE3 code This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-01-13 00:26:48 +08:00
Herbert Xu	5386e5d1f8	Revert "crypto: arm64/ARM: NEON accelerated ChaCha20" This patch reverts the following commits: `8621caa0d4` `8096667273` I should not have applied them because they had already been obsoleted by a subsequent patch series. They also cause a build failure because of the subsequent commit `9ae433bc79`. Fixes: `9ae433bc79` ("crypto: chacha20 - convert generic and...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-28 17:39:26 +08:00
Ard Biesheuvel	8621caa0d4	crypto: arm64/chacha20 - implement NEON version based on SSE3 code This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-27 17:47:28 +08:00
Ard Biesheuvel	8fefde90e9	crypto: arm64/crc32 - accelerated support based on x86 SSE implementation This is a combination of the the Intel algorithm implemented using SSE and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in version 8 of the architecture. Two versions of the above combo are provided, one for CRC32 and one for CRC32C. The PMULL/NEON algorithm is faster, but operates on blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-07 20:01:22 +08:00
Ard Biesheuvel	6ef5737f39	crypto: arm64/crct10dif - port x86 SSE implementation to arm64 This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16 byte aligned (but of any size) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-07 20:01:17 +08:00
Herbert Xu	585b5fa63d	crypto: arm/aes - Select SIMD in Kconfig The skcipher conversion for ARM missed the select on CRYPTO_SIMD, causing build failures if SIMD was not otherwise enabled. Fixes: `da40e7a4ba` ("crypto: aes-ce - Convert to skcipher") Fixes: `211f41af53` ("crypto: aesbs - Convert to skcipher") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-11-29 16:11:14 +08:00
Ard Biesheuvel	7918ecef07	crypto: arm64/sha2 - integrate OpenSSL implementations of SHA256/SHA512 This integrates both the accelerated scalar and the NEON implementations of SHA-224/256 as well as SHA-384/512 from the OpenSSL project. Relative performance compared to the respective generic C versions: \| SHA256-scalar \| SHA256-NEON* \| SHA512 \| ------------+-----------------+--------------+----------+ Cortex-A53 \| 1.63x \| 1.63x \| 2.34x \| Cortex-A57 \| 1.43x \| 1.59x \| 1.95x \| Cortex-A73 \| 1.26x \| 1.56x \| ? \| The core crypto code was authored by Andy Polyakov of the OpenSSL project, in collaboration with whom the upstream code was adapted so that this module can be built from the same version of sha512-armv8.pl. The version in this patch was taken from OpenSSL commit 32bbb62ea634 ("sha/asm/sha512-armv8.pl: fix big-endian support in __KERNEL__ case.") * The core SHA algorithm is fundamentally sequential, but there is a secondary transformation involved, called the schedule update, which can be performed independently. The NEON version of SHA-224/SHA-256 only implements this part of the algorithm using NEON instructions, the sequential part is always done using scalar instructions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-11-28 19:58:05 +08:00
Linus Torvalds	e3aa91a7cb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: - The crypto API is now documented :) - Disallow arbitrary module loading through crypto API. - Allow get request with empty driver name through crypto_user. - Allow speed testing of arbitrary hash functions. - Add caam support for ctr(aes), gcm(aes) and their derivatives. - nx now supports concurrent hashing properly. - Add sahara support for SHA1/256. - Add ARM64 version of CRC32. - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits) crypto: tcrypt - Allow speed testing of arbitrary hash functions crypto: af_alg - add user space interface for AEAD crypto: qat - fix problem with coalescing enable logic crypto: sahara - add support for SHA1/256 crypto: sahara - replace tasklets with kthread crypto: sahara - add support for i.MX53 crypto: sahara - fix spinlock initialization crypto: arm - replace memset by memzero_explicit crypto: powerpc - replace memset by memzero_explicit crypto: sha - replace memset by memzero_explicit crypto: sparc - replace memset by memzero_explicit crypto: algif_skcipher - initialize upon init request crypto: algif_skcipher - removed unneeded code crypto: algif_skcipher - Fixed blocking recvmsg crypto: drbg - use memzero_explicit() for clearing sensitive data crypto: drbg - use MODULE_ALIAS_CRYPTO crypto: include crypto- module prefix in template crypto: user - add MODULE_ALIAS crypto: sha-mb - remove a bogus NULL check crytpo: qat - Fix 64 bytes requests ...	2014-12-13 13:33:26 -08:00
Yazen Ghannam	f6f203faa3	crypto: crc32 - Add ARM64 CRC32 hw accelerated module This module registers a crc32 algorithm and a crc32c algorithm that use the optional CRC32 and CRC32C instructions in ARMv8. Tested on AMD Seattle. Improvement compared to crc32c-generic algorithm: TCRYPT CRC32C speed test shows ~450% speedup. Simple dd write tests to btrfs filesystem show ~30% speedup. Signed-off-by: Yazen Ghannam <yazen.ghannam@linaro.org> Acked-by: Steve Capper <steve.capper@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-11-20 22:39:39 +08:00
Ard Biesheuvel	12ac3efe74	arm64/crypto: use crypto instructions to generate AES key schedule This patch implements the AES key schedule generation using ARMv8 Crypto Instructions. It replaces the table based C implementation in aes_generic.ko, which means we can drop the dependency on that module. Tested-by: Steve Capper <steve.capper@linaro.org> Acked-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2014-11-06 17:25:28 +00:00
Ard Biesheuvel	49788fe2a1	arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions This adds ARMv8 implementations of AES in ECB, CBC, CTR and XTS modes, both for ARMv8 with Crypto Extensions and for plain ARMv8 NEON. The Crypto Extensions version can only run on ARMv8 implementations that have support for these optional extensions. The plain NEON version is a table based yet time invariant implementation. All S-box substitutions are performed in parallel, leveraging the wide range of ARMv8's tbl/tbx instructions, and the huge NEON register file, which can comfortably hold the entire S-box and still have room to spare for doing the actual computations. The key expansion routines were borrowed from aes_generic. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-05-14 10:04:16 -07:00
Ard Biesheuvel	a3fd82105b	arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions This patch adds support for the AES-CCM encryption algorithm for CPUs that have support for the AES part of the ARM v8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-05-14 10:04:15 -07:00
Ard Biesheuvel	317f2f750d	arm64/crypto: AES using ARMv8 Crypto Extensions This patch adds support for the AES symmetric encryption algorithm for CPUs that have support for the AES part of the ARM v8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-05-14 10:04:11 -07:00
Ard Biesheuvel	fdd2389457	arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions This is a port to ARMv8 (Crypto Extensions) of the Intel implementation of the GHASH Secure Hash (used in the Galois/Counter chaining mode). It relies on the optional PMULL/PMULL2 instruction (polynomial multiply long, what Intel call carry-less multiply). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-05-14 10:04:07 -07:00
Ard Biesheuvel	6ba6c74dfc	arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions This patch adds support for the SHA-224 and SHA-256 Secure Hash Algorithms for CPUs that have support for the SHA-2 part of the ARM v8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-05-14 10:04:01 -07:00
Ard Biesheuvel	2c98833a42	arm64/crypto: SHA-1 using ARMv8 Crypto Extensions This patch adds support for the SHA-1 Secure Hash Algorithm for CPUs that have support for the SHA-1 part of the ARM v8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>	2014-05-14 10:03:17 -07:00

21 Commits