This replaces the scalar AES cipher that originates in the OpenSSL project
with a new implementation that is ~15% (*) faster (on modern cores), and
reuses the lookup tables and the key schedule generation routines from the
generic C implementation (which is usually compiled in anyway due to
networking and other subsystems depending on it).
Note that the bit sliced NEON code for AES still depends on the scalar cipher
that this patch replaces, so it is not removed entirely yet.
* On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte
for 128-bit keys.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch reverts the following commits:
8621caa0d48096667273
I should not have applied them because they had already been
obsoleted by a subsequent patch series. They also cause a build
failure because of the subsequent commit 9ae433bc79.
Fixes: 9ae433bc79 ("crypto: chacha20 - convert generic and...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a combination of the the Intel algorithm implemented using SSE
and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
version 8 of the architecture. Two versions of the above combo are
provided, one for CRC32 and one for CRC32C.
The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on buffers that are 16 byte aligned (but of any size)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The variable aes_simd_algs should be static. In fact if it isn't
it causes build errors when multiple copies of aes-ce-glue.c are
built into the kernel.
Fixes: da40e7a4ba ("crypto: aes-ce - Convert to skcipher")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CBC encryption routine should use the encryption round keys, not
the decryption round keys.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds one more missing SIMD select for AES_ARM_BS. It
also changes selects on ALGAPI to BLKCIPHER.
Fixes: 211f41af53 ("crypto: aesbs - Convert to skcipher")
Reported-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The skcipher conversion for ARM missed the select on CRYPTO_SIMD,
causing build failures if SIMD was not otherwise enabled.
Fixes: da40e7a4ba ("crypto: aes-ce - Convert to skcipher")
Fixes: 211f41af53 ("crypto: aesbs - Convert to skcipher")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES key schedule generation is mostly endian agnostic, with the
exception of the rotation and the incorporation of the round constant
at the start of each round. So implement a big endian specific version
of that part to make the whole routine big endian compatible.
Fixes: 86464859cc ("crypto: arm - AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES-CTR glue code avoids calling into the blkcipher API for the
tail portion of the walk, by comparing the remainder of walk.nbytes
modulo AES_BLOCK_SIZE with the residual nbytes, and jumping straight
into the tail processing block if they are equal. This tail processing
block checks whether nbytes != 0, and does nothing otherwise.
However, in case of an allocation failure in the blkcipher layer, we
may enter this code with walk.nbytes == 0, while nbytes > 0. In this
case, we should not dereference the source and destination pointers,
since they may be NULL. So instead of checking for nbytes != 0, check
for (walk.nbytes % AES_BLOCK_SIZE) != 0, which implies the former in
non-error conditions.
Fixes: 86464859cc ("crypto: arm - AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions")
Cc: stable@vger.kernel.org
Reported-by: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The fact that the internal synchrous hash implementation is called
"ghash" like the publicly visible one is causing the testmgr code
to misidentify it as an algorithm that requires testing at boottime.
So rename it to "__ghash" to prevent this.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Since commit 8996eafdcb ("crypto: ahash - ensure statesize is non-zero"),
all ahash drivers are required to implement import()/export(), and must have
a non-zero statesize. Fix this for the ARM Crypto Extensions GHASH
implementation.
Fixes: 8996eafdcb ("crypto: ahash - ensure statesize is non-zero")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The ARMv7 NEON module is explicitly built in ARM mode, which is not
supported by the Thumb2 kernel. So remove the explicit override, and
leave it up to the build environment to decide whether the core SHA1
routines are assembled as ARM or as Thumb2 code.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes an old bug where requests can be reordered because
some are processed by cryptd while others are processed directly
in softirq context.
The fix is to always postpone to cryptd if there are currently
requests outstanding from the same tfm.
This patch also removes the redundant use of cryptd in the async
init function as init never touches the FPU.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto update from Herbert Xu:
"Here is the crypto update for 4.6:
API:
- Convert remaining crypto_hash users to shash or ahash, also convert
blkcipher/ablkcipher users to skcipher.
- Remove crypto_hash interface.
- Remove crypto_pcomp interface.
- Add crypto engine for async cipher drivers.
- Add akcipher documentation.
- Add skcipher documentation.
Algorithms:
- Rename crypto/crc32 to avoid name clash with lib/crc32.
- Fix bug in keywrap where we zero the wrong pointer.
Drivers:
- Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
- Add PIC32 hwrng driver.
- Support BCM6368 in bcm63xx hwrng driver.
- Pack structs for 32-bit compat users in qat.
- Use crypto engine in omap-aes.
- Add support for sama5d2x SoCs in atmel-sha.
- Make atmel-sha available again.
- Make sahara hashing available again.
- Make ccp hashing available again.
- Make sha1-mb available again.
- Add support for multiple devices in ccp.
- Improve DMA performance in caam.
- Add hashing support to rockchip"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
crypto: qat - remove redundant arbiter configuration
crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
crypto: qat - Change the definition of icp_qat_uof_regtype
hwrng: exynos - use __maybe_unused to hide pm functions
crypto: ccp - Add abstraction for device-specific calls
crypto: ccp - CCP versioning support
crypto: ccp - Support for multiple CCPs
crypto: ccp - Remove check for x86 family and model
crypto: ccp - memset request context to zero during import
lib/mpi: use "static inline" instead of "extern inline"
lib/mpi: avoid assembler warning
hwrng: bcm63xx - fix non device tree compatibility
crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
crypto: qat - The AE id should be less than the maximal AE number
lib/mpi: Endianness fix
crypto: rockchip - add hash support for crypto engine in rk3288
crypto: xts - fix compile errors
crypto: doc - add skcipher API documentation
crypto: doc - update AEAD AD handling
...
Commit 28856a9e52 missed the addition of the crypto/xts.h include file
for different architecture-specific AES implementations.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The patch centralizes the XTS key check logic into the service function
xts_check_key which is invoked from the different XTS implementations.
With this, the XTS implementations in ARM, ARM64, PPC and S390 have now
a sanity check for the XTS keys similar to the other arches.
In addition, this service function received a check to ensure that the
key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the
check is not present in the standards defining XTS, it is only enforced
in FIPS mode of the kernel.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ECB modes don't use an initialization vector. The kernel
/proc/crypto interface doesn't reflect this properly.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This trims off a couple of instructions of the total size of the
core AES transform by reordering the final branch in the AES-192
code path with the rounds that are performed regardless of whether
the branch is taken or not. Other than the slight size reduction,
this has no performance benefit.
Fix up a comment regarding the prototype of this function while
we're at it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This replaces the SHA-512 NEON module with the faster and more
versatile implementation from the OpenSSL project. It consists
of both a NEON and a generic ASM version of the core SHA-512
transform, where the NEON version reverts to the ASM version
when invoked in non-process context.
This patch is based on the OpenSSL upstream version b1a5d1c65208
of sha512-armv4.pl, which can be found here:
https://git.openssl.org/gitweb/?p=openssl.git;h=b1a5d1c65208
Performance relative to the generic implementation (measured
using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
KVM):
input size block size asm neon old neon
16 16 1.39 2.54 2.21
64 16 1.32 2.33 2.09
64 64 1.38 2.53 2.19
256 16 1.31 2.28 2.06
256 64 1.38 2.54 2.25
256 256 1.40 2.77 2.39
1024 16 1.29 2.22 2.01
1024 256 1.40 2.82 2.45
1024 1024 1.41 2.93 2.53
2048 16 1.33 2.21 2.00
2048 256 1.40 2.84 2.46
2048 1024 1.41 2.96 2.55
2048 2048 1.41 2.98 2.56
4096 16 1.34 2.20 1.99
4096 256 1.40 2.84 2.46
4096 1024 1.41 2.97 2.56
4096 4096 1.41 3.01 2.58
8192 16 1.34 2.19 1.99
8192 256 1.40 2.85 2.47
8192 1024 1.41 2.98 2.56
8192 4096 1.41 2.71 2.59
8192 8192 1.51 3.51 2.69
Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto update from Herbert Xu:
"Here is the crypto update for 4.1:
New interfaces:
- user-space interface for AEAD
- user-space interface for RNG (i.e., pseudo RNG)
New hashes:
- ARMv8 SHA1/256
- ARMv8 AES
- ARMv8 GHASH
- ARM assembler and NEON SHA256
- MIPS OCTEON SHA1/256/512
- MIPS img-hash SHA1/256 and MD5
- Power 8 VMX AES/CBC/CTR/GHASH
- PPC assembler AES, SHA1/256 and MD5
- Broadcom IPROC RNG driver
Cleanups/fixes:
- prevent internal helper algos from being exposed to user-space
- merge common code from assembly/C SHA implementations
- misc fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (169 commits)
crypto: arm - workaround for building with old binutils
crypto: arm/sha256 - avoid sha256 code on ARMv7-M
crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer
crypto: x86/sha256_ssse3 - move SHA-224/256 SSSE3 implementation to base layer
crypto: x86/sha1_ssse3 - move SHA-1 SSSE3 implementation to base layer
crypto: arm64/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer
crypto: arm64/sha1-ce - move SHA-1 ARMv8 implementation to base layer
crypto: arm/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer
crypto: arm/sha256 - move SHA-224/256 ASM/NEON implementation to base layer
crypto: arm/sha1-ce - move SHA-1 ARMv8 implementation to base layer
crypto: arm/sha1_neon - move SHA-1 NEON implementation to base layer
crypto: arm/sha1 - move SHA-1 ARM asm implementation to base layer
crypto: sha512-generic - move to generic glue implementation
crypto: sha256-generic - move to generic glue implementation
crypto: sha1-generic - move to generic glue implementation
crypto: sha512 - implement base layer for SHA-512
crypto: sha256 - implement base layer for SHA-256
crypto: sha1 - implement base layer for SHA-1
crypto: api - remove instance when test failed
crypto: api - Move alg ref count init to crypto_check_alg
...
Old versions of binutils (before 2.23) do not yet understand the
crypto-neon-fp-armv8 fpu instructions, and an attempt to build these
files results in a build failure:
arch/arm/crypto/aes-ce-core.S:133: Error: selected processor does not support ARM mode `vld1.8 {q10-q11},[ip]!'
arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aese.8 q0,q8'
arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aesmc.8 q0,q0'
arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aese.8 q0,q9'
arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aesmc.8 q0,q0'
Since the affected versions are still in widespread use, and this breaks
'allmodconfig' builds, we should try to at least get a successful kernel
build. Unfortunately, I could not come up with a way to make the Kconfig
symbol depend on the binutils version, which would be the nicest solution.
Instead, this patch uses the 'as-instr' Kbuild macro to find out whether
the support is present in the assembler, and otherwise emits a non-fatal
warning indicating which selected modules could not be built.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: http://storage.kernelci.org/next/next-20150410/arm-allmodconfig/build.log
Fixes: 864cbeed4a ("crypto: arm - add support for SHA1 using ARMv8 Crypto Instructions")
[ard.biesheuvel:
- omit modules entirely instead of building empty ones if binutils is too old
- update commit log accordingly]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The sha256 assembly implementation can deal with all architecture levels
from ARMv4 to ARMv7-A, but not with ARMv7-M. Enabling it in an
ARMv7-M kernel results in this build failure:
arm-linux-gnueabi-ld: error: arch/arm/crypto/sha256_glue.o: Conflicting architecture profiles M/A
arm-linux-gnueabi-ld: failed to merge target specific data of file arch/arm/crypto/sha256_glue.o
This adds a Kconfig dependency to prevent the code from being disabled
for ARMv7-M.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all ARMv8 AES helper ciphers as internal ciphers to prevent
them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all NEON bit sliced AES helper ciphers as internal ciphers to
prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Flag all GHASH ARMv8 vmull.p64 helper ciphers as internal ciphers
to prevent them from being called by normal users.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This fixes a bug in the new v8 Crypto Extensions GHASH code
that only manifests itself in big-endian mode.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This implements the GHASH hash algorithm (as used by the GCM AEAD
chaining mode) using the AArch32 version of the 64x64 to 128 bit
polynomial multiplication instruction (vmull.p64) that is part of
the ARMv8 Crypto Extensions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This implements the ECB, CBC, CTR and XTS asynchronous block ciphers
using the AArch32 versions of the ARMv8 Crypto Extensions for AES.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This implements the SHA-224/256 secure hash algorithm using the AArch32
versions of the ARMv8 Crypto Extensions for SHA2.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This implements the SHA1 secure hash algorithm using the AArch32
versions of the ARMv8 Crypto Extensions for SHA1.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This moves all Kconfig symbols defined in crypto/Kconfig that depend
on CONFIG_ARM to a dedicated Kconfig file in arch/arm/crypto, which is
where the code that implements those features resides as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This updates the bit sliced AES module to the latest version in the
upstream OpenSSL repository (e620e5ae37bc). This is needed to fix a
bug in the XTS decryption path, where data chunked in a certain way
could trigger the ciphertext stealing code, which is not supposed to
be active in the kernel build (The kernel implementation of XTS only
supports round multiples of the AES block size of 16 bytes, whereas
the conformant OpenSSL implementation of XTS supports inputs of
arbitrary size by applying ciphertext stealing). This is fixed in
the upstream version by adding the missing #ifndef XTS_CHAIN_TWEAK
around the offending instructions.
The upstream code also contains the change applied by Russell to
build the code unconditionally, i.e., even if __LINUX_ARM_ARCH__ < 7,
but implemented slightly differently.
Cc: stable@vger.kernel.org
Fixes: e4e7f10bfc ("ARM: add support for bit sliced AES using NEON instructions")
Reported-by: Adrian Kotelba <adrian.kotelba@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Memset on a local variable may be removed when it is called just before the
variable goes out of scope. Using memzero_explicit defeats this
optimization. A simplified version of the semantic patch that makes this
change is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
type T;
@@
{
... when any
T x[...];
... when any
when exists
- memset
+ memzero_explicit
(x,
-0,
...)
... when != x
when strict
}
// </smpl>
This change was suggested by Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This prefixes all crypto module loading with "crypto-" so we never run
the risk of exposing module auto-loading to userspace via a crypto API,
as demonstrated by Mathias Krause:
https://lkml.org/lkml/2013/3/4/70
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This tweaks the SHA-1 NEON code slightly so it works correctly under big
endian, and removes the Kconfig condition preventing it from being
selected if CONFIG_CPU_BIG_ENDIAN is set.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull ARM updates from Russell King:
"Included in this update:
- perf updates from Will Deacon:
The main changes are callchain stability fixes from Jean Pihet and
event mapping and PMU name rework from Mark Rutland
The latter is preparatory work for enabling some code re-use with
arm64 in the future.
- updates for nommu from Uwe Kleine-König:
Two different fixes for the same problem making some ARM nommu
configurations not boot since 3.6-rc1. The problem is that
user_addr_max returned the biggest available RAM address which
makes some copy_from_user variants fail to read from XIP memory.
- deprecate legacy OMAP DMA API, in preparation for it's removal.
The popular drivers have been converted over, leaving a very small
number of rarely used drivers, which hopefully can be converted
during the next cycle with a bit more visibility (and hopefully
people popping out of the woodwork to help test)
- more tweaks for BE systems, particularly with the kernel image
format. In connection with this, I've cleaned up the way we
generate the linker script for the decompressor.
- removal of hard-coded assumptions of the kernel stack size, making
everywhere depend on the value of THREAD_SIZE_ORDER.
- MCPM updates from Nicolas Pitre.
- Make it easier for proper CPU part number checks (which should
always include the vendor field).
- Assembly code optimisation - use the "bx" instruction when
returning from a function on ARMv6+ rather than "mov pc, reg".
- Save the last kernel misaligned fault location and report it via
the procfs alignment file.
- Clean up the way we create the initial stack frame, which is a
repeated pattern in several different locations.
- Support for 8-byte get_user(), needed for some DRM implementations.
- mcs locking from Will Deacon.
- Save and restore a few more Cortex-A9 registers (for errata
workarounds)
- Fix various aspects of the SWP emulation, and the ELF hwcap for the
SWP instruction.
- Update LPAE logic for pte_write and pmd_write to make it more
correct.
- Support for Broadcom Brahma15 CPU cores.
- ARM assembly crypto updates from Ard Biesheuvel"
* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (53 commits)
ARM: add comments to the early page table remap code
ARM: 8122/1: smp_scu: enable SCU standby support
ARM: 8121/1: smp_scu: use macro for SCU enable bit
ARM: 8120/1: crypto: sha512: add ARM NEON implementation
ARM: 8119/1: crypto: sha1: add ARM NEON implementation
ARM: 8118/1: crypto: sha1/make use of common SHA-1 structures
ARM: 8113/1: remove remaining definitions of PLAT_PHYS_OFFSET from <mach/memory.h>
ARM: 8111/1: Enable erratum 798181 for Broadcom Brahma-B15
ARM: 8110/1: do CPU-specific init for Broadcom Brahma15 cores
ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE
ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclear
ARM: hwcap: disable HWCAP_SWP if the CPU advertises it has exclusives
ARM: SWP emulation: only initialise on ARMv7 CPUs
ARM: SWP emulation: always enable when SMP is enabled
ARM: 8103/1: save/restore Cortex-A9 CP15 registers on suspend/resume
ARM: 8098/1: mcs lock: implement wfe-based polling for MCS locking
ARM: 8091/2: add get_user() support for 8 byte types
ARM: 8097/1: unistd.h: relocate comments back to place
ARM: 8096/1: Describe required sort order for textofs-y (TEXT_OFFSET)
ARM: 8090/1: add revision info for PL310 errata 588369 and 727915
...
Common SHA-1 structures are defined in <crypto/sha.h> for code sharing.
This patch changes SHA-1/ARM glue code to use these structures.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix the same alignment bug as in arm64 - we need to pass residue
unprocessed bytes as the last argument to blkcipher_walk_done.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # 3.13+
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ARMv6 and greater introduced a new instruction ("bx") which can be used
to return from function calls. Recent CPUs perform better when the
"bx lr" instruction is used rather than the "mov pc, lr" instruction,
and this sequence is strongly recommended to be used by the ARM
architecture manual (section A.4.1.1).
We provide a new macro "ret" with all its variants for the condition
code which will resolve to the appropriate instruction.
Rather than doing this piecemeal, and miss some instances, change all
the "mov pc" instances to use the new macro, with the exception of
the "movs" instruction and the kprobes code. This allows us to detect
the "mov pc, lr" case and fix it up - and also gives us the possibility
of deploying this for other registers depending on the CPU selection.
Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Stephen Warren <swarren@nvidia.com> # Tegra Jetson TK1
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> # mioa701_bootresume.S
Tested-by: Andrew Lunn <andrew@lunn.ch> # Kirkwood
Tested-by: Shawn Guo <shawn.guo@freescale.com>
Tested-by: Tony Lindgren <tony@atomide.com> # OMAPs
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> # Armada XP, 375, 385
Acked-by: Sekhar Nori <nsekhar@ti.com> # DaVinci
Acked-by: Christoffer Dall <christoffer.dall@linaro.org> # kvm/hyp
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> # PXA3xx
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> # Xen
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # ARMv7M
Tested-by: Simon Horman <horms+renesas@verge.net.au> # Shmobile
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Building a multi-arch kernel results in:
arch/arm/crypto/built-in.o: In function `aesbs_xts_decrypt':
sha1_glue.c:(.text+0x15c8): undefined reference to `bsaes_xts_decrypt'
arch/arm/crypto/built-in.o: In function `aesbs_xts_encrypt':
sha1_glue.c:(.text+0x1664): undefined reference to `bsaes_xts_encrypt'
arch/arm/crypto/built-in.o: In function `aesbs_ctr_encrypt':
sha1_glue.c:(.text+0x184c): undefined reference to `bsaes_ctr32_encrypt_blocks'
arch/arm/crypto/built-in.o: In function `aesbs_cbc_decrypt':
sha1_glue.c:(.text+0x19b4): undefined reference to `bsaes_cbc_encrypt'
This code is already runtime-conditional on NEON being supported, so
there's no point compiling it out depending on the minimum build
architecture.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
and around 25% for decryption. This implementation of the AES algorithm
does not rely on any lookup tables so it is believed to be invulnerable
to cache timing attacks.
This algorithm processes up to 8 blocks in parallel in constant time. This
means that it is not usable by chaining modes that are strictly sequential
in nature, such as CBC encryption. CBC decryption, however, can benefit from
this implementation and runs about 25% faster. The other chaining modes
implemented in this module, XTS and CTR, can execute fully in parallel in
both directions.
The core code has been adopted from the OpenSSL project (in collaboration
with the original author, on cc). For ease of maintenance, this version is
identical to the upstream OpenSSL code, i.e., all modifications that were
required to make it suitable for inclusion into the kernel have been made
upstream. The original can be found here:
http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
Note to integrators:
While this implementation is significantly faster than the existing table
based ones (generic or ARM asm), especially in CTR mode, the effects on
power efficiency are unclear as of yet. This code does fundamentally more
work, by calculating values that the table based code obtains by a simple
lookup; only by doing all of that work in a SIMD fashion, it manages to
perform better.
Cc: Andy Polyakov <appro@openssl.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Put the struct definitions for AES keys and the asm function prototypes in a
separate header and export the asm functions from the module.
This allows other drivers to use them directly.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Patch 638591c enabled building the AES assembler code in Thumb2 mode.
However, this code used arithmetic involving PC rather than adr{l}
instructions to generate PC-relative references to the lookup tables,
and this needs to take into account the different PC offset when
running in Thumb mode.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Make the SHA1 asm code ABI conformant by making sure all stack
accesses occur above the stack pointer.
Origin:
http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=1a9d60d2
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch fixes aes-armv4.S and sha1-armv4-large.S to work
natively in Thumb. This allows ARM/Thumb interworking workarounds
to be removed.
I also take the opportunity to convert some explicit assembler
directives for exported functions to the standard
ENTRY()/ENDPROC().
For the code itself:
* In sha1_block_data_order, use of TEQ with sp is deprecated in
ARMv7 and not supported in Thumb. For the branches back to
.L_00_15 and .L_40_59, the TEQ is converted to a CMP, under the
assumption that clobbering the C flag here will not cause
incorrect behaviour.
For the first branch back to .L_20_39_or_60_79 the C flag is
important, so sp is moved temporarily into another register so
that TEQ can be used for the comparison.
* In the AES code, most forms of register-indexed addressing with
shifts and rotates are not permitted for loads and stores in
Thumb, so the address calculation is done using a separate
instruction for the Thumb case.
The resulting code is unlikely to be optimally scheduled, but it
should not have a large impact given the overall size of the code.
I haven't run any benchmarks.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Tested-by: David McCullough <ucdevel@gmail.com> (ARM only)
Acked-by: David McCullough <ucdevel@gmail.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add assembler versions of AES and SHA1 for ARM platforms. This has provided
up to a 50% improvement in IPsec/TCP throughout for tunnels using AES128/SHA1.
Platform CPU SPeed Endian Before (bps) After (bps) Improvement
IXP425 533 MHz big 11217042 15566294 ~38%
KS8695 166 MHz little 3828549 5795373 ~51%
Signed-off-by: David McCullough <ucdevel@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>