Commit Graph

17 Commits

Author SHA1 Message Date
Ard Biesheuvel
5a22b198cd crypto: arm64/ghash - register PMULL variants as separate algos
The arm64 GHASH implementation uses either 8-bit or 64-bit
polynomial multiplication instructions: the latter are faster, but
they are optional in the architecture, so only one variant is in
use on any given system.

Since that prevents us from testing both implementations on the
same system, let's expose both of them to the crypto API, with
priorities reflecting that the P64 version is the preferred one
when it is available.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-02-01 14:44:38 +08:00
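A minimal sketch of what registering the two variants under the same
cra_name could look like; priority values and driver names here are
illustrative, not lifted from the actual patch:

    #include <crypto/internal/hash.h>

    static struct shash_alg ghash_algs[] = { {
            /* NEON variant built on the 8-bit polynomial multiplies */
            .base.cra_name          = "ghash",
            .base.cra_driver_name   = "ghash-neon",
            .base.cra_priority      = 100,
            .base.cra_blocksize     = 16,
            /* .init/.update/.final/.setkey/.descsize omitted here */
    }, {
            /* PMULL (P64) variant; higher priority makes it the
             * default, and it is only registered when the CPU
             * actually implements PMULL */
            .base.cra_name          = "ghash",
            .base.cra_driver_name   = "ghash-ce",
            .base.cra_priority      = 200,
            .base.cra_blocksize     = 16,
    } };

Callers asking for "ghash" get the highest-priority entry that was
registered, while the self-tests can still reach each variant through
its unique cra_driver_name.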
Ard Biesheuvel
c2b24c36e0 crypto: arm64/aes-gcm-ce - fix scatterwalk API violation
Commit 71e52c278c ("crypto: arm64/aes-ce-gcm - operate on
two input blocks at a time") modified the granularity at which
the AES/GCM code processes its input to allow subsequent changes
to be applied that improve performance by using aggregation to
process multiple input blocks at once.

For this reason, it doubled the algorithm's 'chunksize' property
to 2 x AES_BLOCK_SIZE, but retained the non-SIMD fallback path that
processes a single block at a time. In some cases, this violates the
skcipher scatterwalk API by calling skcipher_walk_done() with a
non-zero residue value for a chunk that is expected to be handled
in its entirety. This causes a WARN_ON() to be hit by the TLS
self test code, and is likely to break other use cases as well.
Unfortunately, none of the current test cases exercises this exact
code path.

Fixes: 71e52c278c ("crypto: arm64/aes-ce-gcm - operate on two input blocks at a time")
Reported-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-25 19:50:43 +08:00
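For reference, the contract at issue, in a hedged sketch (the helper
shape is a stand-in, not the driver's actual code): with ->chunksize
set to two blocks, every walk step except the final one must consume
whole chunks before skcipher_walk_done() is called:

    #include <crypto/internal/skcipher.h>
    #include <linux/kernel.h>

    static int walk_in_chunks(struct skcipher_walk *walk,
                              unsigned int chunk) /* 2 * AES_BLOCK_SIZE */
    {
            int err = 0;

            while (walk->nbytes) {
                    unsigned int n = walk->nbytes;

                    /* only the final step may leave a partial chunk */
                    if (n < walk->total)
                            n = round_down(n, chunk);

                    /* ... process n bytes of input here ... */

                    err = skcipher_walk_done(walk, walk->nbytes - n);
            }
            return err;
    }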
Ard Biesheuvel
22240df7ac crypto: arm64/ghash-ce - implement 4-way aggregation
Enhance the GHASH implementation that uses 64-bit polynomial
multiplication by adding support for 4-way aggregation. This
more than doubles the performance, from 2.4 cycles per byte
to 1.1 cpb on Cortex-A53.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:51:40 +08:00
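The aggregation identity behind this (not spelled out in the commit
itself) trades one reduction per block for one per four blocks, at
the cost of precomputing H^2, H^3 and H^4; in LaTeX:

    % serial form: one reduction mod g(x) = x^{128} + x^7 + x^2 + x + 1
    % per block
    X_i = (X_{i-1} \oplus C_i) \cdot H

    % 4-way aggregated form: four carry-less products are summed
    % first, then reduced once
    X_{i+4} = (X_i \oplus C_{i+1}) \cdot H^4 \oplus C_{i+2} \cdot H^3
              \oplus C_{i+3} \cdot H^2 \oplus C_{i+4} \cdot H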
Ard Biesheuvel
8e492eff7d crypto: arm64/ghash-ce - replace NEON yield check with block limit
Checking the TIF_NEED_RESCHED flag is disproportionately costly on cores
with fast crypto instructions and comparatively slow memory accesses.

For an algorithm such as GHASH, which executes at ~1 cycle per byte
on cores that implement support for 64-bit polynomial multiplication,
there is really no need to check the TIF_NEED_RESCHED flag
particularly often, and so we can remove the NEON yield check from
the assembler routines.

However, unlike the AEAD or skcipher APIs, the shash/ahash APIs take
arbitrary input lengths, and so there needs to be some sanity check
to ensure that we don't hog the CPU for excessive amounts of time.

So let's simply cap the maximum input size that is processed in one go
to 64 KB.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:51:39 +08:00
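The shape of the resulting glue code, as a hedged sketch
(pmull_ghash_update() stands in for the real assembler entry point
and the context argument is simplified):

    #include <asm/neon.h>
    #include <linux/kernel.h>
    #include <linux/sizes.h>

    #define GHASH_BLOCK_SIZE        16
    #define MAX_BLOCKS              (SZ_64K / GHASH_BLOCK_SIZE)

    static void ghash_do_update(int blocks, const u8 *src, void *ctx)
    {
            do {
                    int chunk = min(blocks, MAX_BLOCKS);

                    /* at most 64 KB per kernel-mode NEON section, so
                     * preemption latency stays bounded without
                     * per-block TIF_NEED_RESCHED checks in the asm */
                    kernel_neon_begin();
                    pmull_ghash_update(chunk, src, ctx);
                    kernel_neon_end();

                    blocks -= chunk;
                    src += chunk * GHASH_BLOCK_SIZE;
            } while (blocks > 0);
    }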
Ard Biesheuvel
30f1a9f53e crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable
Squeeze out another 5% of performance by minimizing the number
of invocations of kernel_neon_begin()/kernel_neon_end() on the
common path, which also allows some reloads of the key schedule
to be optimized away.

The resulting code runs at 2.3 cycles per byte on a Cortex-A53.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:38:04 +08:00
Ard Biesheuvel
e0bd888dc4 crypto: arm64/aes-ce-gcm - implement 2-way aggregation
Implement a faster version of the GHASH transform which amortizes
the reduction modulo the characteristic polynomial across two
input blocks at a time.

On a Cortex-A53, the gcm(aes) performance increases by 24%, from
3.0 cycles per byte to 2.4 cpb for large input sizes.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:38:04 +08:00
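2-way aggregation needs H^2 in addition to H, so the key structure
grows a second field that is derived once at key-setting time; a
hedged sketch, with gf128mul() standing in for whichever GF(2^128)
multiply helper the driver actually uses:

    #include <crypto/b128ops.h>

    struct ghash_key_sketch {
            be128 h;        /* H                     */
            be128 h2;       /* H^2, precomputed once */
    };

    static void ghash_precompute(struct ghash_key_sketch *key,
                                 const be128 *h)
    {
            key->h = *h;
            key->h2 = *h;
            gf128mul(&key->h2, h);  /* hypothetical: h2 = h * h */
    }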
Ard Biesheuvel
71e52c278c crypto: arm64/aes-ce-gcm - operate on two input blocks at a time
Update the core AES/GCM transform and the associated plumbing to operate
on 2 AES/GHASH blocks at a time. By itself, this is not expected to
result in a noticeable speedup, but it paves the way for reimplementing
the GHASH component using 2-way aggregation.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:38:04 +08:00
Herbert Xu
c5f5aeef9b Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Merge mainline to pick up c7513c2a27 ("crypto/arm64: aes-ce-gcm -
add missing kernel_neon_begin/end pair").
2018-08-03 17:55:12 +08:00
Ard Biesheuvel
c7513c2a27 crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair
Calling pmull_gcm_encrypt_block() requires kernel_neon_begin() and
kernel_neon_end() to be used since the routine touches the NEON
register file. Add the missing calls.

Also, since NEON register contents are not preserved outside of
a kernel mode NEON region, pass the key schedule array again.

Fixes: 7c50136a8a ("crypto: arm64/aes-ghash - yield NEON after every block of input")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-31 13:20:30 +01:00
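The pattern the fix restores, sketched (argument names are
illustrative): every call into the NEON routine is bracketed, and the
key schedule is passed in afresh because register contents from an
earlier NEON region must not be relied upon:

    kernel_neon_begin();            /* NEON usable from here on   */
    pmull_gcm_encrypt_block(dst, src, rk, nrounds);
    kernel_neon_end();              /* NEON state dies here       */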
Eric Biggers
e50944e219 crypto: shash - remove useless setting of type flags
Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH.  But this
is redundant with the C structure type ('struct shash_alg'), and
crypto_register_shash() already sets the type flag automatically,
clearing any type flag that was already there.  Apparently the useless
assignment has just been copy+pasted around.

So, remove the useless assignment from all the shash algorithms.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-09 00:30:24 +08:00
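Illustrated with a hypothetical algorithm definition: the flag below
is dead weight, because crypto_register_shash() derives the type from
'struct shash_alg' and overwrites any type bits already present:

    #include <crypto/internal/hash.h>

    static struct shash_alg example = {
            .digestsize     = 16,
            .base           = {
                    .cra_name       = "example",
                    /* redundant: cleared and re-set by the
                     * registration code, so this line can simply
                     * be dropped */
                    .cra_flags      = CRYPTO_ALG_TYPE_SHASH,
                    .cra_blocksize  = 16,
            },
    };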
Ard Biesheuvel
7c50136a8a crypto: arm64/aes-ghash - yield NEON after every block of input
Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-12 00:13:10 +08:00
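The actual change is in the assembler routines, but its C-level
equivalent would look roughly like this (a hedged sketch;
process_block() is a stand-in): closing and reopening the kernel-mode
NEON section between blocks creates a preemption point:

    while (blocks--) {
            kernel_neon_begin();
            process_block(ctx, src);  /* one 16-byte GHASH block   */
            kernel_neon_end();        /* scheduler may preempt here */
            src += 16;
    }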
Ard Biesheuvel
03c9a333fe crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL
Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572), but has been reworked
extensively for the AArch64 ISA.

On a low-end core such as the Cortex-A53 found in the Raspberry Pi3, the
NEON based implementation is 4x faster than the table based one, and
is time invariant as well, making it less vulnerable to timing attacks.
When combined with the bit-sliced NEON implementation of AES-CTR, the
AES-GCM performance increases by 2x (from 58 to 29 cycles per byte).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:25 +08:00
Ard Biesheuvel
537c1445ab crypto: arm64/gcm - implement native driver using v8 Crypto Extensions
Currently, the AES-GCM implementation for arm64 systems that support the
ARMv8 Crypto Extensions is based on the generic GCM module, which combines
the AES-CTR implementation using AES instructions with the PMULL-based
GHASH driver. This is suboptimal, given that the input data needs to be
loaded twice: once for the encryption and again for the MAC
calculation.

On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing
for the AES instructions, AES executes at less than 1 cycle per byte, which
means that any cycles wasted on loading the data twice hurt even more.

So implement a new GCM driver that combines the AES and PMULL instructions
at the block level. This improves performance on Cortex-A57 by ~37% (from
3.5 cpb to 2.6 cpb).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:23 +08:00
Ard Biesheuvel
6d6254d728 crypto: arm64/ghash-ce - add non-SIMD scalar fallback
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:16 +08:00
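To illustrate what such a scalar path computes, here is a generic
bit-serial GF(2^128) multiply in GHASH's bit-reflected convention
(per NIST SP 800-38D, Algorithm 1). This is illustrative only, not
the kernel's code, which uses the helpers in <crypto/gf128mul.h>; a
production fallback would also want the data-dependent branches made
constant-time:

    #include <linux/types.h>

    /* r = x * y in GF(2^128) mod x^128 + x^7 + x^2 + x + 1, with hi
     * holding the first (most significant) 8 bytes of each element */
    static void gf128mul_sketch(u64 *r_hi, u64 *r_lo,
                                u64 x_hi, u64 x_lo, u64 y_hi, u64 y_lo)
    {
            u64 z_hi = 0, z_lo = 0, v_hi = y_hi, v_lo = y_lo, carry;
            int i;

            for (i = 0; i < 128; i++) {
                    /* bit i of x, counting from the MSB of byte 0 */
                    if (((i < 64 ? x_hi >> (63 - i)
                                 : x_lo >> (127 - i)) & 1)) {
                            z_hi ^= v_hi;
                            z_lo ^= v_lo;
                    }
                    /* v = v * x: a right shift in the reflected bit
                     * order, folding in the field polynomial when a
                     * bit falls off the end */
                    carry = v_lo & 1;
                    v_lo = (v_lo >> 1) | (v_hi << 63);
                    v_hi >>= 1;
                    if (carry)
                            v_hi ^= 0xe100000000000000ULL;
            }
            *r_hi = z_hi;
            *r_lo = z_lo;
    }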
Ard Biesheuvel
b913a6404c arm64/crypto: improve performance of GHASH algorithm
This patch modifies the GHASH secure hash implementation to switch to
a faster reduction based on polynomial multiplication instead of one
that uses shifts and rotates.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-06-18 12:40:54 +01:00
Ard Biesheuvel
6aa8b209f5 arm64/crypto: fix data corruption bug in GHASH algorithm
This fixes a bug in the GHASH algorithm that caused the calculated
hash to be incorrect when the input is presented in chunks whose size
is not a multiple of 16 bytes.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: fdd2389457 ("arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-06-18 12:40:53 +01:00
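The usual shash pattern that handling arbitrary chunk sizes boils
down to, in a hedged sketch (process_block() and the context layout
are stand-ins): a partial block is staged in the descriptor context
and only multiplied in once a full 16 bytes have accumulated:

    #include <linux/kernel.h>
    #include <linux/string.h>

    struct ghash_ctx_sketch {
            u8 buf[16];             /* staged partial block */
            unsigned int partial;   /* bytes valid in buf   */
    };

    static void ghash_update_sketch(struct ghash_ctx_sketch *ctx,
                                    const u8 *src, unsigned int len)
    {
            if (ctx->partial) {     /* top up a staged block first */
                    unsigned int n = min(len, 16 - ctx->partial);

                    memcpy(ctx->buf + ctx->partial, src, n);
                    ctx->partial += n;
                    src += n;
                    len -= n;
                    if (ctx->partial < 16)
                            return;
                    process_block(ctx, ctx->buf);
                    ctx->partial = 0;
            }
            while (len >= 16) {     /* whole blocks straight from src */
                    process_block(ctx, src);
                    src += 16;
                    len -= 16;
            }
            memcpy(ctx->buf, src, len);     /* stage any tail */
            ctx->partial = len;
    }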
Ard Biesheuvel
fdd2389457 arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
This is a port to ARMv8 (Crypto Extensions) of the Intel implementation of the
GHASH Secure Hash (used in the Galois/Counter chaining mode). It relies on the
optional PMULL/PMULL2 instructions (polynomial multiply long, what Intel calls
carry-less multiply).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-05-14 10:04:07 -07:00
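For reference, the instructions are also exposed to C as ACLE
intrinsics; a userspace-flavoured sketch (compile with something like
-march=armv8-a+crypto), unrelated to the kernel's assembler
implementation:

    #include <arm_neon.h>

    /* one PMULL: 64x64 -> 128-bit carry-less multiply */
    static inline poly128_t clmul64(poly64_t a, poly64_t b)
    {
            return vmull_p64(a, b);
    }

    /* one PMULL2: same, on the high halves of two vectors */
    static inline poly128_t clmul64_high(poly64x2_t a, poly64x2_t b)
    {
            return vmull_high_p64(a, b);
    }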