Commit Graph

7 Commits

Author SHA1 Message Date
Russell King
75c912a367 ARM: add .gitignore entry for aesbs-core.S
This avoids this file being incorrectly added to git.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-10-07 15:43:53 +01:00
Ard Biesheuvel
e4e7f10bfc ARM: add support for bit sliced AES using NEON instructions
Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
and around 25% for decryption. This implementation of the AES algorithm
does not rely on any lookup tables so it is believed to be invulnerable
to cache timing attacks.

This algorithm processes up to 8 blocks in parallel in constant time. This
means that it is not usable by chaining modes that are strictly sequential
in nature, such as CBC encryption. CBC decryption, however, can benefit from
this implementation and runs about 25% faster. The other chaining modes
implemented in this module, XTS and CTR, can execute fully in parallel in
both directions.

The core code has been adopted from the OpenSSL project (in collaboration
with the original author, on cc). For ease of maintenance, this version is
identical to the upstream OpenSSL code, i.e., all modifications that were
required to make it suitable for inclusion into the kernel have been made
upstream. The original can be found here:

    http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130

Note to integrators:
While this implementation is significantly faster than the existing table
based ones (generic or ARM asm), especially in CTR mode, the effects on
power efficiency are unclear as of yet. This code does fundamentally more
work, by calculating values that the table based code obtains by a simple
lookup; only by doing all of that work in a SIMD fashion, it manages to
perform better.

Cc: Andy Polyakov <appro@openssl.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2013-10-04 20:48:38 +02:00
Ard Biesheuvel
5ce26f3b5a ARM: move AES typedefs and function prototypes to separate header
Put the struct definitions for AES keys and the asm function prototypes in a
separate header and export the asm functions from the module.
This allows other drivers to use them directly.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2013-10-04 09:26:54 +02:00
Ard Biesheuvel
40190c85f4 ARM: 7837/3: fix Thumb-2 bug in AES assembler code
Patch 638591c enabled building the AES assembler code in Thumb2 mode.
However, this code used arithmetic involving PC rather than adr{l}
instructions to generate PC-relative references to the lookup tables,
and this needs to take into account the different PC offset when
running in Thumb mode.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-09-22 11:43:38 +01:00
Ard Biesheuvel
934fc24df1 ARM: 7723/1: crypto: sha1-armv4-large.S: fix SP handling
Make the SHA1 asm code ABI conformant by making sure all stack
accesses occur above the stack pointer.

Origin:
http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=1a9d60d2

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-05-22 22:01:35 +01:00
Dave Martin
638591cd7b ARM: 7626/1: arm/crypto: Make asm SHA-1 and AES code Thumb-2 compatible
This patch fixes aes-armv4.S and sha1-armv4-large.S to work
natively in Thumb.  This allows ARM/Thumb interworking workarounds
to be removed.

I also take the opportunity to convert some explicit assembler
directives for exported functions to the standard
ENTRY()/ENDPROC().

For the code itself:

  * In sha1_block_data_order, use of TEQ with sp is deprecated in
    ARMv7 and not supported in Thumb.  For the branches back to
    .L_00_15 and .L_40_59, the TEQ is converted to a CMP, under the
    assumption that clobbering the C flag here will not cause
    incorrect behaviour.

    For the first branch back to .L_20_39_or_60_79 the C flag is
    important, so sp is moved temporarily into another register so
    that TEQ can be used for the comparison.

  * In the AES code, most forms of register-indexed addressing with
    shifts and rotates are not permitted for loads and stores in
    Thumb, so the address calculation is done using a separate
    instruction for the Thumb case.

The resulting code is unlikely to be optimally scheduled, but it
should not have a large impact given the overall size of the code.
I haven't run any benchmarks.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Tested-by: David McCullough <ucdevel@gmail.com> (ARM only)
Acked-by: David McCullough <ucdevel@gmail.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-01-13 12:41:22 +00:00
David McCullough
f0be44f4fb arm/crypto: Add optimized AES and SHA1 routines
Add assembler versions of AES and SHA1 for ARM platforms.  This has provided
up to a 50% improvement in IPsec/TCP throughout for tunnels using AES128/SHA1.

Platform   CPU SPeed    Endian   Before (bps)   After (bps)   Improvement

IXP425      533 MHz      big     11217042        15566294        ~38%
KS8695      166 MHz     little    3828549         5795373        ~51%

Signed-off-by: David McCullough <ucdevel@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-09-07 04:17:02 +08:00