Commit Graph

648335 Commits

Cyrille Pitchen
69303cf0f1 crypto: atmel-sha - add simple DMA transfers
This patch adds a simple function to perform data transfer with the DMA
controller.
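
For illustration, a transfer helper built on the dmaengine slave API
typically takes roughly the shape below. This is a sketch only: the function
name, the callback plumbing and the error handling are illustrative, not the
driver's actual code.

    #include <linux/dmaengine.h>
    #include <linux/errno.h>
    #include <linux/scatterlist.h>

    /* Sketch: queue one scatterlist on a dmaengine slave channel and let the
     * completion callback resume the crypto request. */
    static int sha_dma_start(struct dma_chan *chan, struct scatterlist *sg,
                             unsigned int sg_len, dma_async_tx_callback done,
                             void *ctx)
    {
            struct dma_async_tx_descriptor *desc;
            dma_cookie_t cookie;

            desc = dmaengine_prep_slave_sg(chan, sg, sg_len, DMA_MEM_TO_DEV,
                                           DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
            if (!desc)
                    return -EINVAL;

            desc->callback = done;          /* resume the request from here */
            desc->callback_param = ctx;

            cookie = dmaengine_submit(desc);
            if (dma_submit_error(cookie))
                    return -EIO;

            dma_async_issue_pending(chan);  /* kick off the transfer */
            return -EINPROGRESS;            /* completion is asynchronous */
    }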

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:13 +08:00
Cyrille Pitchen
eec12f66b0 crypto: atmel-sha - add atmel_sha_cpu_start()
This patch adds a simple function to perform data transfers with PIO, that
is, handled by the CPU.
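
Such a CPU-driven transfer essentially writes the input data words into the
block's input data registers, as in this sketch (the register offset is a
placeholder, not the real SHA register map):

    #include <linux/io.h>

    /* Sketch: feed one block of data to the IP with the CPU (PIO). */
    static void sha_write_block_pio(void __iomem *base, const u32 *data,
                                    unsigned int nwords)
    {
            unsigned int i;

            /* 0x40 stands in for the offset of the first input data register */
            for (i = 0; i < nwords; i++)
                    writel(data[i], base + 0x40 + i * 4);
    }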

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:13 +08:00
Cyrille Pitchen
563c47df79 crypto: atmel-sha - add SHA_MR_MODE_IDATAR0
This patch defines an alias macro for SHA_MR_MODE_PDC, whose name is not
suited for DMA usage.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:13 +08:00
Cyrille Pitchen
9064ed9269 crypto: atmel-sha - add atmel_sha_wait_for_data_ready()
This patch simply defines a helper function to test the 'Data Ready' flag
of the Status Register. It also gives the crypto request a chance to be
processed synchronously if this 'Data Ready' flag is already set when
polling the Status Register. Indeed, running synchronously avoids the
latency of the 'Data Ready' interrupt.

When the 'Data Ready' flag has not been set yet, we enable the associated
interrupt and resume processing the crypto request asynchronously from the
'done' task just as before.
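
The logic described above can be sketched as follows; the structure,
register offsets and bit position are placeholders mirroring the description
rather than the actual driver code:

    #include <linux/bitops.h>
    #include <linux/errno.h>
    #include <linux/io.h>

    struct sha_dev {
            void __iomem *io_base;
            int (*resume)(struct sha_dev *dd);
    };

    /* Sketch: run synchronously if 'Data Ready' is already set, otherwise
     * enable the interrupt and defer to the 'done' task. */
    static int sha_wait_for_data_ready(struct sha_dev *dd,
                                       int (*resume)(struct sha_dev *dd))
    {
            u32 isr = readl(dd->io_base + 0x20);    /* Status Register */

            if (isr & BIT(0))                       /* 'Data Ready' already set */
                    return resume(dd);              /* no interrupt latency */

            dd->resume = resume;                    /* the 'done' task calls it */
            writel(BIT(0), dd->io_base + 0x10);     /* enable 'Data Ready' IRQ */
            return -EINPROGRESS;
    }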

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:13 +08:00
Cyrille Pitchen
f07cebad63 crypto: atmel-sha - redefine SHA_FLAGS_SHA* flags to match SHA_MR_ALGO_SHA*
This patch modifies the SHA_FLAGS_SHA* flags: those algo flags are now
organized as values of a single bitfield instead of individual bits.
This reduces the number of bits needed to encode all possible values.
Also, the new values match the SHA_MR_ALGO_SHA* values, hence the
algorithm bitfield of the SHA_MR register can simply be set with:

mr = (mr & ~SHA_FLAGS_ALGO_MASK) | (ctx->flags & SHA_FLAGS_ALGO_MASK)
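
For illustration, the encoding can look like the sketch below (the bit
positions are made up for the example; the real ones come from the SHA_MR
register layout):

    /* Sketch: the algorithm is a multi-bit field shared by the flags and the
     * Mode Register, so ctx->flags can be copied straight into SHA_MR. */
    #define SHA_MR_ALGO_SHA1        (0 << 8)
    #define SHA_MR_ALGO_SHA256      (1 << 8)
    #define SHA_MR_ALGO_SHA384      (2 << 8)
    #define SHA_MR_ALGO_SHA512      (3 << 8)
    #define SHA_MR_ALGO_MASK        (7 << 8)

    #define SHA_FLAGS_SHA1          SHA_MR_ALGO_SHA1
    #define SHA_FLAGS_SHA256        SHA_MR_ALGO_SHA256
    #define SHA_FLAGS_SHA384        SHA_MR_ALGO_SHA384
    #define SHA_FLAGS_SHA512        SHA_MR_ALGO_SHA512
    #define SHA_FLAGS_ALGO_MASK     SHA_MR_ALGO_MASK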

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:12 +08:00
Cyrille Pitchen
b5ce82a7b4 crypto: atmel-sha - make atmel_sha_done_task more generic
This patch is a transitional patch. It updates atmel_sha_done_task() to
make it more generic: it adds a new .resume() member to the
atmel_sha_dev structure. This hook is called from atmel_sha_done_task()
to resume processing an asynchronous request.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:12 +08:00
Cyrille Pitchen
a29af939b2 crypto: atmel-sha - update request queue management to make it more generic
This patch is a transitional patch. It splits the atmel_sha_handle_queue()
function. Now atmel_sha_handle_queue() only manages the request queue and
calls a new .start() hook from the atmel_sha_ctx structure.
This hook allows different kinds of requests to be implemented while still
being handled by a single queue.

Also when the req parameter of atmel_sha_handle_queue() refers to the very
same request as the one returned by crypto_dequeue_request(), the queue
management now gives a chance to this crypto request to be handled
synchronously, hence reducing latencies. The .start() hook returns 0 if
the crypto request was handled synchronously and -EINPROGRESS if the
crypto request still needs to be handled asynchronously.

In addition, the new .is_async member of the atmel_sha_dev structure helps
track this asynchronous state. Indeed, the req->base.complete() callback
should not be called if the crypto request is handled synchronously.
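
Put together, the queue handling described above follows roughly the shape
of this sketch, built on the generic crypto_queue helpers; the structures
are paraphrased and locking is omitted:

    #include <crypto/algapi.h>

    struct sha_dev {
            struct crypto_queue queue;
            bool is_async;
    };

    struct sha_ctx {
            int (*start)(struct sha_dev *dd);
    };

    /* Sketch: dequeue a request and let the .start() hook decide whether it
     * completes synchronously (0) or asynchronously (-EINPROGRESS). */
    static int sha_handle_queue(struct sha_dev *dd,
                                struct crypto_async_request *new_req)
    {
            struct crypto_async_request *req, *backlog;
            struct sha_ctx *ctx;
            int err, ret = 0;

            if (new_req)
                    ret = crypto_enqueue_request(&dd->queue, new_req);

            backlog = crypto_get_backlog(&dd->queue);
            req = crypto_dequeue_request(&dd->queue);
            if (!req)
                    return ret;

            if (backlog)
                    backlog->complete(backlog, -EINPROGRESS);

            /* complete() is only used for requests not handled synchronously */
            dd->is_async = (req != new_req);

            ctx = crypto_tfm_ctx(req->tfm);
            err = ctx->start(dd);           /* 0 or -EINPROGRESS */

            return (req == new_req) ? err : ret;
    }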

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:11 +08:00
Cyrille Pitchen
8340c7fd28 crypto: atmel-sha - create function to get an Atmel SHA device
This is a transitional patch: it creates the atmel_sha_find_dev() function,
which will be used in further patches to share the source code responsible
for finding an Atmel SHA device.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:11 +08:00
Rabin Vincent
379d972b81 crypto: doc - Fix hash export state information
The documentation states that crypto_ahash_reqsize() provides the size
of the state structure used by crypto_ahash_export().  But it's actually
crypto_ahash_statesize() which provides this size.

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:11 +08:00
Herbert Xu
34cb582139 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge the crypto tree to pick up the arm64 output IV patch.
2017-02-03 18:14:10 +08:00
Harsh Jain
7c2cf1c461 crypto: chcr - Fix key length for RFC4106
Check keylen before copying the salt to avoid an integer wraparound.
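
An RFC4106 key carries a 4-byte nonce (salt) appended to the AES key, so the
check boils down to something like this sketch (not the driver's exact code):

    #include <linux/errno.h>
    #include <linux/string.h>
    #include <linux/types.h>

    /* Sketch: reject keys shorter than the appended 4-byte salt before
     * subtracting, so that keylen cannot wrap around. */
    static int rfc4106_split_key(const u8 *key, unsigned int keylen,
                                 u8 salt[4], unsigned int *aes_keylen)
    {
            if (keylen < 4)
                    return -EINVAL;

            *aes_keylen = keylen - 4;       /* the AES key precedes the salt */
            memcpy(salt, key + *aes_keylen, 4);
            return 0;
    }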

Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:09:31 +08:00
Harsh Jain
0b529f143e crypto: algif_aead - Fix kernel panic on list_del
The kernel panics when a userspace program tries to access the AEAD
interface. Remove the node from the linked list before freeing its memory.
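
The general pattern of the fix, sketched with a made-up node type rather
than the algif_aead structures:

    #include <linux/list.h>
    #include <linux/slab.h>

    struct sg_list_node {
            struct list_head list;
            /* ... */
    };

    /* Sketch: always unlink a node before freeing it; otherwise a later walk
     * touches freed memory and the next list_del() panics. */
    static void free_sg_list(struct list_head *head)
    {
            struct sg_list_node *node, *tmp;

            list_for_each_entry_safe(node, tmp, head, list) {
                    list_del(&node->list);  /* unlink first ... */
                    kfree(node);            /* ... then free its memory */
            }
    }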

Cc: <stable@vger.kernel.org>
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Reviewed-by: Stephan Müller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 17:45:48 +08:00
Herbert Xu
c268199000 crypto: aesni - Fix failure when pcbc module is absent
When aesni is built as a module together with pcbc, the pcbc module
must be present for aesni to load.  However, the pcbc module may not
be present for reasons such as its absence from the initramfs.  This patch
allows aesni to function even if the pcbc module is enabled but
not present.

Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 17:45:48 +08:00
Gary R Hook
e5da5c5667 crypto: ccp - Fix double add when creating new DMA command
Eliminate a double add by creating a new list to hold command
descriptors when they are created; move a descriptor to the
pending list when the command is submitted.
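
In list terms the fix looks like this sketch (structure and function names
are illustrative): a descriptor starts on a 'created' list and is moved, not
re-added, when it is submitted:

    #include <linux/list.h>

    struct ccp_desc {
            struct list_head entry;
            /* ... */
    };

    struct ccp_chan {
            struct list_head created;       /* descriptors not yet submitted */
            struct list_head pending;       /* descriptors queued for the HW */
    };

    static void ccp_desc_create(struct ccp_chan *chan, struct ccp_desc *desc)
    {
            list_add_tail(&desc->entry, &chan->created);
    }

    static void ccp_desc_submit(struct ccp_chan *chan, struct ccp_desc *desc)
    {
            /* list_move_tail() unlinks from the old list first, so the entry
             * can never end up linked twice. */
            list_move_tail(&desc->entry, &chan->pending);
    }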

Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 17:45:46 +08:00
Gary R Hook
500c0106e6 crypto: ccp - Fix DMA operations when IOMMU is enabled
An I/O page fault occurs when the IOMMU is enabled on a
system that supports the v5 CCP.  DMA operations use a
Request ID value that does not match what is expected by
the IOMMU, resulting in the I/O page fault.  Setting the
Request ID value to 0 corrects this issue.

Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 17:45:45 +08:00
Harsh Jain
f5f7bebc91 crypto: chcr - Check device is allocated before use
Ensure dev is allocated for the crypto ULD context before using the device
for crypto operations.

Cc: <stable@vger.kernel.org>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 17:45:44 +08:00
Harsh Jain
94e1dab1c9 crypto: chcr - Fix panic on dma_unmap_sg
Save the DMA-mapped sg list addresses in the request context buffer.

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 17:45:44 +08:00
Giovanni Cabiddu
685ce06268 crypto: qat - zero esram only for DH85x devices
Zero the embedded RAM in DH85x devices. This is not
needed on newer generations, as it is done by the hardware.

Cc: <stable@vger.kernel.org>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-02 21:54:53 +08:00
Giovanni Cabiddu
3484ecbe0e crypto: qat - fix bar discovery for c62x
Some accelerators of the c62x series have only two BARs.
This patch skips BAR0 if the accelerator does not have it.

Cc: <stable@vger.kernel.org>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-02 21:54:52 +08:00
Li Zhong
7dede913fc crypto: vmx - disable preemption to enable vsx in aes_ctr.c
Some preemptible check warnings were reported from enable_kernel_vsx(). This
patch disables preemption in aes_ctr.c before enabling VSX, making it
consistent with the other files in the same directory.
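
The consistent pattern referred to here is to bracket the VSX region with
preempt_disable()/preempt_enable(), roughly as in this sketch:

    #include <linux/preempt.h>
    #include <asm/switch_to.h>      /* enable_kernel_vsx() on powerpc */

    /* Sketch: VSX must only be enabled from non-preemptible context. */
    static void aes_ctr_crypt_chunk(void)
    {
            preempt_disable();
            enable_kernel_vsx();

            /* ... run the VSX-accelerated AES-CTR core ... */

            disable_kernel_vsx();
            preempt_enable();
    }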

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:34 +08:00
Ryder Lee
d03f7b0d58 crypto: mediatek - add support to GCM mode
This patch adds support for GCM mode.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:33 +08:00
Ryder Lee
e04a31d7f5 crypto: mediatek - add support to CTR mode
This patch adds support for CTR mode.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:32 +08:00
Ryder Lee
059b14947a crypto: mediatek - fix typo and indentation
Trivial patch to fix typos and indentation.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:32 +08:00
Ryder Lee
0abc271494 crypto: mediatek - regroup functions by usage
This patch only regroups functions by usage.
This will help integrate the GCM support patch later by
adjusting some shared code sections, such as common code that
will be reused by GCM, AES mode setting, and DMA transfers.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:31 +08:00
Ryder Lee
87421984b4 crypto: mediatek - rework crypto request completion
This patch introduces a new callback 'resume' in the struct mtk_aes_rec.
This callback is run to resume/complete the processing of the crypto
request when woken up by the AES interrupt on DMA completion.

This callback will help implement GCM mode support in further
patches.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:30 +08:00
Ryder Lee
382ae57d5e crypto: mediatek - make crypto request queue management more generic
This patch changes mtk_aes_handle_queue() to make it more generic.
The function argument is now a pointer to struct crypto_async_request,
which is the common base of struct ablkcipher_request and
struct aead_request.

Also this patch introduces struct mtk_aes_base_ctx which will be the
common base of all the transformation contexts.

Hence the very same queue will be used to manage both block cipher and
AEAD requests (such as gcm and authenc implemented in further patches).

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:30 +08:00
Ryder Lee
4432861fb9 crypto: mediatek - fix incorrect data transfer result
This patch fixes a data transfer bug in mtk_aes_xmit().

The original function uses the same loop and ring->pos
to handle both command and result descriptors. But this
produces incomplete results when src.sg_len != dst.sg_len.

To solve the problem, we split the descriptors into separate
loops and use cmd_pos and res_pos to track them respectively.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:29 +08:00
Ryder Lee
a873996238 crypto: mediatek - move HW control data to transformation context
This patch moves the hardware control block members from
mtk_*_rec to the transformation context and refines the related
definitions. This lets the operational context easily manage its
own control information for each DMA transfer.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:29 +08:00
Denys Vlasenko
e183914af0 crypto: x86 - make constants readonly, allow linker to merge them
A lot of asm-optimized routines in arch/x86/crypto/ keep their
constants in .data. This is wrong; they should be in .rodata.

Many of these constants are the same in different modules.
For example, the 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F
exists in at least half a dozen places.

There is a way to let the linker merge them and use just one copy.
The rules are as follows: mergeable objects of different sizes
should not share sections. You can't put them all in one .rodata
section; they would lose their "mergeability".

GCC puts its mergeable constants in ".rodata.cstSIZE" sections,
or ".rodata.cstSIZE.<object_name>" if -fdata-sections is used.
This patch does the same:

	.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16

It is important that all data in such a section consists of
16-byte elements, not larger ones, and that there is no implicit
use of one element from another.

When this is not the case, use a non-mergeable section:

	.section .rodata[.VAR_NAME], "a", @progbits

This reduces .data by ~15 kbytes:

    text    data     bss     dec      hex filename
11097415 2705840 2630712 16433967  fac32f vmlinux-prev.o
11112095 2690672 2630712 16433479  fac147 vmlinux.o

Merged objects are visible in System.map:

ffffffff81a28810 r POLY
ffffffff81a28810 r POLY
ffffffff81a28820 r TWOONE
ffffffff81a28820 r TWOONE
ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK <- merged regardless of
ffffffff81a28830 r SHUF_MASK   <------------- the name difference
ffffffff81a28830 r SHUF_MASK
ffffffff81a28830 r SHUF_MASK
..
ffffffff81a28d00 r K512 <- merged three identical 640-byte tables
ffffffff81a28d00 r K512
ffffffff81a28d00 r K512

Use of object names in section name suffixes is not strictly necessary,
but might help if the link stage someday uses garbage collection
to eliminate unused sections (ld --gc-sections).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: Xiaodong Liu <xiaodong.liu@intel.com>
CC: Megha Dey <megha.dey@intel.com>
CC: linux-crypto@vger.kernel.org
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:29 +08:00
Denys Vlasenko
587d531b8f crypto: x86/crc32c - fix %progbits -> @progbits
The %progbits form is used on ARM (where @ is a comment character).

x86 consistently uses @progbits everywhere else.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: Xiaodong Liu <xiaodong.liu@intel.com>
CC: Megha Dey <megha.dey@intel.com>
CC: George Spelvin <linux@horizon.com>
CC: linux-crypto@vger.kernel.org
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:26 +08:00
Ard Biesheuvel
13954e788d crypto: arm/aes-neonbs - fix issue with v2.22 and older assembler
The GNU assembler for ARM, version 2.22 or older, fails to infer the
element size from the vmov instructions, and aborts the build in
the following way:

.../aes-neonbs-core.S: Assembler messages:
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'

Fix this by setting the element size explicitly, by replacing vmov with
vmov.32.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:25 +08:00
Rabin Vincent
76512f2d8c crypto: tcrypt - Add debug prints
tcrypt is very tight-lipped when it succeeds, but a bit more feedback
would be useful when developing or debugging crypto drivers, especially
since even a successful run ends with the module failing to insert. Add
a couple of debug prints, which can be enabled with dynamic debug:

Before:

 # insmod tcrypt.ko mode=10
 insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable

After:

 # insmod tcrypt.ko mode=10 dyndbg
 tcrypt: testing ecb(aes)
 tcrypt: testing cbc(aes)
 tcrypt: testing lrw(aes)
 tcrypt: testing xts(aes)
 tcrypt: testing ctr(aes)
 tcrypt: testing rfc3686(ctr(aes))
 tcrypt: all tests passed
 insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:24 +08:00
Nicolas Iooss
3bfb2e6b32 crypto: img-hash - use dma_data_direction when calling dma_map_sg
The fourth argument of dma_map_sg() and dma_unmap_sg() is an item of
the dma_data_direction enum. img_hash_xmit_dma() wrongly used
DMA_MEM_TO_DEV, which is an item of the dma_transfer_direction enum.

Replace DMA_MEM_TO_DEV (whose value is 1) with DMA_TO_DEVICE (whose
value is fortunately also 1) when calling dma_map_sg() and
dma_unmap_sg().
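
For reference, a correct mapping/unmapping pair looks like this sketch
(function name illustrative); the direction argument comes from
enum dma_data_direction:

    #include <linux/dma-mapping.h>
    #include <linux/scatterlist.h>

    /* Sketch: the direction must come from enum dma_data_direction
     * (DMA_TO_DEVICE), not from enum dma_transfer_direction (DMA_MEM_TO_DEV). */
    static int map_for_hash(struct device *dev, struct scatterlist *sg, int nents)
    {
            int mapped = dma_map_sg(dev, sg, nents, DMA_TO_DEVICE);

            if (!mapped)
                    return -ENOMEM;

            /* ... hand the mapped list to the DMA engine ... */

            dma_unmap_sg(dev, sg, nents, DMA_TO_DEVICE);
            return 0;
    }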

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:23 +08:00
Ard Biesheuvel
11e3b725cf crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes
Update the ARMv8 Crypto Extensions and the plain NEON AES implementations
in CBC and CTR modes to return the next IV back to the skcipher API client.
This is necessary for chaining to work correctly.

Note that for CTR, this is only done if the request is a round multiple of
the block size, since otherwise, chaining is impossible anyway.

Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:41:33 +08:00
Salvatore Benedetto
d6040764ad crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg
Make sure the CRYPTO_ALG_DEAD bit is cleared before proceeding with
algorithm registration. This fixes qat-dh registration when the
driver is restarted.
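
The essence of the fix, sketched (the helper name is made up): clear the
stale flag before the algorithm is inserted back into the algorithm list.

    #include <linux/crypto.h>

    /* Sketch: a previously unregistered instance may still carry
     * CRYPTO_ALG_DEAD in cra_flags; clear it so the re-registered
     * algorithm is not treated as dead. */
    static void crypto_alg_revive(struct crypto_alg *alg)
    {
            alg->cra_flags &= ~CRYPTO_ALG_DEAD;
    }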

Cc: <stable@vger.kernel.org>
Signed-off-by: Salvatore Benedetto <salvatore.benedetto@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:41:32 +08:00
Gonglei \(Arei\)
87170961f3 crypto: virtio - adjust priority of algorithm
Some hardware accelerators (like Intel AES-NI or the s390
CPACF functions) have lower priorities than virtio crypto, yet
those drivers are faster than using the same hardware in the
host via virtio. So let's lower the priority of virtio-crypto's
algorithms, making them higher than software implementations
but lower than the hardware ones.
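
Priorities are plain integers in cra_priority, and the highest registered
implementation of an algorithm wins; the intended ordering looks like this
sketch (the numbers and names are illustrative, not the values chosen by
this patch):

    #include <linux/crypto.h>

    /* Sketch with illustrative numbers: generic C implementations register
     * low priorities, host hardware drivers such as AES-NI register high
     * ones, so a paravirtualized device slots in between. */
    static struct crypto_alg virtio_aes_alg = {
            .cra_name       = "cbc(aes)",
            .cra_priority   = 150,          /* above software, below host HW */
            /* ... */
    };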

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:22 +08:00
Ard Biesheuvel
658fa754cd crypto: arm/aes - avoid reserved 'tt' mnemonic in asm code
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
which means we can no longer use 'tt' as a register alias on recent
versions of binutils for ARM. So replace the alias with 'ttab'.

Fixes: 81edb42629 ("crypto: arm/aes - replace scalar AES cipher")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:21 +08:00
Shannon Nelson
0ff1436fb2 hwrng: n2 - update version info
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:20 +08:00
Shannon Nelson
07e25d43be hwrng: n2 - support new hardware register layout
Add the new register layout constants and the requisite logic
for using them.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:19 +08:00
Shannon Nelson
becbc4940a hwrng: n2 - add device data descriptions
Since we're going to need to keep track of more than just one
attribute of the hardware, we'll change the use of the data field
of the match struct from a single flag to a struct pointer.
This patch adds the struct template and the initial descriptions.
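
The usual shape of this change is to hang a per-variant description off the
.data pointer of the match table; in the sketch below every name and field
is illustrative:

    #include <linux/of.h>
    #include <linux/of_device.h>

    struct n2rng_template {
            int multi_capable;
            /* ... more per-variant attributes ... */
    };

    static const struct n2rng_template n2_template = { .multi_capable = 0 };
    static const struct n2rng_template vf_template = { .multi_capable = 1 };

    static const struct of_device_id n2rng_match[] = {
            { .compatible = "SUNW,n2-rng", .data = &n2_template },
            { .compatible = "SUNW,vf-rng", .data = &vf_template },
            { },
    };

    /* In probe():
     *      const struct of_device_id *match =
     *              of_match_device(n2rng_match, &op->dev);
     *      np->template = match->data;
     */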

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:19 +08:00
Shannon Nelson
db602a7f94 hwrng: n2 - limit error spewage when self-test fails
If the self-test fails, it probably won't actually suddenly
start working.  Currently, this causes an endless spew of
error messages on the console and in the logs, so this patch
adds a limiter to the test.
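
One way to express such a limiter, sketched generically (the cap and names
are illustrative, not necessarily what the driver does):

    #include <linux/device.h>

    #define SELFTEST_ERR_LIMIT      8       /* illustrative cap */

    /* Sketch: report the failure a few times, then go quiet, since a broken
     * self-test will not fix itself. */
    static void report_selftest_failure(struct device *dev,
                                        unsigned int *count, int err)
    {
            if ((*count)++ < SELFTEST_ERR_LIMIT)
                    dev_err(dev, "self-test failed (%d)\n", err);
    }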

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:18 +08:00
Wei Yongjun
de0f96d772 crypto: mediatek - make symbol of_crypto_id static
Fixes the following sparse warning:

drivers/crypto/mediatek/mtk-platform.c:585:27: warning:
 symbol 'of_crypto_id' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:16 +08:00
Ard Biesheuvel
21c8e72037 crypto: testmgr - use calculated count for number of test vectors
When working on AES in CCM mode for ARM, my code passed the internal
tcrypt test before I had even bothered to implement the AES-192 and
AES-256 code paths, which is strange because tcrypt does contain
AES-192 and AES-256 test vectors for CCM.

As it turned out, the define AES_CCM_ENC_TEST_VECTORS was out of sync
with the actual number of test vectors, causing only the AES-128 ones
to be executed.

So get rid of the defines, and wrap the test vector references in a
macro that calculates the number of vectors automatically.
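
The replacement boils down to deriving the count from the array itself, e.g.
(the macro and field names in this sketch are illustrative):

    #include <linux/kernel.h>       /* ARRAY_SIZE() */

    /* Sketch: tie the vector count to the array definition so the two can
     * never drift apart again. */
    #define VECS(tv)        { .vecs = tv, .count = ARRAY_SIZE(tv) }

    /* e.g. in the template table:
     *      .enc = VECS(aes_ccm_enc_tv_template),
     *      .dec = VECS(aes_ccm_dec_tv_template),
     */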

The following test vector counts were out of sync with the respective
defines:

    BF_CTR_ENC_TEST_VECTORS          2 ->  3
    BF_CTR_DEC_TEST_VECTORS          2 ->  3
    TF_CTR_ENC_TEST_VECTORS          2 ->  3
    TF_CTR_DEC_TEST_VECTORS          2 ->  3
    SERPENT_CTR_ENC_TEST_VECTORS     2 ->  3
    SERPENT_CTR_DEC_TEST_VECTORS     2 ->  3
    AES_CCM_ENC_TEST_VECTORS         8 -> 14
    AES_CCM_DEC_TEST_VECTORS         7 -> 17
    AES_CCM_4309_ENC_TEST_VECTORS    7 -> 23
    AES_CCM_4309_DEC_TEST_VECTORS   10 -> 23
    CAMELLIA_CTR_ENC_TEST_VECTORS    2 ->  3
    CAMELLIA_CTR_DEC_TEST_VECTORS    2 ->  3

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:15 +08:00
Ard Biesheuvel
cc477bf645 crypto: arm/aes - replace bit-sliced OpenSSL NEON code
This replaces the unwieldy generated implementation of bit-sliced AES
in CBC/CTR/XTS modes that originated in the OpenSSL project with a
new version that is heavily based on the OpenSSL implementation, but
has a number of advantages over the old version:
- it does not rely on the scalar AES cipher that also originated in the
  OpenSSL project and contains redundant lookup tables and key schedule
  generation routines (which we already have in crypto/aes_generic.)
- it uses the same expanded key schedule for encryption and decryption,
  reducing the size of the per-key data structure by 1696 bytes
- it adds an implementation of AES in ECB mode, which can be wrapped by
  other generic chaining mode implementations
- it moves the handling of corner cases that are non-critical to performance
  to the glue layer written in C
- it was written directly in assembler rather than generated from a Perl
  script

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:27:31 +08:00
Ard Biesheuvel
1abee99eaf crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64
This is a reimplementation of the NEON version of the bit-sliced AES
algorithm. This code is heavily based on Andy Polyakov's OpenSSL version
for ARM, which is also available in the kernel. This is an alternative to
the existing NEON implementation for arm64 authored by me, which suffers
from poor performance due to its reliance on the pathologically slow four
register variant of the tbl/tbx NEON instruction.

This version is ~30% (*) faster than the generic C code, but only in
cases where the input can be 8x interleaved (this is a fundamental property
of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR
are implemented. (The significance of ECB is that it could potentially be
used by other chaining modes)

* Measured on Cortex-A57. Note that this is still an order of magnitude
  slower than the implementations that use the dedicated AES instructions
  introduced in ARMv8, but those are part of an optional extension, and so
  it is good to have a fallback.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:51 +08:00
Ard Biesheuvel
81edb42629 crypto: arm/aes - replace scalar AES cipher
This replaces the scalar AES cipher that originates in the OpenSSL project
with a new implementation that is ~15% (*) faster (on modern cores), and
reuses the lookup tables and the key schedule generation routines from the
generic C implementation (which is usually compiled in anyway due to
networking and other subsystems depending on it).

Note that the bit sliced NEON code for AES still depends on the scalar cipher
that this patch replaces, so it is not removed entirely yet.

* On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte
  for 128-bit keys.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:50 +08:00
Ard Biesheuvel
bed593c0e8 crypto: arm64/aes - add scalar implementation
This adds a scalar implementation of AES, based on the precomputed tables
that are exposed by the generic AES code. Since rotates are cheap on arm64,
this implementation only uses the 4 core tables (of 1 KB each), and avoids
the prerotated ones, reducing the D-cache footprint by 75%.

On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster
than the generic C code. (Note that this is still >13x slower than the code
that uses the optional ARMv8 Crypto Extensions, which manages <1 cycle per
byte.)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:49 +08:00
Ard Biesheuvel
293614ce3e crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well
In addition to wrapping the AES-CTR cipher into the async SIMD wrapper,
which exposes it as an async skcipher that defers processing to process
context, expose our AES-CTR implementation directly as a synchronous cipher
as well, but with a lower priority.

This makes the AES-CTR transform usable in places where synchronous
transforms are required, such as the MAC802.11 encryption code, which
executes in softirq context, where SIMD processing is allowed on arm64.
Users of the async transform will keep the existing behavior.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:49 +08:00
Ard Biesheuvel
afaf712e99 crypto: arm/chacha20 - implement NEON version based on SSE3 code
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:48 +08:00
Ard Biesheuvel
b7171ce9eb crypto: arm64/chacha20 - implement NEON version based on SSE3 code
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:48 +08:00