This patch avoids copy of buffered data to hash from bufnext to buf
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
SEC1 doesn't support S/G in descriptors so for hash operations,
the CPU has to build a buffer containing the buffered block and
the incoming data. This generates a lot of memory copies which
represents more than 50% of CPU time of a md5sum operation as
shown below with a 'perf record'.
|--86.24%-- kcapi_md_digest
| |
| |--86.18%-- _kcapi_common_vmsplice_chunk_fd
| | |
| | |--83.68%-- splice
| | | |
| | | |--83.59%-- ret_from_syscall
| | | | |
| | | | |--83.52%-- sys_splice
| | | | | |
| | | | | |--83.49%-- splice_from_pipe
| | | | | | |
| | | | | | |--83.04%-- __splice_from_pipe
| | | | | | | |
| | | | | | | |--80.67%-- pipe_to_sendpage
| | | | | | | | |
| | | | | | | | |--78.25%-- hash_sendpage
| | | | | | | | | |
| | | | | | | | | |--60.08%-- ahash_process_req
| | | | | | | | | | |
| | | | | | | | | | |--56.36%-- sg_copy_buffer
| | | | | | | | | | | |
| | | | | | | | | | | |--55.29%-- memcpy
| | | | | | | | | | | |
However, unlike SEC2+, SEC1 offers the possibility to chain
descriptors. It is therefore possible to build a first descriptor
pointing to the buffered data and a second descriptor pointing to
the incoming data, hence avoiding the memory copy to a single
buffer.
With this patch, the time necessary for a md5sum on a 90Mbytes file
is approximately 3 seconds. Without the patch it takes 6 seconds.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
At every request, we map and unmap the same hash hw_context.
This patch moves the dma mapping/unmapping in functions ahash_init()
and ahash_import().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
dma_map_single() is an heavy operation which doesn't need to
be done at each request as the key doesn't change.
Instead of DMA mapping the key at every request, this patch maps it
once in setkey()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Do (desc->hdr & DESC_HDR_TYPE_IPSEC_ESP) only once.
Limit number of if/else paths
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
to_talitos_ptr() and to_talitos_ptr_len() are always called together
in order to fully set a ptr, so lets merge them into a single
helper.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The number of channels is known from the beginning, no need to
test it everytime.
This patch defines two additional done functions handling only channel 0.
Then the probe registers the correct one based on the number of channels.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use of_property_read_u32() to simplify DT read
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
talitos_handle_buggy_hash() and talitos_sg_map() are only used
locally, make them static
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch zeroize the descriptor at allocation using memset().
This has two advantages:
- It reduces the number of places where data has to be set to 0
- It avoids reading memory and loading the cache with data that
will be entirely replaced.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
sg_link_tbl_len shall be used instead of cryptlen, otherwise
SECs which perform HW CICV verification will fail.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
sha224 AEAD test fails with:
[ 2.803125] talitos ff020000.crypto: DEUISR 0x00000000_00000000
[ 2.808743] talitos ff020000.crypto: MDEUISR 0x80100000_00000000
[ 2.814678] talitos ff020000.crypto: DESCBUF 0x20731f21_00000018
[ 2.820616] talitos ff020000.crypto: DESCBUF 0x0628d64c_00000010
[ 2.826554] talitos ff020000.crypto: DESCBUF 0x0631005c_00000018
[ 2.832492] talitos ff020000.crypto: DESCBUF 0x0628d664_00000008
[ 2.838430] talitos ff020000.crypto: DESCBUF 0x061b13a0_00000080
[ 2.844369] talitos ff020000.crypto: DESCBUF 0x0631006c_00000080
[ 2.850307] talitos ff020000.crypto: DESCBUF 0x0631006c_00000018
[ 2.856245] talitos ff020000.crypto: DESCBUF 0x063100ec_00000000
[ 2.884972] talitos ff020000.crypto: failed to reset channel 0
[ 2.890503] talitos ff020000.crypto: done overflow, internal time out, or rngu error: ISR 0x20000000_00020000
[ 2.900652] alg: aead: encryption failed on test 1 for authenc-hmac-sha224-cbc-3des-talitos: ret=22
This is due to SHA224 not being supported by the HW. Allthough for
hash we are able to init the hash context by SW, it is not
possible for AEAD. Therefore SHA224 AEAD has to be deactivated.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto manager test report the following failures:
[ 3.061081] alg: skcipher: setkey failed on test 5 for ecb-des-talitos: flags=100
[ 3.069342] alg: skcipher-ddst: setkey failed on test 5 for ecb-des-talitos: flags=100
[ 3.077754] alg: skcipher-ddst: setkey failed on test 5 for ecb-des-talitos: flags=100
This is due to setkey being expected to detect weak keys.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds aes-gcm support to crypto4xx.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch enhances existing interfaces and
functions to support AEAD ciphers in the next
patches.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks to the big overhaul of crypto4xx_build_pd(), the request-local
sa_in, sa_out and state_record allocation can be simplified.
There's no need to setup any dma coherent memory anymore and
much of the support code can be removed.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If the crypto4xx device is continuously loaded by dm-crypt
and ipsec work, it will start to work intermittent after a
few (between 20-30) seconds, hurting throughput and latency.
This patch contains various stability improvements in order
to fix this issue. So far, the hardware has survived more
than a day without suffering any stalls under the continuous
load.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto4xx_core.c:179:6: warning: symbol 'crypto4xx_free_state_record'
was not declared. Should it be static?
crypto4xx_core.c:331:5: warning: symbol 'crypto4xx_get_n_gd'
was not declared. Should it be static?
crypto4xx_core.c:652:6: warning: symbol 'crypto4xx_return_pd'
was not declared. Should it be static?
crypto4xx_return_pd() is not used by anything. Therefore it is removed.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch overhauls and fixes code related to crypto4xx_build_pd()
* crypto4xx_build_pd() did not handle chained source scatterlist.
This is fixed by replacing the buggy indexed-access of &src[idx]
with sg_next() in the gather array setup loop.
* The redundant is_hash, direction, save_iv and pd_ctl members
in the crypto4xx_ctx struct have been removed.
- is_hash can be derived from the crypto_async_request parameter.
- direction is already part of the security association's
bf.dir bitfield.
- save_iv is unused.
- pd_ctl always had the host_ready bit enabled anyway.
(the hash_final case is rather pointless, since the ahash
code has been deactivated).
* make crypto4xx_build_pd()'s caller responsible for converting
the IV to the LE32 format.
* change crypto4xx_ahash_update() and crypto4xx_ahash_digest() to
initialize a temporary destination scatterlist. This allows the
removal of an ugly cast of req->result (which is a pointer to an
u8-array) to a scatterlist pointer.
* change crypto4xx_build_pd() return type to int. After all
it returns -EINPROGRESS/-EBUSY.
* fix crypto4xx_build_pd() thread-unsafe sa handling.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The hardware expects that the keys, IVs (and inner/outer hashes)
are in the le32 format.
This patch changes all hardware interface declarations to use
the correct LE32 data format for each field.
In order to pass __CHECK_ENDIAN__ checks, crypto4xx_memcpy_le
has to be honest about the endianness of its parameters.
The function was split and moved to the common crypto4xx_core.h
header. This allows the compiler to generate better code if the
sizes/len is a constant (various *_IV_LEN).
Please note that the hardware isn't consistent with the endiannes
of the save_digest field in the state record struct though.
The hashes produced by GHASH and CBC (for CCM) will be in LE32.
Whereas md5 and sha{1/,256,...} do not need any conversion.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Previously, If the crypto4xx driver used all available
security contexts, it would simply refuse new requests
with -EAGAIN. CRYPTO_TFM_REQ_MAY_BACKLOG was ignored.
in case of dm-crypt.c's crypt_convert() function this was
causing the following errors to manifest, if the system was
pushed hard enough:
| EXT4-fs warning (dm-1): ext4_end_bio:314: I/O error -5 writing to ino ..
| EXT4-fs warning (dm-1): ext4_end_bio:314: I/O error -5 writing to ino ..
| EXT4-fs warning (dm-1): ext4_end_bio:314: I/O error -5 writing to ino ..
| JBD2: Detected IO errors while flushing file data on dm-1-8
| Aborting journal on device dm-1-8.
| EXT4-fs error : ext4_journal_check_start:56: Detected aborted journal
| EXT4-fs (dm-1): Remounting filesystem read-only
| EXT4-fs : ext4_writepages: jbd2_start: 2048 pages, inode 498...; err -30
(This did cause corruptions due to failed writes)
To fix this mess, the crypto4xx driver needs to notifiy the
user to slow down. This can be achieved by returning -EBUSY
on requests, once the crypto hardware was falling behind.
Note: -EBUSY has two different meanings. Setting the flag
CRYPTO_TFM_REQ_MAY_BACKLOG implies that the request was
successfully queued, by the crypto driver. To achieve this
requirement, the implementation introduces a threshold check and
adds logic to the completion routines in much the same way as
AMD's Cryptographic Coprocessor (CCP) driver do.
Note2: Tests showed that dm-crypt starved ipsec traffic.
Under load, ipsec links dropped to 0 Kbits/s. This is because
dm-crypt's callback would instantly queue the next request.
In order to not starve ipsec, the driver reserves a small
portion of the available crypto contexts for this purpose.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If crypto4xx is used in conjunction with dm-crypt, the available
ring buffer elements are not enough to handle the load properly.
On an aes-cbc-essiv:sha256 encrypted swap partition the read
performance is abyssal: (tested with hdparm -t)
/dev/mapper/swap_crypt:
Timing buffered disk reads: 14 MB in 3.68 seconds = 3.81 MB/sec
The patch increases both PPC4XX_NUM_SD and PPC4XX_NUM_PD to 256.
This improves the performance considerably:
/dev/mapper/swap_crypt:
Timing buffered disk reads: 104 MB in 3.03 seconds = 34.31 MB/sec
Furthermore, PPC4XX_LAST_SD, PPC4XX_LAST_GD and PPC4XX_LAST_PD
can be easily calculated from their respective PPC4XX_NUM_*
constant.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes a type mismatch error that I accidentally
introduced when I moved and refactored the dynamic_contents
helpers.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
I used aes-cbc as a template for ofb. But sadly I forgot
to update set_key method to crypto4xx_setkey_aes_ofb().
this was caught by the testmgr:
alg: skcipher: Test 1 failed (invalid result) on encr. for ofb-aes-ppc4xx
00000000: 76 49 ab ac 81 19 b2 46 ce e9 8e 9b 12 e9 19 7d
00000010: 50 86 cb 9b 50 72 19 ee 95 db 11 3a 91 76 78 b2
00000020: 73 be d6 b8 e3 c1 74 3b 71 16 e6 9e 22 22 95 16
00000030: 3f f1 ca a1 68 1f ac 09 12 0e ca 30 75 86 e1 a7
With the correct set_key method, the aes-ofb cipher passes the test.
name : ofb(aes)
driver : ofb-aes-ppc4xx
module : crypto4xx
priority : 300
refcnt : 1
selftest : passed
internal : no
type : ablkcipher
async : yes
blocksize : 16
min keysize : 16
max keysize : 32
ivsize : 16
geniv : <default>
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The hmac_mc parameter of set_dynamic_sa_command_1()
was defined but not used. On closer inspection it
turns out, it was never wired up.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The code is simplified by using two __be64 values for the operation
instead of using two arrays of u8. This allows to get rid of the memory
alignment code. In addition, the crypto_xor can be replaced with a
native XOR operation. Finally, the definition of the variables is
re-arranged such that the data structures come before simple variables
to potentially reduce memory space.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
DH_KPP_SECRET_MIN_SIZE and dh_data_size() are both returning
unsigned values.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
p->key_size, p->p_size, p->g_size are all of unsigned int type.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ECDH_KPP_SECRET_MIN_SIZE and params->key_size are both returning
unsigned values.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pr_err() messages should terminated with a new-line to avoid
other messages being concatenated onto the end.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pr_err() messages should terminated with a new-line to avoid
other messages being concatenated onto the end.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pr_err() messages should terminated with a new-line to avoid
other messages being concatenated onto the end.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pr_err() messages should terminated with a new-line to avoid
other messages being concatenated onto the end.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pr_err() messages should terminated with a new-line to avoid
other messages being concatenated onto the end.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pr_err() messages should terminated with a new-line to avoid
other messages being concatenated onto the end.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If the cipher name does not start with 'ecb(' we should bail out, as done
in the 'create()' function in 'crypto/xts.c'.
Fixes: 700cb3f5fe ("crypto: lrw - Convert to skcipher")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
All error handling paths 'goto err_drop_spawn' except this one.
In order to avoid some resources leak, we should do it as well here.
Fixes: 700cb3f5fe ("crypto: lrw - Convert to skcipher")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The usage of of_device_get_match_data reduce the code size a bit.
Furthermore, it prevents an improbable dereference when
of_match_device() return NULL.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The usage of of_device_get_match_data reduce the code size a bit.
Furthermore, it prevents an improbable dereference when
of_match_device() return NULL.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The usage of of_device_get_match_data reduce the code size a bit.
Furthermore, it prevents an improbable dereference when
of_match_device() return NULL.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The variable len is set to zero, never read and then later updated
to p - name, so clearly the zero'ing of len is redundant and
can be removed.
Detected by clang scan-build:
" warning: Value stored to 'len' is never read"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If the algorithm we're parallelizing is asynchronous we might change
CPUs between padata_do_parallel() and padata_do_serial(). However, we
don't expect this to happen as we need to enqueue the padata object into
the per-cpu reorder queue we took it from, i.e. the same-cpu's parallel
queue.
Ensure we're not switching CPUs for a given padata object by tracking
the CPU within the padata object. If the serial callback gets called on
the wrong CPU, defer invoking padata_reorder() via a kernel worker on
the CPU we're expected to run on.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The reorder timer function runs on the CPU where the timer interrupt was
handled which is not necessarily one of the CPUs of the 'pcpu' CPU mask
set.
Ensure the padata_reorder() callback runs on the correct CPU, which is
one in the 'pcpu' CPU mask set and, preferrably, the next expected one.
Do so by comparing the current CPU with the expected target CPU. If they
match, call padata_reorder() right away. If they differ, schedule a work
item on the target CPU that does the padata_reorder() call for us.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>