The decision to rebuild .S_shipped is made based on the relative
timestamps of .S_shipped and .pl files but git makes this essentially
random. This means that the perl script might run anyway (usually at
most once per checkout), defeating the whole purpose of _shipped.
Fix by skipping the rule unless explicit make variables are provided:
REGENERATE_ARM_CRYPTO or REGENERATE_ARM64_CRYPTO.
This can produce nasty occasional build failures downstream, for example
for toolchains with broken perl. The solution is minimally intrusive to
make it easier to push into stable.
Another report on a similar issue here: https://lkml.org/lkml/2018/3/8/1379
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Keystone Security Accelerator module has a hardware random generator
sub-module. This commit adds the driver for this sub-module.
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
[t-kristo@ti.com: dropped one unnecessary dev_err message]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Keystone SA module has a hardware random generator module.
This commit adds binding doc for the KS2 SA HWRNG driver.
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
On Armada 7K/8K we need to explicitly enable the register clock. This
clock is optional because not all the SoCs using this IP need it but at
least for Armada 7K/8K it is actually mandatory.
The binding documentation is updated accordingly.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The clock is optional, but if it is present we should managed it. If
there is an error while trying getting it, we should exit and report this
error.
So instead of returning an error only in the -EPROBE case, turn it in an
other way and ignore the clock only if it is not present (-ENOENT case).
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In this driver the clock is got but never put when the driver is removed
or if there is an error in the probe.
Using the managed version of clk_get() allows to let the kernel take care
of it.
Fixes: 1b44c5a60c ("crypto: inside-secure - add SafeXcel EIP197 crypto
engine driver")
cc: stable@vger.kernel.org
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add the missing unlock before return from function
safexcel_ahash_send_req() in the error handling case.
Fixes: cff9a17545 ("crypto: inside-secure - move cache result dma mapping to request")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tweak the SHA256 update routines to invoke the SHA256 block transform
block by block, to avoid excessive scheduling delays caused by the
NEON algorithm running with preemption disabled.
Also, remove a stale comment which no longer applies now that kernel
mode NEON is actually disallowed in some contexts.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CBC MAC is strictly sequential, and so the current AES code simply
processes the input one block at a time. However, we are about to add
yield support, which adds a bit of overhead, and which we prefer to
align with other modes in terms of granularity (i.e., it is better to
have all routines yield every 64 bytes and not have an exception for
CBC MAC which yields every 16 bytes)
So unroll the loop by 4. We still cannot perform the AES algorithm in
parallel, but we can at least merge the loads and stores.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CBC encryption is strictly sequential, and so the current AES code
simply processes the input one block at a time. However, we are
about to add yield support, which adds a bit of overhead, and which
we prefer to align with other modes in terms of granularity (i.e.,
it is better to have all routines yield every 64 bytes and not have
an exception for CBC encrypt which yields every 16 bytes)
So unroll the loop by 4. We still cannot perform the AES algorithm in
parallel, but we can at least merge the loads and stores.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES block mode implementation using Crypto Extensions or plain NEON
was written before real hardware existed, and so its interleave factor
was made build time configurable (as well as an option to instantiate
all interleaved sequences inline rather than as subroutines)
We ended up using INTERLEAVE=4 with inlining disabled for both flavors
of the core AES routines, so let's stick with that, and remove the option
to configure this at build time. This makes the code easier to modify,
which is nice now that we're adding yield support.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.
Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.
So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.
Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.
So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.
Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.
So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled)
Note that this requires some reshuffling of the registers in the asm
code, because the XTS routines can no longer rely on the registers to
retain their contents between invocations.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.
Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.
So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In order to be able to test yield support under preempt, add a test
vector for CRC-T10DIF that is long enough to take multiple iterations
(and thus possible preemption between them) of the primary loop of the
accelerated x86 and arm64 implementations.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
On the quest to remove all VLAs from the kernel[1], this switches to
a pair of kmalloc regions instead of using the stack. This also moves
the get_random_bytes() after all allocations (and drops the needless
"nbytes" variable).
[1] https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CCP driver copies data between scatter/gather lists and DMA buffers.
The length of the requested copy operation must be checked against
the available destination buffer length.
Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Prevent improper use of req->result field in ahash update, init, export and
import functions in drivers code. A driver should use ahash request context
if it needs to save internal state.
Signed-off-by: Kamil Konieczny <k.konieczny@partner.samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
virtio_crypto does not use function crypto_authenc_extractkeys, remove
this unnecessary dependency. Compiles fine and passes cryptodev-linux
cipher and speed tests from https://wiki.qemu.org/Features/VirtioCrypto
Fixes: dbaf0624ff ("crypto: add virtio-crypto driver")
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce the SM4 cipher algorithms (OSCCA GB/T 32907-2016).
SM4 (GBT.32907-2016) is a cryptographic standard issued by the
Organization of State Commercial Administration of China (OSCCA)
as an authorized cryptographic algorithms for the use within China.
SMS4 was originally created for use in protecting wireless
networks, and is mandated in the Chinese National Standard for
Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure)
(GB.15629.11-2003).
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Send multiple WRs to H/W when No. of entries received in scatter list
cannot be sent in single request.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We use ctr(aes) to fallback rfc3686(ctr) request. Send updated IV to fallback path.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CBC Decryption requires Last Block as IV. In case src/dst buffer
are same last block will be replaced by plain text. This patch copies
the Last Block before sending request to HW.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver works well on i.MX31 powered boards with device description
taken from board device tree, the only change to add to the driver is
the missing OF device id, the affected list of included headers and
indentation in platform driver struct are beautified a little.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Freescale i.MX21 and i.MX31 SoCs contain a Random Number Generator
Accelerator module (RNGA), which is replaced by RNGB and RNGC modules
on later i.MX SoC series, the change adds a new compatible property
to describe the controller.
Since all versions of Freescale RNG modules are legacy, apparently
the documentation file has no more potential for further extensions,
nevertheless generalize it by removing explicit RNGC specifics.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
for ARM64. This is ported from the 32-bit version. It may be useful on
devices with 64-bit ARM CPUs that don't have the Cryptography
Extensions, so cannot do AES efficiently -- e.g. the Cortex-A53
processor on the Raspberry Pi 3.
It generally works the same way as the 32-bit version, but there are
some slight differences due to the different instructions, registers,
and syntax available in ARM64 vs. in ARM32. For example, in the 64-bit
version there are enough registers to hold the XTS tweaks for each
128-byte chunk, so they don't need to be saved on the stack.
Benchmarks on a Raspberry Pi 3 running a 64-bit kernel:
Algorithm Encryption Decryption
--------- ---------- ----------
Speck64/128-XTS (NEON) 92.2 MB/s 92.2 MB/s
Speck128/256-XTS (NEON) 75.0 MB/s 75.0 MB/s
Speck128/256-XTS (generic) 47.4 MB/s 35.6 MB/s
AES-128-XTS (NEON bit-sliced) 33.4 MB/s 29.6 MB/s
AES-256-XTS (NEON bit-sliced) 24.6 MB/s 21.7 MB/s
The code performs well on higher-end ARM64 processors as well, though
such processors tend to have the Crypto Extensions which make AES
preferred. For example, here are the same benchmarks run on a HiKey960
(with CPU affinity set for the A73 cores), with the Crypto Extensions
implementation of AES-256-XTS added:
Algorithm Encryption Decryption
--------- ----------- -----------
AES-256-XTS (Crypto Extensions) 1273.3 MB/s 1274.7 MB/s
Speck64/128-XTS (NEON) 359.8 MB/s 348.0 MB/s
Speck128/256-XTS (NEON) 292.5 MB/s 286.1 MB/s
Speck128/256-XTS (generic) 186.3 MB/s 181.8 MB/s
AES-128-XTS (NEON bit-sliced) 142.0 MB/s 124.3 MB/s
AES-256-XTS (NEON bit-sliced) 104.7 MB/s 91.1 MB/s
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reuse existing functionality from memdup_user() instead of keeping
duplicate source code.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Any change to the result buffer should only happen on final, finup
and digest operations. Changes to the buffer for update, import, export,
etc, are not allowed.
Fixes: 66d7b9f6175e ("crypto: testmgr - test misuse of result in ahash")
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Apparently the ecdh use case was in bluetooth which always has single
element scatterlists, so the ecdh module was hard coded to expect
them. Now we're using this in TPM, we need multi-element
scatterlists, so remove this limitation.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
TPM security routines require encryption and decryption with AES in
CFB mode, so add it to the Linux Crypto schemes. CFB is basically a
one time pad where the pad is generated initially from the encrypted
IV and then subsequently from the encrypted previous block of
ciphertext. The pad is XOR'd into the plain text to get the final
ciphertext.
https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CFB
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Improve the code (safety and readability) by indicating that data passed
through pointer is not modified. This adds const keyword in many places,
most notably:
- the driver data (pointer to struct samsung_aes_variant),
- scatterlist addresses written as value to device registers,
- key and IV arrays.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ahash_request 'req' argument passed by the caller
s5p_hash_handle_queue() cannot be NULL here because it is obtained from
non-NULL pointer via container_of().
This fixes smatch warning:
drivers/crypto/s5p-sss.c:1213 s5p_hash_prepare_request() warn: variable dereferenced before check 'req' (see line 1208)
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 8043bb1ae0 ("crypto: omap-sham - convert driver logic to use
sgs for data xmit") removed the if() clause leaving the statement as is.
The intention was in that case to finish the request always so the goto
instruction seems sensible.
Remove the indentation to fix Smatch warning:
drivers/crypto/omap-sham.c:1761 omap_sham_done_task() warn: inconsistent indenting
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ahash_request 'req' argument passed by the caller
omap_sham_handle_queue() cannot be NULL here because it is obtained from
non-NULL pointer via container_of().
This fixes smatch warning:
drivers/crypto/omap-sham.c:812 omap_sham_prepare_request() warn: variable dereferenced before check 'req' (see line 805)
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Inline IPSec driver does not offload csum.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
On Armada 7K/8K we need to explicitly enable the register clock. This
clock is optional because not all the SoCs using this IP need it but at
least for Armada 7K/8K it is actually mandatory.
The binding documentation is updating accordingly.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver queue size can now be configured from userspace. This
allows optimizing the queue usage based on use case. Default queue
size is still 10 entries.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver fallback size can now be configured from userspace. This
allows optimizing the DMA usage based on use case. Detault fallback
size of 200 is still used.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver queue size can now be configured from userspace. This
allows optimizing the queue usage based on use case. Default queue
size is still 10 entries.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver fallback size can now be configured from userspace. This
allows optimizing the DMA usage based on use case. Default fallback
size of 256 is still used.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In certain platforms like DRA7xx having memory > 2GB with LPAE enabled
has a constraint that DMA can be done with the initial 2GB and marks it
as ZONE_DMA. But openssl when used with cryptodev does not make sure that
input buffer is DMA capable. So, adding a check to verify if the input
buffer is capable of DMA.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In certain platforms like DRA7xx having memory > 2GB with LPAE enabled
has a constraint that DMA can be done with the initial 2GB and marks it
as ZONE_DMA. But openssl when used with cryptodev does not make sure that
input buffer is DMA capable. So, adding a check to verify if the input
buffer is capable of DMA.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Aparna Balasubramanian <aparnab@ti.com>
Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
req_ctx->hw_context is mainly used only by the HW. So it is not needed
to sync the HW and the CPU each time hw_context in DMA mapped.
This patch modifies the DMA mapping in order to limit synchronisation
to necessary situations.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>