Create a new function attribute __optimize, which allows to specify an
optimization level on a per-function basis.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As reported by kbuild test robot, the optimized SHA3 C implementation
compiles to mn10300 code that uses a disproportionate amount of stack
space, i.e.,
crypto/sha3_generic.c: In function 'keccakf':
crypto/sha3_generic.c:147:1: warning: the frame size of 1232 bytes is larger than 1024 bytes [-Wframe-larger-than=]
As kindly diagnosed by Arnd, this does not only occur when building for
the mn10300 architecture (which is what the report was about) but also
for h8300, and builds for other 32-bit architectures show an increase in
stack space utilization as well.
Given that SHA3 operates on 64-bit quantities, and keeps a state matrix
of 25 64-bit words, it is not surprising that 32-bit architectures with
few general purpose registers are impacted the most by this, and it is
therefore reasonable to implement a workaround that distinguishes between
32-bit and 64-bit architectures.
Arnd figured out that taking the round calculation out of the loop, and
inlining it explicitly but only on 64-bit architectures preserves most
of the performance gain achieved by the rewrite, and also gets rid of
the excessive use of stack space.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The SHA-512 multibuffer code keeps track of the number of blocks pending
in each lane. The minimum of these values is used to identify the next
lane that will be completed. Unused lanes are set to a large number
(0xFFFFFFFF) so that they don't affect this calculation.
However, it was forgotten to set the lengths to this value in the
initial state, where all lanes are unused. As a result it was possible
for sha512_mb_mgr_get_comp_job_avx2() to select an unused lane, causing
a NULL pointer dereference. Specifically this could happen in the case
where ->update() was passed fewer than SHA512_BLOCK_SIZE bytes of data,
so it then called sha_complete_job() without having actually submitted
any blocks to the multi-buffer code. This hit a NULL pointer
dereference if another task happened to have submitted blocks
concurrently to the same CPU and the flush timer had not yet expired.
Fix this by initializing sha512_mb_mgr->lens correctly.
As usual, this bug was found by syzkaller.
Fixes: 45691e2d9b ("crypto: sha512-mb - submit/flush routines for AVX2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If clk_get() fails, device_remove_file() looks inappropriate.
The error path, where all crypto_register fail, misses resource
deallocations.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a missing symbol export that prevents this code to be built as a
module. Also, move the round constant table to the .rodata section,
and use a more optimized version of the core transform.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement the Chinese SM3 secure hash algorithm using the new
special instructions that have been introduced as an optional
extension in ARMv8.2.
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement the various flavours of SHA3 using the new optional
EOR3/RAX1/XAR/BCAX instructions introduced by ARMv8.2.
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
All current SHA3 test cases are smaller than the SHA3 block size, which
means not all code paths are being exercised. So add a new test case to
each variant, and make one of the existing test cases chunked.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To allow accelerated implementations to fall back to the generic
routines, e.g., in contexts where a SIMD based implementation is
not allowed to run, expose the generic SHA3 init/update/final
routines to other modules.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In preparation of exposing the generic SHA3 implementation to other
versions as a fallback, simplify the code, and remove an inconsistency
in the output handling (endian swabbing rsizw words of state before
writing the output does not make sense)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The way the KECCAK transform is currently coded involves many references
into the state array using indexes that are calculated at runtime using
simple but non-trivial arithmetic. This forces the compiler to treat the
state matrix as an array in memory rather than keep it in registers,
which results in poor performance.
So instead, let's rephrase the algorithm using fixed array indexes only.
This helps the compiler keep the state matrix in registers, resulting
in the following speedup (SHA3-256 performance in cycles per byte):
before after speedup
Intel Core i7 @ 2.0 GHz (2.9 turbo) 100.6 35.7 2.8x
Cortex-A57 @ 2.0 GHz (64-bit mode) 101.6 12.7 8.0x
Cortex-A53 @ 1.0 GHz 224.4 15.8 14.2x
Cortex-A57 @ 2.0 GHz (32-bit mode) 201.8 63.0 3.2x
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ensure that the input is byte swabbed before injecting it into the
SHA3 transform. Use the get_unaligned() accessor for this so that
we don't perform unaligned access inadvertently on architectures
that do not support that.
Cc: <stable@vger.kernel.org>
Fixes: 53964b9ee6 ("crypto: sha3 - Add SHA-3 hash algorithm")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
GCM can be invoked with a zero destination buffer. This is possible if
the AAD and the ciphertext have zero lengths and only the tag exists in
the source buffer (i.e. a source buffer cannot be zero). In this case,
the GCM cipher only performs the authentication and no decryption
operation.
When the destination buffer has zero length, it is possible that no page
is mapped to the SG pointing to the destination. In this case,
sg_page(req->dst) is an invalid access. Therefore, page accesses should
only be allowed if the req->dst->length is non-zero which is the
indicator that a page must exist.
This fixes a crash that can be triggered by user space via AF_ALG.
CC: <stable@vger.kernel.org>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Since CRYPTO_SHA384 does not exists, Kconfig should not select it.
Anyway, all SHA384 stuff is in CRYPTO_SHA512 which is already selected.
Fixes: a21eb94fc4d3i ("crypto: axis - add ARTPEC-6/7 crypto accelerator driver")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There is a error message within devm_ioremap_resource
already, so remove the dev_err call to avoid redundant
error message.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There is a error message within devm_ioremap_resource
already, so remove the dev_err call to avoid redundant
error message.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Fabien Dessenne <fabien.dessenne@st.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
devm_ioremap_resource() already checks if the resource is NULL, so
remove the unnecessary platform_get_resource() error check.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Async hash operations can use result pointer in final/finup/digest,
but not in init/update/export/import, so test it for misuse.
Signed-off-by: Kamil Konieczny <k.konieczny@partner.samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The function safexcel_try_push_requests is local to the source and does
not need to be in global scope, so make it static.
Cleans up sparse warning:
symbol 'safexcel_try_push_requests' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
[Antoine: fixed alignment]
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
My last bugfix added -Os on the command line, which unfortunately caused
a build regression on powerpc in some configurations.
I've done some more analysis of the original problem and found slightly
different workaround that avoids this regression and also results in
better performance on gcc-7.0: -fcode-hoisting is an optimization step
that got added in gcc-7 and that for all gcc-7 versions causes worse
performance.
This disables -fcode-hoisting on all compilers that understand the option.
For gcc-7.1 and 7.2 I found the same performance as my previous patch
(using -Os), in gcc-7.0 it was even better. On gcc-8 I could see no
change in performance from this patch. In theory, code hoisting should
not be able make things better for the AES cipher, so leaving it
disabled for gcc-8 only serves to simplify the Makefile change.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Link: https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg30418.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Fixes: 148b974dee ("crypto: aes-generic - build with -Os on gcc-7+")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Load the four SHA-1 round constants using immediates rather than literal
pool entries, to avoid having executable data that may be exploitable
under speculation attacks.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the SHA2 round constant table to the .rodata section where it is
safe from being exploited by speculative execution.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the CRC-T10DIF literal data to the .rodata section where it is
safe from being exploited by speculative execution.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move CRC32 literal data to the .rodata section where it is safe from
being exploited by speculative execution.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the S-boxes and some other literals to the .rodata section where
it is safe from being exploited by speculative execution.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the AES inverse S-box to the .rodata section where it is safe from
abuse by speculation.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the SIMPLE_DEV_PM_OPS() macro instead of populating a struct
dev_pm_ops directly. The suspend and resume functions will now be used
for both hibernation and suspend to ram.
If power management is disabled, SIMPLE_DEV_PM_OPS() evaluates to
nothing, The two functions won't be used and won't be included in the
kernel. Mark them as __maybe_unused to clarify that this is intended
behaviour.
With these modifications in place, we don't need the #ifdefs for power
management any more.
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
sg_nents_xlen will take care of zero length sg list.
Remove Destination sg list size zero check.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add ctr and sha combination of algo in authenc mode.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Skip decrypt operation on IV received from HW for last request.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add warning message if sg is NULL after skipping bytes.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
phys_to_dma() is an internal helper for certain DMA API implementations,
and is not appropriate for drivers to use. It appears that what the CESA
driver really wants to be using is dma_map_resource() - admittedly that
didn't exist when the offending code was first merged, but it does now.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There is a error message within devm_ioremap_resource
already, so remove the dev_err call to avoid redundant
error message.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
"val" needs to be signed for the error handling to work.
Fixes: 6cd225cc5d ("hwrng: exynos - add Samsung Exynos True RNG driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When hw_random device's quality is non-zero, it will automatically fill
the kernel's entropy pool at boot. For the purpose, one conservative
quality value is being picked up as the default value.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement the SHA-512 using the new special instructions that have
been introduced as an optional extension in ARMv8.2.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace GPL license statement with SPDX GPL-2.0 license identifier and
correct the module license to GPLv2.
The license itself was a generic GPL because of copy-and-paste from old
drivers/char/hw_random/exynos-rng.c driver (on which this was based on).
However the module license indicated GPL-2.0 or later. GPL-2.0 was
intended by author so fix up this mess.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds the SafeXcel EIP97 compatible to the Inside Secure
device tree bindings documentation.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
SPHINX build emits multiple warnings of kind:
warning: duplicate section name 'Note'
(when building kernel via make target 'htmldocs')
This is caused by repeated use of comments of form:
* Note: soau soaeusoa uoe
We can change the format without loss of clarity and clear the build
warnings.
Add '**[mandatory]**' or '**[optional]**' as kernel-doc field element
description prefix
This renders in HTML as (prefixes in bold)
final
[mandatory] Retrieve result from the driver. This function finalizes the
transformation and retrieves the resulting hash from the driver and
pushes it back to upper layers. No data processing happens at this
point unless hardware requires it to finish the transformation (then
the data buffered by the device driver is processed).
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert salsa20-asm from the deprecated "blkcipher" API to the
"skcipher" API, in the process fixing it up to use the generic helpers.
This allows removing the salsa20_keysetup() and salsa20_ivsetup()
assembly functions, which aren't performance critical; the C versions do
just fine.
This also fixes the same bug that salsa20-generic had, where the state
array was being maintained directly in the transform context rather than
on the stack or in the request context. Thus, if multiple threads used
the same Salsa20 transform concurrently they produced the wrong results.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Export the Salsa20 constants, transform context, and initialization
functions so that they can be reused by the x86 implementation.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert salsa20-generic from the deprecated "blkcipher" API to the
"skcipher" API, in the process fixing it up to be thread-safe (as the
crypto API expects) by maintaining each request's state separately from
the transform context.
Also remove the unnecessary cra_alignmask and tighten validation of the
key size by accepting only 16 or 32 bytes, not anything in between.
These changes bring the code close to the way chacha20-generic does
things, so hopefully it will be easier to maintain in the future.
However, the way Salsa20 interprets the IV is still slightly different;
that was not changed.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
While testing other changes, I discovered that gcc-7.2.1 produces badly
optimized code for aes_encrypt/aes_decrypt. This is especially true when
CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely
large stack usage that in turn might cause kernel stack overflows:
crypto/aes_generic.c: In function 'aes_encrypt':
crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=]
crypto/aes_generic.c: In function 'aes_decrypt':
crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
I verified that this problem exists on all architectures that are
supported by gcc-7.2, though arm64 in particular is less affected than
the others. I also found that gcc-7.1 and gcc-8 do not show the extreme
stack usage but still produce worse code than earlier versions for this
file, apparently because of optimization passes that generally provide
a substantial improvement in object code quality but understandably fail
to find any shortcuts in the AES algorithm.
Possible workarounds include
a) disabling -ftree-pre and -ftree-sra optimizations, this was an earlier
patch I tried, which reliably fixed the stack usage, but caused a
serious performance regression in some versions, as later testing
found.
b) disabling UBSAN on this file or all ciphers, as suggested by Ard
Biesheuvel. This would lead to massively better crypto performance in
UBSAN-enabled kernels and avoid the stack usage, but there is a concern
over whether we should exclude arbitrary files from UBSAN at all.
c) Forcing the optimization level in a different way. Similar to a),
but rather than deselecting specific optimization stages,
this now uses "gcc -Os" for this file, regardless of the
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option. This is a reliable
workaround for the stack consumption on all architecture, and I've
retested the performance results now on x86, cycles/byte (lower is
better) for cbc(aes-generic) with 256 bit keys:
-O2 -Os
gcc-6.3.1 14.9 15.1
gcc-7.0.1 14.7 15.3
gcc-7.1.1 15.3 14.7
gcc-7.2.1 16.8 15.9
gcc-8.0.0 15.5 15.6
This implements the option c) by enabling forcing -Os on all compiler
versions starting with gcc-7.1. As a workaround for PR83356, it would
only be needed for gcc-7.2+ with UBSAN enabled, but since it also shows
better performance on gcc-7.1 without UBSAN, it seems appropriate to
use the faster version here as well.
Side note: during testing, I also played with the AES code in libressl,
which had a similar performance regression from gcc-6 to gcc-7.2,
but was three times slower overall. It might be interesting to
investigate that further and possibly port the Linux implementation
into that.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Cc: Richard Biener <rguenther@suse.de>
Cc: Jakub Jelinek <jakub@gcc.gnu.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Similar to what was done for the hash API, update the AEAD API to track
whether each transform has been keyed, and reject encryption/decryption
if a key is needed but one hasn't been set.
This isn't quite as important as the equivalent fix for the hash API
because AEADs always require a key, so are unlikely to be used without
one. Still, tracking the key will prevent accidental unkeyed use.
algif_aead also had to track the key anyway, so the new flag replaces
that and slightly simplifies the algif_aead implementation.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>