The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently the offset for unaligned sg lists is not handled properly
leading into wrong results with certain testmgr self tests. Fix the
handling to account for proper offset within the current sg list.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The updated crypto manager finds a couple of new bugs from the omap-sham
driver. Basically the split update cases fail to calculate the amount of
data to be sent properly, leading into failed results and hangs with the
hw accelerator.
To fix these, the buffer handling needs to be fixed, but we do some cleanup
for the code at the same time to cut away some unnecessary code so that
it is easier to fix.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Current buffer handling logic fails in a case where the buffer contains
existing data from previous update which is divisible by block size.
This results in a block size of data to be left missing from the sg
list going out to the hw accelerator, ending up in stalling the
crypto accelerator driver (the last request never completes fully due
to missing data.)
Fix this by passing the total size of the data instead of the data size
of current request, and also parsing the buffer contents within the
prepare request handling.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver removal should also cleanup the created sysfs group. If not,
the driver fails the subsequent probe as the files exist already.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When using huge data amount, allocating free pages fails as the kernel
isn't able to process get_free_page requests larger than MAX_ORDER.
Also, the DMA subsystem has an inherent limitation that data size
larger than some 2MB can't be handled properly. In these cases,
split up the data instead to smaller requests so that the kernel
can allocate the data, and also so that the DMA driver can handle
the separate SG elements.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Tested-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We don't need dev_err() messages when platform_get_irq() fails now that
platform_get_irq() prints an error message itself when something goes
wrong. Let's remove these prints with a simple semantic patch.
// <smpl>
@@
expression ret;
struct platform_device *E;
@@
ret =
(
platform_get_irq(E, ...)
|
platform_get_irq_byname(E, ...)
);
if ( \( ret < 0 \| ret <= 0 \) )
{
(
-if (ret != -EPROBE_DEFER)
-{ ...
-dev_err(...);
-... }
|
...
-dev_err(...);
)
...
}
// </smpl>
While we're here, remove braces on if statements that only have one
statement (manually).
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <linux-crypto@vger.kernel.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The flags field in 'struct shash_desc' never actually does anything.
The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
However, no shash algorithm ever sleeps, making this flag a no-op.
With this being the case, inevitably some users who can't sleep wrongly
pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
actually started sleeping. For example, the shash_ahash_*() functions,
which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
from the ahash API to the shash API. However, the shash functions are
called under kmap_atomic(), so actually they're assumed to never sleep.
Even if it turns out that some users do need preemption points while
hashing large buffers, we could easily provide a helper function
crypto_shash_update_large() which divides the data into smaller chunks
and calls crypto_shash_update() and cond_resched() for each chunk. It's
not necessary to have a flag in 'struct shash_desc', nor is it necessary
to make individual shash algorithms aware of this at all.
Therefore, remove shash_desc::flags, and document that the
crypto_shash_*() functions can be called from any context.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH. But this
is redundant with the C structure type ('struct ahash_alg'), and
crypto_register_ahash() already sets the type flag automatically,
clearing any type flag that was already there. Apparently the useless
assignment has just been copy+pasted around.
So, remove the useless assignment from all the ahash algorithms.
This patch shouldn't change any actual behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 8043bb1ae0 ("crypto: omap-sham - convert driver logic to use sgs for data xmit")
The memory pages freed in omap_sham_finish_req() were less than those
allocated in omap_sham_copy_sgs().
Cc: stable@vger.kernel.org
Signed-off-by: Bin Liu <b-liu@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 8043bb1ae0 ("crypto: omap-sham - convert driver logic to use
sgs for data xmit") removed the if() clause leaving the statement as is.
The intention was in that case to finish the request always so the goto
instruction seems sensible.
Remove the indentation to fix Smatch warning:
drivers/crypto/omap-sham.c:1761 omap_sham_done_task() warn: inconsistent indenting
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ahash_request 'req' argument passed by the caller
omap_sham_handle_queue() cannot be NULL here because it is obtained from
non-NULL pointer via container_of().
This fixes smatch warning:
drivers/crypto/omap-sham.c:812 omap_sham_prepare_request() warn: variable dereferenced before check 'req' (see line 805)
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver queue size can now be configured from userspace. This
allows optimizing the queue usage based on use case. Default queue
size is still 10 entries.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver fallback size can now be configured from userspace. This
allows optimizing the DMA usage based on use case. Default fallback
size of 256 is still used.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In certain platforms like DRA7xx having memory > 2GB with LPAE enabled
has a constraint that DMA can be done with the initial 2GB and marks it
as ZONE_DMA. But openssl when used with cryptodev does not make sure that
input buffer is DMA capable. So, adding a check to verify if the input
buffer is capable of DMA.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Aparna Balasubramanian <aparnab@ti.com>
Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The usage of of_device_get_match_data reduce the code size a bit.
Furthermore, it prevents an improbable dereference when
of_match_device() return NULL.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove unnecessary static on local variable dd. Such variable
is initialized before being used, on every execution path throughout
the function. The static has no benefit and, removing it reduces the
object file size.
This issue was detected using Coccinelle and the following semantic patch:
https://github.com/GustavoARSilva/coccinelle/blob/master/static/static_unused.cocci
In the following log you can see a difference in the object file size.
This log is the output of the size command, before and after the code
change:
before:
text data bss dec hex filename
26135 11944 128 38207 953f drivers/crypto/omap-sham.o
after:
text data bss dec hex filename
26084 11856 64 38004 9474 drivers/crypto/omap-sham.o
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This was previously missed from the code, causing SDMA to hang in
some cases where the buffer ended up being not aligned.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently there is an interesting corner case failure with omap-sham
driver, if the finalize call is done separately with no data, but
all previous data has already been processed. In this case, it is not
possible to close the hash with the hardware without providing any data,
so we get incorrect results. Fix this by adjusting the size of data
sent to the hardware crypto engine in case the non-final data size falls
on the block size boundary, by reducing the amount of data sent by one
full block. This makes it sure that we always have some data available
for the finalize call and we can close the hash properly.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Aparna Balasubramanian <aparnab@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the hash later code only handles the cases when we have
either new data coming in with the request or old data in the buffer,
but not the combination when we have both. Fix this by changing the
ordering of the code a bit and handling both cases properly
simultaneously if needed. Also, fix an issue with omap_sham_update
that surfaces with this fix, so that the code checks the bufcnt
instead of total data amount against buffer length to avoid any
buffer overflows.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch simply replace all occurrence of HMAC IPAD/OPAD value by their
define.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current internal buffer size is way too large for crypto core, so
shrink it to be smaller. This makes the buffer to fit into the space
reserved for the export/import buffers also.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the driver has been converted to use scatterlists for data
handling, add proper implementation for the export/import stubs also.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the internal buffer has been used for data transmission. Change
this so that scatterlists are used instead, and change the driver to
actually use the previously introduced helper functions for scatterlist
preparation.
This patch also removes the old buffer handling code which is no longer
needed.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently the threshold value was hardcoded in the driver. Having a define
for it makes it easier to configure.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently omap-sham uses a huge internal buffer for caching data, and
pushing this out to the DMA as large chunks. This, unfortunately,
doesn't work too well with the export/import functionality required
for ahash algorithms, and must be changed towards more scatterlist
centric approach.
This patch adds support functions for (mostly) scatterlist based data
handling. omap_sham_prepare_request() prepares a scatterlist for DMA
transfer to SHA crypto accelerator. This requires checking the data /
offset / length alignment of the data, splitting the data to SHA block
size granularity, and adding any remaining data back to the buffer.
With this patch, the code doesn't actually go live yet, the support code
will be taken properly into use with additional patches that modify the
SHA driver functionality itself.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current usage of sgl will be deprecated, and will be replaced by an
array required by the sg based driver implementation. Rename the existing
variable as sgl_tmp so that it can be removed from the driver easily later.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
OMAP HW generally expects data for DMA to be on word boundary, so make the
SHA driver inform crypto framework of the same preference.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Initially these just return -ENOTSUPP to indicate that they don't
really do anything yet. Some sort of implementation is required
for the driver to at least probe.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If software fallback is used on older hardware accelerator setup (OMAP2/
OMAP3), the first block of data must be purged from the buffer. The
first block contains the pre-generated ipad value required by the HW,
but the software fallback algorithm generates its own, causing wrong
results.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If we have processed any data with the hardware accelerator (digcnt > 0),
we must complete the entire hash by using it. This is because the current
hash value can't be imported to the software fallback algorithm. Otherwise
we end up with wrong hash results.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some of the call paths of OMAP SHA driver can avoid executing the next
step of the crypto queue under tasklet; instead, execute the next step
directly via function call. This avoids a costly round-trip via the
scheduler giving a slight performance boost.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm-neon-sha implementations have cra_priority of 150...300, so
increase omap-sham priority to 400 to ensure it is on top of any
software alg.
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Adds software fallback support for small crypto requests. In these cases,
it is undesirable to use DMA, as setting it up itself is rather heavy
operation. Gives about 40% extra performance in ipsec usecase.
Signed-off-by: Bin Liu <b-liu@ti.com>
[t-kristo@ti.com: dropped the extra traces, updated some comments
on the code]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The extra call to dmaengine_terminate_all is not needed, as the DMA
is not running at this point. This improves performance slightly.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Change crypto queue size from 1 to 10 for omap SHA driver. This should
allow clients to enqueue requests more effectively to avoid serializing
whole crypto sequences, giving extra performance.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling runtime PM API for every block causes serious performance hit to
crypto operations that are done on a long buffer. As crypto is performed
on a page boundary, encrypting large buffers can cause a series of crypto
operations divided by page. The runtime PM API is also called those many
times.
Convert the driver to use runtime_pm autosuspend instead, with a default
timeout value of 1 second. This results in upto ~50% speedup.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This if statement is reversed so we end up either leaking or Oopsing on
error.
Fixes: dbe246209b ('crypto: omap-sham - Use dma_request_chan() for requesting DMA channel')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With the new dma_request_chan() the client driver does not need to look for
the DMA resource and it does not need to pass filter_fn anymore.
By switching to the new API the driver can now support deferred probing
against DMA.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: David S. Miller <davem@davemloft.net>
CC: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[hch: split from a larger patch by Dan]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jens Axboe <axboe@fb.com>
omap3 support is same as omap2, just with different IO address (specified in DT)
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Function pm_runtime_get_sync could fail and we need to check return
value to prevent kernel crash.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
kmap_atomic() gives only the page address of the input page.
Driver should take care of adding the offset of the scatterlist
within the page to the returned page address.
omap-sham driver is not adding the offset to page and directly operates
on the return vale of kmap_atomic(), because of which the following
error comes when running crypto tests:
00000000: d9 a1 1b 7c aa 90 3b aa 11 ab cb 25 00 b8 ac bf
[ 2.338169] 00000010: c1 39 cd ff 48 d0 a8 e2 2b fa 33 a1
[ 2.344008] alg: hash: Chunking test 1 failed for omap-sha256
So adding the scatterlist offset to vaddr.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
omap_sham_handle_queue() can be called as part of done_task tasklet.
During this its atomic and any calls to pm functions cannot sleep.
But there is a call to pm_runtime_get_sync() (which can sleep) in
omap_sham_handle_queue(), because of which the following appears:
" [ 116.169969] BUG: scheduling while atomic: kworker/0:2/2676/0x00000100"
Add pm_runtime_irq_safe() to avoid this.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99
compliant equivalent. This patch allocates the appropriate amount of memory
using a char array using the SHASH_DESC_ON_STACK macro.
The new code can be compiled with both gcc and clang.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Mark Charlebois <charlebm@gmail.com>
Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
HIGHMEM pages may not be mapped so we must kmap them before accessing.
This resolves a random OOPs error that was showing up during OpenSSL SHA tests.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use SIMPLE_DEV_PM_OPS macro in order to make the code simpler.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Command "tcrypt sec=1 mode=403" give the follwoing error for Polling
mode:
root@am335x-evm:/# insmod tcrypt.ko sec=1 mode=403
[...]
[ 346.982754] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 4352 opers/sec, 17825792 bytes/sec
[ 347.992661] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 7095 opers/sec, 29061120 bytes/sec
[ 349.002667] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates):
[ 349.010882] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 349.020037] pgd = ddeac000
[ 349.022884] [00000000] *pgd=9dcb4831, *pte=00000000, *ppte=00000000
[ 349.029816] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[ 349.035482] Modules linked in: tcrypt(+)
[ 349.039617] CPU: 0 PID: 1473 Comm: insmod Not tainted 3.12.4-01566-g6279006-dirty #38
[ 349.047832] task: dda91540 ti: ddcd2000 task.ti: ddcd2000
[ 349.053517] PC is at omap_sham_xmit_dma+0x6c/0x238
[ 349.058544] LR is at omap_sham_xmit_dma+0x38/0x238
[ 349.063570] pc : [<c04eb7cc>] lr : [<c04eb798>] psr: 20000013
[ 349.063570] sp : ddcd3c78 ip : 00000000 fp : 9d8980b8
[ 349.075610] r10: 00000000 r9 : 00000000 r8 : 00000000
[ 349.081090] r7 : 00001000 r6 : dd898000 r5 : 00000040 r4 : ddb10550
[ 349.087935] r3 : 00000004 r2 : 00000010 r1 : 53100080 r0 : 00000000
[ 349.094783] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 349.102268] Control: 10c5387d Table: 9deac019 DAC: 00000015
[ 349.108294] Process insmod (pid: 1473, stack limit = 0xddcd2248)
[...]
This is because polling_mode is not enabled for ctx without FLAGS_FINUP.
For polling mode the bufcnt is made 0 unconditionally. But it should be made 0
only if it is a final update or a total is not zero(This condition is similar
to what is done in DMA case). Because of this wrong hashes are produced.
Fixing the same.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>