The way Job Ring platform devices are created and released does not
allow for multiple create-release cycles.
JR0 Platform device creation error
JR0 Platform device creation error
caam 2100000.caam: no queues configured, terminating
caam: probe of 2100000.caam failed with error -12
The reason is that platform devices are created for each job ring:
for_each_available_child_of_node(nprop, np)
if (of_device_is_compatible(np, "fsl,sec-v4.0-job-ring") ||
of_device_is_compatible(np, "fsl,sec4.0-job-ring")) {
ctrlpriv->jrpdev[ring] =
of_platform_device_create(np, NULL, dev);
which sets OF_POPULATED on the device node, but then it cleans these up:
/* Remove platform devices for JobRs */
for (ring = 0; ring < ctrlpriv->total_jobrs; ring++) {
if (ctrlpriv->jrpdev[ring])
of_device_unregister(ctrlpriv->jrpdev[ring]);
}
which leaves OF_POPULATED set.
Use of_platform_populate / of_platform_depopulate instead.
This allows for a bit of driver clean-up, jrpdev is no longer needed.
Logic changes a bit too:
-exit in case of_platform_populate fails, since currently even QI backend
depends on JR; true, we no longer support the case when "some" of the JR
DT nodes are incorrect
-when cleaning up, caam_remove() would also depopulate RTIC in case
it would have been populated somewhere else - not the case for now
Cc: <stable@vger.kernel.org>
Fixes: 313ea293e9 ("crypto: caam - Add Platform driver for Job Ring")
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Suggested-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CAAM engine supports two interfaces for crypto job submission:
-job ring interface - already existing caam/jr driver
-Queue Interface (QI) - caam/qi driver added in current patch
QI is present in CAAM engines found on DPAA platforms.
QI gets its I/O (frame descriptors) from QMan (Queue Manager) queues.
This patch adds a platform device for accessing CAAM's queue interface.
The requests are submitted to CAAM using one frame queue per
cryptographic context. Each crypto context has one shared descriptor.
This shared descriptor is attached to frame queue associated with
corresponding driver context using context_a.
The driver hides the mechanics of FQ creation, initialisation from its
applications. Each cryptographic context needs to be associated with
driver context which houses the FQ to be used to transport the job to
CAAM. The driver provides API for:
(a) Context creation
(b) Job submission
(c) Context deletion
(d) Congestion indication - whether path to/from CAAM is congested
The driver supports affining its context to a particular CPU.
This means that any responses from CAAM for the context in question
would arrive at the given CPU. This helps in implementing one CPU
per packet round trip in IPsec application.
The driver processes CAAM responses under NAPI contexts.
NAPI contexts are instantiated only on cores with affined portals since
only cores having their own portal can receive responses from DQRR.
The responses from CAAM for all cryptographic contexts ride on a fixed
set of FQs. We use one response FQ per portal owning core. The response
FQ is configured in each core's and thus portal's dedicated channel.
This gives the flexibility to direct CAAM's responses for a crypto
context on a given core.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: Alex Porosanu <alexandru.porosanu@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commit 66d2e20280.
Quoting from Russell's findings:
https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg21136.html
[quote]
Okay, I've re-tested, using a different way of measuring, because using
openssl speed is impractical for off-loaded engines. I've decided to
use this way to measure the performance:
dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5
For the threaded IRQs case gives:
0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
=> 5.36s => 25.0MB/s
and the tasklet case:
0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
=> 4.95 => 27.1MB/s
which corresponds to an 8% slowdown for the threaded IRQ case. So,
tasklets are indeed faster than threaded IRQs.
[...]
I think I've proven from the above that this patch needs to be reverted
due to the performance regression, and that there _is_ most definitely
a deterimental effect of switching from tasklets to threaded IRQs.
[/quote]
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Threaded interrupts can perform the function of the tasklet, and much
more safely too - without races when trying to take the tasklet and
interrupt down on device removal.
With the old code, there is a window where we call tasklet_kill(). If
the interrupt handler happens to be running on a different CPU, and
subsequently calls tasklet_schedule(), the tasklet will be re-scheduled
for execution.
Switching to a hardirq/threadirq combination implementation avoids this,
and it also means generic code deals with the teardown sequencing of the
threaded and non-threaded parts.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ARM-based systems may disable clocking to the CAAM device on the
Freescale i.MX platform for power management purposes. This patch
enables the required clocks when the CAAM module is initialized and
disables the required clocks when the CAAM module is shut down.
Signed-off-by: Victoria Milhoan <vicki.milhoan@freescale.com>
Tested-by: Horia Geantă <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CAAM's memory is broken into following address blocks:
Block Included Registers
0 General Registers
1-4 Job ring registers
6 RTIC registers
7 QI registers
8 DECO and CCB
Size of the above stated blocks varies in various platforms. The block size can be 4K or 64K.
The block size can be dynamically determined by reading CTPR register in CAAM.
This patch initializes the block addresses dynamically based on the value read from this register.
Signed-off-by: Ruchika Gupta <r66431@freescale.com>
Signed-off-by: Nitesh Narayan Lal <b44382@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For platforms with virtualization enabled
1. The job ring registers can be written to only is the job ring has been
started i.e STARTR bit in JRSTART register is 1
2. For DECO's under direct software control, with virtualization enabled
PL, BMT, ICID and SDID values need to be provided. These are provided by
selecting a Job ring in start mode whose parameters would be used for the
DECO access programming.
Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Earlier interface layers - caamalg, caamhash, caamrng were
directly using the Controller driver private structure to access
the Job ring.
- Changed the above to use alloc/free API's provided by Job Ring Drive
Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Reviewed-by: Garg Vakul-B16394 <vakul@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With each of the Job Ring available as a platform device, the
Job Ring driver needs to take care of allocation/deallocation
of the Job Rings to the above interface layers. Added APIs
in Job Ring Driver to allocate/free Job rings
Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Reviewed-by: Garg Vakul-B16394 <vakul@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The SEC Job Rings are now available as individual devices.
This would enable sharing of job rings between kernel and
user space. Job Rings can now be dynamically bound/unbound
from kernel.
Changes are made in the following layers of CAAM Driver
1. Controller driver
- Does basic initialization of CAAM Block.
- Creates platform devices for Job Rings.
(Earlier the initialization of Job ring was done
by the controller driver)
2. JobRing Platform driver
- Manages the platform Job Ring devices created
by the controller driver
Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Reviewed-by: Garg Vakul-B16394 <vakul@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
RNG4 block contains multiple (i.e. 2) state handles that can be
initialized. This patch adds the necessary code for detecting
which of the two state handles has been instantiated by another
piece of software e.g. u-boot and instantiate the other one (or
both if none was instantiated). Only the state handle(s)
instantiated by this driver will be deinstantiated when removing
the module.
Signed-off-by: Alex Porosanu <alexandru.porosanu@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If the caam driver module instantiates the RNG state handle 0, then
upon the removal of the module, the RNG state handle is left
initialized. This patch takes care of reverting the state of the
handle back to its previous uninstantatied state.
Signed-off-by: Alex Porosanu <alexandru.porosanu@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
there is no noticeable benefit for multiple cores to process one
job ring's output ring: in fact, we can benefit from cache effects
of having the back-half stay on the core that receives a particular
ring's interrupts, and further relax general contention and the
locking involved with reading outring_used, since tasklets run
atomically.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
caam supports ahash hmac with sha algorithms and md5.
Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
remove caam_jr_register and caam_jr_deregister
to allow sharing of job rings.
Signed-off-by: Yuan Kang <Yuan.Kang@freescale.com>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The SEC4 supercedes the SEC2.x/3.x as Freescale's
Integrated Security Engine. Its programming model is
incompatible with all prior versions of the SEC (talitos).
The SEC4 is also known as the Cryptographic Accelerator
and Assurance Module (CAAM); this driver is named caam.
This initial submission does not include support for Data Path
mode operation - AEAD descriptors are submitted via the job
ring interface, while the Queue Interface (QI) is enabled
for use by others. Only AEAD algorithms are implemented
at this time, for use with IPsec.
Many thanks to the Freescale STC team for their contributions
to this driver.
Signed-off-by: Steve Cornelius <sec@pobox.com>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>