The minimum width and height for VPE input/output was kept as 128 pixels. VPE
doesn't have a constraint on the image height, it requires the image width to
be at least 16 bytes.
Change the minimum supported dimensions to 32x32. This allows us to de-interlace
qcif content. A smaller image size than 32x32 didn't make much sense, so stopped
at this.
Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
The video_device struct is currently embedded in the driver data struct vpe_dev.
A vpe_dev instance is allocated by the driver, and the memory for the vfd is a
part of this struct.
The v4l2 core, however, manages the removal of the vfd region, through the
video_device's .release() op, which currently is the helper
video_device_release. This causes memory corruption, and leads to issues when
we try to re-insert the vpe module.
Use the video_device_release_empty helper function instead.
Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
VPE has a ctrl parameter which decides how many mem to mem transactions the
active job from the job queue can perform.
The driver's job_ready() made sure that the number of ready source buffers are
sufficient for the job to execute successfully. But it didn't make sure if
there are sufficient ready destination buffers in the capture queue for the
VPE output.
If the time taken by VPE to process a single frame is really slow, then it's
possible that we don't need to imply such a restriction on the dst queue, but
really fast transactions(small resolution, no de-interlacing) may cause us to
hit the condition where we don't have any free buffers for the VPE to write on.
Add the extra check in job_ready() to make sure we have the sufficient amount
of destination buffers.
Acked-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Copy the flags containing the timestamp source from source buffer flags to
the destination buffer flags on memory-to-memory devices. This is analogous
to copying the timestamp field from source to destination.
Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Acked-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
The timestamp_type field used to contain only the timestamp type. Soon it
will be used for timestamp source flags as well. Rename the field
accordingly.
[m.chehab@samsung.com: do the change also to drivers/staging/media and at s2255]
Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Use the csc library functions to configure the CSC block in VPE.
Some changes are required in try_fmt to handle the pix->colorspace parameter
more correctly. Previously, we copied the source queue colorspace to the
destination queue colorspace as we didn't support RGB formats. Now, we configure
pix->colorspace based on the color format set(and the height of the image if
it's a YUV format).
Add basic RGB color formats to the list of supported vpe formats.
If the destination format is RGB colorspace, we also need to use the RGB output
port instead of the Luma and Chroma output ports. This requires configuring the
output data descriptors differently.
Also, make the default colorspace V4L2_COLORSPACE_SMPTE170M as that resembles
the Standard Definition colorspace more closely.
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
VPE and VIP IPs in DAR7x contain a color space converter(CSC) sub block. Create
a library which will perform CSC related configurations and hold CSC register
definitions. The functions provided by this library will be called by the vpe
and vip drivers using a csc_data handle.
The vpe_dev holds the csc_data handle. The handle represents an instance of the
CSC hardware, and the vpe driver uses it to access the CSC register offsets or
helper functions to configure these registers.
The CSC register offsets are now relative to the CSC block itself, so we need
to use the macro GET_OFFSET_TOP to get the CSC register offset relative to the
VPE IP in the vpe driver.
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Add the required SC register configurations which lets us perform linear scaling
for the supported range of horizontal and vertical scaling ratios.
The horizontal scaler performs polyphase scaling using it's 8 tap 32 phase
filter, decimation is performed when downscaling passes beyond 2x or 4x.
The vertical scaler performs polyphase scaling using it's 5 tap 32 phase filter,
it switches to a simpler form of scaling using the running average filter when
the downscale ratio is more than 4x.
Many of the SC features like peaking, trimming and non-linear scaling aren't
implemented for now. Only the minimal register fields required for basic scaling
operation are configured.
The function to configure SC registers takes the sc_data handle, the source and
destination widths and heights, and the scaler address data block offsets for
the current context so that they can be configured.
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Make the driver allocate dma buffers to store horizontal and scaler coeffs.
Use the scaler library api to choose and copy scaler coefficients to a
the above buffers based on the scaling ratio. Since the SC block comes after
the de-interlacer, make sure that the source height is doubled if de-interlacer
was used.
These buffers now need to be used by VPDMA to load the coefficients into the
SRAM within SC.
In device_run, add configuration descriptors which have payloads pointing to
the scaler coefficients in memory. Use the members in sc_data handle to prevent
addition of these descriptors if there isn't a need to re-load coefficients into
SC. This comes helps unnecessary re-loading of the coefficients when we switch
back and forth between vpe contexts.
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
VPE and VIP IPs in DAR7x contain a scaler(SC) sub block. Create a library which
will perform scaler block related configurations and hold SC register
definitions. The functions provided by this library will be called by the vpe
and vip drivers using a sc_data handle.
The vpe_dev holds the sc_data handle. The handle represents an instance of the
SC hardware, and the vpe driver uses it to access the scaler register offsets
or helper functions to configure these registers.
We move the SC register definitions to sc.h so that they aren't specific to
VPE anymore. The register offsets are now relative to the sub-block, and not the
VPE IP as a whole. In order for VPDMA to configure registers, it requires it's
offset from the top level VPE module. A macro called GET_OFFSET_TOP is added to
return the offset of the register relative to the VPE IP.
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
When VPDMA fetches or writes to an image buffer, the line stride must be a
multiple of 16 bytes. If it isn't, VPDMA HW will write/fetch until the next
16 byte boundry. This causes VPE to work incorrectly for source or destination
widths which don't satisfy the above alignment requirement.
In order to prevent this, we now make sure that when we set pix format for the
input and output buffers, the VPE source and destination image line strides are
16 byte aligned. Also, the motion vector buffers for the de-interlacer are
allocated in such a way that it ensures the same alignment.
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
In case of error, the function devm_kzalloc() and devm_ioremap()
returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return
value check should be replaced with NULL test.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Archit Taneja <archit@ti.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Archit Taneja <archit@ti.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Add support for the de-interlacer block in VPE. For de-interlacer to
work, we need to enable 2 more sets of VPE input ports which fetch data
from the 'last' and 'last to last' fields of the interlaced video. Apart
from that, we need to enable the Motion vector output and input ports,
and also allocate DMA buffers for them.
We need to make sure that two most recent fields in the source queue are
available and in the 'READY' state. Once a mem2mem context gets access
to the VPE HW(in device_run), it extracts the addresses of the 3
buffers, and provides it to the data descriptors for the 3 sets of input
ports((LUMA1, CHROMA1), (LUMA2, CHROMA2), and (LUMA3, CHROMA3))
respectively for the 3 consecutive fields. The motion vector and output
port descriptors are configured and the list is submitted to VPDMA.
Once the transaction is done, the v4l2 buffer corresponding to the
oldest field(the 3rd one) is changed to the state 'DONE', and the
buffers corresponding to 1st and 2nd fields become the 2nd and 3rd field
for the next de-interlace operation. This way, for each deinterlace
operation, we have the 3 most recent fields. After each transaction, we
also swap the motion vector buffers, the new input motion vector buffer
contains the resultant motion information of all the previous frames,
and the new output motion vector buffer will be used to hold the updated
motion vector to capture the motion changes in the next field. The
motion vector buffers are allocated using the DMA allocation API.
The de-interlacer is removed from bypass mode, it requires some extra
default configurations which are now added. The chrominance upsampler
coefficients are added for interlaced frames. Some VPDMA parameters like
frame start event and line mode are configured for the 2 extra sets of
input ports.
Signed-off-by: Archit Taneja <archit@ti.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
VPE is a block which consists of a single memory to memory path which
can perform chrominance up/down sampling, de-interlacing, scaling, and
color space conversion of raster or tiled YUV420 coplanar, YUV422
coplanar or YUV422 interleaved video formats.
We create a mem2mem driver based primarily on the mem2mem-testdev
example. The de-interlacer, scaler and color space converter are all
bypassed for now to keep the driver simple. Chroma up/down sampler
blocks are implemented, so conversion beteen different YUV formats is
possible.
Each mem2mem context allocates a buffer for VPE MMR values which it will
use when it gets access to the VPE HW via the mem2mem queue, it also
allocates a VPDMA descriptor list to which configuration and data
descriptors are added.
Based on the information received via v4l2 ioctls for the source and
destination queues, the driver configures the values for the MMRs, and
stores them in the buffer. There are also some VPDMA parameters like
frame start and line mode which needs to be configured, these are
configured by direct register writes via the VPDMA helper functions.
The driver's device_run() mem2mem op will add each descriptor based on
how the source and destination queues are set up for the given ctx, once
the list is prepared, it's submitted to VPDMA, these descriptors when
parsed by VPDMA will upload MMR registers, start DMA of video buffers on
the various input and output clients/ports.
When the list is parsed completely(and the DMAs on all the output ports
done), an interrupt is generated which we use to notify that the source
and destination buffers are done. The rest of the driver is quite
similar to other mem2mem drivers, we use the multiplane v4l2 ioctls as
the HW support coplanar formats.
Signed-off-by: Archit Taneja <archit@ti.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>